WorldWideScience

Sample records for association rule mining

  1. Controlling False Positives in Association Rule Mining

    CERN Document Server

    Liu, Guimei; Wong, Limsoon

    2011-01-01

    Association rule mining is an important problem in data mining. It enumerates and tests a large number of rules on a dataset and outputs the rules that satisfy user-specified constraints. Because so many rules are tested, rules that do not represent a real systematic effect in the data can satisfy the given constraints purely by chance. Hence association rule mining often suffers from a high risk of false positive errors. There is a lack of comprehensive study on controlling false positives in association rule mining. In this paper, we adopt three multiple testing correction approaches (the direct adjustment approach, the permutation-based approach and the holdout approach) to control false positives in association rule mining, and conduct extensive experiments to study their performance. Our results show that (1) numerous spurious rules are generated if no correction is made; (2) the three approaches can control false positives effectively. Among the three approaches, the permutation...
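
    As an illustration of the direct adjustment idea mentioned in this abstract, the sketch below applies a Bonferroni-style correction to rule p-values. It is only a minimal sketch: the abstract does not say which test or adjustment the authors use, so the one-sided hypergeometric (Fisher-type) test, the function names `fisher_right_tail` and `bonferroni_filter`, and the example counts are all assumptions made for illustration.

```python
from math import comb

def fisher_right_tail(n, n_x, n_y, n_xy):
    """P(overlap >= n_xy) if X and Y were independent (hypergeometric right tail).

    n: number of transactions, n_x / n_y: transactions containing X / Y,
    n_xy: transactions containing both. Assumed test, not taken from the paper.
    """
    upper = min(n_x, n_y)
    total = comb(n, n_y)
    tail = sum(comb(n_x, k) * comb(n - n_x, n_y - k) for k in range(n_xy, upper + 1))
    return tail / total

def bonferroni_filter(rules, n, alpha=0.05):
    """Keep rules whose p-value is at most alpha divided by the number of rules tested."""
    if not rules:
        return []
    threshold = alpha / len(rules)
    kept = []
    for name, n_x, n_y, n_xy in rules:
        p = fisher_right_tail(n, n_x, n_y, n_xy)
        if p <= threshold:
            kept.append((name, p))
    return kept

# Hypothetical counts: rule "A" co-occurs far more often than chance, rule "B" does not.
example_rules = [("A", 50, 50, 45), ("B", 50, 50, 26)]
significant = bonferroni_filter(example_rules, n=100)
```

    Dividing the significance level by the number of tested rules is the simplest such adjustment; it tends to be conservative when many rules are tested.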

  2. Association Rule Mining and Its Application

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Several data mining techniques have been studied recently, among which association is one of the most important. In this paper, we introduce the theory of association rules in data mining and analyze the characteristics of the postal EMS service. We create a data warehouse model for EMS services and give the procedure for applying association rule mining based on it. Finally, we give an example of the whole mining procedure. This EMS data warehouse model and the association rule mining technique have been applied in a practical postal CRM system.

  3. A Collaborative Educational Association Rule Mining Tool

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; de Castro, Carlos

    2011-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the ongoing improvement of e-learning courses and allowing teachers with similar course profiles to share and score the discovered information. The mining tool is oriented to be used by non-expert instructors in data mining so its internal…

  4. Efficient Mining of Intertransaction Association Rules

    NARCIS (Netherlands)

    Tung, A.K.H.; Lu, H.J.; Han, J.W.; Feng, L.

    2003-01-01

    Most of the previous studies on mining association rules are on mining intratransaction associations, i.e., the associations among items within the same transaction where the notion of the transaction could be the items bought by the same customer, the events happened on the same day, etc. In this s

  5. MINING ASSOCIATION RULES FROM XML DOCUMENT

    OpenAIRE

    Neha M. Shroff; G. V. Gujar

    2014-01-01

    In this work we describe an approach to mine Tree-based association rules from XML documents. Such rules provide information on both the structure and the content of XML documents; moreover, they can be stored in XML format to be queried later on. The mined knowledge is approximate, intensional knowledge used to provide: (i) quick, approximate answers to queries and (ii) information about structural regularities that can be used as dataguides for document querying. A prototype of the proposed...

  6. Mining Hesitation Information by Vague Association Rules

    Science.gov (United States)

    Lu, An; Ng, Wilfred

    In many online shopping applications, such as Amazon and eBay, traditional Association Rule (AR) mining has limitations as it only deals with the items that are sold but ignores the items that are almost sold (for example, those items that are put into the basket but not checked out). We say that those almost sold items carry hesitation information, since customers are hesitating to buy them. The hesitation information of items is valuable knowledge for the design of good selling strategies. However, there is no conceptual model that is able to capture different statuses of hesitation information. Herein, we apply and extend vague set theory in the context of AR mining. We define the concepts of attractiveness and hesitation of an item, which represent the overall information of a customer's intent on an item. Based on the two concepts, we propose the notion of Vague Association Rules (VARs). We devise an efficient algorithm to mine the VARs. Our experiments show that our algorithm is efficient and the VARs capture more specific and richer information than do the traditional ARs.

  7. Performance Analysis of Genetic Algorithm for Mining Association Rules

    OpenAIRE

    Indira, K.; Kanmani, S.

    2012-01-01

    Association rule (AR) mining is a data mining task that attempts to discover interesting patterns or relationships between data in large databases. The genetic algorithm (GA), based on evolutionary principles, has found a strong base in mining ARs. This paper analyzes the performance of GA in mining ARs effectively, based on variations and modifications of the GA parameters. Recent works from the past seven years on mining association rules using genetic algorithms are considered for the analysis. Ge...

  8. Efficient Mining of Association Rules in Oscillatory-based Data

    OpenAIRE

    Mohammad Saniee Abadeh & Mojtaba Ala

    2011-01-01

    Association rules are one of the most researched areas of data mining. Finding frequent patterns is an important step in association rules mining which is very time consuming and costly. In this paper, an effective method for mining association rules in the data with the oscillatory value (up, down) is presented, such as the stock price variation in stock exchange, which, just a few numbers of the counts of itemsets are searched from the database, and the counts of the rest of itemsets are compute...

  9. Research on spatial association rules mining in two-direction

    Institute of Scientific and Technical Information of China (English)

    XUE Li-xia; WANG Zuo-cheng

    2007-01-01

    In data mining on transaction databases, attention has focused on the relationships between attributes, while the relationships between tuples have not been taken into account. In a spatial database there are relationships both between attributes and between tuples, and most associations occur between tuples, such as adjacency, intersection, overlap and other topological relationships. The tasks of spatial association rule mining therefore include mining the relationships between attributes of spatial objects, called vertical-direction DM, and the relationships between tuples, called horizontal-direction DM. This paper analyzes the storage models of spatial data, draws on data mining techniques for transaction databases, defines the spatial association rule (including the vertical-direction, horizontal-direction and two-direction association rules), discusses the measurement of spatial association rule interestingness, and puts forward the workflow of spatial association rule mining. For two-direction spatial association rule mining, an algorithm is proposed to obtain non-spatial itemsets: by means of spatial analysis, the spatial relations are transformed into non-spatial associations and the non-spatial itemsets are obtained. Based on the non-spatial itemsets, the Apriori algorithm or other algorithms can be used to find the frequent itemsets, from which the spatial association rules are derived. The algorithm was validated by mining spatial association rules from a spatial database, and the test results show that it is efficient and can mine interesting spatial rules.
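
    The final step described in this abstract, running Apriori over itemsets once spatial relations have been turned into ordinary items, can be sketched as below. This is a generic textbook-style Apriori, not the authors' algorithm; the item names such as `near_river` are hypothetical stand-ins for spatial predicates.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for candidate in current:
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                level[candidate] = count / n
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical transactions whose items encode spatial predicates.
example = [
    {"near_river", "forest"},
    {"near_river", "forest", "road"},
    {"near_river", "road"},
    {"forest"},
]
frequent_itemsets = apriori(example, min_support=0.5)
```

    The prune step relies on the downward closure property: an itemset cannot be frequent unless all of its subsets are frequent.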

  10. Efficient mining of association rules based on gravitational search algorithm

    Directory of Open Access Journals (Sweden)

    Fariba Khademolghorani

    2011-07-01

    Association rule mining is one of the most used tools to discover relationships among attributes in a database. Many algorithms have been introduced for discovering these rules. These algorithms have to mine association rules in two separate stages. Most of them mine occurrence rules that are easily predictable by users. Therefore, this paper discusses the application of the gravitational search algorithm for discovering interesting association rules. This evolutionary algorithm is based on Newtonian gravity and the laws of motion. Furthermore, contrary to the previous methods, the proposed method is able to mine the best association rules without generating frequent itemsets and is independent of the minimum support and confidence values. A comparison with a method for mining association rules based on particle swarm optimization shows that our method is successful.

  11. Sampling based Association Rules Mining- A Recent Overview

    Directory of Open Access Journals (Sweden)

    V.Umarani,

    2010-03-01

    Association rule discovery from large databases is one of the tedious tasks in data mining. Frequent itemset mining, the first step in the mining of association rules, is a computation- and I/O-intensive process that requires repeated passes over the entire database. Sampling has often been suggested as an effective tool to reduce the size of the dataset operated on, at some cost to accuracy. The data mining literature presents numerous sampling-based approaches to speed up Association Rule Mining (ARM). Sampling is one of the important and popular data reduction techniques used to mine huge volumes of data efficiently, and it can speed up the mining of association rules. In this paper, we provide an overview of existing sampling-based association rule mining algorithms.
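
    The trade-off this abstract describes (a smaller dataset at some cost to accuracy) can be seen in a minimal sketch that estimates an itemset's support from a simple random sample. It is not any specific algorithm from the overview; the function name `estimate_support` and the toy data are assumptions.

```python
import random

def estimate_support(transactions, itemset, sample_size, seed=0):
    """Estimate the support of `itemset` from a random sample drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(transactions, sample_size)
    itemset = frozenset(itemset)
    hits = sum(1 for t in sample if itemset <= t)
    return hits / sample_size

# Hypothetical database: half the transactions contain {a, b}, half contain only {a}.
database = [frozenset({"a", "b"})] * 50 + [frozenset({"a"})] * 50
approx = estimate_support(database, {"a", "b"}, sample_size=20)
exact = estimate_support(database, {"a", "b"}, sample_size=len(database))
```

    With `sample_size` equal to the database size the estimate is exact; smaller samples require fewer containment checks but return a value that varies with the sample drawn.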

  12. Mining association rule efficiently based on data warehouse

    Institute of Scientific and Technical Information of China (English)

    陈晓红; 赖邦传; 罗铤

    2003-01-01

    The conventional complete association rule set was replaced by the least association rule set in data warehouse association rule mining process. The least association rule set should comply with two requirements: 1) it should be the minimal and the simplest association rule set; 2) its predictive power should in no way be weaker than that of the complete association rule set so that the precision of the association rule set analysis can be guaranteed. By adopting the least association rule set, the pruning of weak rules can be effectively carried out so as to greatly reduce the number of frequent itemsets, and therefore improve the mining efficiency. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is utilized to develop a corresponding efficient algorithm.

  13. Mining association rule bases from integrated genomic data and annotations

    OpenAIRE

    Martinez, Ricardo; Pasquier, Nicolas; Pasquier, Claude

    2008-01-01

    During the last decade, several clustering and association rule mining techniques have been applied to identify groups of co-regulated genes in gene expression data. Nowadays, integrating biological knowledge and gene expression data into a single framework has become a major challenge to improve the relevance of mined patterns and simplify their interpretation by the biologists. The GenMiner approach was developed for mining association rules showing gene groups tha...

  14. Compact Weighted Class Association Rule Mining using Information Gain

    CERN Document Server

    Ibrahim, S P Syed

    2011-01-01

    Weighted association rule mining reflects the semantic significance of an item by considering its weight. Classification constructs a classifier and predicts the class of new data instances. This paper proposes a compact weighted class association rule mining method, which applies weighted association rule mining to classification and constructs an efficient weighted associative classifier. The proposed associative classification algorithm chooses one non-class informative attribute from the dataset, and all the weighted class association rules are generated based on that attribute. The weight of the item is one of the parameters used in generating the weighted class association rules. The proposed algorithm calculates the weight using the HITS model. Experimental results show that the proposed system generates fewer, high-quality rules, which improves the classification accuracy.

  15. Efficient Mining of Association Rules in Oscillatory-based Data

    OpenAIRE

    Mohammad Saniee Abadeh; Mojtaba Ala

    2011-01-01

    Association rules are one of the most researched areas of data mining. Finding frequent patterns is an important step in association rules mining which is very time consuming and costly. In this paper, an effective method for mining association rules in the data with the oscillatory value (up, down) is presented, such as the stock price variation in stock exchange, which, just a few numbers of the counts of itemsets are searched from the database, and the counts of the rest of itemsets are co...

  16. An Optimized Weighted Association Rule Mining On Dynamic Content

    CERN Document Server

    Velvadivu, P

    2010-01-01

    Association rule mining aims to explore large transaction databases for association rules. The classical Association Rule Mining (ARM) model assumes that all items have the same significance, without taking their weight into account. It also ignores the differences between transactions and the importance of each itemset. However, Weighted Association Rule Mining (WARM) does not work on databases with only binary attributes. It makes use of the importance of each itemset and transaction. WARM requires each item to be given a weight that reflects its importance to the user. The weights may correspond to special promotions on some products, or to the profitability of different items. This research work first focuses on a weight assignment based on a directed graph where nodes denote items and links represent association rules. A generalized version of HITS is applied to the graph to rank the items, where all nodes and links are allowed to have weights. This research then uses enhanced HITS algorithm by developing...

  17. Association rule mining as a support for OLAP

    OpenAIRE

    Chudán, David

    2010-01-01

    The aim of this work is to identify the possibilities of the complementary usage of two analytical methods of data analysis, OLAP analysis and data mining represented by GUHA association rule mining. The usage of these two methods in the context of proposed scenarios on one dataset presumes a synergistic effect, surpassing the knowledge acquired by these two methods independently. This is the main contribution of the work. Another contribution is the original use of GUHA association rules whe...

  18. An Ontological Approach for Mining Association Rules from Transactional Dataset

    Directory of Open Access Journals (Sweden)

    Sivanthiya.T

    2015-01-01

    Infrequent item sets are mined in order to reduce the cost function and to promote the sale of rare, correlated item sets. In past research, algorithms such as Infrequent Weighted Item Set Miner and Minimal Infrequent Weighted Item Set Miner were used. Because infrequent item sets are mined by requiring a support count less than or equal to the maximum support count, a large number of rules are generated, and the mined result does not guarantee that only interesting rules are extracted, since interestingness strongly depends on the user's knowledge and goals. Hence, an Ontology Relational Weights Measure using a Weighted Association Rule Mining approach is introduced to integrate the user’s knowledge, minimize the number of rules and mine the interesting infrequent item sets.

  19. Mining multilevel spatial association rules with cloud models

    Institute of Scientific and Technical Information of China (English)

    YANG Bin; ZHU Zhong-ying

    2005-01-01

    The traditional generalization-based knowledge discovery method is introduced. A new multilevel spatial association rule mining method based on the cloud model is presented. The cloud model integrates the vague and random use of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing discovery of strong spatial association rules. Combining the cloud-model-based method with Apriori algorithms for mining association rules from a spatial database proves effective and flexible.

  20. A Fast Algorithm for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    黄刘生; 陈华平; 王洵; 陈国良

    2000-01-01

    In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.

  1. A Survey of Association Rule Mining Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Anubha Sharma

    2012-08-01

    Data mining is the analysis step of the "Knowledge Discovery in Databases" process, or KDD. It is the process that results in the discovery of new patterns in large data sets. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract knowledge from an existing data set and transform it into a human-understandable structure. In data mining, association rule learning is a popular and well researched method for discovering interesting relations between variables in large databases. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms, which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. Previously, many researchers have proposed genetic algorithms for mining interesting association rules from quantitative data. In this paper we present a survey of association rule mining using genetic algorithms. The techniques are categorized based upon different approaches. This paper presents the major advances in the approaches for association rule mining using genetic algorithms.
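
    The minimum support and minimum confidence constraints mentioned in this abstract follow the standard definitions, which the short sketch below computes directly. The transactions and the function names `support` and `confidence` are illustrative assumptions, not taken from the surveyed papers.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket data.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

    A rule such as bread -> milk is reported only if both values reach the user-specified thresholds.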

  2. Association Rule Mining from an Intelligent Tutor

    Science.gov (United States)

    Dogan, Buket; Camurcu, A. Yilmaz

    2008-01-01

    Educational data mining is a very novel research area, offering fertile ground for many interesting data mining applications. Educational data mining can extract useful information from educational activities for better understanding and assessment of the student learning process. In this way, it is possible to explore how students learn topics in…

  3. An Optimized Weighted Association Rule Mining On Dynamic Content

    Directory of Open Access Journals (Sweden)

    P. Velvadivu

    2010-03-01

    Association rule mining aims to explore large transaction databases for association rules. The classical Association Rule Mining (ARM) model assumes that all items have the same significance, without taking their weight into account. It also ignores the differences between transactions and the importance of each itemset. However, Weighted Association Rule Mining (WARM) does not work on databases with only binary attributes. It makes use of the importance of each itemset and transaction. WARM requires each item to be given a weight that reflects its importance to the user. The weights may correspond to special promotions on some products, or to the profitability of different items. This research work first focuses on a weight assignment based on a directed graph where nodes denote items and links represent association rules. A generalized version of HITS is applied to the graph to rank the items, where all nodes and links are allowed to have weights. This research then enhances the HITS algorithm by developing an online eigenvector calculation method that can compute the results of mutual reinforcement voting in the case of frequent updates. For example, in the share market, share prices may go down or up, so the market must be watched carefully and the association rule mining has to produce the items that have undergone frequent changes. This is done by estimating the upper bound of the perturbation and postponing the updates whenever possible. Next we prove that the enhanced algorithm is more efficient than the original HITS in the context of dynamic data.
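
    To make the ranking step concrete, here is a minimal sketch of HITS run by power iteration on a directed graph with weighted links. It covers only the basic mutual-reinforcement computation; the paper's generalized node weights, online eigenvector update and perturbation bound are not reproduced, and the function name `weighted_hits` and the example edges are assumptions.

```python
def weighted_hits(edges, iterations=50):
    """Power-iteration HITS on weighted directed edges (source, target, weight).

    Returns (hub, authority) score dictionaries, each L2-normalised.
    """
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of incoming links.
        new_auth = {n: 0.0 for n in nodes}
        for u, v, w in edges:
            new_auth[v] += w * hub[u]
        norm = sum(x * x for x in new_auth.values()) ** 0.5 or 1.0
        auth = {n: x / norm for n, x in new_auth.items()}
        # Hub update: weighted sum of authority scores of outgoing links.
        new_hub = {n: 0.0 for n in nodes}
        for u, v, w in edges:
            new_hub[u] += w * auth[v]
        norm = sum(x * x for x in new_hub.values()) ** 0.5 or 1.0
        hub = {n: x / norm for n, x in new_hub.items()}
    return hub, auth

# Hypothetical item graph: links stand for association rules between items.
item_edges = [("a", "c", 1.0), ("b", "c", 1.0), ("a", "d", 0.5)]
hub_scores, auth_scores = weighted_hits(item_edges)
```

    In this example item `c` receives the strongest incoming links and so ends up with the highest authority score, which a WARM-style method could then use as its item weight.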

  4. Mining Association Rules in Students Assessment Data

    Directory of Open Access Journals (Sweden)

    Anupama Chadha

    2012-09-01

    Higher education throughout the world is delivered through universities, colleges affiliated with various universities, and other recognized academic institutes. Today, one of the biggest challenges educational institutions face is the explosive growth of educational data and how to use this data to improve the quality of managerial decisions in order to deliver quality education. In this paper we perform a case study of a university that hopes to improve the quality of education by analyzing the data and discovering the factors that affect academic results, so as to increase students' chances of success. To this end we use association rule discovery techniques. We also show the importance of data preprocessing in data analysis, which has a significant impact on the accuracy of the predicted results.

  5. Detection of Attacks on MAODV Association Rule Mining Optimization

    Directory of Open Access Journals (Sweden)

    A. Fidalcastro

    2015-02-01

    Current mining algorithms can generate a large number of rules and be very slow, or generate few results and omit interesting and valuable information. To address this problem, we propose the Optimized Featured Top Association Rules (OFTAR) algorithm, in which every attack has many features, some of which are more important than others. The features are selected by a genetic algorithm and processed by the OFTAR algorithm to find the optimized rules. OFTAR combines association rules with several rule optimization and expansion techniques to improve efficiency. The increasing popularity of mobile ad hoc networks (MANETs) among users of wireless networks leads to threats and attacks on MANETs, owing to their features. The main challenge in designing a MANET is protecting it from various attacks. An intrusion detection system is required to monitor the network and detect malicious nodes in a multicasting mobility environment. The node features are processed by association analysis to generate rules, and the generated rules are applied to nodes to detect attacks. Experimental results show that the algorithm has higher scalability and good performance, an advantage over several association rule mining algorithms when rule generation is controlled and optimized to detect the attacks.

  6. Optimizing Mining Association Rules for Artificial Immune System based Classification

    Directory of Open Access Journals (Sweden)

    SAMEER DIXIT

    2011-08-01

    The primary function of a biological immune system is to protect the body from foreign molecules known as antigens. It has great pattern recognition capability that may be used to distinguish between foreign cells entering the body (non-self or antigen) and the body's own cells (self). Immune systems have many characteristics, such as uniqueness, autonomy, recognition of foreigners, distributed detection, and noise tolerance. Inspired by biological immune systems, Artificial Immune Systems have emerged during the last decade. They have prompted many researchers to design and build immune-based models for a variety of application domains. Artificial immune systems can be defined as a computational paradigm that is inspired by theoretical immunology and observed immune functions, principles and mechanisms. Association rule mining is one of the most important and well researched techniques of data mining. The goal of association rules is to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in areas such as inventory control, telecommunication networks, intelligent decision making, market analysis and risk management. Apriori is the most widely used algorithm for mining association rules. Other popular association rule mining algorithms are frequent pattern (FP) growth, Eclat and dynamic itemset counting (DIC). Associative classification uses association rule mining in the rule discovery process to predict the class labels of the data. This technique has shown great promise over many other classification techniques. Associative classification also integrates the process of rule discovery and classification to build the classifier for the purpose of prediction. The main problem with the associative classification approach is the discovery of high-quality association rules in a very large space of

  7. Parallel mining and application of fuzzy association rules

    Institute of Scientific and Technical Information of China (English)

    LU Jian-jiang; XU Bao-wen; ZOU Xiao-feng; KANG Da-zhou; LI Yan-hui; ZHOU Jin

    2006-01-01

    Quantitative attributes are partitioned into several fuzzy sets by using the fuzzy c-means algorithm. The fuzzy c-means algorithm can embody the actual distribution of the data, and fuzzy sets can soften the partition boundary. Then, we improve the search technology of the Apriori algorithm and present the algorithm for mining fuzzy association rules. As the database size becomes larger and larger, a better way is to mine fuzzy association rules in parallel. In the parallel mining algorithm, quantitative attributes are partitioned into several fuzzy sets by using the parallel fuzzy c-means algorithm. The Boolean parallel algorithm is improved to discover frequent fuzzy attribute sets, and the fuzzy association rules with at least a minimum confidence are generated on all processors. The experimental results obtained on distributed linked PCs/workstations show that the parallel mining algorithm has fine scaleup, sizeup and speedup. Last, we discuss the application of fuzzy association rules in classification. The example shows that the accuracy of classification systems based on the fuzzy association rules is better than that of two popular classification methods: C4.5 and CBA.

  8. Database Reverse Engineering based on Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Nattapon Pannurat

    2010-03-01

    Maintaining a legacy database is a difficult task, especially when system documentation is poorly written or even missing. Database reverse engineering is an attempt to recover the high-level conceptual design from the existing database instances. In this paper, we propose a technique to discover the conceptual schema using the association mining technique. The discovered schema corresponds to normalization at the third normal form, which is common practice in many business organizations. Our algorithm also includes a rule filtering heuristic to solve the problem of exponential growth of discovered rules that is inherent in the association mining technique.

  9. Mining Video Association Rules Based on Weighted Temporal Concepts

    Directory of Open Access Journals (Sweden)

    V.Vijayakumar

    2012-07-01

    Discovery of video association rules has been found useful in many applications that explore video knowledge, such as video indexing, summarization, classification and semantic event detection. Classical association rule mining algorithms cannot be applied directly to a video database, which differs in two ways: the spatial and temporal properties of the video database, and the significance of the items in the video cluster sequence. The proposed paper discovers significant relationships in video sequences using weighted temporal concepts. The weights of the video items take the quality of transactions into consideration using modified link-based models. The proposed modified HITS-based weighted temporal concept does not require pre-assigned weights. The mined association rules have more practical significance. This strategy identifies valuable rules in comparison with an Apriori-based video sequence algorithm. We also present results of applying these algorithms to a synthetic data set, which show the effectiveness of our algorithm.

  10. Efficient Mining of Association Rules in Oscillatory-based Data

    Directory of Open Access Journals (Sweden)

    Mohammad Saniee Abadeh

    2011-12-01

    Association rules are one of the most researched areas of data mining. Finding frequent patterns is an important step in association rule mining and is very time consuming and costly. In this paper, an effective method is presented for mining association rules in data with oscillatory values (up, down), such as stock price variation in a stock exchange. Only the counts of a few itemsets are searched from the database, and the counts of the remaining itemsets are computed using the relationships that exist between these types of data. A pruning strategy is also used to decrease the search space and increase the speed of the mining process. Thus, there is no need to retrieve all frequent patterns from the database, and less time is needed to find frequent patterns. By executing the MR-Miner (an acronym for “Math Rules-Miner”) algorithm, its performance on real stock data is analyzed and shown. Our experiments show that the MR-Miner algorithm can find association rules very efficiently in data of the oscillatory value type.

  11. Mining fuzzy association rules in spatio-temporal databases

    Science.gov (United States)

    Shu, Hong; Dong, Lin; Zhu, Xinyan

    2008-12-01

    A huge amount of geospatial and temporal data has been collected through various networks of environment monitoring stations. For instance, daily precipitation and temperature are observed at hundreds of meteorological stations in Northeastern China. However, these massive raw data from the stations are not fully utilized to meet the requirements of human decision-making. In essence, geographical data mining is the computation of multivariate spatio-temporal correlations through the stages of data mining. In this paper, a procedure for mining association rules in regional climate-change databases is introduced. Kriging interpolation, fuzzy c-means clustering, and Apriori-based logical rule extraction are employed in sequence. Formally, we define geographical spatio-temporal transactions and fuzzy association rules. As a novel step, we conceptualize fuzzy data by means of fuzzy c-means clustering, and transform fuzzy data items with membership grades into Boolean data items with weights by means of λ-cut sets. When the Apriori algorithm is executed on the Boolean transactions with weights, fuzzy association rules are derived. Fuzzy association rules are more natural than crisp association rules for human cognition of reality.
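
    The λ-cut step in this abstract can be sketched as follows: items whose membership grade reaches the threshold λ are kept as Boolean items. The abstract does not state how the weights are derived, so carrying the membership grade over as the item's weight is an assumption, as are the function name `lambda_cut` and the item labels.

```python
def lambda_cut(fuzzy_transaction, lam):
    """Keep items whose membership grade is at least `lam`.

    The returned dict maps each retained (now Boolean) item to a weight;
    using the original membership grade as that weight is an assumption.
    """
    return {item: grade for item, grade in fuzzy_transaction.items() if grade >= lam}

# Hypothetical fuzzy transaction for one station and day, e.g. from c-means memberships.
fuzzy_record = {"temp=high": 0.8, "temp=medium": 0.2, "rain=low": 0.6}
boolean_record = lambda_cut(fuzzy_record, lam=0.5)
```

    The keys of `boolean_record` can then be fed to an Apriori-style miner as an ordinary transaction, with the weights available for weighted support counting.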

  12. An Inference Mechanism Framework for Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Kapil Chaturvedi

    2014-09-01

    Full Text Available Available approaches for Association Rule Mining (ARM) generate a large number of association rules. These rules may be trivial and redundant, and they are difficult for users to manage and understand. Given their complexity, processing them consumes a lot of time and memory. Sometimes decision making is impossible with such association rules. An inference approach is required to resolve this kind of problem and to produce interesting knowledge for the user. In this paper, we present an inference mechanism framework for ARM that is capable of resolving such problems; it also predicts future possibilities using a Markov predictor by analyzing available facts and inference rules.

  13. Associative Regressive Decision Rule Mining for Predicting Customer Satisfactory Patterns

    Directory of Open Access Journals (Sweden)

    P. Suresh

    2016-04-01

    Full Text Available Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments and attitudes toward entities, products, services and their attributes. With the rapid development of the Internet, potential customers provide product/service reviews that express a level of satisfaction. The high volume of customer reviews has been handled through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for the service provider and to improve customer satisfaction based on review comments. ARDRM performs two steps to improve the customer satisfaction level. Initially, the Machine Learning Bayes Sentiment Classifier (MLBSC) is used to classify the class labels for each service review. After that, the regressive factor of the opinion words and the class labels are checked for association between the words by using various probabilistic rules. Based on the probabilistic rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by customers, together with their review comments. The Associative Regressive Decision Rule helps the service provider decide how to improve the customer satisfaction level. The experimental results reveal that the ARDRM technique improved performance in terms of true positive rate, associative regression factor, regressive decision rule generation time and review detection accuracy for similar patterns.

  14. Efficient Mining of Association Rules in Oscillatory-based Data

    Directory of Open Access Journals (Sweden)

    Mohammad Saniee Abadeh & Mojtaba Ala

    2011-12-01

    Full Text Available Association rules are one of the most researched areas of data mining. Finding frequent patterns is an important step in association rule mining, and it is very time consuming and costly. In this paper, an effective method is presented for mining association rules in data with oscillatory values (up, down), such as stock price variation in a stock exchange. Only a small number of itemset counts are retrieved from the database; the counts of the remaining itemsets are computed using the relationships that exist between these types of data. A pruning strategy is also used to shrink the search space and speed up the mining process. Thus, there is no need to look up all frequent patterns in the database, which reduces the time needed to find them. The performance of the MR-Miner (an acronym for “Math Rules-Miner”) algorithm on real stock data is analyzed and shown. Our experiments show that the MR-Miner algorithm can find association rules very efficiently in data of the oscillatory value type.

  15. Secure Association Rule Mining for Distributed Level Hierarchy in Web

    Directory of Open Access Journals (Sweden)

    Gulshan Shrivastava,

    2011-06-01

    Full Text Available Data mining technology can analyze massive data and plays a very important role in many domains, but if used improperly it can also cause new information security problems. Thus several privacy-preserving techniques for association rule mining have been proposed in the past few years. Some algorithms have been developed for centralized data, while others address the distributed data scenario. Distributed data scenarios can be classified as heterogeneous or homogeneous, and distributed data can be partitioned horizontally (a.k.a. homogeneous distribution) or vertically (a.k.a. heterogeneous distribution). In this paper, we propose an algorithm for secure association rule mining for vertical partition.

  16. Aggregate Function Based Enhanced Apriori Algorithm for Mining Association Rules

    Directory of Open Access Journals (Sweden)

    Medhat H A Awadalla

    2012-05-01

    Full Text Available Association rule analysis is the task of discovering association rules that occur frequently in a given transaction data set. Its task is to find certain relationships among sets of data (itemsets) in the database. It has two measurements: support and confidence. The confidence value is a measure of a rule's strength, while the support value corresponds to statistical significance. Traditional association rule mining techniques employ predefined support and confidence values. However, specifying the minimum support value of the mined rules in advance often leads to either too many or too few rules, which negatively impacts the performance of the overall system. To replace Apriori's user-defined minimum threshold value, this paper proposes an aggregate function based on the Central Limit Theorem (CLT) that calculates a more meaningful minimum threshold value. The paper also proposes a new function, the Specified Minimum Support value function with bit mapping, which calculates a custom minimum support for each itemset based on the probability of collision of its items. Furthermore, a modification of the Apriori algorithm to accommodate this function is proposed. Experiments on a large set of databases have been conducted to validate the proposed framework. The results show a remarkable improvement in the overall performance of the system in terms of run time, the number of generated rules, and the number of frequent items used.
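
    The two measurements named in this abstract, support and confidence, can be illustrated with a short sketch. It uses the standard textbook definitions only; the paper's CLT-based threshold function is not reproduced here, and the toy transactions are made up for the example. The antecedent is assumed to occur at least once.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support({"bread", "milk"}, transactions)       # 2 of the 4 transactions
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread transactions
```

    A rule such as bread -> milk is reported only if both values reach their minimum thresholds, which is why the choice of those thresholds controls how many rules come out.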

  17. Classification approach based on association rules mining for unbalanced data

    CERN Document Server

    Ndour, Cheikh

    2012-01-01

    This paper deals with supervised classification when the response variable is binary and its class distribution is unbalanced. In such a situation, it is not possible to build a powerful classifier by using standard methods such as logistic regression, classification trees, discriminant analysis, etc. To overcome the shortcoming of these methods, which yield classifiers with low sensitivity, we tackle the classification problem through an approach based on association rule learning, because this approach has the advantage of identifying patterns that are well correlated with the target class. Association rule learning is a well-known method in data mining. It is used with large databases for unsupervised discovery of local patterns that express hidden relationships between variables. In considering association rules from a supervised learning point of view, a relevant set of weak classifiers is obtained from which one derives a classification rule...

  18. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

    Institute of Scientific and Technical Information of China (English)

    Daniel Kunkle; Donghui Zhang; Gene Cooperman

    2008-01-01

    This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR
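
    The idea of counting itemsets over a taxonomy rather than a flat item list can be sketched as below. This is the straightforward transaction-extension technique (add each item's ancestors to the transaction before counting), not the MFGI_class algorithm from the paper; the taxonomy and transactions are hypothetical.

```python
def ancestors(item, parent):
    """All ancestors of item in the taxonomy (parent maps child -> parent)."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend_with_ancestors(transaction, parent):
    """Add every taxonomy ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def count(itemset, extended_transactions):
    """Number of extended transactions containing the whole itemset."""
    return sum(set(itemset) <= t for t in extended_transactions)


# Hypothetical two-level taxonomy and purchase data.
parent = {"beef": "meat", "chicken": "meat", "milk": "dairy"}
transactions = [{"beef", "milk"}, {"chicken", "milk"}, {"chicken"}]
extended = [extend_with_ancestors(t, parent) for t in transactions]

meat_milk = count({"meat", "milk"}, extended)
beef_milk = count({"beef", "milk"}, extended)
```

    In this toy data the generalized itemset (meat, milk) is supported by more transactions than (beef, milk), which is the effect the abstract describes.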

  19. A New Parallel Algorithm for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    DING Yan-hui; WANG Hong-guo; GAO Ming; GU Jian-jun

    2006-01-01

    Mining association rules from a large database is very costly. We develop a parallel algorithm for this task on a shared-memory multiprocessor (SMP). Most proposed parallel algorithms for association rule mining have to scan the database at least two times. In this article, a parallel algorithm, Scan Once (SO), is proposed for SMP; it scans the database only once. The algorithm is fundamentally different from the known parallel algorithm Count Distribution (CD). It adopts a bit matrix to store the database information and obtains the support of the frequent itemsets with a Vector-And-Operation, which greatly improves the efficiency of generating all frequent itemsets. Empirical evaluation shows that the algorithm outperforms the known CD algorithm.
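
    A bit-matrix representation with an AND-based support count might look like the sketch below. It is sequential and only illustrates the data structure; the parallel partitioning used by the SO algorithm is not shown, and the function names are hypothetical. Python integers stand in for the bit vectors, and itemsets are assumed to be non-empty.

```python
def build_bit_columns(transactions):
    """One pass over the database: one integer bit vector per item.

    Bit i of columns[item] is 1 when transaction i contains item.
    """
    columns = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << i)
    return columns


def support_count(itemset, columns):
    """AND the item columns together and count the remaining 1 bits."""
    items = list(itemset)
    bits = columns.get(items[0], 0)
    for item in items[1:]:
        bits &= columns.get(item, 0)
    return bin(bits).count("1")


# Hypothetical transaction database.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
columns = build_bit_columns(transactions)
ab = support_count({"a", "b"}, columns)
```

    After the single scan in `build_bit_columns`, every later support count touches only the bit vectors, never the original transactions.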

  20. Feasibility study for banking loan using association rule mining classifier

    Directory of Open Access Journals (Sweden)

    Agus Sasmito Aribowo

    2015-03-01

    Full Text Available The problem of bad loans in a koperasi (cooperative) can be reduced if the koperasi can detect whether a member will be able to pay off the debt or not. The method used in this study to identify characteristic patterns of prospective borrowers is called the Association Rule Mining Classifier. Patterns of credit members are converted into knowledge and used to classify other applicants. The classification process separates applicants into two groups: good credit and bad credit. The research uses prototyping to implement the design as an application, using a programming language and development tool. The association rule mining process uses the Weighted Itemset Tidset (WIT-tree) method. The results show that the method can predict the credit outcome of prospective customers. The training data set consists of 120 customers whose credit history is already known. The test data consists of 61 customers who applied for credit. The results concluded that 42 customers would pay off their loans and 19 customers were classified as "decline".

  1. AN INCREMENTAL UPDATING ALGORITHM FOR MINING ASSOCIATION RULES

    Institute of Scientific and Technical Information of China (English)

    Xu Baowen; Yi Tong; Wu Fangjun; Chen Zhenqiang

    2002-01-01

    In this letter, on the basis of the Frequent Pattern (FP) tree, a support function to update the FP-tree is introduced, and then an Incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm considers not only adding new data to the database but also removing old data from it. Furthermore, it can reduce five cases to three. The proposed algorithm avoids generating many candidate items and is highly efficient.

  2. Efficient Data Mining in SAMS through Association Rule

    Directory of Open Access Journals (Sweden)

    Mr. Rahul B. Diwate

    2014-05-01

    Full Text Available We propose a protocol for secure mining of association rules in distributed databases. Previous techniques dealt with separate databases; nowadays people also work with distributed databases. Can we develop an application in which people can access distributed data that is already stored at a remote location in encrypted format? The proposed technique is used for efficient data mining in SAMS (Student Assessment Management System) through association rules in distributed databases. The current leading technique is that of Kantarcioglu and Clifton. The proposed system implements two methods: one that computes the union of private subsets that each of the interacting users holds, and another that tests the inclusion of an element held by one user in a subset held by another. The proposed protocol for secure mining through association rules consists of different levels of execution to secure the storage of and access to data. This paper focuses on this process for secure storage and secure access of data.

  3. COLLABORATIVE NETWORK SECURITY MANAGEMENT SYSTEM BASED ON ASSOCIATION MINING RULE

    Directory of Open Access Journals (Sweden)

    Nisha Mariam Varughese

    2014-07-01

    Full Text Available Security is one of the major challenges in open networks. There are many types of attacks, some of which follow fixed patterns and some of which frequently change their patterns. It is difficult to find a malicious attack that does not have any fixed pattern. Distributed Denial of Service (DDoS) attacks, such as those launched by botnets, are used to slow down system performance. To address such problems, a Collaborative Network Security Management System (CNSMS) is proposed along with association rule mining. The CNSMS consists of a collaborative Unified Threat Management (UTM) system, a cloud-based security centre and a traffic prober. The traffic prober captures the internet traffic and passes it to the collaborative UTM. The collaborative UTM analyses the traffic to determine whether it contains any malicious attack. If a security event occurs, it is reported to the cloud-based security centre. The security centre generates security rules based on association rule mining and distributes them to the network. The cloud-based security centre is used to store the huge amount of traffic, its logs and the generated security rules. Feedback is evaluated and invalid rules are eliminated to improve system efficiency.

  4. Study on the Customer targeting using Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Surendiran.R

    2010-10-01

    Full Text Available Data mining is one of the widest areas in which much research takes place to mine desired and hidden data. There are many different approaches to finding hidden data. This paper deals with the Frequent Pattern growth algorithm, which follows the association rule concept to group the required data items. Using this method, mining time can be reduced to a great extent. This paper contains an implementation of a real-time system, based on a survey of a group of people and their mobile service providers. The end result contains sets of people from a particular age group, with the support and confidence for the service provider they have chosen. Based on this, service providers can make decisions to enhance their business and attract more customers.

  5. Penguins Search Optimisation Algorithm for Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Youcef Gheraibia

    2016-06-01

    Full Text Available Association Rules Mining (ARM) is one of the most popular and well-known approaches for the decision-making process. All existing ARM algorithms are time consuming and generate a very large number of association rules with high overlap. To deal with this issue, we propose a new ARM approach based on the penguins search optimization algorithm (Pe-ARM for short). Moreover, an efficient measure is incorporated into the main process to evaluate the amount of overlap among the generated rules. The proposed approach also ensures good diversification over the whole solution space. To demonstrate the effectiveness of the proposed approach, several experiments have been carried out on different datasets, specifically biological ones. The results reveal that the proposed approach outperforms well-known ARM algorithms in both execution time and solution quality.

  6. PHARM – Association Rule Mining for Predictive Health

    Science.gov (United States)

    Cheng, Chih-Wen; Martin, Greg S.; Wu, Po-Yen; Wang, May D.

    2016-01-01

    Predictive health is a new and innovative healthcare model that focuses on maintaining health rather than treating diseases. Such a model may benefit from computer-based decision support systems, which provide more quantitative health assessment, enabling more objective advice and action plans from predictive health providers. However, data mining for predictive health is more challenging compared to that for diseases. This is a reason why there are relatively fewer predictive health decision support systems embedded with data mining. The purpose of this study is to research and develop an interactive decision support system, called PHARM, in conjunction with Emory Center for Health Discovery and Well Being (CHDWB®). PHARM adopts association rule mining to generate quantitative and objective rules for health assessment and prediction. A case study results in 12 rules that predict mental illness based on five psychological factors. This study shows the value and usability of the decision support system to prevent the development of potential illness and to prioritize advice and action plans for reducing disease risks.

  7. Analysis of Electric Power System Using Data Mining Association Rule

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jun Sub; Kim, Min Soo; Choi, Sang Yule; Kim, Chul Whan; Kim, Ung Mo [Sungkyunkwan University (Korea)]

    2001-07-01

    Data mining is an active topic in the database field. Data mining discovers, from past data, the rules that are of greatest interest to the user according to the user's specific requirements. By analyzing these interesting rules and drawing statistical inferences from them, we can prepare for future system faults. In this paper, we present a new way to discover and repair faults in an electric power system using data mining techniques. (author). 15 refs., 4 figs., 1 tab.

  8. ASSOCIATION RULES IN HORIZONTALLY DISTRIBUTED DATABASES WITH ENHANCED SECURE MINING

    Directory of Open Access Journals (Sweden)

    Sonal Patil

    2015-10-01

    Full Text Available Recent developments in information technology have made possible the collection and analysis of millions of transactions containing personal data. These data include shopping habits, criminal records, medical histories and credit records, among others. A distributed database is a database in which the storage devices are not all attached to a common processing unit such as a CPU; it is controlled by a distributed database management system (together sometimes called a distributed database system). It may be stored on multiple computers located in the same physical location or may be dispersed over a network of interconnected computers. A protocol has been proposed for secure mining of association rules in horizontally distributed databases. This protocol is more optimized than the Fast Distributed Mining (FDM) algorithm, which is an unsecured distributed version of the Apriori algorithm. The main purpose of this protocol is to remove the problem of mining generalized association rules that affects the existing system. This protocol offers enhanced privacy with respect to previous protocols. In addition, it is simpler and better optimized than other protocols in terms of communication rounds, communication cost and computational cost.

  9. Association Rule Mining: A Survey

    Institute of Scientific and Technical Information of China (English)

    贾彩燕; 倪现君

    2003-01-01

    Association rule mining has been one of the most popular data mining subjects and has a wide range of applicability. In this paper, we first investigate the main approaches to the task of association rule mining and analyze the essence of the algorithms. Then we review the foundations of association rule mining based on several possible theoretical frameworks for data mining. In addition, we present the open problems in the field of association rule mining and identify its development trends in recent years.

  10. AN INCREMENTAL UPDATING ALGORITHM FOR MINING ASSOCIATION RULES

    Institute of Scientific and Technical Information of China (English)

    Xu Baowen; Yi Tong; et al.

    2002-01-01

    In this letter, on the basis of the Frequent Pattern (FP) tree, a support function to update the FP-tree is introduced, and then an Incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm considers not only adding new data to the database but also removing old data from it. Furthermore, it can reduce five cases to three. The proposed algorithm avoids generating many candidate items and is highly efficient.

  11. An Efficient Approach to Prune Mined Association Rules in Large Databases

    Directory of Open Access Journals (Sweden)

    D. Narmadha

    2011-01-01

    Full Text Available Association rule mining finds interesting associations and/or correlation relationships among large sets of data items. However, when the number of association rules becomes large, the rules become less interesting to the user. It is crucial to support the decision-maker with an efficient postprocessing step that selects interesting association rules from huge volumes of discovered rules. This motivates the need for association analysis. Thus, this paper presents a novel approach to prune mined association rules in large databases. Further, different association rule mining techniques for market basket analysis are analyzed, and their respective strengths are discussed. We point out potential pitfalls as well as challenging issues that need to be addressed by an association rule mining technique. We believe that the results of this approach will help decision makers make important decisions.

  12. Prediction of users webpage access behaviour using association rule mining

    Indian Academy of Sciences (India)

    R Geetharamani; P Revathy; Shomona G Jacob

    2015-12-01

    Web Usage mining is a technique used to identify the user needs from the web log. Discovering hidden patterns from the logs is an upcoming research area. Association rules play an important role in many web mining applications to detect interesting patterns. However, it generates enormous rules that cause researchers to spend ample time and expertise to discover the really interesting ones. This paper works on the server logs from the MSNBC dataset for the month of September 1999. This research aims at predicting the probable subsequent page in the usage of web pages listed in this data based on their navigating behaviour by using Apriori prefix tree (PT) algorithm. The generated rules were ranked based on the support, confidence and lift evaluation measures. The final predictions revealed that the interestingness of pages mainly depended on the support and lift measure whereas confidence assumed a uniform value among all the pages. It proved that the system guaranteed 100% confidence with the support of 1.3E−05. It revealed that the pages such as Front page, On-air, News, Sports and BBS attracted more interested subsequent users compared to Travel, MSN-News and MSN-Sports which were of less interest.
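
    Ranking rules by support, confidence and lift, as this abstract describes, can be sketched as follows. The sketch uses the standard definitions and treats each session as an unordered set of visited page categories, which ignores navigation order; that simplification, the toy sessions and the function name are assumptions, and the Apriori prefix tree itself is not shown. Each antecedent and consequent is assumed to occur at least once.

```python
def rule_metrics(antecedent, consequent, sessions):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    n = len(sessions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= s for s in sessions)         # sessions containing the antecedent
    n_c = sum(c <= s for s in sessions)         # sessions containing the consequent
    n_ac = sum((a | c) <= s for s in sessions)  # sessions containing both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Hypothetical sessions over page categories.
sessions = [
    {"frontpage", "news"},
    {"frontpage", "sports"},
    {"frontpage", "news", "sports"},
    {"travel"},
    {"sports"},
]
rules = [({"frontpage"}, {"sports"}), ({"news"}, {"frontpage"})]

# Rank rules by lift, highest first.
ranked = sorted(rules, key=lambda r: rule_metrics(r[0], r[1], sessions)[2], reverse=True)
```

    A lift above 1 means the consequent page appears more often in sessions containing the antecedent than it does overall, so lift can separate rules even when their confidence values are the same.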

  13. Integrated Web Recommendation Model with Improved Weighted Association Rule Mining

    Directory of Open Access Journals (Sweden)

    S.A.Sahaaya Arul Mary

    2013-04-01

    Full Text Available The World Wide Web plays a significant role in human life. It requires technological improvement to satisfy user needs. Web log data is essential for improving the performance of the web. It contains large, heterogeneous and diverse data. Analyzing web log data is a tedious process for Web developers, Web designers, technologists and end users. In this work, a new weighted association mining algorithm is developed to identify the best association rules that are useful for web site restructuring and recommendation, reducing false visits and improving users' navigation behavior. The algorithm finds the frequent itemsets from a large uncertain database. Existing algorithms scan the database repeatedly, which leads to a complex output set and a time-consuming process. The proposed algorithm scans the database only once, at the beginning of the process, and stores the generated frequent itemsets in the database. Evaluation parameters such as support, confidence, lift and number of rules are considered to analyze the performance of the proposed algorithm and the traditional association mining algorithm. The new algorithm produced the best result, which helps developers restructure their website to meet the requirements of the end user within a short time span.

  14. A NEW ASSOCIATION RULE MINING BASED ON FREQUENT ITEM SET

    Directory of Open Access Journals (Sweden)

    Ms. Sanober Shaikh

    2011-09-01

    Full Text Available In this paper, a new mining algorithm based on frequent itemsets is defined. The Apriori algorithm scans the database every time it finds frequent itemsets, so it is very time consuming, and at each step it generates candidate itemsets. For large databases, it therefore takes a lot of space to store the candidate itemsets. The defined algorithm scans the database only once, at the start, and then builds an undirected itemset graph. From this graph, it finds the frequent itemsets by considering the minimum support, and it generates the association rules by considering the minimum confidence. If the database or the minimum support changes, the new algorithm finds the new frequent items by scanning the undirected itemset graph. That is why its execution efficiency is distinctly improved compared to the traditional algorithm.

  15. A SURVEY ON PRIVACY PRESERVING ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    K.Sathiyapriya

    2013-03-01

    Full Text Available Businesses share data, outsourcing for specific business problems. Large companies stake a large part of their business on analysis of private data. Consulting firms often handle sensitive third party data as part of client projects. Organizations face great risks while sharing their data. Most of this sharing takes place with little secrecy. It also increases the legal responsibility of the parties involved in the process. So, it is crucial to reliably protect their data due to legal and customer concerns. In this paper, a review of the state-of-the-art methods for privacy preservation is presented. It also analyzes the techniques for privacy preserving association rule mining and points out their merits and demerits. Finally the challenges and directions for future research are discussed.

  16. SQL Based Association Rule Mining

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    Data mining is becoming increasingly important as databases grow ever larger and the need to explore hidden rules in them becomes widely recognized. Database systems are currently dominated by relational databases, and the ability to perform data mining using standard SQL queries will definitely ease the implementation of data mining. In this paper, we introduce an association rule mining algorithm based on Apriori and its implementation using SQL. At the end of the paper, we summarize our work.
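
    As a rough illustration of Apriori-style counting in standard SQL, the sketch below finds frequent item pairs with a self-join. It runs the query through Python's built-in `sqlite3` module; the table name `baskets`, its columns, the sample rows and the threshold are all hypothetical, and the query is not taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "bread"), (3, "milk"), (3, "butter"),
    (4, "milk"),
]
conn.executemany("INSERT INTO baskets VALUES (?, ?)", rows)

MIN_COUNT = 2  # minimum support count (assumed threshold)

# Pass 1 (the f1 subquery): items that meet the minimum support count.
# Pass 2: self-join on transaction id, restricted to frequent items;
# the condition a.item < b.item lists each pair only once.
frequent_pairs = conn.execute(
    """
    WITH f1 AS (
        SELECT item FROM baskets GROUP BY item HAVING COUNT(*) >= :min_count
    )
    SELECT a.item, b.item, COUNT(*) AS cnt
    FROM baskets a
    JOIN baskets b ON a.tid = b.tid AND a.item < b.item
    WHERE a.item IN (SELECT item FROM f1)
      AND b.item IN (SELECT item FROM f1)
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= :min_count
    ORDER BY a.item, b.item
    """,
    {"min_count": MIN_COUNT},
).fetchall()
conn.close()
```

    Restricting the join to items from `f1` mirrors the Apriori pruning step: a pair cannot be frequent unless both of its items are.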

  17. A Conformity Measure using Background Knowledge for Association Rules: Application to Text Mining

    OpenAIRE

    Cherfi, Hacène; Napoli, Amedeo; Toussaint, Yannick

    2009-01-01

    A text mining process using association rules generates a very large number of rules. According to experts of the domain, most of these rules basically convey a common knowledge, i.e. rules which associate terms that experts may likely relate to each other. In order to focus on the result interpretation and discover new knowledge units, it is necessary to define criteria for classifying the extracted rules. Most of the rule classification methods are based on numerical quality measures. In th...

  18. Association Rule Hiding Techniques for Privacy Preserving Data Mining: A Study

    Directory of Open Access Journals (Sweden)

    Gayathiri P

    2015-12-01

    Full Text Available Association rule mining is an efficient data mining technique that identifies frequent items and association rules through market basket analysis of large transactional databases. The probability of occurrence of the most frequent data items in the transactions is calculated to produce association rules that represent customers' buying habits for in-demand products. Identifying the association rules of a transactional database may expose confidential and private information of an organization or individual. Privacy Preserving Data Mining (PPDM) is a solution for privacy threats in data mining. This issue is addressed using Association Rule Hiding (ARH) techniques within PPDM. This research work on ARH techniques studies how sensitive association rules are hidden based on the transactional data items. Because the rules are hidden rather than the data, sensitive rule hiding has minimal side effects and higher data utility.

  19. Mining Association Rules among Gene Functions in Clusters of Similar Gene Expression Maps

    OpenAIRE

    An, Li; Obradovic, Zoran; Smith, Desmond; Bodenreider, Olivier; Megalooikonomou, Vasileios

    2009-01-01

    Association rules mining methods have been recently applied to gene expression data analysis to reveal relationships between genes and different conditions and features. However, not much effort has focused on detecting the relation between gene expression maps and related gene functions. Here we describe such an approach to mine association rules among gene functions in clusters of similar gene expression maps on mouse brain. The experimental results show that the detected association rules ...

  20. Validity of association rules extracted by healthcare-data-mining.

    Science.gov (United States)

    Takeuchi, Hiroshi; Kodama, Naoki

    2014-01-01

    A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study has verified that the rules extracted from daily time-series data stored over half a year by volunteer users of this system are valid.

  1. A Fast Distributed Algorithm for Association Rule Mining Based on Binary Coding Mapping Relation

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Association rule mining is an important issue in data mining. The paper proposes a binary-coding-based method to generate candidate frequent itemsets and their support counts efficiently, which needs only operations such as "and", "or" and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, the improved algorithm BFDM is proposed. Theoretical analysis and experiments show that BFDM is effective and efficient.
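
    The binary coding idea in this abstract can be illustrated with a minimal sketch. It is not the BFDM algorithm itself, whose details the abstract does not give. It assumes each item is assigned one bit, so a transaction becomes an integer and support counting reduces to bitwise tests. The names `encode` and `support_count` and the toy baskets are hypothetical.

```python
def encode(transaction, item_index):
    """Map a transaction (iterable of items) to an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask

def support_count(candidate_mask, transaction_masks):
    """Count transactions that contain every item of the candidate (AND test)."""
    return sum(1 for t in transaction_masks
               if (t & candidate_mask) == candidate_mask)

# Hypothetical toy data: one bit per item.
items = ["bread", "milk", "butter"]
item_index = {item: i for i, item in enumerate(items)}
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
masks = [encode(t, item_index) for t in transactions]

bread_milk = encode({"bread", "milk"}, item_index)
# "or" joins two itemsets into a larger candidate.
joined = encode({"bread"}, item_index) | encode({"milk"}, item_index)
# "xor" gives the items in which two itemsets differ.
diff = bread_milk ^ encode({"bread", "butter"}, item_index)

print(support_count(bread_milk, masks))
```

    Encoding itemsets as integers makes the containment test a single machine operation per transaction, which is presumably where the efficiency claimed in the abstract comes from.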

  2. Action Rules Mining

    CERN Document Server

    Dardzinska, Agnieszka

    2013-01-01

    We are surrounded by data, numerical, categorical and otherwise, which must be analyzed and processed to convert it into information that instructs, answers or aids understanding and decision making. Data analysts in many disciplines such as business, education or medicine, are frequently asked to analyze new data sets which are often composed of numerous tables possessing different properties. They try to find completely new correlations between attributes and show new possibilities for users. Action Rules Mining discusses some data mining and knowledge discovery principles and then describes representative concepts, methods and algorithms connected with action. The author introduces the formal definition of action rule, notion of a simple association action rule and a representative action rule, the cost of association action rule, and gives a strategy how to construct simple association action rules of a lowest cost. A new approach for generating action rules from datasets with numerical attributes...

  3. Research on Algorithm for Mining Negative Association Rules Based on Frequent Pattern Tree

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e. absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, classifiers that build their classification model based on association rules. Indeed, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, very few algorithms to mine them have been proposed to date. In this paper, an algorithm based on FP-tree is presented to discover negative association rules.
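
    As a small illustration of the positive/negative distinction described above, the sketch below computes support and confidence for a rule of the form X -> NOT Y directly from the supports of X and of X with Y. It does not use an FP-tree and is not the algorithm of the paper. The function names and the toy baskets are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def negative_rule(lhs, rhs, transactions):
    """Support and confidence of the rule lhs -> NOT rhs.

    supp(lhs, NOT rhs) = supp(lhs) - supp(lhs and rhs).
    """
    supp_lhs = support(lhs, transactions)
    supp_neg = supp_lhs - support(lhs | rhs, transactions)
    return supp_neg, supp_neg / supp_lhs

# Hypothetical toy baskets.
baskets = [
    {"tea", "sugar"},
    {"tea"},
    {"tea", "coffee"},
    {"coffee", "sugar"},
]
neg_support, neg_confidence = negative_rule({"tea"}, {"coffee"}, baskets)
print(neg_support, neg_confidence)
```

    A high confidence for tea -> NOT coffee would suggest that the two products tend to be bought instead of each other, which is the kind of conflict the abstract mentions.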

  4. An Efficient Association Rule Hiding Algorithm for Privacy Preserving Data Mining

    Directory of Open Access Journals (Sweden)

    Yogendra Kumar Jain,

    2011-07-01

    Full Text Available The security of a large database that contains crucial information becomes a serious issue when the data are shared over a network and must be protected against unauthorized access. Privacy preserving data mining is a new research trend in privacy for data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and crucial data. Data modification and rule hiding are among the most important approaches for securing data. The objective of the proposed association rule hiding algorithm for privacy preserving data mining is to hide certain information so that it cannot be discovered through association rule mining algorithms. The main approach of association rule hiding algorithms is to hide some generated association rules by increasing or decreasing the support or the confidence of the rules, so that the rule items, whether on the Left Hand Side (LHS) or Right Hand Side (RHS) of the generated rule, cannot be deduced through association rule mining algorithms. The Increase Support of Left Hand Side (ISL) algorithm decreases the confidence of a rule by increasing the support value of the LHS. It does not work on both sides of the rule; it works only by modifying the LHS. In the Decrease Support of Right Hand Side (DSR) algorithm, the confidence of the rule is decreased by decreasing the support value of the RHS. It works by modifying the RHS. We propose a new algorithm that solves the problems of both. It can increase and decrease the support of the LHS and RHS items of the rule correspondingly, so that more rules are hidden with fewer modifications. The efficiency of the proposed algorithm is compared with the ISL and DSR algorithms using real databases, on the basis of the number of rules hidden, CPU time and the number of modified entries, and it achieved better results.
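
    The effect of ISL and DSR on rule confidence can be sketched as follows. This is a simplified illustration of the two ideas as described in the abstract, not the authors' algorithm: confidence(X -> Y) = supp(X and Y) / supp(X), so raising supp(X) alone (ISL) or lowering supp(X and Y) by removing an RHS item (DSR) both reduce it. The choice of which transaction to modify, the function names and the toy database are assumptions.

```python
import copy

def support(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(lhs, rhs, transactions):
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def isl_step(lhs, rhs, transactions):
    """ISL idea: add the LHS to one transaction that lacks it.

    Assumption: pick a transaction that does not contain the RHS, so the
    support of the whole rule stays unchanged while supp(LHS) grows.
    """
    for t in transactions:
        if not lhs <= t and not rhs <= t:
            t.update(lhs)
            return True
    return False

def dsr_step(lhs, rhs, transactions):
    """DSR idea: remove one RHS item from a transaction supporting the rule."""
    for t in transactions:
        if (lhs | rhs) <= t:
            t.discard(sorted(rhs)[0])
            return True
    return False

# Hypothetical toy database and sensitive rule a -> b.
db = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}, {"b", "c"}]
lhs, rhs = {"a"}, {"b"}
before = confidence(lhs, rhs, db)

isl_db = copy.deepcopy(db)
isl_step(lhs, rhs, isl_db)
after_isl = confidence(lhs, rhs, isl_db)

dsr_db = copy.deepcopy(db)
dsr_step(lhs, rhs, dsr_db)
after_dsr = confidence(lhs, rhs, dsr_db)

print(before, after_isl, after_dsr)
```

    In practice such a step would be repeated until the rule's confidence or support falls below the mining threshold, and each modified entry counts as a side effect on the released data.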

  5. GenMiner: mining informative association rules from genomic data

    OpenAIRE

    Martinez, Ricardo; Pasquier, Nicolas; Pasquier, Claude

    2007-01-01

    International audience GENMINER is a smart adaptation of closed itemsets based association rules extraction to genomic data. It takes advantage of the novel NORDI discretization method and of the JCLOSE algorithm to efficiently generate minimal non-redundant association rules. GENMINER facilitates the integration of numerous sources of biological information such as gene expressions and annotations, and can tacitly integrate qualitative information on biological conditions (age, sex, etc.)....

  6. Using Association Rule Mining for Extracting Product Sales Patterns in Retail Store Transactions

    Directory of Open Access Journals (Sweden)

    Pramod Prasad,

    2011-05-01

    Full Text Available Computers and software play an integral part in the working of businesses and organisations. An immense amount of data is generated with the use of software. These large datasets need to be analysed for useful information that would benefit organisations, businesses and individuals by supporting decision making and providing valuable knowledge. Data mining is an approach that aids in fulfilling this requirement. Data mining is the process of applying mathematical, statistical and machine learning techniques on large quantities of data (such as a data warehouse) with the intention of uncovering hidden patterns, often previously unknown. Data mining involves three general approaches to extracting useful information from large data sets, namely, classification, clustering and association rule mining. This paper elaborates upon the use of association rule mining in extracting patterns that occur frequently within a dataset and showcases the implementation of the Apriori algorithm in mining association rules from a dataset containing sales transactions of a retail store.
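
    For readers unfamiliar with Apriori, a compact sketch of the level-wise procedure is given below. It is a textbook-style illustration rather than the implementation described in the paper. The toy baskets, thresholds and function names are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) from frequent itemsets."""
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, conf

# Hypothetical retail baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
frequent = apriori(baskets, min_support=3)
found = list(rules(frequent, min_confidence=0.7))
for lhs, rhs, conf in found:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

    The prune step relies on the downward-closure property: any subset of a frequent itemset is also frequent, so candidates with an infrequent subset can be discarded before counting.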

  7. WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    Pratiyush Guleria

    2015-11-01

    Full Text Available This paper aims to explain web-enabled tools for educational data mining. The proposed web-based tools, developed using the Asp.Net framework and php, can be helpful for universities or institutions in providing students with elective courses as well as improving academic activities based on feedback collected from students. In the Asp.Net tool, association rule mining using the Apriori algorithm is used, whereas in the php-based Feedback Analytical Tool, feedback related to faculty and institutional infrastructure is collected from students, and based on that feedback it shows the performance of faculty and institution. Using that data, it helps management to improve in-house training skills and gain knowledge about educational trends to be followed by faculty to improve the effectiveness of the course and teaching skills.

  8. Reduction of Negative and Positive Association Rule Mining and Maintain Superiority of Rule Using Modified Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Nikhil Jain,Vishal Sharma,Mahesh Malviya

    2012-12-01

    Full Text Available Association rule mining plays an important role in market data analysis and in the medical diagnosis of correlated problems. Various techniques are used to generate association rules, such as the Apriori algorithm, FP-growth and tree-based algorithms. Some algorithms perform very well but generate negative association rules and also suffer from the superiority measure problem. In this paper we propose multi-objective association rule mining based on a genetic algorithm and the Euclidean distance formula. In this method we find the nearest distance of the rule set using the Euclidean distance formula and generate two classes, a higher class and a lower class. The validity of each class is checked by a distance weight vector, which maintains a threshold value for the rule itemsets. Throughout the process we use a genetic algorithm to optimize the rule set. Here we set the population size to 1000, and the selection process is validated by the distance weight vector. Our proposed algorithm, distance weight optimization of association rule mining with a genetic algorithm, is compared with multi-objective association rule optimization using a genetic algorithm. Our proposed algorithm generates a better rule set than the MORA method.

  9. Interestingness Measure for Mining Spatial Gene Expression Data using Association Rule

    CERN Document Server

    Anandhavalli, M; Gauthaman, K

    2010-01-01

    The search for interesting association rules is an important topic in knowledge discovery in spatial gene expression databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, they may produce a large number of rules, many of which are uninteresting. The challenge in association rule mining (ARM) essentially becomes one of determining which rules are the most interesting. Association rule interestingness measures are used to help select and rank association rule patterns. Besides support and confidence, there are other interestingness measures, which include generality, reliability, peculiarity, novelty, surprisingness, utility, and applicability. In this paper, the application of the interestingness measures entropy and variance for association pattern discovery from spatial gene expression data has been studied. In this study the fast mining algorithm has been used which produces candidat...

  10. An algorithm of spatial association rules mining used in mobile computing

    Science.gov (United States)

    Tu, Chengsheng

    2011-10-01

    To mine spatial association rules quickly and improve the efficiency of mobile intelligent systems, this paper proposes an alternating-search algorithm for mining spatial association rules. The algorithm first uses spatial buffer analysis to extract spatial predicate values, then uses the spatial predicate values of every target location to form a spatial transaction and turns it into an integer by binary coding, and finally uses an alternating-search iteration to extract spatial association rules. That is, it not only generates candidate frequent itemsets by iteratively taking the (L-1)-subsets of non-frequent L-itemsets, but also generates candidate frequent itemsets by iteratively taking the (K+1)-supersets of frequent K-itemsets. The results of simulation experiments indicate that the algorithm is faster and more efficient than existing mining algorithms when mining spatial association rules in mobile computing.

  11. EOQ estimation for imperfect quality items using association rule mining with clustering

    Directory of Open Access Journals (Sweden)

    Mandeep Mittal

    2015-09-01

    Full Text Available Timely identification of newly emerging trends is needed in business processes. Data mining techniques like clustering, association rule mining, classification, etc. are very important for business support and decision making. This paper presents a method for redesigning the ordering policy by including the cross-selling effect. Initially, association rules are mined on the transactional database and EOQ is estimated with revenue earned. Then, transactions are clustered to obtain homogeneous clusters and association rules are mined in each cluster to estimate EOQ with revenue earned for each cluster. Further, this paper compares ordering policies for imperfect quality items developed by applying rules derived from the apriori algorithm (a) without clustering the transactions, and (b) after clustering the transactions. A numerical example is illustrated to validate the results.

  12. Mining of the quantitative association rules with standard SQL queries and its evaluation

    Institute of Scientific and Technical Information of China (English)

    孙海洪; 唐菁; 蒋洪; 杨炳儒

    2004-01-01

    A new algorithm for mining quantitative association rules with standard SQL is presented. The association rules are evaluated with the sufficiency factor LS of subjective Bayesian reasoning. The algorithm is shown to be quick and effective through its application to the Lujiang insects and pests database.
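
    The abstract does not give the SQL used, so the following is only a rough sketch of how the support and confidence of one quantitative rule could be counted with a standard SQL aggregate, run here through Python's built-in sqlite3 module. The table `obs`, its columns, the threshold and the rows are all made-up illustrations, not data from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (temperature REAL, pest_found INTEGER)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(18.0, 0), (25.0, 1), (27.0, 1), (30.0, 1), (16.0, 0), (26.0, 0)],
)

# Candidate rule: temperature >= 25  ->  pest_found = 1
query = """
SELECT COUNT(*),
       SUM(CASE WHEN temperature >= 25 THEN 1 ELSE 0 END),
       SUM(CASE WHEN temperature >= 25 AND pest_found = 1 THEN 1 ELSE 0 END)
FROM obs
"""
total, n_lhs, n_both = conn.execute(query).fetchone()
support = n_both / total      # share of rows matching the whole rule
confidence = n_both / n_lhs   # share of LHS rows that also match the RHS
print(support, confidence)
conn.close()
```

    A single pass with conditional sums avoids materialising intermediate tables. A full quantitative miner would also have to choose the interval boundaries, which this sketch takes as given.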

  13. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Science.gov (United States)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rules explosion is a serious problem which causes great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners and so on. Therefore, this paper focuses on a new data-mining algorithm that combines the advantages of the genetic algorithm and the simulated annealing algorithm, called ARGSA (Association rules based on an improved Genetic Simulated Annealing Algorithm), to mine the association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments are carried out to show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  14. Association Rule Mining for Both Frequent and Infrequent Items Using Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    MIR MD. JAHANGIR KABIR

    2014-07-01

    Full Text Available In data mining research, generating frequent items from large databases is one of the important issues and the key factor for implementing association rule mining tasks. Mining infrequent items, such as relationships among rare but expensive products, is another demanding issue, as has been shown in some recent studies. Therefore this study considers user-assigned threshold values as a constraint which helps users mine the rules that are more interesting to them. In addition, in the real world users may prefer to know relationships among frequent items along with infrequent ones. The particle swarm optimization algorithm has been an important heuristic technique in recent years, and this study uses this technique to mine association rules effectively. If this technique considers user-defined threshold values, interesting association rules can be generated more efficiently. Therefore this study proposes a novel approach which uses the particle swarm optimization algorithm to mine association rules from databases. Our implementation of the search strategy includes a bitmap representation of nodes in a lexicographic tree, and from the superset-subset relationships of the nodes it classifies frequent itemsets along with infrequent itemsets. In addition, this approach avoids the extra calculation overhead of generating frequent pattern trees and of handling the large memory that stores the support values of candidate itemsets. Our experimental results show that this approach efficiently mines association rules. It accesses the database to calculate a support value for fewer nodes to find frequent itemsets, and from those it generates association rules, which dramatically reduces search time. The main aim of the proposed algorithm is to show how a heuristic method works on real databases to find all the interesting association rules in an efficient way.

  15. Rule pruning and prediction methods for associative classification approach in data mining

    OpenAIRE

    Abu Mansour, Hussein Y

    2012-01-01

    Recent studies in data mining revealed that Associative Classification (AC) data mining approach builds competitive classification classifiers with reference to accuracy when compared to classic classification approaches including decision tree and rule based. Nevertheless, AC algorithms suffer from a number of known defects as the generation of large number of rules which makes it hard for end-user to maintain and understand its outcome and the possible over-fitting issue caused by the confi...

  16. Causal association rule mining methods based on fuzzy state description

    Institute of Scientific and Technical Information of China (English)

    Liang Kaijian; Liang Quan; Yang Bingru

    2006-01-01

    Aiming at research on using new knowledge to develop a knowledge system with dynamic accordance, and using the fuzzy language field and fuzzy language value structure as the description framework, a generalized cellular automaton that can jointly process fuzzy indeterminacy and random indeterminacy, together with a generalized inductive logic causal model, is put forward. On this basis, a new method for discovering causal association rules is provided. According to the causal information of the standard sample space and the common sample space, a state (abnormality) relation matrix is constructed, and causal association rules are obtained using an inductive reasoning mechanism. An estimate of the algorithm's complexity is given, and its validity is demonstrated through a case.

  17. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    OpenAIRE

    Vishwadeepak Singh Baghela; S. P. Tripathi

    2012-01-01

    A handful of text data mining approaches are available to extract much potential information and many associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The mined information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mi...

  18. An algorithm about spatial association rule mining based on cell pattern

    Science.gov (United States)

    Chen, Jiangping; Li, Pingxiang; Fei, Huang; Wang, Rong

    2006-10-01

    Spatial association rules are among the most important knowledge rules produced by spatial data mining. They emphasize identifying relations among data in different fields and try to find dependencies among data across multiple fields. In GIS, the spatial database is often separated into several layers or tables according to the type of spatial object, such as a road layer, building layer, plant layer, etc. In a relational database, the data are often separated into several tables that are associated by primary and foreign keys according to normal form theory. Consequently, spatial data are stored in different layers and tables. It is necessary and meaningful to mine knowledge and rules across multiple layers and tables, and in some applications mining spatial association rules across multiple layers is unavoidable. One problem is that the number of rules is very large. We therefore propose a new approach that uses the cell pattern of the rules the user is interested in to reduce and simplify the operation. In this paper the concept of a multi-layer spatial association rule is put forward. Then an algorithm for mining multi-layer spatial association rules, based on cell patterns and spatial concept relations, is presented; it is called AP-MLSAM in the paper. Last, an example in GIS is given. AP-MLSAM first confirms the patterns and rules that the user is interested in. Second, it counts the large itemsets that accord with the cell pattern in each data layer. Last, the spatial association rules are derived from the itemsets counted in the second step. The experiment shows that AP-MLSAM is effective: it improves efficiency by reducing the time needed to find the large itemsets. Mining multi-layer spatial association rules is a significant research field, and there are many applications based on multi-layer spatial association analysis. For example: traffic flux analysis in cities, weather pattern analysis, trend analyse for

  19. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    Directory of Open Access Journals (Sweden)

    Vishwadeepak Singh Baghela

    2012-05-01

    Full Text Available A handful of text data mining approaches are available to extract much potential information and many associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The mined information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mining deals with structured data (for example relational databases), whereas text presents special characteristics and is unstructured. Unstructured data is very different from databases, where mining techniques are usually applied and structured data is managed. Text mining can work with unstructured or semi-structured data sets. A brief review of some recent research related to mining associations from text documents is presented in this paper.

  20. A Novel Similarity Assessment for Remote Sensing Images via Fast Association Rule Mining

    Science.gov (United States)

    Liu, Jun; Chen, Kai; Liu, Ping; Qian, Jing; Chen, Huijuan

    2016-06-01

    Similarity assessment is fundamentally important to various remote sensing applications such as image classification, image retrieval and so on. The objective of similarity assessment is to automatically distinguish differences between images and identify the contents of an image. Unlike the existing feature-based or object-based methods, we are more concerned with the deep-level pattern of image content. Association rule mining is capable of finding the potential patterns of an image; hence in this paper, a fast association rule mining algorithm is proposed and the similarity is represented by rules. More specifically, the proposed approach consists of the following steps: first, the gray level of the image is compressed using linear segmentation to avoid interference from details and reduce the amount of computation; then the compressed gray values between pixels are collected to generate the transaction sets, which are transformed into the proposed multi-dimension data cube structure; the association rules are then mined quickly based on the multi-dimension data cube; finally the mined rules are represented as a vector and similarity assessment is achieved by vector comparison using a first order approximation of Kullback-Leibler divergence. Experimental results indicate that the proposed fast association rule mining algorithm is more effective than the widely used Apriori method. Remote sensing image retrieval experiments using various images, for example QuickBird and WorldView-2, based on the existing and proposed similarity assessments show that the proposed method can provide higher retrieval precision.

  1. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

    Science.gov (United States)

    Tandon, Disha; Haque, Mohammed Monzoorul; Mande, Sharmila S

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithms, i.e. the 'Apriori algorithm', for deriving 'microbial association rules' from the taxonomic profile of a given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects. A Linux implementation of the Association Rule Mining (ARM

  2. DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR DISCOVERING ASSOCIATION RULES

    Directory of Open Access Journals (Sweden)

    Johannes K. Chiang

    2012-12-01

    Full Text Available Data Mining is one of the most significant tools for discovering association patterns that are useful for many knowledge domains. Yet, there are some drawbacks in existing mining techniques. Three main weaknesses of current data-mining techniques are: (1) the entire database must be re-scanned whenever new attributes are added; (2) an association rule may be true at a certain granularity but fail at a smaller one, and vice versa; (3) current methods can only be used to find either frequent rules or infrequent rules, but not both at the same time. This research proposes a novel data schema and an algorithm that solves the above weaknesses while improving on the efficiency and effectiveness of data mining strategies. Crucial mechanisms in each step will be clarified in this paper. Finally, this paper presents experimental results regarding efficiency, scalability, information loss, etc. of the proposed approach to prove its advantages.

  3. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    CERN Document Server

    Rajendran, P

    2010-01-01

    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are: pre-processing, feature extraction, association rule mining and hybrid classifier. The pre-processing step has been done using the median filtering process and edge features have been extracted using the Canny edge detection technique. Two image mining approaches are combined in a hybrid manner in this paper. The frequent patterns from the CT scan images are generated by the frequent pattern tree (FP-Tree) algorithm that mines the association rules. The decision tree method has been used to classify the medical images for diagnosis. This system makes the classification process more accurate. The hybrid method is more efficient than traditional image mining methods. The experimental results on a prediagnosed database of brain images showed 97% sensitivity and 95% accuracy. The ph...

  4. MINING MULTIDIMENSIONAL FUZZY ASSOCIATION RULES FROM A DATABASE OF MEDICAL RECORD PATIENTS

    Directory of Open Access Journals (Sweden)

    Rolly Intan

    2008-01-01

    Full Text Available Mining association rules is one of the important tasks in the process of data mining application. In general, the input used in the process of generating rules is taken from a single data table in which the values of every data domain are correlated with one another as given in the table. A problem arises when we need to generate rules expressing the relationship between two or more domains that belong to several different tables in a normalized database. To overcome the problem, before generating rules it is necessary to join the participating tables into a general table by a process called the Denormalization Process. This paper shows a process of mining Multidimensional Fuzzy Association Rules from a normalized database of patient medical records. The process consists of two sub-processes, namely the sub-process of joining tables (Denormalization Process) and the sub-process of generating fuzzy rules. In general, the process of generating the fuzzy rules has been discussed in our previous papers [1, 2, 3, 4]. In addition to the process of generating fuzzy rules, this paper proposes a correlation measure of the rules as an additional consideration for evaluating the interestingness of the provided rules.

  5. Gain ratio based fuzzy weighted association rule mining classifier for medical diagnostic interface

    Indian Academy of Sciences (India)

    N S Nithya; K Duraiswamy

    2014-02-01

    The health care environment still needs knowledge-based discovery for handling its wealth of data. Extraction of the potential causes of diseases is the most important factor for medical data mining. Fuzzy association rule mining performs better than traditional classifiers, but it suffers from the exponential growth of the rules produced. In the past, we have proposed an information gain based fuzzy association rule mining algorithm for extracting both association rules and membership functions of medical data to reduce the rules. It used a ranking based weight value to identify the potential attribute. When we take a large number of distinct values, the computation of the information gain value is not feasible. In this paper, an enhanced approach, called gain ratio based fuzzy weighted association rule mining, is thus proposed for distinct diseases and also increase the learning time of the previous one. Experimental results show that there is a marginal improvement in the attribute selection process and also an improvement in the classifier accuracy. The system has been implemented on the Java platform and verified using benchmark data from the UCI machine learning repository.
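
    The difference between information gain and gain ratio that motivates this abstract can be shown with a small sketch. It uses the standard crisp (non-fuzzy) definitions, gain ratio = information gain / split information, and is not the fuzzy weighted method of the paper. The attribute values and labels are made-up illustrations.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain(attribute, labels):
    """Reduction in label entropy after splitting on the attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(attribute, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(attribute)
    return info_gain(attribute, labels) / split_info if split_info > 0 else 0.0

# Made-up example: both attributes separate the labels perfectly, but the
# ID-like attribute does so only because every value is distinct.
labels = ["sick", "sick", "healthy", "healthy"]
symptom = ["yes", "yes", "no", "no"]
patient_id = ["p1", "p2", "p3", "p4"]

print(info_gain(symptom, labels), info_gain(patient_id, labels))
print(gain_ratio(symptom, labels), gain_ratio(patient_id, labels))
```

    Both attributes have the same information gain here, while the gain ratio penalises the attribute with many distinct values, which is the usual reason for preferring it when attributes differ widely in cardinality.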

  6. Stellar spectra association rule mining method based on the weighted frequent pattern tree

    International Nuclear Information System (INIS)

    Effective extraction of data association rules can provide a reliable basis for classification of stellar spectra. The concept of stellar spectrum weighted itemsets and stellar spectrum weighted association rules are introduced, and the weight of a single property in the stellar spectrum is determined by information entropy. On that basis, a method is presented to mine the association rules of a stellar spectrum based on the weighted frequent pattern tree. Important properties of the spectral line are highlighted using this method. At the same time, the waveform of the whole spectrum is taken into account. The experimental results show that the data association rules of a stellar spectrum mined with this method are consistent with the main features of stellar spectral types. (research papers)

  7. Stellar spectra association rule mining method based on the weighted frequent pattern tree

    Institute of Scientific and Technical Information of China (English)

    Jiang-Hui Cai; Xu-Jun Zhao; Shi-Wei Sun; Ji-Fu Zhang; Hai-Feng Yang

    2013-01-01

    Effective extraction of data association rules can provide a reliable basis for classification of stellar spectra. The concept of stellar spectrum weighted itemsets and stellar spectrum weighted association rules are introduced, and the weight of a single property in the stellar spectrum is determined by information entropy. On that basis, a method is presented to mine the association rules of a stellar spectrum based on the weighted frequent pattern tree. Important properties of the spectral line are highlighted using this method. At the same time, the waveform of the whole spectrum is taken into account. The experimental results show that the data association rules of a stellar spectrum mined with this method are consistent with the main features of stellar spectral types.

  8. Knowledge Discovery from Students’ Result Repository: Association Rule Mining Approach

    Directory of Open Access Journals (Sweden)

    Oladipupo O.O. & Oyelade O.J

    2010-06-01

    Over the years, several statistical tools have been used to analyze students’ performance from different points of view. This paper presents data mining in the education environment that identifies students’ failure patterns using the association rule mining technique. The identified patterns are analysed to offer helpful and constructive recommendations to academic planners in higher institutions of learning to enhance their decision making process. This will also aid in curriculum structure and modification in order to improve students’ academic performance and trim down the failure rate. The software for mining student failed courses was developed and the analytical process was described.

  9. An Efficient Algorithm for Mining Multilevel Association Rule Based on Pincer Search

    Directory of Open Access Journals (Sweden)

    Pratima Gautam

    2012-07-01

    Discovering frequent itemsets is a key difficulty in significant data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. The problem of developing models and algorithms for multilevel association mining poses new challenges for mathematics and computer science. In this paper, we present a model of mining multilevel association rules which satisfies a different minimum support at each level. We have employed Pincer search concepts, a multilevel taxonomy and different minimum supports to find multilevel association rules in a given transaction data set. This search is used only for maintaining and updating a new data structure. It is used to prune early candidates that would normally be encountered in the top-down search. A main characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. An example is also given to demonstrate that the proposed mining algorithm can derive multiple-level association rules under different supports in a simple and effective manner.

  10. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Ou

    2014-01-01

    This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM), which was announced by IMO in 2010. The goal of this paper is to mine the association rules among the items of BRM and vessel accidents. However, because the data that can be collected are indirect and seem useless for analyzing the relationship between items of BRM and the accidents, cross level association rules need to be studied, which build the relation between the indirect data and items of BRM. In this paper, firstly, a cross level coding scheme for mining the multilevel association rules is proposed. Secondly, we execute the immune genetic algorithm with the coding scheme for analyzing BRM. Thirdly, based on the basic maritime investigation reports, some important association rules of the items of BRM are mined and studied. Finally, according to the results of the analysis, we provide suggestions for the work of seafarer training, assessment, and management.

  11. A mining method for tracking changes in temporal association rules from an encoded database

    Directory of Open Access Journals (Sweden)

    Chelliah Balasubramanian

    2009-07-01

    Mining of association rules has become vital in organizations for decision making. A principle of data mining is that it is better to use complex primitive patterns with simple logical combinations than simple primitive patterns with complex logical forms. This paper gives an overview of temporal database encoding and association rule mining. It proposes an innovative data mining approach that reduces the size of the main database by an encoding method, which in turn reduces the memory required. The use of the anti-Apriori algorithm reduces the number of scans over the database. The Apriori family of algorithms is applied to the encoded temporal database and their performances are compared. The paper also focuses on a method for tracking association rules that change with time. This method involves an initial decomposition of the problem. The changing association rules are then tracked by dividing the time into smaller intervals and observing the changes in the itemsets obtained in each such interval. The results obtained are lower computational complexity, time and space, with effective identification of changing association rules, resulting in good decision making. This helps in formalizing the database metrics in a better way.

  12. Spatio-Temporal Rule Mining

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2005-01-01

    Recent advances in communication and information technology, such as the increasing accuracy of GPS technology and the miniaturization of wireless communication devices, pave the road for Location-Based Services (LBS). To achieve high quality for such services, spatio-temporal data mining techniques are needed. In this paper, we describe experiences with spatio-temporal rule mining in a Danish data mining company. First, a number of real world spatio-temporal data sets are described, leading to a taxonomy of spatio-temporal data. Second, the paper describes a general methodology that transforms the spatio-temporal rule mining task to the traditional market basket analysis task and applies it to the described data sets, enabling traditional association rule mining methods to discover spatio-temporal rules for LBS. Finally, unique issues in spatio-temporal rule mining are identified and discussed.

  13. Multidimensional Data Mining to Determine Association Rules in an Assortment of Granularities

    Directory of Open Access Journals (Sweden)

    C. Usha Rani

    2013-09-01

    Data mining is one of the most significant tools for discovering association patterns that are useful for many knowledge domains. Yet, there are some drawbacks in existing mining techniques. The three main weaknesses of current data mining techniques are: (1) The entire database must be rescanned whenever new attributes are added, because current methods are based on flat mining using predefined schemata. (2) An association rule may be true at a certain granularity but fail at a smaller one, and vice versa. This may result in loss of important association rules. (3) Current methods can only be used to find either frequent rules or infrequent rules, but not both at the same time. This research proposes a novel data schema and an algorithm that solve the above weaknesses while improving the efficiency and effectiveness of data mining strategies. Crucial mechanisms in each step will be clarified in this paper. This paper also presents a benchmark which is used to compare the level of efficiency and effectiveness of the proposed algorithm against other known methods. Finally, this paper presents experimental results regarding efficiency, scalability, information loss, etc. of the proposed approach to prove its advantages.

  14. A Genetic Algorithm Based Multilevel Association Rules Mining for Big Datasets

    Directory of Open Access Journals (Sweden)

    Yang Xu

    2014-01-01

    Multilevel association rules mining is an important domain for discovering interesting relations between data elements with multiple levels of abstraction. Most of the existing algorithms for this issue are based on exhaustive search methods such as Apriori and FP-growth. However, when they are applied in big data applications, those methods suffer from extreme computational cost in searching association rules. To expedite multilevel association rules searching and avoid excessive computation, in this paper, we propose a novel genetic-based method with three key innovations. First, we use the category tree to describe the multilevel application data sets as the domain knowledge. Then, we put forward a special tree encoding schema based on the category tree to build the heuristic multilevel association mining algorithm. As the last part of our design, we propose a genetic algorithm based on the tree encoding schema that greatly reduces the association rule search space. The method is especially useful in mining multilevel association rules in big data related applications. We test the proposed method with some big datasets, and the experimental results demonstrate the effectiveness and efficiency of the proposed method in processing big data. Moreover, our results also show that the algorithm converges fast with a limited termination threshold.

  15. Identification of the Patterns Behavior Consumptions by Using Chosen Tools of Data Mining - Association Rules

    Directory of Open Access Journals (Sweden)

    R. Benda Prokeinová

    2014-09-01

    Research and development on a sustainable environment, a research goal of many countries and food producers, now has a long tradition. This paper aims to identify patterns of consumption behaviour by using association rules, because it is important to know the segmentation differences between consumers and their opinions on current sustainability tendencies. Sustainability will continue to be discussed in Slovakia, primarily because of its impacts and its influence on consumers' purchases of products that are safe for the environment and nature. We emphasize the importance of sustainability in consumer behaviour and focus in detail on segmentation differences between respondents. We addressed a sample of 318 respondents. The article aims to identify sustainable consumer behaviour by using a chosen data mining tool, association rules. The area of knowledge-based systems widely overlaps with techniques in data mining. Data mining is in fact devoted to the process of acquiring knowledge from large amounts of data. Its techniques and approaches are useful in more narrowly focused external systems as well as in more general systems that work with knowledge. One of the challenges of knowledge-based systems is to derive new knowledge on the basis of known facts and knowledge. Methods using association rules in a sense fulfil this function. Association rules as a data mining technique are useful in various applications such as shopping cart analysis, discovering hidden dependencies between entries, or recommendation. After an introduction and an explanation of the principle of sustainability in consumption and of association rules, a description of the algorithm for obtaining rules from transaction data follows. We then present a practical application to data obtained by a questionnaire survey. Calculations are performed in the free data mining software Tanagra.

  16. A New Hybrid Algorithm for Association Rule Mining

    Institute of Scientific and Technical Information of China (English)

    ZHANG Min-cong; YAN Cun-liang; ZHU Kai-yu

    2007-01-01

    HA (hashing array), a new algorithm for mining frequent itemsets of large databases, is proposed. It employs a hash array structure, ItemArray ( ), to store the information of the database and then uses it instead of the database in later iterations. With this improvement, only two scans of the whole database are necessary, so the computational cost can be reduced significantly. To overcome the performance bottleneck of frequent 2-itemsets mining, a modified algorithm of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted in this paper to evaluate the performance of the proposed new algorithm, and the results show that the new algorithm is more efficient and reasonable.

  17. The Books Recommend Service System Based on Improved Algorithm for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    王萍

    2009-01-01

    The Apriori algorithm is a classical method of association rules mining. Based on analysis of this theory, the paper provides an improved Apriori algorithm. The proposed algorithm combines a hash table technique with reduction of candidate item sets to enhance the usage efficiency of resources as well as the individualized service of the data library.

  18. A Review of Protein-DNA Binding Motif using Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Virendra Kumar Tripathi

    2013-03-01

    Finding unknown patterns of transcription factor binding sites is a prerequisite for understanding gene regulation and life mechanisms. Motif discovery for gene regulation is a challenging job in bioinformatics, as it requires finding the relation between transcription factors and transcription factor binding sites. The increasing size and length of motif string patterns raises problems related to modeling and optimization of the gene selection process. In this paper we give a survey of protein-DNA binding using association rule mining. Association rule mining is a well-known data mining technique for pattern analysis. Its capability of generating negative and positive patterns is helpful for discovering new patterns in DNA binding bioinformatics data. Other data mining approaches such as clustering and classification have also been applied to the process of gene selection and grouping for known and unknown patterns, but they face the problem of valid strings of DNA data; the rule mining principle finds a better relation between transcription factors and transcription factor binding sites.

  19. A Review of Protein-DNA Binding Motif using Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Virendra Kumar Tripathi

    2013-04-01

    Finding unknown patterns of transcription factor binding sites is a prerequisite for understanding gene regulation and life mechanisms. Motif discovery for gene regulation is a challenging job in bioinformatics, as it requires finding the relation between transcription factors and transcription factor binding sites. The increasing size and length of motif string patterns raises problems related to modeling and optimization of the gene selection process. In this paper we give a survey of protein-DNA binding using association rule mining. Association rule mining is a well-known data mining technique for pattern analysis. Its capability of generating negative and positive patterns is helpful for discovering new patterns in DNA binding bioinformatics data. Other data mining approaches such as clustering and classification have also been applied to the process of gene selection and grouping for known and unknown patterns, but they face the problem of valid strings of DNA data; the rule mining principle finds a better relation between transcription factors and transcription factor binding sites.

  20. A Novel Approach for Discovery Quantitative Fuzzy Multi-Level Association Rules Mining Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saad M. Darwish

    2016-10-01

    Quantitative multilevel association rules mining is a central field for discovering interesting associations among data components with multiple levels of abstraction. The problem of extending procedures to handle quantitative data has been attracting the attention of many researchers. The algorithms regularly discretize the attribute fields into sharp intervals, and then apply simple algorithms established for Boolean attributes. Fuzzy association rules mining approaches are intended to overcome such shortcomings based on fuzzy set theory. Furthermore, most of the current algorithms on this topic are based on exhaustive search methods to determine the ideal support and confidence thresholds, which suffer from high computational cost in searching association rules. To accelerate quantitative multilevel association rules searching and avoid excessive computation, in this paper, we propose a new genetic-based method with significant innovation to determine threshold values for frequent item sets. In this approach, a sophisticated coding method is devised, and the qualified confidence is employed as the fitness function. With the genetic algorithm, a comprehensive search can be achieved and system automation is applied, because our model does not need a user-specified threshold of minimum support. Experimental results indicate that the proposed algorithm can effectively generate non-redundant fuzzy multilevel association rules.

  1. Mining Association Rules to Evade Network Intrusion in Network Audit Data

    Directory of Open Access Journals (Sweden)

    Kamini Nalavade

    2014-06-01

    With the growth of hacking and exploiting tools and the invention of new ways of intrusion, intrusion detection and prevention is becoming the major challenge in the world of network security. The increasing network traffic and data on the Internet is making this task more demanding. Various approaches are being utilized in intrusion detection, but unfortunately none of the systems so far is completely flawless. The false positive rates make it extremely hard to analyse and react to attacks. Intrusion detection systems using data mining approaches make it possible to search for patterns and rules in large amounts of audit data. In this paper, we present a model that integrates association rules into intrusion detection to design and implement a network intrusion detection system. Our technique is used to generate attack rules that will detect the attacks in network audit data using anomaly detection. This shows that the modified association rules algorithm is capable of detecting network intrusions. The KDD dataset, which is freely available online, is used for our experimentation and results are compared. Our intrusion detection system using association rule mining is able to generate attack rules that will detect the attacks in network audit data using anomaly detection, while maintaining a low false positive rate.

  2. A LFP-tree based method for association rules mining in telecommunication alarm correlation analysis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    The mining of association rules is one of the primary methods used in telecommunication alarm correlation analysis, in which the alarm databases are very large. The efficiency of the algorithms plays an important role in tackling large datasets. The classical frequent pattern growth (FP-growth) algorithm can produce a large number of conditional pattern trees, which makes it difficult to mine association rules in a telecommunication environment. In this paper, an algorithm based on the layered frequent pattern tree (LFP-tree) is proposed for mining frequent patterns. Efficiency of this algorithm is achieved with the following techniques: 1) All the frequent patterns are condensed into a layered structure, which can not only save memory and time but also be very useful for updating the alarm databases. 2) Each alarm item can be viewed as a triple, in which t is a Boolean variable that shows whether the item is frequent or not. 3) Deleting infrequent items with dynamic pruning can avoid producing conditional pattern sets. Simulation and analysis of the algorithm show that it is a valid method with better time and space efficiency, which is suited to mining association rules in telecommunication alarm correlation analysis.

  3. A pragmatic approach on association rule mining and its effective utilization in large databases

    Directory of Open Access Journals (Sweden)

    Biswaranjan Nayak

    2012-05-01

    This paper deals with the effective utilization of association rule mining algorithms in large databases, especially those used by business organizations, where the number of transactions and items plays a crucial role in decision making. Frequent item-set generation and the creation of strong association rules from the frequent item-set patterns are the two basic steps in association rule mining. We take a suitable illustration of market basket data, generate frequent item-set patterns and association rules from it with the Apriori algorithm, and use the same illustration for FP-Growth association rule mining: an FP-Growth tree is constructed for frequent item-set generation, and strong association rules are created from it. For a performance study of the Apriori and FP-Tree algorithms, experiments have been performed. The customer purchase behaviour seen in food outlet environments is mimicked in these transactions. Using a synthetic data generation process, the observations have been plotted in graphs of execution time against minimum support count. The graphs show that as the minimum support values decrease, the execution times of the algorithms increase exponentially, because a decrease in the minimum support threshold makes the number of item-sets in the output increase exponentially. It has been established from the graphs that FP-Growth performs better than the Apriori algorithm for all problem sizes, with a factor of 2 for high minimum support values to very low level support magnitude.
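
    The two basic steps named in this abstract, frequent item-set generation and creation of strong rules, can be sketched as follows. This is a minimal textbook-style Apriori written for illustration, not the paper's implementation; the function names, the basket data and the thresholds are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Step 1: return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: derive rules lhs -> rhs with confidence >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[lhs]  # support(itemset) / support(lhs)
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical market basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
frequent = apriori(baskets, min_support=3)
rules = strong_rules(frequent, min_confidence=0.75)
```

    Lowering `min_support` lets more candidates survive each level, which is the effect on execution time that the abstract reports.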

  4. Comparative Study of Improved Association Rules Mining Based On Shopping System

    Directory of Open Access Journals (Sweden)

    Tang Zhi-hang

    2016-01-01

    Data mining is a process of discovering interesting patterns, new rules and information from large amounts of sales data in transactional and relational databases. The main purpose of this function is to find frequent patterns, associations and relationships between various databases using different algorithms. Association rule mining (ARM) is used to improve decision making in applications. ARM became essential in an information and decision-overloaded world. It changed the way users make decisions, and helped its creators to increase revenue at the same time. Bringing ARM to a broader audience is essential in order to popularize it beyond the limits of scientific research and high technology entrepreneurship. It makes it possible to develop and apply effective marketing strategies, and in disease identification frequent patterns are generated to discover the frequently occurring diseases in a definite area. The conclusion in all applications is some kind of association rules (AR) that are useful for efficient decision making.

  5. Design a Weight Based sorting distortion algorithm using Association rule Hiding for Privacy Preserving Data mining

    Directory of Open Access Journals (Sweden)

    R.Sugumar

    2011-12-01

    The security of a large database that contains crucial information becomes a serious issue when data is shared over a network, where it must be protected against unauthorized access. Privacy preserving data mining is a new research trend in privacy data for data mining and statistical databases. Association analysis is a powerful tool for discovering relationships which are hidden in large databases. Association rule hiding algorithms achieve strong and efficient performance in protecting confidential and crucial data. Data modification and rule hiding is one of the most important approaches for securing data. The objective of the proposed Weight Based Sorting Distortion (WBSD) algorithm is to distort certain data which satisfies a particular sensitive rule. It then hides those transactions which support a sensitive rule, assigns them a priority, and sorts them in ascending order according to the priority value of each rule. Then it uses these weights to compute the priority value for each transaction according to how weak the rule is that a transaction supports. Data distortion is one of the important methods to avoid this kind of scalability issue.

  6. Recommending new items to customers : A comparison between Collaborative Filtering and Association Rule Mining

    OpenAIRE

    Sohlberg, Henrik

    2015-01-01

    E-commerce is an ever growing industry as the internet infrastructure continues to evolve. The benefits from a recommendation system to any online retail store are several. It can help customers to find what they need as well as increase sales by enabling accurate targeted promotions. Among many techniques that can form recommendation systems, this thesis compares Collaborative Filtering against Association Rule Mining, both implemented in combination with clustering. The suggested implementa...

  7. Efficient Mining of Association Rules by Reducing the Number of Passes over the Database

    Institute of Scientific and Technical Information of China (English)

    李庆忠; 王海洋; 闫中敏; 马绍汉

    2001-01-01

    This paper introduces a new algorithm for mining association rules. The algorithm RP counts itemsets of different sizes in the same pass over the database by dividing the database into m partitions. The total number of passes over the database is only (k+2m-2)/m, where k is the size of the longest itemset. This is much less than k.
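
    The pass count stated in this abstract can be evaluated for a few partition counts. The sketch below only evaluates the formula as quoted; the function name and the sample values of k and m are illustrative assumptions.

```python
def rp_passes(k, m):
    """Pass count quoted in the abstract: (k + 2m - 2) / m."""
    return (k + 2 * m - 2) / m


# Compare with the k passes of a level-by-level scan, for k = 10.
for m in (1, 2, 4):
    print(m, rp_passes(10, m))
```

    For k = 10 the formula gives 10, 6 and 4 passes for m = 1, 2 and 4 partitions, so with a single partition it reduces to the k passes of a level-wise scan.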

  8. Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Noha Negm

    2013-09-01

    The challenges of the standard clustering methods and the weaknesses of the Apriori algorithm in frequent termset clustering formulate the goal of our research. Based on Association Rules Mining, an efficient approach for Web Document Clustering (ARWDC) has been devised. An efficient Multi-Tire Hashing Frequent Termsets algorithm (MTHFT) has been used to improve the efficiency of mining association rules by targeting improvement in the mining of frequent termsets. Then, the documents are initially partitioned based on association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., initial partitions are overlapping. After making the partitions disjoint, the documents are grouped within each partition using descriptive keywords, and the resultant clusters are obtained effectively. In this paper, we have presented an extensive analysis of the ARWDC approach for different sizes of Reuters datasets. Furthermore, the performance of our approach is evaluated with the help of evaluation measures such as Precision, Recall and F-measure, compared to existing clustering algorithms like Bisecting K-means and FIHC. The experimental results show that the efficiency, scalability and accuracy of the ARWDC approach have been improved significantly for Reuters datasets.

  9. Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms

    Directory of Open Access Journals (Sweden)

    Puttegowda D

    2010-07-01

    The information age has seen most activities generating huge volumes of data. The explosive growth in the size of business, scientific and government databases has far outpaced our ability to interpret and digest the stored data. This has created a need for new generation tools and techniques for automated and intelligent database analysis. These tools and techniques are the subjects of the rapidly emerging field of data mining. One of the important problems in data mining is discovering association rules from databases of transactions where each transaction consists of a set of items. The most time consuming operation in this discovery process is the computation of the frequency of the occurrences of interesting subsets of items (called candidates) in the database of transactions. To prune the exponentially large space of candidates, most existing algorithms consider only those candidates that have a user defined minimum support. Even with the pruning, the task of finding all association rules requires a lot of computation power and memory. Parallel computers offer a potential solution to the computation requirement of this task, provided efficient and scalable parallel algorithms can be designed. In this paper, we have implemented Sequential and Parallel mining of Association Rules using Apriori algorithms and evaluated the performance of both algorithms.
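
    The candidate counting step described in this abstract lends itself to partitioning: each partition of the transactions can be counted independently and the counts summed before the minimum support is applied. The sketch below illustrates that idea sequentially; it is an assumption about one possible parallelization, not the paper's algorithm, and for brevity it counts every itemset of a given size rather than only Apriori-pruned candidates.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, size):
    """Count every itemset of the given size within one partition."""
    counts = Counter()
    for transaction in partition:
        for candidate in combinations(sorted(transaction), size):
            counts[candidate] += 1
    return counts


def frequent_itemsets(partitions, size, min_support):
    """Sum the per-partition counts, then keep itemsets meeting min_support."""
    total = Counter()
    for partition in partitions:  # each call could run on a separate worker
        total.update(count_candidates(partition, size))
    return {c: n for c, n in total.items() if n >= min_support}


# Hypothetical transactions split into two partitions.
partitions = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}, {"b"}],
]
pairs = frequent_itemsets(partitions, size=2, min_support=2)
```

    Because the per-partition counts are simply added, the loop over partitions could be distributed across processes without changing the result.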

  10. A novel biclustering approach to association rule mining for predicting HIV-1-human protein interactions.

    Directory of Open Access Journals (Sweden)

    Anirban Mukhopadhyay

    Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1-human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1-human interaction network. Novel HIV-1-human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed.

  11. Association rule mining on grid monitoring data to detect error sources

    CERN Document Server

    Maier, G; Kranzlmueller, D; Gaidioz, B

    2010-01-01

    Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs, including their exit codes. However, the exit codes do not always denote the actual fault that caused the job failure. Human time and knowledge are required to manually trace errors back to the real underlying fault. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically, and this information, expressed as association rules, is visualized in a web interface. This work reduces the time needed for fault recovery and improves the grid's reliability.

  12. Association rule mining on grid monitoring data to detect error sources

    Science.gov (United States)

    Maier, Gerhild; Schiffers, Michael; Kranzlmueller, Dieter; Gaidioz, Benjamin

    2010-04-01

    Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs, including their exit codes. However, the exit codes do not always denote the actual fault that caused the job failure. Human time and knowledge are required to manually trace errors back to the real underlying fault. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically, and this information, expressed as association rules, is visualized in a web interface. This work reduces the time needed for fault recovery and improves the grid's reliability.

  13. An Overview of Secure Mining of Association Rules in Horizontally Distributed Databases

    Directory of Open Access Journals (Sweden)

    Sonal Patil

    2015-10-01

    This paper proposes a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton. The protocol is based on the Fast Distributed Mining (FDM) algorithm, which is an unsecured distributed version of the Apriori algorithm. The main ingredients of the protocol are two novel secure multi-party algorithms: (1) one that computes the union of private subsets that each of the interacting players holds, and (2) one that tests the inclusion of an element held by one player in a subset held by another. The protocol offers enhanced privacy with respect to the other one. In addition, it is simpler and significantly more efficient in terms of communication rounds, communication cost and computational cost [1].

  14. A Business Intelligence Model to Predict Bankruptcy using Financial Domain Ontology with Association Rule Mining Algorithm

    CERN Document Server

    Martin, A; Venkatesan, Dr V Prasanna

    2011-01-01

    Today in every organization, financial analysis provides the basis for understanding and evaluating the results of business operations and for determining how well a business is doing. This means that organizations can control the operational activities primarily related to corporate finance. One way of doing this is through bankruptcy prediction analysis. This paper develops an ontological model from the financial information of an organization by analyzing the semantics of the financial statements of a business. One of the best bankruptcy prediction models is the Altman Z-score model, which uses financial ratios to predict bankruptcy. From the financial ontological model, the relations between financial data are discovered using a data mining algorithm. By combining the financial domain ontological model with an association rule mining algorithm and the Z-score model, a new business intelligence model is developed to predict bankruptcy.
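    As an illustration of the Z-score step mentioned in this abstract, the following minimal Python sketch computes the classic Altman Z-score. It assumes the original 1968 coefficients for publicly traded manufacturing firms and the commonly quoted cut-offs of 1.81 and 2.99; the abstract does not say which variant the authors use, and the function and parameter names are hypothetical.

```python
def altman_z_score(working_capital, retained_earnings, ebit,
                   market_value_equity, sales,
                   total_assets, total_liabilities):
    """Classic Altman Z-score (assumed 1968 variant, not confirmed by the abstract)."""
    x1 = working_capital / total_assets            # liquidity
    x2 = retained_earnings / total_assets          # cumulative profitability
    x3 = ebit / total_assets                       # operating efficiency
    x4 = market_value_equity / total_liabilities   # leverage
    x5 = sales / total_assets                      # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_zone(z):
    """Map a score to the commonly quoted zones (assumed cut-offs)."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


if __name__ == "__main__":
    # Made-up figures for a hypothetical firm, for illustration only.
    z = altman_z_score(working_capital=20, retained_earnings=30, ebit=10,
                       market_value_equity=80, sales=150,
                       total_assets=100, total_liabilities=40)
    print(round(z, 2), z_zone(z))
```

    In a pipeline like the one described, the zone label could serve as one more categorical item alongside the ontology-derived attributes when mining rules; that coupling is an assumption here, not a detail taken from the paper.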

  15. A Business Intelligence Model to Predict Bankruptcy using Financial Domain Ontology with Association Rule Mining Algorithm

    Directory of Open Access Journals (Sweden)

    A Martin

    2011-05-01

    Today in every organization, financial analysis provides the basis for understanding and evaluating the results of business operations and for determining how well a business is doing. This means that organizations can control the operational activities primarily related to corporate finance. One way of doing this is through bankruptcy prediction analysis. This paper develops an ontological model from the financial information of an organization by analyzing the semantics of the financial statements of a business. One of the best bankruptcy prediction models is the Altman Z-score model, which uses financial ratios to predict bankruptcy. From the financial ontological model, the relations between financial data are discovered using a data mining algorithm. By combining the financial domain ontological model with an association rule mining algorithm and the Z-score model, a new business intelligence model is developed to predict bankruptcy.

  16. ADAPTIVE ASSOCIATION RULE MINING BASED CROSS LAYER INTRUSION DETECTION SYSTEM FOR MANET

    Directory of Open Access Journals (Sweden)

    V. Anjana Devi

    2011-10-01

    Mobile ad-hoc wireless networks (MANETs) are a significant area of research with many applications. MANETs are more vulnerable to malicious attack. Authentication and encryption techniques can be used as the first line of defense for reducing the possibilities of attacks. However, these approaches have several demerits and are designed for a set of well-known attacks. This paper proposes a cross-layer intrusion detection architecture to discover malicious nodes and different types of DoS attacks by exploiting the information available across different layers of the protocol stack in order to improve the accuracy of detection. This approach uses a fixed-width clustering algorithm for efficient detection of anomalies in MANET traffic and also for detecting newly generated attacks. In the association process, the Adaptive Association Rule mining algorithm is utilized, which helps reduce the time taken to perform the association process.

  17. Adaptive Interval Configuration to Enhance Dynamic Approach for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    Most proposed algorithms for mining association rules follow the conventional level-wise approach. The dynamic candidate generation idea introduced in the dynamic itemset counting (DIC) algorithm broke away from the level-wise limitation and could find the large itemsets using fewer passes over the database than level-wise algorithms. However, the dynamic approach is very sensitive to the data distribution of the database and requires a proper interval size. In this paper an optimization technique named adaptive interval configuration (AIC) has been developed to enhance the dynamic approach. The AIC optimization has the following two functions. The first is that a homogeneous distribution of large itemsets over intervals can be achieved, so that fewer unnecessary candidates are generated and fewer database scanning passes are guaranteed. The second is that the near-optimal interval size can be determined adaptively to produce the best response time. We also developed a candidate pruning technique named virtual partition pruning to reduce the size-2 candidate set and incorporated it into the AIC optimization. Based on the optimization technique, we proposed the efficient AIC algorithm for mining association rules. The AIC, DIC and classic Apriori algorithms were implemented on a Sun Ultra Enterprise 4000 for performance comparison. The results show that AIC performed much better than both DIC and Apriori, and showed strong robustness.

  18. A New Model of Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    苏毅娟; 严小卫

    2001-01-01

    A new algorithm for mining positive and negative association rules is presented. A new confidence is constructed to measure the uncertainty of an association rule based on probability theory and Piatetsky-Shapiro's model.
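    To make the idea of positive and negative rules concrete, the sketch below computes the Piatetsky-Shapiro interest measure, supp(A and B) - supp(A) * supp(B), on a toy transaction list. A positive value suggests a positive rule A => B, a negative value suggests a negative association. This shows only the underlying measure; the abstract does not define the paper's "new confidence", so it is not reproduced here, and all names and data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def ps_interest(transactions, antecedent, consequent):
    """Piatetsky-Shapiro interest: joint support minus the support
    expected if antecedent and consequent were independent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    expected = support(transactions, antecedent) * support(transactions, consequent)
    return joint - expected


# Toy data, invented for illustration.
TRANSACTIONS = [frozenset(t) for t in (
    {"tea", "sugar"}, {"tea", "sugar"}, {"tea", "sugar"}, {"tea"},
    {"coffee"}, {"coffee"}, {"coffee", "sugar"}, {"coffee"},
)]

if __name__ == "__main__":
    # Above zero: tea and sugar co-occur more often than chance predicts.
    print(ps_interest(TRANSACTIONS, {"tea"}, {"sugar"}))
    # Below zero: tea and coffee never co-occur, a candidate negative rule.
    print(ps_interest(TRANSACTIONS, {"tea"}, {"coffee"}))
```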

  19. [A method to enhance user experience of EMR based on mining association rules of incremental updating data].

    Science.gov (United States)

    Zhou, Bao-zhuo; Li, Chuan-fu; Dai, Liang-liang; Feng, Huan-qing

    2009-03-01

    The user experience (EX) of current Electronic Medical Record (EMR) systems needs improvement. This paper proposes a new method to enhance the EX of EMR. First, system templates and text characterization are used to structure the EMR data. Then, the structured data are mined, using association rule mining over incrementally updated data, to find associations between the template elements of the EMR and the values of those elements. Finally, with the help of the mined results, EMR users are able to input data effectively and quickly.

  20. Generalized Multidimensional Association Rules

    Institute of Scientific and Technical Information of China (English)

    周傲英; 周水庚; 金文; 田增平

    2000-01-01

    The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. Traditional association rule mining is limited to intra-transaction associations. Only recently was the concept of the N-dimensional inter-transaction association rule (NDITAR) proposed by H.J. Lu. This paper modifies and extends Lu's definition of NDITAR based on an analysis of its limitations, and subsequently introduces the generalized multidimensional association rule (GMDAR), which is more general, flexible and reasonable than NDITAR.

  1. An association rule mining-based framework for understanding lifestyle risk behaviors.

    Directory of Open Access Journals (Sweden)

    So Hyun Park

    OBJECTIVES: This study investigated the prevalence and patterns of lifestyle risk behaviors in Korean adults. METHODS: We utilized data from the Fourth Korea National Health and Nutrition Examination Survey for 14,833 adults (>20 years of age). We used association rule mining to analyze patterns of lifestyle risk behaviors by characterizing non-adherence to public health recommendations related to the Alameda 7 health behaviors. The study variables were current smoking, heavy drinking, physical inactivity, obesity, inadequate sleep, breakfast skipping, and frequent snacking. RESULTS: Approximately 72% of Korean adults exhibited two or more lifestyle risk behaviors. Among women, current smoking, obesity, and breakfast skipping were associated with inadequate sleep. Among men, breakfast skipping with additional risk behaviors such as physical inactivity, obesity, and inadequate sleep was associated with current smoking. Current smoking with additional risk behaviors such as inadequate sleep or breakfast skipping was associated with physical inactivity. CONCLUSION: Lifestyle risk behaviors are intercorrelated in Korea. Information on patterns of lifestyle risk behaviors could assist in planning interventions targeted at multiple behaviors simultaneously.

  2. Linguistic Valued Association Rules

    Institute of Scientific and Technical Information of China (English)

    LU Jian-jiang; QIAN Zuo-ping

    2002-01-01

    Association rule discovery and prediction with data mining methods are two topics in the field of information processing. In this paper, the records in a database are partitioned into linguistic values, expressed as normal fuzzy numbers, by the fuzzy c-means algorithm, and a series of linguistic valued association rules is generated. The records in the database are then mapped onto the linguistic values according to the maximum membership principle, and definitions of support and confidence for linguistic valued association rules are provided. Finally, the discovery and prediction methods for linguistic valued association rules are illustrated through a weather example.

  3. A Set Operation Based Algorithm for Association Rules Mining

    Institute of Scientific and Technical Information of China (English)

    铁治欣; 陈奇; 俞瑞钊

    2001-01-01

    Mining association rules is an important data mining problem. In this paper, an association rule mining algorithm, ARDBSO, based on set operations is given. It can find all large itemsets in the database while scanning the database only once, so I/O time is greatly reduced and the efficiency of ARDBSO is improved. Experiments show that ARDBSO is 80 to 150 times as efficient as Apriori.
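    One common way to find large itemsets with a single database scan is to record, for each item, the set of transaction IDs that contain it, and then obtain the support of larger itemsets by intersecting those sets. The sketch below illustrates that general set-operation idea only; it is not the ARDBSO algorithm, whose details the abstract does not give, and every name in it is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_tidsets(transactions):
    """Single pass over the data: map each item to the IDs of the transactions containing it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} using only set intersections after the scan."""
    tidsets = build_tidsets(transactions)
    current = {frozenset([item]): tids
               for item, tids in tidsets.items() if len(tids) >= min_count}
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            union = a | b
            # Join two k-itemsets that differ in exactly one item.
            if len(union) == len(a) + 1 and union not in next_level:
                tids = tids_a & tids_b  # transactions containing every item of union
                if len(tids) >= min_count:
                    next_level[union] = tids
        current = next_level
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(data, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

    The trade-off of this layout is memory: the transaction-ID sets must fit in memory, which is the price paid for avoiding repeated scans.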

  4. Creating a prediction model for weather forecasting based on artificial neural network supported by association rules mining

    OpenAIRE

    Kadlec, Jakub

    2016-01-01

    This diploma thesis focuses on creating a predictive model for the purpose of automated weather predictions based on a neural network. Attributes for input layer of the network are selected through association rules mining using the 4ft-Miner procedure. First part of the thesis consists of collection of theoretical knowledge enabling the creation of such predictive model, whereas the second part describes the creation of the model itself using the CRISP-DM methodology. Final part of the thesi...

  5. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.

    Science.gov (United States)

    Kargarfard, Fatemeh; Sami, Ashkan; Ebrahimie, Esmaeil

    2015-10-01

    Pandemic influenza is a major concern worldwide. Availability of advanced technologies and the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provide a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning alteration of non-pandemic sequences to pandemic ones. We hypothesized that the extracted rules can lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules which potentially present the undiscovered antigenic sites at influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K, M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein level. Finding hotspots in influenza sequences is a significant finding as they represent the regions with low antibody reactivity. We argue that the virus breaks host immunity response by mutation at these spots. Based on the discovered rules, we developed the software, "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in discovery of association rules between mutation points during evolution of pandemic influenza.

  6. Business rule mining from spreadsheets

    NARCIS (Netherlands)

    Roy, S.

    2015-01-01

    Business rules represent the knowledge that guides the operations of a business organization. They are implemented in software applications used by organizations, and the activity of extracting them from software is known as business rule mining. It has various purposes amongst which migration and g

  7. A Novel Association Rule Mining with IEC Ratio Based Dissolved Gas Analysis for Fault Diagnosis of Power Transformers

    Directory of Open Access Journals (Sweden)

    Ms. Kanika Shrivastava

    2012-06-01

    Dissolved gas analysis (DGA) is the most important component of finding faults in large oil-filled transformers. Early detection of incipient faults in transformers reduces costly unplanned outages. The most sensitive and reliable technique for evaluating the core of a transformer is dissolved gas analysis. In this paper we evaluate different transformer conditions in different cases. This paper uses dissolved gas analysis to study the history of different transformers in service, from which dissolved combustible gases (DCG) in oil are used as a diagnostic tool for evaluating the condition of the transformer. Oil quality and dissolved gas tests are used comparatively for this purpose. In this paper we present a novel approach based on association rule mining and the IEC ratio method. Using data mining concepts, we can categorize faults based on single and multiple associations and also map the percentage of faults. This is an efficient approach for fault diagnosis of power transformers in which we can find the fault in all obvious conditions. We use Java for programming and comparative study.

  8. GenMiner: mining non-redundant association rules from integrated gene expression data and annotations

    OpenAIRE

    Martinez, Ricardo; Pasquier, Nicolas; Pasquier, Claude

    2008-01-01

    GenMiner is an implementation of association rule discovery dedicated to the analysis of genomic data. It allows the analysis of datasets integrating multiple sources of biological data represented as both discrete values, such as gene annotations, and continuous values, such as gene expression measures. GenMiner implements the new NorDi (normal discretization) algorithm for normalizing and discretizing continuous values and takes advantage of the JClose algorithm to...

  9. Multi-objective Numeric Association Rules Mining via Ant Colony Optimization for Continuous Domains without Specifying Minimum Support and Minimum Confidence

    Directory of Open Access Journals (Sweden)

    Parisa Moslehi

    2011-09-01

    Currently, all search algorithms that use discretization of numeric attributes for numeric association rule mining work in a way that loses the original distribution of the numeric attributes. This leads to a loss of information, so the association rules generated through this process are not precise and accurate. For this reason, algorithms that can natively handle numeric attributes are of interest. Since association rule mining can be considered a multi-objective problem rather than a single-objective one, a new multi-objective algorithm for numeric association rule mining is presented in this paper, using Ant Colony Optimization for Continuous domains (ACOR). This algorithm mines numeric association rules in one step, without any need to specify minimum support and minimum confidence. To do this, we modified ACOR for generating rules. The results show that the algorithm yields more precise and accurate rules, and more rules than previous works.

  10. A PROPOSAL OF FUZZY MULTIDIMENSIONAL ASSOCIATION RULES

    Directory of Open Access Journals (Sweden)

    Rolly Intan

    2006-01-01

    Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. Rather than searching for frequent itemsets (as is done in mining single-dimensional association rules), in multidimensional association rules we search for frequent predicate sets. In general, there are two types of multidimensional association rules, namely interdimension association rules and hybrid-dimension association rules. Interdimension association rules are multidimensional association rules with no repeated predicates. This paper introduces a method for generating interdimension association rules. More meaningful association rules can be obtained by generalizing the crisp values of attributes to fuzzy values. To generate multidimensional association rules involving fuzzy values, this paper introduces an alternative method for mining the rules by searching for the predicate sets.

  11. Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining

    KAUST Repository

    Boudellioua, Imane

    2016-07-08

    The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We carried out an evaluation study of our system performance using cross-validation technique. We found that it achieved very promising results in pathway identification with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, where 436,510 of them lacked any previous pathway annotations.

  12. Analysis of Medical Domain Using CMARM: Confabulation Mapreduce Association Rule Mining Algorithm for Frequent and Rare Itemsets

    Directory of Open Access Journals (Sweden)

    Dr. Jyoti Gautam

    2015-11-01

    Disease is a major cause of illness and death in modern society. Various factors are responsible for disease, such as work environment, living and working conditions, agriculture and food production, housing, unemployment and individual lifestyle. Early diagnosis of diseases that occur frequently or rarely with growing age can help cure the disease completely or to some extent. Long-term analysis of patient records may be useful for finding the causes responsible for particular diseases, so that people can take early preventive measures to minimize the risk of diseases that may supervene with growing age and hence increase life expectancy. In this paper, a new algorithm, CMARM (Confabulation-MapReduce based association rule mining), is proposed for the analysis of medical data repositories for both rare and frequent itemsets, using an iterative MapReduce-based framework inspired by cogency. Cogency is the probability of the assumed facts being true if the conclusion is true; because it is based on pairwise item conditional probability, the proposed algorithm mines association rules with only one pass through the file. The proposed algorithm is also valuable for dealing with infrequent items due to its cogency-inspired approach.

  13. Mining Association Rules in Big Data for E-healthcare Information System

    Directory of Open Access Journals (Sweden)

    N. Rajkumar

    2014-08-01

    Big data concerns large-volume, growing data sets drawn from multiple autonomous sources. Big data is expanding quickly in many advanced domains because of the rapid growth in networking and data collection. This study defines an E-Healthcare Information System, which requires a logical and structural method of approaching knowledge, and which must effectively prepare and control the data generated during the diagnostic activities of medical applications by sharing information among E-Healthcare Information System devices. The main objective is an E-Healthcare Information System that is an extensive, integrated knowledge system designed to manage all aspects of a hospital's operation, such as medical data, administrative, financial and legal information, and the corresponding service processing. The analysis results are generated using association mining techniques applied to big data from hospital information datasets. Finally, the results of the mining techniques are evaluated in terms of accuracy, precision, recall and positive rate.

  14. A FUZZY FREQUENT PATTERN-GROWTH ALGORITHM FOR ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    A.H.M. Sajedul Hoque

    2015-09-01

    The number of tuples in enterprise databases is increasing significantly. The associations among attributes in tuples are sometimes essential for the higher authority of an organization to make plans or decisions for the future. The quantitative attributes in tuples must be split into two or more intervals. Because classical (crisp) logic over- and under-estimates values close to interval boundaries, fuzzy logic has been used to build the intervals for quantitative attributes. These fuzzy intervals lead to the generation of more realistic associations. This paper focuses on deriving association rules among the quantitative attributes and categorical attributes of a database employing fuzzy logic and the Frequent Pattern (FP) Growth algorithm. The effectiveness of the method has been demonstrated on a sample database.
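    The boundary problem mentioned in this abstract can be seen in a small example: with a crisp interval, an age of 29 is excluded from "30 to 40" entirely, while a fuzzy interval gives it a partial degree of membership. The sketch below uses a triangular membership function and the usual definition of fuzzy support as the mean membership over all records. The shape of the function, its breakpoints and all names are assumptions for illustration; the paper's own membership functions are not given in the abstract.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at or outside [left, right], 1 at peak, linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_support(values, left, peak, right):
    """Fuzzy support of one fuzzy interval: the mean membership degree over all records."""
    return sum(triangular(v, left, peak, right) for v in values) / len(values)


def crisp_support(values, low, high):
    """Classical support of the crisp interval [low, high]."""
    return sum(1 for v in values if low <= v <= high) / len(values)


if __name__ == "__main__":
    ages = [25, 30, 35, 40, 45]  # invented sample column
    # Hypothetical fuzzy set "middle-aged" spanning 20..50 and peaking at 35.
    print(fuzzy_support(ages, 20, 35, 50))
    # Crisp counterpart: records just outside 30..40 contribute nothing.
    print(crisp_support(ages, 30, 40))
    # A value just below the crisp boundary still has partial membership.
    print(triangular(29, 20, 35, 50))
```

    In a fuzzy FP-Growth setting, these membership degrees would replace the 0/1 item counts when the tree is built; how exactly the paper does this is not stated in the abstract.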

  15. Apriori and Ant Colony Optimization of Association Rules

    OpenAIRE

    Anshuman Singh Sadh; Nitin Shukla

    2013-01-01

    Association rule mining is one of the most important and popular data mining techniques. It can be used efficiently in any decision-making process or decision-based rule generation. In this paper we present an efficient mining-based optimization technique for rule generation. Using the Apriori algorithm, we find the positive and negative association rules. Then we apply the ant colony optimization (ACO) algorithm to optimize the association rules. Our results show the effecti...

  16. Discovering fuzzy spatial association rules

    Science.gov (United States)

    Kacar, Esen; Cicekli, Nihan K.

    2002-03-01

    Discovering interesting, implicit knowledge and general relationships in geographic information databases is very important for understanding and using these spatial data. One of the methods for discovering this implicit knowledge is mining spatial association rules. A spatial association rule is a rule indicating certain association relationships among a set of spatial and possibly non-spatial predicates. In the mining process, data is organized in a hierarchical manner. However, in real-world applications it may not be possible to construct a crisp structure for this data; instead, fuzzy structures should be used. Fuzziness, i.e. partial belonging of an item to more than one sub-item in the hierarchy, can be applied to the data itself and also to the hierarchy of spatial relations. This paper shows that strong association rules can be mined from large spatial databases using fuzzy concept and spatial relation hierarchies.

  17. Hiding Sensitive Association Rule Using Clusters of Sensitive Association Rule

    Directory of Open Access Journals (Sweden)

    Sanjay keer

    2012-06-01

    The security of large databases that contain crucial information becomes a serious issue when data are shared over a network, where they must be protected against unauthorized access. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and crucial data. The objective of the proposed association rule hiding algorithm for privacy-preserving data mining is to hide certain information so that it cannot be discovered through association rule mining algorithms. The main approach of association rule hiding algorithms is to hide some generated association rules by increasing or decreasing the support or the confidence of the rules, so that the sensitive items, whether on the Left Hand Side (LHS) or Right Hand Side (RHS) of a generated rule, cannot be deduced through association rule mining algorithms. The Increase Support of Left Hand Side (ISL) algorithm decreases the confidence of a rule by increasing the support value of its LHS. It does not work for both sides of a rule; it works only by modifying the LHS. In this paper, we propose a heuristic algorithm named ISLRC (Increase Support of L.H.S. item of Rule Clusters), based on the ISL approach, to preserve privacy for sensitive association rules in a database. The proposed algorithm modifies fewer transactions and hides many rules at a time. The efficiency of the proposed algorithm is compared with that of the ISL algorithm.
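    The arithmetic behind the ISL idea is that confidence(LHS => RHS) = supp(LHS and RHS) / supp(LHS), so raising supp(LHS) while leaving the joint support unchanged pushes the confidence down. The sketch below demonstrates this on toy data by adding the LHS items to transactions that contain neither the full LHS nor the full RHS. It is a simplified illustration under that assumption, not the ISL or ISLRC algorithm from the paper, and the function names and data are hypothetical.

```python
def confidence(transactions, lhs, rhs):
    """confidence(lhs => rhs) = count(lhs and rhs) / count(lhs)."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0


def hide_by_increasing_lhs_support(transactions, lhs, rhs, min_conf):
    """Return a modified copy in which confidence(lhs => rhs) is pushed
    below min_conf where possible, by adding lhs to some transactions."""
    modified = [set(t) for t in transactions]  # never touch the caller's data
    for t in modified:
        if confidence(modified, lhs, rhs) < min_conf:
            break  # the rule is already hidden
        # Only touch transactions lacking both sides, so the joint
        # support of lhs and rhs stays exactly the same.
        if not lhs <= t and not rhs <= t:
            t |= lhs
    return modified


if __name__ == "__main__":
    db = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"A"},
          {"C"}, {"C"}, {"D"}, {"B", "C"}]
    lhs, rhs = {"A"}, {"B"}
    print(confidence(db, lhs, rhs))          # before sanitisation
    hidden = hide_by_increasing_lhs_support(db, lhs, rhs, min_conf=0.6)
    print(confidence(hidden, lhs, rhs))      # after sanitisation
```

    Each modified transaction is a distortion of the released data, which is why the abstract stresses hiding many rules while modifying fewer transactions.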

  18. Customer Requirements Mapping Method Based on Association Rule Mining for Mass Customization

    Institute of Scientific and Technical Information of China (English)

    XIA Shi-sheng; WANG Li-ya

    2008-01-01

    Customer requirements analysis is the key step in product variety design for mass customization (MC). Quality function deployment (QFD) is a widely used management technique for understanding the voice of the customer (VOC); however, QFD depends heavily on subjective human judgment when extracting customer requirements and determining their importance weights. The QFD process and related problems are so complicated that it is not easy to use. In this paper, based on a general data structure for product families, the generic bill of material (GBOM), association rule analysis is introduced to construct the classification mechanism between customer requirements and product architecture. The new method can map customer requirements to the items of the product family architecture, accomplish the mapping from the customer domain to the physical domain directly, reduce the interaction needed between customer and designer, improve product design quality, and thus satisfy customer needs as far as possible. Finally, an example of customer requirements mapping for an elevator cabin is used to illustrate the proposed method.

  19. Pattern Discovery Using Association Rules

    Directory of Open Access Journals (Sweden)

    Ms Kiruthika M

    2011-12-01

    The explosive growth of the Internet has given rise to many websites that maintain large amounts of user information. To utilize this information, identifying the usage patterns of users is very important. Web usage mining is one of the processes for finding these usage patterns and has many practical applications. Our paper discusses how association rules can be used to discover patterns in web usage mining. Our discussion starts with preprocessing of the given weblog, followed by clustering and finding association rules. These rules provide knowledge that helps to improve website design, advertising, web personalization, etc.

  20. Using Association Rule Mining for Extracting Product Sales Patterns in Retail Store Transactions

    OpenAIRE

    Pramod Prasad; Dr. Latesh Malik

    2011-01-01

    Computers and software play an integral part in the working of businesses and organisations. An immense amount of data is generated with the use of software. These large datasets need to be analysed for useful information that would benefit organisations, businesses and individuals by supporting decision making and providing valuable knowledge. Data mining is an approach that aids in fulfilling this requirement. Data mining is the process of applying mathematical, statistical and machine lear...

  1. Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor

    OpenAIRE

    Dipankar SENGUPTA; Sood, Meemansa; Vijayvargia, Poorvika; Hota, Sunil; Naik, Pradeep K

    2013-01-01

    Healthcare sector is generating a large amount of information corresponding to diagnosis, disease identification and treatment of an individual. Mining knowledge and providing scientific decision-making for the diagnosis & treatment of disease from the clinical dataset is therefore increasingly becoming necessary. Aim of this study was to assess the applicability of knowledge discovery in brain tumor data warehouse, applying data mining techniques for investigation of clinical parameters that...

  2. Mining Algorithm of Normalized Weighted Association Rules in Database

    Institute of Scientific and Technical Information of China (English)

    杜鹢; 藏海霞

    2001-01-01

    Previous research on algorithms for mining association rules assumes that every item in the database is equally important. This paper presents an algorithm for mining normalized weighted association rules in a database, which can solve the problems caused by items having unequal importance.

  3. DETERMINING THE CORE PART OF SOFTWARE DEVELOPMENT CURRICULUM APPLYING ASSOCIATION RULE MINING ON SOFTWARE JOB ADS IN TURKEY

    Directory of Open Access Journals (Sweden)

    Ilkay Yelmen

    2016-01-01

    Software technology has advanced rapidly over the years. To adapt to this advancement, employees in software development must renew their skills continually. During this rapid change, it is vital to train software developers according to the criteria desired by the industry. Therefore, the curricula of university programs related to software development should be revised according to software industry requirements. In this study, the core part of the Software Development Curriculum is determined by applying association rule mining to software job ads in Turkey. The courses in the core part are chosen with respect to the IEEE/ACM computer science curriculum. As a future study, it is also important to gather the academic personnel and the software company professionals to determine the compulsory and elective courses so that newly graduated software dev

  4. Association Rule Discovery and Its Applications

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Data mining, i.e., mining knowledge from large amounts of data, is a demanding field since huge amounts of data have been collected in various applications. The collected data far exceed people's ability to analyze them. Thus, new and efficient methods are needed to discover knowledge from large databases. Association rule discovery is an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. In this paper, we describe and summarize recent work on association rule discovery, offer a new method for association rule mining, and point out that association rule discovery can be applied in spatial data mining. It is useful for discovering knowledge from remote sensing and geographical information systems.
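
    The two-step task described above (identify the frequent itemsets, then form implication rules among them) can be illustrated with a minimal level-wise sketch in the style of Apriori. This is not the new method the paper offers; the function names, the toy transactions, and the thresholds are assumptions chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct single item.
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def generate_rules(frequent, min_confidence):
    """Form rules antecedent => consequent from each frequent itemset of size >= 2."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so frequent[a] exists.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical toy data, not from the paper.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = frequent_itemsets(baskets, min_support=0.5)
rules = generate_rules(freq, min_confidence=0.9)
```

    With this toy data, the only rule that reaches the 0.9 confidence threshold is {butter} => {bread}, since every basket containing butter also contains bread.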

  5. Discovering Non-Redundant Association Rules using MinMax Approximation Rules

    Directory of Open Access Journals (Sweden)

    R. Vijaya Prakash

    2012-12-01

    Frequent pattern mining is an important area of data mining used to generate association rules. The quality of the extracted frequent patterns is a major concern, because mining generates huge sets of rules, many of which are redundant. Mining non-redundant frequent patterns is therefore an important problem in association rule mining. In this paper we propose a method that eliminates redundant frequent patterns using the MinMax rule approach in order to generate quality association rules.

  6. Studying Co-evolution of Production and Test Code Using Association Rule Mining

    NARCIS (Netherlands)

    Lubsen, Z.; Zaidman, A.; Pinzger, M.

    2009-01-01

    Long version of the short paper accepted for publication in the proceedings of the 6th International Working Conference on Mining Software Repositories (MSR 2009). Unit tests are generally acknowledged as an important aid to produce high quality code, as they provide quick feedback to developers on

  7. Interestingness of association rules in data mining: Issues relevant to e-commerce

    Indian Academy of Sciences (India)

    Rajesh Natarajan; B Shekar

    2005-04-01

    The ubiquitous low-cost connectivity synonymous with the internet has changed the competitive business environment by dissolving traditional sources of competitive advantage based on size, location and the like. In this level playing field, firms are forced to compete on the basis of knowledge. Data mining tools and techniques provide e-commerce applications with novel and significant knowledge. This knowledge can be leveraged to gain competitive advantage. However, the automated nature of data mining algorithms may result in a glut of patterns – the sheer numbers of which contribute to incomprehensibility. Importance of automated methods that address this immensity problem, particularly with respect to practical application of data mining results, cannot be overstated. We first examine different approaches to address this problem citing their applicability to e-commerce whenever appropriate. We then provide a detailed survey of one important approach, namely interestingness measure, and discuss its relevance in e-commerce applications such as personalization in recommender systems. Study of current literature brings out important issues that reveal many promising avenues for future research. We conclude by reiterating the importance of post-processing methods in data mining for effective and efficient deployment of e-commerce solutions.

  8. AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION RULES

    Directory of Open Access Journals (Sweden)

    ARKAN A. G. AL-HAMODI

    2016-03-01

    In mining frequent itemsets, one of the most important algorithms is FP-growth. FP-growth compresses the information needed for mining frequent itemsets into an FP-tree and recursively constructs FP-trees to find all frequent itemsets. In this paper, we propose the EFP-growth (enhanced FP-growth) algorithm to achieve the quality of FP-growth. Our method implements EFP-growth on the MapReduce framework using Hadoop. The new method achieves higher performance than basic FP-growth. EFP-growth can work with large datasets to discover frequent patterns in a transaction database. With our method, the execution time under different minimum supports is decreased.

  9. Studying Co-evolution of Production and Test Code Using Association Rule Mining

    OpenAIRE

    Lubsen, Z.; Zaidman, A.; Pinzger, M.

    2009-01-01

    Long version of the short paper accepted for publication in the proceedings of the 6th International Working Conference on Mining Software Repositories (MSR 2009). Unit tests are generally acknowledged as an important aid to produce high quality code, as they provide quick feedback to developers on the correctness of their code. In order to achieve high quality, well-maintained tests are needed. Ideally, tests co-evolve with the production code to test changes as soon as possible. In this pap...

  10. Identifying Combinatorial Biomarkers by Association Rule Mining in the CAMD Alzheimer's Database

    OpenAIRE

    Szalkai, Balazs; Grolmusz, Vince K.; Grolmusz, Vince I.; Diseases, Coalition Against Major

    2013-01-01

    Background: The concept of combinatorial biomarkers was conceived around 2010: it was noticed that simple biomarkers are often inadequate for recognizing and characterizing complex diseases. Methods: Here we present an algorithmic search method for complex biomarkers which may predict or indicate Alzheimer's disease (AD) and other kinds of dementia. We applied data mining techniques that are capable to uncover implication-like logical schemes with detailed quality scoring. Our program SCARF i...

  11. Literature mining of protein-residue associations with graph rules learned through distant supervision

    Directory of Open Access Journals (Sweden)

    Ravikumar KE

    2012-10-01

    Background: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. Results: The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. Conclusions: The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  12. Investment risk rules mining in insurance business data with association rules method

    Institute of Scientific and Technical Information of China (English)

    田金兰; 张素琴; 黄刚

    2001-01-01

    Insurance companies need to find the rules governing applications and claims in insurance business data to make a profit. Mining for association rules is a simple and very useful data mining method. This paper introduces the definition of the association rule and its four attributes: confidence, support, expected confidence and lift. Then some association rules are mined with SGI (Silicon Graphics Incorporation) Mineset, a data mining tool. Some risk control rules are given that play important roles in the insurance company business. Association rules are also widely used in the fields of banking, telecommunications, commerce, etc.
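
    The abstract names the four rule attributes without defining them. The sketch below uses the definitions commonly given for a rule A => C: support is the fraction of records containing both A and C, confidence is the fraction of records containing A that also contain C, expected confidence is the fraction of all records containing C, and lift is confidence divided by expected confidence. The function name and the toy insurance records are hypothetical; they are not taken from the paper or from Mineset.

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support, confidence, expected confidence and lift for antecedent => consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    records = [frozenset(t) for t in transactions]
    n = len(records)
    n_a = sum(1 for t in records if a <= t)         # records containing the antecedent
    n_c = sum(1 for t in records if c <= t)         # records containing the consequent
    n_ac = sum(1 for t in records if (a | c) <= t)  # records containing both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    expected_confidence = n_c / n
    # Lift > 1 means the antecedent makes the consequent more likely than its base rate.
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return {
        "support": support,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": lift,
    }


# Hypothetical policy records: each set lists attributes of one policy.
policies = [
    {"young", "sports_car", "claim"},
    {"young", "sports_car", "claim"},
    {"young", "sedan"},
    {"senior", "sedan"},
    {"senior", "sedan", "claim"},
]
m = rule_measures(policies, {"young", "sports_car"}, {"claim"})
```

    In this toy example the rule covers 2 of 5 records (support 0.4) with confidence 1.0, while claims occur in 3 of 5 records overall (expected confidence 0.6), so the lift is about 1.67.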

  13. Fast Association Rule Mining in CRM

    Institute of Scientific and Technical Information of China (English)

    王扶东; 李洁; 薛劲松; 朱云龙

    2004-01-01

    The analysis of cross-selling is one of the important parts of analytical CRM. We present a constraint-based fast association rule mining algorithm, AApriori, with a specified antecedent and a constrained consequent. The outcome of this algorithm can help enterprises use well-selling products to promote the sales of other products. At the same time, an algorithm CApriori, in which the consequent is specified and the antecedent is constrained, is presented. It can effectively help enterprises use cross-selling to open up the market for new products. The simulation results show that the AApriori and CApriori algorithms can quickly and accurately obtain the information that the enterprise needs.
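
    To make the idea of a specified antecedent with a constrained consequent concrete, here is a minimal brute-force sketch. It is not the AApriori algorithm itself, whose details the abstract does not give; the function name, the `max_size` parameter, and the toy sales data are assumptions made for illustration.

```python
from itertools import combinations


def rules_with_fixed_antecedent(transactions, antecedent, allowed_consequent_items,
                                min_support, min_confidence, max_size=2):
    """Enumerate rules antecedent => C where C is drawn only from the allowed items."""
    a = frozenset(antecedent)
    allowed = frozenset(allowed_consequent_items) - a
    n = len(transactions)
    # Only transactions that contain the fixed antecedent can support any rule.
    covering = [frozenset(t) for t in transactions if a <= frozenset(t)]
    if not covering:
        return []
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(allowed), size):
            c = frozenset(combo)
            count = sum(1 for t in covering if c <= t)
            support = count / n                  # relative to all transactions
            confidence = count / len(covering)   # relative to those containing the antecedent
            if support >= min_support and confidence >= min_confidence:
                results.append((a, c, support, confidence))
    return results


# Hypothetical sales data: "phone" is the well-selling product used as the antecedent.
sales = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"tablet", "case"},
    {"phone", "case", "charger"},
]
found = rules_with_fixed_antecedent(sales, {"phone"}, {"case", "charger", "tablet"},
                                    min_support=0.4, min_confidence=0.7)
```

    Fixing one side of the rule shrinks the search: only transactions containing the antecedent are scanned, and only the allowed items are combined. The mirrored case, with a specified consequent and a constrained antecedent, would swap the roles of the two sides.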

  14. RESEARCH ON ALL NEGATIVE ASSOCIATION RULES MINING IN A DATABASE

    Institute of Scientific and Technical Information of China (English)

    李红; 宗瑜; 解浚源

    2011-01-01

    In a database, association rule information is one of the representation forms of knowledge. Negative association rule mining is an important research topic in mining association information from databases and has a wide range of applications. Existing mining approaches cannot obtain all negative association rules in a database. This paper considers extracting all negative association rules from a database by: (1) scanning the database to build a database frequent pattern tree (DFP-tree); (2) obtaining all minimal infrequent itemsets (ASI) on the basis of a pruned DFP-tree; (3) obtaining all infrequent itemsets via the upward closure of the maximal frequent itemsets in ASI; and (4) on this basis, adopting correlation as one of the rule interestingness measures to extract negative association rules. Theoretical analysis and experiments demonstrate the correctness and efficiency of the algorithm.
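
    Step (4) uses correlation as an interestingness measure for negative rules. The sketch below shows one common way this can be done, and it is an assumption rather than the paper's definition: correlation is taken as supp(A and B) / (supp(A) * supp(B)), and a value below 1 is read as evidence for the negative rule A => not B. The DFP-tree steps (1) to (3) are not reproduced, and the function name and toy data are hypothetical.

```python
def negative_rule(transactions, a_items, b_items, min_confidence):
    """Return measures for A => not B if A and B are negatively correlated, else None."""
    a, b = frozenset(a_items), frozenset(b_items)
    records = [frozenset(t) for t in transactions]
    n = len(records)
    n_a = sum(1 for t in records if a <= t)
    n_b = sum(1 for t in records if b <= t)
    n_ab = sum(1 for t in records if (a | b) <= t)
    if n_a == 0 or n_b == 0:
        return None
    # Assumed correlation measure: P(A and B) / (P(A) * P(B)), written with counts.
    correlation = n_ab * n / (n_a * n_b)
    # Confidence of A => not B: share of A-records that do not contain B.
    confidence = (n_a - n_ab) / n_a
    if correlation < 1 and confidence >= min_confidence:
        return {"correlation": correlation, "confidence": confidence}
    return None


# Hypothetical toy data: tea and coffee rarely appear together.
orders = [{"tea"}, {"tea"}, {"tea", "coffee"}, {"coffee"}, {"coffee"}, {"coffee"}]
result = negative_rule(orders, {"tea"}, {"coffee"}, min_confidence=0.6)
positive = negative_rule([{"x", "y"}, {"x", "y"}, {"z"}], {"x"}, {"y"}, min_confidence=0.6)
```

    Here tea and coffee co-occur in only one of six orders, giving a correlation of 0.5 and a confidence of 2/3 for tea => not coffee, while the positively correlated pair x, y yields no negative rule.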

  15. Elimination algorithm of redundant rules in association rules mining based on domain knowledge

    Institute of Scientific and Technical Information of China (English)

    张晶; 张斌; 胡学钢

    2011-01-01

    Many association rule mining algorithms have been developed to extract interesting patterns from large databases. However, the large amount of knowledge explicitly represented in domain knowledge (DK) has not been used to reduce the number of association rules. A significant number of well-known dependences are unnecessarily extracted by association rule mining algorithms, which results in the generation of hundreds or thousands of non-interesting association rules. This paper presents the DKARM algorithm, which takes both the database and the relevant DK into account to eliminate all associations explicitly represented in DK. Experiments on the proposed algorithm show a significant reduction in the number of rules and the elimination of non-interesting rules.

  16. A Quick Algorithm for Mining Exceptional Rules

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Exceptional rules are often ignored because of their small support. However, they have high confidence, so they are useful sometimes. A new algorithm for mining exceptional rules is presented, which creates a large itemset from a relatively small database and scans the whole database only one time to generate all exceptional rules. This algorithm is proved to be quick and effective through its application in a mushroom database.

  17. A Reliability Evaluation System of Association Rules

    Science.gov (United States)

    Chen, Jiangping; Feng, Wanshu; Luo, Minghai

    2016-06-01

    In mining association rules, the evaluation of the rules is a highly important work because it directly affects the usability and applicability of the output results of mining. In this paper, the concept of reliability was imported into the association rule evaluation. The reliability of association rules was defined as the accordance degree that reflects the rules of the mining data set. Such degree contains three levels of measurement, namely, accuracy, completeness, and consistency of rules. To show its effectiveness, the "accuracy-completeness-consistency" reliability evaluation system was applied to two extremely different data sets, namely, a basket simulation data set and a multi-source lightning data fusion. Results show that the reliability evaluation system works well in both simulation data set and the actual problem. The three-dimensional reliability evaluation can effectively detect the useless rules to be screened out and add the missing rules thereby improving the reliability of mining results. Furthermore, the proposed reliability evaluation system is applicable to many research fields; using the system in the analysis can facilitate obtainment of more accurate, complete, and consistent association rules.

  18. Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis.

    Science.gov (United States)

    Gardiner, Eleanor J; Gillet, Valerie J

    2015-09-28

    Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often apparent differences lie in the way in which the problem under investigation has been formulated which can lead to the natural adoption of one or other method. For example, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine both structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development

  20. Discovery of Association Rules from University Admission System Data

    OpenAIRE

    Abdul Fattah Mashat; Mohammed M. Fouad; Yu, Philip S.; Tarek F. Gharib

    2013-01-01

    Association rules discovery is one of the vital data mining techniques. Currently there is an increasing interest in data mining and educational systems, making educational data mining (EDM) as a new growing research community. In this paper, we present a model for association rules discovery from King Abdulaziz University (KAU) admission system data. The main objective is to extract the rules and relations between admission system attributes for better analysis. The model utilizes an apriori...

  1. Discovery of Association Rules from University Admission System Data

    Directory of Open Access Journals (Sweden)

    Abdul Fattah Mashat

    2013-05-01

    Association rules discovery is one of the vital data mining techniques. Currently there is increasing interest in data mining and educational systems, making educational data mining (EDM) a new and growing research community. In this paper, we present a model for association rules discovery from King Abdulaziz University (KAU) admission system data. The main objective is to extract the rules and relations between admission system attributes for better analysis. The model utilizes an Apriori algorithm for association rule mining. A detailed analysis and interpretation of the experimental results is presented from the perspective of the admission office.

  2. Apriori Association Rule Algorithms using VMware Environment

    Directory of Open Access Journals (Sweden)

    R. Sumithra

    2014-07-01

    The aim of this study is to carry out research in distributed data mining using a cloud platform. Distributed data mining has become a vital component of big data analytics due to the development of network and distributed technology. The MapReduce Hadoop framework is a very familiar concept in big data analytics. The association rule algorithm is one of the popular data mining techniques; it finds relationships between different transactions. Weighted Apriori and hash-T Apriori algorithms for association rule mining were executed on a MapReduce Hadoop framework using a retail data set of transactions. This study describes the above concepts, explains the experiment carried out with the retail data set in a VMware environment, and compares the performance of the weighted Apriori and hash-T Apriori algorithms in terms of memory and time.

  3. On the Mining Algorithm Based on BDIF Association Rule

    Institute of Scientific and Technical Information of China (English)

    郭昌建

    2015-01-01

    This article reviews research on association rule mining and the classification of association rules, then analyzes and evaluates the classic Apriori algorithm. On this basis it proposes BDIF (Based Transactional Databases Including Frequent Item Set), an algorithm for generating frequent itemsets efficiently. By dividing the data into blocks and quickly searching for frequent itemsets, it reduces the number of scans of the data blocks and improves the efficiency of the algorithm. The algorithm was debugged and verified in the Borland C++ Builder 6.0 development environment.

  4. Book Lending Data Mining Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    吴玉春; 龙小建

    2016-01-01

    Based on the actual business needs of university libraries, this article uses association rules to mine and analyze students' book lending data. First, the library's historical lending data are preprocessed, which includes data cleaning, data integration, data transformation, and construction of a transaction database. Then an association rule mining algorithm (the MFP-Miner algorithm) is applied to the transaction database to mine the association rules of book lending, providing scientific data support for services such as book lending and book recommendation and thereby improving the quality of library service.

  5. Association Rule Mining Strategy Based on Three-Sectional Coding Immune Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    王晓光; 张永健

    2014-01-01

    The latest intelligent algorithms for association rule mining suffer from low mining accuracy, a tendency to fall into local convergence, and long running times. To solve these problems, a TIGA strategy is proposed for mining association rules over continuous attributes. Firstly, a three-sectional encoding is used to reduce the influence of the choice of segmentation points on mining. Secondly, an immune algorithm is used to mine the association rules, and a selection scheme based on vector distance is proposed, which can increase the diversity of the population and the accuracy of the mined rules. Finally, adaptive crossover and mutation factors are used to reduce the interference of manually set parameters with the mining results. The experimental results show that, compared with the latest mining algorithms, the proposed algorithm offers high precision, global convergence, and short mining time when mining association rules over continuous attributes.

  6. Data mining the association rules in outpatient service prescriptions

    Institute of Scientific and Technical Information of China (English)

    傅翔; 杨樟卫; 陈盛新; 陈长虹; 何宇涛; 黄晓钟

    2011-01-01

    Objective: To analyze the outpatient prescription data of a hospital, mine the association rules among the drugs in the prescriptions, identify prescription patterns, and uncover problems. Methods: The data mining software PASW Modeler 13 was used to build an Apriori association analysis model. Results: Among the 47 132 sampled prescriptions, drugs for the prevention and treatment of chronic diseases such as cardiovascular disease were used most frequently. Expectorants, cough suppressants, and prepared Chinese medicines for "Qing Re Jie Du" (heat-clearing and detoxifying) were notably associated with cephalosporin antibacterials. Conclusion: Data mining can process and analyze prescription data quickly and reflect prescription patterns, and it is suitable for analyzing the large volumes of data in current drug utilization research.

  7. Indirect associations between multiple items and a mining algorithm

    Institute of Scientific and Technical Information of China (English)

    Ni Min; Xu Xiaofei; Deng Shengchun

    2005-01-01

    Indirect association is a high-level relationship between items and frequent itemsets in data. Current research on indirect association mining is limited to indirect associations between item pairs, which discovers too many rules from a dataset. A formal definition of indirect association between multiple items is presented, along with an algorithm, SET-NIA, for mining this kind of indirect association based on the anti-monotonicity of indirect associations and a frequent item-pair support matrix. While the rules found contain the same information as those found by algorithms that mine indirect associations between item pairs, this notion saves space in storing the rules and makes them easier for humans to understand and apply. Experiments conducted on two real-world datasets show that SET-NIA effectively finds fewer rules than existing algorithms that mine indirect associations between item pairs; the experimental results also show that SET-NIA performs better than existing algorithms.

  8. Secure Medical Diagnosis Using Rule Based Mining

    Science.gov (United States)

    Saleem Durai, M. A.; Sriman Narayana Iyengar, N. Ch.

    Security is the governing dynamics of all walks of life. Here we propose a secured medical diagnosis system. Certain specific rules are specified implicitly by the designer of the expert system; symptoms of the diseases are then obtained from the users, and using predefined confidence and support values we extract a threshold value that is used, through rule mining, to conclude on a particular disease and its stage. A "THINK" CAPTCHA mechanism is used to distinguish between humans and robots, thereby eliminating robots and preventing them from creating fake accounts and spam. A novel image encryption mechanism based on a genetic algorithm is designed to encrypt the medical images so that image data are stored and sent securely.

  9. Research on mine high confidence association rules for multi-images

    Institute of Scientific and Technical Information of China (English)

    杜琳; 陈云亮; 谢长生; 蔡之华

    2009-01-01

    In some fields of image association rule mining, it is necessary to mine association rules with high confidence, for which the support requirement is comparatively low. A method is proposed to mine high-confidence image association rules while still taking the support threshold into account. To extract image association rules effectively, the approach uses the bSQ (bit Sequential) raster data format. A level-by-level search then builds a rules tree, which avoids the large number of frequent itemsets that traditional methods generate at low support. Finally, techniques such as multi-image association rule extraction priority and the image data cube are used to mine pixel-level association rules from multiple images. The experiments show that the proposed approach can efficiently mine high-confidence association rules from image data, which validates its feasibility.

  10. CONCISE REPRESENTATIONS FOR ASSOCIATION RULES IN MULTI-LEVEL DATASETS

    Institute of Scientific and Technical Information of China (English)

    Yue XU; Gavin SHAW; Yuefeng LI

    2009-01-01

    Association rule mining plays an important role in knowledge and information discovery. Often for a dataset, a huge number of rules can be extracted, but many of them are redundant, especially in the case of multi-level datasets. Mining non-redundant rules is a promising approach to solve this problem. However, existing work (Pasquier et al. 2005, Xu & Li 2007) is only focused on single level datasets. In this paper, we firstly present a definition for redundancy and a concise representation called Reliable basis for representing non-redundant association rules, then we propose an extension to the previous work that can remove hierarchically redundant rules from multi-level datasets. We also show that the resulting concise representation of non-redundant association rules is lossless since all association rules can be derived from the representation. Experiments show that our extension can effectively generate multilevel non-redundant rules.

  11. Research on the Intrusion Detection Algorithm Based on Association Rule Mining

    Institute of Scientific and Technical Information of China (English)

    吴斌; 陆培军

    2012-01-01

    This paper analyzes the characteristics of wireless network intrusion detection and proposes an association rule mining algorithm based on time windows. Analysis and experimental results show that the time-window-based algorithm is superior in efficiency and other respects, and that it achieves good results in intrusion detection.

  12. AN ALGORITHM OF MULTIDIMENSIONAL ASSOCIATION RULES MINING BASED ON DATA VERTICALLY DISTRIBUTED IN TWO PARTS

    Institute of Scientific and Technical Information of China (English)

    李海磊; 王晗; 孔令富; 高慧星

    2014-01-01

    Joint association rule mining over data that is vertically distributed across different sites is an important research direction. However, existing algorithms yield only global single-dimensional association rules; they cannot handle multidimensional datasets or produce global multidimensional association rules. To address this problem, we propose TDDM (Two Part Vertically Distributed Data Mining), a multidimensional association rule mining algorithm for data vertically distributed between two parties. Combined with data cube technology, the algorithm mines directly on the data vertically distributed between the two parties and obtains multidimensional association rules. Theoretical analysis and experimental results show that TDDM can effectively mine multidimensional association rules under this condition.

  13. State of The Art - Modern Sequential Rule Mining Techniques

    Directory of Open Access Journals (Sweden)

    Ms. Anjali Paliwal

    2014-08-01

    Full Text Available This paper surveys the state of the art of existing sequential rule mining algorithms. Extracting sequential rules is a very popular and computationally expensive task. We also explain the fundamentals of sequential rule mining and describe today’s approaches to it. From the broad variety of efficient algorithms that have been developed, we compare the most important ones. We systematize the algorithms and analyze them based on both their run-time performance and theoretical considerations. Their strengths and weaknesses are also investigated.

  14. Mining rare associations between biological ontologies.

    Science.gov (United States)

    Benites, Fernando; Simon, Svenja; Sapozhnikova, Elena

    2014-01-01

    The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, for these are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that produced rules represent meaningful and quite reliable associations.

  15. Mining rare associations between biological ontologies.

    Directory of Open Access Journals (Sweden)

    Fernando Benites

    Full Text Available The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, for these are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that produced rules represent meaningful and quite reliable associations.

  16. New game - new rules: mining in the democratic South Africa

    Energy Technology Data Exchange (ETDEWEB)

    Motlatsi, J. [National Union of Mineworkers (South Africa)]

    1995-12-31

    Discusses the eight areas identified by the South African Union of Mineworkers as requiring new rules to improve safety and conditions in the South African mining industry. The areas are: improved health and safety; the elimination of racism; fair wages; decent living conditions; proper training; care for workers and areas affected by the downscaling of mining; development of an economically viable mining sector; and a mining sector run in a humane and participatory manner.

  17. Comparison of New Multilevel Association Rule Algorithm with MAFIA

    OpenAIRE

    Arpna Shrivastava; Jain, R. C.; Ajay Kumar Shrivastava

    2014-01-01

    Multilevel association rules provide more precise and specific information. The Apriori algorithm is an established algorithm for finding association rules. A fast Apriori implementation is modified to develop a new algorithm for finding frequent item sets and mining multilevel association rules. MAFIA is another established algorithm for finding frequent item sets. In this paper, the performance of the new algorithm is analyzed and compared with the MAFIA algorithm.

  18. Comparison of New Multilevel Association Rule Algorithm with MAFIA

    Directory of Open Access Journals (Sweden)

    Arpna Shrivastava

    2014-10-01

    Full Text Available Multilevel association rules provide more precise and specific information. The Apriori algorithm is an established algorithm for finding association rules. A fast Apriori implementation is modified to develop a new algorithm for finding frequent item sets and mining multilevel association rules. MAFIA is another established algorithm for finding frequent item sets. In this paper, the performance of the new algorithm is analyzed and compared with the MAFIA algorithm.

  19. New Classification Method Based on Support-Significant Association Rules Algorithm

    Science.gov (United States)

    Li, Guoxin; Shi, Wen

    One of the most well-studied problems in data mining is mining for association rules. Research has also introduced association rule mining methods for classification tasks. These classification methods, based on association rule mining, can be applied to customer segmentation. Currently, most association rule mining methods are based on a support-confidence framework, in which rules that satisfy both minimum support and minimum confidence are returned to the analyst as strong association rules. However, this type of method lacks a rigorous statistical guarantee and can sometimes be misleading. This paper proposes a new classification model for customer segmentation based on an association rule mining algorithm. The model is based on the support-significance association rule mining method, in which the confidence of an association rule is replaced by its significance, a better evaluation standard for association rules. Experiments on customer segmentation data from UCI indicate the effectiveness of the new model.
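
    The contrast between a support-confidence framework and a significance-based one can be illustrated with a small Python sketch. The abstract does not say which significance test the authors use, so the chi-square test on a 2x2 contingency table below is an assumption, and the function name rule_stats is hypothetical.

```python
import math


def rule_stats(transactions, antecedent, consequent):
    """Support, confidence, chi-square and p-value for antecedent -> consequent.

    The chi-square test is an assumed stand-in for "significance";
    the paper's own measure is not given in the abstract.
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if a <= set(t) and b <= set(t))

    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0

    # 2x2 contingency table: observed counts vs counts expected under independence.
    observed = [n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab]
    expected = [
        n_a * n_b / n,
        n_a * (n - n_b) / n,
        (n - n_a) * n_b / n,
        (n - n_a) * (n - n_b) / n,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return support, confidence, chi2, p_value
```

    A rule can pass minimum support and confidence thresholds and still have a large p-value, which is the kind of misleading case the abstract refers to.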

  20. An Interactive System using Association Rule Discovery for Dyeing Processing System

    Directory of Open Access Journals (Sweden)

    Rama Sree .R.J

    2011-09-01

    Full Text Available This paper uses prior domain knowledge to guide the mining of association rules in the dyeing business process environment. This approach is used to overcome the drawbacks of data mining by rule induction, such as loss of information, discovery of too many obvious patterns, and an overwhelming number of mined association rules. An interactive rule induction algorithm is introduced to mine rules at micro levels. The mined rules describe the impact of different shades of the colours, the originator of the treatment, and treatment details on dyeing process quality and production growth. A system was built on this algorithm and was tested and verified on a real data set from the Emerald Dyeing unit, which is the leading dyeing industry in Andhra Pradesh, India. The paper thus contributes a simple interactive system, called a process model, that uses an association rule mining algorithm for the dyeing processing system.

  1. Research on the application of association rule data mining in personalized library services

    Institute of Scientific and Technical Information of China (English)

    刘志勇; 王阿利; 魏迎; 郭轶

    2012-01-01

    With the rapid development of computer, network, and modern communication technologies, data mining, a product of the rapid advance of information technology, provides technical support for the effective management of digital knowledge resources. By introducing association rule data mining technology and personalized library services, this article discusses the application of association rule data mining in digital libraries, explains why association rule mining is necessary in digital library applications, and describes its important role in improving library service quality and service levels.

  2. Recent Trends and Research Issues in Video Association Mining

    CERN Document Server

    V, Vijayakumar

    2011-01-01

    With ever-growing digital libraries and video databases, it is increasingly important to understand and mine the knowledge in video databases automatically. Discovering association rules between items in a large video database plays a considerable role in video data mining research. Based on the research and development of past years, the application of association rule mining is growing in different domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. The purpose of this paper is to provide a general framework for mining association rules from video databases. The article also presents the research issues in video association mining, followed by the recent trends.

  3. The research and implementation of an algorithm for time-series association rules mining

    Institute of Scientific and Technical Information of China (English)

    董辉; 方晓; 方跃胜

    2012-01-01

    This paper studies and analyzes association rule mining for time series and proposes a new algorithm for time-series association rule mining based on a sliding window and a specially structured time-series tree. The algorithm mines frequent time series whose support count exceeds a given threshold, providing decision support and trend prediction for users. Experimental results verify the validity and practicability of the algorithm.
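
    The sliding-window idea can be sketched in a few lines of Python. This is only an illustration: the paper's time-series tree is not described in the abstract, so a plain counter stands in for it, and the function name frequent_windows and the symbolic series are hypothetical.

```python
from collections import Counter


def frequent_windows(series, width, min_count):
    """Count every length-`width` window and keep those seen at least `min_count` times.

    A Counter replaces the paper's time-series tree, whose structure
    is not given in the abstract.
    """
    windows = (tuple(series[i:i + width]) for i in range(len(series) - width + 1))
    counts = Counter(windows)
    return {w: c for w, c in counts.items() if c >= min_count}


# Hypothetical discretized series: each symbol is a trend label.
trend = ["up", "down", "up", "down", "up", "flat"]
patterns = frequent_windows(trend, width=2, min_count=2)
```

    Here min_count plays the role of the support count threshold mentioned in the abstract.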

  4. Observational Calculi and Association Rules

    CERN Document Server

    Rauch, Jan

    2013-01-01

    Observational calculi were introduced in the 1960’s as a tool of logic of discovery. Formulas of observational calculi correspond to assertions on analysed data. Truthfulness of suitable assertions can lead to acceptance of new scientific hypotheses. The general goal was to automate the process of discovery of scientific knowledge using mathematical logic and statistics. The GUHA method for producing true formulas of observational calculi relevant to the given problem of scientific discovery was developed. Theoretically interesting and practically important results on observational calculi were achieved. Special attention was paid to formulas - couples of Boolean attributes derived from columns of the analysed data matrix. Association rules introduced in the 1990’s can be seen as a special case of such formulas. New results on logical calculi and association rules were achieved. They can be seen as a logic of association rules. This can contribute to solving contemporary challenging problems of data minin...

  5. The Application of Clustering and Association Rule Mining to Fraud Information Identification

    Institute of Scientific and Technical Information of China (English)

    幸莉仙; 黄慧连

    2012-01-01

    As electronic data expands rapidly, traditional audit approaches can no longer cope with massive volumes of business data, so this paper introduces clustering and association rule algorithms from data mining into the audit field. Building on a study of clustering and association rule mining and the related K-Means and Apriori algorithms, the article proposes an audit model based on clustering and association rules. Taking the audit of urban medical insurance in a certain city as an example, it first uses cluster analysis to filter the data and then uses association rules to mine potential relationships in the massive data, so as to determine audit priorities and provide audit clues. Through the case analysis, the article offers a reference for applying data mining to fraud information identification.

  6. SAS: Implementation of scaled association rules on spatial multidimensional quantitative dataset

    Directory of Open Access Journals (Sweden)

    M. N. Doja

    2012-09-01

    Full Text Available Mining spatial association rules is one of the most important branches in the field of Spatial Data Mining (SDM). Because of the complexity of spatial data, a traditional method of extracting spatial association rules is to transform the spatial database into a general transaction database. The Apriori algorithm is one of the most commonly used methods for mining association rules at present, but a shortcoming of the algorithm is that it performs inefficiently on large databases. The present paper proposes a new algorithm that extracts maximum frequent itemsets from a spatial multidimensional quantitative dataset. Algorithms for mining spatial association rules are similar to association rule mining except for the consideration of special data; the predicate generation and rule generation processes are based on Apriori. The proposed method, SAS (Scaled Apriori on Spatial multidimensional quantitative dataset), reduces the number of itemsets generated and also improves the execution time of the algorithm.
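
    For orientation, the baseline that SAS modifies can be sketched as a plain level-wise Apriori search in Python, followed by a filter that keeps only maximal frequent itemsets. This is not the SAS algorithm itself, whose scaling step is not described in the abstract; the function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (plain Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate of size k and keep those meeting min_support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}
```

    The prune step is where the candidate explosion on large databases is contained; mining only maximal itemsets, as the abstract describes, shrinks the output further.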

  7. The diagnostic rules of peripheral lung cancer preliminary study based on data mining technique

    Institute of Scientific and Technical Information of China (English)

    Yongqian Qiang; Youmin Guo; Xue Li; Qiuping Wang; Hao Chen; Duwu Cui

    2007-01-01

    Objective: To discuss the clinical and imaging diagnostic rules of peripheral lung cancer using data mining techniques, to explore new ideas in the diagnosis of peripheral lung cancer, and to obtain early-stage technology and knowledge support for computer-aided detection (CAD). Methods: 58 cases of peripheral lung cancer confirmed by clinical pathology were collected. The data were imported into the database after the clinical and CT finding attributes were identified and standardized. The data were studied comparatively using Association Rules (AR) from the knowledge discovery process and the Rough Set (RS) reduction algorithm and Genetic Algorithm (GA) of the generic data analysis tool ROSETTA, respectively. Results: The genetic classification algorithm of ROSETTA generates about 5,000 diagnosis rules. The RS reduction algorithm (Johnson's Algorithm) generates 51 diagnosis rules and the AR algorithm generates 123 diagnosis rules. All three data mining methods treat gender, age, cough, location, lobulation sign, shape, and ground-glass density as the main basis for the diagnosis of peripheral lung cancer. Conclusion: The diagnosis rules for peripheral lung cancer obtained with the three data mining techniques are consistent with clinical diagnostic rules, and they can also be used to build the knowledge base of an expert system. This study demonstrates the potential value of data mining technology in clinical imaging diagnosis and differential diagnosis.

  8. Sanitizing sensitive association rules using fuzzy correlation scheme

    International Nuclear Information System (INIS)

    Data mining is used to extract useful information hidden in data. Sometimes this extraction reveals sensitive information. Privacy preservation in data mining is a process of sanitizing sensitive information. This research focuses on sanitizing sensitive rules discovered in quantitative data. The proposed scheme, Privacy Preserving in Fuzzy Association Rules (PPFAR), is based on fuzzy correlation analysis. In this work, the fuzzy set concept is integrated with fuzzy correlation analysis and the Apriori algorithm to mark interesting fuzzy association rules. The identified rules are called sensitive. For sanitization, we use a modification technique in which we substitute the maximum value of fuzzy items with zero, which occurs most frequently. Experiments demonstrate that the PPFAR method hides sensitive rules with minimum modifications. The technique also maintains the modified data's quality. The PPFAR scheme has applications in various domains, e.g. temperature control, medical analysis, travel time prediction, and genetic behavior prediction. We have validated the results on a medical dataset. (author)

  9. A new incremental updating algorithm for association rules

    Institute of Scientific and Technical Information of China (English)

    WANG Zuo-cheng; XUE Li-xia

    2007-01-01

    Incremental data mining is an attractive goal for many kinds of mining in large databases or data warehouses. A new incremental updating algorithm, the rule growing algorithm (RGA), is presented for efficiently maintaining discovered association rules when new transaction data is added to a transaction database. RGA uses previously discovered association rules as seed rules. With RGA, whether the seed rules are still strong can be confirmed in most cases without scanning the whole transaction database. If the distribution of items in the transaction database is not uniform, the inflexion point of the robustness curve comes very quickly, and RGA becomes highly efficient, saving much I/O time. Experiments validate the algorithm, and the test results show that it is efficient.
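
    The general idea of re-checking a seed rule without rescanning the old database can be sketched in Python as follows. This is generic incremental count maintenance, not RGA itself, whose details are not in the abstract; the stored-count dictionary and both function names are assumptions.

```python
def update_rule(stats, new_transactions, antecedent, consequent):
    """Add counts from the new transactions only to the counts kept for a seed rule.

    `stats` holds n (transactions seen), n_a (containing the antecedent)
    and n_ab (containing antecedent and consequent) from earlier mining.
    """
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    return {
        "n": stats["n"] + len(new_transactions),
        "n_a": stats["n_a"] + sum(1 for t in new_transactions if a <= set(t)),
        "n_ab": stats["n_ab"] + sum(1 for t in new_transactions if ab <= set(t)),
    }


def is_strong(stats, min_support, min_confidence):
    """Check the usual support and confidence thresholds on the stored counts."""
    support = stats["n_ab"] / stats["n"]
    confidence = stats["n_ab"] / stats["n_a"] if stats["n_a"] else 0.0
    return support >= min_support and confidence >= min_confidence
```

    Only the increment is scanned, which is where the I/O saving mentioned in the abstract would come from; rules that were not frequent before the update need separate handling that this sketch leaves out.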

  10. Medical images data mining using classification algorithm based on association rule

    Institute of Scientific and Technical Information of China (English)

    邓薇薇; 卢延鑫

    2012-01-01

    Objective: To assist clinicians in the diagnosis and treatment of brain disease, a classifier for medical images containing tumors was constructed using association rule data mining techniques. Methods: After a pre-processing phase, the relevant features were extracted from the medical images, discretized, and placed in a transaction database as the input for association rule mining; the medical image classifier was then constructed with an improved Apriori algorithm. Results: The medical image classifier was constructed. Medical images of known type were used to train the classifier and mine the association rules that satisfy the constraint conditions. The brain tumor in a medical image of unknown type was then classified with the discovered rules to judge whether it is benign or malignant. Conclusion: A classification algorithm based on association rules can be used effectively to mine image features and to construct an image classifier that identifies benign or malignant tumors.

  11. Data Mining of Front Pages of Medical Records Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    杜军; 郭慧敏; 杜静静; 李宁; 黄路非; 杨建南

    2016-01-01

    Objectives: To find association rules among the indicators in discharged patients' information using the Apriori algorithm, and to provide a theoretical basis for hospital management and decision making. Methods: Apriori association analysis was conducted on patients discharged from a hospital in 2015 using the arules package in R, to explore association rules between discharge department and gender; among payment type, discharge department, length of stay, and total expenses; and among payment type, discharge department, and whether surgery was performed, and to analyze their causes. Results: Analysis of the front pages of the medical records of 49737 patients discharged in 2015 yielded the following rules. In the respiratory, digestive, and general surgery wards, more male than female patients were discharged, with confidences of the strong association rules of 0.621, 0.531, and 0.518. In the neurology and ophthalmology wards, more female than male patients were discharged, with confidences of 0.565 and 0.561. Health care hospitalization expenses were closely related to the duration of hospitalization, with confidences of 0.731, 0.649, 0.745, and 0.545. Whether surgical treatment was adopted was closely related to the department, with confidences of 0.951, 0.748, 0.985, 0.974, and 0.735. Conclusions: Association rule mining can uncover potential associations among different indicators and provide a basis for hospital management and policy decisions.

  12. Mining Rules from Electrical Load Time Series Data Set

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The mining of the rules from the electrical load time series data which are collected from the EMS (Energy Management System) is discussed. The data from the EMS are too huge and sophisticated to be understood and used by the power system engineer, while useful information is hidden in the electrical load data. The authors discuss the use of fuzzy linguistic summary as data mining method to induce the rules from the electrical load time series. The data preprocessing techniques are also discussed in the paper.

  13. Price Adjustment by Mining Negative Association Rules

    Institute of Scientific and Technical Information of China (English)

    黄发良; 郑小建; 张师超

    2006-01-01

    Determining product prices well is a crucial problem in market competition. A novel pricing method based on negative association rules identified from past data is proposed; it is easy to operate and easy to extend for end users. In this approach, an optimal price can be generated with two alternative strategies: a human-assisted pricing strategy and an automatic pricing strategy. In addition, an efficient algorithm for generating short negative association rules is devised. The results show that the approach is promising and efficient.

  14. Arabic Text Mining Using Rule Based Classification

    OpenAIRE

    Fadi Thabtah; Omar Gharaibeh; Rashid Al-Zubaidy

    2012-01-01

    A well-known classification problem in the domain of text mining is text classification, which concerns mapping textual documents into one or more predefined categories based on their content. Text classification has recently attracted many researchers because of the massive amounts of online documents and text archives that hold essential information for decision-making processes. In this field, most research focuses on classifying English documents while there are limited studi...

  15. Research and Application of Association Rule Mining in Software Enterprise's Customer Relationship Management

    Institute of Scientific and Technical Information of China (English)

    杨盛苑; 张向娟

    2012-01-01

    Association rule mining is one of the most active methods in data mining research. This paper uses the Apriori algorithm to mine the relationships between the customers and products of a software enterprise in order to find potential relationships among products, which guide decision makers in applying different marketing strategies to different customers and thereby achieve high-quality customer relationship management.

  16. Application and Discussion of Data Mining Model Based on Microsoft Association Rules Algorithm

    Institute of Scientific and Technical Information of China (English)

    刘城霞

    2013-01-01

    This paper studies the Microsoft association rules algorithm for data mining and its application in the financial field. The role of data mining is to find useful, latent information in massive data. By filtering and deeply mining customer account and transaction data, a business data mining example system is built that provides bank managers with better intelligent decisions and recommendations and offers consultation to ordinary customers. The client part of the system is developed with Visual Studio .NET 2008; ADOMD.NET objects are used to connect to the mining model and set up prediction targets, and Web controls display the model's results. After entering some personal attributes and basic business requirements, customers can view the payment situation, loan amount, and suitable credit card type they are interested in, and the bank can provide corresponding value-added services according to users' payment characteristics. The mining process of the association rule model is analyzed in detail throughout the construction of the example system, which promotes the practical application of data mining.

  17. Recent Trends and Research Issues in Video Association Mining

    Directory of Open Access Journals (Sweden)

    Vijayakumar.V

    2011-12-01

    Full Text Available With ever-growing digital libraries and video databases, it is increasingly important to understand and mine the knowledge in video databases automatically. Discovering association rules between items in a large video database plays a considerable role in video data mining research. Based on the research and development of past years, the application of association rule mining is growing in different domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. The purpose of this paper is to provide a general framework for mining association rules from video databases. The article also presents the research issues in video association mining, followed by the recent trends.

  18. Mining research on association rules in law of compatibility of medicines in prescriptions

    Institute of Scientific and Technical Information of China (English)

    陈安娜; 陈联源

    2013-01-01

    Prescriptions are guided by the theory of traditional Chinese medicine. Compatibility, the combination of drugs according to certain rules, is the core problem of prescription research. Association rule mining is used to analyze the patterns and rules of prescription compatibility, which deepens the understanding of its essential laws. The traditional Apriori algorithm and an improved algorithm are compared experimentally. The results indicate that the improved algorithm can provide theoretical support for reasonable compatibility and for streamlining compound prescriptions.

  19. Mining association rules based on cultured immune clone algorithm

    Institute of Scientific and Technical Information of China (English)

    杨光军

    2013-01-01

    For the association rule mining problem, a method based on a cultured immune clone algorithm is proposed. The method embeds the immune clone algorithm in the framework of the cultural algorithm and adopts a two-layer evolutionary mechanism. It uses the intelligent search ability of the immune clone algorithm and the commonly accepted knowledge formed in the belief space of the cultural algorithm to guide rule mining. The situational knowledge and history knowledge in the cultural algorithm are redefined, and a new mutation operator is designed that adaptively adjusts the mutation scale to improve the global search ability of the immune clone algorithm. Experiments show that the new algorithm is superior to the immune clone algorithm in running speed and in the accuracy of the resulting association rules.

  20. An Efficient Algorithm to Automated Discovery of Interesting Positive and Negative Association Rules

    Directory of Open Access Journals (Sweden)

    Ahmed Abdul-WahabAl-Opahi

    2015-06-01

    Association rule mining is a very efficient technique for finding strong relations between correlated data. The correlation of data makes the extraction process meaningful. For discovering frequent items and mining positive rules, a variety of algorithms are used, such as the Apriori algorithm and tree-based algorithms. But these algorithms do not consider negated occurrences of attributes, and the rules they produce do not cover infrequent itemsets. The discovery of infrequent itemsets is far more difficult than that of their counterparts, frequent itemsets. The problems include the discovery of infrequent itemsets, the generation of interesting negative association rules, and the huge number of such rules compared with positive association rules. The discovery of interesting association rules is an important and active area within data mining research. In this paper, an efficient algorithm is proposed for discovering interesting positive and negative association rules from frequent and infrequent items. The experimental results show the usefulness and effectiveness of the proposed algorithm.

  1. Remote Sensing Classification based on Improved Ant Colony Rules Mining Algorithm

    Directory of Open Access Journals (Sweden)

    Shuying Liu

    2014-09-01

    Data mining can uncover previously undetected relationships among data items using automated data analysis techniques. In data mining, association rule mining is a prevalent and well-researched method for discovering useful relations between variables in large databases. This paper investigates the principle of traditional rule mining, which produces many non-essential candidate sets when it reads data into candidate items. Particularly when dealing with massive data, if the minimum support and minimum confidence are relatively small, a combinatorial explosion of frequent itemsets occurs, and the computing power and storage space required are likely to exceed the limits of the machine. A new ant colony algorithm based on the conventional Ant-Miner algorithm is proposed and used in rule mining. The measurement formula for rule effectiveness is improved, and a pheromone concentration update strategy is also applied. The experimental results show that the proposed algorithm has lower execution time and better accuracy than the traditional algorithm.

  2. Role of Interestingness Measures in CAR Rule Ordering for Associative Classifier: An Empirical Approach

    CERN Document Server

    Kannan, S

    2010-01-01

    Associative Classifier is a novel technique which is the integration of Association Rule Mining and Classification. The difficult task in building Associative Classifier model is the selection of relevant rules from a large number of class association rules (CARs). A very popular method of ordering rules for selection is based on confidence, support and antecedent size (CSA). Other methods are based on hybrid orderings in which CSA method is combined with other measures. In the present work, we study the effect of using different interestingness measures of Association rules in CAR rule ordering and selection for associative classifier.

  3. Closed-set-based Discovery of Representative Association Rules Revisited

    CERN Document Server

    Balcázar, José L

    2010-01-01

    The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the "essential" or "representative" rules. We revisit the algorithm given by Kryszkiewicz (Int. Symp. Intelligent Data Analysis 2001, Springer-Verlag LNCS 2189, 350-359) for mining representative rules. We show that its output is sometimes incomplete, due to an oversight in its mathematical validation, and we propose an alternative complete generator that works within only slightly larger running times.

  4. RESEARCH ON ASSOCIATION RULE MINING IN BOOK SALES UNDER CLOUD COMPUTING ENVIRONMENT

    Institute of Scientific and Technical Information of China (English)

    郭健; 任永功

    2014-01-01

    With the advent of the big data era, people are now overwhelmed by massive amounts of information. The emergence of cloud computing technology provides a new idea for efficiently mining valuable information from massive data. By utilising the advantages of cloud computing in distributed processing and virtualisation, we present a distributed association rule mining algorithm (MCM-Apriori), which combines the Map/Reduce programming model with coding operations. We also design and implement an online book sales system based on the Hadoop cloud platform. To further verify the efficiency of the system, we use the MCM-Apriori algorithm to implement a book recommendation service in it. Comparative experimental results demonstrate that the system achieves fast analysis and query as well as reliable storage, and can significantly improve the efficiency of association rule mining.

  5. Using the interestingness measure lift to generate association rules

    Directory of Open Access Journals (Sweden)

    Nada Hussein

    2015-04-01

    In this digital age, organizations have to deal with huge amounts of data, sometimes called Big Data. In recent years, the volume of data has increased substantially. Consequently, finding efficient and automated techniques for discovering useful patterns and relationships in the data becomes very important. In data mining, patterns and relationships can be represented in the form of association rules. Current techniques for discovering association rules rely on measures such as support for finding frequent patterns and confidence for finding association rules. A shortcoming of confidence is that it does not capture the correlation that exists between the left-hand side (LHS) and the right-hand side (RHS) of an association rule. On the other hand, the interestingness measure lift captures such correlation in the sense that it tells us whether the LHS influences the RHS positively or negatively. Therefore, using lift instead of confidence as a criterion for discovering association rules can be more effective. It also gives the user more choices in determining the kind of association rules to be discovered. This in turn helps to narrow down the search space and consequently improves performance. In this paper, we describe a new approach for discovering association rules that is based on lift and not on confidence.
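
    The measures named in this abstract can be illustrated with a minimal sketch, assuming the standard textbook definitions: support is the fraction of transactions containing an itemset, confidence(LHS -> RHS) = support(LHS and RHS) / support(LHS), and lift = confidence / support(RHS). The function names and the toy transactions are hypothetical and are not taken from the paper; this is not the authors' discovery algorithm.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, lhs, rhs) -> float:
    """Estimated P(RHS | LHS) for the rule LHS -> RHS."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs) -> float:
    """Lift > 1: LHS raises the chance of RHS; lift < 1: LHS lowers it."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_confidence = confidence(transactions, {"bread"}, {"milk"})
rule_lift = lift(transactions, {"bread"}, {"milk"})
print(f"confidence={rule_confidence:.3f} lift={rule_lift:.3f}")
```

    In this toy data the rule bread -> milk holds in two of the three transactions that contain bread, yet its lift is below 1 because milk on its own is even more common. That is the kind of negative influence that confidence alone does not reveal.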

  6. ASSOCIATION RULE DISCOVERY FOR STUDENT PERFORMANCE PREDICTION USING METAHEURISTIC ALGORITHMS

    Directory of Open Access Journals (Sweden)

    Roghayeh Saneifar

    2015-11-01

    With the increasing use of data mining techniques to improve the operation of educational systems, Educational Data Mining has been introduced as a new and fast-growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper, a new associative classification technique is proposed to predict students' final performance. Unlike several machine learning approaches such as ANNs and SVMs, associative classifiers maintain interpretability along with high accuracy. In this research work, we have employed Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules for student performance prediction as a multi-objective classification problem. Results indicate that the proposed swarm-based algorithm outperforms well-known classification techniques on the student performance prediction problem.

  7. Image segmentation using association rule features.

    Science.gov (United States)

    Rushing, John A; Ranganath, Heggere; Hinke, Thomas H; Graves, Sara J

    2002-01-01

    A new type of texture feature based on association rules is described. Association rules have been used in applications such as market basket analysis to capture relationships present among items in large data sets. It is shown that association rules can be adapted to capture frequently occurring local structures in images. The frequency of occurrence of these structures can be used to characterize texture. Methods for segmentation of textured images based on association rule features are described. Simulation results using images consisting of man made and natural textures show that association rule features perform well compared to other widely used texture features. Association rule features are used to detect cumulus cloud fields in GOES satellite images and are found to achieve higher accuracy than other statistical texture features for this problem.

  8. Dynamic Fast Database Mining Algorithm Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    王宗江

    2007-01-01

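    The Dynamic Fast Mining Algorithm (DFMA) for association rules does not need to rescan the original database. It overcomes many weaknesses of the Apriori algorithm, the most representative method of association rule mining, such as its long running time and its inability to mine online, and it supports the needs of online and incremental mining. Using DFMA's multi-level synchronous processing and updating, together with the definition of a sensitivity index, it can be used to mine real-time information that is useful to decision makers.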

  9. Sequential association rules in atonal music

    NARCIS (Netherlands)

    A. Honingh; T. Weyde; D. Conklin

    2009-01-01

    This paper describes a preliminary study on the structure of atonal music. In the same way as sequential association rules of chords can be found in tonal music, sequential association rules of pitch class set categories can be found in atonal music. It has been noted before that certain pitch class

  10. Exploring the associated rules of traditional Chinese medicine and western medicine on urinary tract infection with text mining technique

    Institute of Scientific and Technical Information of China (English)

    陈文; 姜洋; 黄蕙莉; 孙玉香

    2013-01-01

    Objective: To explore the associated rules between western medicine and traditional Chinese medicine (TCM) on urinary tract infection (UTI) with a text mining technique. Methods: The data set on UTI was downloaded from the CBM database. The regularities of Chinese patent medicines (CPM), western medicines, and the combination of CPM and western medicines on UTI were mined by a data slicing algorithm. The results were shown visually with Cytoscape 2.8 software. Results: The main function of CPM was focused on clearing heat and removing toxicity, promoting diuresis and relieving stranguria. For western medicine, antibacterial agents were often used, and they were also frequently used together with CPM such as Sanjinpian. Conclusions: Text mining provides an important method for summarizing the regularities of medication for a disease in both TCM and western medicine.

  11. Statistical Analysis and Association Rule Mining of Application in College Students’ Mental Health

    Institute of Scientific and Technical Information of China (English)

    亓文娟; 黄书城

    2014-01-01

    To better understand the main factors affecting the mental health of college students and the relationships among psychological symptoms, this study uses the psychological test data of the 2011 cohort at a university and applies two methods, statistical analysis and association rule mining. The analysis covers gender, student-cadre status, only-child status, place of origin, family structure, and monthly family income. The research results provide a basis for planning and decision-making in college students' mental health education.

  12. A Study of Frequent Cyclic Association Rule

    Institute of Scientific and Technical Information of China (English)

    黄益民

    2000-01-01

    One of the most important data mining problems is mining association rules. In this paper, we consider the problem of finding frequent cyclic association rules. By exploiting the relationship between cycles and large itemsets, we identify optimization techniques that allow us to minimize the amount of unnecessary work performed during the data mining process. Furthermore, we demonstrate the effectiveness of these methods through a series of experiments.

  13. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  14. Fast Algorithms of Mining Probability Functional Dependency Rules in Relational Database

    Institute of Scientific and Technical Information of China (English)

    TAO Xiaopeng; ZHOU Aoying; HU Yunfa

    2000-01-01

    This paper defines a new kind of rule, the probability functional dependency rule. The degree of functional dependency can be described by this kind of rule. Five algorithms, from the simple to the complex, are presented to mine this kind of rule under different conditions. The related theorems are proved to ensure the high efficiency and correctness of these algorithms.

  15. Research on Weighted Fuzzy Association Rules

    Institute of Scientific and Technical Information of China (English)

    陆建江

    2003-01-01

    Algorithms for mining quantitative association rules consider each attribute equally, but the attributes usually have different importance. Two kinds of algorithms for mining weighted fuzzy association rules are provided with respect to two kinds of database. The first algorithm can effectively consider the importance of quantitative attributes, and considers that the importance of an association rule does not increase with the number of attributes in the rule. The second algorithm not only considers the importance of quantitative attributes, but also considers that the importance of an association rule increases with the number of attributes in the rule.

  16. Using Fuzzy Association Rules to Design E-commerce Personalized Recommendation System

    Directory of Open Access Journals (Sweden)

    Guofang Kuang

    2013-09-01

    In order to improve the efficiency of fuzzy association rule mining, the paper defines redundant fuzzy association rules and the redundancy properties of strong fuzzy association rules. Obtaining as much information as possible in the e-commerce environment, in the right form, is a prerequisite for personalized recommendation. Personalized recommendation technology is a core issue of automated e-commerce recommendation systems. Fuzzy association rule algorithms have higher complexity than ordinary association rule algorithms, and their low efficiency has become a bottleneck in practical applications. The paper presents the use of fuzzy association rules to design an e-commerce personalized recommendation system. The experimental results show that the new algorithm improves execution efficiency.

  17. On construction of partial association rules

    KAUST Repository

    Moshkov, Mikhail

    2009-01-01

    This paper is devoted to the study of approximate algorithms for minimization of partial association rule length. It is shown that under some natural assumptions on the class NP, a greedy algorithm is close to the best polynomial approximate algorithms for solving this NP-hard problem. The paper contains various bounds on the precision of the greedy algorithm, bounds on the minimal length of rules based on information obtained during the greedy algorithm's work, and results of the study of association rules for the most part of binary information systems. © 2009 Springer Berlin Heidelberg.

  18. EMCAR: Expert Multi Class Based on Association Rule

    Directory of Open Access Journals (Sweden)

    Wa'el Hadi

    2013-04-01

    Several experimental studies revealed that expert systems have been successfully applied in real-world domains such as medical diagnosis, traffic control, and many others. However, one of the major drawbacks of classic expert systems is their reliance on human domain experts, which requires time, care, experience and accuracy. This shortcoming may also result in knowledge bases that contain inconsistent or contradicting rules. To address this, we propose and develop automated methods based on a data mining approach called Associative Classification (AC) that can be easily integrated into an expert system to produce the knowledge base according to hidden correlations in the input database. The methodology employed in the proposed expert system is based on learning the rules from the database rather than having the knowledge engineer input rules obtained from the domain expert; care and accuracy as well as processing time are therefore improved. The proposed automated expert system contains a novel learning method based on AC mining that has been evaluated on Islamic textual data according to several evaluation measures, including recall, precision and classification accuracy. Furthermore, five different classification approaches, Decision trees (C4.5), KNN, SVM, MCAR and NB, and the proposed automated expert system have been tested on the Islamic data set to determine the most suitable method for classifying Arabic texts.

  19. Two Mining Based on Maximal Frequent Associated Patterns to Study the Rules of Drug Patterns

    Institute of Scientific and Technical Information of China (English)

    周忠眉

    2012-01-01

    Research on the composition rules of medicines in a prescription is one of the difficult and important topics in the study of Chinese medicine formulas. We use the all-confidence measure to mine maximal frequent all-confidence patterns. We propose a secondary mining method based on maximal frequent all-confidence patterns and mine the composition rules among drug patterns in order to find the laws governing their combination. This assists research on the composition of medicines in a prescription and offers rules for reference in clinical prescription formulation. First, we give the related definitions, such as the definition of a maximal frequent all-confidence pattern. Second, we give the secondary mining method based on maximal frequent all-confidence patterns. Finally, we perform experiments on a prescription database. The experimental results show that the secondary mining method can effectively find many composition rules among drug patterns, which helps in discovering the laws governing the combination of drug patterns.

  20. Investigation of Medication Rule in Wang Zhongqi's Medical Records by Frequent Itemset Mining and Association Rule Learning
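
    The abstract does not state the formula for all-confidence. A minimal sketch, assuming the commonly used definition all-confidence(X) = support(X) / max over items i in X of support({i}), is given here. The function name, the herb labels and the toy prescriptions are hypothetical placeholders, and the sketch does not reproduce the paper's secondary mining method.

```python
from typing import FrozenSet, Iterable, List


def all_confidence(prescriptions: List[FrozenSet[str]], pattern: Iterable[str]) -> float:
    """Support count of the pattern divided by the largest single-item count.

    Counts are used instead of fractions because the common denominator
    (the number of prescriptions) cancels out.
    """
    items = frozenset(pattern)
    pattern_count = sum(1 for p in prescriptions if items <= p)
    max_item_count = max(sum(1 for p in prescriptions if i in p) for i in items)
    return pattern_count / max_item_count


# Hypothetical prescriptions; "herb_a" etc. are placeholders, not real data.
prescriptions = [
    frozenset({"herb_a", "herb_b"}),
    frozenset({"herb_a", "herb_b", "herb_c"}),
    frozenset({"herb_a", "herb_c"}),
    frozenset({"herb_b"}),
]

score = all_confidence(prescriptions, {"herb_a", "herb_b"})
print(f"all-confidence(herb_a, herb_b) = {score:.3f}")
```

    Under this definition, a high value means that every rule that can be formed from the pattern has at least that confidence, which is why the measure is used to select strongly associated item groups.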

    Institute of Scientific and Technical Information of China (English)

    张凯; 寿志勤; 郭亚光; 马宗华; 郑日新

    2013-01-01

    Objective: To investigate the medication rules in Wang Zhongqi's Medical Records through association analysis, and to provide reference for clinical medication. Methods: Taking the medical records on "hemoptysis", "consumptive disease", and "damp-warm syndrome" as the research objects, the information structure of the records was analyzed to preprocess the original text and build a database. By integrating the Apriori association rule algorithm, the "Xin'an Traditional Chinese Medicine Clinical Guide System" was designed and implemented to visualize the data mining results and to provide a "clinical query application" function and association analysis of medication rules. Results: The association analysis showed that the common medicines for treating hemoptysis were loofah sponge, Rubia cordifolia Radicis, Cortex Moutan Radicis, and others, with loofah sponge and Rubia cordifolia Radicis as the core pair; the common medicines for treating consumptive disease were Dendrobium, Concha Ostreae, licorice, and others, with Dendrobium and Concha Ostreae as the core pair; the common medicines for treating damp-warm syndrome were Poria, Eupatorium, apricot kernel, and others, with Poria and Eupatorium as the core pair. Conclusion: Association rule analysis can be used to mine the medication rules in medical records, and this technical framework can be applied to the study of other medical texts.

  1. Study on Third Party Logistics Enterprise Marketing Decision-making Based on Associated Rule Mining Technology

    Institute of Scientific and Technical Information of China (English)

    李强

    2013-01-01

    In this paper, we first summarize logistics information mining technology and its function and introduce some relevant findings and models. We then focus on an empirical analysis of association rule mining in the marketing decision-making of third party logistics enterprises. The results indicate that, in the business operations and marketing decisions of third party logistics enterprises, it is technically feasible to apply association rules to mine business data. The business association patterns and influence factors uncovered in this mining process can help logistics enterprises formulate and optimize marketing decisions and provide a scientific basis for marketing activities and decisions.

  2. Reduction of Number of Association Rules with Inter Itemset Distance in Transaction Databases

    Directory of Open Access Journals (Sweden)

    Pankaj Kumar Deva Sarma

    2012-11-01

    Association rule discovery has been an important problem of investigation in knowledge discovery and data mining. An association rule describes associations among the sets of items which occur together in transactions of databases. The association rule mining task consists of finding the frequent itemsets and the rules in the form of conditional implications with respect to some prespecified threshold values of support and confidence. The interestingness of association rules is determined by these two measures. However, other measures of interestingness like lift and conviction are also used. But there occurs an explosive growth of discovered association rules, and many such rules are insignificant. In this paper we introduce a new measure of interestingness called Inter Itemset Distance or Spread, and implement this notion based on the approach of the Apriori algorithm with a view to reducing the number of discovered association rules in a meaningful manner. An analysis of the working of the new algorithm is done, and the results are presented and compared with the results of the conventional Apriori algorithm.

  3. MAGDM-Miner: A New Algorithm for Mining Trapezoidal Intuitionistic Fuzzy Correlation Rules

    OpenAIRE

    Robinson, John P.; Henry Amirtharaj

    2014-01-01

    In this article, the authors propose a new framework called the MAGDM-Miner, for mining correlation rules from trapezoidal intuitionistic fuzzy data efficiently. In the MAGDM-Miner, the raw data from a Multiple Attribute Group Decision Making (MAGDM) problem with trapezoidal intuitionistic fuzzy data are first pre-processed using some arithmetic aggregation operators. The aggregated data in turn are processed for efficient data selection through fuzzy correlation rule mining where the unwante...

  4. 5 CFR 5201.105 - Additional rules for Mine Safety and Health Administration employees.

    Science.gov (United States)

    2010-01-01

    ... Health Administration employees. 5201.105 Section 5201.105 Administrative Personnel DEPARTMENT OF LABOR... for Mine Safety and Health Administration employees. The rules in this section apply to employees of the Mine Safety and Health Administration (MSHA) and are in addition to §§ 5201.101, 5201.102,...

  5. The Research of Intrusion Detection System Based on Improved Apriori Algorithm of Data Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    张浩; 景凤宣; 谢晓尧

    2011-01-01

    Among the many association rule mining algorithms, Apriori is the most classic one, but it has three deficiencies: it scans the database many times, generates a large number of candidate sets, and mines frequent itemsets iteratively. This paper presents a new method in which the candidate itemsets generated by the Apriori algorithm are looked up in the database to determine whether they are frequent itemsets, thereby enhancing the efficiency of the algorithm. Finally, association rules are formed for an intrusion detection system (IDS). The experimental results show that the improved algorithm can effectively raise the efficiency of association rule mining.
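
    For context, the classic level-wise Apriori procedure that this abstract takes as its starting point can be sketched as follows. This is the textbook algorithm, not the improved variant the paper proposes; the function name `apriori` and the toy transactions are illustrative assumptions. The sketch makes the deficiencies listed in the abstract visible: each level generates candidates and then scans all transactions again to count them.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(raw_transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset with its support (textbook Apriori)."""
    transactions: List[FrozenSet[str]] = [frozenset(t) for t in raw_transactions]
    n = len(transactions)

    def support(candidate: FrozenSet[str]) -> float:
        # One full pass over the data per candidate: the repeated-scan cost.
        return sum(1 for t in transactions if candidate <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy transactions, for illustration only.
result = apriori(
    [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}],
    min_support=0.6,
)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

    With these five transactions and a minimum support of 0.6, all single items and all pairs are frequent, while the triple {a, b, c} appears in only two transactions and is rejected after the final counting pass.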

  6. Inter-transactional association rules for multi-dimensional contexts for prediction and their application to studying meteorological data

    NARCIS (Netherlands)

    Feng, Ling; Dillon, Tharam; Liu, James; Chen, P.P.

    2001-01-01

    Inter-transactional association rules, first presented in our early work [H. Lu, J. Han, L. Feng, Stock movement prediction and n-dimensional inter-transaction association rules, in: Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Seattle, Washington

  7. Re-mining association mining results through visualization, data envelopment analysis, and decision trees

    OpenAIRE

    Ertek, Gürdal; Tunç, Murat Mustafa

    2012-01-01

    Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding ...

  8. WEIGHTED ASSOCIATION RULES AND TEXT MINING-BASED AGENT REALISATION OF FINANCIAL NEWS SPREADING

    Institute of Scientific and Technical Information of China (English)

    张人上; 曲开社

    2015-01-01

    Traditional financial prediction systems rely only on quantitative data such as stock prices and market indexes, and so cannot satisfy both real-time and high-accuracy requirements. To address this, we propose an agent realisation of news spreading based on weighted association rules and text mining. First, a Chinese knowledge and information processing system is used to segment each news headline into individual Chinese words. Then, a weighted association rule (WAR) algorithm detects multiple terms that frequently appear in the same news headline, and nouns, verbs and compound terms are extracted. Finally, weights are assigned to the extracted keywords according to the stock trading financial price index on the first trading day after the news is supplied to the market, and the degree of influence of each headline on stock prices is judged from its weight value. Experiments on a news headline feature database verify the feasibility of the method for real-time release of financial news headlines. The experimental results show that, compared with several other prediction methods, the proposed method achieves higher prediction accuracy and recall.

  9. Analysis of Distributed and Adaptive Genetic Algorithm for Mining Interesting Classification Rules

    Institute of Scientific and Technical Information of China (English)

    YI Yunfei; LIN Fang; QIN Jun

    2008-01-01

    A distributed genetic algorithm can be combined with an adaptive genetic algorithm for mining interesting and comprehensible classification rules. The paper gives the method for encoding the rules and the fitness function, and designs the selection, crossover, mutation and migration operators for the resulting distributed and adaptive genetic algorithm (DAGA).

  10. Mining for associations between text and brain activation in a functional neuroimaging database

    DEFF Research Database (Denmark)

    Nielsen, Finn Arup; Hansen, Lars Kai; Balslev, Daniela

    2004-01-01

    We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...... that the statistically motivated associations are well aligned with general neuroscientific knowledge...

  11. Mining for associations between text and brain activation in a functional neuroimaging database

    DEFF Research Database (Denmark)

    Nielsen, Finn Årup; Hansen, Lars Kai; Balslev, D.

    2004-01-01

    We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...... that the statistically motivated associations are well aligned with general neuroscientific knowledge....

  12. [Rules of acupoints combination of ancient acupuncture for Xiaoke based on data mining technology].

    Science.gov (United States)

    Xu, LinLing; Xu, Tianshu; Zhang, Jianbin

    2015-08-01

    The rules of acupoints combination of ancient acupuncture for Xiaoke are mainly explored. By retrieval on ancient literature, the database of acupuncture and moxibustion for Xiaoke is established; based on the database, association analysis between acupoints and symptoms is performed. According to the association analysis in 5 databases of Xiaoke database, Xiaoke database of kidney deficiency, Xiaoke-database of dry mnouth and thirst, Xiaoke database of difficult urination, Xiaoke database of drinking addiction, the results are mainly characterized with symptom differentiation combination, distal-local combination, local combination and front-back combination, which can nourish yin and clear heat. It is believed that establishment of TCM ancient literature database and exploration of data mining technology is a potential research orientation.

  13. Analysis on Biological Information Network Multidimensional Data Mining Algorithm Based on Association Rule Mapping

    Institute of Scientific and Technical Information of China (English)

    吐尔逊江·托合提

    2015-01-01

    Algorithms applied to large-scale data mining of biological information networks have many deficiencies, such as low precision, slow running speed and high memory usage. Against this background, this paper proposes a data mining algorithm that maps and associates biological information. The algorithm not only maps associations and determines the network data sets, but also introduces relative error to improve its accuracy. By constructing associations between data sets, data within the space can be distinguished, achieving better data mining results.

  14. Proceedings of the 2010 International Mine Water Association symposium : mine water and innovative thinking

    Energy Technology Data Exchange (ETDEWEB)

    Wolkersdorfer, C. [Cape Breton Univ., Sydney, NS (Canada); Freund, A. [CBU Press, Sydney, NS (Canada)] (eds.)

    2010-07-01

    Acid mine drainage is causing pollution in many waterways and ground water tables throughout the world. Hosted by the International Mine Water Association, this symposium examined issues related to acid mine drainage and explored various water treatment and water removal technologies and mine water chemistry analysis methods. Issues concerning the remediation and monitoring of abandoned mines were explored and recent innovations in geochemistry and geological engineering were presented. Water management issues in various types of geologic formations were included. The conference themes were: mine water issues and innovative mining methods; mine water engineering; mine water treatment, active systems; mine water treatment, passive systems; mine water geothermal, geochemistry and biochemistry uses; analysis of mine water and its chemistry; underground and surface coal mining; mine closures; legal and social aspects of mine water; mine tailings; the Cape Breton Development Corporation legacy; and the concept of a zero waste mine. The symposium featured 155 presentations, of which 32 have been catalogued separately for inclusion in this database. tabs., figs.

  15. Application Research in Medicine Based on Texture Features Association Rules Mining

    Institute of Scientific and Technical Information of China (English)

    于超; 王璐; 吴琼; 裴志松

    2012-01-01

    To meet the requirements of diagnosis aided by medical images, we present a feature fusion algorithm based on the Apriori algorithm that fuses image texture features with patients' natural features from the Hospital Information System (HIS). Combined with a pruning method, an association rule base is built, and a prototype system is implemented that automatically classifies CT (Computed Tomography) images into normal and abnormal categories. Evaluation experiments on the system show that the association rule base built by the algorithm gives good results in assisting doctors' diagnoses.

  16. Mining Target-Oriented Fuzzy Correlation Rules to Optimize Telecom Service Management

    CERN Document Server

    Chueh, Hao-En

    2011-01-01

    To optimize telecom service management, it is necessary that information about telecom services is highly related to the most popular telecom service. To this end, we propose an algorithm for mining target-oriented fuzzy correlation rules. In this paper, we show that by using the fuzzy statistics analysis and the data mining technology, the target-oriented fuzzy correlation rules can be obtained from a given database. We conduct an experiment by using a sample database from a telecom service provider in Taiwan. Our work can be used to assist the telecom service provider in providing the appropriate services to the customers for better customer relationship management.

  17. Overlying strata movement rules and safety mining technology for the shallow depth seam proximity beneath a room mining goaf

    Institute of Scientific and Technical Information of China (English)

    Wang Fangtian; Zhang Cun; Zhang Xiaogang; Song Qi

    2015-01-01

    For a shallow-depth seam in close proximity beneath a room mining goaf, where the seam is exploited by longwall mining and overlain by thin bedrock and thick loose sands, many accidents are likely to occur, including roof structure instability, roof step subsidence, damage to shield supports, and face bumps triggered by large-area roof weighting, seriously threatening the safety of underground miners and equipment. This paper analyses the overlying strata movement rules for shallow seams using physical simulation, 3DEC numerical simulation and field measurements. The results show that, in shallow seam mining, the overburden movement forms a caved zone and a fractured zone, the cracks develop continuously and reach the surface as the face advances, and the development of surface cracks generally goes through four stages. With loose blasting of residual pillars, a reasonable mining height, and roof support and management, safe, efficient and high-recovery-rate mining has been achieved in the shallow-depth seam in close proximity beneath a room mining goaf.

  18. [Acupoints selection rules analysis of ancient acupuncture for urinary incontinence based on data mining technology].

    Science.gov (United States)

    Zhang, Wei; Tan, Zhigao; Cao, Juanshu; Gong, Houwu; Qin, Zuoai; Zhong, Feng; Cao, Yue; Wei, Yanrong

    2015-12-01

    Based on ancient literature of acupuncture in Canon of Chinese Medicine (4th edition), the articles regarding acupuncture for urinary incontinence were retrieved and collected to establish a database. By Weka data mining software, the multi-level association rules analysis method was applied to analyze the acupoints selection characteristics and rules of ancient acupuncture for treatment of urinary incontinence. In total, 356 articles of acupuncture for urinary incontinence were collected, involving 41 acupoints with a total frequency of 364. As a result, (1) the acupoints in the yin-meridian of hand and foot were highly valued, as the frequency of acupoints in yin-meridians was 2.6 times that in yang-meridians, and the frequency of acupoints selected was the highest in the liver meridian of foot-jueyin; (2) the acupoints in bladder meridian of foot-taiyang were also highly valued, and among three yang-meridians of foot, the frequency of acupoints in the bladder meridian of foot-taiyang was 54, accounting for 65.85% (54/82); (3) more acupoints selected were located in the lower limbs and abdomen; (4) specific acupoints in above meridians were mostly selected, accounting for 73.2% (30/41) of the acupoints and 79.4% (289/364) of the total frequency, respectively; (5) Zhongji (CV 3), the front-mu point of bladder meridian, was seldom selected in the ancient acupuncture literature, which was different from modern literature reports. The results show that urinary incontinence belongs to external genitalia diseases, which should be treated from yin, indicating more yin-meridians be used and special acupoints be focused on. It is essential to focus on inheritance and innovation in TCM clinical treatment, and applying data mining technology to ancient literature of acupuncture could provide classic theory basis for TCM clinical treatment. PMID:26964186

  20. Greedy algorithms with weights for construction of partial association rules

    KAUST Repository

    Moshkov, Mikhail

    2009-09-10

    This paper is devoted to the study of approximate algorithms for minimization of the total weight of attributes occurring in partial association rules. We consider mainly greedy algorithms with weights for construction of rules. The paper contains bounds on precision of these algorithms and bounds on the minimal weight of partial association rules based on information obtained during the greedy algorithm run.

  1. Formal and Computational Properties of the Confidence Boost of Association Rules

    CERN Document Server

    Balcázar, José L

    2011-01-01

    Some existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases of absolutely minimum size. One can push the intuition of redundancy further and find an intuitive notion of interest of an association rule, in terms of its "novelty" with respect to other rules. Namely: an irredundant rule is so because its confidence is higher than what the rest of the rules would suggest; then, one can ask: how much higher? We propose to measure such a sort of "novelty" through the confidence boost of a rule, which encompasses two previous similar notions (confidence width and rule blocking, of which the latter is closely related to the earlier measure "improvement"). Acting as a complement to confidence and support, the confidence boost helps to obtain small and crisp sets of mined association rules, and solves the well-known problem that, in certain cases, rules of negative correlation may pass the confidence bound. We analyze the properties of two version...

  2. FSRM: A Fast Algorithm for Sequential Rule Mining

    Directory of Open Access Journals (Sweden)

    Anjali Paliwal

    2014-10-01

    Recent developments in computing and automation technologies have resulted in the computerization of business and scientific applications in various areas. Turning the massive amounts of accumulated information into knowledge is attracting researchers in numerous domains, including databases, machine learning and statistics. From the viewpoint of information researchers, the emphasis is on discovering meaningful patterns hidden in massive data sets. Hence, a central issue for knowledge discovery in databases, and the main focus of this paper, is to develop economical and scalable mining algorithms as integrated tools for management systems.

  3. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    OpenAIRE

    Ujjwal Maulik; Saurav Mallik; Anirban Mukhopadhyay; Sanghamitra Bandyopadhyay

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the bio...

  4. Association and Sequence Mining in Web Usage

    Directory of Open Access Journals (Sweden)

    Claudia Elena DINUCA

    2011-06-01

    Web servers worldwide generate a vast amount of information on web users' browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volume of clickstream and user data collected by Web-based organizations in their daily operations has reached astronomical proportions. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common needs or interests. The focus of this paper is to provide an overview of how to use frequent pattern techniques for discovering different types of patterns in a Web log database. In this paper we focus on finding associations as a data mining technique to extract potentially useful knowledge from web usage data. We implemented in Java, using the NetBeans IDE, a program for identifying page associations from sessions. For exemplification, we used the log files from a commercial web site.

  5. Research on e-commerce commodity recommendation system based on mining algorithm of weighted association rules

    Institute of Scientific and Technical Information of China (English)

    郝海涛; 马元元

    2016-01-01

    To solve the problem of quickly and accurately matching commodities between online shoppers and merchants, an e-commerce commodity recommendation system based on a weighted association rule mining algorithm is studied. To address the shortcomings of the classic Apriori algorithm, a new weighted fuzzy association rule mining algorithm is put forward that preserves the downward closure of frequent itemsets. The workflow of the recommendation system was tested through the structural design of the e-commerce recommendation system, the data preprocessing module design and the recommendation module design. The hit rate is selected as the evaluation standard for the different recommendation models. A comparative analysis of the collected real-world data was conducted with five-fold cross-validation. The experimental results show that the hit rate of Top-N products from the association rule set is significantly higher than that of the interest-based recommendation method and the best-seller recommendation method.
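
    The abstract does not define its hit rate, so the following is only a minimal sketch under an assumed definition: the fraction of test users for whom at least one held-out purchase appears among the first N recommended products. The function name `hit_rate`, the dictionary layout and the product identifiers are all hypothetical.

```python
def hit_rate(recommendations, held_out, n):
    """Fraction of users with at least one held-out item in their top-n list.

    recommendations: dict user -> ranked list of product ids (assumed layout)
    held_out:        dict user -> set of product ids hidden during training
    """
    # Only users that actually have held-out items can produce a hit.
    users = [u for u in held_out if held_out[u]]
    if not users:
        return 0.0
    hits = 0
    for u in users:
        top_n = set(recommendations.get(u, [])[:n])
        if top_n & set(held_out[u]):
            hits += 1
    return hits / len(users)


# Toy data, invented purely for illustration.
recs = {"u1": ["p1", "p2", "p3"], "u2": ["p4", "p5", "p6"], "u3": ["p7"]}
held = {"u1": {"p2"}, "u2": {"p9"}, "u3": {"p7"}, "u4": {"p1"}}
```

    Under this definition a user with no recommendations at all (here `u4`) counts as a miss, which penalizes models that cannot cover every user.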

  6. Finding Exception For Association Rules Via SQL Queries

    Directory of Open Access Journals (Sweden)

    Luminita DUMITRIU

    2000-12-01

    Finding association rules is mainly based on generating larger and larger frequent set candidates, starting from frequent attributes in the database. The frequent sets can be organised as a part of a lattice of concepts according to the Formal Concept Analysis approach. Since the lattice construction is database contents-dependent, the pseudo-intents (see Formal Concept Analysis) are avoided. Association rules between concept intents (closed sets), A=>B, are partial implication rules, meaning that there is some data supporting A and (not B); fully explaining the data requires finding exceptions for the association rules. The approach applies to Oracle databases, via SQL queries.
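
    The idea of retrieving the exceptions to a rule A=>B (records that support A but not B) with plain SQL can be sketched as follows. This is an illustration only, not the paper's queries: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table `records` with 0/1 columns `a` and `b` is a hypothetical schema.

```python
import sqlite3


def rule_stats(conn, antecedent, consequent, table="records"):
    """Return (support_count, confidence, exception_ids) for antecedent => consequent.

    Table and column names are interpolated directly into the SQL text, so they
    must come from trusted code, never from user input.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {antecedent} = 1")
    n_a = cur.fetchone()[0]
    cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {antecedent} = 1 AND {consequent} = 1"
    )
    n_ab = cur.fetchone()[0]
    # Exceptions: rows that satisfy A but not B.
    cur.execute(
        f"SELECT id FROM {table} "
        f"WHERE {antecedent} = 1 AND {consequent} = 0 ORDER BY id"
    )
    exceptions = [row[0] for row in cur.fetchall()]
    confidence = n_ab / n_a if n_a else 0.0
    return n_ab, confidence, exceptions


def demo():
    # In-memory toy database; the rows are invented for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.executemany(
        "INSERT INTO records (id, a, b) VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 0, 1), (5, 1, 1)],
    )
    return rule_stats(conn, "a", "b")
```

    In the toy table, four rows support A and three of them also support B, so the rule holds with confidence 0.75 and row 3 is its single exception.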

  7. Generalization-based discovery of spatial association rules with linguistic cloud models

    Institute of Scientific and Technical Information of China (English)

    杨斌; 田永青; 朱仲英

    2004-01-01

    Extraction of interesting and general spatial association rules from large spatial databases is an important task in the development of spatial database systems. In this paper, we investigate the generalization-based knowledge discovery mechanism that integrates attribute-oriented induction on nonspatial data and spatial merging and generalization on spatial data. Furthermore, we present linguistic cloud models for knowledge representation and uncertainty handling to enhance current generalization-based method. With these models, spatial and nonspatial attribute values are well generalized at higher-concept levels, allowing discovery of strong spatial association rules. Combining the cloud model based generalization method with Apriori algorithm for mining association rules from a spatial database shows the benefits in effectiveness and flexibility.

  8. Mining tree-query associations in graphs

    CERN Document Server

    Hoekx, Eveline

    2010-01-01

    New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.

  9. Mining Compatibility Rules from Irregular Chinese Traditional Medicine Database by Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper aims to mine knowledge and rules on the compatibility of drugs from prescriptions for curing arrhythmia in a traditional Chinese medicine database using the Apriori algorithm. For data preparation, 1113 prescriptions for arrhythmia, including 535 herbs (10884 herb occurrences in total), were collected into the database. The prescription data were preprocessed through redundancy reduction, normalized storage, and knowledge induction according to the pretreatment demands of data mining. Then the Apriori algorithm was used to analyze the data and form the related technical rules and treatment procedures. The experimental result on the compatibility of drugs for curing arrhythmia shows that the prescription compatibility obtained by the Apriori algorithm generally accords with the basic law of traditional Chinese medicine for arrhythmia. Some previously unreported special compatibilities were also discovered in the experiment, which may be used as the basis for developing new prescriptions for arrhythmia.
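
    As a rough illustration of how Apriori finds frequently co-occurring items in prescription-like data, here is a minimal textbook-style sketch. It is not the authors' implementation; the function name `apriori`, the absolute-count threshold and the placeholder herb names are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Placeholder prescriptions; the herb names are hypothetical, not real data.
prescriptions = [
    {"herb_a", "herb_b", "herb_c"},
    {"herb_a", "herb_b"},
    {"herb_a", "herb_c"},
    {"herb_b", "herb_c"},
    {"herb_a", "herb_b", "herb_c"},
]
frequent_sets = apriori(prescriptions, min_support=3)
```

    With a threshold of 3, every single herb and every pair is frequent, while the triple appears in only two prescriptions and is dropped. A rule such as herb_a => herb_b would then be scored by dividing the pair's count by the count of herb_a.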

  10. The role of semantics in mining frequent patterns from knowledge bases in description logics with rules

    CERN Document Server

    Jozefowska, Joanna; Lukaszewski, Tomasz

    2010-01-01

    We propose a new method for mining frequent patterns in a language that combines both Semantic Web ontologies and rules. In particular we consider the setting of using a language that combines description logics with DL-safe rules. This setting is important for the practical application of data mining to the Semantic Web. We focus on the relation of the semantics of the representation formalism to the task of frequent pattern discovery, and for the core of our method, we propose an algorithm that exploits the semantics of the combined knowledge base. We have developed a proof-of-concept data mining implementation of this. Using this we have empirically shown that using the combined knowledge base to perform semantic tests can make data mining faster by pruning useless candidate patterns before their evaluation. We have also shown that the quality of the set of patterns produced may be improved: the patterns are more compact, and there are fewer patterns. We conclude that exploiting the semantics of a chosen r...

  11. AN EVALUATION APPROACH FOR THE PROGRAM OF ASSOCIATION RULES ALGORITHM BASED ON METAMORPHIC RELATIONS

    Institute of Scientific and Technical Information of China (English)

    Zhang Jing; Hu Xuegang; Zhang Bin

    2011-01-01

    As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving increasing attention. However, because of the 'oracle' problem, traditional testing methods are not easily applied to programs in the data mining field. In this paper, a software testing method based on metamorphic testing is proposed for the data mining field; it takes an association rules algorithm as a specific case and constructs metamorphic relations on the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
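
    The abstract does not list its metamorphic relations, so the sketch below uses two that are commonly assumed for frequent itemset miners: permuting the transactions must not change the output, and duplicating every transaction while doubling the count threshold must double every support count. The brute-force `frequent_itemsets` function is a hypothetical stand-in for the program under test.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Stand-in program under test: brute-force itemset -> support count."""
    items = sorted(set().union(*transactions))
    out = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = sum(1 for t in transactions if s <= t)
            if c >= min_count:
                out[s] = c
    return out


def check_permutation(transactions, min_count, seed=0):
    """Relation 1: shuffling the transaction order leaves the result unchanged."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return frequent_itemsets(shuffled, min_count) == frequent_itemsets(
        transactions, min_count
    )


def check_duplication(transactions, min_count):
    """Relation 2: doubling the data and the threshold doubles every count."""
    base = frequent_itemsets(transactions, min_count)
    doubled = frequent_itemsets(list(transactions) * 2, 2 * min_count)
    return doubled == {s: 2 * c for s, c in base.items()}


# Toy transactions, invented for illustration.
data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
```

    Neither check needs to know the correct output for `data`; each only compares two runs of the same program, which is how metamorphic testing sidesteps the oracle problem.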

  12. Rule Based System for Enhancing Recall for Feature Mining from Short Sentences in Customer Review Documents

    Directory of Open Access Journals (Sweden)

    Tanvir Ahmad

    2012-06-01

    This paper discovers rules for enhancing the recall of opinion-bearing sentences in customer review documents. It does so by mining features and opinions from different blogs, news sites and review sites. With the advent of numerous websites that post online reviews and opinions, there has been exponential growth in user-generated content. Since almost all of this content is stored in unstructured or semi-structured format, mining features and opinions from it has become a challenging task. The paper extracts features, and thereby opinion sentences, using semantic and linguistic analysis of text documents. The polarity of the extracted opinions is established using numeric scores obtained through SentiWordNet. The system shows that the rules discovered earlier are not sufficient to improve recall, because some opinions are expressed in sentences that are not linguistically correct yet still convey the writer's main opinion about a particular product. Our experiment uses a method that first identifies short sentences and then applies rules to those sentences so that recall is enhanced. The paper also applies rules to sentences that are linguistically and syntactically incorrect. The efficacy of the system is established through experiments on customer reviews of four different models of digital camera, and the iPhone.

  13. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Science.gov (United States)

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  14. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post

  15. An Object Extraction Model Using Association Rules and Dependence Analysis

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Extracting objects from legacy systems is a basic step in a system's object-orientation to improve the maintainability and understandability of the systems. A new object extraction model using association rules and dependence analysis is proposed. In this model data are classified by association rules and the corresponding operations are partitioned by dependence analysis.

  16. Research of the Application on Customer Analysis of an Association Rules Mining Method

    Institute of Scientific and Technical Information of China (English)

    沈元怿

    2005-01-01

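    Data mining is a thriving research frontier of database systems and database applications. As one of the association rule mining algorithms in data mining, the Apriori algorithm is the most influential algorithm for mining frequent itemsets for Boolean association rules. This paper mainly discusses the implementation details of the Apriori algorithm and how it is implemented in the telecommunications industry and, through analysis of real data, offers suggestions for increasing telecom service volume.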

  17. MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RULE DISCOVERY

    Directory of Open Access Journals (Sweden)

    Irene Kahvazadeh

    2015-11-01

    Extracting association rules from numeric features involves searching a very large search space. To deal with this problem, this paper uses a meta-heuristic algorithm that we have called MOCANAR. MOCANAR is a Pareto-based multi-objective cuckoo search algorithm which extracts high-quality association rules from numeric datasets. Support, confidence, interestingness and comprehensibility are the objectives considered in MOCANAR. MOCANAR extracts rules incrementally: in each run of the algorithm, a small number of high-quality rules are produced. In this paper, a comprehensive taxonomy of metaheuristic algorithms is presented. Using this taxonomy, we decided to use a Cuckoo Search algorithm because it is one of the most mature algorithms and is simple to use and easy to comprehend. In addition, to our knowledge, this method has not previously been used as a multi-objective algorithm or in the association rule mining area. To demonstrate the merit and associated benefits of the proposed methodology, it has been applied to a number of datasets, and high-quality results in terms of the objectives were obtained.

  18. Fuzzy association rules for biological data analysis: A case study on yeast

    Directory of Open Access Journals (Sweden)

    Cano Carlos

    2008-02-01

    Background: The mapping of diverse genomes in recent years has generated huge amounts of biological data, which are currently dispersed across many databases. Integration of the information available in the various databases is required to unveil possible associations relating already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modelling imprecise data, while association rules are very appropriate for integrating heterogeneous data. Results: In this work we propose a novel fuzzy methodology based on a fuzzy association rule mining method for biological knowledge extraction. We apply this methodology to a yeast genome dataset containing heterogeneous information regarding structural and functional genome features. A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones. Conclusion: An integrative approach such as the one carried out in this work can unveil significant knowledge which is currently hidden and dispersed across the existing biological databases. It is shown that fuzzy association rules can model this knowledge in an intuitive way by using linguistic labels and a few easily understandable parameters.

  19. Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

    Institute of Scientific and Technical Information of China (English)

    Cai-Yan Jia; Xie-Ping Gao

    2005-01-01

    One obstacle to efficient association rule mining is the explosive growth of data sets, since it is costly or impossible to scan large databases, especially multiple times. A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to effectively define and efficiently estimate the degree of error in the algorithm's outcome, and how to determine the sample size needed, have remained open research questions. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate sample error. Then a new adaptive, on-line, fast sampling strategy, multi-scaling sampling, is presented. Inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, it quickly obtains acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study have shown that the sampling strategy can achieve a very good speed-accuracy trade-off.
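
    To give a feel for the sample-size question raised here, the sketch below applies the standard Hoeffding inequality to the support of a single itemset estimated from independent random draws. It is a textbook bound chosen for illustration, not the PAC-based estimator or the multi-scaling strategy of the paper; the function name and the example values of epsilon and delta are assumptions:

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    With n independent random transactions, the sampled support of one
    itemset then differs from its true support by more than epsilon
    with probability at most delta.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Example: support error of at most 0.01 with 95% confidence.
n = hoeffding_sample_size(epsilon=0.01, delta=0.05)
```

    The required size depends only on the error tolerance and confidence level, not on the database size, which is why sampling becomes attractive for very large databases.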

  20. Association Rule Extraction from XML Stream Data for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Juryon Paik

    2014-07-01

    As wireless sensor networks (WSNs) advance, they yield massive volumes of disparate, dynamic, geographically distributed and heterogeneous data. The data mining community has attempted to extract knowledge from the huge amount of data that they generate. However, previous mining work in WSNs has focused on supporting simple relational data structures, like one table per network, while there is a need for more complex data structures. This deficiency motivates the use of XML, the current de facto format for the data exchange and modeling of a wide variety of data sources over the web, in WSNs in order to encourage the interchangeability of heterogeneous types of sensors and systems. However, mining XML data for WSNs raises two challenging issues: one is the endless data flow, and the other is the complex tree structure. In this paper, we present several new definitions and techniques related to association rule mining over XML data streams in WSNs. To the best of our knowledge, this work provides the first approach to mining XML stream data that generates frequent tree items without any redundancy.

  1. A spatiotemporal mining framework for abnormal association patterns in marine environments with a time series of remote sensing images

    Science.gov (United States)

    Xue, Cunjin; Song, Wanjiao; Qin, Lijuan; Dong, Qing; Wen, Xiaoyang

    2015-06-01

    A spatiotemporal mining framework is a novel tool for the analysis of marine association patterns using multiple remote sensing images. From data pretreatment, to algorithm design, to association rule mining and pattern visualization, this paper outlines a spatiotemporal mining framework for abnormal association patterns in marine environments, including pixel-based and object-based mining models. Within this framework, some key issues are also addressed. In the data pretreatment phase, we propose an algorithm for extracting abnormal objects or pixels over marine surfaces, and construct a mining transaction table with object-based and pixel-based strategies. In the mining algorithm phase, a recursion method to construct a direct association pattern tree is addressed with an asymmetric mutual information table, and a recursive mining algorithm to find frequent items. In the knowledge visualization phase, a "Dimension-Attributes" visualization framework is used to display spatiotemporal association patterns. Finally, spatiotemporal association patterns for marine environmental parameters in the Pacific Ocean are identified, and the results prove the effectiveness and the efficiency of the proposed mining framework.

  2. Analysis 320 coal mine accidents using structural equation modeling with unsafe conditions of the rules and regulations as exogenous variables.

    Science.gov (United States)

    Zhang, Yingyu; Shao, Wei; Zhang, Mengjia; Li, Hejun; Yin, Shijiu; Xu, Yingjun

    2016-07-01

    Mining has been historically considered as a naturally high-risk industry worldwide. Deaths caused by coal mine accidents are more than the sum of all other accidents in China. Statistics of 320 coal mine accidents in Shandong province show that all accidents contain indicators of "unsafe conditions of the rules and regulations" with a frequency of 1590, accounting for 74.3% of the total frequency of 2140. "Unsafe behaviors of the operator" is another important contributory factor, which mainly includes "operator error" and "venturing into dangerous places." A systems analysis approach was applied by using structural equation modeling (SEM) to examine the interactions between the contributory factors of coal mine accidents. The analysis of results leads to three conclusions. (i) "Unsafe conditions of the rules and regulations," affect the "unsafe behaviors of the operator," "unsafe conditions of the equipment," and "unsafe conditions of the environment." (ii) The three influencing factors of coal mine accidents (with the frequency of effect relation in descending order) are "lack of safety education and training," "rules and regulations of safety production responsibility," and "rules and regulations of supervision and inspection." (iii) The three influenced factors (with the frequency in descending order) of coal mine accidents are "venturing into dangerous places," "poor workplace environment," and "operator error." PMID:27085591

  4. [Analysis on medication rules of state medical master Yan Zhenghua from prescriptions with citri reticulatae pericarpium based on data mining].

    Science.gov (United States)

    Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Bing; Zhang, Xiao-Meng; Yang, Bing; Sheng, Xiao-Guang

    2014-02-01

    Prescriptions containing Citri Reticulatae Pericarpium written by Professor Yan were collected to build a database based on the traditional Chinese medicine (TCM) inheritance assist system. Data mining methods such as the Apriori algorithm were then used to obtain the frequency of single drugs, the frequency of drug combinations, the association rules between drugs, and the core drug combinations. Analysis of 1 027 prescriptions containing Citri Reticulatae Pericarpium showed that these prescriptions were commonly used to treat stomach aches, cough and other syndromes. The most frequent drug combinations were "Citri Reticulatae Pericarpium-Poria", "Paeoniae Radix Rubra-Citri Reticulatae Pericarpium" and so on. The drug association rules with a confidence of 1 were "Glycyrrhizae Radix et Rhizoma --> Citri Reticulatae Pericarpium", "Paeoniae Alba Radix-Cyperi Rhizoma --> Citri Reticulatae Pericarpium", "Poria --> Citri Reticulatae Pericarpium", and so on. The drugs in these prescriptions mostly had the effects of regulating the flow of Qi and invigorating blood circulation, which reflects the clear reasoning behind Professor Yan's prescriptions.

  5. Research on spatial state conversion rule mining and stochastic predicting based on CA

    Science.gov (United States)

    Li, Xinyun; Kong, Xiangqiang

    2007-06-01

    Spatial dynamic prediction in GIS is the spatial calculation process that infers future thematic maps from historical thematic maps; it is a space-time calculation from map to map. Spatial dynamic prediction has great application value in land planning, urban land-use planning and town planning, but current methods and techniques remain imperfect. The main technical difficulty is the mining and expression of spatial state conversion rules. To address the deficiencies of spatial dynamic prediction using CA, a method that mines spatial state conversion rules through spatial data mining is put forward. A stochastic simulation mechanism is introduced into the prediction calculation based on the state conversion rules. As a result, the prediction is more rational and the relation between the prediction steps and the time course is clearer. The method was applied to predicting the change in the spatial structure of urban land use in Jinan. Urban land-use maps for 2006 and 2010 were predicted from the land-use maps of 1998 and 2002. Analysis showed the test results to be rational.

  6. NIA2: A fast indirect association mining algorithm

    Institute of Scientific and Technical Information of China (English)

    NI Min; XU Xiao-fei; DENG Sheng-chun; WEN Xiao-xian

    2005-01-01

    Indirect association is a high-level relationship between items and frequent item sets in data. There are many potential applications for indirect associations, such as database marketing, intelligent data analysis, web-log analysis and recommender systems. Existing indirect association mining algorithms are mostly based on post-processing the discovered frequent item sets: all frequent item sets are generated first, and they are then filtered and joined to form indirect associations. We have previously presented an indirect association mining algorithm (NIA) based on the anti-monotonicity of indirect associations, whereby k candidate indirect associations can be generated directly from k-1 candidate indirect associations without generating all frequent item sets. We also use the frequent itempair support matrix to reduce the time and memory needed by the algorithm. In this paper, a novel algorithm (NIA2) is introduced that generates indirect association patterns between itempairs through one-item mediator sets from the frequent itempair support matrix. A notion of mediator set support threshold is also presented. NIA2 mines indirect association patterns directly from the dataset, without generating all frequent item sets. The frequent itempair support matrix and the use of tm as the support threshold for mediator sets can significantly reduce the cost of join operations and the search process compared with existing algorithms. Experiments on a real-world web log dataset show NIA2 to be one order of magnitude faster than existing algorithms.
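
    The idea of finding indirectly associated itempairs through a single mediator item can be sketched as follows. This is a simplified illustration and not the NIA2 algorithm: it omits the dependence measure usually required between each item and the mediator, and the threshold names ts (itempair support) and tm (mediator support), the function names and the toy web-log sessions are assumptions:

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def pair_support(transactions: List[Set[str]]) -> Dict[FrozenSet[str], float]:
    """Itempair support matrix, stored sparsely as {pair: fraction of transactions}."""
    counts: Counter = Counter()
    for transaction in transactions:
        for a, b in combinations(sorted(transaction), 2):
            counts[frozenset((a, b))] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


def indirect_pairs(
    transactions: List[Set[str]], ts: float, tm: float
) -> List[Tuple[str, str, str]]:
    """Return (a, b, mediator) where a and b rarely co-occur (support < ts)
    but each co-occurs with the mediator in at least a fraction tm of transactions."""
    support = pair_support(transactions)
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        if support.get(frozenset((a, b)), 0.0) >= ts:
            continue  # a and b are directly associated
        for m in items:
            if m in (a, b):
                continue
            if (support.get(frozenset((a, m)), 0.0) >= tm
                    and support.get(frozenset((b, m)), 0.0) >= tm):
                found.append((a, b, m))
    return found


# Hypothetical web-log sessions: "news" and "shop" never co-occur,
# but both are frequently visited together with "home".
sessions = [{"home", "news"}, {"home", "news"}, {"home", "shop"}, {"home", "shop"}]
patterns = indirect_pairs(sessions, ts=0.25, tm=0.5)
```

    Only pair supports are consulted, so no larger frequent item sets need to be generated, which is the property the abstract highlights.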

  7. Design and implementation of data mining tools

    CERN Document Server

    Thuraisingham, Bhavani; Awad, Mamoun

    2009-01-01

    DATA MINING TECHNIQUES AND APPLICATIONS: Introduction; Trends; Data Mining Techniques and Applications; Data Mining for Cyber Security: Intrusion Detection; Data Mining for Web: Web Page Surfing Prediction; Data Mining for Multimedia: Image Classification; Organization of This Book; Next Steps; Data Mining Techniques; Introduction; Overview of Data Mining Tasks and Techniques; Artificial Neural Networks; Support Vector Machines; Markov Model; Association Rule Mining (ARM); Multiclass Problem; Image Mining; Summary; Data Mining Applications; Introduction; Intrusion Detection; Web Page Surfing Prediction; Image Classification; Summary; DATA MI

  8. Data Mining for Gene Networks Relevant to Poor Prognosis in Lung Cancer via Backward-Chaining Rule Induction

    Directory of Open Access Journals (Sweden)

    Zhihua Chen

    2007-01-01

    We use Backward Chaining Rule Induction (BCRI), a novel data mining method for hypothesizing causative mechanisms, to mine lung cancer gene expression array data for mechanisms that could impact survival. Initially, a supervised learning system is used to generate a prediction model in the form of "IF-THEN" style rules. Next, each antecedent (i.e., an IF condition) of a previously discovered rule becomes the outcome class for subsequent application of supervised rule induction. This step is repeated until a termination condition is satisfied. "Chains" of rules are created by working backward from an initial condition (e.g., survival status). Through this iterative process of "backward chaining," BCRI searches for rules that describe plausible gene interactions for subsequent validation. Thus, BCRI is a semi-supervised approach that constrains the search through the vast space of plausible causal mechanisms by using a top-level outcome to kick-start the process. We demonstrate the general BCRI task sequence, how to implement it, the validation process, and how BCRI rules discovered from lung cancer microarray data can be combined with prior knowledge to generate hypotheses about functional genomics.

  9. Re-mining item associations: methodology and a case study in apparel retailing

    OpenAIRE

    Demiriz, Ayhan; Ertek, Gürdal; Ertek, Gurdal; Atan, Tankut; Kula, Ufuk

    2011-01-01

    Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negat...

  10. NIOSH comments to DOL on the Mine Safety and Health Administration proposed rule on safety standards for underground coal mine ventilation by R. W. Niemeier, August 19, 1988

    Energy Technology Data Exchange (ETDEWEB)

    Niemeier, R.W.

    1988-08-19

    The testimony concerns the Mine Safety and Health Administration's (MSHA's) proposed rule on safety standards for underground coal mine ventilation. These post hearing comments are given in support of the comments and testimony previously submitted by NIOSH. These comments specifically concern opposition to prohibiting flame safety lamps and additional evidence and comments in support of NIOSH's previously stated opposition to the use of belt haulage entries as intake air courses. The flame safety lamp offers a number of ergonomic and psychological advantages over presently available continuous reading methane and oxygen monitors. It has a long history of reliable performance in the mines. A flame safety lamp can become an explosion hazard if improperly maintained or carelessly operated. The effect of ventilation rate on flame propagation and fire intensity of conveyor belts and dust dispersion are discussed in relationship to the use of belt haulage entries as intake air courses.

  11. CONTENT BASED MEDICAL IMAGE RETRIEVAL USING BINARY ASSOCIATION RULES

    OpenAIRE

    Akila; Uma Maheswari

    2013-01-01

    In this study, we propose a content-based medical image retrieval framework based on binary association rules to augment the results of medical image diagnosis, for supporting clinical decision making. Specifically, this work is employed on scanned Magnetic Resonance brain Images (MRI) and the proposed Content Based Image Retrieval (CBIR) process is for enhancing relevancy rate of retrieved images. The pertinent features of a query brain image are extracted by applying third order moment inva...

  12. Compact Tree for Associative Classification of Data Stream Mining

    Directory of Open Access Journals (Sweden)

    K.Prasanna Lakshmi

    2012-03-01

    Data streams have recently emerged to address the problems of continuous data. Mining data streams is the process of extracting knowledge structures from continuous, rapid data records [1]. An important goal in data stream mining is the generation of a compact representation of the data, which helps reduce the time and space needed for further decision making. In this paper we propose a new scheme called Prefix Stream Tree (PST) for associative classification, which helps in the compact storage of data streams. The PSTree is generated in a single scan, and it efficiently discovers the exact set of patterns from data streams using a sliding window.

  13. Collaborative Data Mining Tool for Education

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; Gea, Miguel; de Castro, Carlos

    2009-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the continuous improvement of e-learning courses, allowing teachers with similar course profiles to share and score the discovered information. This mining tool is intended for use by instructors who are not experts in data mining such that, its…

  14. Discovery of Web Topic-Specific Association Rules

    Institute of Scientific and Technical Information of China (English)

    杨沛; 郑启伦; 彭宏

    2003-01-01

    The topology of topic-specific websites holds rich hidden information for data mining. A new topic-specific association rule mining algorithm is proposed to further the research in this area. The key idea is to analyze the frequent hyperlink relations between pages of different topics. In a topic-specific area, if pages of one topic are frequently hyperlinked by pages of another topic, we consider the two topics relevant. Likewise, if pages of two different topics are frequently hyperlinked together by pages of a third topic, we consider the two topics relevant. Initial experiments show that this algorithm performs quite well in guiding a topic-specific crawling agent and that it can be applied to further discovery and mining on topic-specific websites.

  15. Application of rule-based data mining techniques to real time ATLAS Grid job monitoring data

    International Nuclear Information System (INIS)

    The Job Execution Monitor (JEM) is a job-centric grid job monitoring software developed at the University of Wuppertal and integrated into the pilot-based PanDA job brokerage system leveraging physics analysis and Monte Carlo event production for the ATLAS experiment on the Worldwide LHC Computing Grid (WLCG). With JEM, job progress and grid worker node health can be supervised in real time by users, site admins and shift personnel. Imminent error conditions can be detected early and countermeasures can be initiated by the job's owner immediately. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job and Grid worker node misbehavior. Shifters can use the same aggregated data to quickly react to site error conditions and broken production tasks. In this work, the application of novel data-centric rule-based methods and data-mining techniques to the real-time monitoring data is discussed. The usage of such automatic inference techniques on monitoring data to provide job and site health summary information to users and admins is presented. Finally, the provision of a secure real-time control and steering channel to the job as an extension of the presented monitoring software is considered, and a possible model of such a control method is presented.

  16. Application of rule-based data mining techniques to real time ATLAS Grid job monitoring data

    CERN Document Server

    Ahrens, R; The ATLAS collaboration; Kalinin, S; Maettig, P; Sandhoff, M; dos Santos, T; Volkmer, F

    2012-01-01

    The Job Execution Monitor (JEM) is a job-centric grid job monitoring software developed at the University of Wuppertal and integrated into the pilot-based “PanDA” job brokerage system leveraging physics analysis and Monte Carlo event production for the ATLAS experiment on the Worldwide LHC Computing Grid (WLCG). With JEM, job progress and grid worker node health can be supervised in real time by users, site admins and shift personnel. Imminent error conditions can be detected early and countermeasures can be initiated by the job’s owner immediately. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job and Grid worker node misbehaviour. Shifters can use the same aggregated data to quickly react to site error conditions and broken production tasks. In this work, the application of novel data-centric rule-based methods and data-mining techniques to the real-time monitoring data is discussed. The usage of such automatic inference techniques on monitorin...

  17. A guided search genetic algorithm using mined rules for optimal affective product design

    Science.gov (United States)

    Fung, Chris K. Y.; Kwong, C. K.; Chan, Kit Yan; Jiang, H.

    2014-08-01

    Affective design is an important aspect of new product development, especially for consumer products, to achieve a competitive edge in the marketplace. It can help companies to develop new products that can better satisfy the emotional needs of customers. However, product designers usually encounter difficulties in determining the optimal settings of the design attributes for affective design. In this article, a novel guided search genetic algorithm (GA) approach is proposed to determine the optimal design attribute settings for affective design. The optimization model formulated based on the proposed approach applied constraints and guided search operators, which were formulated based on mined rules, to guide the GA search and to achieve desirable solutions. A case study on the affective design of mobile phones was conducted to illustrate the proposed approach and validate its effectiveness. Validation tests were conducted, and the results show that the guided search GA approach outperforms the GA approach without the guided search strategy in terms of GA convergence and computational time. In addition, the guided search optimization model is capable of improving GA to generate good solutions for affective design.

  18. Study on the change rule of groundwater level and its impacts on vegetation at arid mining area

    Institute of Scientific and Technical Information of China (English)

    LEI Shao-gang; BIAN Zheng-fu; ZHANG Ri-chen; LI Lin

    2007-01-01

    The shallow groundwater in the Shendong mining area has been disturbed by large-scale underground mining activities. Taking working face 32201 as the research area, the change rule of groundwater level and aquifer thickness under mining impact was analyzed using a large number of water level observations. The impacts of groundwater level change on vegetation were then analyzed using the theory of the relationship between groundwater and vegetation in arid areas. The results show that the aquifer structure and the conditions of water supply, flow and drainage were changed by the water proof mining. Within two years, the groundwater level recovered only slightly relative to the original level. However, the large change in groundwater level did not have a notable influence on vegetation in this mining area, and further study indicates that groundwater level change affects vegetation only under certain conditions. When evaluating the influence of groundwater level change, the plant ecological water level, the warning water level and the spatial distribution of both the original groundwater level and the mining-induced groundwater level change should be considered together.

  19. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal

  20. Finding Influential Users in Social Media Using Association Rule Learning

    Directory of Open Access Journals (Sweden)

    Fredrik Erlandsson

    2016-04-01

    Influential users play an important role in online social networks since users tend to have an impact on one another. Therefore, the proposed work analyzes users and their behavior in order to identify influential users and predict user participation. Normally, the success of a social media site is dependent on the activity level of the participating users. For both online social networking sites and individual users, it is of interest to find out if a topic will be interesting or not. In this article, we propose association learning to detect relationships between users. In order to verify the findings, several experiments were executed based on social network analysis, in which the most influential users identified from association rule learning were compared to the results from Degree Centrality and Page Rank Centrality. The results clearly indicate that it is possible to identify the most influential users using association rule learning. In addition, the results also indicate a lower execution time compared to state-of-the-art methods.

  1. Finding Influential Users in Social Media Using Association Rule Learning

    Science.gov (United States)

    Erlandsson, Fredrik; Bródka, Piotr; Borg, Anton; Johnson, Henric

    2016-04-01

    Influential users play an important role in online social networks since users tend to have an impact on one another. Therefore, the proposed work analyzes users and their behavior in order to identify influential users and predict user participation. Normally, the success of a social media site is dependent on the activity level of the participating users. For both online social networking sites and individual users, it is of interest to find out if a topic will be interesting or not. In this article, we propose association learning to detect relationships between users. In order to verify the findings, several experiments were executed based on social network analysis, in which the most influential users identified from association rule learning were compared to the results from Degree Centrality and Page Rank Centrality. The results clearly indicate that it is possible to identify the most influential users using association rule learning. In addition, the results also indicate a lower execution time compared to state-of-the-art methods.

  2. Respirable quartz hazard associated with coal mine roof bolter dust

    Energy Technology Data Exchange (ETDEWEB)

    Joy, G.J.; Beck, T.W.; Listak, J.M. [National Inst. for Occupational Safety and Health, Pittsburgh, PQ (United States)

    2010-07-01

    Pneumoconiosis has been reported to be increasing among underground coal miners in the Southern Appalachian Region. The National Institute for Occupational Safety and Health conducted a study to examine the particle size distribution and quartz content of dust generated by the installation of roof bolts in mines. Forty-six bulk samples of roof bolting machine pre-cleaner cyclone dump dust and collector box dust were collected from 26 underground coal mines. Real-time and integrated airborne respirable dust concentrations were measured on 3 mining sections in 2 mines. The real-time airborne dust concentration profiles were examined to identify any concentration changes that might be associated with pre-cleaner cyclone dust discharge events. The study showed that bolter dust is a potential inhalation hazard due to the fraction of dust less than 10 μm in size, and the quartz content of the dust. The pre-cleaner cyclone dust was significantly larger than the collector box dust, indicating that the pre-cleaner functioned properly in removing the larger dust size fraction from the airstream. However, the pre-cleaner dust still contained a substantial amount of respirable dust. It was concluded that in order to maintain the effectiveness of a roof bolter dust collector, periodic removal of dust is required. Appropriate work procedures and equipment are necessary to minimize exposure during this cleaning task. 13 refs., 3 tabs., 2 figs.

  3. Mining φ-Frequent Itemset Using FP-Tree

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. However, traditional association rule mining often derives many rules in which people are uninterested. This paper reports a generalization of association rule mining called φ-association rule mining. It allows people to have different interests in different itemsets, as real applications require. It can also help to derive interesting rules and substantially reduce the number of rules. An algorithm based on the FP-tree for mining φ-frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.

  4. Predicting the risk associated to pregnancy using data mining

    OpenAIRE

    Machado, José Manuel; Abelha, António; Santos, Manuel; Portela, Filipe; Pereira, Eliana; Brandão, Andreia

    2015-01-01

    Women willing to terminate a pregnancy should in general use a specialized health unit, as is the case of Maternidade Júlio Dinis in Porto, Portugal. One of the four stages comprising the process is evaluation. The purpose of this article is to evaluate the process of Voluntary Termination of Pregnancy and, consequently, identify the risk associated with the patients. Data Mining (DM) models were induced to predict the risk in a real environment. Three different techniques were considered: Dec...

  5. Urban association rules: uncovering linked trips for shopping behavior

    CERN Document Server

    Yoshimura, Yuji; Hobin, Juan N Bautista; Ratti, Carlo; Blat, Josep

    2016-01-01

    In this article, we introduce the method of urban association rules and its uses for extracting frequently appearing combinations of stores that are visited together to characterize shoppers' behaviors. The Apriori algorithm is used to extract the association rules (i.e., if -> result) from customer transaction datasets in a market-basket analysis. An application to our large-scale and anonymized bank card transaction dataset enables us to output linked trips for shopping all over the city: the method enables us to predict the other shops most likely to be visited by a customer given a particular shop that was already visited as an input. In addition, our methodology can consider all transaction activities conducted by customers for a whole city in addition to the location of stores dispersed in the city. This approach enables us to uncover not only simple linked trips such as transition movements between stores but also the edge weight for each linked trip in the specific district. Thus, the proposed methodo...
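
    The kind of "if -> result" rule described here can be illustrated with a minimal sketch that computes support and confidence for one-store-to-one-store rules. It is restricted to pairs and is not a full Apriori implementation or the authors' pipeline; the store names, thresholds and function names are hypothetical:

```python
from itertools import permutations
from typing import List, Set, Tuple


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Tuple[str, str, float, float]]:
    """Return (antecedent, consequent, support, confidence) for rules A -> B
    that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        sup_ab = support({a, b}, transactions)
        sup_a = support({a}, transactions)
        if sup_ab >= min_support and sup_a > 0 and sup_ab / sup_a >= min_confidence:
            rules.append((a, b, sup_ab, sup_ab / sup_a))
    return rules


# Hypothetical customers, each represented by the set of stores visited.
visits = [
    {"bakery", "cafe"},
    {"bakery", "cafe", "bookshop"},
    {"bakery"},
    {"cafe", "bookshop"},
]
rules = pair_rules(visits, min_support=0.5, min_confidence=0.9)
```

    With these toy visits, every bookshop customer also visited the cafe, so "bookshop -> cafe" is the only rule that passes both thresholds; the reverse rule fails the confidence threshold because some cafe customers did not visit the bookshop.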

  6. Basketball Domain Rule Mining Application Research Based on Behaviour Analysis

    Institute of Scientific and Technical Information of China (English)

    马萌; 于重重; 陈钧; 郭雪

    2014-01-01

    Association rule mining tends to emphasize algorithm research and neglect business needs, so mining results often fail to meet business objectives. To address this, a basketball domain rule mining model based on behavior analysis is proposed. The model uses the concept of behavior analysis to build the database to be mined and takes domain knowledge as the baseline throughout the mining process, combining technical interestingness and business interestingness to find deep rules that satisfy the rule evaluation criteria. Experiments and analysis were carried out on game data from the 2012 season of the China Basketball Association (CBA) men's professional league. The results demonstrate the effectiveness and practicality of the basketball domain rule mining model based on behavior analysis.

  7. NV - Assessment of wildlife hazards associated with mine pit lakes

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Several open pit mines in Nevada lower groundwater to mine ore below the water table. After mining, the pits partially fill with groundwater to form pit lakes....

  8. A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems

    Directory of Open Access Journals (Sweden)

    Brian Foo

    2009-01-01

    Networks of classifiers can offer improved accuracy and scalability over single classifiers by utilizing distributed processing resources and analytics. However, they also pose a unique combination of challenges. First, classifiers may be located across different sites that are willing to cooperate to provide services, but are unwilling to reveal proprietary information about their analytics, or are unable to exchange their analytics due to the high transmission overheads involved. Furthermore, processing of voluminous stream data across sites often requires load shedding approaches, which can lead to suboptimal classification performance. Finally, real stream mining systems often exhibit dynamic behavior and thus necessitate frequent reconfiguration of classifier elements to ensure acceptable end-to-end performance and delay under resource constraints. Under such informational constraints, resource constraints, and unpredictable dynamics, utilizing a single, fixed algorithm for reconfiguring classifiers can often lead to poor performance. In this paper, we propose a new optimization framework aimed at developing rules for choosing algorithms to reconfigure the classifier system under such conditions. We provide an adaptive, Markov model-based solution for learning the optimal rule when stream dynamics are initially unknown. Furthermore, we discuss how rules can be decomposed across multiple sites and propose a method for evolving new rules from a set of existing rules. Simulation results are presented for a speech classification system to highlight the advantages of using the rules-based framework to cope with stream dynamics.

  9. Population cancer risks associated with coal mining: a systematic review.

    Directory of Open Access Journals (Sweden)

    Wiley D Jenkins

    BACKGROUND: Coal is produced across 25 states and provides 42% of US energy. With production expected to increase 7.6% by 2035, proximate populations remain at risk of exposure to carcinogenic coal products such as silica dust and organic compounds. It is unclear if population exposure is associated with increased risk, or even which cancers have been studied in this regard. METHODS: We performed a systematic review of English-language manuscripts published since 1980 to determine if coal mining exposure was associated with increased cancer risk (incidence and mortality). RESULTS: Of 34 studies identified, 27 studied coal mining as an occupational exposure (coal miner cohort or as a retrospective risk factor) but only seven explored health effects in surrounding populations. Overall, risk assessments were reported for 20 cancer site categories, but their results and frequency varied considerably. Incidence and mortality risk assessments were: negative (no increase) for 12 sites; positive for 1 site; and discordant for 7 sites (e.g. lung, gastric). However, 10 sites had only a single study reporting incidence risk (4 sites had none), and 11 sites had only a single study reporting mortality risk (2 sites had none). The ecological study data were particularly meager, reporting assessments for only 9 sites. While mortality assessments were reported for each, 6 had only a single report and only 2 sites had reported incidence assessments. CONCLUSIONS: The reported assessments are too meager, and at times contradictory, to make definitive conclusions about population cancer risk due to coal mining. However, the preponderance of this and other data support many of Hill's criteria for causation. The paucity of data regarding population exposure and risk, the widespread geographical extent of coal mining activity, and the continuing importance of coal for US energy, warrant further studies of population exposure and risk.

  10. The spatiotemporal variation rules of Songzao coal mining subsidence based on numerical simulation

    OpenAIRE

    Lu, J.; Li, Y.; Cheng, H.; Tang, Z.(University of Science & Technology of China, Hefei, 230026, China)

    2015-01-01

    With the increasing demand of coal, coal mining at Songzao makes the area of land subsidence growing larger. Land subsidence in coal mining area not only made large subsided farmland out of production and caused the enormous loss to local agricultural production, but also brought a number of serious problems to the local social economy and ecology Environment. To use Probability-integral Method based on numerical simulation of Songzao Mine, its subsidence simulation data fro...

  11. An improved predictive association rule based classifier using gain ratio and T-test for health care data diagnosis

    Indian Academy of Sciences (India)

    M Nandhini; S N Sivanandam

    2015-09-01

    Health care data diagnosis is a significant task that needs to be executed precisely, which requires much experience and domain-knowledge. Traditional symptoms-based disease diagnosis may perhaps lead to false presumptions. In recent times, Associative Classification (AC), the combination of association rule mining and classification has received attention in health care applications which desires maximum accuracy. Though several AC techniques exist, they lack in generating quality rules for building efficient associative classifier. This paper aims to enhance the accuracy of the existing CPAR (Classification based on Predictive Association Rule) algorithm by generating quality rules using Gain Ratio. Mostly, health care applications deal with high dimensional datasets. Existence of high dimensions causes unfair estimates in disease diagnosis. Dimensionality reduction is commonly applied as a preprocessing step before classification task to improve classifier accuracy. It eliminates redundant and insignificant dimensions by keeping good ones without information loss. In this work, dimensionality reductions by T-test and reduct sets (or simply reducts) are performed as preprocessing step before CPAR and CPAR using Gain Ratio (CPAR-GR) algorithms. An investigation was also performed to determine the impact of T-test and reducts on CPAR and CPAR-GR. This paper synthesizes the existing work carried out in AC, and also discusses the factors that influence the performance of CPAR and CPAR-GR. Experiments were conducted using six health care datasets from UCI machine learning repository. Based on the experiments, CPAR-GR with T-test yields better classification accuracy than CPAR.

  12. 3D reconstruction method and connectivity rules of fracture networks generated under different mining layouts

    Institute of Scientific and Technical Information of China (English)

    Zhang Ru; Ai Ting; Li Hegui; Zhang Zetian; Liu Jianfeng

    2013-01-01

    In the current research, a series of triaxial tests simulating three typical mining layouts (i.e., top-coal caving, non-pillar mining and protected coal seam mining) was conducted on coal using the MTS815 Flex Test GT rock mechanics test system, and the fracture networks in the broken coal samples were qualitatively and quantitatively investigated using CT scanning and 3D reconstruction techniques. This work aimed to provide a detailed description of the micro-structure and fracture-connectivity characteristics of ruptured coal samples under different mining layouts. The results show that: (i) for the protected coal seam mining layout, the coal specimens fail in a compression-shear manner, whereas (ii) tension-shear failure is observed for the top-coal caving and non-pillar mining layouts. By investigating the connectivity features of the generated fractures in the direction of r1 under different mining layouts, it is found that the connectivity level of the fractures in the samples corresponding to the non-pillar mining layout was the highest.

  13. Quantum Privacy-Preserving Data Mining

    OpenAIRE

    Ying, Shenggang; Ying, Mingsheng; Feng, Yuan

    2015-01-01

    Data mining is a key technology in big data analytics and it can discover understandable knowledge (patterns) hidden in large data sets. Association rule is one of the most useful knowledge patterns, and a large number of algorithms have been developed in the data mining literature to generate association rules corresponding to different problems and situations. Privacy becomes a vital issue when data mining is used to sensitive data sets like medical records, commercial data sets and nationa...

  14. Data mining algorithm for discovering matrix association regions (MARs)

    Science.gov (United States)

    Singh, Gautam B.; Krawetz, Shephan A.

    2000-04-01

    Lately, there has been considerable interest in applying Data Mining techniques to scientific and data analysis problems in bioinformatics. Data mining research is being fueled by novel application areas that are helping the development of newer applied algorithms in the field of bioinformatics, an emerging discipline representing the integration of biological and information sciences. This is a shift in paradigm from the earlier and the continuing data mining efforts in marketing research and support for business intelligence. The problem described in this paper is along a new dimension in DNA sequence analysis research and supplements the previously studied stochastic models for evolution and variability. The discovery of novel patterns from genetic databases as described is quite significant because biological patterns play an important role in a large variety of cellular processes and constitute the basis for gene therapy. Biological databases containing the genetic codes from a wide variety of organisms, including humans, have continued their exponential growth over the last decade. At the time of this writing, the GenBank database contains over 300 million sequences and over 2.5 billion characters of sequenced nucleotides. The focus of this paper is on developing a general data mining algorithm for discovering regions of locus control, i.e. those regions that are instrumental for determining cell type. One such type of element of locus control are the MARs or the Matrix Association Regions. Our limited knowledge about MARs has hampered their detection using classical pattern recognition techniques. Consequently, their detection is formulated by utilizing a statistical interestingness measure derived from a set of empirical features that are known to be associated with MARs. 
    This paper presents a systematic approach for finding associations between such empirical features in genomic sequences, and for utilizing this knowledge to detect biologically interesting

  15. The nature of waste associated with closed mines in England and Wales

    OpenAIRE

    B. Palumbo-Roe; Colman, T.

    2010-01-01

    This report has been prepared for the Environment Agency (EA) to provide information on mineral waste associated with closed mining and quarrying sites in England and Wales as part of the provisions of the EU Mine Waste Directive 2006 (MWD). The Environment Agency is the regulatory body for England and Wales (E&W) responsible for producing an inventory of closed mining waste facilities, including abandoned waste facilities, as required by Article 20 of the European Mine Wastes Directive, by M...

  16. Overview of the Texas Mining and Reclamation Association's education project

    Energy Technology Data Exchange (ETDEWEB)

    Hutchins, M.F. [Texas Mining and Reclamation Association, Austin, TX (United States)

    1997-12-31

    The Texas Mining and Reclamation Association (TMRA) sponsors "Resources and the Environment," a teacher workshop held at a lignite mine each summer. Over a period of five years more than two hundred science teachers have participated in the 4-day workshop, and through them approximately 50,000 middle school students have been exposed to the curriculum. The workshop was developed with a grant from Phillips Petroleum Foundation, provided to the Center for Engineering Geosciences at Texas A&M University. The funding enabled the development of a program consisting of a science education curriculum addressing the earth-science concepts associated with lignite production and reclamation activities. The workshop is currently being instructed by Jim Luppens, Phillips Coal Company, and two assisting earth science specialists. The workshop includes classroom instruction, presentations by guest speakers, hands-on activities, and a tour of a lignite mine. The workshop ends with a mock public hearing involving role-playing. Roles include mining personnel, regulatory agencies, local townspeople, and adjacent landowners. The curriculum is provided as a resource for teachers and includes 55 teaching units, each comprising a student story, a teacher outline, and classroom/lab activities. The objective of the curriculum is to provide middle school students with an opportunity to learn about earth science and apply that knowledge to a real situation. The unifying theme of the workshop is geology and the development of lignite coal resources, from the planning stages of a mine to final reclamation.

  17. Preliminary study on regulatory limits of coal mines associated with radionuclides in Xinjiang

    International Nuclear Information System (INIS)

    In this paper, limits of radon concentration and gamma radiation dose rate of coal mines associated with radionuclide in Xinjiang were studied to provide theoretical bases in developing scientific and practical regulatory standards of radiation protection for coal mines associated with radionuclides. It is meaningful in strengthening the supervision to coal mines associated with radionuclides, boosting their development of exploitation and utilization, as well as the protection to the health of worker and public and the environment. It may also provide references in defining the limits of regulatory standards for NORMs associated mining and processing of ore resources. (authors)

  18. Mining Rules from Crisp Attributes by Rough Sets on the Fuzzy Class Sets

    OpenAIRE

    Mojtaba MadadyarAdeh; Dariush Dashchi Rezaee; Ali Soultanmohammadi

    2012-01-01

    Machine learning can extract desired knowledge and ease the development bottleneck in building expert systems. Among the proposed approaches, deriving classification rules from training examples is the most common. Given a set of examples, a learning program tries to induce rules that describe each class. The rough-set theory has served as a good mathematical tool for dealing with data classification problems. In the past, the rough-set theory was widely used in dealing with data classificati...

  19. Data Mining Rules for Ultrasonic B-Type Detection and Diagnosis for Cholecystolithiasis

    Institute of Scientific and Technical Information of China (English)

    LOUWei; YANLi-min; HEGuo-sen

    2004-01-01

    This paper presents realistic data mining based on the data of B-type ultrasonic detection and diagnosis for cholecystolithiasis (gallbladder stone in biliary tract) recorded by a district central hospital in Shanghai during the past several years. Computer simulation and modeling are described.

  20. A Depth-first Algorithm of Finding All Association Rules Generated by a Frequent Itemset

    Institute of Scientific and Technical Information of China (English)

    WU Kun; JIANG Bao-qing; WEI Qing

    2006-01-01

    The classical algorithm for finding the association rules generated by a frequent itemset has to generate all nonempty subsets of the frequent itemset as the candidate set of consequents. Xiongfei Li addressed this and proposed an improved algorithm. That algorithm finds all consequents layer by layer, so it is breadth-first. In this paper, we propose a new algorithm, Generate Rules by using Set-Enumeration Tree (GRSET), which uses the structure of a Set-Enumeration Tree and a depth-first method to find all consequents of the association rules one by one and to obtain all association rules corresponding to those consequents. Experiments show the GRSET algorithm to be practicable and efficient.
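
    The general idea of enumerating consequents depth-first over a set-enumeration tree can be sketched as follows. This is only an illustration under stated assumptions, not the GRSET algorithm itself, whose details the abstract does not give: the function `generate_rules`, the precomputed `support` table and the toy counts are hypothetical. The pruning relies on a standard property: enlarging the consequent shrinks the antecedent, so confidence can only stay the same or drop.

```python
def generate_rules(itemset, support, min_conf):
    """Depth-first enumeration of rules (itemset - C) -> C.

    `support` is assumed to map every non-empty subset of `itemset`
    (as a frozenset) to its support count. Consequents are extended only
    with items that come later in sorted order, so each candidate
    consequent is visited at most once (a set-enumeration tree).
    """
    items = sorted(itemset)
    full = frozenset(items)
    rules = []

    def expand(consequent, start):
        for i in range(start, len(items)):
            cand = consequent | {items[i]}
            antecedent = full - cand
            if not antecedent:
                continue                      # a rule needs a non-empty antecedent
            conf = support[full] / support[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, cand, conf))
                expand(cand, i + 1)           # go deeper before trying siblings
            # Otherwise prune: every superset of `cand` has confidence <= conf.

    expand(frozenset(), 0)
    return rules


# Toy support counts (made up), consistent with the five transactions
# abc, abc, ac, a, b.
support = {
    frozenset("abc"): 2,
    frozenset("ab"): 2, frozenset("ac"): 3, frozenset("bc"): 2,
    frozenset("a"): 4, frozenset("b"): 3, frozenset("c"): 3,
}
rules = generate_rules("abc", support, min_conf=0.6)
consequents = {c for _, c, _ in rules}
```

    In this toy run the consequent {b, c} is rejected because a -> bc has confidence 2/4, below the 0.6 threshold.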

  1. Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context

    OpenAIRE

    Choy, Murphy; Ong, Cally Claire; Cheong, Michelle

    2012-01-01

    The rapid explosion in retail data calls for more effective and efficient discovery of association rules to develop relevant business strategies and rules. Unlike online shopping sites, most brick and mortar retail shops are located in geographically and demographically diverse areas. This diversity presents a new challenge to the classical association rule model which assumes a homogenous group of customers behaving differently. The focus of this paper is centered on the discovery of associat...

  2. The study of slip line field and upper bound method based on associated flow and non-associated flow rules

    Institute of Scientific and Technical Information of China (English)

    Zheng Yingren; Deng Chujian; Wang Jinglin

    2010-01-01

    At present, the associated flow rule of traditional plastic theory is adopted in the slip line field theory and upper bound method for geotechnical materials, so the stress characteristic line coincides with the velocity line. It has been proved that geotechnical materials do not obey the associated flow rule, so the stress characteristic line cannot coincide with the velocity line. Generalized plastic mechanics proves theoretically that the plastic potential surface intersects the Mohr-Coulomb yield surface at an angle, so the velocity line must be studied using the non-associated flow rule. Based on limit analysis theory, the theory of the slip line field is put forward in this paper, and the ultimate bearing capacity of a strip footing is then obtained under the associated flow rule and the non-associated flow rule separately. The two results are identical, since the ultimate bearing capacity is independent of the flow rule. In contrast, the velocity fields under the associated and non-associated flow rules differ, which shows that the velocity field based on the associated flow rule is incorrect.

  3. Design and Realization of user Behaviors Recommendation System Based on Association rules under Cloud Environment

    Directory of Open Access Journals (Sweden)

    Wei Dai

    2013-07-01

    This study introduces the basic principles of association rules and the properties and advantages of the MapReduce model and HBase in the Hadoop ecosystem. It gives the design steps of the user behavior recommendation system in detail. Repeated experiments show that combining association rule theory with cloud computing is successful and effective.

  4. An Efficient Method for Mining Event-Related Potential Patterns

    Directory of Open Access Journals (Sweden)

    Seyed Aliakbar Mousavi

    2011-11-01

    In the present paper, we propose a Neuroelectromagnetic Ontology Framework (NOF) for mining Event-related Potentials (ERP) patterns as well as the process. The aim of this research is to develop an infrastructure for mining, analysis and sharing of the ERP domain ontologies. The outcome of this research is a Neuroelectromagnetic knowledge-based system. The framework has 5 stages: 1) Data pre-processing and preparation; 2) Data mining application; 3) Rule Comparison and Evaluation; 4) Association rules Post-processing; 5) Domain Ontologies. In the fifth stage a new set of hidden rules can be discovered based on comparing association rules with domain ontologies and expert rules.

  5. An Efficient Method for Mining Event-Related Potential Patterns

    CERN Document Server

    Mousavi, Seyed Aliakbar; Mohamed, Hasimah Hj; Alomari, Saleh Ali

    2012-01-01

    In the present paper, we propose a Neuroelectromagnetic Ontology Framework (NOF) for mining Event-related Potentials (ERP) patterns as well as the process. The aim of this research is to develop an infrastructure for mining, analysis and sharing of the ERP domain ontologies. The outcome of this research is a Neuroelectromagnetic knowledge-based system. The framework has 5 stages: 1) Data pre-processing and preparation; 2) Data mining application; 3) Rule Comparison and Evaluation; 4) Association rules Post-processing; 5) Domain Ontologies. In the fifth stage a new set of hidden rules can be discovered based on comparing association rules with domain ontologies and expert rules.

  6. Rule-based statistical data mining agents for an e-commerce application

    Science.gov (United States)

    Qin, Yi; Zhang, Yan-Qing; King, K. N.; Sunderraman, Rajshekhar

    2003-03-01

    Intelligent data mining techniques have useful e-Business applications. Because an e-Commerce application is related to multiple domains such as statistical analysis, market competition, price comparison, profit improvement and personal preferences, this paper presents a hybrid knowledge-based e-Commerce system fusing intelligent techniques, statistical data mining, and personal information to enhance QoS (Quality of Service) of e-Commerce. A Web-based e-Commerce application software system, eDVD Web Shopping Center, is successfully implemented using Java servlets and an Oracle8i database server. Simulation results have shown that the hybrid intelligent e-Commerce system is able to make smart decisions for different customers.

  7. Fusion: a Visualization Framework for Interactive Ilp Rule Mining With Applications to Bioinformatics

    OpenAIRE

    Indukuri, Kiran Kumar

    2004-01-01

    Microarrays provide biologists an opportunity to find the expression profiles of thousands of genes simultaneously. Biologists try to understand the mechanisms underlying the life processes by finding out relationships between gene-expression and their functional categories. Fusion is a software system that aids the biologists in performing microarray data analysis by providing them with both visual data exploration and data mining capabilities. Its multiple view visual framework allows the u...

  8. Text Association Analysis and Ambiguity in Text Mining

    Science.gov (United States)

    Bhonde, S. B.; Paikrao, R. L.; Rahane, K. U.

    2010-11-01

    Text Mining is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. The research in Text Mining will enhance human's ability to process massive quantities of information, and it has high commercial values. Firstly, the paper discusses the introduction of TM its definition and then gives an overview of the process of text mining and the applications. Up to now, not much research in text mining especially in concept/entity extraction has focused on the ambiguity problem. This paper addresses ambiguity issues in natural language texts, and presents a new technique for resolving ambiguity problem in extracting concept/entity from texts. In the end, it shows the importance of TM in knowledge discovery and highlights the up-coming challenges of document mining and the opportunities it offers.

  9. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  10. URANIUM MINING AND ASSOCIATED ENVIRONMENTAL PROBLEMS IN UKRAINE

    OpenAIRE

    Dudar, T.; Zakrytnyi, Ye.; Bugera, M.

    2015-01-01

    Nuclear power demand for uranium resources is expected to increase in the near future. The environmental impact of uranium mining is therefore a challenge that requires immediate action. The trends in uranium mining in the world and in Ukraine for the period 2003-2013 are considered in this paper. The increase in demand for uranium raw material, and consequently in its mining, is especially noted. The available and potential uranium resources are overviewed. It shoul...

  11. Fast Vertical Mining Using Boolean Algebra

    Directory of Open Access Journals (Sweden)

    Hosny M. Ibrahim

    2015-01-01

    The vertical association rule mining algorithm is an efficient mining method that uses the support sets of frequent itemsets to calculate the support of candidate itemsets. It overcomes the disadvantage of scanning the database many times, as the Apriori algorithm does. In vertical mining, frequent itemsets can be represented as a set of bit vectors in memory, which enables fast computation. The sizes of the bit vectors for itemsets are the main space expense of the algorithm and restrict its scalability. Therefore, this paper presents an algorithm that compresses the bit vectors of frequent itemsets. The new bit vector schema depends on Boolean algebra rules to compute the intersection of two compressed bit vectors without any costly decompression operation. The experimental results show that the proposed algorithm, Vertical Boolean Mining (VBM), is better than both the Apriori algorithm and the classical vertical association rule mining algorithm in mining time and memory usage.
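
    The basic vertical representation described above can be sketched with plain, uncompressed bit vectors. This is a simplified illustration, not the VBM algorithm: the compression schema is not specified in the abstract and is not reproduced here. Python integers stand in for bit vectors, and the helper names and toy transactions are hypothetical.

```python
def build_bitvectors(transactions):
    """Map each item to an integer whose bit `tid` is set when
    transaction number `tid` contains the item."""
    vectors = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """Support count of a non-empty itemset: AND the item bit vectors
    together and count the bits that remain set."""
    items = iter(itemset)
    acc = vectors.get(next(items), 0)
    for item in items:
        acc &= vectors.get(item, 0)   # intersection of the support sets
    return bin(acc).count("1")


# Toy transactions, made up for illustration.
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c"],
]
vectors = build_bitvectors(transactions)
print(support(["a", "c"], vectors))
print(support(["a", "b", "c"], vectors))
```

    The database is scanned once to build the vectors; after that, the support of any candidate itemset comes from bitwise operations alone, which is what lets vertical methods avoid repeated database scans.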

  12. Clinic-Genomic Association Mining for Colorectal Cancer Using Publicly Available Datasets

    OpenAIRE

    Fang Liu; Yaning Feng; Zhenye Li; Chao Pan; Yuncong Su; Rui Yang; Liying Song; Huilong Duan; Ning Deng

    2014-01-01

    In recent years, a growing number of researchers have begun to focus on how to establish associations between clinical and genomic data. However, up to now, there is a lack of research mining clinic-genomic associations by comprehensively analysing available gene expression data for a single disease. Colorectal cancer is one of the malignant tumours. A number of genetic syndromes have been proven to be associated with colorectal cancer. This paper presents our research on mining clinic-genomic assoc...

  13. E-commerce Website Recommender System Based on Dissimilarity and Association Rule

    OpenAIRE

    MingWang Zhang; ShuWen Yang; LiFeng Zhang

    2013-01-01

    After analyzing current e-commerce recommendation algorithms, this paper puts forward a recommendation algorithm that combines dissimilarity-based clustering with association rules. The algorithm clusters the shopping data of website users by dissimilarity and then applies an association rule algorithm to the clustering results to produce recommendations. Experiments show that, compared with the traditional clustering association algorithm, the number of iterations decreases, improve...

  14. The Development Research on Data Mining

    Institute of Scientific and Technical Information of China (English)

    张伟; 刘勇国; 彭军; 廖晓峰; 吴中福

    2001-01-01

    Mining knowledge from databases has been regarded as a key research issue in database systems. Researchers in different fields have taken great interest in data mining. In this paper, data mining techniques are introduced broadly, including their definition, purpose, characteristics, principal processes and classifications. As an example, studies on mining association rules are illustrated. Finally, some data mining prototypes are provided and several research trends in data mining are discussed.

  15. A study of trends in occupational risks associated with coal mining

    International Nuclear Information System (INIS)

    The coal industry is well known as a major source of specific types of risk and harmful effects including, for instance, harm to the environment, pollution from various surface installations and hazards associated with the actual task of mining. We shall confine our attention to the third group and discuss only the occupational risks facing miners and ex-miners. Unlike the nuclear and oil industries, coal-mines employ very large work-forces, and the risks associated with mining therefore have a considerable impact. Mining is also a highly integrated industry: a mine's own work-force carries out all the underground engineering work (preparatory excavations, installation work, etc.) as well as maintenance. In this narrow field, a distinction should immediately be drawn between two main areas: industrial accidents; and occupational diseases, which include silicosis or, more precisely, coal-miner's pneumoconiosis

  16. Application of Multidimensional Association Rules Method in Psychological Measurement

    Institute of Scientific and Technical Information of China (English)

    王冬燕

    2015-01-01

    Multidimensional association rules are used to extract association rules between the attributes of different psychometric scales; the sample includes 1,958 university freshmen. Because the scales have many attributes and the database is large, the traditional Apriori association rule algorithm is difficult to apply directly, so a multidimensional association rule mining algorithm based on Apriori was designed, implemented, and applied to study the relationships between psychometric scale attributes. Experimental results show that the method mines multidimensional association rules between attributes relatively quickly and more accurately, and that these rules can guide psychometric work, indicating that the method is effective.

  17. Service Composition Design Pattern for Autonomic Computing Systems Using Association Rule Based Learning and Service-Oriented Architecture

    Directory of Open Access Journals (Sweden)

    Vishnuvardhan Mannava

    2012-10-01

    Full Text Available In this paper we compose design patterns that satisfy the properties of an autonomic computing system: for the Decision-Making phase we introduce the Case-Based Reasoning design pattern, and for the Reconfiguration phase we introduce the Reactor design pattern. The most important proposal in our composite design pattern is the use of the Association Rule Learning method of data mining to learn about new services that can be added alongside the requested service, so that the service becomes a dynamic composition of two or more services. We then include the new service as an aspectual feature module without interrupting the user. As far as we know, there are no studies on the composition of design patterns and pattern languages for the autonomic computing domain. We validate our work with a simple case study. Simple class and sequence diagrams are depicted.

  18. [Analysis on medication rules of state medical master yan zhenghua's prescriptions that including Polygoni Multiflori Caulis based on data mining].

    Science.gov (United States)

    Wu, Jia-rui; Guo, Wei-xian; Zhang, Xiao-meng; Yang, Bing; Zhang, Bing; Zhao, Meng-di; Sheng, Xiao-guang

    2014-11-01

    The prescriptions containing Polygoni Multiflori Caulis that were formulated by Prof. Yan were collected to build a database based on the traditional Chinese medicine (TCM) inheritance assist system. Association rule mining with the Apriori algorithm was used to obtain the frequency of single medicines, the frequency of drug combinations, the association rules between drugs, and the core drug combinations. The data mining results indicated that, in the prescriptions containing Polygoni Multiflori Caulis, the most frequently used drugs were parched Ziziphi Spinosae Semen, Ostreae Concha, Ossis Mastodi Fossilia, Salviae Miltiorrhizae Radix Et Rhizoma, Paeoniae Rubra Radix, and so on. The most frequent drug combinations were "Polygoni Multiflori Caulis-parched Ziziphi Spinosae Semen", "Ostreae Concha-Polygoni Multiflori Caulis", and "Polygoni Multiflori Caulis-Ossis Mastodi Fossilia". The drug association rules with a confidence of 1 were "Ostreae Concha-->Polygoni Multiflori Caulis", "Poria-->Polygoni Multiflori Caulis", "parched Ziziphi Spinosae Semen-->Polygoni Multiflori Caulis", and "Paeoniae Alba Radix-->Polygoni Multiflori Caulis". The core drug combinations in the treatment of insomnia were Ossis Mastodi Fossilia, Polygoni Multiflori Caulis, Salviae Miltiorrhizae Radix et Rhizoma, Ostreae Concha, Polygalae Radix, Margaritifera Concha, Poria, and parched Ziziphi Spinosae Semen. The core drug combinations in the treatment of obstruction of Qi in chest were Salviae Miltiorrhizae Radix Et Rhizoma, Polygoni Multiflori Caulis, parched Ziziphi Spinosae Semen, Trichosanthis Fructus, Allii Macrostemonis Bulbus, and Paeoniae Rubra Radix.
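
    The "confidence of 1" rules in this abstract follow the standard definitions of support and confidence for a rule A --> B. The sketch below illustrates those definitions only; the transactions and the placeholder item names "A", "B", "C" are made up for illustration and are not the paper's prescription data.

```python
from typing import Iterable, List, FrozenSet


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """confidence(A --> B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(a, transactions)
    if support_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(both, transactions) / support_a


# Hypothetical toy transactions (placeholders, not real prescriptions).
toy = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B", "C"}),
    frozenset({"C"}),
]

conf_a_to_b = confidence({"A"}, {"B"}, toy)  # every transaction with A also has B
conf_b_to_a = confidence({"B"}, {"A"}, toy)  # B occurs once without A
```

    A confidence of 1 for A --> B means only that B appears in every transaction containing A; the reverse rule can have a much lower confidence, as `conf_b_to_a` shows.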

  19. MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RULE DISCOVERY

    OpenAIRE

    Irene Kahvazadeh; Mohammad Saniee Abadeh

    2015-01-01

    Extracting association rules from numeric features involves searching a very large search space. To deal with this problem, in this paper a meta-heuristic algorithm is used that we have called MOCANAR. The MOCANAR is a Pareto based multi-objective cuckoo search algorithm which extracts high quality association rules from numeric datasets. The support, confidence, interestingness and comprehensibility are the objectives that have been considered in the MOCANAR. The MOCANAR extra...

  20. The Research for Multidimensional Association Rules Algorithm Based on MDPI

    Institute of Scientific and Technical Information of China (English)

    彭硕; 吴昊

    2011-01-01

    Multidimensional association rule mining is an important research direction in data mining. This paper proposes an efficient algorithm for mining multidimensional association rules that combines data cube techniques with the frequent itemset mining algorithm FP-Growth by constructing an MDPI-tree (multidimensional predicate index tree); the algorithm can mine both inter-dimension and hybrid-dimension association rules. Finally, the algorithm is applied to a cross-selling model for mobile communications, and experiments verify its effectiveness and practicality.

  1. NON-REDUNDANT SEQUENCE RULES MINING BASED ON INCLUSION AND DEDUCTION ANALYSIS

    Institute of Scientific and Technical Information of China (English)

    周新; 王乙民; 刘婧; 尤涛

    2016-01-01

    Sequence rule mining aims to find causal associations between frequent sequences. The current best approach to generating sequence rules considers only the inclusion relationship between two rules and not the deduction relationship among multiple rules, and therefore produces many redundant rules. We introduce the concept of deductive non-redundant rules, analyse the causes of deductive redundancy, and redefine the concept of non-redundant rules. On the basis of frequent closed sequences and their generators, we present a non-redundant rule extraction algorithm based on maximum-overlap-term redundancy checking. Theoretical analysis and experimental evaluation show that the algorithm improves the quality of the generated sequence rules while keeping processing efficiency essentially unchanged.

  2. The methodology and problems associated with corrosion testing in South African mines

    Energy Technology Data Exchange (ETDEWEB)

    McEwan, J.J.; Enright, D.P. [Mintek, Randburg (South Africa)]; Leitch, J.E. [Hulett Aluminium (Pty) Ltd., Pietermaritzburg (South Africa)]

    1995-10-01

    The mining industry is of fundamental importance to the South African economy, contributing over 10 per cent of the gross domestic product (GDP). It is one of the largest and oldest industrial sectors in the country, with gold, coal and diamonds as the most valuable exports. The corrosion costs to the South African mining industry are in excess of R1 billion (US$ 300 million) per year. An area of particular concern is in the mine shafts, where not only can the conditions be very corrosive, but shaft utilization leaves very little time available for maintenance, repair, etc. The problems are compounded in the newer ultra-deep gold mines (up to 4 km or 2½ miles deep), where large volumes of sometimes untreated water are required for both the mining operation and cooling. This paper details the corrosion problems encountered in shafts, and also the problems associated with performing in situ evaluations.

  3. Mine dumps, wheeze, asthma, and rhinoconjunctivitis among adolescents in South Africa: any association?

    Science.gov (United States)

    Nkosi, Vusumuzi; Wichmann, Janine; Voyi, Kuku

    2015-01-01

    The study investigated the association between community proximity to mine dumps, and current wheeze, rhinoconjunctivitis, and asthma among adolescents. This study was conducted during May-November 2012 around five mine dumps in South Africa. Communities in close proximity to mine dumps had an increased likelihood of current wheeze OR 1.38 (95 % CI: 1.10-1.71), rhinoconjunctivitis OR 1.54 (95 % CI: 1.29-1.82), and a protective association with asthma OR 0.29 (95 % CI: 0.23-0.35). Factors associated with health outcomes included other indoor and outdoor pollution sources. Wheeze and rhinoconjunctivitis appear to be a public health problem in these communities. The findings of this study serve as a base for further detailed epidemiological studies for communities in close proximity to the mine dumps e.g. a planned birth cohort study.

  4. Mining and Visualizing Family History Associations in the Electronic Health Record: A Case Study for Pediatric Asthma.

    Science.gov (United States)

    Chen, Elizabeth S; Melton, Genevieve B; Wasserman, Richard C; Rosenau, Paul T; Howard, Diantha B; Sarkar, Indra Neil

    2015-01-01

    Asthma is the most common chronic childhood disease and has seen increasing prevalence worldwide. While there is existing evidence of familial and other risk factors for pediatric asthma, there is a need for further studies to explore and understand interactions among these risk factors. The goal of this study was to develop an approach for mining, visualizing, and evaluating association rules representing pairwise interactions among potential familial risk factors based on information documented as part of a patient's family history in the electronic health record. As a case study, 10,260 structured family history entries for a cohort of 1,531 pediatric asthma patients were extracted and analyzed to generate family history associations at different levels of granularity. The preliminary results highlight the potential of this approach for validating known knowledge and suggesting opportunities for further investigation that may contribute to improving prediction of asthma risk in children.

  5. NIOSH (National Institute for Occupational Safety and Health) testimony to DOL (Department of Labor) on the Mine Safety and Health Administration's proposed rule on safety standards for underground coal-mine ventilation

    Energy Technology Data Exchange (ETDEWEB)

    Millar, J.D.

    1988-06-16

    The testimony summarized information contained in written comments on the proposed rule on mine ventilation submitted on April 28, 1988. The proposed regulation included a number of significant improvements over existing standards. The requirements eliminated aluminum overcasts, required fireproof mortar in permanent stoppings and fire-resistant coatings on timbers used as stoppings, and improved the requirements for underground electrical installations. New rules for escapeways provided additional protection in the event of an emergency underground. A carbon monoxide fire-detection system provided a major margin in risk reduction. The testimony indicated that NIOSH did not support the use of belt haulageways as intake air courses at any time. Anthracite miners appear to be at twice the risk of coalworkers' pneumoconiosis compared to bituminous coal miners. Special attention must be given to controlling dust levels in anthracite mines.

  6. Research of the Occupational Psychological Impact Factors Based on the Frequent Item Mining of the Transactional Database

    OpenAIRE

    Cheng Dongmei; Zuo Xuejun; Liu Zhaohua

    2015-01-01

    Based on an extensive reading of the literature on data mining and association rule mining, this paper starts from compressing the transactional database and proposes a frequent complementary item storage structure for the transactional database. According to the previous analysis, this paper also studies an association rule mining algorithm based on the frequent complementary item storage structure of the transactional database. At last, this paper will apply this mining algorithm in the test r...

  7. Air Pollution Monitoring & Tracking System Using Mobile Sensors and Analysis of Data Using Data Mining

    OpenAIRE

    Umesh M. Lanjewar, J. J. Shah

    2012-01-01

    This study proposes an air pollution monitoring system and the analysis of pollution data using the association rule data mining technique. Association rule data mining aims at finding association patterns among various parameters. In this paper, association rule mining is presented for finding association patterns among various air pollutants. For this, the Apriori algorithm of association rule data mining is used. Apriori is characterized as a level-by-level complete search algorithm. This algorithm is ...
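
    The level-by-level search that characterizes Apriori can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the authors' implementation; the discretized pollutant labels and the absolute support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data for this level.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical discretized readings, one set of labels per measurement window.
readings = [
    {"SO2_high", "NO2_high", "PM_high"},
    {"SO2_high", "NO2_high"},
    {"NO2_high", "PM_high"},
    {"SO2_high", "NO2_high", "PM_high"},
]
frequent_sets = apriori(readings, min_count=3)
```

    With these toy readings the pair of "SO2_high" and "PM_high" occurs only twice, so it is dropped at level 2 and the three-item candidate is pruned without being counted, which is the saving the level-wise search provides.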

  8. Data Mining Foundations and Intelligent Paradigms Volume 1 Clustering, Association and Classification

    CERN Document Server

    Jain, Lakhmi

    2012-01-01

    Data mining is one of the most rapidly growing research areas in computer science and statistics. In Volume 1 of this three-volume series, we have brought together contributions from some of the most prestigious researchers in the fundamental data mining tasks of clustering, association and classification. Each of the chapters is self-contained. Theoreticians and applied scientists/engineers will find this volume valuable. Additionally, it provides a sourcebook for graduate students interested in the current direction of research in these aspects of data mining.

  9. Data Mining Techniques: A Source for Consumer Behavior Analysis

    CERN Document Server

    Raorane, Abhijit

    2011-01-01

    Various studies on consumer purchasing behaviors have been presented and used in real problems. Data mining techniques are expected to be a more effective tool for analyzing consumer behaviors. However, data mining methods have disadvantages as well as advantages. Therefore, it is important to select appropriate techniques to mine databases. The objective of this paper is to understand consumer behavior, the consumer's psychological condition at the time of purchase, and how suitable data mining methods can improve on conventional methods. Moreover, in an experiment, association rules are employed to mine rules for trusted customers using sales data from the supermarket industry.

  10. Data Mining Association in the Data Analysis

    Institute of Scientific and Technical Information of China (English)

    金宗泽; 冯亚丽; 纪博; 张希; 高快

    2014-01-01

    In this era of information explosion, big data is ever closer to our lives. Starting from where big data comes from and how it can be studied, this paper describes the framework of the big data analysis process and argues for the importance of association mining in big data analysis. It also gives a corresponding research scheme for association mining on big data: the system analyzes the association patterns, and users can additionally analyze, filter and retain the parameters under study through manual parameter selection. If association rule mining is put to good use in the course of big data analysis, it will bring greater practical value.

  11. Prospectors and Developers Association of Canada Mining Matters: A Model of Effective Outreach

    Science.gov (United States)

    Hymers, L.; Heenan, S.

    2009-05-01

    Prospectors and Developers Association of Canada Mining Matters is a charitable organization whose mandate is to bring the wonders of Canada's geology and mineral resources to students, educators and industry. The organization provides current information about rocks, minerals, metals, and mining and offers exceptional educational resources, developed by teachers and for teachers, that meet Junior, Intermediate and Senior Provincial Earth Science and Geography curriculum expectations. Since 1994, Mining Matters has reached more than 400,000 educators, students, industry representatives, and Aboriginal Youth through Earth Science resources. At the time of the program's inception, members of the Prospectors and Developers Association of Canada (PDAC) realized that their mining and mineral industry expertise could be of help to teachers and students. Consulting experts in education, government, and business, and the PDAC worked together to develop the first Mining Matters Earth Science curriculum kit for Grades 6 and 7 teachers in Ontario. PDAC Mining Matters became the official educational arm of the Association and a charitable organization in 1997. Since then, the organization has partnered with government, industry, and educators to develop bilingual Earth science teaching units for Grades 4 and 7, and senior High School. The teaching units consist of kits that contain curriculum-correlated lesson plans, information bulletins, genuine data sets, rock and mineral samples, equipment and additional instructional resources. Mining Matters offers instructional development workshops to train pre-service and in-service educators to use its teaching units in the classroom. The workshops are meant to provide teachers with the knowledge and confidence they need to successfully employ the units in the classroom. Formal mechanisms for resource and workshop evaluations are in place. Overwhelmingly teacher feedback is positive, describing the excellence

  12. Atomic data mining numerical methods, source code SQlite with Python

    OpenAIRE

    Khwaldeh, Ali; Tahat, Amani; Martí Rabassa, Jordi; Tahat, Mofleh

    2013-01-01

    This paper introduces a recently published Python data mining book (chapters, topics, samples of Python source code written by its authors) to be used in data mining via the World Wide Web and any specific database in several disciplines (economics, physics, education, marketing, etc.). The book starts with an introduction to data mining by explaining some of the data mining tasks involved, including classification, dependence modelling, clustering and discovery of association rules. The book addressed that ...

  13. Improved association rules and its application in Computer Forensics

    Institute of Scientific and Technical Information of China (English)

    刘锋; 詹焰霞; 陈玉萍

    2012-01-01

    With the development of science and technology, computers have entered countless households, and the resulting problems, such as computer crime, are attracting growing public attention; computer forensics is a powerful tool for curbing such behavior. This paper combines computer forensics with association rule mining from data mining. It first introduces the relevant concepts of data mining and association rules, presents Apriori, the most typical association rule mining algorithm, and summarizes its shortcomings. It then proposes a sorting-based improvement to Apriori that addresses these shortcomings and raises the algorithm's efficiency, applies it to computer forensics, and verifies its feasibility through a concrete example.

  14. Fast Vertical Mining Using Boolean Algebra

    OpenAIRE

    Hosny M. Ibrahim; Marghny, M. H.; Noha M. A. Abdelaziz

    2015-01-01

    The vertical association rules mining algorithm is an efficient mining method, which makes use of the support sets of frequent itemsets to calculate the support of candidate itemsets. It overcomes the disadvantage of scanning the database many times, as the Apriori algorithm does. In vertical mining, frequent itemsets can be represented as a set of bit vectors in memory, which enables fast computation. The sizes of bit vectors for itemsets are the main space expense of the algorithm that restricts its ex...
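
    The bit-vector representation mentioned in this abstract can be illustrated with a short Python sketch. It shows the general vertical-mining idea (one bit per transaction, support obtained by a bitwise AND), not the Boolean-algebra algorithm of the paper; the function names and the toy transactions are hypothetical.

```python
def build_bitvectors(transactions):
    """Map each item to an int whose bit `tid` is set when transaction `tid` contains the item."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support_count(itemset, vectors):
    """Support of an itemset = number of set bits in the AND of its items' bit vectors."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    acc = vectors.get(items[0], 0)
    for item in items[1:]:
        acc &= vectors.get(item, 0)
    return bin(acc).count("1")


# Hypothetical toy database: the list index is the transaction id.
db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
vectors = build_bitvectors(db)

count_ab = support_count({"a", "b"}, vectors)        # transactions 0 and 2
count_abc = support_count({"a", "b", "c"}, vectors)  # transaction 2 only
```

    The database is read once to build the vectors; after that, the support of any candidate comes from bitwise operations alone, at the cost of one bit per transaction for each stored item or itemset.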

  15. Study on structuring the supervision system of coal mine associated with radionuclides in Xinjiang

    International Nuclear Information System (INIS)

    Xinjiang is one of China's coal-rich provinces (regions) and accounts for about 40% of national coal reserves. In long-term radiological research, monitoring and environmental impact assessment work, we found that some coal in Yili and Hetian was associated with elevated radionuclide levels, and some coal seams even reached nuclear mining grade. However, the laws and regulations on supervising coal mines associated with radioactivity are incomplete, and the supervision system is still being explored. This article starts from coal mine enterprises' geological prospecting reports, radiation environmental impact assessments, monitoring report preparation for environmental acceptance checks, and supervisory monitoring, so as to control radioactive pollution from coal at its source, and studies how to build a supervision system for Xinjiang coal mines associated with radioactivity. The supervision system will, on the one hand, provide technical guidance for enterprises' coal exploitation and use of cinders and, on the other hand, give Xinjiang's environmental regulators a decision-making basis for strengthening supervision of such mines. (authors)

  16. Improved Maximal Length Frequent Item Set Mining

    Directory of Open Access Journals (Sweden)

    P.C.S.Nagendra setty

    2012-09-01

    Full Text Available Association rule mining is one of the most important techniques in data mining, with a wide range of applications. It aims at searching for interesting relationships among items in large data sets and discovers association rules. The importance of association rule mining is increasing with the demand for finding frequent patterns in large data sources. The exploitation of frequent item sets has been restricted by the large number of generated frequent item sets and the high computational cost in real-world applications. To avoid these problems, maximum length frequent item sets can be used in generating association rules; they can be discovered efficiently on very large data sets. Current research offers the LFIMiner and MaxLFI algorithms for generating maximum length frequent item sets. Here we propose a new algorithm called FPMAX for generating maximum length frequent item sets that uses a lattice graph data structure.

  17. Mining of Datasets with an Enhanced Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    V. P. Arunachalam

    2012-01-01

    Full Text Available Problem statement: Classical association rules mostly mine intra-transaction associations, i.e., associations among items within the same transaction, where a transaction could be the items bought by the same customer on the same day. The goal of inter-transaction association rules is to represent the associations between various events found in different transactions. Approach: In this study, we break the barrier of transactions and extend the scope of mining association rules from traditional single-dimensional, intra-transaction associations to N-dimensional, inter-transaction associations. With the introduction of dimensional attributes, we lose the luxury of the simple representational form of classical association rules. Mining inter-transaction associations poses more challenges for efficient processing than mining intra-transaction associations because the number of potential association rules becomes extremely large once the boundary of transactions is broken. Results: Various tests were also conducted using data sets collected from different Stock Exchanges (SE). Various experimental results are reported by comparing real-life and synthetic datasets, and we show the effectiveness of our work in generating rules and in finding an acceptable set of rules under varying conditions. Conclusion/Recommendations: This study introduces the notion of the N-dimensional inter-transaction association rule, defines its measurements (support and confidence) and develops an efficient algorithm called Modified Apriori.

  18. The Contribution of „Ruda 12 Apostoli” Mining Association in Brad to the Development of Transylvanian Gold Mining Between 1884 – 1921

    OpenAIRE

    MIRCEA BARON

    2012-01-01

    One of the major gold mining regions in Romania is part of the gold rectangle in the Apuseni Mountains and lies around the town of Brad. It is here that the ”Ruda 12 Apostoli” Mining Association of cuxas was established at the end of the XVIIIth century. This association was to become the most important unit for the mining of precious metals in the entire Austrian – Hungarian Empire after 1884, when it was taken over by the German company ”Harkortschen Bergwerke und Chemische Fabriken zu Schw...

  19. Spatial Data Mining using Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ch.N.Santhosh Kumar

    2012-09-01

    Full Text Available Data mining, also referred to as Knowledge Discovery in Databases (KDD), is the process of nontrivial extraction of implicit, previously unknown and useful information, such as knowledge rules, descriptions, regularities, and major trends, from large databases. Data mining has evolved into a multidisciplinary field that includes database technology, machine learning, artificial intelligence, neural networks, information retrieval, and so on. In principle, data mining should be applicable to the different kinds of data and databases used in many different applications, including relational databases, transactional databases, data warehouses, object-oriented databases, and special application-oriented databases such as spatial databases, temporal databases, multimedia databases, and time-series databases. Spatial data mining, also called spatial mining, is data mining applied to spatial data or spatial databases. Spatial data are data that have a spatial or location component, and they carry information that is more complex than classical data. A spatial database stores spatial data represented by spatial data types and the spatial relationships among the data. Spatial data mining encompasses various tasks, including spatial classification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules, and trend detection. This paper presents how spatial data mining is achieved using clustering.

  20. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Science.gov (United States)

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  1. Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices

    OpenAIRE

    Xun Yi; Md. Golam Kaosar

    2010-01-01

    In this paper a homomorphic privacy-preserving association rule mining algorithm is proposed which can be deployed in resource-constrained devices (RCD). Privacy-preserving exchange of itemset counts among distributed mining sites is a vital part of the association rule mining process. Existing cryptography-based privacy-preserving solutions consume a lot of computation because of the complex mathematical equations involved. Therefore privacy solutions involving less computation are extremely necessary t...

  2. ON MINING ENTREPRENEURSHIP IN BANOVINA REGION (CROATIA)

    Directory of Open Access Journals (Sweden)

    Berislav Šebečić

    2000-12-01

    Full Text Available Mining of iron, copper, and lead(-silver) ores in the Trgovska gora Mountain developed back in Illyrian and Roman times as well as in the Middle Ages and more recent times, whereas in the Petrova gora Mountain exploitation of iron ores and coal developed as late as the 19th and 20th centuries. In the Middle Ages and more recent times, Croatian nobility (the counts of Zrinski and Keglević), and later also foreign nobility or foreign and domestic mining associations, were given mining concessions. The mining enterprise in the Banovina Region passed to different owners and managers from the mid-19th century to the mid-20th century. During the Austro-Hungarian rule the main mining concession was owned by »Gewerkschaft der Eisenbergwerke und Huttenwerke Petrova gora zu Topusko«, or in its shorter version »Petrova gora Gewerkschaft«. The major mining entrepreneurs on the Trgovska gora Mountain at Bešlinac were Desire Gilain, Joseph Steinauer and Alois Frohm. After World War I and the confiscation of the properties of foreign mining associations and entrepreneurs, the Petrova gora Association of Mines and Foundry at Topusko, the Slavenska Bank Zagreb (until 1923), and the Iron Mine and Foundry Inc. at Topusko were constituted and went bankrupt rather quickly. After the bankruptcy of the National Industrial Enterprise Zagreb (1929), the Mining Association and (Iron) Foundry was founded at Bešlinac (1934). Also operating in the region of Banovina were the Kupa-Glina Mining Association (active also during the Austro-Hungarian rule), the Mineral Mining Association from Topusko, and the Iron Mine and Foundry Topusko-Vojnić Headquarters. All the mentioned associations and entrepreneurs were confiscated by the Federal People's Republic of Yugoslavia in 1946.

  3. Fuzzy association rules for biological data analysis: a case study on yeast

    OpenAIRE

    Cano Carlos; Garcia Fernando; Blanco Armando; Lopez Francisco J; Marin Antonio

    2008-01-01

    Abstract Background Last years' mapping of diverse genomes has generated huge amounts of biological data which are currently dispersed through many databases. Integration of the information available in the various databases is required to unveil possible associations relating already known data. Biological data are often imprecise and noisy. Fuzzy set theory is specially suitable to model imprecise data while association rules are very appropriate to integrate heterogeneous data. Results In ...

  4. Health Effects Associated with Inhalation of Airborne Arsenic Arising from Mining Operations

    Directory of Open Access Journals (Sweden)

    Rachael Martin

    2014-08-01

    Full Text Available Arsenic in dust and aerosol generated by mining, mineral processing and metallurgical extraction industries, is a serious threat to human populations throughout the world. Major sources of contamination include smelting operations, coal combustion, hard rock mining, as well as their associated waste products, including fly ash, mine wastes and tailings. The number of uncontained arsenic-rich mine waste sites throughout the world is of growing concern, as is the number of people at risk of exposure. Inhalation exposures to arsenic-bearing dusts and aerosol, in both occupational and environmental settings, have been definitively linked to increased systemic uptake, as well as carcinogenic and non-carcinogenic health outcomes. It is therefore becoming increasingly important to identify human populations and sensitive sub-populations at risk of exposure, and to better understand the modes of action for pulmonary arsenic toxicity and carcinogenesis. In this paper we explore the contribution of smelting, coal combustion, hard rock mining and their associated waste products to atmospheric arsenic. We also report on the current understanding of the health effects of inhaled arsenic, citing results from various toxicological, biomedical and epidemiological studies. This review is particularly aimed at those researchers engaged in the distinct, but complementary areas of arsenic research within the multidisciplinary field of medical geology.

  5. E-commerce Website Recommender System Based on Dissimilarity and Association Rule

    Directory of Open Access Journals (Sweden)

    MingWang Zhang

    2013-07-01

    Full Text Available Based on an analysis of current e-commerce recommendation algorithms, this paper puts forward a recommendation algorithm that uses dissimilarity clustering and association rules. The algorithm clusters shopping-website user data by dissimilarity and then applies an association rules algorithm to the clustering results to produce association recommendations. Experiments show that, compared with the traditional clustering association algorithm, the algorithm needs fewer iterations and improves operational efficiency; recommendations generated from actual user purchases provide evidence of its effectiveness.

  6. Algorithm of Intrusion Detection Based on Data Mining and Its Implementation

    Institute of Scientific and Technical Information of China (English)

    SUN Hai-bin; XU Liang-xian; CHEN Yan-hua

    2004-01-01

    Intrusion detection is regarded as a classification problem in the data mining field. However, instead of mining classification rules directly, class association rules are mined from audit logs and then used to construct a classifier. Some attributes in audit logs are important for detecting intrusions, but their values have a skewed distribution. A relative support concept is proposed to deal with this situation. To mine class association rules effectively, an algorithm based on the FP-tree is used. Experimental results show that this method has better performance.

  7. Short-term optimal operation of Three-gorge and Gezhouba cascade hydropower stations in non-flood season with operation rules from data mining

    International Nuclear Information System (INIS)

    Highlights: ► Short-term optimal operation of the Three-gorge and Gezhouba hydropower stations was studied. ► A key state variable and exact constraints were proposed to improve the numerical model. ► The proposed operation rules were applied in the population initiation step for faster optimization. ► A culture algorithm with differential evolution was selected as the optimization method. ► The proposed model and method were verified by a case study with feasible operation solutions. - Abstract: Information hidden in the characteristics and relationship data of cascade hydropower stations can be extracted by data-mining approaches to serve as operation rules and optimization support information. In this paper, with the Three-gorge and Gezhouba cascade hydropower stations as an example, two operation rules are proposed, motivated by the different operating efficiencies of the water turbines and by the tight water-volume and hydraulic relationship between the two stations. The rules are applied to improve the optimization model with more exact decision variables, state variables and constraints. They are also used in the population initiation step to develop better individuals, with a culture algorithm with differential evolution as the optimization method. In the case study, the total feasible population and the best solution based on an initial population generated with an operation rule can be obtained in a shorter computation time than with a purely randomly initiated population. The amount of electricity generated in a dispatch period with an operation rule also increases, with an average increase rate of 0.025%. For a fixed water discharge process of the Three-gorge hydropower station, there is a better rule for deciding the operation plan of the Gezhouba hydropower station, in which the total hydraulic head for electricity generation is optimized and distributed with inner-plant economic operation considered.

  8. Data Mining E-protokol - Applying data mining techniques on student absence

    OpenAIRE

    Shrestha, Amardip; Bro Lilleås, Lauge; Hansen, Asbjørn

    2014-01-01

    The scope of this project is to explore the possibilities of applying data mining techniques to discover new knowledge about student absenteeism in primary school. The research consists of analyzing a large dataset collected through the digital protocol system E-protokol. The data mining techniques used for the analysis include clustering, classification and association rule mining, which are applied using the machine learning toolset WEKA. The findings include a number of suggestions ...

  9. Building associations between markers of environmental stressors and adverse human health impacts using frequent itemset mining

    Science.gov (United States)

    The human-health impact of environmental contaminant exposures is unclear. While some exposure-effect relationships are well studied, health effects are unknown for the vast majority of the...

  10. NIOSH comments to DOL on the Mine Safety and Health Administration proposed rule on safety standards for underground coal mine ventilation by R. W. Niemeier, April 28, 1988

    Energy Technology Data Exchange (ETDEWEB)

    Niemeier, R.W.

    1988-04-28

    The testimony concerns data from NIOSH which indicate that the ambient carbon-monoxide (630080) levels in mines with diesel or electric equipment are almost always less than 10 parts per million (ppm) and that a carbon-monoxide system capable of detecting a 5ppm increase would be useful as an early warning system for fires. NIOSH would strongly recommend that this provision apply to all belt haulageways, trolley haulageways, or airways containing major electrical or mechanical installations being used to supply air to a work area in underground coal mines. Limiting the carbon-monoxide fire system only to mines using belt haulages for intake air would deprive other miners of an important safety benefit. Also discussed in the testimony are the continued use of electrical power for up to 30 minutes in the event of a mine fan shutdown, the relaxation of some of the requirements in the current standard, the mean entry air velocity, air quality, atmospheric monitoring systems, air quality detectors and air measurement devices, underground electrical installations, and belt conveyor entries.

  11. Metagenome-wide association studies: fine-mining the microbiome.

    Science.gov (United States)

    Wang, Jun; Jia, Huijue

    2016-08-01

    Metagenome-wide association studies (MWAS) have enabled the high-resolution investigation of associations between the human microbiome and several complex diseases, including type 2 diabetes, obesity, liver cirrhosis, colorectal cancer and rheumatoid arthritis. The associations that can be identified by MWAS are not limited to the identification of taxa that are more or less abundant, as is the case with taxonomic approaches, but additionally include the identification of microbial functions that are enriched or depleted. In this Review, we summarize recent findings from MWAS and discuss how these findings might inform the prevention, diagnosis and treatment of human disease in the future. Furthermore, we highlight the need to better characterize the biology of many of the bacteria that are found in the human microbiota as an essential step in understanding how bacterial strains that have been identified by MWAS are associated with disease. PMID:27396567

  12. Data Mining as Support to Knowledge Management in Marketing

    OpenAIRE

    Zekić-Sušac Marijana; Has Adela

    2015-01-01

    Background: Previous research has shown success of data mining methods in marketing. However, their integration in a knowledge management system is still not investigated enough. Objectives: The purpose of this paper is to suggest an integration of two data mining techniques: neural networks and association rules in marketing modeling that could serve as an input to knowledge management and produce better marketing decisions. Methods/Approach: Association rules and artificial neural networks ...

  13. Exploring the corresponding rules of "syndrome-symptom-prescription-drug" in depression by text mining

    Institute of Scientific and Technical Information of China (English)

    展俊平; 姜淼; 张彤; 郑光; 吕诚; 蔡峰; 杨静; 何晓鹃; 梁非; 吕爱平

    2012-01-01

    Background: To explore the rules governing the use of Chinese herbal medicine for depression under the syndrome framework of traditional Chinese medicine (TCM). Methods: A data set on depression was downloaded from the Chinese BioMedical literature database. A data-stratification algorithm based on frequency statistics of sensitive keywords was then used to mine the rules relating TCM syndromes, symptoms, decoctions and Chinese herbal medicines. The results are presented as frequency tables and two-dimensional networks. Results: In terms of TCM pattern, depression is mainly characterized by intermingled deficiency and excess. Among the internal organs, the liver is the primary one involved, together with the heart, spleen and kidney. The main syndromes are liver-qi stagnation and liver depression with spleen deficiency. The core symptoms are mental disturbances such as insomnia, low mood and pessimism. The decoctions and herbal medicines used are mainly tonifying ones that soothe the liver and relieve depression, strengthen the spleen, nourish the heart and calm the mind. Conclusions: Text mining, combined with literature backtracking and manual reading for noise reduction, is an important approach that can summarize fairly objectively the rules linking TCM syndromes and symptoms with prescriptions and Chinese herbal medicines.

  14. A Survey of latest Algorithms for Frequent Itemset Mining in Data Stream

    Directory of Open Access Journals (Sweden)

    U.Chandrasekhar

    2013-03-01

    Full Text Available Association rule mining and the discovery of frequent patterns in databases are long-standing topics. With the advent of Big Data, the need for stream mining has increased. This paper therefore surveys recent frequent pattern mining algorithms for data streams in order to understand the problems to be solved and the shortcomings and advantages of each algorithm relative to the others.

  15. Integrated assessment of the impacts associated with uranium mining and milling

    Energy Technology Data Exchange (ETDEWEB)

    Parzyck, D.C.; Baes, C.F. III; Berry, L.G.

    1979-07-01

    The occupational health and safety impacts are assessed for domestic underground mining, open pit mining, and milling. Public health impacts are calculated for a population of 53,000 located within 88 km (55 miles) of a typical southwestern uranium mill. The collective annual dose would be 6.5 man-lung rem/year, 89% of which is from 222Rn emitted from mill tailings. The dose to the United States population is estimated to be 6 x 10^4 man-lung rem from combined mining and milling operations. This may be compared with 5.7 x 10^5 man-lung rem from domestic use of natural gas and 4.4 x 10^7 man-lung rem from building interiors. Unavoidable adverse environmental impacts appear to be severe in a 250 ha area surrounding a mill site but negligible in the entire potentially impacted area (500,000 ha). The contemporary uranium resource and supply industry and its institutional settings are described in relation to the socio-economic impacts likely to emerge from high levels of uranium mining and milling. Radon and radon daughter monitoring techniques associated with uranium mining and milling are discussed.

  16. Integrated assessment of the impacts associated with uranium mining and milling

    International Nuclear Information System (INIS)

    The occupational health and safety impacts are assessed for domestic underground mining, open pit mining, and milling. Public health impacts are calculated for a population of 53,000 located within 88 km (55 miles) of a typical southwestern uranium mill. The collective annual dose would be 6.5 man-lung rem/year, 89% of which is from 222Rn emitted from mill tailings. The dose to the United States population is estimated to be 6 x 10^4 man-lung rem from combined mining and milling operations. This may be compared with 5.7 x 10^5 man-lung rem from domestic use of natural gas and 4.4 x 10^7 man-lung rem from building interiors. Unavoidable adverse environmental impacts appear to be severe in a 250 ha area surrounding a mill site but negligible in the entire potentially impacted area (500,000 ha). The contemporary uranium resource and supply industry and its institutional settings are described in relation to the socio-economic impacts likely to emerge from high levels of uranium mining and milling. Radon and radon daughter monitoring techniques associated with uranium mining and milling are discussed.

  17. Impact of gold mining associated with mercury contamination in soil, biota sediments and tailings in Kenya.

    Science.gov (United States)

    Odumo, Benjamin Okang'; Carbonell, Gregoria; Angeyo, Hudson Kalambuka; Patel, Jayanti Purshottam; Torrijos, Manuel; Rodríguez Martín, José Antonio

    2014-11-01

    This work considered the environmental impact of artisanal gold mining activity in the Migori-Transmara area (Kenya). From artisanal gold mining, mercury is released to the environment, thus contributing to degradation of soil and water bodies. High mercury contents have been quantified in soil (140 μg kg^-1), sediment (430 μg kg^-1) and tailings (8,900 μg kg^-1), as expected. The results reveal that the mechanism for transporting mercury to the terrestrial ecosystem is associated with wet and dry depositions. Mercury levels in lichens and mosses, used as bioindicators of pollution, are related to the proximity to mining areas: the further the distance from mining areas, the lower the mercury levels. This study also provides risk maps to evaluate potential negative repercussions. We conclude that the Migori-Transmara region can be considered a strongly polluted area with high mercury contents. The technology used to extract gold through amalgamation processes causes a high degree of mercury pollution around this gold mining area. Thus, alternative gold extraction methods should be considered to reduce the mercury levels that can be released to the environment.

  18. Association text classification of mining ItemSet significance

    Institute of Scientific and Technical Information of China (English)

    蔡金凤; 白清源

    2011-01-01

    To address the problem that the classifier-construction stage of association-rule classification algorithms considers only whether feature words are present and ignores the weights of text features, this paper proposes ISARC (ItemSet Significance-based ARC), an algorithm built on the association-rule-based text classification method ARC-BC that can improve the accuracy of associative text classification. The algorithm uses feature-item weights to define the significance of a k-itemset, generates association rules by mining significant itemsets, and takes into account the effect of lift on the text to be classified. Experimental results show that ISARC, by mining significant itemsets, can improve the accuracy of associative text classification. Text classification technology is an important basis of information retrieval and text mining, and its main task is to assign categories from a given category set. Text classification has a wide range of applications in natural language processing and understanding, information organization and management, information filtering and other areas. At present, text classification methods fall mainly into three groups: statistical methods, connection-based methods and rule-based methods. The basic idea of the traditional associative text classification algorithm, associative rule-based classifier by category (ARC-BC), is to use the association rule mining algorithm Apriori to generate frequent items, that is, feature items or itemsets that appear frequently. These frequent items are then used as rule antecedents, with the category as the rule consequent, to form a rule set, and the rules together constitute a classifier. When a test sample is classified, each rule whose antecedent matches the sample adds its confidence to the cumulative confidence counter of the rule's category. The sample is assigned to the category whose counter has the maximum cumulative confidence. However, the ARC-BC algorithm has two main drawbacks: (1) When constructing the classifier, it considers only the presence of feature words and ignores the weights of text features when mining frequent itemsets and generating association rules
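
    The ARC-BC classification step described above (matching rules add their confidence to a per-category counter, and the category with the largest total wins) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `classify`, the rule tuple layout and the toy rules are hypothetical, and the sketch covers only plain ARC-BC scoring, not the weighted ISARC variant.

```python
from collections import defaultdict

def classify(document_terms, rules):
    """Assign a category by accumulating the confidence of matching rules.

    rules is a list of (antecedent, category, confidence) tuples, where the
    antecedent is a set of feature terms (hypothetical layout for this sketch).
    Returns None when no rule matches the document.
    """
    terms = set(document_terms)
    scores = defaultdict(float)
    for antecedent, category, confidence in rules:
        # A rule matches when every term of its antecedent occurs in the document.
        if set(antecedent) <= terms:
            scores[category] += confidence
    if not scores:
        return None
    # The category with the highest cumulative confidence wins.
    return max(scores, key=scores.get)

# Toy rule set, invented purely for illustration.
example_rules = [
    ({"goal", "match"}, "sports", 0.9),
    ({"match"}, "sports", 0.6),
    ({"market"}, "finance", 0.8),
]

if __name__ == "__main__":
    print(classify(["goal", "match", "team"], example_rules))
```

    Summing confidences per category, rather than taking the single best rule, follows the description in the abstract; how ties are broken is not stated there, so this sketch simply returns whichever maximal category `max` encounters first.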

  19. DATA MINING TECHNIQUES: A SOURCE FOR CONSUMER BEHAVIOR ANALYSIS

    Directory of Open Access Journals (Sweden)

    Abhijit Raorane

    2011-09-01

    Full Text Available Various studies on consumer purchasing behaviors have been presented and used in real problems. Data mining techniques are expected to be a more effective tool for analyzing consumer behaviors. However, the data mining method has disadvantages as well as advantages. Therefore, it is important to select appropriate techniques to mine databases. The objective of this paper is to understand consumer behavior, the consumer's psychological condition at the time of purchase, and how a suitable data mining method can be applied to improve on the conventional method. Moreover, in an experiment, association rules are employed to mine rules for trusted customers using sales data from the supermarket industry.

  20. Applied data mining for business and industry

    CERN Document Server

    Giudici, Paolo

    2009-01-01

    The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies a...

  1. Air Pollution Monitoring & Tracking System Using Mobile Sensors and Analysis of Data Using Data Mining

    Directory of Open Access Journals (Sweden)

    Umesh M. Lanjewar, J. J. Shah

    2012-12-01

    Full Text Available This study proposes an air pollution monitoring system and the analysis of pollution data using the association rule data mining technique. Association rule data mining aims at finding association patterns among various parameters. In this paper, association rule mining is presented for finding association patterns among various air pollutants. For this, the Apriori algorithm of association rule data mining is used. Apriori is characterized as a level-by-level complete search algorithm. The algorithm is applied to data captured by gas sensors for CO, NO2 and SO2. As association rule mining can produce several sequence rules of contaminants, the proposed system design can enhance the reproducibility, reliability and selectivity of air pollution sensor output.
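
    The level-by-level search that characterizes Apriori can be sketched as below. This is a minimal illustration, not the system described in the abstract: the function name `apriori`, the absolute-count `min_support` parameter and the discretized sensor labels such as `CO_high` are assumptions made for the example, and the readings are invented.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions, searching one itemset size at a time."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates, keeping only
        # those whose (k-1)-subsets are all frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One pass over the data counts the support of every surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical discretized sensor readings, one set per sampling interval.
readings = [
    {"CO_high", "NO2_high", "SO2_high"},
    {"CO_high", "NO2_high"},
    {"CO_high", "SO2_low"},
    {"NO2_high", "SO2_high"},
    {"CO_high", "NO2_high", "SO2_high"},
]

if __name__ == "__main__":
    for itemset, count in apriori(readings, min_support=3).items():
        print(sorted(itemset), count)
```

    The pruning step is what keeps the search tractable: a candidate is counted only if all of its smaller subsets were already found frequent at the previous level.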

  2. The Relationship Rule of Data Mining

    Institute of Scientific and Technical Information of China (English)

    戴稳胜; 匡宏波; 谢邦昌

    2002-01-01

    The paper describes the classification of relationship rules, the standards for judging their effectiveness, and the procedure for implementing them on computers. It gives a full introduction to the relevant knowledge about relationship rules.

  3. Mining geographic episode association patterns of abnormal events in global earth science data

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Abnormal events in earth science have great influence on both the natural environment and human society. Finding association patterns among these events has great significance. Because data in earth science has the characteristics of mass, high dimension, spatial autocorrelation and time delay, existing mining technologies cannot be directly used on it. We propose an RSNN (range-based searching nearest neighbors) spatial clustering algorithm to reduce the data size and autocorrelation. Based on the clustered data, we propose a GEAM (geographic episode association pattern mining) algorithm, which can deal with event time lags and find interesting patterns with specific constraints, to mine the association patterns. We carried out experiments on global climate datasets and found many interesting association patterns. Some of the patterns coincide with known knowledge in climate science, which indicates the correctness and feasibility of our methods, and the others were previously unknown to us, which will give new information to this research field.

  4. Mining floating train data sequences for temporal association rules within a predictive maintenance framework

    OpenAIRE

    SAMMOURI, Wissam; COME, Etienne; OUKHELLOU, Latifa; Aknin, Patrice

    2013-01-01

    In order to meet the mounting social and economic demands, railway operators and manufacturers are striving for a longer availability and a better reliability of railway transportation systems. Commercial trains are being equipped with state-of-the-art onboard intelligent sensors monitoring various subsystems all over the train. These sensors provide real-time spatio-temporal data consisting of georeferenced timestamped events that tend sometimes to occur in bursts. Once ordered with respect ...

  5. Text Classification using the Concept of Association Rule of Data Mining

    OpenAIRE

    Rahman, Chowdhury Mofizur; Sohel, Ferdous Ahmed; Naushad, Parvez; Kamruzzaman, S. M.

    2010-01-01

    As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. In this paper we will discuss a procedure of...

  6. The Reviewer's Assistant: Recommending Topics to Writers by Association Rule Mining and Case-base Reasoning

    OpenAIRE

    Dong, Ruihai; Schaal, Markus; O'Mahony, Michael P.; Smyth, Barry

    2012-01-01

    Today, online reviews for products and services have become an important class of user-generated content and they play a valuable role for countless online businesses by helping to convert casual browsers into informed and satisfied buyers. As users gravitate towards sites that offer insightful and objective reviews, the ability to source helpful reviews from a community of users is increasingly important. In this extended abstract we describe the Reviewer’s Assistant, a case-based reasoning ...

  7. Association Rule Mining Based Extraction of Semantic Relations Using Markov Logic Network

    Directory of Open Access Journals (Sweden)

    K.Karthikeyan

    2014-10-01

    Full Text Available An ontology is a conceptualization of a website into a human-understandable yet machine-readable format consisting of entities, attributes, relationships and axioms. Ontologies formalize the intentional aspects of a site, whereas the denotative part is provided by a mental object that contains assertions about instances of concepts and relations. With semantic relations, it might be possible to extract the whole family tree of an outstanding personality using a resource like Wikipedia. In a way, relations describe the linguistic relationships among the entities involved, which is beneficial for a better understanding of human language. Relations can be identified from the result of concept hierarchy extraction. The existing ontology learning process produces only the result of concept hierarchy extraction; it does not produce the semantic relations between the concepts. Here, we construct the predicates and the first-order logic formulas, and we perform inference and weight learning using a Markov Logic Network. To improve the relations of every input and the relations between the contents, we propose the concept of ARSRE. This method can find the frequent items between concepts and convert existing lightweight ontologies into formal ones. The experimental results show good extraction of semantic relations compared to the state-of-the-art method.

  8. Leaf Associated Microbial Activities in a Stream Affected by Acid Mine Drainage

    Science.gov (United States)

    Schlief, Jeanette

    2004-11-01

    Microbial activity was assessed on birch leaves and plastic strips during 140 days of exposure at three sites in an acidic stream of the Lusatian post-mining landscape, Germany. The sites differed in their degrees of ochre deposition and acidification. The aim of the study was (1) to follow the microbial activities during leaf colonization, (2) to compare the effect of different environmental conditions on leaf associated microbial activities, and (3) to test the microbial availability of leaf litter in acidic mining waters. The activity peaked after 49 days and subsequently decreased gradually at all sites. The formation of iron plaques on leaf surfaces influenced the associated microbial activity. It seemed that these plaques inhibit the microbial availability of leaf litter and serve as a microbial habitat themselves.

  9. Redirect Technology Based on Interest Association Rules in NAT

    Institute of Scientific and Technical Information of China (English)

    叶鑫; 刘宏志; 安思

    2011-01-01

    Given the scarcity of IP address resources and the diversification of network attack methods, the requirements for network security in NAT environments keep rising. This paper studies security policy in a NAT environment and introduces the concept of intrusion redirection. Building on Apriori-based mining, it introduces an interest degree for association rules, mines intrusion behaviors in depth, and redirects threatening network accesses to a specific environment. Through interaction between the access-rule database and the firewall rules and IDS database, the approach makes network defense more proactive and, combined with NAT-based firewall technology, provides dual protection of the network.

  10. EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    D.Kerana Hanirex

    2011-03-01

    Full Text Available Association rules play an important role nowadays. The purchase of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. The algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on similarity measures between the transactions. It then finds the frequent itemsets directly from the transactions in the clusters using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency.

  11. A Meta-information-Based Method for Rough Sets Rule Parallel Mining

    Institute of Scientific and Technical Information of China (English)

    苏健; 高济

    2003-01-01

    Rough sets are an important method of data mining. Data mining processes such large quantities of data in large databases that the speed of the rough sets data mining algorithm is critical to a data mining system. Using network computing resources is an effective way to improve the performance of a data mining system. This paper proposes the concept of meta-information, which describes the result of rough sets data mining in an information system, and a meta-information-based method for parallel rule mining. The method decomposes the information system into many sub-information-systems, dispatches the tasks of generating the meta-information of each sub-information-system to task performers in the network so that they compute the meta-information in parallel, then synthesizes the meta-information of the sub-information-systems into the meta-information of the whole information system in the task synthesizer, and finally produces the rules from that meta-information.

  12. An assessment of microbial communities associated with surface mining-disturbed overburden.

    Science.gov (United States)

    Poncelet, Dominique M; Cavender, Nicole; Cutright, Teresa J; Senko, John M

    2014-03-01

    To assess the microbiological changes that occur during the maturation of overburden that has been disturbed by surface mining of coal, a surface mining-disturbed overburden unit in southeastern Ohio, USA was characterized. Overburden from the same unit that had been disturbed for 37 and 16 years were compared to undisturbed soil from the same region. Overburden and soil samples were collected as shallow subsurface cores from each subregion of the mined area (i.e., land 16 years and 37 years post-mining, and unmined land). Chemical and mineralogical characteristics of overburden samples were determined, as were microbial respiration rates. The composition of microbial communities associated with overburden and soil were determined using culture-independent, nucleic acid-based approaches. Chemical and mineralogical evaluation of overburden suggested that weathering of disturbed overburden gave rise to a setting with lower pH and more oxidized chemical constituents. Overburden-associated microbial biomass and respiration rates increased with time after overburden disturbance. Evaluation of 16S rRNA gene libraries that were produced by "next-generation" sequencing technology revealed that recently disturbed overburden contained an abundance of phylotypes attributable to sulfur-oxidizing Limnobacter spp., but with increasing time post-disturbance, overburden-associated microbial communities developed a structure similar to that of undisturbed soil, but retained characteristics of more recently disturbed overburden. Our results indicate that over time, the biogeochemical weathering of disturbed overburden leads to the development of geochemical conditions and microbial communities that approximate those of undisturbed soil, but that this transition is incomplete after 37 years of overburden maturation.

  13. The Stability of Memory Rules Associative with the Mathematical Thinking Core

    Directory of Open Access Journals (Sweden)

    Xiuzhen Wang

    2011-02-01

    Full Text Available Activation showing how and where arithmetic operations are represented in the brain has been observed in various number-processing tasks. However, it remains poorly understood whether stabilized memory of Boolean rules is associated with background knowledge. The present study reviewed behavioral and imaging evidence demonstrating that Boolean problem-solving abilities depend on the core systems of number processing. The core systems account for a mathematical cultural background, and serve as the foundation for sophisticated mathematical knowledge. The Ebbinghaus paradigm was used to investigate learning-induced changes by functional magnetic resonance imaging (fMRI) in a retrieval task of Boolean rules. Functional imaging data revealed a common activation pattern in the left inferior parietal lobule and left inferior frontal gyrus during all Boolean tasks, regions that have been implicated in number processing in former studies. All other regional activations were task-specific and prominently distributed in the left thalamus, bilateral parahippocampal gyrus, bilateral occipital lobe, and other subcortical regions when contrasting stabilized memory retrieval in Boolean tasks with number-processing tasks. The present results largely confirm previous studies suggesting that activation patterns due to number processing appear to reflect a basic anatomical substrate for the stability of memory for Boolean rules, which is derived from a network originally related to the core systems of number processing.

  14. Association Rule Analysis of the Shopping Basket Based on the Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    赵祖应; 丁勇; 邓平

    2012-01-01

    Data mining is a new discipline that evolved from the need to retrieve information from the immense amounts of data held in databases. It lies at the intersection of statistics, machine learning, database techniques, pattern recognition, artificial intelligence and related subjects. Competition in the IT job market is already intense, and data mining, the core technique in data processing, is gaining more and more attention. Association rules are commonly used to discover the relations between different items in transactional databases and, from these, the customers' purchasing behavior patterns, for example the influence that buying one kind of product has on buying other products. In supermarkets, these rules can be applied to product shelf design, to the placement of goods, and to the classification of customers according to their purchasing patterns. Discovering association rules makes it possible to better understand and track the development and trends of the underlying objects. Data mining plays an important role in marketing and business investment.
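
    The influence that buying one product has on buying another, as described above, is usually quantified with support, confidence and lift. The sketch below computes the three standard measures for a single rule; the function name `rule_metrics` and the toy baskets are assumptions made for illustration and are not taken from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = share of transactions containing both sides of the rule
    confidence = share of antecedent transactions that also contain the consequent
    lift       = confidence divided by the overall share of the consequent
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_a = sum(1 for t in baskets if antecedent <= t)
    n_c = sum(1 for t in baskets if consequent <= t)
    n_ac = sum(1 for t in baskets if (antecedent | consequent) <= t)

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Toy shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
    print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift below 1, as in this toy data, means that buying the antecedent makes the consequent slightly less likely than its baseline rate, which is why confidence alone can be misleading when deciding shelf placement.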

  15. Study on Association between Spatial Distribution of Metal Mines and Disease Mortality: A Case Study in Suxian District, South China

    Directory of Open Access Journals (Sweden)

    Wei Chen

    2013-10-01

    Metal mines release toxic substances into the environment and can therefore negatively impact the health of residents in nearby regions. This paper sought to investigate whether there was excess disease mortality in populations in the vicinity of the mining area in Suxian District, South China. The spatial distribution of metal mining and related activities from 1985 to 2012, which was derived from remote sensing imagery, was overlaid with disease mortality data. Three hotspot areas with high disease mortality were identified, around the Shizhuyuan mine sites, the Dengjiatang metal smelting sites, and the Xianxichong mine sites. Disease mortality decreased with the distance to the mining and smelting areas. Population exposure to pollution was estimated on the basis of distance from town of residence to pollution source. The risk of dying according to disease mortality rates was analyzed within 7–25 km buffers. The results suggested that there was a close relationship between the risk of disease mortality and proximity to the Suxian District mining industries. These associations were dependent on the type and scale of mining activities, the area influenced by mining and so on.

  16. DDMGD: the database of text-mined associations between genes methylated in diseases from different species

    KAUST Repository

    Raies, A. B.

    2014-11-14

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.

  17. Could parental rules play a role in the association between short sleep and obesity in young children?

    Science.gov (United States)

    Jones, Caroline H D; Pollard, Tessa M; Summerbell, Carolyn D; Ball, Helen

    2014-05-01

    Short sleep duration is associated with obesity in young children. This study develops the hypothesis that parental rules play a role in this association. Participants were 3-year-old children and their parents, recruited at nursery schools in socioeconomically deprived and non-deprived areas of a North-East England town. Parents were interviewed to assess their use of sleep, television-viewing and dietary rules, and given diaries to document their child's sleep for 4 days/5 nights. Children were measured for height, weight, waist circumference and triceps and subscapular skinfold thicknesses. One-hundred and eight families participated (84 with complete sleep data and 96 with complete body composition data). Parental rules were significantly associated together, were associated with longer night-time sleep and were more prevalent in the non-deprived-area compared with the deprived-area group. Television-viewing and dietary rules were associated with leaner body composition. Parental rules may in part confound the association between night-time sleep duration and obesity in young children, as rules cluster together across behavioural domains and are associated with both sleep duration and body composition. This hypothesis should be tested rigorously in large representative samples.

  18. An investigation of the factors associated with interpretation of mine atmosphere for spontaneous combustion in coal mines

    OpenAIRE

    Adamus, Alois; Šancer, Jindřich; Guřanová, Pavla; Zubíček, Václav

    2011-01-01

    The risk of spontaneous combustion of coal is very serious, especially in gassy underground coal mines. In many cases such spontaneous combustion ignites an explosive methane mixture, with tragic consequences. Early indication of spontaneous combustion and determination of its seat temperature is, in such an environment, a key part of the safety of underground coal mines. A commonly used method for the detection of spontaneous combustion is an interpretation of coal oxidation ...

  19. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition discusses both theoretical foundation and practical applications of data mining in a web field including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data type, data category and applications of data mining. The second chapter briefly reviews data visualization technology and importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithm for sample covariants are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithm is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  20. A Descriptive Framework for the Multidimensional Medical Data Mining and Representation

    Directory of Open Access Journals (Sweden)

    Veeramalai Sankaradass

    2011-01-01

    Problem statement: Association rule mining with fuzzy logic has been explored by researchers for effective data mining and classification. Approach: It is used to find all the rules existing in the transactional database that satisfy some minimum support and minimum confidence constraints. Results: In this study, we propose a new rule mining technique using fuzzy logic for mining medical data in order to understand and better serve the needs of multidimensional breast cancer data applications. Conclusion: The main objective of multidimensional medical data mining is to provide the end user with more useful and interesting patterns. Therefore, the main contribution of this study is the proposal and implementation of a fuzzy temporal association rule mining algorithm to classify and detect breast cancer from the dataset.

  1. Software multi-fault localization technology based on association rules

    Institute of Scientific and Technical Information of China (English)

    张泽林; 赵洋

    2015-01-01

    In order to improve the efficiency of software fault localization, a software multi-fault localization technology based on association rules is proposed in this paper. With a clustering method, the failed test cases are sorted into clusters that each target a specific fault, and then the crosstab-based software fault localization method is used to find the software faults. During localization, association rules are adopted to mine the relationship between highly suspicious code and software failures to improve the efficiency of fault localization. Finally, the proposed method and the Tarantula method are compared on the Siemens test suite. The experimental results show that the multi-fault localization technology based on association rules is more efficient than the Tarantula method.

  2. Data Mining for Quality Prediction in Textile Engineering

    Institute of Scientific and Technical Information of China (English)

    YANG Jian-guo; LI Bei-zhi; ZHAO Ya-mei

    2006-01-01

    A data mining method for quality prediction using association rules (DMAR) is presented in this paper. Association rules are used to mine the valuable relations among items in large amounts of textile process data for the ANN prediction model. DMAR consists of three main steps: setting up the knowledge data set; data cleaning and conversion; and finding the item sets with large support and generating the expected rules. DMAR effectively improves the precision of yarn breakage prediction. It rapidly removes the negative influence of training parameters on the prediction model, so a more satisfactory quality prediction result can be reached.

  3. Gaseous Oxidized Mercury Flux from Substrates Associated with Industrial Scale Gold Mining in Nevada, USA

    Science.gov (United States)

    Miller, M. B.

    2015-12-01

    Gaseous elemental and oxidized mercury (Hg) fluxes were measured in a laboratory setting from substrate materials derived from industrial-scale open pit gold mining operations in Nevada, USA. Mercury is present in these substrates at a range of concentrations (10 - 40000 ng g-1), predominantly of local geogenic origin in association with the mineralized gold ores, but altered and redistributed to a varying degree by subsequent ore extraction and processing operations, including deposition of Hg recently emitted to the atmosphere from large point sources on the mines. Waste rock, heap leach, and tailings material usually comprise the most extensive and Hg emission relevant substrate surfaces. All three of these material types were collected from active Nevada mine sites in 2010 for previous research, and have since been stored undisturbed at the University of Nevada, Reno. Gaseous elemental Hg (GEM) flux was previously measured from these materials under a variety of conditions, and was re-measured in this study, using Teflon® flux chambers and Tekran® 2537A automated ambient air analyzers. GEM flux from dry undisturbed materials was comparable between the two measurement periods. Gaseous oxidized Hg (GOM) flux from these materials was quantified using an active filter sampling method that consisted of polysulfone cation-exchange membranes deployed in conjunction with the GEM flux apparatus. Initial measurements conducted within greenhouse laboratory space indicate that in dry conditions GOM is deposited to relatively low Hg cap and leach materials, but may be emitted from the much higher Hg concentration tailings material.

  4. DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

    Science.gov (United States)

    Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

    2016-01-01

    The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases. PMID:27073839

  6. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine) and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems, and to the diagnosis of brain glioma.

  7. Enterprise Human Resources Information Mining Based on Improved Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Lei He

    2013-05-01

    With the continuing development of information technology, enterprises' demand for mining human resources information keeps growing. Based on the current state of enterprise human resources information mining, this paper puts forward a model built on an improved Apriori algorithm. The model combines data mining technology with the traditional Apriori algorithm and improves on it: the association rule mining task of the original algorithm is divided into two subtasks, producing frequent item sets and producing rules; SQL is used to generate the frequent item sets directly, and a chart-based method is used to extract the information that interests customers. The experimental results show that the model based on the improved Apriori algorithm is more efficient than the original algorithm, and the practical application test results show that the improved algorithm is practical and effective.
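
    The split into two subtasks described above can be sketched in plain Python. This is a minimal illustration of the textbook level-wise Apriori search followed by rule generation, not the paper's SQL-based implementation; the function names `frequent_itemsets` and `rules` and their parameters are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Subtask 1: level-wise (Apriori) search for frequent item sets.

    Returns a dict mapping each frequent frozenset to its support.
    """
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(c <= t for t in tx) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Subtask 2: derive rules (antecedent -> consequent) from frequent sets."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                # Every subset of a frequent set is itself frequent,
                # so its support is already in the dict.
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, supp, conf))
    return out
```

    Keeping the two subtasks as separate functions mirrors the division in the abstract: the first function could be replaced by a database query that counts item sets, while the second only needs the resulting support table.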

  8. Classification Rule Mining Based on Improved Ant-miner Algorithm

    Institute of Scientific and Technical Information of China (English)

    肖菁; 梁燕辉

    2012-01-01

    In order to improve the prediction accuracy of the classification rules of the classical Ant-miner algorithm, this paper proposes an improved Ant-miner algorithm for classification rule mining. A heuristic function based on sample density and sample proportion is constructed to avoid the bias that arises in Ant-miner when several choices have the same probability. A pruning strategy with mutation probability is employed to expand the search space and improve the rule accuracy. An evaporation coefficient is added to Ant-miner's pheromone update formula to bring it closer to the foraging behavior of real ants and to prevent premature convergence of the algorithm. Experimental results on UCI datasets show that the proposed algorithm is promising and can obtain higher prediction accuracy than the original Ant-miner algorithm.

  9. The speed of learning instructed stimulus-response association rules in human: experimental data and model.

    Science.gov (United States)

    Bugmann, Guido; Goslin, Jeremy; Duchamp-Viret, Patricia

    2013-11-01

    Humans can learn associations between visual stimuli and motor responses from just a single instruction. This is known to be a fast process, but how fast is it? To answer this question, we asked participants to learn a briefly presented (200ms) stimulus-response rule, which they then had to rapidly apply after a variable delay of between 50 and 1300ms. Participants showed a longer response time with increased variability for short delays. The error rate was low and did not vary with the delay, showing that participants were able to encode the rule correctly in less than 250ms. This time is close to the fastest synaptic learning speed deemed possible by diffusive influx of AMPA receptors. Learning continued at a slower pace in the delay period and was fully completed on average 900ms after rule presentation onset, when response latencies dropped to levels consistent with basic reaction times. A neural model was proposed that explains the reduction of response times and of their variability with the delay by (i) a random synaptic learning process that generates weights of average values increasing with the learning time, followed by (ii) random crossing of the firing threshold by a leaky integrate-and-fire neuron model, and (iii) assuming that the behavioural response is initiated when all neurons in a pool of m neurons have fired their first spike after input onset. Values of m=2 or 3 were consistent with the experimental data. The proposed model is the simplest solution consistent with neurophysiological knowledge. Additional experiments are suggested to test the hypothesis underlying the model and also to explore forgetting effects for which there were indications for the longer delay conditions. This article is part of a Special Issue entitled Neural Coding 2012.

  10. Intrusion detection: a novel approach that combines boosting genetic fuzzy classifier and data mining techniques

    Science.gov (United States)

    Ozyer, Tansel; Alhajj, Reda; Barker, Ken

    2005-03-01

    This paper proposes an intelligent intrusion detection system (IDS), an integrated approach that employs fuzziness and two well-known data mining techniques, namely classification and association rule mining. Using these two techniques, we adopted the idea of iterative rule learning, which extracts rules from the data set. Our final intention is to predict different behaviors in networked computers. To achieve this, we propose to use a fuzzy rule based genetic classifier. Our approach has two main stages. First, fuzzy association rule mining is applied and a large number of candidate rules are generated for each class. Then the rules pass through a pre-screening mechanism in order to reduce the fuzzy rule search space. Candidate rules obtained after pre-screening are used in the genetic fuzzy classifier to generate rules for the specified classes. Classes are defined as Normal, PRB-probe, DOS-denial of service, U2R-user to root and R2L-remote to local. Second, an iterative rule learning mechanism is employed for each class to find the fuzzy rules required to classify data; each time, a fuzzy rule is extracted and included in the system. A boosting mechanism evaluates the weight of each data item in order to help the rule extraction mechanism focus more on data having relatively higher weight. Finally, extracted fuzzy rules having the corresponding weight values are aggregated on a class basis to find the vote of each class label for each data item.

  11. A Case Investigation of Product Structure Complexity in Mass Customization Using a Data Mining Approach

    DEFF Research Database (Denmark)

    Nielsen, Peter; Brunø, Thomas Ditlev; Nielsen, Kjeld

    2014-01-01

    This paper presents a data mining method for analyzing historical configuration data providing a number of opportunities for improving mass customization capabilities. The overall objective of this paper is to investigate how specific quantitative analyses, more specifically the association rule...

  12. Research on Product Family Configuration Based on Multidimensional Association Rules

    Institute of Scientific and Technical Information of China (English)

    罗妤; 郭钢; 徐建萍

    2011-01-01

    A product configuration method based on multidimensional association rules is introduced, aimed at the choice of optional parts in the configuration of large complex products. A component-constraint-rule data warehouse (CCRDW) is built from the information in the product family BOM (bill of material) structure. According to the customer requirement parameters, a data cube is established from the data warehouse, and by applying a multidimensional association rule mining algorithm to the data cube, the appropriate components in the materials store are found. The configuration method for a gear box is presented as an example. The method has advantages in enhancing configuration efficiency and the reusability of components.

  13. A Knowledge Mining Model for Ranking Institutions using Rough Computing with Ordering Rules and Formal Concept Analysis

    Directory of Open Access Journals (Sweden)

    D P Acharjya

    2011-03-01

    The emergence of computers and the information technology revolution brought tremendous changes to the real world and provided a new dimension for intelligent data analysis. It is well established that the right information at the right time and in the right place yields better knowledge. However, a challenge arises when a large volume of inconsistent data is given for decision making and knowledge extraction. To handle such imprecise data, researchers have in the recent past developed several important mathematical tools, namely fuzzy sets, intuitionistic fuzzy sets, rough sets, formal concept analysis and ordering rules. It is also observed that many information systems contain numerical attribute values, which are therefore almost similar rather than exactly similar. To handle this type of information system, in this paper we use two processes, a pre-process and a post-process. In the pre-process we use rough sets on intuitionistic fuzzy approximation spaces with ordering rules to find the knowledge, whereas in the post-process we use formal concept analysis to explore better knowledge and the vital factors affecting decisions.

  14. Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the automated assignment of natural language texts to predefined categories based on their content. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Nowadays the demand for text classification is increasing tremendously. Keeping this demand in consideration, new and updated techniques are being developed for the purpose of automated text classification. This paper presents a new algorithm for text classification. Instead of using words, word relations, i.e. association rules, are used to derive the feature set from pre-classified text documents. The concept of the Naive Bayes Classifier is then used on the derived features and finally a concept of Genetic Algorithm has been added for final classification. A system based on the proposed algorithm has been implemented and tested. The experimental ...

  15. Seismic failure mechanisms for loaded slopes with associated and nonassociated flow rules

    Institute of Scientific and Technical Information of China (English)

    YANG Xiao-li; SUI Zhi-rong

    2008-01-01

    Seismic failure mechanisms were investigated for soil slopes subjected to strip load with the upper bound method of limit analysis and the finite difference method of numerical simulation, considering the influence of associated and nonassociated flow rules. Quasi-static representation of soil inertia effects using a seismic coefficient concept was adopted for seismic failure analysis. A numerical study was conducted to investigate the influences of dilation angle and earthquake on the seismic failure mechanisms for the loaded slope, and the failure mechanisms for different dilation angles were compared. The results show that dilation angle has influences on the seismic failure surfaces, that seismic maximum displacement vector decreases as the dilation angle increases, and that seismic maximum shear strain rate decreases as the dilation angle increases.

  16. Radio-Ecological Situation in the Area of the Priargun Production Mining and Chemical Association - 13522

    Energy Technology Data Exchange (ETDEWEB)

    Semenova, M.P.; Seregin, V.A.; Kiselev, S.M.; Titov, A.V. [FSBI SRC A.I. Burnasyan Federal Medical Biophysical Center of FMBA of Russia, Zhivopisnaya Street, 46, Moscow (Russian Federation); Zhuravleva, L.A. [FSHE 'Centre of Hygiene and Epidemiology no. 107' under FMBA of Russia (Russian Federation)]; Marenny, A.M. [Ltd 'Radiation and Environmental Researches' (Russian Federation)]

    2013-07-01

    'The Priargun Production Mining and Chemical Association' (hereinafter referred to as PPMCA) is a diversified mining company which, in addition to underground mining of uranium ore, carries out refining of such ores in hydrometallurgical process to produce natural uranium oxide. The PPMCA facilities are sources of radiation and chemical contamination of the environment in the areas of their location. In order to establish the strategy and develop criteria for the site remediation, independent radiation hygienic monitoring is being carried out over some years. In particular, this monitoring includes determination of concentration of the main dose-forming nuclides in the environmental media. The subjects of research include: soil, grass and local foodstuff (milk and potato), as well as media of open ponds (water, bottom sediments, water vegetation). We also measured the radon activity concentration inside surface workshops and auxiliaries. We determined the specific activity of the following natural radionuclides: U-238, Th-232, K-40, Ra-226. The researches performed showed that in soil, vegetation, groundwater and local foods sampled in the vicinity of the uranium mines, there is a significant excess of 226Ra and 232Th content compared to areas outside the zone of influence of uranium mining. The ecological and hygienic situation is as follows: - at health protection zone (HPZ) gamma dose rate outdoors varies within 0.11 to 5.4 μSv/h (The mean value in the reference (background) settlement (Soktui-Molozan village) is 0.14 μSv/h); - gamma dose rate in workshops within HPZ varies over the range 0.14 - 4.3 μSv/h. - the specific activity of natural radionuclides in soil at HPZ reaches 12800 Bq/kg and 510 Bq/kg for Ra-226 and Th-232, respectively. - beyond HPZ the elevated values for 226Ra have been registered near Lantsovo Lake - 430 Bq/kg; - the radon activity concentration in workshops within HPZ varies over the range 22 - 10800 Bq

  17. Radio-Ecological Situation in the Area of the Priargun Production Mining and Chemical Association - 13522

    International Nuclear Information System (INIS)

    'The Priargun Production Mining and Chemical Association' (hereinafter referred to as PPMCA) is a diversified mining company which, in addition to underground mining of uranium ore, carries out refining of such ores in hydrometallurgical process to produce natural uranium oxide. The PPMCA facilities are sources of radiation and chemical contamination of the environment in the areas of their location. In order to establish the strategy and develop criteria for the site remediation, independent radiation hygienic monitoring is being carried out over some years. In particular, this monitoring includes determination of concentration of the main dose-forming nuclides in the environmental media. The subjects of research include: soil, grass and local foodstuff (milk and potato), as well as media of open ponds (water, bottom sediments, water vegetation). We also measured the radon activity concentration inside surface workshops and auxiliaries. We determined the specific activity of the following natural radionuclides: U-238, Th-232, K-40, Ra-226. The researches performed showed that in soil, vegetation, groundwater and local foods sampled in the vicinity of the uranium mines, there is a significant excess of 226Ra and 232Th content compared to areas outside the zone of influence of uranium mining. The ecological and hygienic situation is as follows: - at health protection zone (HPZ) gamma dose rate outdoors varies within 0.11 to 5.4 μSv/h (The mean value in the reference (background) settlement (Soktui-Molozan village) is 0.14 μSv/h); - gamma dose rate in workshops within HPZ varies over the range 0.14 - 4.3 μSv/h. - the specific activity of natural radionuclides in soil at HPZ reaches 12800 Bq/kg and 510 Bq/kg for Ra-226 and Th-232, respectively. - beyond HPZ the elevated values for 226Ra have been registered near Lantsovo Lake - 430 Bq/kg; - the radon activity concentration in workshops within HPZ varies over the range 22 - 10800 Bq/m3. The seasonal dependence of

  18. Prevalence and factors associated with obesity amongst employees of open-cast diamond mine in Namibia

    Directory of Open Access Journals (Sweden)

    Desderius Haufiku

    2015-09-01

    Full Text Available The study investigated the prevalence and factors associated with obesity amongst employees of Pocket Beaches mine. Obesity rates are increasing at an alarming rate worldwide; 1.2 billion people worldwide are overweight, of which 300 million are clinically obese. Of concern is that obesity is a risk factor for many diseases, including hypertension, diabetes and some forms of cancer. Although several mine workers who report to occupational health services for minor ailments are found to be overweight or obese, we are not certain about the extent of the problem. The health risk associated with obesity could cause a big loss to NAMDEB in terms of care cost, low productivity and absenteeism. The aim of this study was to investigate the prevalence and determinants of obesity amongst NAMDEB employees working at Pocket Beaches diamond mine. A descriptive, cross-sectional study measured the prevalence of obesity and described the factors that are associated with obesity and overweight. Study population: NAMDEB employees who were working at Pocket Beaches mine. A simple random sampling technique was used to select participants. Eighty-seven employees were selected from 188 total NAMDEB employees working at Pocket Beaches mine. Data was collected through interviews. Anthropometric measurements, namely weight, height and abdominal circumference, were collected using a standard protocol. Data was analyzed using Epi Info 2002. Body Mass Index (BMI) was calculated as kg/m2. Overweight was defined as BMI = 25 to 29.9 kg/m2 and obesity as BMI ≥ 30 kg/m2. Waist circumference ≥80 cm was used to identify central obesity in women and ≥90 cm in men. The frequency of participation in physical activity, barriers to physical activity and food consumption is reported in percent and means. The study found a prevalence of 42% overweight and 32% obesity among employees of NAMDEB. A significant number of participants (48%) never participate in moderate
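
    The cut-offs quoted in this abstract can be written out as a short Python sketch. The function names and the sample measurements are illustrative assumptions, not taken from the study; values between 29.9 and 30 kg/m2, which the abstract does not address, are treated here as overweight.

```python
def classify_bmi(weight_kg, height_m):
    """Return (BMI, category) using the cut-offs quoted in the abstract."""
    bmi = weight_kg / height_m ** 2  # BMI in kg/m2
    if bmi >= 30:
        category = "obese"
    elif bmi >= 25:
        category = "overweight"
    else:
        category = "not overweight"
    return bmi, category


def has_central_obesity(waist_cm, sex):
    """Waist circumference >= 80 cm (women) or >= 90 cm (men)."""
    threshold = 80 if sex == "female" else 90
    return waist_cm >= threshold


# Hypothetical measurements, for illustration only.
print(classify_bmi(90, 1.80))
print(has_central_obesity(85, "female"))
```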

  19. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    Directory of Open Access Journals (Sweden)

    Joanna F Dipnall

    Full Text Available Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling

  20. Hierarchical Approach for Online Mining--Emphasis towards Software Metrics

    CERN Document Server

    Saradhi, M V Vijaya; Satish, P

    2010-01-01

    Several multi-pass algorithms have been proposed for Association Rule Mining from static repositories. However, such algorithms are incapable of online processing of transaction streams. In this paper we introduce an efficient single-pass algorithm for mining association rules, given a hierarchical classification amongst items. Processing efficiency is achieved by utilizing two optimizations, hierarchy-aware counting and transaction reduction, which become possible in the context of hierarchical classification. This paper considers the problem of integrating constraints that are Boolean expressions over the presence or absence of items into the association discovery algorithm. This paper presents three integrated algorithms for mining association rules with item constraints and discusses their tradeoffs. It is concluded that the variation of complexity depends on the measure of DIT (Depth of Inheritance Tree) and NOC (Number of Children) in the context of Hierarchical Classification.
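
    One way to picture counting over an item hierarchy in a single pass is sketched below in Python. This is not the authors' algorithm: the taxonomy, the item names and the choice to extend each transaction with its ancestors are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy: child -> parent.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item, parent):
    """Return all ancestors of an item, nearest first."""
    out = []
    while item in parent:
        item = parent[item]
        out.append(item)
    return out


def extend(transaction, parent):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def count_pairs(stream, parent):
    """Single pass over the stream, counting pairs at all hierarchy levels."""
    counts = Counter()
    for transaction in stream:
        items = sorted(extend(transaction, parent))
        for a, b in combinations(items, 2):
            # An item and its own ancestor always co-occur, so skip the pair.
            if b in ancestors(a, parent) or a in ancestors(b, parent):
                continue
            counts[(a, b)] += 1
    return counts


stream = [{"jacket", "shirt"}, {"ski pants", "shirt"}]
print(count_pairs(stream, PARENT))
```

    In this toy stream the pair (outerwear, shirt) is seen twice even though no single leaf-level pair repeats, which is the kind of pattern a hierarchy makes visible.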

  1. Extract Knowledge and Association Rule from Free Log Data using an Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Hemant N. Randhir

    2013-09-01

    Full Text Available This paper aims to present a technique for making private log information public and for applying the Apriori algorithm to the collected log files, in order to extract knowledge from public and free log files with Web usage mining techniques.
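
    For readers unfamiliar with Apriori, a minimal Python sketch of the level-wise search on web sessions follows. The session data, page names and threshold are made up for illustration, and the code is a textbook-style version rather than the implementation used in the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


# Hypothetical sessions: the set of pages requested in each visit.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
for itemset, sup in apriori(sessions, min_support=0.5).items():
    print(sorted(itemset), sup)
```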

  2. Research of the Occupational Psychological Impact Factors Based on the Frequent Item Mining of the Transactional Database

    Directory of Open Access Journals (Sweden)

    Cheng Dongmei

    2015-01-01

    Full Text Available Based on an extensive review of the data mining and association rule mining literature, this paper starts from compressing the transactional database and proposes a frequent complementary item storage structure for it. Building on this analysis, the paper also studies an association rule mining algorithm based on that storage structure. Finally, the paper applies this mining algorithm in the test results analysis module of a team psychological health assessment system and extracts the relationships between the psychological impact factors, so as to provide guidance for psychologists in treating mental illness.

  3. Pollution of the stream waters and sediments associated with the Crucea uranium mine (East Carpathians, Romania)

    Science.gov (United States)

    Petrescu, L.; Bilal, E.; Iatan, E. L.

    2009-04-01

    standards limits. The uranium concentration ranged from 0.016 mg·L⁻¹ to 1.43 mg·L⁻¹, with a mean of 0.365 mg·L⁻¹. A remarkably good correlation exists between dissolved U and the total anion concentrations, indicating that uranium in these stream waters derived mainly from oxidation of uraniferous bitumen and/or dissolution of carbonates. Based on the correlation (r = 0.69) between U and the sum of the major cations Ca + Mg + K + Na and the linear correlation (r = 0.70) between U and silica, we find silicate weathering to be an additional source of soluble uranium. The concentrations of dissolved Th are quite low, with median values of 0.015 mg·L⁻¹. The linear variation of dissolved thorium concentration with carbonate alkalinity (r = 0.86) strongly suggests that these concentrations are due to the increased alkalinity. The release of metals (U, Th and Pb) is amplified by mining activities. The pollution degree of the sediments was classified using the index of geo-accumulation (Igeo). The Igeo of U, Th and Pb presents medium values and localized high values that correspond to sediments classified as strongly to extremely polluted (Igeo > 6), while the rest of the elements present concentrations close to or lower than the background values. 71% of uranium from bottom sediments is present as primary fractions and 21% is associated with carbonates. Thorium proved even more insoluble (94% in primary fractions). In view of the substantial mobility and bioavailability of the fractions, this is not an alarming feature. Although neither U nor Th has an appreciable "exchangeable" fraction, the isolation of specific U- and Th-rich sediment fractions helped to identify connections between bioavailability and genesis of sediments, which control ecosystem cycling of U and Th. The measurements carried out in the surroundings of a local uranium mine show that the impact of Crucea mine on water quality downstream of mining area is insignificant.
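
    The abstract does not state how Igeo was computed. The Python sketch below uses the commonly cited definition, Igeo = log2(Cn / (1.5 x Bn)), where Cn is the measured concentration and Bn the geochemical background; that the authors used exactly this form is an assumption, and the numbers are hypothetical.

```python
import math


def igeo(concentration, background):
    """Index of geo-accumulation in its commonly used form.

    The factor 1.5 allows for natural variation in the background value.
    """
    return math.log2(concentration / (1.5 * background))


# Hypothetical values in the same units (e.g. mg/kg): a sediment with
# twelve times the background concentration.
print(igeo(12.0, 1.0))
```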

  4. Mercury contamination associated with small-scale gold mining in Tanzania and Zimbabwe.

    Science.gov (United States)

    van Straaten, P

    2000-10-01

    Mercury contamination associated with small-scale gold mining and processing represents a major environmental and human health concern in Eastern and Southern Africa. Approximately 200,000-300,000 persons are involved in small-scale gold mining activities in Tanzania and > 200,000 persons in Zimbabwe. Mercury (Hg) is used mainly for the processing of primary gold quartz veins and supergene gold mineralizations. Gravimetric material flow analyses show that 70-80% of the Hg is lost to the atmosphere during processing, 20-30% are lost to tailings, soils, stream sediments and water. For every 1 g Au produced, 1.2-1.5 g Hg are lost to the environment. Cumulatively, the anthropogenic Hg released annually into the atmosphere is approximately 3-4 t in the whole Lake Victoria Goldfields of Tanzania and > 3 t in Zimbabwe. Tailings are local 'hot spots' with high concentrations of As, Pb, Cu and Hg. Lateral and vertical dispersion of Hg lost to soils and stream sediments is very limited (laterally Dispersion of mercury from tailings is low because Hg is transported largely in the elemental, metallic form. In addition, Fe-oxide rich laterites and swamps appear to be natural barriers for the dispersion of metals in soils and streams. Ground and surface water quality data indicate very low dispersion rates during the dry season.

  5. Web Log Mining using Improved Version of Proposed Algorithm

    OpenAIRE

    Manish Shrivastava; Kapil Sharma; Angad Singh

    2011-01-01

    Association rule mining is one of the most important and popular data mining techniques. It extracts interesting correlations, frequent patterns and associations among sets of items in transaction databases or other data repositories. Most existing algorithms require multiple passes over the database to discover frequent patterns, resulting in a large number of disk reads and placing a huge burden on the input/output subsystem. In order to reduce repetitive disk reads, a novel met...

  6. Treatment Rules of Sinomenium Acutum by Text Mining

    Institute of Scientific and Technical Information of China (English)

    李雨彦; 郑光; 刘良

    2015-01-01

    Objective: The study summarized the treatment rules of Sinomenium acutum (Menispermaceae, SA) using text mining techniques. Methods: All literature involving SA was retrieved and downloaded from the Chinese Biomedical Literature (CBM) Database. Structured query language was used for data processing and data stratification, with cleaning, noise reduction and keyword frequency statistics. The algorithm was used to analyze the basic rules of diseases treated, symptoms, TCM patterns, TCM herb compatibility and combination with Chinese patent medicines, Western drugs, decoctions and acupuncture, and the rules were visualized. Results: Sinomenium acutum was mainly used to treat diseases with symptoms such as ache, swelling, stiffness and malformation. Wind, cold, wetness, heat, sputum, stasis and deficiency were the main etiology and pathology. The diseases treated were mainly rheumatoid arthritis, along with various other rheumatic diseases, chronic nephritis, hepatitis and arrhythmia. Sinomenium acutum was always used in combination with herbs with the functions of dispelling wind and eliminating dampness, nourishing the blood and promoting blood circulation, dredging collaterals, warming meridians and nourishing the kidney. In addition, it was often combined with immunomodulating and collateral-dredging drugs such as tripterygium glycosides and Huoluo pills. Conclusion: By text mining we summarized the treatment rules of Sinomenium acutum in a systematic, comprehensive and precise way, providing a literature basis for future clinical application and drug research.

  7. Characterization and resource recovery potential of precipitates associated with abandoned coal mine drainage

    Energy Technology Data Exchange (ETDEWEB)

    Kairies, C.L.; Watzlaf, G.R.; Hedin, R.S.; Capo, R.C. [University of Pittsburgh, Pittsburgh, PA (United States). Dept. of Geology and Planetary Science

    2001-07-01

    Sludge samples from untreated and passively treated coal mine drainage discharges were characterized using NAA, ICP-AES, XRD and SEM. Iron content ranges from 25 to 68 dry wt%, and goethite is the dominant mineral (40-90 dry wt%). The majority of particles have a spiky spherical morphology (0.5-2.0 μm diameter). Within several passive treatment systems, iron content remains relatively constant, and concentrations of Mn, Co, Ni and Zn increase, while As concentrations decrease. Initial findings indicate that some sludges are suitable for industrial and manufacturing uses although high concentrations of trace elements such as As may prevent use in cosmetics or foods. These associations could be related to the depositional environment of the coal seam from which the discharge originates. Subsurface cation exchange and sorption processes can influence the trace elements that accumulate in the sludge. 5 refs., 1 tab.

  8. Associated rules between microstructure characterization parameters and contact characteristic parameters of two cylinders

    Institute of Scientific and Technical Information of China (English)

    周炜; 唐进元; 何艳飞; 廖东日

    2015-01-01

    The contact strength calculation of two curved rough surfaces is a forefront issue of Hertz contact theory and method. Associated rules between rough surface characterization parameters (correlation length and root mean square deviation) and contact characteristic parameters (contact area, maximum contact pressure, contact number, and contact width) of two rough cylinders are mainly studied. The contact model of rough cylinders is deduced based on the GW model. As there is no analytical solution for the pressure distribution equation, an approximate iterative solution method for the pressure distribution is adopted. Furthermore, the quantitative relationships among the correlation length, the root mean square deviation, the asperity radius of curvature and the asperity density are also obtained based on a numerical simulation method. The maximum contact pressure and the contact number decrease with increasing correlation length, while the contact width and the contact area increase. The contact width increases with increasing root mean square deviation, while the maximum contact pressure, the contact area and the contact number decrease.

  10. Domain-oriented evaluation method of association rules and its application

    Institute of Scientific and Technical Information of China (English)

    陈鹏; 谭励; 于重重

    2011-01-01

    To address problems with the support-confidence evaluation criteria in association rule mining, such as the lack of analysis for specific application domains and the difficulty of using mining results for decision-making, a method for evaluating domain-oriented association rules is proposed. Taking domain knowledge as a basis, the method finds rules that meet given degrees of technical interest and commercial interest. Experiments and analysis are carried out on survey data from 40 healthy-housing pilot projects of the national housing engineering center. On this basis, a data mining system for the healthy living domain is constructed. The system is designed with a multi-level software architecture and several modules, including knowledge base management, mining data selection, data preprocessing, domain-driven mining and results evaluation. The experiments and the application system demonstrate the effectiveness of the proposed method.

  11. Law 19.126: Regulatory standards for large-scale mining

    International Nuclear Information System (INIS)

    The law establishes rules regulating large-scale mining projects: ownership, location, related mining activities, mine closure plan, exploitation concession contract, taxation regime, canon, infractions and sanctions.

  12. Fish assemblages and environmental variables associated with hard-rock mining in the Coeur d'Alene River basin, Idaho

    Science.gov (United States)

    Maret, Terry R.; MacCoy, Dorene E.

    2002-01-01

    As part of the U.S. Geological Survey's National Water Quality Assessment Program, fish assemblages, environmental variables, and associated mine densities were evaluated at 18 test and reference sites during the summer of 2000 in the Coeur d'Alene and St. Regis river basins in Idaho and Montana. Multimetric and multivariate analyses were used to examine patterns in fish assemblages and the associated environmental variables representing a gradient of mining intensity. The concentrations of cadmium (Cd), lead (Pb), and zinc (Zn) in water and streambed sediment found at test sites in watersheds where production mine densities were at least 0.2 mines/km2 (in a 500-m stream buffer) were significantly higher than the concentrations found at reference sites. Many of these metal concentrations exceeded Ambient Water Quality Criteria (AWQC) and the Canadian Probable Effect Level guidelines for streambed sediment. Regression analysis identified significant relationships between the production mine densities and the sum of Cd, Pb, and Zn concentrations in water and streambed sediment (r2 = 0.69 and 0.66, respectively; P River basin contained fewer native fish and lower abundances as a result of metal enrichment, not physical habitat degradation. Typically, salmonids were the predominant species at test sites where Zn concentrations exceeded the acute AWQC. Cottids were absent at these sites, which suggests that they are more severely affected by elevated metals than are salmonids.

  13. Identifying the Association Rules between Clinicopathologic Factors and Higher Survival Performance in Operation-Centric Oral Cancer Patients Using the Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Jen-Yang Tang

    2013-01-01

    Full Text Available This study computationally determines the contribution of clinicopathologic factors correlated with 5-year survival in oral squamous cell carcinoma (OSCC) patients primarily treated by surgical operation (OP) followed by other treatments. From 2004 to 2010, the program enrolled 493 OSCC patients at the Kaohsiung Medical Hospital University. The clinicopathologic records were retrospectively reviewed and compared for survival analysis. The Apriori algorithm was applied to mine the association rules between these factors and improved survival. Univariate analysis of demographic data showed that grade/differentiation, clinical tumor size, pathology tumor size, and OP grouping were associated with survival longer than 36 months. Using the Apriori algorithm, multivariate correlation analysis identified the factors that coexistently provide good survival rates with higher lift values, such as grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP. Without the OP, the lift values are lower. In conclusion, this hospital-based analysis suggests that early OP and other treatments starting from OP are the key to improving the survival of OSCC patients, especially for early stage tongue cancer with moderate differentiation, which has better survival (>36 months) with varied OP approaches.
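
    The lift values mentioned above compare how often a rule's consequent occurs given its antecedent with how often it occurs overall. A small Python sketch of the standard definitions follows; the four patient records are invented for illustration and have nothing to do with the study's data.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_antecedent = sum(antecedent <= r for r in records)
    n_consequent = sum(consequent <= r for r in records)
    n_both = sum((antecedent | consequent) <= r for r in records)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift > 1 means the consequent is more likely when the antecedent holds.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical records, each a set of attribute=value items.
records = [
    {"stage=early", "group=OP", "survival>36"},
    {"stage=early", "group=OP", "survival>36"},
    {"stage=late", "group=OP"},
    {"stage=late", "survival>36"},
]
print(rule_metrics(records, {"stage=early", "group=OP"}, {"survival>36"}))
```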

  14. An Updating Algorithm for Spatial and Temporal Data Association Rule

    Institute of Scientific and Technical Information of China (English)

    刘伯红; 王娟娟

    2015-01-01

    Most existing algorithms for updating association rules generate a large number of candidate sets and scan the database multiple times, and few of them address spatial and temporal data. To solve this problem, this paper proposes an updating algorithm based on a sliding window, which run-length encodes the accessed data, stores it in memory and then mines only the encoded data in memory, without repeatedly reading the database. The algorithm also adds a spatial constraint that filters out spatially irrelevant data when generating candidate sets from frequent itemsets, improving execution speed and processing performance. Experimental results show that the algorithm has higher mining efficiency and has important application value for intelligent transportation, command and control, etc.
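
    The two ideas named in the abstract, a sliding window and a spatial constraint on candidates, can be illustrated with the Python sketch below. It is not the paper's algorithm and omits the run-length encoding; the class name, the distance-based `near` predicate and the coordinates are all assumptions for illustration.

```python
import math
from collections import Counter, deque
from itertools import combinations


class SlidingWindowPairCounter:
    """Keep co-occurrence counts for the most recent `window_size` transactions."""

    def __init__(self, window_size, near):
        self.window_size = window_size
        self.near = near          # spatial constraint: near(a, b) -> bool
        self.window = deque()     # candidate pairs of each retained transaction
        self.counts = Counter()

    def add(self, transaction):
        # Only spatially related pairs become candidates.
        pairs = [p for p in combinations(sorted(transaction), 2) if self.near(*p)]
        self.window.append(pairs)
        self.counts.update(pairs)
        if len(self.window) > self.window_size:
            # Retire the oldest transaction without rescanning the others.
            self.counts.subtract(self.window.popleft())
            self.counts += Counter()  # drops entries that fell to zero


# Hypothetical sensor positions; pairs further apart than 2 units are ignored.
positions = {"A": (0, 0), "B": (1, 0), "C": (10, 10)}
counter = SlidingWindowPairCounter(
    window_size=2,
    near=lambda a, b: math.dist(positions[a], positions[b]) <= 2,
)
for transaction in [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}]:
    counter.add(transaction)
print(counter.counts)
```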

  15. Analysis on Composition Rules of TCM Tranquilizer Based on Association Rules and Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    吴嘉瑞; 金燕萍; 张晓朦; 张冰; 盛晓光

    2015-01-01

    Objective: To explore the composition rules of TCM tranquilizer prescriptions. Methods: The tranquilizer prescriptions in "The New National Medicine" were collected to build a database based on the traditional Chinese medicine inheritance assist system. Association rules with the Apriori algorithm and complex system entropy clustering were used to determine the frequency of medicines and the association rules between drugs. Results: The data mining results indicated that in the tranquilizer prescriptions, the most frequently used drugs were Poria Cocos Wolff, Radix Glycyrrhizae, Angelica sinensis, Radix Ophiopogonis and Cinnabaris. The most frequent drug combinations were "Angelica sinensis, Poria Cocos Wolff", "Poria Cocos Wolff, Parched Semen Ziziphi Spinosae" and "Radix Glycyrrhizae, Poria Cocos Wolff". The association rules with a high confidence included "Calculus Bovis, Cinnabaris" and "Semen Ziziphi Spinosae, Poria Cocos Wolff". The new prescriptions included "Poria Cocos Wolff, Parched Semen Ziziphi Spinosae, Radix Rehmanniae Preparata, Fructus Schisandrae Chinensis, Radix Salviae Miltiorrhizae, Radix Ophiopogonis, Radix Rehmanniae". Conclusion: The drugs in TCM tranquilizer prescriptions mostly have the effects of nourishing the blood and calming the mind, tonifying qi and nourishing yin, and tranquilizing with heavy sedatives.

  16. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    Science.gov (United States)

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and

  17. Habituation: a non-associative learning rule design for spiking neurons and an autonomous mobile robots implementation

    International Nuclear Information System (INIS)

    This paper presents a novel bio-inspired habituation function for robots under control by an artificial spiking neural network. This non-associative learning rule is modelled at the synaptic level and validated through robotic behaviours in reaction to different stimuli patterns in a dynamical virtual 3D world. Habituation is minimally represented to show an attenuated response after exposure to and perception of persistent external stimuli. Based on current neurosciences research, the originality of this rule includes modulated response to variable frequencies of the captured stimuli. Filtering out repetitive data from the natural habituation mechanism has been demonstrated to be a key factor in the attention phenomenon, and inserting such a rule operating at multiple temporal dimensions of stimuli increases a robot's adaptive behaviours by ignoring broader contextual irrelevant information. (paper)

  18. Habituation: a non-associative learning rule design for spiking neurons and an autonomous mobile robots implementation.

    Science.gov (United States)

    Cyr, André; Boukadoum, Mounir

    2013-03-01

    This paper presents a novel bio-inspired habituation function for robots under control by an artificial spiking neural network. This non-associative learning rule is modelled at the synaptic level and validated through robotic behaviours in reaction to different stimuli patterns in a dynamical virtual 3D world. Habituation is minimally represented to show an attenuated response after exposure to and perception of persistent external stimuli. Based on current neurosciences research, the originality of this rule includes modulated response to variable frequencies of the captured stimuli. Filtering out repetitive data from the natural habituation mechanism has been demonstrated to be a key factor in the attention phenomenon, and inserting such a rule operating at multiple temporal dimensions of stimuli increases a robot's adaptive behaviours by ignoring broader contextual irrelevant information.

  19. Using Association Rules to Study the Co-evolution of Production & Test Code

    NARCIS (Netherlands)

    Lubsen, Z.; Zaidman, A.; Pinzger, M.

    2009-01-01

    Paper accepted for publication in the proceedings of the 6th International Working Conference on Mining Software Repositories (MSR 2009). Unit tests are generally acknowledged as an important aid to produce high quality code, as they provide quick feedback to developers on the correctness of their

  20. OPTIMAL RULE SELECTION BASED DEFECT CLASSIFICATION SYSTEM USING NAÏVE BAYES CLASSIFIER

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-08-01

    Full Text Available The defect management process plays a key role in the software testing life cycle. Since one of the objectives of testing is to find defects, discrepancies between actual and expected outcomes need to be logged as defects, bugs or incidents. In order to manage all defects to completion, an organization should establish a process and rules for classification. Software defects are expensive and time-consuming; finding and correcting defects is one of the most expensive software development activities. In our previous work, defect classification was done by association rule mining and a decision tree algorithm. Association rule mining sometimes produces insignificant rules, which makes it very difficult to classify defects based on them. To avoid such issues, the rules have to be optimized before classification based on their support and confidence values. In the present work, the rules were extracted from the database using association rule mining and optimized using the ABC algorithm. The defects were then classified using a Naïve Bayes classifier, which performs defect classification efficiently. Finally, quality is assured using various quality metrics such as defect density, sensitivity, etc.

  1. Response of benthic invertebrate assemblages to metal exposure and bioaccumulation associated with hard-rock mining in northwestern streams, USA

    Science.gov (United States)

    Maret, T.R.; Cain, D.J.; MacCoy, D.E.; Short, T.M.

    2003-01-01

    Benthic macroinvertebrate assemblages, environmental variables, and associated mine density were evaluated during the summer of 2000 at 18 reference and test sites in the Coeur d'Alene and St. Regis River basins, northwestern USA as part of the US Geological Survey's National Water-Quality Assessment Program. Concentrations of Cd, Pb, and Zn in water and (or) streambed sediment at test sites in basins where production mine density was ≥0.2 mines/km2 (in a 500-m stream buffer) were significantly higher than concentrations at reference sites. Zn and Pb were identified as the primary contaminants in water and streambed sediment, respectively. These metal concentrations often exceeded acute Ambient Water Quality Criteria for aquatic life and the National Oceanic and Atmospheric Administration Probable Effect Level for streambed sediment. Regression analysis identified significant correlations between production mine density in each basin and Zn concentrations in water and Pb in streambed sediment (r2 = 0.69 and 0.65, p effective in discriminating changes in assemblage structure between reference and mining sites were total number of taxa, number of Ephemeroptera, Plecoptera, and Trichoptera (EPT) taxa, and densities of total individuals, EPT individuals, and metal-sensitive Ephemeroptera individuals.

  2. Geomorphological changes associated with underground coal mining in the Fushun area, northeast China revealed by multitemporal satellite remote sensing data

    Energy Technology Data Exchange (ETDEWEB)

    Dong, Y.F.; Fu, B.H.; Ninomiya, Y. [China Earthquake Administration, Beijing (China). Inst. of Earthquake Science]

    2009-07-01

    Fushun is a famous coal-mining city in northeastern China with more than 100 years of history. Long-term underground coal mining has caused serious surface subsidence in the eastern part of the city. In this study, multitemporal and multi-source satellite remote sensing data were used to detect subsidence and geomorphological changes associated with underground coal mining over a 10-year period (1996-2006). A digital elevation model (DEM) was generated through Synthetic Aperture Radar (SAR) interferometry processing using data from a pair of European Remote Sensing Satellite (ERS) SAR images acquired in 1996. In addition, a Shuttle Radar Topography Mission (SRTM) DEM obtained from data in 2000 and an Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) DEM from 2006 were used for this study. The multitemporal DEMs indicated that the maximum vertical displacement due to subsidence was around 13 m from 1996 to 2006. Multitemporal ASTER images showed that the flooded water area associated with subsidence had increased by 1.73 km² over the same time period. Field investigations and ground level measurements confirmed that the results obtained from the multitemporal remote sensing data agreed well with ground truth data. This study demonstrates that DEMs derived from multisource satellite remote sensing data can provide a powerful tool to map geomorphological changes associated with underground mining activities.

  3. Application of a New Probabilistic Model for Mining Implicit Associated Cancer Genes from OMIM and Medline

    Directory of Open Access Journals (Sweden)

    Shanfeng Zhu

    2006-01-01

    Full Text Available An important issue in current medical science research is to find the genes that are strongly related to an inherited disease. A particular focus is placed on cancer-gene relations, since some types of cancers are inherited. As biomedical databases have grown rapidly in recent years, an informatics approach to predict such relations from currently available databases should be developed. Our objective is to find implicitly associated cancer genes from biomedical databases, including the literature database. Co-occurrence of biological entities has been shown to be a popular and efficient technique in biomedical text mining. We have applied a new probabilistic model, called the mixture aspect model (MAM) [48], to combine different types of co-occurrences of genes and cancer derived from Medline and OMIM (Online Mendelian Inheritance in Man). We trained the probability parameters of MAM using a learning method based on an EM (Expectation and Maximization) algorithm. We examined the performance of MAM by predicting associated cancer-gene pairs. Through cross-validation, prediction accuracy was shown to be improved by adding gene-gene co-occurrences from Medline to cancer-gene co-occurrences in OMIM. Further experiments showed that MAM found new cancer-gene relations which are unknown in the literature. Supplementary information can be found at http://www.bic.kyotou.ac.jp/pathway/zhusf/CancerInformatics/Supplemental2006.html

  4. Application of data mining techniques to a selected business organization with special reference to buying behavior

    Directory of Open Access Journals (Sweden)

    Tejaswini Abhijit Hilage

    2011-12-01

    Full Text Available Data mining is a new concept: the exploration and analysis of large data sets in order to discover meaningful patterns and rules. Many organizations are now using data mining techniques to find meaningful patterns in their databases. The present paper studies how data mining techniques can be applied to a large database. These techniques reveal certain behavioral patterns in the database, and the results of the analysis are useful for the organization. This paper examines the results of applying the association rule mining technique, the rule induction technique and the Apriori algorithm to the database of a shopping mall. Market basket analysis is performed with these techniques, and some important results are found, such as buying behavior.

  5. Application of Data Mining Techniques to a Selected Business Organisation with Special Reference to Buying Behaviour

    CERN Document Server

    Hilage, Tejaswini

    2011-01-01

    Data mining is a new concept: the exploration and analysis of large data sets in order to discover meaningful patterns and rules. Many organizations are now using data mining techniques to find meaningful patterns in their databases. The present paper studies how data mining techniques can be applied to a large database. These techniques reveal certain behavioral patterns in the database, and the results of the analysis are useful for the organization. This paper examines the results of applying the association rule mining technique, the rule induction technique and the Apriori algorithm to the database of a shopping mall. Market basket analysis is performed with these techniques, and some important results are found, such as buying behavior.

  6. Hierarchy-associated semantic-rule inference framework for classifying indoor scenes

    Science.gov (United States)

    Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei

    2016-03-01

    Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.

  7. Pushing Multiple Convertible Constrains into Frequent Itemsets Mining

    Institute of Scientific and Technical Information of China (English)

    SONG Baoli; QIN Zheng

    2006-01-01

    Constraint pushing techniques have been developed for mining frequent patterns and association rules. However, multiple constraints cannot be handled with existing techniques in frequent pattern mining. In this paper, a new algorithm MCFMC (mining complete set of frequent itemsets with multiple constraints) is introduced. The algorithm takes advantage of the fact that a convertible constraint can be pushed into the mining algorithm to reduce the search space. Using a sample database, the algorithm selects an optimal method to convert multiple constraints into multiple convertible constraints, joined by conjunction and/or disjunction, and then partitions these constraints into two parts. One part is pushed deep inside the mining process to reduce the search space for frequent itemsets; the other part, which cannot be pushed into the algorithm, is used to filter the complete set of frequent itemsets and get the final result. Results from our detailed experiment show the feasibility and effectiveness of the algorithm.

  8. Pattern Generation for Complex Data Using Hybrid Mining

    Directory of Open Access Journals (Sweden)

    Manish Kumar

    2013-07-01

    Full Text Available Combined mining is a hybrid mining approach for mining informative patterns from single or multiple data-sources, multiple-features extraction and applying multiple-methods as per the requirements. Data mining applications often involve complex data like multiple heterogeneous data sources, different user preference and create decision-making actions. The complete useful information may not be obtained by using single data mining method in the form of informative patterns as that would consume more time and space. This paper implements hybrid or combined mining approach that applies Lossy-counting algorithm on each data-source to get the frequent data item-sets and then generates the combined association rules. Applying multi-feature approach, we generate incremental pair patterns and incremental cluster patterns. In multi-method combined mining approach, FP-growth and Bayesian Belief Network are combined to generate classifier to get more informative knowledge. This paper uses two different data-sets to get more useful knowledge and compare the results.
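
    The Lossy-counting step named in this abstract can be sketched as follows. This is a minimal single-item version of the general lossy counting technique for data streams, not the paper's implementation (which works on item-sets per data-source); the function names, the error bound `epsilon`, and the toy stream are assumptions for illustration.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple


def lossy_count(stream: Iterable[Hashable],
                epsilon: float) -> Tuple[Dict[Hashable, Tuple[int, int]], int]:
    """Approximate item counts over a stream with error at most epsilon * n.

    Returns (entries, n), where entries maps item -> (count, max_undercount).
    """
    width = math.ceil(1 / epsilon)          # bucket width
    entries: Dict[Hashable, Tuple[int, int]] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)       # id of the current bucket
        if item in entries:
            count, delta = entries[item]
            entries[item] = (count + 1, delta)
        else:
            # The item may have been seen and pruned in earlier buckets.
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            entries = {k: (c, d) for k, (c, d) in entries.items()
                       if c + d > bucket}
    return entries, n


def frequent_items(entries: Dict[Hashable, Tuple[int, int]], n: int,
                   min_support: float, epsilon: float) -> Set[Hashable]:
    """Items whose stored count reaches (min_support - epsilon) * n."""
    return {k for k, (c, _) in entries.items()
            if c >= (min_support - epsilon) * n}


# Toy stream (invented for this sketch).
entries, n = lossy_count(["a", "b", "a", "c", "a", "b", "a", "d"], epsilon=0.25)
frequent = frequent_items(entries, n, min_support=0.5, epsilon=0.25)
```

    The pruning at bucket boundaries is what keeps memory bounded: rare items are forgotten, at the cost of undercounting any item by at most `epsilon * n`.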

  9. Disease Prediction in Data Mining Technique – A Survey

    Directory of Open Access Journals (Sweden)

    S. Sudha

    2013-01-01

    Full Text Available Data mining is defined as sifting through very large amounts of data for useful information. Some of the most important and popular data mining techniques are association rules, classification, clustering, prediction and sequential patterns. Data mining techniques are used for a variety of applications. In the health care industry, data mining plays an important role in predicting diseases. Detecting a disease normally requires a number of tests from the patient, but data mining techniques can reduce the number of tests. This reduction plays an important role in time and performance. The technique has advantages and disadvantages. This research paper analyzes how data mining techniques are used for predicting different types of diseases. It reviews research papers which mainly concentrate on predicting heart disease, diabetes and breast cancer.

  10. Application of association rules in stock plate cointegration analysis

    Institute of Scientific and Technical Information of China (English)

    张建林; 周超良

    2013-01-01

    The Apriori algorithm is a classical algorithm for association rule mining. In view of its deficiencies, some improvements are made. The new algorithm uses a vertical data format and improves the method of joining items to generate candidates. In order to study the stock plate cointegration phenomenon, the improved algorithm is applied to stock plate index analysis. Experimental results show that the improved algorithm can quickly find the correlation between stock plates, and that it has a certain guiding role in stock market analysis and investment decisions.
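
    The vertical data format mentioned above can be sketched as follows: each item maps to the set of transaction IDs that contain it, so the support of a candidate is the size of an intersection rather than the result of a database scan. The abstract does not spell out the improved joining method, so the level-wise join below is a generic assumption, and the function names and toy sector data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def to_vertical(transactions: List[Iterable[str]]) -> Dict[str, Set[int]]:
    """Map each item to the set of transaction IDs (tidset) containing it."""
    vertical: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def frequent_itemsets(transactions: List[Iterable[str]],
                      min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise mining where support is computed by tidset intersection."""
    vertical = to_vertical(transactions)
    current = {frozenset([item]): tids
               for item, tids in vertical.items() if len(tids) >= min_count}
    result: Dict[FrozenSet[str], int] = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level: Dict[FrozenSet[str], Set[int]] = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            # Join only itemsets that differ in exactly one item.
            if len(candidate) == len(a) + 1 and candidate not in next_level:
                tids = tids_a & tids_b      # tidset of the union
                if len(tids) >= min_count:
                    next_level[candidate] = tids
        current = next_level
    return result


# Toy data (invented): sectors whose index rose on each trading day.
days = [{"bank", "steel"}, {"bank", "steel", "coal"},
        {"bank", "coal"}, {"steel", "coal", "bank"}]
patterns = frequent_itemsets(days, min_count=3)
```

    Here `patterns` contains the three single sectors plus the pairs bank/steel and bank/coal; the pair steel/coal rises together on only two days and is dropped.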

  11. Ontology enrichment by discovering multi-relational association rules from ontological knowledge bases

    OpenAIRE

    d'Amato, Claudia; Staab, Steffen; Tettamanzi, Andrea G. B.; Van Minh, Tran; Gandon, Fabien

    2016-01-01

    International audience In the Semantic Web context, OWL ontologies represent the conceptualization of domains of interest while the corresponding assertional knowledge is given by the heterogeneous Web resources referring to them. Being strongly decoupled, ontologies and assertions can be out-of-sync. An ontology can be incomplete, noisy and sometimes inconsistent with regard to the actual usage of its conceptual vocabulary in the assertions. Data mining can support the discovery of hidde...

  12. An Efficient TDTR Algorithm for Mining Frequent Itemsets

    Directory of Open Access Journals (Sweden)

    D.Kerana Hanirex

    2013-01-01

    Full Text Available Research on mining frequent itemsets is one of the emerging tasks in data mining. The purchase of one product when another product is purchased represents an association rule. Association rules are useful for analyzing customer behavior and play an important part in shopping basket data analysis and clustering. The FP-Growth algorithm is the basic algorithm for mining association rules. This paper presents an efficient algorithm for mining frequent itemsets using the Two Dimensional Transactions Reduction (TDTR) approach, which reduces the transactions of the original database (D) to the reduced database transactions D1 based on the min_sup count. Then, for each item, it finds the number of transactions in which the item is present and hence finds the largest frequent itemset using the two-dimensional approach. Using the largest itemset property, it finds the subsets of frequent itemsets. Thus the TDTR approach reduces the number of database scans and hence improves efficiency and accuracy in finding the association rules and reduces the time to find the rules.

  13. Association Mining on Massive Text under Full Confidence Based on Incremental Queue

    Institute of Scientific and Technical Information of China (English)

    刘炜

    2015-01-01

    Association mining is an important data analysis method. This article proposes an incremental queue association mining algorithm model under full confidence. Using full confidence rules in the traditional FP-Growth and PF-Tree association mining algorithms improves the adaptability of the algorithm. On this basis, the article proposes the FP4W-Growth algorithm and applies it to the association calculation of text data and the association mining of incremental data. Verification experiments show the feasibility of the algorithm and model. The article provides a scientific approach to finding hidden, previously unknown, and potentially useful information and patterns in large amounts of text data.
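
    Assuming that "full confidence" here refers to the all-confidence measure from the association mining literature (the abstract does not define it), the measure divides the joint count of an itemset by the largest count of any single item in it. The sketch below illustrates only that measure, not FP4W-Growth; the function name and the toy documents are hypothetical.

```python
from typing import Iterable, List, Set


def all_confidence(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """count(itemset) / max over items of count(item); itemset must be non-empty."""
    items = set(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    joint = sum(1 for t in transactions if items <= t)
    largest_single = max(sum(1 for t in transactions if item in t)
                         for item in items)
    return joint / largest_single if largest_single else 0.0


# Toy documents represented as sets of terms (invented for this sketch).
docs = [{"data", "mining"}, {"data", "mining"}, {"data"}, {"text"}]
score = all_confidence({"data", "mining"}, docs)   # 2 joint / 3 for "data"
```

    A high value means that every rule that can be formed inside the itemset has at least that confidence, which is why the measure is used to filter out loosely related terms.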

  14. Simple rules to modify pre-planned paths and improve gross robot motions associated with pick & place assembly tasks

    OpenAIRE

    Sanders, David; Tewkesbury, Giles; Graham-Jones, J.

    2011-01-01

    Purpose – This paper aims to describe real time improvements to the performance and trajectories of robots for which paths had already been planned by some means, automatic or otherwise. The techniques are applied to industrial robots during the gross motions associated with pick and place tasks. Simple rules for path improvement are described. Design/methodology/approach – The dynamics of the manipulator in closed form Lagrange equations are used to represent the dynamics by a set of second-...

  15. Rule based classifier for the analysis of gene-gene and gene-environment interactions in genetic association studies

    Directory of Open Access Journals (Sweden)

    Lehr Thorsten

    2011-03-01

    Full Text Available Abstract Background Several methods have been presented for the analysis of complex interactions between genetic polymorphisms and/or environmental factors. Despite the available methods, there is still a need for alternative methods, because no single method will perform well in all scenarios. The aim of this work was to evaluate the performance of three selected rule based classifier algorithms, RIPPER, RIDOR and PART, for the analysis of genetic association studies. Methods Overall, 42 datasets were simulated with three different case-control models, a varying number of subjects (300, 600), SNPs (500, 1500, 3000) and noise (5%, 10%, 20%). The algorithms were applied to each of the datasets with a set of algorithm-specific settings. Results were further investigated with respect to (a) the Model, (b) the Rules, and (c) the Attribute level. Data analysis was performed using WEKA, SAS and PERL. Results The RIPPER algorithm discovered the true case-control model at least once in >33% of the datasets. The RIDOR and PART algorithms performed poorly for model detection. The RIPPER, RIDOR and PART algorithms discovered the true case-control rules in more than 83%, 83% and 44% of the datasets, respectively. All three algorithms were able to detect the attributes utilized in the respective case-control models in most datasets. Conclusions The current analyses substantiate the utility of rule based classifiers such as RIPPER, RIDOR and PART for the detection of gene-gene/gene-environment interactions in genetic association studies. These classifiers could provide a valuable new method, complementing existing approaches, in the analysis of genetic association studies. The methods provide an advantage in being able to handle both categorical and continuous variable types. Further, because the outputs of the analyses are easy to interpret, the rule based classifier approach could quickly generate testable hypotheses for additional evaluation. 
Since the algorithms are

  16. Mechanisms underlying the rules for associative plasticity at adult human neocortical synapses

    NARCIS (Netherlands)

    M.B. Verhoog (Matthijs); N.A. Goriounova (Natalia); J. Obermayer (Joshua); J. Stroeder (Jasper); J.J. Johannes Hjorth (J.); G. Testa-Silva (Guilherme); J.C. Baayen; C.P.J. de Kock (Christiaan); R.M. Meredith (Rhiannon); H.D. Mansvelder (Huibert)

    2013-01-01

    The neocortex in our brain stores long-term memories by changing the strength of connections between neurons. To date, the rules and mechanisms that govern activity-induced synaptic changes at human cortical synapses are poorly understood and have not been studied directly at a cellular

  17. Energy analysis of stability on shallow tunnels based on non-associated flow rule and non-linear failure criterion

    Institute of Scientific and Technical Information of China (English)

    张佳华; 王成洋

    2015-01-01

    On the basis of the upper bound theorem, the non-associated flow rule and the non-linear failure criterion were considered together. The modified shear strength parameters of materials were obtained with the help of the tangent method. Employing the virtual power principle and the strength reduction technique, the effects of material dilatancy, the non-linear failure criterion, pore water pressure, surface loads and buried depth on the stability of shallow tunnels were studied. To validate the proposed approach, the solutions in the present work were compared with existing results and agree well with them when the non-associated flow rule is reduced to the associated flow rule and the non-linear failure criterion is degenerated to the linear failure criterion. Compared with the dilatancy of materials, the non-linear failure criterion exerts a greater impact on the stability of shallow tunnels. The safety factor of shallow tunnels decreases and the failure surface expands outward when the dilatancy coefficient decreases. An increase in the nonlinear coefficient, the pore water pressure coefficient, the surface load or the buried depth results in a smaller safety factor. Therefore, dilatancy as well as the non-linear failure criterion should be taken into account in the design of shallow tunnel supporting structures. The supporting structure must be reinforced promptly to prevent potential mud gushing or collapse accidents in areas with abundant pore water, large surface load or large buried depth.

  18. Natural radioactivity level of associated bone-coal mining area in Zhejiang province

    International Nuclear Information System (INIS)

    The geographic distribution, γ-radiation level and specific activity of radionuclides of the bone-coal mines in Zhejiang Province were reported. The weighted average of γ-radiation dose rate of the bone-coal mines is 566 nGy/h for 107 main bone-coal mines. The weighted mean activity of 238U, 226Ra, 232Th and 40K in the samples are 949, 918, 34 and 554 Bq/kg for 171 samples of bone-coal, respectively. (authors)

  19. Natural radioactivity level of associated bone-coal mining area in Zhejiang Province

    Institute of Scientific and Technical Information of China (English)

    YE Ji-Da; ZHENG Hui-Di; SONG Wei-Li; ZENG Guang-Jian; WANG Sha-Ling; WU Zong-Mei

    2005-01-01

    The geographic distribution, γ-radiation level and specific activity of radionuclides of the bone-coal mines in Zhejiang Province were reported. The weighted average of γ-radiation dose rate of the bone-coal mines is 566 nGy/h for 107 main bone-coal mines. The weighted mean activity of 238U, 226Ra, 232Th and 40K in the samples are 949, 918, 34 and 554 Bq/kg for 171 samples of bone-coal, respectively.

  20. Application of data mining techniques for nuclear data and instrumentation

    International Nuclear Information System (INIS)

    Data mining is defined as the discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. Patterns in the data can be represented in many different forms, including classification rules, association rules, clusters, etc. Data mining thus deals with the discovery of hidden trends and patterns from large quantities of data. The field of data mining is emerging as a new, fundamental research area with important applications to science, engineering, medicine, business, and education. It is an interdisciplinary research area and draws upon several roots, including database systems, machine learning, information systems, statistics and expert systems. Data mining, when performed on time series data, is known as time series data mining (TSDM). A time series is a sequence of real numbers, each number representing a value at a point of time. During the past few years, there has been an explosion of research in the area of time series data mining. This includes attempts to model time series data, to design languages to query such data, and to develop access structures to efficiently process queries on such data. Time series data arises naturally in many real-world applications. Efficient discovery of knowledge through time series data mining can be helpful in several domains such as: Stock market analysis, Weather forecasting etc. An important application area of data mining techniques is in nuclear power plant and related data. Nuclear power plant data can be represented in form of time sequences. Often it may be of prime importance to analyze such data to find trends and anomalies. The general goals of data mining include feature extraction, similarity search, clustering and classification, association rule mining and anomaly

  1. Evaluation of long-lived Alpha (llα) activity associated with respirable dust in the underground Narwapahar uranium mine in India

    International Nuclear Information System (INIS)

    Uranium mining activities, in general, produce dust particles of different sizes in and around the locations where operations are carried out, the most prominent being those of respirable size. In particular, the airborne uranium ore dust in underground uranium mines contains long-lived alpha (llα) emitters of the natural uranium decay chain. The main mining operations, such as drilling, blasting, mucking and loading-dumping, generate ore dust of different particle sizes, which becomes dispersed in the mine environment and gives rise to an inhalation hazard. The present work was done in the underground U mine at Narwapahar (ore grade about 0.043% U3O8). The objective of the present study is to estimate the long-lived alpha activity associated with the airborne respirable particulate in the underground mine at Narwapahar.

  2. Spatiotemporal Data Mining: Issues, Tasks And Applications

    Directory of Open Access Journals (Sweden)

    K.Venkateswara Rao

    2012-03-01

    Full Text Available Spatiotemporal data usually contain the states of an object, an event or a position in space over a period of time. Vast amounts of spatiotemporal data can be found in several application fields such as traffic management, environment monitoring, and weather forecasting. These datasets might be collected at different locations, at various points in time, and in different formats. Representing, processing, analyzing and mining such datasets poses many challenges due to the complex structure of spatiotemporal objects and the relationships among them in both spatial and temporal dimensions. In this paper, the issues and challenges related to spatiotemporal data representation, analysis, mining and visualization of knowledge are presented. Various kinds of data mining tasks, such as association rules, classification and clustering, for discovering knowledge from spatiotemporal datasets are examined and reviewed. System functional requirements for this kind of knowledge discovery and the database structure are discussed. Finally, applications of spatiotemporal data mining are presented.

  3. 76 FR 70075 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines

    Science.gov (United States)

    2011-11-10

    ... Period On August 31, 2011 (76 FR 54163), MSHA published a proposed rule, Proximity Detection Systems for... Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health Administration, Labor. ACTION... addressing Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines....

  4. Direct Characterization of Airborne Particles Associated with Arsenic-rich Mine Tailings: Particle Size Mineralogy and Texture

    Energy Technology Data Exchange (ETDEWEB)

    M Corriveau; H Jamieson; M Parsons; J Campbell; A Lanzirotti

    2011-12-31

    Windblown and vehicle-raised dust from unvegetated mine tailings can be a human health risk. Airborne particles from As-rich abandoned Au mine tailings from Nova Scotia, Canada have been characterized in terms of particle size, As concentration, As oxidation state, mineral species and texture. Samples were collected in seven aerodynamically fractionated size ranges (0.5-16 µm) using a cascade impactor deployed at three tailings fields. All three sites are used for recreational activities and off-road vehicles were racing on the tailings at two mines during sample collection. Total concentrations of As in the <8 µm fraction varied from 65 to 1040 ng/m³ of air as measured by proton-induced X-ray emission (PIXE) analysis. The same samples were analysed by synchrotron-based microfocused X-ray absorption near-edge spectroscopy (µXANES) and X-ray diffraction (µXRD) and found to contain multiple As-bearing mineral species, including Fe-As weathering products. The As species present in the dust were similar to those observed in the near-surface tailings. The action of vehicles on the tailings surface may disaggregate material cemented with Fe arsenate and contribute additional fine-grained As-rich particles to airborne dust. Results from this study can be used to help assess the potential human health risks associated with exposure to airborne particles from mine tailings.

  5. Groin pain associated with ultrasound finding of inguinal canal posterior wall deficiency in Australian Rules footballers

    OpenAIRE

    Orchard, J. W.; Read, J. W.; Neophyton, J.; Garlick, D

    1998-01-01

    OBJECTIVES: To investigate the prevalence of inguinal canal posterior wall deficiency (sports hernia) in professional Australian Rules footballers using an ultrasound technique and correlate the results with the clinical symptom of groin pain. METHODS: Thirty five professional Australian footballers with and without groin pain were investigated blind with a dynamic high resolution ultrasound technique for presence of posterior wall deficiency. RESULTS: Fourteen players had a history of ...

  6. Spatial-temporal analysis and projection of extreme particulate matter (PM10 and PM2.5) levels using association rules: A case study of the Jing-Jin-Ji region, China

    Science.gov (United States)

    Qin, Shanshan; Liu, Feng; Wang, Chen; Song, Yiliao; Qu, Jiansheng

    2015-11-01

    The Jing-Jin-Ji region of Northern China has experienced serious extreme PM concentrations, which could exert considerable negative impacts on human health. However, only a few studies have focused on extreme PM concentrations. Therefore, joint regional PM research and air pollution control has become an urgent issue in this region. To characterize PM pollution, PM10 and PM2.5 hourly samples were collected from 13 cities in the Jing-Jin-Ji region for one year. This study initially analyzed extreme PM data using the Apriori algorithm to mine quantitative association rules in PM spatial and temporal variations and intercity influences. The results indicate that 1) the association rules of intercity PM are distinctive, and do not completely rely on their spatial distributions; 2) extreme PM concentrations frequently occur in southern cities, presenting stronger spatial and temporal associations than in northern cities; 3) the spatial and temporal associations of intercity PM2.5 are stronger than those of intercity PM10.

  7. A Frame Work for Frequent Pattern Mining Using Dynamic Function

    Directory of Open Access Journals (Sweden)

    Sunil Joshi

    2011-05-01

    Full Text Available Discovering frequent objects (item sets, sequential patterns) is one of the most vital fields in data mining. Defining candidates is well known to be costly in running time and memory, which has motivated the development of a large number of algorithms. Frequent pattern mining is an active research issue in association rule analysis. The Apriori algorithm is a standard algorithm for association rule mining, and many algorithms for mining association rules, and their variants, have been proposed on the foundation of Apriori. Most earlier studies adopted Apriori-like algorithms based on the generate-and-test candidates scheme and focused on improving the algorithm's strategy and structure, but none paid attention to the structure of the database. In particular, no algorithm emphasizes databases represented with few transactions and many attributes. We present a new research direction in frequent pattern mining that generates transaction pairs to free current methods from the traditional bottleneck, providing scalability to massive data sets and improving response time. In order to mine patterns in databases with more columns than rows, we propose a complete framework for frequent pattern mining. The simple idea is that when attributes far outnumber transactions, generating pairs of transactions instead of item IDs gives results very quickly. Recently, several works proposed a new way to mine patterns in transposed databases, where a database has thousands of attributes but merely tens of objects. We suggest a novel dynamic algorithm for frequent pattern mining that generates transaction pairs and finds frequent patterns through the longest common subsequence, computed with a dynamic function. Our solution gives results more rapidly. 
A quantitative investigation of these tradeoffs is conducted through a wide investigational study on artificial and
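
    The transaction-pair idea in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names `lcs` and `pair_patterns` are hypothetical, and the sketch assumes that items inside each transaction are deduplicated and sorted in one global order, so that the longest common subsequence of two transactions is the set of items they share. Only patterns that arise as the overlap of some transaction pair are reported, so this is not a complete frequent-itemset miner.

```python
from itertools import combinations


def lcs(a, b):
    """Longest common subsequence of two sequences, by dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return tuple(reversed(out))


def pair_patterns(transactions, min_support=2):
    """Candidate patterns from transaction pairs, with absolute support counts.

    Assumption (not from the abstract): items are deduplicated and sorted,
    so the LCS of two transactions equals their shared items.
    """
    ordered = [sorted(set(t)) for t in transactions]
    candidates = {lcs(a, b) for a, b in combinations(ordered, 2)}
    candidates.discard(())
    result = {}
    for pattern in candidates:
        support = sum(1 for t in ordered if set(pattern) <= set(t))
        if support >= min_support:
            result[pattern] = support
    return result


# Hypothetical toy database: few transactions, illustrative items only.
toy = [["a", "b", "c", "d"], ["b", "c", "e"], ["a", "c", "d"]]
patterns = pair_patterns(toy)
```

    The number of pairs grows with the square of the number of transactions and not with the number of attributes, which is why a pairwise scheme of this kind suits databases with many columns and few rows.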

  8. A fuzzy hill-climbing algorithm for the development of a compact associative classifier

    Science.gov (United States)

    Mitra, Soumyaroop; Lam, Sarah S.

    2012-02-01

    Classification, a data mining technique, has widespread applications including medical diagnosis, targeted marketing, and others. Knowledge discovery from databases in the form of association rules is one of the important data mining tasks. An integrated approach, classification based on association rules, has drawn the attention of the data mining community over the last decade. While attention has been mainly focused on increasing classifier accuracies, not much effort has been devoted to building interpretable and less complex models. This paper discusses the development of a compact associative classification model using a hill-climbing approach and fuzzy sets. The proposed methodology builds the rule-base by selecting rules which contribute towards increasing training accuracy, thus balancing classification accuracy with the number of classification association rules. The results indicated that the proposed associative classification model can achieve competitive accuracies on benchmark datasets with continuous attributes and lend better interpretability, when compared with other rule-based systems.

  9. Use of Data Mining to Predict the Risk Factors Associated With Osteoporosis and Osteopenia in Women.

    Science.gov (United States)

    Pedrassani de Lira, Carolina; Toniazzo de Abreu, Larissa Letieli; Veiga Silva, Ana Carolina; Mazzuchello, Leandro Luiz; Rosa, Maria Inês; Comunello, Eros; de Souza Pires, Maria Marlene; Ceretta, Luciane Bisognin; Martins, Paulo João; Simões, Priscyla Waleska

    2016-08-01

    Osteoporosis has recently been acknowledged as a major public health issue in developed countries because of the decrease in the quality of life of the affected person and the increase in public costs due to complete or partial physical disability. The aim of this study was to use the J48 algorithm as a classification task for data from women exhibiting changes in bone densitometry. The study population included all patients treated at the diagnostic center for bone densitometry since 2010. Census sample data collection was conducted as all elements of the population were included in the sample. The service in question provides care to patients via the Brazilian Unified Health System and private plans. The results of the classification task were analyzed using the J48 algorithm, and among the dichotomized variables associated with a diagnosis of osteoporosis, the mean accuracy was 74.0 (95% confidence interval [CI], 61.0-68.0) and the mean area under the curve of the receiver operating characteristic (ROC) curve was 0.65 (95% CI, 0.64-0.66), with a mean sensitivity of 76.0 (95% CI, 76.0-76.0) and a mean specificity of 48.0 (95% CI, 46.0-49.0). The analyzed results showed higher values of sensitivity, accuracy, and area under the ROC curve in experiments conducted with individuals with osteoporosis. Most of the generated rules were consistent with the literature, and the few differences might serve as hypotheses for further studies. PMID:27270629

  10. Direct characterization of airborne particles associated with arsenic-rich mine tailings: Particle size, mineralogy and texture

    Energy Technology Data Exchange (ETDEWEB)

    Corriveau, M.C. [Department of Geological Sciences and Geological Engineering, Queen's University, Kingston, Ontario, K7L 3N6 (Canada); Jamieson, H.E., E-mail: jamieson@geol.queensu.ca [Department of Geological Sciences and Geological Engineering, Queen's University, Kingston, Ontario, K7L 3N6 (Canada); Parsons, M.B. [Geological Survey of Canada (Atlantic), Natural Resources Canada, Dartmouth, Nova Scotia, B2Y 4A2 (Canada); Campbell, J.L. [Guelph-Waterloo Physics Institute, University of Guelph, Guelph, Ontario, N1G 2W1 (Canada); Lanzirotti, A. [Center for Advanced Radiation Sources, University of Chicago, Chicago, IL 60637 (United States)

    2011-09-15

    Highlights: Airborne dust from As-rich gold mine tailings used for recreation was collected. Total concentrations of arsenic in the <8 µm fraction varied from 65 to 1040 ng/m³. Multiple As minerals in dust are comparable with near-surface tailings samples. Abstract: Windblown and vehicle-raised dust from unvegetated mine tailings can be a human health risk. Airborne particles from As-rich abandoned Au mine tailings from Nova Scotia, Canada have been characterized in terms of particle size, As concentration, As oxidation state, mineral species and texture. Samples were collected in seven aerodynamically fractionated size ranges (0.5-16 µm) using a cascade impactor deployed at three tailings fields. All three sites are used for recreational activities and off-road vehicles were racing on the tailings at two mines during sample collection. Total concentrations of As in the <8 µm fraction varied from 65 to 1040 ng/m³ of air as measured by proton-induced X-ray emission (PIXE) analysis. The same samples were analysed by synchrotron-based microfocused X-ray absorption near-edge spectroscopy (µXANES) and X-ray diffraction (µXRD) and found to contain multiple As-bearing mineral species, including Fe-As weathering products. The As species present in the dust were similar to those observed in the near-surface tailings. The action of vehicles on the tailings surface may disaggregate material cemented with Fe arsenate and contribute additional fine-grained As-rich particles to airborne dust. Results from this study can be used to help assess the potential human health risks associated with exposure to airborne particles from mine tailings.

  11. Application of text mining for customer evaluations in commercial banking

    Science.gov (United States)

    Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

    2015-07-01

    Customer attrition is an increasingly serious problem in commercial banks. To combat it thoroughly, mining customer evaluation texts is as important as mining structured customer data. In order to extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper presents all three techniques using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments was run on a collection of real textual data comprising 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions, and some advice for the commercial bank are given in this paper.
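
    The Apriori step named in this abstract can be illustrated on already-segmented text with a minimal sketch. The token sets, thresholds and function names (`apriori`, `rules`) are hypothetical and do not come from the paper; word segmentation and the C5.0 classifier are outside the scope of the sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset: relative support} for frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for s in sets if itemset <= s) / n

    singletons = {frozenset([item]) for s in sets for item in s}
    level = {c for c in singletons if support(c) >= min_support}
    frequent = {}
    k = 1
    while level:
        for c in level:
            frequent[c] = support(c)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


def rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, confidence) from frequent itemsets."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((lhs, itemset - lhs, confidence))
    return out


# Hypothetical tokens, as if produced by a word-segmentation step.
evaluations = [
    {"fee", "high"},
    {"fee", "high", "queue"},
    {"queue", "slow"},
    {"fee", "high", "slow"},
]
frequent = apriori(evaluations, min_support=0.5)
found = rules(frequent, min_confidence=0.9)
```

    With these toy inputs the only frequent pair is {fee, high}, and both directions of the rule between the two words reach full confidence.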

  12. Regional scale selenium loading associated with surface coal mining, Elk Valley, British Columbia, Canada.

    Science.gov (United States)

    Wellen, Christopher C; Shatilla, Nadine J; Carey, Sean K

    2015-11-01

    Selenium (Se) concentrations in surface water downstream of surface mining operations have been reported at levels in excess of water quality guidelines for the protection of wildlife. Previous research in surface mining environments has focused on downstream water quality impacts, yet little is known about the fundamental controls on Se loading. This study investigated the relationship between mining practices, stream flows and Se concentrations using a SPAtially Referenced Regression On Watershed attributes (SPARROW) model. This work is part of a R&D program examining the influence of surface coal mining on hydrological and water quality responses in the Elk Valley, British Columbia, Canada, aimed at informing effective management responses. Results indicate that waste rock volume, a product of mining activity, accounted for roughly 80% of the Se load from the Elk Valley, while background sources accounted for roughly 13%. Wet years were characterized by more than twice the Se load of dry years. A number of variables regarding placement of waste rock within the catchments, length of buried streams, and the construction of rock drains did not significantly influence the Se load. The age of the waste rock, the proportion of waste rock surface reclaimed, and the ratio of waste rock pile side area to top area all varied inversely with the Se load from watersheds containing waste rock. These results suggest operational practices that are likely to reduce the release of Se to surface waters. PMID:26136156

  14. Analysis of roof and pillar failure associated with weak floor at a limestone mine

    Institute of Scientific and Technical Information of China (English)

    Murphy Michael M.; Ellenberger John L.; Esterhuizen Gabriel S.; Miller Tim

    2016-01-01

    A limestone mine in Ohio has had instability problems that have led to massive roof falls extending to the surface. This study focuses on the role that weak, moisture-sensitive floor has in the instability issues. Previous NIOSH research related to this subject did not include analysis for weak floor or weak bands and recommended that when such issues arise they should be investigated further using a more advanced analysis. Therefore, to further investigate the observed instability occurring on a large scale at the Ohio mine, FLAC3D numerical models were employed to demonstrate the effect that a weak floor has on roof and pillar stability. This case study will provide important information to limestone mine operators regarding the impact of weak floor causing the potential for roof collapse, pillar failure, and subsequent subsidence of the ground surface.

  15. Regulatory issues associated with exclusion, exemption, and clearance related to the mining and minerals processing industries

    International Nuclear Information System (INIS)

    The concepts of exclusion, exemption and clearance have been established in international recommendations and standards for radiation protection and the management of radioactive waste in recent years. The consistent application of these concepts has given rise to various problems in different spheres of use. This is particularly the case in the mining and minerals processing industries dealing with materials exhibiting elevated concentrations of naturally occurring radionuclides. This paper takes the South African mining industry as an example and highlights some of the issues that have arisen in applying these concepts within a regulatory control regime. (author)

  16. Human mercury exposure associated with small-scale gold mining in Burkina Faso.

    OpenAIRE

    Tomicic Catherine; Vernez David; Belem Tounaba; Berode Michèle

    2011-01-01

    PURPOSE: In Burkina Faso, gold ore is one of the main sources of income for an important part of the active population. Artisan gold miners use mercury, a toxic metal whose human health risks are well known, in the extraction. The aim of the present study was to assess mercury exposure as well as to understand the exposure determinants of gold miners in Burkinabe small-scale mines. METHODS: The examined gold miners' population on the different selected gold mining sites was composed of persons ...

  17. Associate editors' foreword: entrepreneurship in health education and health promotion: five cardinal rules.

    Science.gov (United States)

    Cottrell, Randall R; Cooper, Hanna

    2009-07-01

    A career in health education or health promotion (HE/HP) can be developed in many ways. In past editions of this department, career development has been discussed in relation to distance (Balonna, 2001), consulting (Bookbinder, 2001), certifications (Hayden, 2005), graduate school (Cottrell & Hayden, 2007), and many other topics. This article looks at a less traditional means of career development-entrepreneurship. Health education is a field ripe with opportunities for consulting and for selling health-related products and services. Entrepreneurship can not only create financial rewards but can also provide high visibility and networking contacts that can advance one's career. This article combines both theory and practical applications to assist readers in developing entrepreneurial activities. The authors are experienced in entrepreneurial development and use that expertise to provide relevant examples and develop a framework using "five cardinal rules" for establishing an entrepreneurial enterprise in HE/HP. PMID:19574585

  18. Application of association rule based on concept lattice for scheduling management

    Institute of Scientific and Technical Information of China (English)

    张晓; 龙伟; 卢斌

    2014-01-01

    To address the sharp increase in production data volume at automotive stamping plants, this paper explores how to apply association rule mining based on concept lattices to production information data. The concept lattice is constructed with a strategy of horizontal splitting and vertical merging, and association rules are generated by transforming the ordinary concept lattice into a quantitative concept lattice. The experimental results show that this method has high mining efficiency and can effectively discover hidden information in the data. It thus provides a theoretical basis for production scheduling management in companies, achieves the purpose of optimizing scheduling, and has yielded good analytical results in practical application.

  19. Displaced rocks, strong motion, and the mechanics of shallow faulting associated with the 1999 Hector Mine, California, earthquake

    Science.gov (United States)

    Michael, A.J.; Ross, S.L.; Stenner, H.D.

    2002-01-01

    The paucity of strong-motion stations near the 1999 Hector Mine earthquake makes it impossible to make instrumental studies of key questions about near-fault strong-motion patterns associated with this event. However, observations of displaced rocks allow a qualitative investigation of these problems. By observing the slope of the desert surface and the frictional coefficient between these rocks and the desert surface, we estimate the minimum horizontal acceleration needed to displace the rocks. Combining this information with observations of how many rocks were displaced in different areas near the fault, we infer the level of shaking. Given current empirical shaking attenuation relationships, the number of rocks that moved is slightly lower than expected; this implies that slightly lower than expected shaking occurred during the Hector Mine earthquake. Perhaps more importantly, stretches of the fault with 4 m of total displacement at the surface displaced few nearby rocks on 15° slopes, suggesting that the horizontal accelerations were below 0.2g within meters of the fault scarp. This low level of shaking suggests that the shallow parts of this rupture did not produce strong accelerations. Finally, we did not observe an increased incidence of displaced rocks along the fault zone itself. This suggests that, despite observations of fault-zone-trapped waves generated by aftershocks of the Hector Mine earthquake, such waves were not an important factor in controlling peak ground acceleration during the mainshock.
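
    The link between slope, friction and threshold acceleration mentioned in this abstract can be illustrated with a minimal sketch based on a standard rigid-block sliding model. This is an assumption for illustration and not necessarily the calculation the authors used. For a block on a slope of angle θ with friction coefficient μ, a horizontal ground acceleration a directed so as to push the block downslope starts sliding once a·cos θ + g·sin θ exceeds μ·(g·cos θ − a·sin θ), which gives a/g = (μ·cos θ − sin θ) / (cos θ + μ·sin θ). The function name and the friction value below are hypothetical.

```python
import math


def min_horizontal_acceleration(friction_coefficient, slope_deg):
    """Threshold horizontal acceleration, in units of g, for downslope sliding.

    Rigid-block model (an assumption, not taken from the abstract):
    a/g = (mu*cos(theta) - sin(theta)) / (cos(theta) + mu*sin(theta)).
    """
    theta = math.radians(slope_deg)
    mu = friction_coefficient
    return (mu * math.cos(theta) - math.sin(theta)) / (
        math.cos(theta) + mu * math.sin(theta)
    )


# Hypothetical friction coefficient (friction angle of 30 degrees) on a 15 degree slope.
mu_example = math.tan(math.radians(30.0))
a_min = min_horizontal_acceleration(mu_example, 15.0)
print(f"threshold acceleration: {a_min:.2f} g")
```

    Writing μ = tan φ reduces the expression to a/g = tan(φ − θ), so steeper slopes or lower friction reduce the shaking needed to move a rock.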

  20. Spatial distribution of environmental risk associated to a uranium abandoned mine (Central Portugal)

    Science.gov (United States)

    Antunes, I. M.; Ribeiro, A. F.

    2012-04-01

    The abandoned uranium mine of Canto do Lagar is located at Arcozelo da Serra, central Portugal. The mine was exploited in an open pit and produced about 12,430 kg of uranium oxide (U3O8) between 1987 and 1988. The dominant geological unit is the porphyritic coarse-grained two-mica granite, with biotite>muscovite. The uranium deposit consists of two crush zones, parallel to the coarse-grained porphyritic granite, with average direction N30°E, silicified, sericitized and reddish jasperized, with a width of approximately 10 meters. These zones are accompanied by two thin veins of white quartz, 70°-80° WNW, ferruginous and jasperized with chalcedony, red jasper and opal. These veins are about 6 meters away from each other. They contain secondary U-phosphate phases such as autunite and torbernite. Rejected materials (1,000,000 t) were deposited on two dumps and a lake was formed in the open pit. To assess the environmental risk of the abandoned uranium mine of Canto do Lagar, 70 samples of stream sediments, soils and mine tailings materials were collected and analysed. The relationships between sample compositions were tested using Principal Components Analysis (PCA), a multivariate analysis, and their spatial distribution using indicator kriging. The spatial distribution of stream sediments shows that the probability of expression for principal component 1 (explaining Y, Zr, Nb, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Hf, Th and U contents) decreases along the SE-NW direction. This component is explained by the samples located inside mine influence. The probability of expression for principal component 2 (explaining Be, Na, Al, Si, P, K, Ca, Ti, Mn, Fe, Co, Ni, Cu, As, Rb, Sr, Mo, Cs, Ba, Tl and Bi contents) increases towards the middle of the stream line. This component is explained by the samples located outside mine influence. The spatial distribution of soils shows that the probability of expression for principal component 1 (explaining Mg, P, Ca, Ge, Sr, Y, Zr, La, Ce, Pr

  1. Compass: A hybrid method for clinical and biobank data mining

    DEFF Research Database (Denmark)

    Krysiak-Baltyn, Konrad; Petersen, Thomas Nordahl; Audouze, Karine Marie Laure;

    2014-01-01

    We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods; Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply...... Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining; it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as “hotspots” for statistically...... significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical type questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we...

  2. Improvement and Design of IDS System Data Rule Base Based on Data Mining

    Institute of Scientific and Technical Information of China (English)

    林建伟; 郭彩虹; 许臻

    2013-01-01

    Network attacks are becoming more and more frequent, existing IDS systems lack precision in detection and analysis, and the defenses provided by the IDS database can no longer meet the needs of intrusion prevention. To address this situation, the C4.5 algorithm and a sequence pattern mining algorithm from data mining are applied to the data packets captured by the system. C4.5 is applied to data describing system defects and known attack methods, while sequence pattern mining is applied to system call sequence data, with the goal of improving the accuracy of data analysis. The experiments show that these improvements to the IDS data rule base greatly improve the accuracy of the system's intrusion data analysis.

  3. Numerical analysis and geotechnical assessment of mine scale model

    Institute of Scientific and Technical Information of China (English)

    Khanal Manoj; Adhikary Deepak; Balusu Rao

    2012-01-01

    Various numerical methods are available to model, simulate, analyse and interpret the results; however, a major task is to select a reliable and intended tool to perform a realistic assessment of any problem. For a model to be a representative of the realistic mining scenario, a verified tool must be chosen to perform an assessment of mine roof support requirement and address the geotechnical risks associated with longwall mining. The dependable tools provide a safe working environment, increased production, efficient management of resources and reduce environmental impacts of mining. Although various methods, for example, analytical, experimental and empirical, are being adopted in mining, in recent days numerical tools are becoming popular due to the advancement in computer hardware and numerical methods. Empirical rules based on past experiences do provide a general guide; however, due to the heterogeneous nature of mine geology (i.e., none of the mine sites are identical), numerical simulations of mine site specific conditions would lend better insights into some underlying issues. The paper highlights the use of a continuum mechanics based tool in coal mining with a mine scale model. The continuum modelling can provide close to accurate stress fields and deformation. The paper describes the use of existing mine data to calibrate and validate the model parameters, which then are used to assess geotechnical issues related with installing a new high capacity longwall mine at the mine site. A variety of parameters, for example, chock convergences, caveability of overlying sandstones, abutment and vertical stresses, have been estimated.

  4. A Recent Review on XML data mining and FFP

    Directory of Open Access Journals (Sweden)

    Amit Kumar Mishra, Hitesh Gupta

    2013-01-01

    Full Text Available The goal of data mining is to extract or "mine" knowledge from large amounts of data. Emerging technologies of semi-structured data have attracted wide attention in networks, e-commerce, information retrieval and databases. XML has become very popular for representing semi-structured data and a standard for data exchange over the web. Mining XML data from the web is becoming increasingly important. However, the structure of the XML data can be more complex and irregular than that. Association Rule Mining plays a key role in the process of mining data for frequent pattern matching. First Frequent Pattern-growth mines the complete set of frequent patterns by pattern fragment growth. First Frequent Pattern-tree based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and uses a partition-based, divide-and-conquer method. This paper presents a complete review of XML data mining using Fast Frequent Pattern mining in various domains.

  5. Introduction of continuous technology at the Vostochnyi surface mine (Ehkibastuzugol' association)

    Energy Technology Data Exchange (ETDEWEB)

    Belik, N.M.; Shal', R.R.; Utegenov, A.T. (Obedinenie Ehkibastuzugol' (USSR))

    1990-02-01

    Presents advantages provided by continuous mining technology used at the Vostochnyi surface mine as compared to the cyclical technology. Coal and interbeddings with a strength of 8 on the Protod'yakonov scale are excavated from a 2.8 km long and 500-600 m wide field by four excavating sets that consist of bucket wheel excavator, stage loader and a face connecting and loading conveyor. All the conveyors have a capacity of 5,250 m³/h that ensures continuous excavator operation. Coal is transported to one of four blending and storage areas and stockpiled by equipment sets of FRG (Weserhuette) and Italian (Italimpianti) make. Projected and actual construction cost of the mine are given as 482.43 and 406.402 million rubles respectively in capital outlay and 231.61 and 180.96 million rubles respectively in assembling work. Output is given as 14 Mt in 1988 at a self-cost of 1.69 rubles per 1 t of coal.

  6. Comprehensive screening for reg1α gene rules out association with tropical calcific pancreatitis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    AIM: To investigate the allelic and haplotypic association of reg1α gene with tropical calcific pancreatitis (TCP). Since TCP is known to have a variable genetic basis, we investigated the interaction between mutations in the susceptibility genes, SPINK1 and CTSB, with reg1α polymorphisms. METHODS: We analyzed the polymorphisms in the reg1α gene by sequencing the gene including its promoter region in 195 TCP patients and 150 ethnically matched controls, compared their allele and haplotype frequencies, and their association with the pathogenesis and pancreaticolithiasis in TCP and fibro-calculous pancreatic diabetes. RESULTS: We found 8 reported and 2 novel polymorphisms including an insertion-deletion polymorphism in the promoter region of reg1α. None of the 5' UTR variants altered any known transcription factor binding sites, neither did any show a statistically significant association with TCP. No association with any reg1α variants was observed on dichotomization of patients based on their N34S SPINK1 or L26V CTSB status. CONCLUSION: Polymorphisms in reg1α gene, including the regulatory variants singly or in combination with the known mutations in SPINK1 and/or CTSB genes, are not associated with tropical calcific pancreatitis.

  7. Event metadata records as a testbed for scalable data mining

    International Nuclear Information System (INIS)

    At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.

  8. In Situ Generated Colloid Transport of Cu and Zn in Reclaimed Mine Soil Profiles Associated with Biosolids Application

    Directory of Open Access Journals (Sweden)

    Jarrod O. Miller

    2011-01-01

    Full Text Available Areas reclaimed for agricultural uses following coal mining often receive biosolids applications to increase organic matter and fertility. Transport of heavy metals within these soils may be enhanced by the additional presence of biosolids colloids. Intact monoliths from reclaimed and undisturbed soils in Virginia and Kentucky were leached to observe Cu and Zn mobility with and without biosolids application. Transport of Cu and Zn was observed in both solution and colloid associated phases in reclaimed and undisturbed forest soils, where the presence of unweathered spoil material and biosolids amendments contributed to higher metal release in solution fractions. Up to 81% of mobile Cu was associated with the colloid fraction, particularly when gibbsite was present, while only up to 18% of mobile Zn was associated with the colloid fraction. The colloid bound Cu was exchangeable by ammonium acetate, suggesting that it will release into groundwater resources.

  9. DEVELOPMENT OF PLASTICITY MODEL USING NON ASSOCIATED FLOW RULE FOR HCP MATERIALS INCLUDING ZIRCONIUM FOR NUCLEAR APPLICATIONS

    Energy Technology Data Exchange (ETDEWEB)

    Michael V. Glazoff; Jeong-Whan Yoon

    2013-08-01

    In this report (prepared in collaboration with Prof. Jeong Whan Yoon, Deakin University, Melbourne, Australia) a research effort was made to develop a non associated flow rule for zirconium. Since Zr is a hexagonally close packed (hcp) material, it is impossible to describe its plastic response under arbitrary loading conditions with any associated flow rule (e.g. von Mises). As a result of strong tension compression asymmetry of the yield stress and anisotropy, zirconium displays plastic behavior that requires a more sophisticated approach. Consequently, a new general asymmetric yield function has been developed which accommodates mathematically the four directional anisotropies along 0 degrees, 45 degrees, 90 degrees, and biaxial, under tension and compression. Stress anisotropy has been completely decoupled from the r value by using non associated flow plasticity, where yield function and plastic potential have been treated separately to take care of stress and r value directionalities, respectively. This theoretical development has been verified using Zr alloys at room temperature as an example as these materials have very strong SD (Strength Differential) effect. The proposed yield function reasonably well models the evolution of yield surfaces for a zirconium clock rolled plate during in plane and through thickness compression. It has been found that this function can predict both tension and compression asymmetry mathematically without any numerical tolerance and shows the significant improvement compared to any reported functions. Finally, in the end of the report, a program of further research is outlined aimed at constructing tensorial relationships for the temperature and fluence dependent creep surfaces for Zr, Zircaloy 2, and Zircaloy 4.

  10. An Improved Image Mining Technique For Brain Tumour Classification Using Efficient Classifier

    CERN Document Server

    Rajendran, P

    2010-01-01

    An improved image mining technique for brain tumor classification using pruned association rules with the MARI algorithm is presented in this paper. The proposed method uses association rule mining to classify CT scan brain images into three categories: normal, benign and malignant. It combines low-level features extracted from the images with high-level knowledge from specialists. The developed algorithm can assist physicians in efficient classification, with multiple keywords per image to improve accuracy. The experimental results on a prediagnosed database of brain images showed 96 percent and 93 percent sensitivity and accuracy respectively.

  11. Lack of parental rule-setting on eating is associated with a wide range of adolescent unhealthy eating behaviour both for boys and girls

    OpenAIRE

    Holubcikova, Jana; Kolarcik, Peter; Geckova, Andrea Madarasova; van Dijk, Jitse P.; Reijneveld, Sijmen A.

    2016-01-01

    Background: Unhealthy eating habits in adolescence lead to a wide variety of health problems and disorders. The aim of this study was to assess the prevalence of absence of parental rules on eating and unhealthy eating behaviour and to explore the relationships between parental rules on eating and a wide range of unhealthy eating habits of boys and girls. We also explored the association of sociodemographic characteristics such as gender, family affluence or parental education with eating rel...

  12. Lack of parental rule-setting on eating is associated with a wide range of adolescent unhealthy eating behaviour both for boys and girls

    OpenAIRE

    Holubcikova, Jana; Kolarcik, Peter; Madarasova Geckova, Andrea; van Dijk, Jitse P.; Reijneveld, Sijmen A.

    2016-01-01

    Background Unhealthy eating habits in adolescence lead to a wide variety of health problems and disorders. The aim of this study was to assess the prevalence of absence of parental rules on eating and unhealthy eating behaviour and to explore the relationships between parental rules on eating and a wide range of unhealthy eating habits of boys and girls. We also explored the association of sociodemographic characteristics such as gender, family affluence or parental education with eating rela...

  13. Woodpigeons nesting in association with hobby falcons: advantages and choice rules.

    Science.gov (United States)

    Bogliani; Sergio; Tavecchia

    1999-01-01

    Many bird species nest in close association with other bolder and more aggressive birds which provide protection against nest predators. The woodpigeons, Columba palumbus, that nest in poplar plantations in Northern Italy are found almost exclusively clumped around hobby, Falco subbuteo, nests. Woodpigeons settle in the area and build their nests after the hobby has started nesting. We carried out experiments with dummy nests and observations on woodpigeon nests. Dummy woodpigeon nests placed near a hobby's nest suffered less depredation by hooded crows, Corvus corone cornix, than those placed far from it. A logistic regression analysis showed that three variables, hobby nesting stage, distance from the hobby's nest and the hobby's aggressiveness, influenced the probability of nest predation. The degree of protection varied during the hobby's nesting period and was highest when chicks were in the nest. The hobby's aggressiveness against intruders varied both between and within individuals during different nesting phases. The predation rate of dummy nests associated with the falcon was negatively correlated with the aggressiveness score of the hobby during the 6 days of dummy nest exposure. Observations on real nests showed that woodpigeons selected hobbies that had a high fledging success, and a more vigorous defensive behaviour. Clues that would allow woodpigeons to choose the best protector may be early nesting by the hobby and its aggressiveness. Hobbies preyed on adult woodpigeons, but the risk incurred by the woodpigeons was low compared with the very high risk of nest predation in this area. Copyright 1999 The Association for the Study of Animal Behaviour. PMID:10053079

  14. Research of the methods of association rules in image database

    Institute of Scientific and Technical Information of China (English)

    王远敏

    2012-01-01

    In multimedia applications, image databases are used increasingly widely. To use image databases more effectively, many data mining techniques have been applied to them. This paper uses the association rule method from data mining to further improve the performance of an image database and, on this basis, constructs an image database system in which the FP-growth algorithm mines association rules from the image data.

  15. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia

    DEFF Research Database (Denmark)

    Chen, X; Lee, G; Maher, B S;

    2011-01-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed...... bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE...... in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11,380 cases and 15,021 controls), we...

  16. Spatial Data Mining Using Novel Neural Networks for Soil Image Classification and Processing

    Directory of Open Access Journals (Sweden)

    S. Nagaprasad

    2010-10-01

    The emerging field of spatial data mining exploits the spatial dependency prevalent in spatial data sets, which can be modeled and incorporated into the data mining process. Spatial relations are modeled during a data preprocessing step, consisting of the density analysis and vertical view approach, after which an exploration with visual data mining follows. In this paper we implement spatial image mining for soil classification, drawing on diverse domains such as digital image processing, neural networks, and soil fundamentals. The three most important algorithms used in the implementation are Back Propagation Network (BPN), Adaptive Resonance Theory 1 (ART), and Simplified Fuzzy ARTMAP, applied to soil classification as well as spatial image recognition. We are continuing this research by combining visual data mining with spatial data mining algorithms, such as spatial clustering, spatial association rules, and self-organizing maps, in order to detect patterns in the data even more effectively.

  17. Socioeconomic inequality of cancer mortality in the United States: a spatial data mining approach

    Directory of Open Access Journals (Sweden)

    Lam Nina SN

    2006-02-01

    Background: The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988–1992 cancer mortality rates for colorectal, lung, breast, and prostate cancers defined at the Health Service Area level and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data, which were defined at different spatial resolutions, and to visualize and analyze the results from the association rule mining process. Results: Health Service Areas with high rates of low education, high unemployment, and low-paying jobs were found to be associated with higher rates of cancer mortality. Conclusion: Association rule mining with geographic information technology helps reveal the spatial patterns of socioeconomic inequality in cancer mortality in the United States and identify regions that need further attention.

  18. Market Basket Analysis for a Supermarket based on Frequent Itemset Mining

    Directory of Open Access Journals (Sweden)

    Loraine Charlet Annie M.C.

    2012-09-01

    Market basket analysis is an important component of the analytical systems that retail organizations use to decide the placement of goods and to design sales promotions for different customer segments, with the aim of improving customer satisfaction and hence the profit of the supermarket. These issues are addressed here for a leading supermarket using frequent itemset mining. The frequent itemsets are mined from the market basket database using the efficient K-Apriori algorithm, and the association rules are then generated.
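
    The two steps named in this abstract (mine frequent itemsets, then generate rules) can be illustrated with a minimal sketch. It assumes the classic level-wise Apriori procedure, not the K-Apriori variant used by the authors, and the baskets, thresholds and function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): support} for itemsets meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in any basket.
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-item subsets are all frequent (the Apriori pruning step).
        k += 1
        current = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (set(antecedent), set(itemset - antecedent), support, confidence)
                    )
    return rules

# Hypothetical market basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
```

    With these toy baskets the only rule that clears the confidence threshold is butter => bread, since every basket containing butter also contains bread.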

  19. Market Basket Analysis for a Supermarket based on Frequent Itemset Mining

    OpenAIRE

    Loraine Charlet Annie M.C.; D.Ashok kumar

    2012-01-01

    Market basket analysis is an important component of analytical system in retail organizations to determine the placement of goods, designing sales promotions for different segments of customers to improve customer satisfaction and hence the profit of the supermarket. These issues for a leading supermarket are addressed here using frequent itemset mining. The frequent itemsets are mined from the market basket database using the efficient K-Apriori algorithm and then the association rules are g...

  20. Metals in agricultural produce associated with acid-mine drainage in Mount Morgan (Queensland, Australia).

    Science.gov (United States)

    Vicente-Beckett, Victoria A; McCauley, Gaylene J Taylor; Duivenvoorden, Leo J

    2016-01-01

    Acid-mine drainage (AMD) into the Dee River from the historic gold and copper mine in Mount Morgan, Queensland (Australia) has been of concern to farmers in the area since 1925. This study sought to determine the levels of AMD-related metals and sulfur in agricultural produce grown near the mine-impacted Dee River, compare these with similar produce grown in reference fields (which had no known AMD influence), and assess any potential health risk using relevant Australian or US guidelines. Analyses of lucerne (Medicago sativa; also known as alfalfa) from five Dee fields showed the following average concentrations (mg/kg dry basis): Cd Citrus reticulata) from Dee sites (mg/kg wet weight) were Cd 0.011, Cu 0.59, Fe 2.2, Mn 0.56, Pb 0.18, S 91 and Zn 0.96. Cd and Zn were less than or close to, average Fe and Mn levels were at most twice, Cd 1.8 or 6.5 times, and Pb 8.5 or 72 times the maximum levels in raw oranges reported in the US total diet study (TDS) or the Australian TDS, respectively. Average Cd, Fe, Mn, Pb and Zn levels in the citrus reference samples were found to exceed the maximum reported in one or both TDS surveys. Cu, Fe, Mn, Pb and Zn plant-soil transfer factor (TF) values were citrus fruit samples were 0.14 and 0.73, respectively; lucerne and lucerne hay from both Dee and reference sites gave TF = 10, suggesting some potential risk to cattle, although this conclusion is tentative because Cd levels were close to or less than the detection limit. TF values for S in lucerne, lucerne hay, pasture grass and mandarin oranges from Dee sites were 18, 14, 3 and 3.6, respectively, indicating that S in soil was readily available to plant or fruit. Sulfur in pasture grass and citrus fruit (TF = 11 for both) was apparently more bioavailable at the reference sites than at the Dee sites (TF = 3.0 for pasture grass; TF = 3.6 for citrus fruit). PMID:26979303

  1. Towards a New Approach for Mining Frequent Itemsets on Data Stream

    Directory of Open Access Journals (Sweden)

    Shailendra Jain

    2012-12-01

    Since its advent, association rule mining has become one of the most researched areas of data exploration. In recent years, applying association rule mining methods to extract rules from a continuous flow of voluminous data, known as a data stream, has generated immense interest due to emerging applications such as network-traffic analysis and sensor-network data analysis. For such application domains, the ability to process enormous amounts of stream data in a single pass is critical. Nowadays, many organizations generate and utilize vast data streams (Huang, 2002). Employing data mining schemes on such massive data streams can unearth real-time trends and patterns which can be utilized for dynamic and timely decisions. Mining such high-speed, enormous data streams differs significantly from traditional data mining in several ways. First, the response time of the mining algorithm should be as small as possible due to the online nature of the data and the limited resources dedicated to mining activities (Charikar, 2004). Second, the underlying data is highly volatile and subject to change over time (Chang, 2003). Moreover, since there is no time to preprocess the data in order to remove noise, the streamed data can be inherently noisy. Because of these problems, data stream mining is receiving increasing attention, and current research is focused on resolving them efficiently. Although the field of data stream mining is being heavily investigated, there is still a lack of a holistic and generic approach for mining association rules from data streams. Thus, this research attempts to fill this gap by integrating ideas from previous work in data stream mining. This investigation focuses on the degree of effectiveness of using a probabilistic approach of sampling in the data stream together with an incremental approach to maintenance of frequent

  2. Sequential Extraction Results and Mineralogy of Mine Waste and Stream Sediments Associated With Metal Mines in Vermont, Maine, and New Zealand

    Science.gov (United States)

    Piatak, N.M.; Seal, R.R.; Sanzolone, R.F.; Lamothe, P.J.; Brown, Z.A.; Adams, M.

    2007-01-01

    We report results from sequential extraction experiments and the quantitative mineralogy for samples of stream sediments and mine wastes collected from metal mines. Samples were from the Elizabeth, Ely Copper, and Pike Hill Copper mines in Vermont, the Callahan Mine in Maine, and the Martha Mine in New Zealand. The extraction technique targeted the following operationally defined fractions and solid-phase forms: (1) soluble, adsorbed, and exchangeable fractions; (2) carbonates; (3) organic material; (4) amorphous iron- and aluminum-hydroxides and crystalline manganese-oxides; (5) crystalline iron-oxides; (6) sulfides and selenides; and (7) residual material. For most elements, the sum of an element from all extractions steps correlated well with the original unleached concentration. Also, the quantitative mineralogy of the original material compared to that of the residues from two extraction steps gave insight into the effectiveness of reagents at dissolving targeted phases. The data are presented here with minimal interpretation or discussion and further analyses and interpretation will be presented elsewhere.

  3. A new algorithm to extract hidden rules of gastric cancer data based on ontology.

    Science.gov (United States)

    Mahmoodi, Seyed Abbas; Mirzaie, Kamal; Mahmoudi, Seyed Mostafa

    2016-01-01

    Cancer is the leading cause of death in economically developed countries and the second leading cause of death in developing countries. Gastric cancers are among the most devastating and incurable forms of cancer and their treatment may be excessively complex and costly. Data mining, a technology that is used to produce analytically useful information, has been employed successfully with medical data. Although traditional data mining techniques such as association rules help to extract knowledge from large data sets, the number of rules obtained from a data set is sometimes so large that it becomes a major problem. In fact, one of the disadvantages of this technique is that it produces many meaningless and redundant rules, because it pays no attention to the concept and meaning of the items or samples. This paper presents a new method that uses ontology to discover association rules and so address these problems. It reports ontology-based data mining on a medical database containing clinical data on patients referred to the Imam Reza Hospital at Tabriz. The data set was gathered from 490 random visitors to the Imam Reza Hospital at Tabriz who were suspected of having gastric cancer. The proposed ontology-based data mining algorithm makes rules more intuitive, appealing and understandable, eliminates wasteful and useless rules, and, as a secondary result, significantly reduces the running time of the Apriori algorithm. The experimental results confirm the efficiency and advantages of this algorithm.

  4. REx: An Efficient Rule Generator

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

    This paper describes REx, an efficient algorithm for generating symbolic rules from an artificial neural network (ANN). Classification rules are sought in many areas, from automatic knowledge acquisition to data mining and ANN rule extraction, because they possess some attractive features. They are explicit, understandable and verifiable by domain experts, and can be modified, extended and passed on as modular knowledge. REx exploits the first order information in the data and finds the shortest sufficient conditions for a rule of a class that can differentiate it from patterns of other classes. It can generate concise and perfect rules in the sense that the error rate of the rules is not worse than the inconsistency rate found in the original data. An important feature of the rule extraction algorithm REx is its recursive nature. The extracted rules are concise, comprehensible, order insensitive and do not involve any weight values. Extensive experimental studies on several benchmark classification problems, s...

  5. Identifying users of traditional and Internet-based resources for meal ideas: An association rule learning approach.

    Science.gov (United States)

    Doub, Allison E; Small, Meg L; Levin, Aron; LeVangie, Kristie; Brick, Timothy R

    2016-08-01

    Increasing home cooking while decreasing the consumption of food prepared away from home is a commonly recommended weight management strategy, however research on where individuals obtain ideas about meals to cook at home is limited. This study examined the characteristics of individuals who reported using traditional and Internet-based resources for meal ideas. 583 participants who were ≥50% responsible for household meal planning were recruited to approximate the 2014 United States Census distribution on sex, age, race/ethnicity, and household income. Participants reported demographic characteristics, home cooking frequency, and their use of 4 traditional resources for meal ideas (e.g., cookbooks), and 7 Internet-based resources for meal ideas (e.g., Pinterest) in an online survey. Independent samples t-tests compared home cooking frequency by resource use. Association rule learning identified those demographic characteristics that were significantly associated with resource use. Family and friends (71%), food community websites (45%), and cookbooks (41%) were the most common resources reported. Cookbook users reported preparing more meals at home per week (M = 9.65, SD = 5.28) compared to non-cookbook users (M = 8.11, SD = 4.93; t = -3.55, p < 0.001). Resource use was generally higher among parents and varied systematically with demographic characteristics. Findings suggest that home cooking interventions may benefit by modifying resources used by their target population. PMID:27067739

  6. Identifying users of traditional and Internet-based resources for meal ideas: An association rule learning approach.

    Science.gov (United States)

    Doub, Allison E; Small, Meg L; Levin, Aron; LeVangie, Kristie; Brick, Timothy R

    2016-08-01

    Increasing home cooking while decreasing the consumption of food prepared away from home is a commonly recommended weight management strategy, however research on where individuals obtain ideas about meals to cook at home is limited. This study examined the characteristics of individuals who reported using traditional and Internet-based resources for meal ideas. 583 participants who were ≥50% responsible for household meal planning were recruited to approximate the 2014 United States Census distribution on sex, age, race/ethnicity, and household income. Participants reported demographic characteristics, home cooking frequency, and their use of 4 traditional resources for meal ideas (e.g., cookbooks), and 7 Internet-based resources for meal ideas (e.g., Pinterest) in an online survey. Independent samples t-tests compared home cooking frequency by resource use. Association rule learning identified those demographic characteristics that were significantly associated with resource use. Family and friends (71%), food community websites (45%), and cookbooks (41%) were the most common resources reported. Cookbook users reported preparing more meals at home per week (M = 9.65, SD = 5.28) compared to non-cookbook users (M = 8.11, SD = 4.93; t = -3.55, p < 0.001). Resource use was generally higher among parents and varied systematically with demographic characteristics. Findings suggest that home cooking interventions may benefit by modifying resources used by their target population.

  7. Research on Change Rule of Roadway Tunneling and Mining Stress in Thick Coal Seam Small Pillar

    Institute of Scientific and Technical Information of China (English)

    王震

    2014-01-01

    Taking roadway 5107 in the No. 12-3 seam of the Jinhuagong coal mine as the research object, FLAC3D software was used to simulate how the stress in the surrounding rock changes when the mining face and the heading face meet, with a small coal pillar left in a thick coal seam. The results show that, before the faces meet, the major principal stress forms a small zone of stress concentration at the roof and floor corners, with a maximum of 12 MPa, while the third principal stress increases gradually from the roadway surface into the rock and reaches its maximum deep in the coal rib. After the faces meet, the major principal stress increment in the shallow surrounding rock fluctuates strongly; the major principal stress increases to varying degrees above the roadway roof and at the roof corner on the coal-pillar side, with a maximum increment of 4.78 MPa, whereas the third principal stress shows almost no increase and instead decreases to varying degrees.

  8. Analyzing and Mining Ordered Information Tables

    Institute of Scientific and Technical Information of China (English)

    SAI Ying (赛英); Y. Y. Yao

    2003-01-01

    Work in inductive learning has mostly concentrated on classification. However, there are many applications in which it is desirable to order rather than to classify instances. For modelling ordering problems, we generalize the notion of information tables to ordered information tables by adding order relations on attribute values. We then propose a data analysis model that analyzes the dependency of attributes to describe the properties of ordered information tables. The problem of mining ordering rules is formulated as finding associations between orderings of attribute values and the overall ordering of objects. An ordering rule may state that "if the value of an object x on an attribute a is ordered ahead of the value of another object y on the same attribute, then x is ordered ahead of y". For mining ordering rules, we first transform an ordered information table into a binary information table, and then apply any standard machine learning and data mining algorithms. As an illustration, we analyze in detail Maclean's universities ranking for the year 2000.
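
    The transformation described in this abstract can be sketched as follows. This is a minimal illustration that assumes each row of the binary table stands for an ordered pair of objects (x, y) and holds 1 where x is ordered ahead of y on an attribute; the abstract does not spell out the construction, and the university data, attribute names and helper functions are hypothetical.

```python
from itertools import permutations

def to_binary_table(table, ahead_of):
    """Turn an ordered information table into a binary table over object pairs.

    table:    {object: {attribute: value}}
    ahead_of: {attribute: function(u, v) -> True if value u is ordered ahead of v}
    Returns   {(x, y): {attribute: 1 if x is ahead of y on that attribute else 0}}
    """
    rows = {}
    for x, y in permutations(table, 2):
        rows[(x, y)] = {
            a: int(ahead(table[x][a], table[y][a])) for a, ahead in ahead_of.items()
        }
    return rows

def rule_confidence(rows, condition, decision):
    """Confidence of the ordering rule 'ahead on condition => ahead on decision'."""
    matching = [r for r in rows.values() if r[condition] == 1]
    return sum(r[decision] for r in matching) / len(matching)

# Hypothetical ranking data, for illustration only.
universities = {
    "A": {"reputation": 9, "class_size": 20, "overall": 1},
    "B": {"reputation": 7, "class_size": 35, "overall": 2},
    "C": {"reputation": 8, "class_size": 30, "overall": 3},
}
ahead_of = {
    "reputation": lambda u, v: u > v,  # a higher score is ordered ahead
    "class_size": lambda u, v: u < v,  # a smaller class is ordered ahead
    "overall": lambda u, v: u < v,     # a smaller rank number is ordered ahead
}
binary = to_binary_table(universities, ahead_of)
confidence = rule_confidence(binary, "reputation", "overall")
```

    Once the table is binary, any standard rule miner can be run on it; here the confidence of "ahead on reputation => ahead overall" is simply the fraction of matching object pairs.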

  9. A Novel Approach for Web Page Set Mining

    CERN Document Server

    Geeta, R B; Totad, Shasikumar G; D, Prasad Reddy P V G

    2011-01-01

    One of the most time-consuming steps in association rule mining is computing the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database into a hash index tree by scanning the transaction database only once. Whenever a user requests a Uniform Resource Locator (URL), the request entry is stored in the log file of the server. This paper presents the hash index table structure, a general and dense structure which supports web page set extraction from the server log file. This hash table provides information about the original database. Web page set mining (WPs-Mine) provides a complete representation of the original database. This approach works well for both sparse and dense data distributions. Web page set mining supported by the hash table index shows performance always comparable with, and often better than, algorithms accessing data on flat files. Incremental update is feasible without reaccessing the original transactional databa...
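
    The single-scan counting idea in this abstract can be sketched with an ordinary hash table. This is a minimal illustration that uses a Python Counter keyed by page sets, not the hash index tree of the paper; the session data, the size limit and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_page_set_index(sessions, max_size=2):
    """Scan the sessions once and count every page set of up to max_size pages."""
    index = Counter()
    for pages in sessions:
        unique = sorted(set(pages))  # canonical order, so each set has one key
        for size in range(1, max_size + 1):
            for page_set in combinations(unique, size):
                index[page_set] += 1
    return index

def frequent_page_sets(index, min_count):
    """Read frequent page sets from the index without touching the log again."""
    return {page_set: count for page_set, count in index.items() if count >= min_count}

# Hypothetical sessions extracted from a server log, for illustration only.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart", "/home"],
]
index = build_page_set_index(sessions)
frequent = frequent_page_sets(index, min_count=3)
```

    A new session can be added by incrementing the counts for its page sets alone, which is one way to read the claim that incremental update needs no rescan of the original data.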

  10. Associations of dominant plant species with arbuscular mycorrhizal fungi during vegetation development on coal mine spoil banks

    Energy Technology Data Exchange (ETDEWEB)

    Rydlova, J.; Vosatka, M. [Academy of Science. Pruhonice (Czech Republic). Inst. of Botany

    2001-07-01

    Among plants colonizing mine spoil banks in Northern Bohemia the first colonizers, mainly ruderal annuals from Chenopodiaceae and Brassicaceae were found not to be associated with arbuscular mycorrhizal fungi (AMF). These species cultivated in pots with soil from four sites in different succession stages of the spoil bank did not respond to the presence of native or non-native AMF. All grass species studied (Elytrigia repens, Calamagrostis epigejos and Arrhenatherum elatius) were found moderately colonized in the field. Carduus acanthoides was found to be highly colonized in the field; however, it did not show growth response to AMF in the pot experiment. The AMF native in four sites on the spoil banks showed high infectivity but low effectiveness in association with colonizing plants compared to the non-native isolate G. fistulosum BEG23. In general, dependence on AMF in the cultivation experiment was rather low, regardless of the fact that plants were found to be associated with AMF either in the field or in pots. Occurrence and effectiveness of mycorrhizal associations might relate primarily to the mycotrophic status of each plant species rather than to the age of the spoil bank sites studied.

  11. Methods for the Extension Rules of Ontology Based on Multidimensional Association Rules

    Institute of Scientific and Technical Information of China (English)

    董俊; 王锁萍; 熊范纶; 张友华

    2009-01-01

    Current methods for extending and enriching ontologies have significant limitations. Therefore, an approach is presented that extends ontology rules using multi-dimensional association rule technology. The conceptual ontology is enriched and extended through ontology rule extraction, consistency treatment under the guidance of the ontology, establishment of rule mappings, and re-identification and updating of the conceptual ontology. Experimental results on an ontology for predicting tea diseases and pests show that the proposed approach is easy to implement and has good feasibility and validity.

  12. Mine soils associated with open-cast coal mining in Spain: a review; Suelos mineros asociados a la mineria de carbon a cielo abierto en Espana: una revision

    Energy Technology Data Exchange (ETDEWEB)

    Arranz-Gonzalez, J. C.

    2011-07-01

    The different situations that may be found after the closure of coal mines range from the simple abandonment of pits and spoil tips to areas where reclamation work has led to the creation of artificial soils on a reconstituted surface composed of layers of rock and soil or both types of material. Soils of this type are known as mine soils, amongst which those generated by coal mining have been studied most extensively, both to assess their potential for reclamation and to learn more about their pedogenetic evolution. We present here a review of some of the more important works devoted to this subject. We have found evidence to show that in Spain, just as in other countries, the physical and chemical properties of these anthropogenic soils are changing rapidly and so the mine-soil profiles described can be considered as belonging to very young soils still undergoing incipient but rapid development. We have also found that an analysis of information obtained from the soil parameters of surface samples and its interpretation is of great practical use in restoration processes. Nevertheless, the sampling and description of soil profiles has proved to be of much greater interest, allowing us to reach a clearer understanding of the internal processes and properties that are unique to these types of anthropogenic soil. (Author) 64 refs.

  13. Design and implementation of fishery information recommendation system based on association rules

    Institute of Scientific and Technical Information of China (English)

    王立华; 肖慧; 徐硕; 刘树; 杜卫利; 黄其泉; 王宇

    2013-01-01

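    To obtain fishery scientific data quickly and conveniently, Web log mining was used to analyze the frequent access patterns of users of the fishery scientific data sharing platform, in order to discover user access rules and provide an information recommendation service. The algorithms involved in the mining analysis were analyzed and improved: the IASR (IP agent session referrer) user identification algorithm and an improved version of the Apriori association rule algorithm were proposed. Experiments showed that IASR improved user identification accuracy by 13% and ran twice as fast as the general algorithm. When the number of transactions exceeded 500, the improved algorithm was far more efficient than Apriori, running more than 6 times faster. On this basis, the key design and implementation methods of the system were discussed and a fishery information recommendation system was developed. The system was developed with Java and AJAX, using a SQL Server 2005 database on the Windows XP operating system. Application results show that the system lets users obtain the fishery data they are interested in quickly and conveniently, improving the quality of the information service. In order to obtain fishery scientific data quickly and easily, this article analyzed users' interests in visiting fishery scientific data platforms based on data mining, mined rules, and gave information recommendations according to those rules. The association rule technique, one of the commonly used algorithms in data mining, attempts to find relations among transaction items in large volumes of data. Based on the design of the Fisheries Information Recommendation System, the association rule mining technique was applied to the web log data of the Fishery Scientific Data Platform to find user access patterns through data processing, pattern discovery, and pattern recognition analysis. The researchers analyzed and improved the algorithms involved in the mining analysis, proposed the IASR (IP Agent Session Referrer) algorithm for user identification, and introduced URL rewriting. IASR uses four key pieces of information (IP, Agent, Session, and Referrer) and adds a session identification mechanism to recognize users in order to improve the accuracy of user identification. In the light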

  14. A Traffic Accident Liability Judgment Method Based on Intelligent Association Particle Mining

    Institute of Scientific and Technical Information of China (English)

    魏红娟

    2012-01-01

    Road traffic accidents have many causes, and real accidents exhibit latent regularities; mining these regularities can help improve the traffic accident situation. This paper proposes a method for mining associations among traffic accident causes based on an intelligent particle swarm. Particles are encoded from on-site data recorded when actual accidents occurred and are updated through offspring reproduction and mutation. An objective function built from support and confidence is optimized to find the particles that qualify as mined rules. Simulation experiments show that the algorithm can model traffic accident data reasonably and offers useful guidance for manual traffic regulation in practice.
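
    The abstract does not give the objective function itself, so the following is only a sketch of how support and confidence could be combined into a fitness score for one candidate rule. The weighted sum, the `weight` parameter, the function names, and the accident records are all assumptions made for illustration.

```python
def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that contain every antecedent item.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Records that contain both the antecedent and the consequent.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def fitness(records, antecedent, consequent, weight=0.5):
    """Hypothetical objective: a weighted sum of support and confidence."""
    support, confidence = support_confidence(records, antecedent, consequent)
    return weight * support + (1 - weight) * confidence


# Made-up accident records; each set lists factors present at the scene.
accidents = [
    {"rain", "night", "speeding"},
    {"rain", "speeding"},
    {"night", "speeding"},
    {"rain", "night"},
]
support, confidence = support_confidence(accidents, {"rain"}, {"speeding"})
score = fitness(accidents, {"rain"}, {"speeding"})
```

    In a particle swarm setting, each particle would encode one such rule, and a score of this kind would decide which particles are kept as mined rules.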

  15. 78 FR 39531 - Mine Rescue Teams

    Science.gov (United States)

    2013-07-01

    ... Rescue Teams; CFR Correction Federal Register / Vol. 78, No. 126 / Monday, July 1, 2013 / Rules... Rescue Teams CFR Correction In Title 30 of the Code of Federal Regulations, Parts 1 to 199, revised as of... Miner Act Requirements for Underground Coal Mine Operators and Mine Rescue Teams Type of mine...

  16. Data mining in radiology

    Directory of Open Access Journals (Sweden)

    Amit T Kharat

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining.

  17. Data mining in radiology.

    Science.gov (United States)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-04-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining. PMID:25024513

  18. Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics.

    Science.gov (United States)

    Cao, Hui; Markatou, Marianthi; Melton, Genevieve B; Chiang, Michael F; Hripcsak, George

    2005-01-01

    This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, chi-square (chi2) statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by chi2 statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-chi2, KB-PCI). An extrinsic evaluation showed that both KB-chi2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.
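
    To make the chi-square step concrete, here is a minimal sketch of the standard Pearson chi-square statistic for a 2x2 co-occurrence table of one disease and one finding. The counts and the function name are invented for the example, and the paper's heuristic cutoff values are not reproduced here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: records with both disease and finding
    b: records with the disease only
    c: records with the finding only
    d: records with neither
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so dependence cannot be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Made-up counts for one disease-finding pair.
statistic = chi_square_2x2(a=30, b=10, c=20, d=40)
```

    A pair would be kept as an association when its statistic exceeds a chosen cutoff; with one degree of freedom, 3.84 corresponds to the conventional 0.05 significance level.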

  19. Dissolved metals and associated constituents in abandoned coal-mine discharges, Pennsylvania, USA. Part 1: Constituent quantities and correlations

    Science.gov (United States)

    Cravotta, C.A.

    2008-01-01

    Complete hydrochemical data are rarely reported for coal-mine discharges (CMD). This report summarizes major and trace-element concentrations and loadings for CMD at 140 abandoned mines in the Anthracite and Bituminous Coalfields of Pennsylvania. Clean-sampling and low-level analytical methods were used in 1999 to collect data that could be useful to determine potential environmental effects, remediation strategies, and quantities of valuable constituents. A subset of 10 sites was resampled in 2003 to analyze both the CMD and associated ochreous precipitates; the hydrochemical data were similar in 2003 and 1999. In 1999, the flow at the 140 CMD sites ranged from 0.028 to 2210 L s⁻¹, with a median of 18.4 L s⁻¹. The pH ranged from 2.7 to 7.3; concentrations (range in mg/L) of dissolved (0.45-µm pore-size filter) SO4 (34-2000), Fe (0.046-512), Mn (0.019-74), and Al (0.007-108) varied widely. Predominant metalloid elements were Si (2.7-31.3 mg L⁻¹), B (... C > P = N = Se) were not elevated in the CMD samples compared to average river water or seawater. Compared to seawater, the CMD samples also were poor in halogens (Cl > Br > I > F), alkalies (Na > K > Li > Rb > Cs), most alkaline earths (Ca > Mg > Sr), and most metalloids but were enriched by two to four orders of magnitude with Fe, Al, Mn, Co, Be, Sc, Y and the lanthanide rare-earth elements, and one order of magnitude with Ni and Zn. The ochre samples collected at a subset of 10 sites in 2003 were dominantly goethite with minor ferrihydrite or lepidocrocite. None of the samples for this subset contained schwertmannite or was Al rich, but most contained minor aluminosilicate detritus. Compared to concentrations in global average shale, the ochres were rich in Fe, Ag, As and Au, but were poor in most other metals and rare earths. The ochres were not enriched compared to commercial ore deposits mined for Au or other valuable metals. Although similar to commercial Fe ores in composition, the ochres are dispersed and

  20. New insight into genes in association with asthma: literature-based mining and network centrality analysis

    Institute of Scientific and Technical Information of China (English)

    LIANG Rui; WANG Lei; WANG Gang

    2013-01-01

    Background: Asthma is a heterogeneous disease for which a strong genetic basis has been firmly established. Until now no studies have been undertaken to systematically explore the network of asthma-related genes using an internally developed literature-based discovery approach. This study aimed to explore asthma-related genes by using literature-based mining and network centrality analysis. Methods: Literature involving asthma-related genes was searched in PubMed from 2001 to 2011. Integration of natural language processing with network centrality analysis was used to identify asthma susceptibility genes and their interaction network. Asthma susceptibility genes were classified into three functional groups by gene ontology (GO) analysis and the key genes were confirmed by establishing asthma-related networks and pathways. Results: Three hundred and twenty-six genes related to asthma, such as IGHE (IgE), interleukin (IL)-4, 5, 6, 10, 13, 17A, and tumor necrosis factor (TNF)-alpha, were identified. GO analysis indicated some biological processes (developmental processes, signal transduction, death, etc.), cellular components (non-structural extracellular, plasma membrane and extracellular matrix), and molecular functions (signal transduction activity) that were involved in asthma. Furthermore, 22 asthma-related pathways, such as the Toll-like receptor signaling pathway, hematopoietic cell lineage, JAK-STAT signaling pathway, chemokine signaling pathway, and cytokine-cytokine receptor interaction, and 17 hub genes, such as JAK3, CCR1-3, CCR5-7, CCR8, were found. Conclusions: Our study provides a remarkably detailed and comprehensive picture of asthma susceptibility genes and their interacting network. Further identification of these genes and molecular pathways may play a prominent role in establishing rational therapeutic approaches for asthma.
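
    The abstract does not say which centrality measure was used, so the sketch below assumes plain degree centrality as one simple way to rank hub nodes in an interaction network. The node labels and edges are placeholders and do not represent real gene interactions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are skipped and add nothing to the degree
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Placeholder network: letters stand in for genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)
```

    Nodes with the highest scores are the hub candidates; other measures such as betweenness or closeness centrality could be substituted in the same role.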