WorldWideScience

Sample records for association rule mining

  1. Controlling False Positives in Association Rule Mining

    CERN Document Server

    Liu, Guimei; Wong, Limsoon

    2011-01-01

    Association rule mining is an important problem in data mining. It enumerates and tests a large number of rules on a dataset and outputs the rules that satisfy user-specified constraints. Because so many rules are tested, rules that do not represent a real systematic effect in the data can satisfy the given constraints purely by chance. Hence association rule mining often suffers from a high risk of false positive errors, yet there has been no comprehensive study of controlling false positives in association rule mining. In this paper, we adopt three multiple testing correction approaches (the direct adjustment approach, the permutation-based approach and the holdout approach) to control false positives in association rule mining, and conduct extensive experiments to study their performance. Our results show that (1) numerous spurious rules are generated if no correction is made, and (2) the three approaches can control false positives effectively. Among the three approaches, the permutation...
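
    The direct adjustment approach can be illustrated with a minimal sketch. The choice of a one-sided Fisher's exact test for each rule and of the Bonferroni correction (dividing the significance level by the number of rules tested) are assumptions made here for illustration; the abstract does not state which test or correction the authors used, and the rule names and counts are hypothetical.

```python
from math import comb


def fisher_right_tail(n, n_x, n_y, n_xy):
    """One-sided Fisher's exact test p-value for a rule X -> Y.

    n: number of transactions; n_x: transactions containing X;
    n_y: transactions containing Y; n_xy: transactions containing both.
    Returns P(K >= n_xy) under the hypergeometric null that X and Y are independent.
    """
    total = comb(n, n_x)
    upper = min(n_x, n_y)
    tail = sum(comb(n_y, k) * comb(n - n_y, n_x - k) for k in range(n_xy, upper + 1))
    return tail / total


def bonferroni_filter(rules, n, alpha=0.05):
    """Direct adjustment: keep rules whose p-value is <= alpha / (number of rules tested).

    rules: list of (name, n_x, n_y, n_xy) tuples.
    """
    m = len(rules)
    kept = []
    for name, n_x, n_y, n_xy in rules:
        p = fisher_right_tail(n, n_x, n_y, n_xy)
        if p <= alpha / m:
            kept.append((name, p))
    return kept
```

    For example, over 100 transactions, `bonferroni_filter([("A -> B", 20, 20, 20), ("C -> D", 20, 50, 10)], n=100)` keeps only "A -> B": the co-occurrence count of "C -> D" is exactly what independence predicts, so its p-value is far above the adjusted threshold of 0.025.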

  2. Association Rule Mining and Its Application

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Several data mining algorithms have been studied recently, and association analysis is one of the most important techniques among them. In this paper, we introduce the theory of association rules in data mining and analyze the characteristics of the postal EMS service. We create a data warehouse model for EMS services and give the procedure for applying association rule mining on top of it. Finally, we give an example of the whole mining procedure. This EMS data warehouse model and association rule mining technique have been applied in a practical postal CRM system.

  3. A Collaborative Educational Association Rule Mining Tool

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; de Castro, Carlos

    2011-01-01

    This paper describes a collaborative educational data mining tool, based on association rule mining, for the ongoing improvement of e-learning courses; it allows teachers with similar course profiles to share and score the discovered information. The mining tool is intended for instructors who are not experts in data mining, so its internal…

  4. Efficient Mining of Intertransaction Association Rules

    NARCIS (Netherlands)

    Tung, A.K.H.; Lu, H.J.; Han, J.W.; Feng, L.

    2003-01-01

    Most previous studies on mining association rules focus on intratransaction associations, i.e., associations among items within the same transaction, where a transaction could be the items bought by the same customer, the events that happened on the same day, etc. In this s…

  5. Association Rule Mining for Web Recommendation

    Directory of Open Access Journals (Sweden)

    R. Suguna

    2012-10-01

    Web usage mining is the application of web mining to discover useful patterns from the web in order to understand and analyze the behavior of web users and web-based applications. It is an emerging research trend. It deals entirely with web log files, which contain users' website access information, and analyzing and understanding user behavior in web access is of considerable interest. Web usage mining normally consists of three phases: 1. Preprocessing, 2. Pattern Discovery and 3. Pattern Analysis. This paper proposes association rule mining algorithms for better Web Recommendation and Web Personalization. Web recommendation systems play an important role in understanding customers' behavior and interests, improving customer convenience, increasing service provider profits and anticipating future needs.

  6. Mining Hesitation Information by Vague Association Rules

    Science.gov (United States)

    Lu, An; Ng, Wilfred

    In many online shopping applications, such as Amazon and eBay, traditional Association Rule (AR) mining has limitations as it only deals with the items that are sold but ignores the items that are almost sold (for example, those items that are put into the basket but not checked out). We say that those almost sold items carry hesitation information, since customers are hesitating to buy them. The hesitation information of items is valuable knowledge for the design of good selling strategies. However, there is no conceptual model that is able to capture different statuses of hesitation information. Herein, we apply and extend vague set theory in the context of AR mining. We define the concepts of attractiveness and hesitation of an item, which represent the overall information of a customer's intent on an item. Based on the two concepts, we propose the notion of Vague Association Rules (VARs). We devise an efficient algorithm to mine the VARs. Our experiments show that our algorithm is efficient and the VARs capture more specific and richer information than do the traditional ARs.

  7. Research on spatial association rules mining in two-direction

    Institute of Scientific and Technical Information of China (English)

    XUE Li-xia; WANG Zuo-cheng

    2007-01-01

    In data mining on transaction databases, the focus has been on relationships between attributes, while relationships between tuples have not been taken into account. In a spatial database, there are relationships both between attributes and between tuples, and most associations occur between tuples, such as adjacency, intersection, overlap and other topological relationships. The tasks of spatial association rule mining therefore include mining the relationships between attributes of spatial objects, called vertical-direction data mining, and the relationships between tuples, called horizontal-direction data mining. This paper analyzes the storage models of spatial data, draws on data mining technologies for transaction databases, defines spatial association rules (including vertical-direction, horizontal-direction and two-direction association rules), discusses how to measure the interestingness of spatial association rules, and proposes workflows for spatial association rule mining. For two-direction spatial association rule mining, an algorithm is proposed to obtain non-spatial itemsets: spatial analysis is used to transform spatial relations into non-spatial associations, from which the non-spatial itemsets are derived. Based on these itemsets, the Apriori algorithm or other algorithms can be used to find the frequent itemsets and then derive the spatial association rules. The algorithm was validated by mining spatial association rules from a spatial database, and the test results show that it is efficient and can mine interesting spatial rules.
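
    A minimal sketch of the transformation step described above, assuming a symmetric adjacency relation and a hypothetical item encoding (`type=<value>`, `adjacent_to=<type>`); it is not the authors' algorithm. Each object's own attributes supply the vertical-direction items, and each neighbor relation becomes a non-spatial horizontal-direction item, so the resulting sets can be handed to Apriori or a similar frequent itemset miner.

```python
def spatial_to_transactions(objects, neighbors):
    """Turn spatial objects and a topological relation into item sets.

    objects: dict mapping object id -> (object type, set of attribute items).
    neighbors: list of (id_a, id_b) pairs that satisfy a symmetric relation
    such as adjacency.
    """
    transactions = {}
    # Vertical direction: each object's own attribute values become items.
    for oid, (otype, attrs) in objects.items():
        transactions[oid] = {f"type={otype}"} | set(attrs)
    # Horizontal direction: each neighbor relation becomes a non-spatial item.
    for a, b in neighbors:
        transactions[a].add(f"adjacent_to={objects[b][0]}")
        transactions[b].add(f"adjacent_to={objects[a][0]}")
    return transactions
```

    With hypothetical objects `{"c1": ("county", {"gdp=high"}), "r1": ("railroad", set())}` and the single neighbor pair `("c1", "r1")`, county `c1` becomes the transaction {type=county, gdp=high, adjacent_to=railroad}.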

  8. Efficient mining of association rules based on gravitational search algorithm

    Directory of Open Access Journals (Sweden)

    Fariba Khademolghorani

    2011-07-01

    Association rule mining is one of the most widely used tools for discovering relationships among attributes in a database. Many algorithms have been introduced for discovering these rules. These algorithms mine association rules in two separate stages, and most of them mine occurrence rules that are easily predictable by users. Therefore, this paper discusses the application of the gravitational search algorithm to discovering interesting association rules. This evolutionary algorithm is based on Newtonian gravity and the laws of motion. Furthermore, unlike previous methods, the proposed method can mine the best association rules without generating frequent itemsets and is independent of minimum support and confidence values. A comparison with a method that mines association rules using particle swarm optimization shows that our method is successful.

  9. An Algorithm for Generating Single Dimensional Fuzzy Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Rolly Intan

    2006-01-01

    Association rule mining searches for interesting relationships among items in a large data set. Market basket analysis, a typical example of association rule mining, analyzes customers' buying habits by finding associations between the different items that customers put in their shopping cart (basket). The Apriori algorithm is an influential algorithm for mining frequent itemsets for generating association rules. In some respects, however, the Apriori algorithm does not match human intuition. To provide a more human-based concept, this paper proposes an alternative algorithm for generating association rules by utilizing fuzzy sets in market basket analysis.
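
    For reference, the classic (crisp) Apriori algorithm that this paper takes as its point of departure can be sketched as follows. This is a textbook-style sketch, not the fuzzy alternative the paper proposes; support is expressed as an absolute count, and the baskets in the usage note are made-up data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset must itself be frequent (downward closure).
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count the surviving candidates in one pass over the data.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

    With the five hypothetical baskets `[{"bread", "milk"}, {"bread", "diapers", "beer", "eggs"}, {"milk", "diapers", "beer", "cola"}, {"bread", "milk", "diapers", "beer"}, {"bread", "milk", "diapers", "cola"}]` and `min_support=3`, `apriori` returns eight itemsets: four single items and the pairs {bread, milk}, {bread, diapers}, {milk, diapers} and {diapers, beer}. The triple {bread, milk, diapers} is counted but occurs only twice.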

  10. Mining association rule efficiently based on data warehouse

    Institute of Scientific and Technical Information of China (English)

    陈晓红; 赖邦传; 罗铤

    2003-01-01

    In the data warehouse association rule mining process, the conventional complete association rule set was replaced by the least association rule set. The least association rule set should meet two requirements: 1) it should be the minimal and simplest association rule set; 2) its predictive power should be no weaker than that of the complete association rule set, so that the precision of association rule analysis is guaranteed. By adopting the least association rule set, weak rules can be pruned effectively, greatly reducing the number of frequent itemsets and thereby improving mining efficiency. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is used to develop a corresponding efficient algorithm.

  11. Compact Weighted Class Association Rule Mining using Information Gain

    CERN Document Server

    Ibrahim, S P Syed

    2011-01-01

    Weighted association rule mining reflects the semantic significance of an item by considering its weight. Classification constructs a classifier and predicts the class of new data instances. This paper proposes a compact weighted class association rule mining method, which applies weighted association rule mining to classification and constructs an efficient weighted associative classifier. The proposed associative classification algorithm chooses one informative non-class attribute from the dataset, and all weighted class association rules are generated based on that attribute. Item weight is one of the parameters used in generating the weighted class association rules, and the algorithm calculates the weights using the HITS model. Experimental results show that the proposed system generates fewer, high-quality rules, which improves classification accuracy.

  12. Time-Saving Approach for Optimal Mining of Association Rules

    Directory of Open Access Journals (Sweden)

    Mouhir Mohammed

    2016-10-01

    Data mining is the process of analyzing data to obtain useful information for users. Association rules are a data mining technique used to detect correlations and reveal relationships among individual data items in huge databases. These rules usually take the form "if X then Y", with X and Y as independent attributes. Association rules have become a popular technique in several vital fields such as insurance, medicine, banking and supermarkets. Association rules are generated in huge numbers by algorithms known as association rule mining algorithms. Generating such large quantities of rules can consume considerable time and effort, which creates an urgent need for an efficient, scalable approach that mines only the relevant and significant association rules. This paper proposes an innovative approach that mines the optimal rules from a large set of association rules using distributed processing, to improve efficiency and reduce running time.

  13. An Optimized Weighted Association Rule Mining On Dynamic Content

    CERN Document Server

    Velvadivu, P

    2010-01-01

    Association rule mining aims to explore large transaction databases for association rules. The classical Association Rule Mining (ARM) model assumes that all items have the same significance, without taking their weight into account. It also ignores differences between transactions and the importance of each itemset. Weighted Association Rule Mining (WARM), by contrast, is not restricted to databases with only binary attributes; it makes use of the importance of each itemset and transaction. WARM requires each item to be given a weight reflecting its importance to the user. The weights may correspond to special promotions on some products or to the profitability of different items. This research first focused on weight assignment based on a directed graph in which nodes denote items and links represent association rules. A generalized version of HITS, in which all nodes and links are allowed to have weights, is applied to the graph to rank the items. This research then uses an enhanced HITS algorithm by developing...
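
    The graph-based weighting step can be sketched with a plain weighted HITS iteration. This is an illustration under assumptions, not the paper's generalized or enhanced HITS: edge weights (for example, rule confidence) are supplied by the caller, hub and authority vectors are L2-normalized each round, and a fixed iteration count stands in for a convergence test. The authority score of each item could then serve as its weight.

```python
def weighted_hits(edges, iterations=50):
    """Weighted HITS on a directed item graph.

    edges: dict mapping (source_item, target_item) -> non-negative edge weight.
    Returns (hub, authority) dicts keyed by item.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of items pointing to this item.
        auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            auth[dst] += w * hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: weighted sum of the authority scores of items this item points to.
        hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            hub[src] += w * auth[dst]
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

    In a hypothetical graph with rules bread -> milk (1.0), diapers -> milk (1.0) and diapers -> beer (0.5), milk receives a higher authority score than beer, and diapers a higher hub score than bread.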

  14. A Research on Spatial Topological Association Rules Mining

    Science.gov (United States)

    Chen, J.; Liu, S.; Zhang, P.; Sha, Z.

    2012-07-01

    Spatial association rule mining is a process of acquiring information and knowledge from large databases. Because of the nature of geographic space and the complexity of spatial objects and relations, classical association rule mining methods are not suitable for spatial association rule mining. Classical association rule mining treats all input data as independent, whereas spatial data often show high autocorrelation among nearby objects. The contiguous, adjacent and neighboring relations between spatial objects are important topological relations. In this paper, a new approach based on topological predicates to discover spatial association rules is presented. First, we develop a fast method to obtain the topological relationships of spatial data using their algebraic structure. Then the spatial objects of interest are selected using topological relations combined with distance; in this step, the frequent topological predicates are obtained. Next, the attribute datasets of the selected spatial objects are mined with the Apriori algorithm. Finally, the spatial topological association rules are obtained. The approach has been implemented and tested on data on GDP per capita, railroads and roads in China in 2005 at the county level. The experimental results show that the approach is effective and valid.

  15. Efficient Analysis of Pattern and Association Rule Mining Approaches

    Directory of Open Access Journals (Sweden)

    Thabet Slimani

    2014-02-01

    The process of data mining produces various patterns from a given data source. The best-known data mining tasks are discovering frequent itemsets, frequent sequential patterns, frequent sequential rules and frequent association rules. Numerous efficient algorithms have been proposed for these tasks. Frequent pattern mining has been a focal topic in data mining research, with a large body of literature, and significant progress has been made, ranging from high-performance algorithms for frequent itemset mining in transaction databases to more complex problems such as sequential pattern mining, structured pattern mining and correlation mining. Association rule mining (ARM) is one of the most prevalent data mining techniques; it is designed to group objects from large databases with the aim of extracting interesting correlations and relations among huge amounts of data. In this article, we provide a brief review and analysis of the current status of frequent pattern mining and discuss some promising research directions. Additionally, this paper includes a comparative study of the performance of the described approaches.

  16. Mining multilevel spatial association rules with cloud models

    Institute of Scientific and Technical Information of China (English)

    YANG Bin; ZHU Zhong-ying

    2005-01-01

    The traditional generalization-based knowledge discovery method is introduced, and a new multilevel spatial association rule mining method based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are generalized at multiple levels, allowing the discovery of strong spatial association rules. Combining the cloud-model-based method with the Apriori algorithm for mining association rules from a spatial database proves effective and flexible.

  17. A Survey of Association Rule Mining Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Anubha Sharma

    2012-08-01

    Data mining is the analysis step of the "Knowledge Discovery in Databases" (KDD) process. It is the process that results in the discovery of new patterns in large data sets, and it uses methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The overall goal of data mining is to extract knowledge from an existing data set and transform it into a human-understandable structure. In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution and is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms, which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection and crossover. Many researchers have previously proposed genetic algorithms for mining interesting association rules from quantitative data. In this paper, we present a survey of association rule mining using genetic algorithms. The techniques are categorized by approach, and the paper describes the major advances in association rule mining using genetic algorithms.
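
    The minimum support and minimum confidence requirements can be illustrated by the standard rule-generation step that follows frequent itemset mining. This sketch is not a genetic algorithm; it assumes the frequent itemsets and their support counts are already known and closed under subsets, and it keeps every rule X -> Y whose confidence, support(X ∪ Y) / support(X), meets the threshold.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Generate association rules from frequent itemsets.

    frequent: {frozenset(itemset): support_count}, assumed closed under subsets
    (every subset of a frequent itemset is also a key).
    Returns a list of (antecedent, consequent, support_count, confidence).
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent X.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, count, confidence))
    return rules
```

    For hypothetical counts beer: 3, diapers: 4 and {beer, diapers}: 3, a minimum confidence of 0.8 keeps beer -> diapers (confidence 1.0) and drops diapers -> beer (confidence 0.75).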

  18. A Fast Algorithm for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    黄刘生; 陈华平; 王洵; 陈国良

    2000-01-01

    In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms them on large databases, and scale-up experiments show that it scales linearly with the number of transactions.
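
    The abstract does not describe how BitMatrix works internally. As a hedged illustration of the general bit-vector idea often used to speed up support counting (not necessarily the authors' design), each item can be mapped to a bitmap over transactions, and the support of an itemset is the number of set bits in the AND of its items' bitmaps. Python integers serve as arbitrary-length bit vectors here.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set if transaction i contains the item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Support count = number of set bits in the AND of the items' bitmaps."""
    bits = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        bits &= bitmaps.get(item, 0)  # unknown items match no transaction
    return bin(bits).count("1")
```

    After the bitmaps are built in one pass, each support query costs one AND per item plus a bit count, without rescanning the transactions.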

  19. Association Rule Mining from an Intelligent Tutor

    Science.gov (United States)

    Dogan, Buket; Camurcu, A. Yilmaz

    2008-01-01

    Educational data mining is a very novel research area, offering fertile ground for many interesting data mining applications. Educational data mining can extract useful information from educational activities for better understanding and assessment of the student learning process. In this way, it is possible to explore how students learn topics in…

  20. Detection of Attacks on MAODV Association Rule Mining Optimization

    Directory of Open Access Journals (Sweden)

    A. Fidalcastro

    2015-02-01

    Current mining algorithms can generate a large number of rules, can be very slow to generate rules, or can generate few results, omitting interesting and valuable information. To address this problem, we propose the Optimized Featured Top Association Rules (OFTAR) algorithm, in which every attack has many features, some of which are more important than others. The features are selected by a genetic algorithm and processed by OFTAR to find optimized rules. OFTAR combines association rules with several rule optimization and expansion techniques to improve efficiency. The increasing popularity of mobile ad hoc networks (MANETs) among wireless network users has led to threats and attacks that exploit MANET characteristics, and the main challenge in designing a MANET is protecting it from various attacks. An intrusion detection system is required to monitor the network and detect malicious nodes in a multicast mobility environment. Node features are processed with association analysis to generate rules, and the generated rules are applied to nodes to detect attacks. Experimental results show that the algorithm has higher scalability and good performance compared with several association rule mining algorithms when rule generation is controlled and optimized for attack detection.

  21. Mining Association Rules in Students Assessment Data

    Directory of Open Access Journals (Sweden)

    Anupama Chadha

    2012-09-01

    Higher education throughout the world is delivered through universities, colleges affiliated with various universities and other recognized academic institutes. Today, one of the biggest challenges educational institutions face is the explosive growth of educational data and how to use this data to improve the quality of managerial decisions and deliver quality education. In this paper, we perform a case study of a university that hopes to improve the quality of education by analyzing its data and discovering the factors that affect academic results, so as to increase students' chances of success. To this end, we use association rule discovery techniques. We also show the importance of data preprocessing in data analysis, which has a significant impact on the accuracy of the predicted results.

  22. Optimizing Mining Association Rules for Artificial Immune System based Classification

    Directory of Open Access Journals (Sweden)

    Sameer Dixit

    2011-08-01

    The primary function of a biological immune system is to protect the body from foreign molecules known as antigens. It has great pattern recognition capability that may be used to distinguish between foreign cells entering the body (non-self, or antigens) and the body's own cells (self). Immune systems have many characteristics, such as uniqueness, autonomy, recognition of foreigners, distributed detection and noise tolerance. Inspired by biological immune systems, Artificial Immune Systems have emerged during the last decade, and many researchers have used them to design and build immune-based models for a variety of application domains. Artificial immune systems can be defined as a computational paradigm inspired by theoretical immunology and observed immune functions, principles and mechanisms. Association rule mining is one of the most important and well-researched techniques of data mining. The goal of association rules is to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in areas such as inventory control, telecommunication networks, intelligent decision making, market analysis and risk management. Apriori is the most widely used algorithm for mining association rules; other popular algorithms include frequent pattern (FP) growth, Eclat and dynamic itemset counting (DIC). Associative classification uses association rule mining in the rule discovery process to predict the class labels of data, and this technique has shown great promise compared with many other classification techniques. Associative classification also integrates rule discovery and classification to build a classifier for prediction. The main problem with the associative classification approach is the discovery of high-quality association rules in a very large space of…

  23. Parallel mining and application of fuzzy association rules

    Institute of Scientific and Technical Information of China (English)

    LU Jian-jiang; XU Bao-wen; ZOU Xiao-feng; KANG Da-zhou; LI Yan-hui; ZHOU Jin

    2006-01-01

    Quantitative attributes are partitioned into several fuzzy sets using the fuzzy c-means algorithm, which reflects the actual distribution of the data, while the fuzzy sets soften the partition boundaries. We then improve the search technique of the Apriori algorithm and present an algorithm for mining fuzzy association rules. As databases grow larger, a better approach is to mine fuzzy association rules in parallel. In the parallel mining algorithm, quantitative attributes are partitioned into several fuzzy sets using a parallel fuzzy c-means algorithm, a Boolean parallel algorithm is improved to discover frequent fuzzy attribute sets, and fuzzy association rules with at least a minimum confidence are generated on all processors. Experimental results on distributed, linked PCs/workstations show that the parallel mining algorithm has good scaleup, sizeup and speedup. Finally, we discuss the application of fuzzy association rules to classification; the example shows that classification using the fuzzy association rules is more accurate than two popular classification methods, C4.5 and CBA.

  24. A Novel Approach for Association Rule Mining using Pattern Generation

    Directory of Open Access Journals (Sweden)

    Deepa S. Deshpande

    2014-10-01

    Data mining has become a process of significant interest in recent years due to the explosive rate of data accumulation. It is used to discover potentially valuable implicit knowledge from large transactional databases. Association rule mining is one of the well-known techniques of data mining; it typically aims at discovering associations between attributes in large databases. The first and most influential traditional algorithm for association rule discovery is Apriori. Multiple database scans, the generation of a large number of candidate itemsets and the discovery of interesting rules are the main challenges in improving the Apriori algorithm. Therefore, to reduce repeated database scanning, this paper proposes a new method of association rule mining using pattern generation. The method involves three steps. First, patterns are generated using items from the transaction database. Second, frequent itemsets are obtained using these patterns. Finally, association rules are derived. The method's performance is evaluated against the traditional Apriori algorithm: it behaves very similarly to Apriori but uses less memory and fewer database scans, and is thus more efficient than the traditional Apriori algorithm.

  25. Database Reverse Engineering based on Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Nattapon Pannurat

    2010-03-01

    Maintaining a legacy database is a difficult task, especially when system documentation is poorly written or even missing. Database reverse engineering is an attempt to recover a high-level conceptual design from existing database instances. In this paper, we propose a technique to discover the conceptual schema using association mining. The discovered schema corresponds to normalization at the third normal form, which is a common practice in many business organizations. Our algorithm also includes a rule filtering heuristic to address the exponential growth of discovered rules inherent in the association mining technique.
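
    One way to see the link between association rules and normalization is that a functional dependency X -> Y is simply a rule with confidence 1 over the table's rows. The brute-force sketch below, which is not the paper's algorithm or filtering heuristic, checks candidate left-hand sides of up to `max_lhs` columns (a parameter introduced here for illustration) and skips non-minimal ones; the column names in the example are hypothetical.

```python
from itertools import combinations


def discover_fds(rows, columns, max_lhs=2):
    """Return minimal (lhs, rhs) pairs such that lhs -> rhs holds on every row.

    A functional dependency is an association rule with confidence 1: every
    combination of lhs values is always paired with the same rhs value.
    rows: list of dicts keyed by column name.
    """
    fds = []
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # Skip non-minimal candidates: a smaller lhs already determines rhs.
                if any(set(found) <= set(lhs) for found, r in fds if r == rhs):
                    continue
                seen = {}
                holds = True
                for row in rows:
                    key = tuple(row[c] for c in lhs)
                    # setdefault records the first rhs value seen for this key.
                    if seen.setdefault(key, row[rhs]) != row[rhs]:
                        holds = False
                        break
                if holds:
                    fds.append((lhs, rhs))
    return fds
```

    On a hypothetical employee table with columns `emp`, `dept` and `dept_name`, it reports that `dept` determines `dept_name`; a transitive chain such as `emp -> dept -> dept_name` is the kind of dependency that decomposition to third normal form removes.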

  26. Associative Regressive Decision Rule Mining for Predicting Customer Satisfactory Patterns

    Directory of Open Access Journals (Sweden)

    P. Suresh

    2016-04-01

    Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments and attitudes toward entities, products, services and their attributes. With the rapid development of the Internet, potential customers provide product and service reviews that express their level of satisfaction. High volumes of customer reviews have been handled through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. ARDRM performs two steps to improve the customer satisfaction level. First, the Machine Learning Bayes Sentiment Classifier (MLBSC) is used to assign class labels to each service review. Then, the regressive factor of the opinion words and the class labels are checked for association between the words using various probabilistic rules. Based on these probabilistic rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by customers, together with their review comments. The associative regressive decision rules help service providers decide how to improve the customer satisfaction level. The experimental results show that ARDRM improves performance in terms of true positive rate, associative regression factor, regressive decision rule generation time and review detection accuracy of similar patterns.

  27. An Algorithm for Mining Multidimensional Fuzzy Association Rules

    CERN Document Server

    Khare, Neelu; Pardasani, K R

    2009-01-01

    Multidimensional association rule mining searches for interesting relationships among values from different dimensions or attributes in a relational database. In this method, the correlation is among a set of dimensions, i.e., the items forming a rule come from different dimensions, so each dimension should be partitioned at the fuzzy set level. This paper proposes a new algorithm for generating multidimensional association rules by utilizing fuzzy sets. For a database consisting of fuzzy transactions, the Apriori property is employed to prune useless candidate itemsets.
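
    A common way to extend support to fuzzy, multidimensional items is to take, for each record, the minimum of the membership degrees of the items in the itemset and average over records. The triangular membership functions and the min t-norm in this sketch are assumptions for illustration; the abstract does not specify how the dimensions are partitioned or which operator the authors use, and the attribute names and labels are hypothetical.

```python
def triangular(a, b, c):
    """Return a triangular membership function that is 0 outside (a, c) and peaks at b."""
    def mu(x):
        if x <= a or x >= c:
            return 1.0 if x == b else 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def fuzzy_support(records, itemset, fuzzy_sets):
    """Fuzzy support of an itemset such as [("age", "young"), ("income", "low")].

    records: list of dicts mapping attribute -> numeric value.
    fuzzy_sets: dict mapping (attribute, label) -> membership function.
    Each record contributes the minimum membership over the itemset (min t-norm).
    """
    total = 0.0
    for r in records:
        total += min(fuzzy_sets[(attr, label)](r[attr]) for attr, label in itemset)
    return total / len(records)
```

    With `young = triangular(0, 20, 40)` for age and `low = triangular(0, 1000, 3000)` for income, the records (age 20, income 1000), (30, 2000) and (50, 500) have itemset memberships 1.0, 0.5 and 0.0, so the fuzzy support of {age is young, income is low} is 0.5. Because the minimum can only shrink as items are added, this support is anti-monotone and the Apriori property still applies for pruning.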

  28. Constructing a Bayesian Network with the Bayesian Association Rule Mining Network Algorithm

    OpenAIRE

    Octavian

    2015-01-01

    In recent years, the Bayesian Network has become a popular concept used in many areas of life, such as decision making and determining the probability that an event will occur. Unfortunately, constructing the structure of a Bayesian Network is not a simple matter. This study therefore introduces the Bayesian Association Rule Mining Network algorithm to make it easier to construct a Bayesian Network based on data ...

  29. Secure Association Rule Mining for Distributed Level Hierarchy in Web

    Directory of Open Access Journals (Sweden)

    Gulshan Shrivastava

    2011-06-01

    Full Text Available Data mining technology can analyze massive data and plays a very important role in many domains, but if it is used improperly it can also create new information security problems. Thus several privacy-preserving techniques for association rule mining have been proposed in the past few years. Some algorithms have been developed for centralized data, while others address distributed data scenarios. Distributed data can be classified as homogeneous or heterogeneous, corresponding to horizontal partitioning (a.k.a. homogeneous distribution) and vertical partitioning (a.k.a. heterogeneous distribution). In this paper, we propose an algorithm for secure association rule mining over vertically partitioned data.
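    In a vertical partition each party holds different attributes for the same set of transactions, so the support count of an itemset that spans two parties equals the scalar product of the parties' 0/1 indicator vectors. The sketch below computes that product in the clear, only to show what a secure protocol has to compute without revealing the vectors; it is not privacy-preserving and is not the paper's algorithm. The party names and data are hypothetical.

```python
from typing import List, Set


def indicator(party_rows: List[Set[str]], itemset: Set[str]) -> List[int]:
    """1 for each transaction whose attributes at this party include all of `itemset`."""
    return [1 if itemset <= row else 0 for row in party_rows]


def joint_support(vec_a: List[int], vec_b: List[int]) -> int:
    """Support count of the combined itemset = scalar product of the indicator vectors.
    A secure scalar-product protocol would compute this value without exchanging vectors."""
    assert len(vec_a) == len(vec_b), "both parties must index the same transactions"
    return sum(x * y for x, y in zip(vec_a, vec_b))


# Row i refers to the same entity at both sites; each site holds different columns.
party_a = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}]
party_b = [{"age<30"}, {"age<30"}, {"age>=30"}, {"age<30", "student"}]

va = indicator(party_a, {"bread", "milk"})  # [1, 0, 0, 1]
vb = indicator(party_b, {"age<30"})         # [1, 1, 0, 1]
print(joint_support(va, vb))  # 2: {bread, milk, age<30} occurs in 2 of 4 rows
```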

  10. Fast rule-based bioactivity prediction using associative classification mining

    Directory of Open Access Journals (Sweden)

    Yu Pulan

    2012-11-01

    Full Text Available Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we use a collection of methods called associative classification mining (ACM), which are popular in the data mining community but so far have not been widely applied in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are applied to three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high-speed mining. Additionally, they provide accuracy and efficiency comparable to the commonly used Bayesian and support vector machine (SVM) methods, and produce highly interpretable models.
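    CBA-style associative classifiers mine class association rules (antecedent → class label), rank them, and label a new compound with the first rule whose antecedent it satisfies. A minimal sketch of that prediction step, assuming the rules are already mined and ranked by confidence, then support, then shorter antecedent as an illustrative tiebreak (CBA itself breaks remaining ties by generation order), with a default class as fallback. The descriptor features and numbers are hypothetical; CMAR and CPAR combine several matching rules instead of taking the first.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ClassRule:
    antecedent: FrozenSet[str]  # descriptor features that must all be present
    label: str                  # predicted class
    support: float
    confidence: float


def rank(rules: List[ClassRule]) -> List[ClassRule]:
    # Higher confidence first, then higher support, then the more general rule.
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def predict(rules: List[ClassRule], features: FrozenSet[str], default: str) -> str:
    """Return the label of the highest-ranked rule that covers `features`."""
    for rule in rank(rules):
        if rule.antecedent <= features:
            return rule.label
    return default


rules = [
    ClassRule(frozenset({"aromatic_ring", "nitro_group"}), "mutagenic", 0.12, 0.91),
    ClassRule(frozenset({"nitro_group"}), "mutagenic", 0.20, 0.78),
    ClassRule(frozenset({"carboxylic_acid"}), "non_mutagenic", 0.25, 0.85),
]
print(predict(rules, frozenset({"nitro_group"}), default="unknown"))  # mutagenic
```

    The rule list itself is what makes these models interpretable: each prediction can be traced to one human-readable rule.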

  11. Classification approach based on association rules mining for unbalanced data

    CERN Document Server

    Ndour, Cheikh

    2012-01-01

    This paper deals with supervised classification when the response variable is binary and its class distribution is unbalanced. In such a situation, it is not possible to build a powerful classifier using standard methods such as logistic regression, classification trees or discriminant analysis. To overcome this shortcoming of these methods, which yield classifiers with low sensitivity, we tackle the classification problem through an approach based on association rule learning, because this approach allows the identification of patterns that are well correlated with the target class. Association rule learning is a well-known data mining method. It is used with large databases for the unsupervised discovery of local patterns that express hidden relationships between variables. By considering association rules from a supervised learning point of view, a relevant set of weak classifiers is obtained, from which one derives a classification rule...

  12. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

    Institute of Scientific and Technical Information of China (English)

    Daniel Kunkle; Donghui Zhang; Gene Cooperman

    2008-01-01

    This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in pattern mining: with small support and confidence thresholds, the user is overwhelmed by the extraordinary number of identified patterns and associations, but with large thresholds, some interesting patterns and associations fail to be identified. Our algorithms can also expand the max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to compare with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more, whereas experimental results for previous algorithms were limited to taxonomies with depth at most three or four. For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR
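    A generalized itemset such as (meat, milk) is supported by any transaction that contains some kind of meat and milk. A minimal sketch of counting g-itemset support, assuming the taxonomy is stored as a child → parent map and using the standard approach of extending each transaction with all ancestors of its items; it does not implement MFGI_class or the compact representations. The taxonomy and transactions are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical taxonomy: child -> parent.
PARENT: Dict[str, str] = {
    "beef": "meat", "chicken": "meat", "meat": "food",
    "milk": "dairy", "cheese": "dairy", "dairy": "food",
}


def ancestors(item: str) -> Set[str]:
    """All ancestors of `item`, walking up the taxonomy to the root."""
    result: Set[str] = set()
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def extend(transaction: Set[str]) -> Set[str]:
    """Add every ancestor of every item so generalized items can be matched directly."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item)
    return extended


def g_support(transactions: List[Set[str]], g_itemset: FrozenSet[str]) -> int:
    return sum(1 for t in transactions if g_itemset <= extend(t))


transactions = [{"beef", "milk"}, {"chicken", "milk"}, {"chicken", "cheese"}, {"beef"}]
print(g_support(transactions, frozenset({"beef", "milk"})))  # 1
print(g_support(transactions, frozenset({"meat", "milk"})))  # 2
```

    The example shows why generalization helps with thresholds: neither (beef, milk) nor (chicken, milk) reaches a support count of 2 on its own, but (meat, milk) does.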

  13. A New Parallel Algorithm for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    DING Yan-hui; WANG Hong-guo; GAO Ming; GU Jian-jun

    2006-01-01

    Mining association rules from a large database is very costly. We develop a parallel algorithm for this task on a shared-memory multiprocessor (SMP). Most proposed parallel algorithms for association rule mining have to scan the database at least twice. In this article, a parallel algorithm, Scan Once (SO), is proposed for SMP; it scans the database only once. This algorithm is fundamentally different from the known parallel algorithm Count Distribution (CD). It uses a bit matrix to store the database information and obtains the support of frequent itemsets with a vector AND operation, which greatly improves the efficiency of generating all frequent itemsets. Empirical evaluation shows that the algorithm outperforms the CD algorithm.
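    The bit-matrix idea can be illustrated with one bit vector per item, where bit i is set when transaction i contains the item. The support of an itemset is then the number of set bits in the bitwise AND of its items' vectors, and the database is read only once to build the vectors. A minimal single-threaded sketch using Python integers as bit vectors; how SO divides the work among SMP threads is not shown, and the data are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable, List, Set, Tuple


def build_bit_vectors(transactions: List[Set[str]]) -> Tuple[Dict[str, int], int]:
    """Single pass over the database: bit i of vectors[item] is 1 if transaction i has item."""
    vectors: Dict[str, int] = {}
    for i, t in enumerate(transactions):
        for item in t:
            vectors[item] = vectors.get(item, 0) | (1 << i)
    return vectors, len(transactions)


def support(vectors: Dict[str, int], n: int, itemset: Iterable[str]) -> int:
    """AND the item vectors together and count the set bits."""
    all_rows = (1 << n) - 1  # start with every transaction selected
    anded = reduce(lambda acc, item: acc & vectors.get(item, 0), itemset, all_rows)
    return bin(anded).count("1")


transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
vectors, n = build_bit_vectors(transactions)
print(support(vectors, n, {"a", "c"}))       # 3
print(support(vectors, n, {"a", "b", "c"}))  # 2
```

    In a shared-memory setting the vectors are read-only after the single scan, so different threads could count different candidate itemsets without locking; that division of work is an assumption here, not a description of SO.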

  14. Feasibility study for banking loan using association rule mining classifier

    Directory of Open Access Journals (Sweden)

    Agus Sasmito Aribowo

    2015-03-01

    Full Text Available The problem of bad loans in a koperasi (cooperative) can be reduced if the koperasi can detect whether a member will repay the mortgage debt or fail to. The method used in this study to identify characteristic patterns of prospective borrowers is called the Association Rule Mining Classifier. Patterns of member credit are converted into knowledge and used to classify other borrowers. The classification process separates borrowers into two groups: good credit and bad credit. The research used prototyping to implement the design as an application with a programming language and development tool. Association rule mining is performed with the Weighted Itemset Tidset (WIT) tree method. The results show that the method can predict the credit of prospective customers. The training data set consists of 120 customers whose credit history is already known, and the test data set consists of 61 customers applying for credit. The results indicate that 42 customers will pay off their loans and 19 will not.
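    A WIT-tree pairs each itemset with its tidset (the IDs of transactions containing it), so the tidset of a larger itemset is the intersection of its parts' tidsets and no further database scan is needed. A minimal sketch of that idea with weighted support, assuming the weighting commonly used with WIT-trees (transaction weight = mean weight of its items; weighted support = summed weight of supporting transactions divided by total transaction weight); the loan attributes, item weights and records are hypothetical, and the tree traversal itself is omitted.

```python
from typing import Dict, FrozenSet, List, Set


def transaction_weights(transactions: List[Set[str]], item_weight: Dict[str, float]) -> List[float]:
    """tw(t) = mean weight of the items in transaction t."""
    return [sum(item_weight[i] for i in t) / len(t) for t in transactions]


def tidsets(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """One scan: map each item to the IDs of the transactions that contain it."""
    index: Dict[str, Set[int]] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            index.setdefault(item, set()).add(tid)
    return index


def weighted_support(itemset: FrozenSet[str], index: Dict[str, Set[int]], tw: List[float]) -> float:
    """Tidset of a (non-empty) itemset = intersection of its items' tidsets."""
    tids = set.intersection(*(index.get(i, set()) for i in itemset))
    return sum(tw[t] for t in tids) / sum(tw)


# Hypothetical member records: attributes plus the known credit outcome.
members = [
    {"income_high", "no_arrears", "good"},
    {"income_low", "arrears", "bad"},
    {"income_high", "arrears", "good"},
    {"income_low", "no_arrears", "good"},
]
weights = {"income_high": 0.9, "income_low": 0.3, "no_arrears": 0.8,
           "arrears": 0.2, "good": 0.5, "bad": 0.5}

tw = transaction_weights(members, weights)
index = tidsets(members)
print(weighted_support(frozenset({"income_high", "good"}), index, tw))
```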

  15. Parametric Rough Sets with Application to Granular Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Xu He

    2013-01-01

    Full Text Available Granular association rules reveal patterns hidden in many-to-many relationships which are common in relational databases. In recommender systems, these rules are appropriate for cold-start recommendation, where a customer or a product has just entered the system. An example of such rules might be “40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol.” Mining such rules is a challenging problem due to pattern explosion. In this paper, we build a new type of parametric rough sets on two universes and propose an efficient rule mining algorithm based on the new model. Specifically, the model is deliberately defined such that the parameter corresponds to one threshold of rules. The algorithm benefits from the lower approximation operator in the new model. Experiments on two real-world data sets show that the new algorithm is significantly faster than an existing algorithm, and the performance of recommender systems is stable.
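    To make the example rule concrete: "45% customers are men" and "6% products are alcohol" are the sizes of the user and product granules relative to their universes, while "40% men like at least 30% kinds of alcohol" is the fraction of users in the user granule who like at least 30% of the products in the product granule. The sketch evaluates these quantities by brute force for one candidate rule; it does not implement the paper's parametric rough-set model or its lower-approximation algorithm. The users, products and `likes` relation are hypothetical.

```python
from typing import Dict, Set


def granule_support(granule: Set[str], universe: Set[str]) -> float:
    """E.g. the share of customers who are men, or of products that are alcohol."""
    return len(granule) / len(universe)


def source_coverage(likes: Dict[str, Set[str]], users: Set[str],
                    products: Set[str], target_coverage: float) -> float:
    """Fraction of `users` who like at least `target_coverage` of `products`."""
    qualifying = 0
    for u in users:
        liked = likes.get(u, set()) & products
        if len(liked) >= target_coverage * len(products):
            qualifying += 1
    return qualifying / len(users)


customers = {f"u{i}" for i in range(1, 11)}
men = {"u1", "u2", "u3", "u4", "u5"}
all_products = {"beer", "wine", "whisky", "bread", "milk", "tea"}
alcohol = {"beer", "wine", "whisky"}
likes = {"u1": {"beer"}, "u2": {"wine", "bread"}, "u3": {"bread"},
         "u5": {"milk"}, "u6": {"beer", "wine"}}

print(granule_support(men, customers), granule_support(alcohol, all_products))  # 0.5 0.5
print(source_coverage(likes, men, alcohol, 0.3))  # 0.4
```

    The brute-force loop touches every user-product pair for every candidate rule, which is the pattern-explosion cost that motivates a dedicated mining model.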

  16. AN INCREMENTAL UPDATING ALGORITHM FOR MINING ASSOCIATION RULES

    Institute of Scientific and Technical Information of China (English)

    Xu Baowen; Yi Tong; Wu Fangjun; Chen Zhenqiang

    2002-01-01

    In this letter, a support function for updating the Frequent Pattern (FP) tree is introduced, and on this basis an Incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm handles not only the addition of new data to the database but also the removal of old data from it. Furthermore, it reduces five update cases to three. The proposed algorithm avoids generating large numbers of candidate itemsets and is highly efficient.
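    An FP-tree stores transactions as paths in a prefix tree with a count on each node, so incremental maintenance comes down to incrementing counts along a path when a transaction is added and decrementing them, pruning nodes whose count reaches zero, when old data is removed. A minimal sketch under stated assumptions: items follow a fixed alphabetical order rather than the descending-frequency order of a real FP-tree (which avoids reordering after updates), and the header table and the mining step are omitted. It illustrates the add/remove idea only and is not the IFP algorithm itself.

```python
from typing import Dict, Iterable, List, Optional


class Node:
    """One FP-tree node: an item label, a count and links to parent/children."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "Node"] = {}


class FPTree:
    def __init__(self) -> None:
        self.root = Node(None, None)

    @staticmethod
    def _ordered(transaction: Iterable[str]) -> List[str]:
        # Assumption: fixed alphabetical order, so updates never force a reorder.
        return sorted(set(transaction))

    def add(self, transaction: Iterable[str]) -> None:
        """Insert a transaction, creating nodes as needed and incrementing counts."""
        node = self.root
        for item in self._ordered(transaction):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child

    def remove(self, transaction: Iterable[str]) -> None:
        """Delete a previously added transaction; nodes whose count hits 0 are pruned."""
        node = self.root
        path: List[Node] = []
        for item in self._ordered(transaction):
            node = node.children[item]  # KeyError if the transaction was never added
            path.append(node)
        for n in path:
            n.count -= 1
        for n in reversed(path):
            if n.count == 0 and n.parent is not None:
                del n.parent.children[n.item]

    def item_support(self, item: str) -> int:
        """Sum the counts of every node labelled `item` (a header table would index these)."""
        total, stack = 0, [self.root]
        while stack:
            n = stack.pop()
            if n.item == item:
                total += n.count
            stack.extend(n.children.values())
        return total


tree = FPTree()
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]:
    tree.add(t)
tree.remove({"a", "b", "d"})  # old data leaves the database
tree.add({"b"})               # new data arrives
print(tree.item_support("a"), tree.item_support("b"), tree.item_support("d"))  # 2 2 0
```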

  17. Efficient Data Mining in SAMS through Association Rule

    Directory of Open Access Journals (Sweden)

    Mr. Rahul B. Diwate

    2014-05-01

    Full Text Available We propose a protocol for secure mining of association rules in distributed databases. Earlier techniques dealt with individual databases; nowadays people also deal with distributed databases. Can we develop an application in which people can access distributed data that is already stored at a remote location in encrypted form? The proposed technique is used for efficient data mining in SAMS (Student Assessment Management System) through association rules in distributed databases. The current leading techniques are those of Kantarcioglu and Clifton. The proposed system implements two methods: one that computes the union of the private subsets held by the interacting users, and another that tests whether an element held by one user is included in a subset held by another. The proposed protocol for secure mining through association rules consists of several levels of execution that secure both the storage of and access to data. This paper focuses on these processes for secure storage and secure access of data.

  18. COLLABORATIVE NETWORK SECURITY MANAGEMENT SYSTEM BASED ON ASSOCIATION MINING RULE

    Directory of Open Access Journals (Sweden)

    Nisha Mariam Varughese

    2014-07-01

    Full Text Available Security is one of the major challenges in open networks. Many types of attacks either follow fixed patterns or change their patterns frequently, and it is difficult to find malicious attacks that have no fixed pattern. Distributed Denial of Service (DDoS) attacks, such as those launched by botnets, are used to slow down system performance. To address such problems, a Collaborative Network Security Management System (CNSMS) based on association rule mining is proposed. The CNSMS consists of a collaborative Unified Threat Management (UTM) component, a cloud-based security centre and a traffic prober. The traffic prober captures Internet traffic and passes it to the collaborative UTM, which analyses the traffic to determine whether it contains any malicious attack. If a security event occurs, it is reported to the cloud-based security centre. The security centre generates security rules using association rule mining and distributes them to the network. The cloud-based security centre also stores the large volume of traffic, its logs and the generated security rules. Feedback is evaluated and invalid rules are eliminated to improve system efficiency.

  19. Study on the Customer targeting using Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Surendiran.R

    2010-10-01

    Full Text Available Data mining is one of the widest research areas for mining desired and hidden data, and there are many different approaches to finding hidden data. This paper deals with the Frequent Pattern growth (FP-growth) algorithm, which follows the association rule concept to group the required data items. Using this method, mining time can be reduced to a great extent. The paper also describes the implementation of a real-time system that surveys a group of people and their mobile connection service providers. The end result contains the set of people from a particular age group, with the support and confidence for the service provider they have chosen. Based on these results, service providers can make decisions to enhance their business and attract more customers.
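    The support and confidence reported per age group are the measures of a rule of the form age group → service provider: support is the share of all respondents in that age group who use that provider, and confidence is the share of the age group who use it. A minimal sketch computing both by direct counting on hypothetical survey records (FP-growth finds frequent patterns without enumerating candidates, but the measures it reports are the same):

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (age group, mobile service provider)


def support_confidence(records: List[Record], age_group: str, provider: str) -> Tuple[float, float]:
    """support = P(age_group and provider); confidence = P(provider | age_group)."""
    n_group = sum(1 for age, _ in records if age == age_group)
    n_both = sum(1 for age, prov in records if age == age_group and prov == provider)
    support = n_both / len(records)
    confidence = n_both / n_group if n_group else 0.0
    return support, confidence


survey = [
    ("18-25", "ProviderA"), ("18-25", "ProviderA"), ("18-25", "ProviderB"),
    ("26-40", "ProviderB"), ("26-40", "ProviderA"), ("41-60", "ProviderC"),
]
print(support_confidence(survey, "18-25", "ProviderA"))  # support 2/6, confidence 2/3
```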

  20. Penguins Search Optimisation Algorithm for Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Youcef Gheraibia

    2016-06-01

    Full Text Available Association Rules Mining (ARM) is one of the most popular and well-known approaches for the decision-making process. All existing ARM algorithms are time-consuming and generate a very large number of association rules with high overlap. To deal with this issue, we propose a new ARM approach based on the penguins search optimization algorithm (Pe-ARM for short). Moreover, an efficient measure is incorporated into the main process to evaluate the amount of overlap among the generated rules. The proposed approach also ensures good diversification over the whole solution space. To demonstrate its effectiveness, several experiments have been carried out on different datasets, specifically on biological ones. The results reveal that the proposed approach outperforms well-known ARM algorithms in both execution time and solution quality.

  1. An Algorithm of Association Rule Mining for Microbial Energy Prospection

    Science.gov (United States)

    Shaheen, Muhammad; Shahbaz, Muhammad

    2017-01-01

    The presence of hydrocarbons beneath the earth’s surface produces microbiological anomalies in soils and sediments. Detecting such microbial populations involves purely biochemical processes that are specialized, expensive and time-consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of a previously developed algorithm that worked on spatial databases only. It is applied to mine context-based association rules from a microbial database in order to extract interesting and useful associations between microbial attributes and the existence of hydrocarbon reserves. The surface and soil manifestations caused by hydrocarbon-oxidizing microbes are selected from the existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability of hydrocarbon existence. The numerical evaluation shows better accuracy than conventional algorithms in generating reliable and robust rules on non-spatial data. PMID:28393846

  2. A partition enhanced mining algorithm for distributed association rule mining systems

    Directory of Open Access Journals (Sweden)

    A.O. Ogunde

    2015-11-01

    Full Text Available The extraction of patterns and rules from large distributed databases through existing Distributed Association Rule Mining (DARM) systems still faces enormous challenges, such as high response times, high communication costs and an inability to adapt to constantly changing databases. In this work, a Partition Enhanced Mining Algorithm (PEMA) is presented to address these problems. In PEMA, the Association Rule Mining Coordinating Agent receives a request and decides the appropriate data sites, partitioning strategy and mining agents to use. The mining process is divided into two stages. In the first stage, the data agents horizontally segment databases with a small average transaction length into relatively smaller partitions based on the number of available sites and the available memory, whereas databases with a relatively large average transaction length are partitioned vertically. After this, the Mobile Agent-Based Association Rule Mining Agents, which are the mining agents, discover the local frequent itemsets. In the second stage, the local frequent itemsets are incrementally integrated from one data site to another to obtain the global frequent itemsets. This reduced the response time and communication cost of the system. Experiments on real datasets showed that PEMA improved the average response time over existing algorithms. Similarly, PEMA incurred lower communication costs, with a smaller average size of exchanged messages than benchmark DARM systems. These results show that PEMA can be deployed for the efficient discovery of valuable knowledge in distributed databases.
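    Local-then-global schemes for horizontally partitioned data rest on one property: an itemset that reaches relative support s in the whole database must reach relative support s in at least one partition, so the union of the local frequent itemsets is a complete candidate set that a second pass can count globally. A minimal sketch of that two-stage flow, assuming in-memory partitions and brute-force local enumeration of itemsets up to size 2; PEMA's agents, its horizontal/vertical choice and its incremental site-to-site integration are not modelled, and the data are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def local_frequent(partition: List[Set[str]], min_rel_support: float,
                   max_size: int = 2) -> Set[FrozenSet[str]]:
    """Brute-force frequent itemsets of one partition (adequate for a toy example only)."""
    items = sorted(set().union(*partition))
    found: Set[FrozenSet[str]] = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            cand = frozenset(combo)
            count = sum(1 for t in partition if cand <= t)
            if count >= min_rel_support * len(partition):
                found.add(cand)
    return found


def global_frequent(partitions: List[List[Set[str]]], min_rel_support: float) -> Set[FrozenSet[str]]:
    # Stage 1: every site mines its own partition; the union is the candidate set.
    candidates = set().union(*(local_frequent(p, min_rel_support) for p in partitions))
    # Stage 2: count each candidate across all partitions and keep the global winners.
    total = sum(len(p) for p in partitions)
    return {c for c in candidates
            if sum(1 for p in partitions for t in p if c <= t) >= min_rel_support * total}


site1 = [{"a", "b"}, {"a", "b"}, {"c"}]
site2 = [{"a", "c"}, {"b", "c"}, {"c"}]
print(sorted(sorted(s) for s in global_frequent([site1, site2], 0.5)))  # [['a'], ['b'], ['c']]
```

    Here {a, b} is frequent at site 1 but not globally, so the second pass is what removes it; only the candidate list and counts need to travel between sites.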

  3. Analysis of Electric Power System Using Data Mining Association Rule

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jun Sub; Kim, Min Soo; Choi, Sang Yule; Kim, Chul Whan; Kim, Ung Mo [Sungkyunkwan University (Korea)]

    2001-07-01

    Data mining is a topic in the database field. From historical data, it discovers the rules that are most interesting to a user, according to that user's specific requirements. By analyzing these interesting rules and estimating them statistically, we can anticipate future faults in the system. In this paper, we present a new way to discover and repair faults in electric power systems using data mining techniques. (author). 15 refs., 4 figs., 1 tab.

  4. ASSOCIATION RULES IN HORIZONTALLY DISTRIBUTED DATABASES WITH ENHANCED SECURE MINING

    Directory of Open Access Journals (Sweden)

    Sonal Patil

    2015-10-01

    Full Text Available Recent developments in information technology have made possible the collection and analysis of millions of transactions containing personal data, including shopping habits, criminal records, medical histories and credit records, among others. A distributed database is a database whose storage devices are not all attached to a common processing unit such as the CPU; it is controlled by a distributed database management system (together sometimes called a distributed database system). It may be stored on multiple computers in the same physical location or dispersed over a network of interconnected computers. A protocol has been proposed for secure mining of association rules in horizontally distributed databases. This protocol is more efficient than the Fast Distributed Mining (FDM) algorithm, which is an unsecured distributed version of the Apriori algorithm. The main purpose of this protocol is to remove the problem of mining generalized association rules that affects the existing system. The protocol offers enhanced privacy with respect to previous protocols. In addition, it is simpler and more efficient than other protocols in terms of communication rounds, communication cost and computational cost.

  5. Association Rule Mining: A Survey

    Institute of Scientific and Technical Information of China (English)

    贾彩燕; 倪现君

    2003-01-01

    Association rule mining has been one of the most popular data mining subjects and has a wide range of applications. In this paper, we first investigate the main approaches to association rule mining and analyze the essence of the algorithms. Then we review the foundations of association rule mining based on several possible theoretical frameworks for data mining. Finally, we present the open problems in association rule mining and outline its development trends in recent years.

  6. An Efficient Approach to Prune Mined Association Rules in Large Databases

    Directory of Open Access Journals (Sweden)

    D. Narmadha

    2011-01-01

    Full Text Available Association rule mining finds interesting associations and/or correlation relationships among large sets of data items. However, when the number of association rules becomes large, the rules become less interesting to the user. It is crucial to support the decision-maker with an efficient post-processing step that selects interesting association rules from huge volumes of discovered rules. This motivates the need for association analysis. Thus, this paper presents a novel approach to prune mined association rules in large databases. Further, different association rule mining techniques for market basket analysis are analyzed and their strengths highlighted. We also point out potential pitfalls and challenging issues that an association rule mining technique needs to address. We believe that the results of this approach will help decision-makers make important decisions.
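    The abstract does not state the pruning criterion. One widely used post-processing step, shown here as an assumption rather than as the paper's approach, drops a rule X → Y when a more general rule X' → Y, with X' a proper subset of X, has at least the same confidence, since the extra conditions in X add nothing. Rules are (antecedent, consequent, confidence) triples with hypothetical values.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (antecedent, consequent, confidence)


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Keep a rule unless a strictly more general rule with the same consequent
    is at least as confident."""
    kept: List[Rule] = []
    for ante, cons, conf in rules:
        redundant = any(
            other_cons == cons and other_ante < ante and other_conf >= conf
            for other_ante, other_cons, other_conf in rules
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept


rules = [
    (frozenset({"bread"}), frozenset({"milk"}), 0.80),
    (frozenset({"bread", "eggs"}), frozenset({"milk"}), 0.78),    # pruned: more specific, not better
    (frozenset({"bread", "butter"}), frozenset({"milk"}), 0.95),  # kept: more specific and better
]
for ante, cons, conf in prune_redundant(rules):
    print(sorted(ante), "->", sorted(cons), conf)
```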

  7. AN INCREMENTAL UPDATING ALGORITHM FOR MINING ASSOCIATION RULES

    Institute of Scientific and Technical Information of China (English)

    Xu Baowen; Yi Tong; et al.

    2002-01-01

    In this letter, a support function for updating the Frequent Pattern (FP) tree is introduced, and on this basis an incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm handles not only the addition of new data to the database but also the removal of old data from it. Furthermore, it reduces five update cases to three. The proposed algorithm avoids generating large numbers of candidate itemsets and is highly efficient.

  8. Prediction of users webpage access behaviour using association rule mining

    Indian Academy of Sciences (India)

    R Geetharamani; P Revathy; Shomona G Jacob

    2015-12-01

    Web usage mining is a technique used to identify user needs from the web log. Discovering hidden patterns in the logs is an emerging research area. Association rules play an important role in many web mining applications to detect interesting patterns. However, they generate an enormous number of rules, so researchers must spend considerable time and expertise to discover the really interesting ones. This paper works on the server logs of the MSNBC dataset for September 1999. The research aims to predict the probable next page visited, among the web pages listed in this data, from users' navigation behaviour, using the Apriori prefix tree (PT) algorithm. The generated rules were ranked by the support, confidence and lift evaluation measures. The final predictions revealed that the interestingness of pages depended mainly on the support and lift measures, whereas confidence took a uniform value across all pages. The system achieved 100% confidence with a support of 1.3E−05. Pages such as Front page, On-air, News, Sports and BBS attracted more subsequent visits from interested users than Travel, MSN-News and MSN-Sports, which were of less interest.
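    The three ranking measures can be computed directly from navigation data. For a rule A → B, support = P(A and B), confidence = P(B | A) and lift = confidence / P(B); lift above 1 means B follows A more often than its overall frequency would suggest. A minimal sketch, assuming each session is reduced to a (current page, next page) pair and rules are ranked by lift, then support; the page categories echo the MSNBC ones, but the sessions are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rank_next_page_rules(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, float, float, float]]:
    """Return (page, next_page, support, confidence, lift), highest lift first."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    page_counts = Counter(a for a, _ in pairs)
    next_counts = Counter(b for _, b in pairs)
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        confidence = c / page_counts[a]
        lift = confidence / (next_counts[b] / n)
        rules.append((a, b, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[4], -r[2]))


sessions = [
    ("frontpage", "news"), ("frontpage", "news"), ("frontpage", "sports"),
    ("news", "sports"), ("on-air", "news"), ("travel", "frontpage"),
]
for rule in rank_next_page_rules(sessions):
    print(rule)
```

    Several rules here have confidence 1.0 yet very different lift, which mirrors the abstract's observation that confidence alone did not discriminate between pages.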

  9. Integrated Web Recommendation Model with Improved Weighted Association Rule Mining

    Directory of Open Access Journals (Sweden)

    S.A.Sahaaya Arul Mary

    2013-04-01

    Full Text Available The World Wide Web plays a significant role in human life, and satisfying user needs requires technological improvement. Web log data is essential for improving the performance of the web; it contains large, heterogeneous and diverse data. Analyzing web log data is a tedious process for web developers, web designers, technologists and end users. In this work, a new weighted association mining algorithm is developed to identify the best association rules for web site restructuring and recommendation, reducing false visits and improving users' navigation behavior. The algorithm finds frequent itemsets in a large uncertain database. Existing algorithms scan the database repeatedly, which leads to a complex output set and a time-consuming process. The proposed algorithm scans the database only once, at the beginning of the process, and stores the generated frequent itemsets in the database. Evaluation parameters such as support, confidence, lift and number of rules are used to compare the performance of the proposed algorithm with a traditional association mining algorithm. The new algorithm produced the best results, helping developers restructure their websites to meet end-user requirements within a short time span.

  10. Mining Association Rules in Dengue Gene Sequence with Latent Periodicity

    Directory of Open Access Journals (Sweden)

    Marimuthu Thangam

    2015-01-01

    Full Text Available The mining of periodic patterns in dengue databases is an interesting research problem that can be used to predict the future evolution of dengue viruses. In this paper, we propose an algorithm called Recurrence Finder (RECFIN) that uses a suffix tree to detect periodic patterns in dengue gene sequences. RECFIN also detects palindromes, which indicate the possible formation of proteins. Further, this paper computes the periodicity of nucleic acid and amino acid sequences of any length. The periodicity-based association rules are used to diagnose the type of dengue. The time complexity of the proposed algorithm is O(n^2). We demonstrate the effectiveness of the proposed approach by comparing experimental results on a dengue virus serotype dataset with the NCBI-BLAST algorithm.

  11. A SURVEY ON PRIVACY PRESERVING ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    K.Sathiyapriya

    2013-03-01

    Full Text Available Businesses share data when outsourcing specific business problems. Large companies stake a large part of their business on the analysis of private data, and consulting firms often handle sensitive third-party data as part of client projects. Organizations face great risks when sharing their data, yet most of this sharing takes place with little secrecy, and it also increases the legal responsibility of the parties involved. It is therefore crucial to protect such data reliably because of legal and customer concerns. This paper reviews state-of-the-art methods for privacy preservation, analyzes techniques for privacy-preserving association rule mining and points out their merits and demerits. Finally, challenges and directions for future research are discussed.

  12. A NEW ASSOCIATION RULE MINING BASED ON FREQUENT ITEM SET

    Directory of Open Access Journals (Sweden)

    Ms. Sanober Shaikh

    2011-09-01

    Full Text Available In this paper, a new mining algorithm based on frequent itemsets is defined. The Apriori algorithm scans the database every time it searches for frequent itemsets and generates candidate itemsets at each step, so it is very time-consuming, and for large databases it needs a lot of space to store the candidate itemsets. The proposed algorithm scans the database only once, at the start, and builds an undirected itemset graph. From this graph it finds the frequent itemsets using the minimum support and generates association rules using the minimum confidence. If the database or the minimum support changes, the new algorithm finds the new frequent itemsets by scanning the undirected itemset graph. As a result, its execution efficiency is distinctly better than that of the traditional algorithm.
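    A minimal sketch of the single-scan graph idea, assuming the undirected graph has one node per item weighted by its count and one edge per co-occurring pair weighted by its co-occurrence count; frequent 1- and 2-itemsets and the rules between them can then be read from the graph, even after the minimum support changes, without rescanning the database. The abstract does not describe how longer itemsets are derived, so only pairs are shown; the data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def build_item_graph(transactions: List[Set[str]]) -> Tuple[Counter, Counter]:
    """One database scan: node weights = item counts, edge weights = pair counts."""
    nodes: Counter = Counter()
    edges: Counter = Counter()
    for t in transactions:
        nodes.update(t)
        edges.update(frozenset(p) for p in combinations(sorted(t), 2))
    return nodes, edges


def rules_from_graph(nodes: Counter, edges: Counter, n: int,
                     min_sup: float, min_conf: float) -> List[Tuple[str, str, float]]:
    """Rules x -> y read from frequent edges; the database is not accessed again."""
    rules = []
    for edge, count in edges.items():
        if count / n < min_sup:
            continue
        x, y = sorted(edge)
        for a, b in ((x, y), (y, x)):
            conf = count / nodes[a]
            if conf >= min_conf:
                rules.append((a, b, conf))
    return rules


transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
nodes, edges = build_item_graph(transactions)
print(rules_from_graph(nodes, edges, len(transactions), min_sup=0.5, min_conf=0.7))
```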

  13. SQL-Based Association Rule Mining

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    Data mining is becoming increasingly important as databases grow ever larger and the need to discover hidden rules in them becomes widely recognized. Current database systems are dominated by relational databases, and the ability to perform data mining with standard SQL queries would greatly ease its implementation. In this paper, we introduce an association rule mining algorithm based on Apriori and its implementation in SQL, and finally summarize the work.
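    One common way to express Apriori-style support counting in standard SQL stores transactions as a normalized (tid, item) table and counts 2-itemset candidates with a self-join plus GROUP BY ... HAVING. A minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table and column names are hypothetical, and the paper's exact queries are not given in the abstract.

```python
import sqlite3
from typing import List, Tuple


def frequent_pairs(rows: List[Tuple[int, str]], min_count: int) -> List[Tuple[str, str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO baskets VALUES (?, ?)", rows)
    # Self-join on the transaction id; i1.item < i2.item lists each pair once.
    query = """
        SELECT i1.item, i2.item, COUNT(*) AS support
        FROM baskets AS i1
        JOIN baskets AS i2 ON i1.tid = i2.tid AND i1.item < i2.item
        GROUP BY i1.item, i2.item
        HAVING COUNT(*) >= ?
        ORDER BY support DESC, i1.item, i2.item
    """
    result = conn.execute(query, (min_count,)).fetchall()
    conn.close()
    return result


rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"),
        (3, "bread"), (3, "milk"), (3, "butter"), (4, "milk")]
print(frequent_pairs(rows, min_count=2))  # [('bread', 'butter', 2), ('bread', 'milk', 2)]
```

    A full SQL Apriori would additionally restrict each join to items drawn from the frequent (k-1)-itemsets of the previous level; that pruning is omitted here for brevity.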

  14. Association Rule Hiding Techniques for Privacy Preserving Data Mining: A Study

    Directory of Open Access Journals (Sweden)

    Gayathiri P

    2015-12-01

    Full Text Available Association rule mining is an efficient data mining technique that identifies frequent items and association rules through market basket analysis of large transactional databases. The probability of occurrence of the most frequent data items in the transactions is calculated to derive association rules that represent customers' buying habits. However, identifying association rules in a transactional database may expose confidential information and compromise the privacy of organizations and individuals. Privacy Preserving Data Mining (PPDM) is a solution for privacy threats in data mining, and this issue is addressed by Association Rule Hiding (ARH) techniques within PPDM. This study of Association Rule Hiding techniques covers hiding sensitive association rules by modifying the transactional data items. Because ARH hides rules rather than the data itself, the sensitive rule hiding process has minimal side effects and preserves high data utility.
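    A classic heuristic for hiding a sensitive rule X → Y, given here as an illustration rather than as a specific technique from this study, lowers the rule's confidence below the mining threshold by deleting an item of Y from transactions that support X ∪ Y, one transaction at a time. Only supporting transactions are modified, which keeps side effects on other patterns small. The transactions and threshold are hypothetical, and X and Y are assumed disjoint.

```python
from typing import List, Set


def confidence(db: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    n_lhs = sum(1 for t in db if lhs <= t)
    n_both = sum(1 for t in db if (lhs | rhs) <= t)
    return n_both / n_lhs if n_lhs else 0.0


def hide_rule(db: List[Set[str]], lhs: Set[str], rhs: Set[str], min_conf: float) -> List[Set[str]]:
    """Return a sanitized copy of `db` in which conf(lhs -> rhs) < min_conf."""
    sanitized = [set(t) for t in db]  # never modify the caller's data
    victim = next(iter(rhs))          # item to delete from supporting transactions
    for t in sanitized:
        if confidence(sanitized, lhs, rhs) < min_conf:
            break
        if (lhs | rhs) <= t:
            t.discard(victim)         # lowers supp(X u Y) while supp(X) is unchanged
    return sanitized


db = [{"diapers", "beer"}, {"diapers", "beer"}, {"diapers", "beer", "milk"},
      {"diapers", "milk"}, {"bread"}]
sanitized = hide_rule(db, {"diapers"}, {"beer"}, min_conf=0.6)
print(confidence(db, {"diapers"}, {"beer"}), confidence(sanitized, {"diapers"}, {"beer"}))  # 0.75 0.5
```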

  15. Action Rules Mining

    CERN Document Server

    Dardzinska, Agnieszka

    2013-01-01

    We are surrounded by data, numerical, categorical and otherwise, which must be analyzed and processed to convert it into information that instructs, answers or aids understanding and decision making. Data analysts in many disciplines, such as business, education or medicine, are frequently asked to analyze new data sets, which are often composed of numerous tables with different properties. They try to find completely new correlations between attributes and show new possibilities for users. Action Rules Mining discusses some data mining and knowledge discovery principles and then describes representative concepts, methods and algorithms connected with actions. The author introduces the formal definition of an action rule, the notions of a simple association action rule and a representative action rule, and the cost of an association action rule, and gives a strategy for constructing simple association action rules of lowest cost. A new approach for generating action rules from datasets with numerical attributes...

  16. Validity of association rules extracted by healthcare-data-mining.

    Science.gov (United States)

    Takeuchi, Hiroshi; Kodama, Naoki

    2014-01-01

    A personal healthcare system used with cloud computing has been developed. It enables a daily time series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare data mining. This study verified that the rules extracted from daily time-series data stored over half a year by volunteer users of this system are valid.

  17. Research on Algorithm for Mining Negative Association Rules Based on Frequent Pattern Tree

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e. items absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, that is, classifiers that build their classification model based on association rules. However, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, very few algorithms to mine them have been proposed to date. In this paper, an algorithm based on the FP-tree is presented to discover negative association rules.
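
    The support and confidence of a negative rule can be derived from positive supports alone: the transactions containing A but not B are those containing A minus those containing both, so supp(A and not B) = supp(A) - supp(A u B). The Python sketch below checks this identity on a toy database; it does not implement the FP-tree-based algorithm of the paper, and all names are illustrative.

```python
from typing import List, Set, Tuple

def support(itemset: Set[str], db: List[Set[str]]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def negative_rule(a: Set[str], b: Set[str], db: List[Set[str]]) -> Tuple[float, float]:
    """Support and confidence of A -> not B, where 'not B' means the
    transaction does not contain all of B."""
    s_a = support(a, db)
    s_a_not_b = s_a - support(a | b, db)  # supp(A and not B)
    conf = s_a_not_b / s_a if s_a else 0.0
    return s_a_not_b, conf

def direct_count(a: Set[str], b: Set[str], db: List[Set[str]]) -> float:
    """The same support counted directly, to check the identity."""
    return sum(1 for t in db if a <= t and not b <= t) / len(db)

if __name__ == "__main__":
    db = [{"tea", "sugar"}, {"tea"}, {"coffee", "sugar"}, {"tea", "milk"}]
    print(negative_rule({"tea"}, {"coffee"}, db))  # (0.75, 1.0)
```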

  18. WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    Pratiyush Guleria

    2015-11-01

    This paper explains web-enabled tools for educational data mining. The proposed web-based tools, developed using the ASP.NET framework and PHP, can help universities or institutions that offer students elective courses, as well as improve academic activities based on feedback collected from students. The ASP.NET tool uses association rule mining with the Apriori algorithm, whereas the PHP-based Feedback Analytical Tool collects feedback from students about faculty and institutional infrastructure and, based on that feedback, shows the performance of the faculty and the institution. Using that data, management can improve in-house training skills and gain knowledge about the educational trends that faculty should follow to improve the effectiveness of courses and teaching skills.
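
    For readers unfamiliar with Apriori, a minimal, generic Python sketch of its level-wise search follows: one counting pass per level, a join step that builds k-item candidates from frequent (k-1)-itemsets, and a prune step that discards candidates with an infrequent subset. Support is a fraction of transactions; the data are made up, and this is not the tool's actual code.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

def apriori(db: Iterable[Iterable[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions)
    is at least `min_support`, mapped to that support."""
    transactions: List[FrozenSet[str]] = [frozenset(t) for t in db]
    n = len(transactions)
    # Level-1 candidates: every single item that occurs in the data.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # One counting pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    result = apriori(db, min_support=0.6)
    print(len(result))  # 6: three single items and three pairs; {a, b, c} has support 0.4
```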

  19. Reduction of Negative and Positive Association Rule Mining and Maintain Superiority of Rule Using Modified Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Nikhil Jain, Vishal Sharma, Mahesh Malviya

    2012-12-01

    Association rule mining plays an important role in market data analysis and also in the medical diagnosis of correlated problems. Various techniques are used to generate association rules, such as the Apriori algorithm, FP-growth and tree-based algorithms. Some algorithms perform well but generate negative association rules and also suffer from the superiority measure problem. In this paper we propose multi-objective association rule mining based on a genetic algorithm and the Euclidean distance formula. In this method we find the nearest distance within the rule set using the Euclidean distance formula and generate two classes, a higher class and a lower class; the validity of each class is checked by a distance weight vector. The distance weight vector maintains a threshold value for rule itemsets. Throughout the process we use a genetic algorithm to optimize the rule set. The population size is set to 1000, and the selection process is validated by the distance weight vector. Our proposed algorithm, distance-weight optimization of association rule mining with a genetic algorithm, is compared with multi-objective association rule optimization using a genetic algorithm, and it generates better rule sets than the MORA method.

  20. A Frequent Closed Itemsets Lattice-based Approach for Mining Minimal Non-Redundant Association Rules

    CERN Document Server

    Vo, Bay

    2011-01-01

    Many algorithms have been developed to reduce the time needed to mine frequent itemsets (FI) or frequent closed itemsets (FCI). However, algorithms that address the time needed to generate association rules have not been studied in depth. In practice, for a database containing many FI/FCI (from tens of thousands up to millions), the time needed to generate association rules is much larger than the time needed to mine the FI/FCI. Therefore, this paper presents an application of the frequent closed itemsets lattice (FCIL) to mining minimal non-redundant association rules (MNAR), greatly reducing the time needed to generate rules. First, CHARM-L is used to build the FCIL. Then, based on the FCIL, an algorithm for quickly generating MNAR is proposed. Experimental results show that the proposed algorithm is much faster in mining time than the frequent itemsets lattice-based algorithm.
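
    Minimal non-redundant rules are built on frequent closed itemsets, where an itemset is closed when no proper superset has the same support. The Python sketch below applies only that definition to a dictionary of frequent itemsets and their support counts; it does not reproduce CHARM-L or the lattice-based MNAR generation of the paper.

```python
from typing import Dict, FrozenSet

def closed_itemsets(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep the itemsets that have no proper superset with the same support count."""
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == t for y, t in frequent.items())
    }

if __name__ == "__main__":
    # Support counts for the toy database {abc, ab, abc, ab}.
    f = frozenset
    frequent = {
        f("a"): 4, f("b"): 4, f("c"): 2,
        f("ab"): 4, f("ac"): 2, f("bc"): 2, f("abc"): 2,
    }
    print(sorted("".join(sorted(x)) for x in closed_itemsets(frequent)))  # ['ab', 'abc']
```

    Here {a} and {b} are not closed because {a, b} has the same count; the two closed itemsets summarize all seven frequent ones, which is why rule generation over closed itemsets can avoid many redundant rules.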

  1. EOQ estimation for imperfect quality items using association rule mining with clustering

    Directory of Open Access Journals (Sweden)

    Mandeep Mittal

    2015-09-01

    Timely identification of newly emerging trends is needed in business processes. Data mining techniques such as clustering, association rule mining and classification are very important for business support and decision making. This paper presents a method for redesigning the ordering policy by including the cross-selling effect. Initially, association rules are mined on the transactional database, and the EOQ is estimated together with the revenue earned. Then, transactions are clustered to obtain homogeneous clusters, and association rules are mined in each cluster to estimate the EOQ and the revenue earned for each cluster. Further, this paper compares ordering policies for imperfect quality items developed by applying rules derived from the Apriori algorithm (a) without clustering the transactions and (b) after clustering the transactions. A numerical example is provided to validate the results.
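
    For reference, the classical EOQ that such ordering policies start from is Q* = sqrt(2DS/H), where D is annual demand, S the fixed cost per order and H the holding cost per unit per year. A minimal Python sketch follows; the imperfect-quality and cross-selling adjustments of the paper are not reproduced, and the numbers are illustrative.

```python
import math

def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Classical economic order quantity Q* = sqrt(2 * D * S / H)."""
    if annual_demand < 0 or order_cost < 0 or holding_cost <= 0:
        raise ValueError("demand and order cost must be >= 0, holding cost > 0")
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

if __name__ == "__main__":
    q = eoq(annual_demand=1200, order_cost=100, holding_cost=6)
    print(round(q, 6))  # 200.0
```

    In a per-cluster setting, one would presumably plug in the demand observed within each cluster, which is where the mined rules and the clustering would enter the calculation.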

  2. Mining of the quantitative association rules with standard SQL queries and its evaluation

    Institute of Scientific and Technical Information of China (English)

    孙海洪; 唐菁; 蒋洪; 杨炳儒

    2004-01-01

    A new algorithm for mining quantitative association rules with standard SQL is presented. The association rules are evaluated with the sufficiency factor LS of subjective Bayesian reasoning. Its application to the Lujiang insect and pest database shows that the algorithm is fast and effective.
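
    To show how support counting can be pushed into standard SQL, the Python sketch below stores transactions in a (tid, item) table with the built-in sqlite3 module, counts the transactions containing an itemset with GROUP BY ... HAVING, and computes the sufficiency factor LS = P(A | B) / P(A | not B) for a rule A -> B, following the usual subjective Bayesian definition with the antecedent as evidence and the consequent as hypothesis. Quantitative attributes are assumed to be already discretized into items such as "temp:high"; the table layout and data are hypothetical.

```python
import math
import sqlite3
from typing import Iterable, Sequence, Tuple

def build_db(rows: Iterable[Tuple[int, str]]) -> sqlite3.Connection:
    """Create an in-memory table trans(tid, item), one row per item occurrence."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trans (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO trans VALUES (?, ?)", rows)
    return conn

def count_itemset(conn: sqlite3.Connection, items: Sequence[str]) -> int:
    """Number of transactions that contain all of `items`."""
    marks = ",".join("?" * len(items))
    sql = ("SELECT COUNT(*) FROM (SELECT tid FROM trans "
           f"WHERE item IN ({marks}) GROUP BY tid "
           "HAVING COUNT(DISTINCT item) = ?) AS matched")
    return conn.execute(sql, (*items, len(set(items)))).fetchone()[0]

def sufficiency_ls(conn: sqlite3.Connection, a: str, b: str) -> float:
    """LS = P(a | b) / P(a | not b) for the rule a -> b.
    Assumes b occurs in some, but not all, transactions."""
    n = conn.execute("SELECT COUNT(DISTINCT tid) FROM trans").fetchone()[0]
    n_a, n_b = count_itemset(conn, [a]), count_itemset(conn, [b])
    n_ab = count_itemset(conn, [a, b])
    p_a_given_b = n_ab / n_b
    p_a_given_not_b = (n_a - n_ab) / (n - n_b)
    return math.inf if p_a_given_not_b == 0 else p_a_given_b / p_a_given_not_b

if __name__ == "__main__":
    rows = [(1, "temp:high"), (1, "pest:yes"), (2, "temp:high"), (2, "pest:yes"),
            (3, "temp:high"), (3, "pest:no"), (4, "temp:low"), (4, "pest:no"),
            (5, "temp:low"), (5, "pest:yes")]
    conn = build_db(rows)
    print(sufficiency_ls(conn, "temp:high", "pest:yes"))  # about 1.333 (LS > 1 supports the rule)
```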

  3. Cross-Ontology multi-level association rule mining in the Gene Ontology.

    Directory of Open Access Journals (Sweden)

    Prashanti Manda

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. We show that, as evaluated by biologists, our approach discovers more and higher-quality association rules from the GO than previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.
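
    The basic idea of generalizing annotation transactions along an ontology can be sketched as follows: each transaction is extended with the ancestors of its terms, so that rules can be mined across levels of abstraction. This Python sketch uses a tiny, hypothetical is-a hierarchy (not actual GO structure) and adds all ancestors at once, whereas COLL, as described above, proceeds level by level.

```python
from typing import Dict, Iterable, List, Set

def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All ancestors of `term` in a DAG given as term -> list of parents."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def generalize(transaction: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Extend a transaction of terms with every ancestor of those terms."""
    out = set(transaction)
    for term in list(out):
        out |= ancestors(term, parents)
    return out

if __name__ == "__main__":
    # Hypothetical toy hierarchy; a real application would load the ontology.
    parents = {
        "kinase_activity": ["catalytic_activity"],
        "catalytic_activity": ["molecular_function"],
        "nucleus": ["organelle"],
        "organelle": ["cellular_component"],
    }
    print(sorted(generalize({"kinase_activity", "nucleus"}, parents)))
```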

  4. Association Rule Mining for Both Frequent and Infrequent Items Using Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    MIR MD. JAHANGIR KABIR

    2014-07-01

    In data mining research, generating frequent items from large databases is one of the important issues and the key factor in implementing association rule mining tasks. Mining infrequent items, such as relationships among rare but expensive products, is another demanding issue, as some recent studies have shown. Therefore, this study considers user-assigned threshold values as a constraint that helps users mine the rules that are more interesting to them. In addition, in the real world users may want to know relationships among frequent items along with infrequent ones. Particle swarm optimization has become an important heuristic technique in recent years, and this study uses it to mine association rules effectively. If this technique considers user-defined threshold values, interesting association rules can be generated more efficiently. This study therefore proposes a novel approach that uses the particle swarm optimization algorithm to mine association rules from databases. Our implementation of the search strategy uses a bitmap representation of nodes in a lexicographic tree and, from the superset-subset relationships of the nodes, classifies frequent itemsets along with infrequent ones. In addition, this approach avoids the extra calculation overhead of generating frequent pattern trees and of handling the large memory that stores the support values of candidate itemsets. Our experimental results show that this approach mines association rules efficiently. It accesses the database to calculate support values for fewer nodes when finding frequent itemsets and generates association rules from them, which dramatically reduces search time. The main aim of the proposed algorithm is to show how a heuristic method works on real databases to find all the interesting association rules efficiently.
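
    The bitmap representation mentioned above turns support counting into a bitwise operation: each item gets a bit vector with bit i set when transaction i contains it, and the support of an itemset is the number of set bits in the AND of its items' vectors. A minimal Python sketch using arbitrary-precision integers as bitmaps follows; the PSO search itself is not shown, and the data are illustrative.

```python
from typing import Dict, Iterable, List, Set

def item_bitmaps(db: List[Set[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit i is set if transaction i contains it."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def bitmap_support(itemset: Iterable[str], bitmaps: Dict[str, int], n: int) -> int:
    """Support count of `itemset`: popcount of the AND of its item bitmaps."""
    mask = (1 << n) - 1  # start with all n transactions
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")

if __name__ == "__main__":
    db = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"c"}]
    bm = item_bitmaps(db)
    print(bitmap_support({"a", "b"}, bm, len(db)))  # 2
```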

  5. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    OpenAIRE

    Rajendran, P.; M.Madheswaran

    2010-01-01

    The proposed image mining method focuses on the classification of brain tumors in CT scan brain images. The major steps involved in the system are pre-processing, feature extraction, association rule mining and hybrid classification. Pre-processing is done using median filtering, and edge features are extracted using the Canny edge detection technique. Two image mining approaches are combined in a hybrid manner in this paper....

  6. Causal association rule mining methods based on fuzzy state description

    Institute of Scientific and Technical Information of China (English)

    Liang Kaijian; Liang Quan; Yang Bingru

    2006-01-01

    Aiming to use newly acquired knowledge to develop knowledge systems that remain dynamically consistent, and using the fuzzy language field and the fuzzy language value structure as the description framework, this paper puts forward a generalized cellular automaton that can jointly process fuzzy indeterminacy and random indeterminacy, together with a generalized inductive logic causal model. On this basis, a new method for discovering causal association rules is provided. Using the causal information of the standard sample space and the common sample space, causal association rules are obtained through an inductive reasoning mechanism by constructing the state (abnormality) relation matrix. An estimate of the algorithm's complexity is given, and its validity is demonstrated through a case study.

  7. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    Directory of Open Access Journals (Sweden)

    Vishwadeepak Singh Baghela

    2012-05-01

    A handful of text data mining approaches are available to extract potentially useful information and associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The mined information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mining deals with structured data (for example, relational databases), whereas text presents special characteristics and is unstructured. Unstructured data is very different from databases, where structured data is managed and mining techniques are usually applied. Text mining can work with unstructured or semi-structured data sets. This paper presents a brief review of some recent research on mining associations from text documents.

  8. Mining Interesting Positive and Negative Association Rule Based on Improved Genetic Algorithm (MIPNAR_GA)

    Directory of Open Access Journals (Sweden)

    Nikky Suryawanshi Rai

    2014-01-01

    Association rule mining is a very efficient technique for finding strong relations between correlated data. The correlation of data makes the extraction process meaningful. For mining positive and negative rules, a variety of algorithms are used, such as the Apriori algorithm and tree-based algorithms. A number of algorithms perform well but produce a large number of negative association rules and also suffer from the multi-scan problem. The idea of this paper is to eliminate these problems and reduce the large number of negative rules. Hence we propose an improved approach to mine interesting positive and negative rules based on a genetic algorithm and the MLMS algorithm. In this method we use a multi-level multiple support (MLMS) data table encoded as 0 and 1. The divided process reduces the database scanning time. The proposed algorithm is a combination of MLMS and a genetic algorithm. This paper proposes a new algorithm (MIPNAR_GA) for mining interesting positive and negative rules from frequent and infrequent pattern sets. The algorithm proceeds in three phases: (a) extract frequent and infrequent pattern sets using the Apriori method; (b) efficiently generate positive and negative rules; (c) prune redundant rules by applying interestingness measures. Rule optimization is performed by the genetic algorithm, and the algorithm is evaluated on real-world datasets such as heart disease data and some standard datasets from the UCI machine learning repository.

  9. A NOVEL SIMILARITY ASSESSMENT FOR REMOTE SENSING IMAGES VIA FAST ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    J. Liu

    2016-06-01

    Similarity assessment is fundamentally important to various remote sensing applications such as image classification and image retrieval. The objective of similarity assessment is to automatically distinguish differences between images and identify the contents of an image. Unlike existing feature-based or object-based methods, we focus on deep-level patterns of image content. Association rule mining is capable of finding potential patterns in an image; hence, in this paper a fast association rule mining algorithm is proposed and similarity is represented by rules. More specifically, the proposed approach consists of the following steps: first, the gray levels of the image are compressed using linear segmentation to avoid interference from details and reduce the amount of computation; then the compressed gray values between pixels are collected to generate transaction sets, which are transformed into the proposed multi-dimensional data cube structure; the association rules are then quickly mined based on the multi-dimensional data cube; finally, the mined rules are represented as a vector, and similarity assessment is achieved by vector comparison using a first-order approximation of the Kullback-Leibler divergence. Experimental results indicate that the proposed fast association rule mining algorithm is more effective than the widely used Apriori method. Remote sensing image retrieval experiments using various images, for example QuickBird and WorldView-2, based on the existing and proposed similarity assessments show that the proposed method can provide higher retrieval precision.
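
    Two steps of the pipeline above can be illustrated in isolation: compressing gray levels by linear (equal-width) segmentation, and comparing two images' rule vectors with the Kullback-Leibler divergence. The Python sketch below computes the exact KL divergence of the normalized vectors with a small smoothing constant, not the first-order approximation used in the paper; the data-cube mining step is omitted, and all names are illustrative.

```python
import math
from typing import List, Sequence

def quantize(gray: Sequence[int], levels: int, max_value: int = 255) -> List[int]:
    """Map gray values in 0..max_value onto `levels` equal-width bins."""
    return [min(g * levels // (max_value + 1), levels - 1) for g in gray]

def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) of two non-negative vectors after normalizing each to sum to 1.
    `eps` guards against log(0) when Q is zero where P is not."""
    sp, sq = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            pn, qn = pi / sp, qi / sq
            total += pn * math.log((pn + eps) / (qn + eps))
    return total

if __name__ == "__main__":
    print(quantize([0, 63, 64, 255], levels=4))  # [0, 0, 1, 3]
    print(kl_divergence([1, 1], [1, 3]))         # about 0.1438 (= 0.5 * ln(4/3))
```

    KL divergence is asymmetric, so a retrieval system would typically fix the query as P or symmetrize the measure; which choice the paper makes is not stated in the abstract.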

  10. A Novel Similarity Assessment for Remote Sensing Images via Fast Association Rule Mining

    Science.gov (United States)

    Liu, Jun; Chen, Kai; Liu, Ping; Qian, Jing; Chen, Huijuan

    2016-06-01

  11. Gain ratio based fuzzy weighted association rule mining classifier for medical diagnostic interface

    Indian Academy of Sciences (India)

    N S Nithya; K Duraiswamy

    2014-02-01

    The health care environment still needs knowledge-based discovery to handle its wealth of data. Extracting the potential causes of diseases is the most important factor in medical data mining. Fuzzy association rule mining performs better than traditional classifiers, but it suffers from exponential growth in the number of rules produced. Previously, we proposed an information gain based fuzzy association rule mining algorithm that extracts both association rules and membership functions from medical data in order to reduce the number of rules. It used a ranking-based weight value to identify the potential attributes. When an attribute takes a large number of distinct values, computing the information gain is not feasible. In this paper, an enhanced approach called gain ratio based fuzzy weighted association rule mining is therefore proposed for distinct diseases, which also improves on the learning time of the previous approach. Experimental results show a marginal improvement in the attribute selection process and also an improvement in classifier accuracy. The system has been implemented on the Java platform and verified using benchmark data from the UCI machine learning repository.
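
    Gain ratio divides information gain by the split information (the entropy of the attribute's own value distribution), which penalizes attributes with many distinct values. The Python sketch below shows the standard C4.5-style computation on toy data; how the paper combines it with fuzzy weights is not shown.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting `labels` by `attribute`, divided by split info."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(attribute, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return gain / split_info if split_info > 0 else 0.0

if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
    print(gain_ratio(["a", "b", "c", "d"], labels))  # 0.5: same gain, larger split info
```

    Both attributes have an information gain of 1 bit, but the one with four distinct values is halved by its split information, which is the bias correction that motivates replacing information gain with gain ratio.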

  12. Stellar spectra association rule mining method based on the weighted frequent pattern tree

    Institute of Scientific and Technical Information of China (English)

    Jiang-Hui Cai; Xu-Jun Zhao; Shi-Wei Sun; Ji-Fu Zhang; Hai-Feng Yang

    2013-01-01

    Effective extraction of data association rules can provide a reliable basis for the classification of stellar spectra. The concepts of stellar spectrum weighted itemsets and stellar spectrum weighted association rules are introduced, and the weight of a single property in the stellar spectrum is determined by information entropy. On that basis, a method is presented to mine the association rules of a stellar spectrum based on the weighted frequent pattern tree. Important properties of the spectral lines are highlighted using this method. At the same time, the waveform of the whole spectrum is taken into account. The experimental results show that the data association rules of a stellar spectrum mined with this method are consistent with the main features of stellar spectral types.
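
    One common way to derive property weights from information entropy is the entropy weight method: a property whose values are spread evenly across samples carries little information and receives a low weight. The Python sketch below implements that method and a weighted support defined as the mean weight of an itemset's items times its ordinary support; both the weighting scheme and the weighted-support definition are assumptions, since the abstract does not give the paper's formulas.

```python
import math
from typing import Dict, List, Sequence, Set

def entropy_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight of each column of a non-negative samples x properties matrix.
    Assumes at least two samples and at least one non-uniform column."""
    m, k = len(matrix), len(matrix[0])
    diversity = []
    for j in range(k):
        col = [row[j] for row in matrix]
        total = sum(col)
        e = -sum((x / total) * math.log(x / total) for x in col if x > 0) / math.log(m)
        diversity.append(1.0 - e)  # low normalized entropy -> high weight
    s = sum(diversity)
    return [d / s for d in diversity]

def weighted_support(itemset: Set[str], db: List[Set[str]], w: Dict[str, float]) -> float:
    """Hypothetical weighted support: mean item weight times ordinary support."""
    supp = sum(1 for t in db if itemset <= t) / len(db)
    return supp * sum(w[i] for i in itemset) / len(itemset)
```

    For example, in the matrix [[1, 1], [1, 3]] the first property is identical in both samples, so it receives (numerically) zero weight and the second property receives all of it.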

  13. Spatio-Temporal Rule Mining

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2005-01-01

    Recent advances in communication and information technology, such as the increasing accuracy of GPS technology and the miniaturization of wireless communication devices, pave the way for Location-Based Services (LBS). To achieve high quality for such services, spatio-temporal data mining techniques are needed. In this paper, we describe experiences with spatio-temporal rule mining in a Danish data mining company. First, a number of real-world spatio-temporal data sets are described, leading to a taxonomy of spatio-temporal data. Second, the paper describes a general methodology that transforms the spatio-temporal rule mining task into the traditional market basket analysis task and applies it to the described data sets, enabling traditional association rule mining methods to discover spatio-temporal rules for LBS. Finally, unique issues in spatio-temporal rule mining are identified and discussed.
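
    A transformation of this kind can be sketched by discretizing space into grid cells and time into slots, turning each (cell, slot) pair into an item, and collecting the items of each moving object into one basket. The grid size, slot length and item naming in this Python sketch are hypothetical choices, not the paper's methodology; the resulting baskets can be fed to any association rule miner.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, float, float, int]  # (object_id, x, y, minute_of_day)

def to_baskets(records: Iterable[Record], cell_size: float,
               slot_minutes: int) -> Dict[str, Set[str]]:
    """One basket per object; each item combines a grid cell with a time slot."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for oid, x, y, minute in records:
        cx, cy = int(x // cell_size), int(y // cell_size)
        slot = minute // slot_minutes
        baskets[oid].add(f"cell{cx}_{cy}@slot{slot}")
    return dict(baskets)

if __name__ == "__main__":
    records = [("u1", 120.0, 40.0, 485), ("u1", 510.0, 40.0, 545),
               ("u2", 130.0, 10.0, 470)]
    print(to_baskets(records, cell_size=100.0, slot_minutes=60))
```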

  14. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Ou

    2014-01-01

    This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM), which was announced by the IMO in 2010. The goal of this paper is to mine the association rules between the items of BRM and vessel accidents. However, the data that can be collected are indirect and seem useless for analyzing the relationship between the items of BRM and the accidents, so cross-level association rules, which relate the indirect data to the items of BRM, need to be studied. In this paper, first, a cross-level coding scheme for mining multilevel association rules is proposed. Second, we execute the immune genetic algorithm with the coding scheme to analyze BRM. Third, based on basic maritime investigation reports, some important association rules of the items of BRM are mined and studied. Finally, based on the results of the analysis, we provide suggestions for seafarer training, assessment and management.

  15. Multidimensional Data Mining to Determine Association Rules in an Assortment of Granularities

    Directory of Open Access Journals (Sweden)

    C. Usha Rani

    2013-09-01

    Data mining is one of the most significant tools for discovering association patterns that are useful in many knowledge domains. Yet there are some drawbacks in existing mining techniques. The three main weaknesses of current data mining techniques are: (1) the entire database must be rescanned whenever new attributes are added, because current methods are based on flat mining using predefined schemata; (2) an association rule may hold at a certain granularity but fail at a smaller one, and vice versa, which may result in the loss of important association rules; (3) current methods can find either frequent rules or infrequent rules, but not both at the same time. This research proposes a novel data schema and an algorithm that address these weaknesses while improving the efficiency and effectiveness of data mining strategies. The crucial mechanisms in each step are clarified in this paper. The paper also presents a benchmark used to compare the efficiency and effectiveness of the proposed algorithm against other known methods. Finally, it presents experimental results on the efficiency, scalability, information loss, etc. of the proposed approach to demonstrate its advantages.

  16. Improving Intrusion Detection System Based on Snort Rules for Network Probe Attacks Detection with Association Rules Technique of Data Mining

    Directory of Open Access Journals (Sweden)

    Nattawat Khamphakdee

    2015-07-01

    The intrusion detection system (IDS) is an important network security tool for securing computer and network systems. It is able to detect and monitor network traffic data. Snort IDS is an open-source network security tool. It can search and match rules against network traffic data in order to detect attacks and generate alerts. However, Snort IDS can detect only known attacks. Therefore, we propose a procedure for improving Snort IDS rules based on the association rule data mining technique for detecting network probe attacks. We employed the MIT-DARPA 1999 data set for the experimental evaluation. Since traffic behavior patterns include both normal and abnormal data, the abnormal behavior data are detected by way of the Snort IDS. The experimental results showed that the proposed Snort IDS rules, based on data mining detection of network probe attacks, were more efficient than the original Snort IDS rules, as well as the icmp.rules and icmp-info.rules of Snort IDS. The suitable parameters for the proposed Snort IDS rules are as follows: Min_sup set to 10%, Min_conf set to 100%, and eight variable attributes applied. As more suitable parameters are applied, higher accuracy is achieved.
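
    The reported thresholds translate directly into a rule filter: keep X -> Y only when supp(X u Y) >= 10% and conf(X -> Y) = 100%. The brute-force Python sketch below enumerates small itemsets and applies exactly those thresholds; the connection attributes are hypothetical, and converting mined rules into Snort rule syntax is not shown.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # (lhs, rhs, support, confidence)

def support(itemset: FrozenSet[str], db: List[Set[str]]) -> float:
    """Fraction of records containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def mine_rules(db: List[Set[str]], min_sup: float = 0.10, min_conf: float = 1.00,
               max_len: int = 3) -> List[Rule]:
    """Brute-force rules lhs -> rhs with supp(lhs | rhs) >= min_sup and conf >= min_conf."""
    if min_sup <= 0:
        raise ValueError("min_sup must be positive")
    items = sorted({i for t in db for i in t})
    rules: List[Rule] = []
    for k in range(2, max_len + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, db)
            if s < min_sup:
                continue  # skipping here also guarantees supp(lhs) > 0 below
            for r in range(1, k):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    conf = s / support(lhs, db)
                    if conf >= min_conf:
                        rules.append((lhs, x - lhs, s, conf))
    return rules
```

    With min_conf = 1.0 only rules that hold in every matching record survive, which suits signature-style detection but discards rules with even a single counterexample.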

  17. A Hybrid Approach to Privacy Preserving in Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Narges Jamshidian Ghalehsefidi

    Nowadays, data mining is a useful, yet dangerous technology through which useful information and the relationships between items in a database are detected. Today, companies and users need to share information with others for their progress and they shoul ...

  18. The Books Recommend Service System Based on Improved Algorithm for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    王萍

    2009-01-01

    The Apriori algorithm is a classical method of association rule mining. Based on an analysis of this theory, the paper provides an improved Apriori algorithm. The proposed algorithm combines a hash table technique with the reduction of candidate itemsets to enhance the efficiency of resource usage as well as the individualized service of the library.
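
    A well-known way to combine hashing with candidate reduction is the hash-bucket filter of the DHP/PCY family: while single items are counted, every pair in a transaction is hashed into a bucket counter, and a pair can be frequent only if its bucket count reaches the threshold. The Python sketch below shows that idea for 2-itemsets; whether the paper's improved Apriori uses this exact scheme is not stated in the abstract, and the bucket count is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

def pcy_frequent_pairs(db: List[Set[str]], min_count: int,
                       n_buckets: int = 11) -> Dict[Tuple[str, str], int]:
    """Frequent 2-itemsets using a PCY/DHP-style hash-bucket filter."""
    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts: Counter = Counter()
    buckets = [0] * n_buckets
    for t in db:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % n_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}
    # Pass 2: count only pairs of frequent items whose bucket reached min_count.
    # A bucket count is an upper bound on the count of each pair hashed into it,
    # so this filter never drops a truly frequent pair, whatever the hash values.
    pair_counts: Counter = Counter()
    for t in db:
        for pair in combinations(sorted(i for i in t if i in frequent_items), 2):
            if buckets[hash(pair) % n_buckets] >= min_count:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_count}
```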

  19. A New Hybrid Algorithm for Association Rule Mining

    Institute of Scientific and Technical Information of China (English)

    ZHANG Min-cong; YAN Cun-liang; ZHU Kai-yu

    2007-01-01

    HA (hashing array), a new algorithm for mining frequent itemsets in large databases, is proposed. It employs a hash array structure, ItemArray(), to store the information of the database and then uses it instead of the database in later iterations. With this improvement, the whole database needs to be scanned only twice, so the computational cost can be reduced significantly. To overcome the performance bottleneck of mining frequent 2-itemsets, a modified version of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted in this paper to evaluate the performance of the proposed algorithm, and the results show that the new algorithm is more efficient and reasonable.
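
    Direct addressing for 2-itemsets can be illustrated with a flat triangular array: items are numbered 0..n-1, and each pair (i, j) with i < j maps to a fixed array slot, so pair counts are updated without hashing or searching. This Python sketch is a generic illustration of direct addressing, not the ItemArray structure or the DHA algorithm themselves; the index formula is the standard one for an upper-triangular matrix stored row by row.

```python
from typing import Dict, Iterable, Set, Tuple

def pair_index(i: int, j: int, n: int) -> int:
    """Slot of pair (i, j), 0 <= i < j < n, in a row-major upper-triangular array."""
    return i * (2 * n - i - 1) // 2 + (j - i - 1)

def count_pairs(db: Iterable[Set[str]], items: Set[str]) -> Dict[Tuple[str, str], int]:
    """Count every 2-itemset of `items` in one pass using a flat array."""
    names = sorted(items)
    index = {name: k for k, name in enumerate(names)}
    n = len(names)
    counts = [0] * (n * (n - 1) // 2)
    for t in db:
        ids = sorted(index[x] for x in t if x in index)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                counts[pair_index(ids[a], ids[b], n)] += 1
    return {(names[i], names[j]): counts[pair_index(i, j, n)]
            for i in range(n) for j in range(i + 1, n)}
```

    The array needs n(n-1)/2 counters regardless of which pairs actually occur, so this trades memory for constant-time updates; it pays off when the number of frequent single items is moderate.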

  20. Associative Regressive Decision Rule Mining for Predicting Customer Satisfactory Patterns

    OpenAIRE

    Suresh, P

    2016-01-01

    Opinion mining, also known as sentiment analysis, involves customer satisfactory patterns, sentiments and attitudes toward entities, products, services and their attributes. With the rapid development of the Internet, potential customers provide a satisfactory level of product/service reviews. The high volume of customer reviews were developed for product/review through taxonomy-aware processing bu...

  1. Multi-objective Genetic Algorithm for Association Rule Mining Using a Homogeneous Dedicated Cluster of Workstations

    Directory of Open Access Journals (Sweden)

    S. Dehuri

    2006-01-01

    This study presents a fast and scalable multi-objective association rule mining technique for large databases using a genetic algorithm. Objective functions such as the confidence factor, comprehensibility and interestingness can be thought of as different objectives of our association rule mining problem and are treated as the basic input to the genetic algorithm. The outcome of our algorithm is a set of non-dominated solutions. However, in data mining the quantity of data is growing rapidly in both size and dimensions. Furthermore, the multi-objective genetic algorithm (MOGA) tends to be slow compared with most classical rule mining methods. Hence, to overcome these difficulties, we propose a fast and scalable technique that uses the inherent parallel processing nature of the genetic algorithm and a homogeneous dedicated network of workstations (NOWs). Our algorithm exploits both data and control parallelism by distributing the data being mined and the population of individuals across all available processors. The experimental results show that the algorithm is suitable for large databases, with an encouraging speedup.
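
    The set of non-dominated solutions returned by a multi-objective GA is a Pareto front. The Python sketch below extracts it from rule scores given as (confidence, comprehensibility, interestingness) tuples, all to be maximized; the scores are made up, and the GA and its distribution over workstations are not shown.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float, float]  # (confidence, comprehensibility, interestingness)

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(scores: List[Score]) -> List[Score]:
    """Return the scores that no other score dominates (the Pareto front)."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

if __name__ == "__main__":
    rules = [(0.9, 0.5, 0.3), (0.8, 0.6, 0.2), (0.7, 0.4, 0.1)]
    print(non_dominated(rules))  # [(0.9, 0.5, 0.3), (0.8, 0.6, 0.2)]
```

    This quadratic filter is fine for small populations; in a parallel GA, the per-individual fitness evaluation over distributed data is usually the expensive part.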

  2. A Review of Protein-DNA Binding Motif using Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Virendra Kumar Tripathi

    2013-03-01

    Finding unknown patterns of transcription factor binding sites is a prerequisite for understanding gene regulation and life mechanisms. Discovering gene regulation motifs is a challenging task in bioinformatics, as it requires relating transcription factors to transcription factor binding sites. The increasing size and length of motif string patterns raise problems in modeling and optimizing the gene selection process. In this paper we give a survey of protein-DNA binding analysis using association rule mining. Association rule mining is a well-known data mining technique for pattern analysis. Its ability to generate both positive and negative patterns is helpful for discovering new patterns in DNA-binding bioinformatics data. Other data mining approaches, such as clustering and classification, have also been applied to group genes during selection for known and unknown patterns, but they face problems with valid DNA data strings; the rule mining principle finds better relations between transcription factors and transcription factor binding sites.

  3. A New Approach of Using Association Rule Mining in Customer Complaint Management

    Directory of Open Access Journals (Sweden)

    Behrouz Minaei-Bidgoli

    2010-09-01

    A new approach to using data mining tools for customer complaint management is presented in this paper. The association rule mining technique is applied to discover the relationships between different groups of citizens and different kinds of complainers. The data refer to citizens' complaints about the performance of the municipality of Tehran, the capital of Iran. Analyzing these rules makes it possible for municipality managers to find the causes of complaints and thus facilitates corresponding engineering changes. The idea of contrast association rules is also applied to identify the attributes that characterize patterns of complaint occurrence among various groups of citizens. The results would enable the municipality to optimize its services.

  4. A Novel Approach for Discovery Quantitative Fuzzy Multi-Level Association Rules Mining Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saad M. Darwish

    2016-10-01

    Quantitative multilevel association rule mining is a central field for discovering interesting associations among data items at multiple levels of abstraction. The problem of extending procedures to handle quantitative data has attracted the attention of many researchers. Such algorithms typically discretize attribute domains into sharp intervals and then apply simple algorithms developed for Boolean attributes. Fuzzy association rule mining approaches, based on fuzzy set theory, are intended to overcome these shortcomings. Furthermore, most current algorithms in this area rely on exhaustive search methods to determine the ideal support and confidence thresholds, which suffer from high computational cost when searching for association rules. To accelerate the search for quantitative multilevel association rules and avoid excessive computation, this paper proposes a novel genetic-based method to determine threshold values for frequent itemsets. In this approach, a sophisticated coding method is established, and the qualified confidence is employed as the fitness function. With the genetic algorithm, a comprehensive search can be achieved and the system is automated, because the model does not need a user-specified minimum support threshold. Experimental results indicate that the proposed algorithm can effectively generate non-redundant fuzzy multilevel association rules.
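
    The abstract contrasts crisp discretization with fuzzy approaches. As a minimal sketch of the fuzzy side only, assuming triangular membership functions and the common min t-norm (the paper's genetic coding and fitness function are not reproduced, and the attribute names and term boundaries below are hypothetical), fuzzy support can be computed as the average, over records, of the smallest membership degree among the itemset's terms:

```python
from typing import Callable, Dict, List


def triangular(a: float, b: float, c: float) -> Callable[[float], float]:
    """Return a triangular membership function peaking at b, zero outside (a, c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def fuzzy_support(records: List[Dict[str, float]],
                  itemset: Dict[str, Callable[[float], float]]) -> float:
    """Fuzzy support: mean over records of the minimum membership (min t-norm)."""
    if not records:
        return 0.0
    total = 0.0
    for rec in records:
        total += min(mu(rec[attr]) for attr, mu in itemset.items())
    return total / len(records)


# Hypothetical linguistic terms for two quantitative attributes.
age_middle = triangular(25, 40, 55)
income_high = triangular(40_000, 80_000, 120_000)

records = [
    {"age": 40, "income": 80_000},  # memberships 1.0 and 1.0
    {"age": 30, "income": 60_000},  # memberships 1/3 and 0.5
    {"age": 60, "income": 90_000},  # age membership 0.0
]
s = fuzzy_support(records, {"age": age_middle, "income": income_high})
print(round(s, 4))  # 0.4444
```

    Fuzzy confidence of X => Y would then be fuzzy_support(X ∪ Y) / fuzzy_support(X); a genetic search such as the one described would tune thresholds on top of measures like these.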

  5. A LFP-tree based method for association rules mining in telecommunication alarm correlation analysis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    The mining of association rules is one of the primary methods used in telecommunication alarm correlation analysis, where alarm databases are very large, so algorithm efficiency plays an important role in handling them. The classical frequent pattern growth (FP-growth) algorithm can produce a large number of conditional pattern trees, which makes it difficult to mine association rules in a telecommunication environment. In this paper, an algorithm based on a layered frequent pattern tree (LFP-tree) is proposed for mining frequent patterns. The efficiency of this algorithm is achieved with the following techniques: 1) all frequent patterns are condensed into a layered structure, which not only saves memory and time but is also very useful for updating the alarm databases; 2) each alarm item can be viewed as a triple, in which t is a Boolean variable that shows whether the item is frequent; 3) deleting infrequent items with dynamic pruning avoids producing conditional pattern sets. Simulation and analysis show that it is a valid method with better time and space efficiency, suited to mining association rules in telecommunication alarm correlation analysis.
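
    For reference, the classical FP-tree that the LFP-tree is meant to improve on can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's LFP-tree: the layered structure, the triple representation and dynamic pruning are not reproduced, and the alphabetical tie-break is an arbitrary choice.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class FPNode:
    """One node of an FP-tree: an item, its path count, and links to children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(
    transactions: List[List[str]], min_count: int
) -> Tuple[FPNode, Dict[str, List[FPNode]], Dict[str, int]]:
    """Drop infrequent items, order the rest by descending frequency, and
    insert each transaction as a path so that shared prefixes are merged."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {i: [] for i in frequent}
    for t in transactions:
        # Ties in frequency are broken alphabetically for a deterministic order.
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)  # node-links later followed by FP-growth
            child.count += 1
            node = child
    return root, header, frequent


root, header, frequent = build_fp_tree(
    [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]], min_count=2)
print(root.children["a"].count, root.children["b"].count)  # 3 1
```

    FP-growth would then mine each item's conditional pattern base by following its header links; it is these conditional trees that multiply on large alarm databases.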

  6. Mining Association Rules to Evade Network Intrusion in Network Audit Data

    Directory of Open Access Journals (Sweden)

    Kamini Nalavade

    2014-06-01

    With the growth of hacking and exploitation tools and the invention of new ways of intrusion, intrusion detection and prevention is becoming a major challenge in network security. Increasing network traffic and data on the Internet make this task more demanding. Various approaches are used in intrusion detection, but unfortunately none of the systems developed so far is completely flawless. False positive rates make it extremely hard to analyse and react to attacks. Intrusion detection systems using data mining approaches make it possible to search for patterns and rules in large amounts of audit data. In this paper, we present a model that integrates association rules into intrusion detection to design and implement a network intrusion detection system. Our technique generates attack rules that detect attacks in network audit data using anomaly detection, which shows that the modified association rule algorithm is capable of detecting network intrusions. The freely available KDD dataset is used for our experiments, and the results are compared. Our intrusion detection system using association rule mining generates such attack rules while maintaining a low false positive rate.

  7. Design a Weight Based sorting distortion algorithm using Association rule Hiding for Privacy Preserving Data mining

    Directory of Open Access Journals (Sweden)

    R.Sugumar

    2011-12-01

    Securing a large database that contains crucial information becomes a serious issue when the data are shared over a network, because of the risk of unauthorized access. Privacy preserving data mining is a new research trend in protecting private data for data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and crucial data. Data modification and rule hiding is one of the most important approaches for securing data. The objective of the proposed Weight Based Sorting Distortion (WBSD) algorithm is to distort certain data that satisfy a particular sensitive rule. It then hides the transactions that support a sensitive rule, assigns each rule a priority, and sorts them in ascending order according to the priority value of each rule. It then uses these weights to compute the priority value of each transaction according to how weak the rule that the transaction supports is. Data distortion is one of the important methods for avoiding this kind of scalability issue.
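
    The general idea of distortion-based rule hiding, lowering a sensitive rule's confidence by editing the transactions that support it, can be sketched as follows. This is a generic sketch, not WBSD: the shortest-transaction-first priority and the choice of which consequent item to delete are hypothetical stand-ins for the paper's weight-based ordering.

```python
from typing import FrozenSet, List, Set


def support_count(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def confidence(db: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    denom = support_count(db, lhs)
    return support_count(db, lhs | rhs) / denom if denom else 0.0


def hide_rule(db: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str],
              min_conf: float) -> List[Set[str]]:
    """Return a distorted copy of `db` in which conf(lhs -> rhs) < min_conf.

    Assumes lhs and rhs are disjoint. Hypothetical priority: the shortest
    supporting transactions are edited first, and the alphabetically smallest
    consequent item is the one deleted."""
    db = [set(t) for t in db]  # never modify the caller's data
    victim = min(rhs)
    supporting = sorted((t for t in db if (lhs | rhs) <= t), key=len)
    for t in supporting:
        if confidence(db, lhs, rhs) < min_conf:
            break
        t.discard(victim)  # t is the same set object stored in db
    return db


original = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"a"}, {"b"}]
lhs, rhs = frozenset({"a"}), frozenset({"b"})
hidden = hide_rule(original, lhs, rhs, min_conf=0.5)
print(confidence(original, lhs, rhs), confidence(hidden, lhs, rhs))  # 0.75 0.25
```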

  8. Studies on Application of Mining Association Rules algorithm in Storage Location Configuration

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    How to reduce inbound and outbound travel distance and improve work efficiency is not only the key question for a logistics storage and distribution center but also a primary factor in improving an enterprise's competitiveness. In view of this question, this article puts forward a method that uses association rule mining to solve the storage location configuration problem, with the purpose of improving work efficiency.

  9. Efficient Mining of Association Rules by Reducing the Number of Passes over the Database

    Institute of Scientific and Technical Information of China (English)

    李庆忠; 王海洋; 闫中敏; 马绍汉

    2001-01-01

    This paper introduces a new algorithm for mining association rules. The algorithm RP counts itemsets of different sizes in the same pass over the database by dividing the database into m partitions. The total number of passes over the database is only (k+2m-2)/m, where k is the size of the longest itemset, which is much less than k.
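
    The pass count can be checked numerically. Rewriting (k+2m-2)/m as 2 + (k-2)/m shows that it equals k when m = 1 and approaches 2 as the number of partitions grows. The snippet below only evaluates the formula stated in the abstract; k = 10 is an arbitrary example value.

```python
def rp_passes(k: int, m: int) -> float:
    """Database passes reported for RP: (k + 2m - 2) / m, where k is the size
    of the longest itemset and m is the number of partitions."""
    return (k + 2 * m - 2) / m


for m in (1, 2, 4, 8):
    print(m, rp_passes(10, m))  # 10.0, 6.0, 4.0, 3.0 passes respectively
```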

  10. Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Noha Negm

    2013-09-01

    The challenges of standard clustering methods and the weaknesses of the Apriori algorithm in frequent termset clustering motivate our research. Based on association rule mining, an efficient approach for Web Document Clustering (ARWDC) has been devised. An efficient Multi-Tire Hashing Frequent Termsets algorithm (MTHFT) is used to improve the efficiency of mining association rules by improving the mining of frequent termsets. The documents are then initially partitioned based on association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions; that is, the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords, and the resulting clusters are obtained effectively. In this paper, we present an extensive analysis of the ARWDC approach on Reuters datasets of different sizes. Furthermore, the performance of our approach is evaluated with measures such as precision, recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the efficiency, scalability and accuracy of the ARWDC approach improve significantly on the Reuters datasets.

  11. An Associate Rules Mining Algorithm Based on Artificial Immune Network for SAR Image Segmentation

    Directory of Open Access Journals (Sweden)

    Mengling Zhao

    2015-01-01

    As a computational intelligence method, the artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In existing artificial immune network algorithms, the affinity used for classification is computed from a distance measure, which may lead to unsatisfactory results for data with nominal attributes. To overcome this shortcoming, association rules are introduced into the AIN algorithm, and we propose a new classification algorithm, an associate rules mining algorithm based on artificial immune network (ARM-AIN). The new method uses association rules to represent immune cells and mines the best association rules rather than searching for optimal clustering centers. The proposed algorithm has been extensively compared with the artificial immune network classification (AINC) algorithm, the artificial immune network classification algorithm based on self-adaptive PSO (SPSO-AINC), and PSO-AINC on several large-scale data sets, on target recognition in remote sensing images, and on the segmentation of three different SAR images. The experimental results indicate the superiority of ARM-AIN in classification accuracy and running time.

  12. Association rule mining on grid monitoring data to detect error sources

    CERN Document Server

    Maier, G; Kranzlmueller, D; Gaidioz, B

    2010-01-01

    Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs, including exit codes. However, the exit codes do not always denote the actual fault that caused the job failure. Human time and knowledge are required to manually trace errors back to the real underlying fault. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the behavior of grid components, taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically, and this information, expressed as association rules, is visualized in a web interface. This work decreases the time needed for fault recovery and improves the grid's reliability.

  13. An Overview of Secure Mining of Association Rules in Horizontally Distributed Databases

    Directory of Open Access Journals (Sweden)

    Sonal Patil

    2015-10-01

    This paper proposes a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton, which is based on the Fast Distributed Mining (FDM) algorithm, an unsecured distributed version of the Apriori algorithm. The main ingredients in the proposed protocol are two novel secure multi-party algorithms: 1. one that computes the union of the private subsets held by the interacting players, and 2. one that tests whether an element held by one player is included in a subset held by another. This protocol offers enhanced privacy with respect to the earlier one. In addition, it is simpler and significantly more efficient in terms of communication rounds, communication cost and computational cost [1].

  14. A Business Intelligence Model to Predict Bankruptcy using Financial Domain Ontology with Association Rule Mining Algorithm

    CERN Document Server

    Martin, A; Venkatesan, Dr V Prasanna

    2011-01-01

    Today, in every organization, financial analysis provides the basis for understanding and evaluating the results of business operations and showing how well a business is doing. This means that organizations can control operational activities primarily related to corporate finance. One way of doing this is through bankruptcy prediction. This paper develops an ontological model from an organization's financial information by analyzing the semantics of its financial statements. One of the best bankruptcy prediction models is the Altman Z-score model, which uses financial ratios to predict bankruptcy. From the financial ontological model, relations between financial data are discovered using a data mining algorithm. By combining the financial domain ontological model with an association rule mining algorithm and the Z-score model, a new business intelligence model is developed to predict bankruptcy.
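
    For readers unfamiliar with it, the original (1968) Altman Z-score for publicly traded manufacturing firms combines five financial ratios with fixed weights. The sketch below uses that classic formulation and its commonly cited cut-offs; the abstract does not say which variant the paper uses or how the ontology feeds the ratios, and the balance-sheet figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    sales: float
    total_assets: float
    total_liabilities: float


def altman_z(f: Financials) -> float:
    """Original (1968) Altman Z-score for publicly traded manufacturing firms."""
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.market_value_equity / f.total_liabilities
    x5 = f.sales / f.total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z: float) -> str:
    """Commonly cited cut-offs for the original model."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


fin = Financials(working_capital=200, retained_earnings=300, ebit=100,
                 market_value_equity=500, sales=1000,
                 total_assets=1000, total_liabilities=500)
z = altman_z(fin)
print(round(z, 2), zone(z))  # 2.59 grey
```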

  15. Efficient mining of association rules for the early diagnosis of Alzheimer's disease.

    Science.gov (United States)

    Chaves, R; Górriz, J M; Ramírez, J; Illán, I A; Salas-Gonzalez, D; Gómez-Río, M

    2011-09-21

    In this paper, a novel technique based on association rules (ARs) is presented in order to find relations among activated brain areas in single photon emission computed tomography (SPECT) imaging. The aim of this work is to discover associations among attributes that characterize the perfusion patterns of normal subjects and to use them for the early diagnosis of Alzheimer's disease (AD). First, voxel-as-feature-based activation estimation methods are used to find the three-dimensional activated brain regions of interest (ROIs) for each patient. Second, these ROIs serve as input for mining ARs with a minimum support and confidence among activation blocks, using a set of controls. In this context, support and confidence measures are related to the proportion of functional areas that are singly and jointly activated across the brain. Finally, we perform image classification by comparing the number of ARs verified by each subject under test to a given threshold that depends on the number of previously mined rules. Several classification experiments were carried out to evaluate the proposed methods using a SPECT database that consists of 41 controls (NOR) and 56 AD patients labeled by trained physicians. The proposed methods were validated by means of leave-one-out cross-validation, yielding up to 94.87% classification accuracy and thus outperforming recently developed methods for computer-aided diagnosis of AD.

  16. ADAPTIVE ASSOCIATION RULE MINING BASED CROSS LAYER INTRUSION DETECTION SYSTEM FOR MANET

    Directory of Open Access Journals (Sweden)

    V. Anjana Devi

    2011-10-01

    Mobile ad hoc wireless networks (MANETs) are a significant area of research with many applications. MANETs are particularly vulnerable to malicious attacks. Authentication and encryption techniques can be used as the first line of defense for reducing the possibility of attacks. However, these approaches have several drawbacks and are designed for a set of well-known attacks. This paper proposes a cross-layer intrusion detection architecture that discovers malicious nodes and different types of DoS attacks by exploiting the information available across different layers of the protocol stack in order to improve detection accuracy. This approach uses a fixed-width clustering algorithm to efficiently detect anomalies in MANET traffic and to detect newly generated attacks. In the association process, the Adaptive Association Rule mining algorithm is used, which helps reduce the time taken by the association process.
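
    Fixed-width clustering, as commonly used for unsupervised anomaly detection, makes a single pass over the data: each point joins the first cluster whose centroid lies within a fixed width w, or else starts a new cluster, and points in unusually small clusters are treated as anomalies. The sketch below assumes Euclidean distance on numeric feature vectors; the width, the small-cluster fraction and the data are hypothetical, and the paper's cross-layer features are not modeled.

```python
import math
from typing import List, Sequence, Tuple


def fixed_width_clusters(
    points: Sequence[Sequence[float]], width: float
) -> Tuple[List[List[float]], List[List[int]]]:
    """Single pass: join the first cluster whose centroid is within `width`,
    otherwise start a new cluster. Returns centroids and member indices."""
    centroids: List[List[float]] = []
    members: List[List[int]] = []
    for idx, p in enumerate(points):
        for c_idx, c in enumerate(centroids):
            if math.dist(p, c) <= width:
                members[c_idx].append(idx)
                n = len(members[c_idx])
                # Incrementally update the centroid as the mean of its members.
                centroids[c_idx] = [ci + (pi - ci) / n for ci, pi in zip(c, p)]
                break
        else:
            centroids.append(list(p))
            members.append([idx])
    return centroids, members


def anomalous_points(members: List[List[int]], total: int, fraction: float) -> List[int]:
    """Flag points in clusters holding fewer than `fraction` of all points."""
    return sorted(i for m in members if len(m) < fraction * total for i in m)


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.2, 0.2), (10.0, 10.0)]
centroids, members = fixed_width_clusters(points, width=1.0)
print(anomalous_points(members, len(points), fraction=0.3))  # [4]
```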

  17. Adaptive Interval Configuration to Enhance Dynamic Approach for Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    Most proposed algorithms for mining association rules follow the conventional level-wise approach. The dynamic candidate generation idea introduced in the dynamic itemset counting (DIC) algorithm broke away from the level-wise limitation, finding the large itemsets using fewer passes over the database than level-wise algorithms. However, the dynamic approach is very sensitive to the data distribution of the database, and it requires a proper interval size. In this paper, an optimization technique named adaptive interval configuration (AIC) has been developed to enhance the dynamic approach. The AIC optimization has two functions. First, a homogeneous distribution of large itemsets over intervals can be achieved, so that fewer unnecessary candidates are generated and fewer database scanning passes are guaranteed. Second, a near-optimal interval size can be determined adaptively to produce the best response time. We also developed a candidate pruning technique named virtual partition pruning to reduce the size-2 candidate set and incorporated it into the AIC optimization. Based on this optimization technique, we propose the efficient AIC algorithm for mining association rules. The AIC, DIC and classic Apriori algorithms were implemented on a Sun Ultra Enterprise 4000 for performance comparison. The results show that AIC performed much better than both DIC and Apriori and showed strong robustness.
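
    To make the level-wise limitation concrete, here is a minimal Apriori sketch that counts its database passes: every candidate size k costs one full scan, which is exactly the cost that DIC and AIC try to reduce. DIC's dynamic checkpoints and AIC's interval configuration are not shown, and the transactions are toy data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def apriori(transactions: List[Set[str]], min_count: int
            ) -> Tuple[Dict[FrozenSet[str], int], int]:
    """Level-wise Apriori: one database pass per candidate size k.
    Returns the frequent itemsets with their counts and the number of passes."""
    passes = 0
    frequent: Dict[FrozenSet[str], int] = {}
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        passes += 1  # one full scan to count all size-k candidates
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any candidate
        # with an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent, passes


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent, passes = apriori(db, min_count=3)
print(passes, len(frequent))  # 3 6
```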

  18. A New Model of Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    苏毅娟; 严小卫

    2001-01-01

    A new algorithm for mining positive and negative association rules is presented. A new confidence measure is constructed to measure the uncertainty of an association rule based on probability theory and Piatetsky-Shapiro's model.
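
    The abstract does not give the new confidence measure itself, so the sketch below shows only two standard ingredients it builds on: the Piatetsky-Shapiro measure supp(X ∪ Y) - supp(X)·supp(Y), whose sign indicates positive or negative dependence, and the identity supp(X ∧ ¬Y) = supp(X) - supp(X ∪ Y) used for negative rules. The transactions are hypothetical.

```python
from typing import FrozenSet, List, Set


def supp(db: List[Set[str]], items: FrozenSet[str]) -> float:
    """Fraction of transactions containing every item in `items`."""
    return sum(1 for t in db if items <= t) / len(db)


def piatetsky_shapiro(db: List[Set[str]], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """PS = supp(X u Y) - supp(X) * supp(Y); negative values suggest X and Y repel."""
    return supp(db, x | y) - supp(db, x) * supp(db, y)


def supp_x_not_y(db: List[Set[str]], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Support of transactions containing X but not all of Y (rule X => not Y)."""
    return supp(db, x) - supp(db, x | y)


db = [{"tea", "coffee"}, {"tea"}, {"tea"}, {"coffee"}, {"coffee", "milk"}]
tea, coffee = frozenset({"tea"}), frozenset({"coffee"})
ps = piatetsky_shapiro(db, tea, coffee)
conf_neg = supp_x_not_y(db, tea, coffee) / supp(db, tea)  # confidence of tea => not coffee
print(round(ps, 2), round(conf_neg, 3))  # -0.16 0.667
```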

  19. Deriving Association between Student’s Comprehension and Facial expressions using Class Association Rule Mining

    Directory of Open Access Journals (Sweden)

    M. Mohamed Sathik

    2013-06-01

    The scope of this study was to discover the association between the facial expressions of students in an academic lecture and the level of comprehension shown by those expressions. This study focused on finding the relationship between specific elements of learners' behaviour for different emotional states and the relevant expressions that could be observed in individual students. The experiment surveyed lecturers' quantitative observations in the classroom, in which students' behaviour was recorded and statistically analyzed. The main aim of this paper is to derive association rules that represent relationships between the input conditions and results of domain experiments. Hence, the relationship between physical behaviours linked to emotional state and students' comprehension is formulated in the form of rules. We present the Predictive Apriori algorithm, which is able to find all valid class association rules with high accuracy. The rules derived by Predictive Apriori are pruned by objective and subjective measures.

  20. [A method to enhance user experience of EMR based on mining association rules of incremental updating data].

    Science.gov (United States)

    Zhou, Bao-zhuo; Li, Chuan-fu; Dai, Liang-liang; Feng, Huan-qing

    2009-03-01

    The user experience (EX) of current Electronic Medical Record (EMR) systems needs to be improved. This paper proposes a new method to enhance the EX of EMR systems. First, system templates and text characterization are used to structure the EMR data. Then, the structured data are mined by mining association rules of incrementally updated data, to find associations between the template elements of the EMR and the values of those elements. Finally, with the help of the mined results, EMR users can enter data effectively and quickly.

  1. Leveraging Bibliographic RDF Data for Keyword Prediction with Association Rule Mining (ARM)

    Directory of Open Access Journals (Sweden)

    Nidhi Kushwaha

    2014-11-01

    The Semantic Web (Web 3.0) has been proposed as an efficient way to access the increasingly large amounts of data on the Internet. The Linked Open Data Cloud project is at present the major effort to implement the concepts of the Semantic Web, addressing the problems of inhomogeneity and large data volumes. RKBExplorer is one of many repositories implementing Open Data and contains considerable bibliographic information. This paper discusses bibliographic data, an important part of cloud data. Effective searching of bibliographic datasets can be a challenge, as many of the papers residing in these databases do not have sufficient or comprehensive keyword information. In these cases, a search engine based on RKBExplorer can only retrieve papers by author names and paper titles, without keywords. In this paper we attempt to address this problem by using the data mining algorithm Association Rule Mining (ARM) to develop keywords based on features retrieved from Resource Description Framework (RDF) data within a bibliographic citation. We demonstrate the applicability of this method for predicting missing keywords for bibliographic entries in several typical databases. Paper presented at the 1st International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2014), March 27-28, 2014. Organized by VIT University, Chennai, India. Sponsored by BRNS.
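
    One simple way to turn mined rules into keyword predictions is to fire every rule whose antecedent features all appear in an entry and rank the predicted keywords by confidence. This is a minimal sketch under assumptions: the feature encoding (for example title terms extracted from RDF), the rules and their confidences are hypothetical, and the abstract does not specify the paper's ranking scheme.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (antecedent features, keyword, confidence)


def predict_keywords(features: Set[str], rules: List[Rule], top_n: int = 3) -> List[str]:
    """Fire every rule whose antecedent is contained in the entry's features and
    rank keywords by the highest confidence of any rule predicting them."""
    best: Dict[str, float] = {}
    for antecedent, keyword, conf in rules:
        if antecedent <= features and conf > best.get(keyword, 0.0):
            best[keyword] = conf
    return sorted(best, key=lambda kw: (-best[kw], kw))[:top_n]


# Hypothetical rules, as if mined from RDF-derived features of citations.
rules: List[Rule] = [
    (frozenset({"title_term:rdf"}), "semantic web", 0.9),
    (frozenset({"title_term:rdf", "title_term:mining"}), "linked data", 0.7),
    (frozenset({"author:x"}), "databases", 0.95),
    (frozenset({"title_term:mining"}), "data mining", 0.8),
]
print(predict_keywords({"title_term:rdf", "title_term:mining"}, rules))
# ['semantic web', 'data mining', 'linked data']
```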

  2. Generalized Multidimensional Association Rules

    Institute of Scientific and Technical Information of China (English)

    周傲英; 周水庚; 金文; 田增平

    2000-01-01

    The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. Traditional association rule mining is limited to intra-transaction associations. Only recently was the concept of the N-dimensional inter-transaction association rule (NDITAR) proposed by H.J. Lu. This paper modifies and extends Lu's definition of NDITAR based on an analysis of its limitations, and subsequently introduces the generalized multidimensional association rule (GMDAR), which is more general, flexible and reasonable than NDITAR.

  3. An association rule mining-based framework for understanding lifestyle risk behaviors.

    Directory of Open Access Journals (Sweden)

    So Hyun Park

    OBJECTIVES: This study investigated the prevalence and patterns of lifestyle risk behaviors in Korean adults. METHODS: We utilized data from the Fourth Korea National Health and Nutrition Examination Survey for 14,833 adults (>20 years of age). We used association rule mining to analyze patterns of lifestyle risk behaviors by characterizing non-adherence to public health recommendations related to the Alameda 7 health behaviors. The study variables were current smoking, heavy drinking, physical inactivity, obesity, inadequate sleep, breakfast skipping, and frequent snacking. RESULTS: Approximately 72% of Korean adults exhibited two or more lifestyle risk behaviors. Among women, current smoking, obesity, and breakfast skipping were associated with inadequate sleep. Among men, breakfast skipping with additional risk behaviors such as physical inactivity, obesity, and inadequate sleep was associated with current smoking. Current smoking with additional risk behaviors such as inadequate sleep or breakfast skipping was associated with physical inactivity. CONCLUSION: Lifestyle risk behaviors are intercorrelated in Korea. Information on patterns of lifestyle risk behaviors could assist in planning interventions targeted at multiple behaviors simultaneously.

  4. Linguistic Valued Association Rules

    Institute of Scientific and Technical Information of China (English)

    LU Jian-jiang; QIAN Zuo-ping

    2002-01-01

    Association rule discovery and prediction with data mining methods are two topics in the field of information processing. In this paper, the records in a database are divided into many linguistic values, expressed as normal fuzzy numbers, by the fuzzy c-means algorithm, and a series of linguistic valued association rules is generated. The records in the database are then mapped onto the linguistic values according to the maximum membership principle, and definitions of support and confidence for linguistic valued association rules are also provided. Finally, the discovery and prediction methods for linguistic valued association rules are illustrated with a weather example.

  5. A Set Operation Based Algorithm for Association Rules Mining

    Institute of Scientific and Technical Information of China (English)

    铁治欣; 陈奇; 俞瑞钊

    2001-01-01

    Mining association rules is an important data mining problem. In this paper, an association rule mining algorithm based on set operations, ARDBSO, is given. It can find all large itemsets in the database while scanning the database only once, so the I/O time is reduced enormously and the efficiency of ARDBSO is improved. The experiments show that ARDBSO is 80 to 150 times as efficient as Apriori.
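
    The abstract does not describe ARDBSO's set operations in detail. One well-known way to find all frequent itemsets with a single database scan is the vertical (tid-set) representation: scan once to record which transactions contain each item, then obtain the support of any larger itemset by intersecting tid-sets. The sketch below illustrates that general idea on toy data; it is not claimed to be ARDBSO.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def tidsets(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """The only database scan: map each item to the ids of transactions containing it."""
    tids: Dict[str, Set[int]] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    return tids


def frequent_itemsets(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Grow itemsets level by level, computing support by tid-set intersection."""
    level = {frozenset([i]): s for i, s in tidsets(transactions).items()
             if len(s) >= min_count}
    counts = {itemset: len(s) for itemset, s in level.items()}
    while level:
        next_level: Dict[FrozenSet[str], Set[int]] = {}
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            union = a | b
            if len(union) == len(a) + 1 and union not in next_level:
                common = ta & tb  # tids(A u B) = tids(A) & tids(B)
                if len(common) >= min_count:
                    next_level[union] = common
        counts.update((itemset, len(s)) for itemset, s in next_level.items())
        level = next_level
    return counts


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = frequent_itemsets(db, min_count=3)
print(len(found), found[frozenset({"a", "b"})])  # 6 3
```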

  6. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.

    Science.gov (United States)

    Kargarfard, Fatemeh; Sami, Ashkan; Ebrahimie, Esmaeil

    2015-10-01

    Pandemic influenza is a major concern worldwide. The availability of advanced technologies and of the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provides a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning the alteration of non-pandemic sequences into pandemic ones. We hypothesized that the extracted rules could lead to the development of an efficient expert system for predicting influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules that potentially point to undiscovered antigenic sites in the influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key feature distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K and M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein levels. Finding hotspots in influenza sequences is significant, as they represent regions with low antibody reactivity. We argue that the virus escapes the host immune response through mutations at these spots. Based on the discovered rules, we developed the software "Prediction of Pandemic Influenza" for discriminating pandemic from non-pandemic sequences. This study opens a new vista in the discovery of association rules between mutation points during the evolution of pandemic influenza.
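
    At prediction time, a CBA classifier orders its class association rules by precedence (higher confidence, then higher support) and labels a sequence with the first rule that matches, falling back to a default class. The sketch below assumes that ordering, uses antecedent length as a stand-in for CBA's rule-generation-order tie-break, and omits CBA's database-coverage pruning; the features, labels and numbers are illustrative placeholders, not rules from the study.

```python
from typing import FrozenSet, List, NamedTuple, Set


class ClassRule(NamedTuple):
    antecedent: FrozenSet[str]  # hypothetical feature encoding
    label: str
    confidence: float
    support: float


def cba_predict(rules: List[ClassRule], features: Set[str], default: str) -> str:
    """Return the label of the first matching rule under CBA-style precedence:
    higher confidence first, then higher support, then shorter antecedent."""
    ordered = sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))
    for rule in ordered:
        if rule.antecedent <= features:
            return rule.label
    return default


# Placeholder rules: features "A", "B", "C" stand for encoded sequence positions.
rules = [
    ClassRule(frozenset({"A"}), "pandemic", 0.90, 0.30),
    ClassRule(frozenset({"B"}), "non-pandemic", 0.95, 0.20),
    ClassRule(frozenset({"A", "C"}), "non-pandemic", 0.90, 0.10),
]
print(cba_predict(rules, {"A", "C"}, default="unknown"))  # pandemic
```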

  7. A Novel Association Rule Mining with IEC Ratio Based Dissolved Gas Analysis for Fault Diagnosis of Power Transformers

    Directory of Open Access Journals (Sweden)

    Ms. Kanika Shrivastava

    2012-06-01

    Dissolved gas analysis (DGA) is the most important component of finding faults in large oil-filled transformers. Early detection of incipient faults in transformers reduces costly unplanned outages. The most sensitive and reliable technique for evaluating the core of a transformer is dissolved gas analysis. In this paper we evaluate different transformer conditions in different cases. This paper uses dissolved gas analysis to study the history of different transformers in service, in which dissolved combustible gases (DCG) in oil are used as a diagnostic tool for evaluating the condition of the transformer. Oil quality and dissolved gas tests are used comparatively for this purpose. In this paper we present a novel approach based on association rule mining and the IEC ratio method. Using data mining concepts, we can categorize faults based on single and multiple associations and also map the percentage of each fault. This is an efficient approach for fault diagnosis of power transformers, with which faults can be found under all obvious conditions. We use Java for programming and for the comparative study.
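
    The IEC ratio method is built on three gas ratios, C2H2/C2H4, CH4/H2 and C2H4/C2H6. A minimal sketch of computing them and encoding them as items for association rule mining follows; the single cut-off at 1.0 and the gas concentrations are hypothetical, and the sketch does not reproduce the IEC fault-code table or the paper's rule set.

```python
from typing import Dict, Set


def iec_ratios(ppm: Dict[str, float]) -> Dict[str, float]:
    """Compute the three IEC gas ratios from concentrations in ppm."""
    def ratio(num: str, den: str) -> float:
        if ppm[den] == 0:
            # Undefined when both gases are absent; unbounded otherwise.
            return float("nan") if ppm[num] == 0 else float("inf")
        return ppm[num] / ppm[den]

    return {
        "C2H2/C2H4": ratio("C2H2", "C2H4"),
        "CH4/H2": ratio("CH4", "H2"),
        "C2H4/C2H6": ratio("C2H4", "C2H6"),
    }


def to_items(ratios: Dict[str, float], cut: float = 1.0) -> Set[str]:
    """Hypothetical discretization: one item per ratio, 'lo' or 'hi' relative to `cut`."""
    return {f"{name}:{'hi' if value >= cut else 'lo'}" for name, value in ratios.items()}


sample = {"H2": 100.0, "CH4": 50.0, "C2H2": 1.0, "C2H4": 20.0, "C2H6": 10.0}
ratios = iec_ratios(sample)
print(ratios)
print(sorted(to_items(ratios)))
```

    Each transformer sample then becomes a transaction of such items (possibly together with a labeled fault type), to which ordinary association rule mining can be applied.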

  8. Multi-objective Numeric Association Rules Mining via Ant Colony Optimization for Continuous Domains without Specifying Minimum Support and Minimum Confidence

    Directory of Open Access Journals (Sweden)

    Parisa Moslehi

    2011-09-01

    Currently, all search algorithms that discretize numeric attributes for numeric association rule mining lose the original distribution of those attributes. This leads to a loss of information, so the association rules generated through this process are not precise and accurate. Algorithms that can natively handle numeric attributes are therefore of interest. Since association rule mining can be considered a multi-objective problem rather than a single-objective one, this paper presents a new multi-objective algorithm for numeric association rule mining using Ant Colony Optimization for Continuous Domains (ACOR). This algorithm mines numeric association rules in one step, without any need to specify minimum support and minimum confidence. To do this, we modified ACOR to generate rules. The results show that this algorithm yields more precise and accurate rules, and more rules than previous works.

  9. Exploration of the association rules mining technique for the signal detection of adverse drug events in spontaneous reporting systems.

    Directory of Open Access Journals (Sweden)

    Chao Wang

    BACKGROUND: The detection of signals of adverse drug events (ADEs) has increased because of the use of data mining algorithms in spontaneous reporting systems (SRSs). However, different data mining algorithms have different traits and conditions for application. The objective of our study was to explore the application of association rule (AR) mining in ADE signal detection and to compare its performance with that of other algorithms. METHODOLOGY/PRINCIPAL FINDINGS: Monte Carlo simulation was applied to generate drug-ADE reports randomly according to the characteristics of SRS datasets. One thousand simulated datasets were mined by AR and other algorithms. On average, 108,337 reports were generated by the Monte Carlo simulation. Based on the predefined criterion that 10% of the drug-ADE combinations were true signals, with RR equal to 10, 4.9, 1.5, and 1.2, AR detected, on average, 284 suspected associations with a minimum support of 3 and a minimum lift of 1.2. The area under the receiver operating characteristic (ROC) curve of AR was 0.788, which was equivalent to that of the other algorithms. Additionally, AR was applied to reports submitted to the Shanghai SRS in 2009. Five hundred seventy combinations were detected using AR from 24,297 SRS reports, and they were compared with recognized ADEs identified by clinical experts and various other sources. CONCLUSIONS/SIGNIFICANCE: AR appears to be an effective method for ADE signal detection in both simulated and real SRS datasets. The limitations of this method exposed in our study, i.e., non-uniform threshold setting and redundant rules, require further research.
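
    The screening rule reported above (a drug-ADE pair is flagged when it has at least 3 co-reports and a lift of at least 1.2) can be sketched as follows. The report format and the toy reports are hypothetical; the simulation design and the comparison algorithms are not reproduced.

```python
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

Report = Tuple[Set[str], Set[str]]  # (drugs, adverse events) in one spontaneous report


def ade_signals(reports: List[Report], min_support: int = 3, min_lift: float = 1.2):
    """Flag drug -> ADE rules with at least `min_support` co-reports and lift >= `min_lift`,
    where lift = P(drug and ADE) / (P(drug) * P(ADE)) estimated from report counts."""
    n = len(reports)
    drug_n, ade_n, pair_n = Counter(), Counter(), Counter()
    for drugs, ades in reports:
        drug_n.update(drugs)
        ade_n.update(ades)
        pair_n.update(product(drugs, ades))
    signals = []
    for (drug, ade), count in pair_n.items():
        lift = count * n / (drug_n[drug] * ade_n[ade])
        if count >= min_support and lift >= min_lift:
            signals.append((drug, ade, count, round(lift, 3)))
    return sorted(signals)


reports: List[Report] = (
    [({"drugA"}, {"rash"})] * 4
    + [({"drugA"}, {"nausea"})]
    + [({"drugB"}, {"nausea"})] * 4
    + [({"drugB"}, {"rash"})]
)
print(ade_signals(reports))  # [('drugA', 'rash', 4, 1.6), ('drugB', 'nausea', 4, 1.6)]
```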

  10. Identification of the Patterns Behavior Consumptions by Using Chosen Tools of Data Mining - Association Rules

    OpenAIRE

    R. Benda Prokeinová; J. Paluchová

    2014-01-01

    Research and development on environmental sustainability, a research goal of many countries and food producers, now has a long tradition. The aim of this paper is to identify patterns of consumer behaviour using association rules, given the importance of knowing the segmentation differences between consumers and their opinions on current sustainability trends. The research area of sustainability will be in Slovakia stil...

  11. An Associate Rules Mining Algorithm Based on Artificial Immune Network for SAR Image Segmentation

    OpenAIRE

    Mengling Zhao; Hongwei Liu

    2015-01-01

    As a computational intelligence method, the artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In existing artificial immune network algorithms, the affinity used for classification is calculated from a distance measure, which may give unsatisfactory results on data with nominal attributes. To overcome this shortcoming, association rules are introduced into the AIN algorithm, and we propose a new class...

  12. Characteristics of cyclist crashes in Italy using latent class analysis and association rule mining

    Science.gov (United States)

    De Angelis, Marco; Marín Puchades, Víctor; Fraboni, Federico; Pietrantoni, Luca

    2017-01-01

    The factors associated with the severity of bicycle crashes may differ across bicycle crash patterns. It is therefore important to identify distinct bicycle crash patterns with homogeneous attributes. The current study aimed to identify subgroups of bicycle crashes in Italy and to analyze the different bicycle crash types separately. The study focused on bicycle crashes that occurred in Italy between 2011 and 2013. We analyzed categorical indicators corresponding to the characteristics of the infrastructure (road type, road signage, and location type), the road user (i.e., opponent vehicle and cyclist's maneuver, type of collision, age and gender of the cyclist), the vehicle (type of opponent vehicle), and the environmental and time-period variables (time of day, day of the week, season, pavement condition, and weather). To identify homogeneous subgroups of bicycle crashes, we used latent class analysis, which segmented the bicycle crash data set into 19 classes representing 19 different bicycle crash types. Logistic regression analysis was used to identify the association between class membership and the severity of the bicycle crashes. Finally, association rule mining was conducted for each latent class to uncover the factors associated with an increased likelihood of severity. The association rules highlighted different crash characteristics associated with an increased likelihood of severity for each of the 19 bicycle crash types. PMID:28158296

  13. Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining

    KAUST Repository

    Boudellioua, Imane

    2016-07-08

    The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We carried out an evaluation study of our system performance using cross-validation technique. We found that it achieved very promising results in pathway identification with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, where 436,510 of them lacked any previous pathway annotations.

  14. Analysis of Medical Domain Using CMARM: Confabulation Mapreduce Association Rule Mining Algorithm for Frequent and Rare Itemsets

    Directory of Open Access Journals (Sweden)

    Dr. Jyoti Gautam

    2015-11-01

    Over the human life span, disease is a major cause of illness and death in modern society. Many factors contribute to disease, such as the work environment, living and working conditions, agriculture and food production, housing, unemployment, and individual lifestyle. Early diagnosis of diseases that occur, frequently or rarely, with advancing age can help cure them completely or to some extent. Long-term analysis of patient records may help reveal the causes of particular diseases, so that people can take early preventive measures to minimize the risk of diseases that may arise with age and thereby increase life expectancy. In this paper, a new Confabulation-MapReduce based association rule mining algorithm (CMARM) is proposed for analyzing a medical data repository for both rare and frequent itemsets, using an iterative MapReduce framework inspired by cogency. Cogency is the probability that the assumed facts are true given that the conclusion is true; because it is based on pairwise item conditional probabilities, the proposed algorithm mines association rules in a single pass through the file. The cogency-inspired approach also makes the algorithm valuable for dealing with infrequent items.
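    The cogency idea above can be sketched as a single pass that counts items and item pairs, after which P(a | b) follows for every co-occurring pair. This is a hedged, single-machine sketch under our own reading of the abstract (cogency of fact a given conclusion b taken as count(a, b) / count(b)); it is not CMARM, it does not use MapReduce, and the function name pairwise_cogency is hypothetical. In a MapReduce setting, the counting loop is the part one would expect to distribute.

```python
from collections import Counter
from itertools import combinations

def pairwise_cogency(transactions):
    """Count items and pairs in one pass, then return
    cogency[(a, b)] = P(a | b) = count(a, b) / count(b) for co-occurring pairs."""
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:  # the only pass over the transactions
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    cogency = {}
    for (a, b), n_ab in pair_counts.items():
        cogency[(a, b)] = n_ab / item_counts[b]  # P(a | b)
        cogency[(b, a)] = n_ab / item_counts[a]  # P(b | a)
    return cogency

records = [["fever", "cough"], ["fever", "cough"], ["fever"], ["cough", "rash"]]
cog = pairwise_cogency(records)
print(cog[("cough", "rash")], cog[("rash", "cough")])
# prints 1.0 0.3333333333333333
```

    The rare item "rash" still yields a strong conditional probability (every rash record also mentions cough), which illustrates why a pairwise conditional measure can surface associations involving infrequent items.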

  15. Mining Association Rules in Big Data for E-healthcare Information System

    Directory of Open Access Journals (Sweden)

    N. Rajkumar

    2014-08-01

    Big data refers to large-volume, growing data sets with multiple, autonomous sources. Big data is now expanding quickly in many advanced domains because of the rapid growth of networking and data collection. This study defines an E-Healthcare Information System that takes a logical and structured approach to knowledge, and that effectively prepares and controls the data generated during diagnostic activities in medical applications by sharing information among E-Healthcare Information System devices. The main objective is an extensive, integrated E-Healthcare Information System designed to control all aspects of a hospital's operation, such as medical, administrative, financial, and legal information and the corresponding service processing. Finally, results are generated with association mining techniques applied to big data from hospital information datasets, and they are evaluated in terms of accuracy, precision, recall, and positive rate.

  16. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships.

  17. Customer Requirements Mapping Method Based on Association Rule Mining for Mass Customization

    Institute of Scientific and Technical Information of China (English)

    XIA Shi-sheng; WANG Li-ya

    2008-01-01

    Customer requirements analysis is the key step in product variety design for mass customization (MC). Quality function deployment (QFD) is a widely used management technique for understanding the voice of the customer (VOC); however, QFD depends heavily on subjective human judgment when extracting customer requirements and determining their importance weights. The QFD process and its related problems are so complicated that it is not easy to use. In this paper, based on a general data structure for product families, the generic bill of materials (CBOM), association rule analysis is introduced to construct a classification mechanism between customer requirements and product architecture. The new method maps customer requirements to the items of the product family architecture, accomplishes the mapping from the customer domain to the physical domain directly, reduces the back-and-forth between customer and designer, improves product design quality, and thus satisfies customer needs as far as possible. Finally, an example of customer requirements mapping for an elevator cabin illustrates the proposed method.

  18. Mining Algorithm of Normalized Weighted Association Rules in Database

    Institute of Scientific and Technical Information of China (English)

    杜鹢; 藏海霞

    2001-01-01

    Previous algorithms for mining association rules assume that every item in the database is equally important. This paper presents a method for mining normalized weighted association rules in a database, which solves the problems caused by the unequal importance of items.
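    A common way to weight association rules is to scale an itemset's ordinary support by the average weight of its items (a "normalized" weighted support). The sketch below assumes that formulation. The abstract does not give its exact definition, so treat this as an illustration of the general idea rather than the paper's algorithm; the weights and item names are hypothetical.

```python
def normalized_weighted_support(itemset, transactions, weights):
    """Average weight of the items in `itemset` times its ordinary support.

    This 'normalized' formulation is an assumption; the paper may differ.
    """
    itemset = frozenset(itemset)
    support = sum(1 for t in transactions if itemset <= set(t)) / len(transactions)
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return avg_weight * support

transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]
weights = {"a": 0.9, "b": 0.3, "c": 0.6}  # hypothetical importance weights
for items in (["b"], ["c"], ["a", "b"]):
    print(items, normalized_weighted_support(items, transactions, weights))
```

    With these weights, item b occurs in more transactions than item c (support 0.75 versus 0.5) but ends up with a lower weighted support (about 0.225 versus 0.3), which is the kind of re-ranking that weighting is meant to produce.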

  19. Pattern Discovery Using Association Rules

    Directory of Open Access Journals (Sweden)

    Ms Kiruthika M

    2011-12-01

    The explosive growth of the Internet has given rise to many websites that maintain large amounts of user information. To exploit this information, identifying users' usage patterns is very important. Web usage mining is one process for finding these usage patterns and has many practical applications. Our paper discusses how association rules can be used to discover patterns in web usage mining. The discussion starts with preprocessing of the given web log, followed by clustering and association rule discovery. These rules provide knowledge that helps improve website design, advertising, web personalization, etc.

  20. A Novel Association Rule Mining with IEC Ratio Based Dissolved Gas Analysis for Fault Diagnosis of Power Transformers

    Directory of Open Access Journals (Sweden)

    Kanika Shrivastava

    2012-06-01

    Dissolved gas analysis (DGA) is the most important means of finding faults in large oil-filled transformers. Early detection of incipient faults in transformers reduces costly unplanned outages, and dissolved gas analysis is the most sensitive and reliable technique for evaluating the core of a transformer. In this paper we evaluate transformer condition in different cases. We use dissolved gas analysis to study the history of different transformers in service, with the dissolved combustible gases (DCG) in the oil serving as a diagnostic tool for evaluating each transformer's condition; oil quality and dissolved gas tests are used comparatively for this purpose. We present a novel approach based on association rule mining and the IEC ratio method. Using data mining, faults can be categorized by single and multiple associations, and the percentage of each fault can be mapped. This is an efficient approach to fault diagnosis of power transformers that can find faults under all obvious conditions. Java was used for programming and for the comparative study.

  1. DETERMINING THE CORE PART OF SOFTWARE DEVELOPMENT CURRICULUM APPLYING ASSOCIATION RULE MINING ON SOFTWARE JOB ADS IN TURKEY

    Directory of Open Access Journals (Sweden)

    Ilkay Yelmen

    2016-01-01

    Software technology has advanced rapidly over the years. To adapt to this advancement, employees in software development must continually renew their skills. During this rapid change, it is vital to train software developers according to the criteria desired by the industry. Therefore, the curricula of university programs related to software development should be revised according to software industry requirements. In this study, the core part of a Software Development Curriculum is determined by applying association rule mining to software job ads in Turkey. The courses in the core part are chosen with respect to the IEEE/ACM computer science curriculum. As future work, it is also important to bring together academic personnel and software company professionals to determine the compulsory and elective courses so that newly graduated software dev

  2. Targeting Association Rule Mining Without Support Constraint

    Institute of Scientific and Technical Information of China (English)

    李凯里; 王立宏

    2012-01-01

    Concepts such as the all-attribute itemset, the absolute association rule, and the key antecedent of an association rule are proposed to solve the information-annihilation problem caused by the combinatorial explosion of itemsets in association rule mining without support. This paper proves an important theorem: an association rule whose antecedent is a superset of a key antecedent must be an absolute association rule, i.e., the property is upward closed. Based on this principle, a targeting association rule mining algorithm is designed that eliminates a large number of redundant association rules. An example verifies the feasibility and effectiveness of the algorithm.

  3. Discovery of novel targets for multi-epitope vaccines: Screening of HIV-1 genomes using association rule mining

    Directory of Open Access Journals (Sweden)

    Piontkivska Helen

    2009-07-01

    Background: Studies have shown that in the genome of human immunodeficiency virus (HIV-1), regions responsible for interactions with the host's immune system, namely cytotoxic T-lymphocyte (CTL) epitopes, tend to cluster together in relatively conserved regions. On the other hand, "epitope-less" regions, or regions with a relatively low density of epitopes, tend to be more variable. However, very little is known about relationships among epitopes from different genes, in other words, whether particular epitopes from different genes occur together in the same viral genome. Association rule mining was used to identify CTL epitopes in different genes that co-occur in HIV genomes. Results: Using a set of 189 best-defined HIV-1 CTL/CD8+ epitopes from 9 different protein-coding genes, as described by Frahm, Linde & Brander (2007), we examined the complete genomic sequences of 62 reference HIV sequences (including 13 subtypes and sub-subtypes, with approximately 4 representative sequences for each subtype or sub-subtype, and 18 circulating recombinant forms). The results showed that, despite the inclusion of recombinant sequences, which would be expected to break up associations of epitopes in different genes when two different genomes recombine, particular combinations of epitopes (epitope associations) occur repeatedly across the worldwide population of HIV-1. For example, Pol epitope LFLDGIDKA is significantly associated with epitopes GHQAAMQML and FLKEKGGL from Gag and Nef, respectively, and this association rule is observed even among circulating recombinant forms. Conclusion: We have identified CTL epitope combinations co-occurring in HIV-1 genomes, including different subtypes and recombinant forms. Such co-occurrence has important implications for the design of complex vaccines (multi-epitope vaccines) and/or drugs that would target multiple HIV-1 regions at once and thus may be expected to overcome challenges

  4. Discovering Non-Redundant Association Rules using MinMax Approximation Rules

    OpenAIRE

    R. Vijaya Prakash; Dr. A. Govardhan; Prof. SSVN. Sarma

    2012-01-01

    Frequent pattern mining is an important area of data mining used to generate association rules. The quality of the extracted frequent patterns is a major concern, since they generate huge sets of rules, many of which are redundant. Mining non-redundant frequent patterns is therefore a key issue in association rule mining. In this paper we propose a method that eliminates redundant frequent patterns using the MinMax rule approach, in order to generate high-quality association rules.
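    The MinMax idea can be sketched as a post-filter on generated rules: among rules with identical support and confidence, keep only those with a minimal antecedent and a maximal consequent. This is a minimal sketch under that assumption, not the authors' implementation. The Rule type and the minmax_filter name are hypothetical, and comparing floats with == assumes all supports and confidences come from the same exact computation.

```python
from typing import FrozenSet, List, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float

def minmax_filter(rules: List[Rule]) -> List[Rule]:
    """Drop a rule if another rule with the same support and confidence has a
    smaller-or-equal antecedent and a larger-or-equal consequent."""
    kept = []
    for r in rules:
        redundant = any(
            other != r
            and other.support == r.support
            and other.confidence == r.confidence
            and other.antecedent <= r.antecedent
            and other.consequent >= r.consequent
            for other in rules
        )
        if not redundant:
            kept.append(r)
    return kept

fs = frozenset
rules = [
    Rule(fs("A"), fs("BC"), 0.4, 0.8),
    Rule(fs("A"), fs("B"), 0.4, 0.8),   # consequent is contained in that of A -> BC
    Rule(fs("AC"), fs("B"), 0.4, 0.8),  # antecedent is a superset of A
    Rule(fs("A"), fs("D"), 0.2, 0.5),
]
for r in minmax_filter(rules):
    print(sorted(r.antecedent), "->", sorted(r.consequent))
# prints ['A'] -> ['B', 'C'] and then ['A'] -> ['D']
```

    The quadratic pairwise comparison keeps the idea visible; a practical implementation would group rules by (support, confidence) first.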

  5. Studying Co-evolution of Production and Test Code Using Association Rule Mining

    NARCIS (Netherlands)

    Lubsen, Z.; Zaidman, A.; Pinzger, M.

    2009-01-01

    Long version of the short paper accepted for publication in the proceedings of the 6th International Working Conference on Mining Software Repositories (MSR 2009). Unit tests are generally acknowledged as an important aid to produce high quality code, as they provide quick feedback to developers on

  6. Interestingness of association rules in data mining: Issues relevant to e-commerce

    Indian Academy of Sciences (India)

    Rajesh Natarajan; B Shekar

    2005-04-01

    The ubiquitous low-cost connectivity synonymous with the internet has changed the competitive business environment by dissolving traditional sources of competitive advantage based on size, location and the like. In this level playing field, firms are forced to compete on the basis of knowledge. Data mining tools and techniques provide e-commerce applications with novel and significant knowledge. This knowledge can be leveraged to gain competitive advantage. However, the automated nature of data mining algorithms may result in a glut of patterns – the sheer numbers of which contribute to incomprehensibility. Importance of automated methods that address this immensity problem, particularly with respect to practical application of data mining results, cannot be overstated. We first examine different approaches to address this problem citing their applicability to e-commerce whenever appropriate. We then provide a detailed survey of one important approach, namely interestingness measure, and discuss its relevance in e-commerce applications such as personalization in recommender systems. Study of current literature brings out important issues that reveal many promising avenues for future research. We conclude by reiterating the importance of post-processing methods in data mining for effective and efficient deployment of e-commerce solutions.

  7. Association Rule Discovery and Its Applications

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Data mining, i.e., mining knowledge from large amounts of data, is a demanding field, since huge amounts of data have been collected in various applications. The collected data far exceed people's ability to analyze them, so new and efficient methods are needed to discover knowledge from large databases. Association rule discovery is an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. In this paper, we describe and summarize recent work on association rule discovery, offer a new method for association rule mining, and point out that association rule discovery can be applied in spatial data mining, where it is useful for discovering knowledge from remote sensing and geographical information systems.
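    The two-step task described above (find the frequent itemsets, then split them into implication rules) can be sketched with a plain Apriori pass. This is a textbook-style illustration, not the new method the paper proposes; support is taken as a fraction of transactions, and the function names are our own.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in tsets for i in t}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in tsets if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                conf = supp / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, supp, conf))
    return rules

transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk"]]
frequent = apriori(transactions, min_support=0.5)
for ante, cons, supp, conf in generate_rules(frequent, min_confidence=0.6):
    print(sorted(ante), "->", sorted(cons), supp, round(conf, 2))
```

    In the example, {bread, milk, butter} is never counted: its subset {milk, butter} has support 0.25, so the prune step discards it before the database scan.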

  8. AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION RULES

    Directory of Open Access Journals (Sweden)

    ARKAN A. G. AL-HAMODI

    2016-03-01

    In mining frequent itemsets, one of the most important algorithms is FP-growth. FP-growth compresses the information needed for mining frequent itemsets into an FP-tree and recursively constructs FP-trees to find all frequent itemsets. In this paper, we propose the EFP-growth (enhanced FP-growth) algorithm to improve on FP-growth. We implemented EFP-growth on the MapReduce framework using Hadoop. The new method performs better than basic FP-growth and can work with large datasets to discover frequent patterns in a transaction database. With our method, the execution time under different minimum supports is reduced.
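    For readers unfamiliar with FP-growth, the sketch below shows the core of the basic algorithm the abstract builds on: build an FP-tree, then recursively mine the conditional pattern base of each frequent item. It is a single-machine illustration with an absolute min_count threshold; it is not EFP-growth and contains no MapReduce or Hadoop code. Class and function names are hypothetical.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes holding it; frequent maps each frequent item to its total count.
    """
    counts = defaultdict(int)
    for items, cnt in weighted_paths:
        for i in items:
            counts[i] += cnt
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, cnt in weighted_paths:
        # insert frequent items in descending-count order so prefixes are shared
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for i in ordered:
            if i not in node.children:
                node.children[i] = FPNode(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += cnt
    return header, frequent

def mine(weighted_paths, min_count, suffix=()):
    """Recursively yield (itemset, count) from conditional FP-trees."""
    header, frequent = build_fp_tree(weighted_paths, min_count)
    for item, count in frequent.items():
        itemset = suffix + (item,)
        yield frozenset(itemset), count
        # conditional pattern base: the prefix path above every node of `item`
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from mine(base, min_count, itemset)

def fp_growth(transactions, min_count):
    """Return {frozenset(itemset): count} for itemsets in >= min_count transactions."""
    return dict(mine([(set(t), 1) for t in transactions], min_count))

transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "b"]]
print(sorted((sorted(s), c) for s, c in fp_growth(transactions, 2).items()))
# prints [(['a'], 3), (['a', 'b'], 3), (['b'], 4), (['b', 'c'], 2), (['c'], 2)]
```

    Unlike the candidate-generation approach of Apriori, the database is read only to build the first tree; all further counting happens on the (usually much smaller) conditional trees.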

  9. Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms.

    Science.gov (United States)

    Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with retrieving items from their storage locations to meet customer requests. Many solution approaches have been proposed to minimize travel distance in order picking. In practice, however, customer orders have to be completed by certain due dates to avoid tardiness, which most of the related scientific papers neglect. We therefore propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. In the subsequent order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, a Genetic Algorithm is applied to sequence the constructed batches so as to minimize tardiness. Illustrative examples and comparisons demonstrate the proficiency and solution quality of the proposed approach.

  10. Association Rules Applied to Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    We discuss basic intrusion detection techniques and focus on how to apply association rules to intrusion detection. Beginning with an analysis of some close relations among users' behaviors, we discuss the association rule mining algorithm and apply it to anomaly detection in an IDS. Moreover, according to the characteristics of intrusion detection, we optimize the association rule mining algorithm and use fuzzy logic to improve system performance.

  11. A Method of Association Rules Data Mining Based on Star Schema

    Institute of Scientific and Technical Information of China (English)

    李艳; 白玉峰

    2011-01-01

    Current data mining is mostly performed on ordinary data sets, and there is little research on data mining over star schema structures. We therefore define a star schema mining structure and, based on it, construct an association rule mining algorithm. The algorithm first scans the fact table to produce maximal frequent itemsets and association rules. On this basis, a theory based on join conditions and the local validity of association rules is put forward, and a method is built that quickly scans dimension table attributes and produces, in one pass, the association rules hidden in the dimension tables; this scan is local rather than global. Where needed, the implicit knowledge behind unclear association rules can be mined by constructing extended dimension tables. The algorithm mines quickly, and with reasonably constructed extended dimension tables, extended hidden information can be discovered.

  12. Literature mining of protein-residue associations with graph rules learned through distant supervision

    Directory of Open Access Journals (Sweden)

    Ravikumar KE

    2012-10-01

    Background: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved extraction of protein functional sites from articles, ultimately supporting protein function prediction. Our method uses linguistic patterns to identify amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. Finally, we present an approach to automated construction of relevant training and test data using the distant supervision model. Results: The performance of the method was assessed by extracting protein-residue relations from a new, automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on the automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. Conclusions: The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way toward effective extraction of protein functional residues from the literature.

  13. Research on Web Mining Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    夏惠芬; 董卫民

    2011-01-01

    Association rules are an important area of Web mining. To uncover hidden correlations among the data, the concept of association rules was introduced into Web mining, and users' access paths were expressed in the form of association rules. Building on the idea of the Apriori algorithm, a new Apriori-based algorithm and rule pattern suited to mining Web user access are presented. The results were verified on some simple web pages, with good results.

  14. Fast Association Rule Mining in CRM

    Institute of Scientific and Technical Information of China (English)

    王扶东; 李洁; 薛劲松; 朱云龙

    2004-01-01

    Cross-selling analysis is one of the main parts of analytical CRM. We present AApriori, a constraint-based association rule mining algorithm with a specified antecedent and a constrained consequent. Its output can help enterprises use well-selling products to promote unpopular ones. We also present CApriori, an algorithm in which the consequent is specified and the antecedent is constrained; it can effectively help enterprises open up markets for new products through cross-selling. Simulation results demonstrate that AApriori and CApriori can quickly and accurately obtain the information an enterprise needs.
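    The AApriori setting above (fixed antecedent, constrained consequent) can be illustrated with a direct scan: only transactions that contain the fixed antecedent matter for confidence, so the search shrinks to the allowed consequents. This is a simplified sketch with single-item consequents, not the AApriori algorithm itself; the function name and the product names in the example are hypothetical. CApriori would be the mirror image, fixing the consequent and constraining the antecedent.

```python
def rules_with_fixed_antecedent(transactions, antecedent, allowed_consequents,
                                min_support, min_confidence):
    """Return (consequent, support, confidence) for rules antecedent -> {c},
    with c restricted to allowed_consequents."""
    antecedent = frozenset(antecedent)
    n = len(transactions)
    # only transactions containing the antecedent can support these rules
    covering = [set(t) for t in transactions if antecedent <= set(t)]
    rules = []
    if not covering:
        return rules
    for c in sorted(set(allowed_consequents) - antecedent):
        both = sum(1 for t in covering if c in t)
        support, confidence = both / n, both / len(covering)
        if support >= min_support and confidence >= min_confidence:
            rules.append((c, support, confidence))
    return rules

baskets = [{"bestseller", "new_a"}, {"bestseller", "new_a", "new_b"},
           {"bestseller"}, {"new_b"}]
print(rules_with_fixed_antecedent(baskets, {"bestseller"}, {"new_a", "new_b"},
                                  min_support=0.25, min_confidence=0.5))
# prints [('new_a', 0.5, 0.6666666666666666)]
```

    Here new_b is rejected because only one of the three baskets containing the bestseller also contains it (confidence 1/3).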

  15. Method of data tendency measure mining in dynamic association rules

    Institute of Scientific and Technical Information of China (English)

    张忠林; 曾庆飞; 许凡

    2012-01-01

    Based on the original definitions and a classification of support vectors (SV) and confidence vectors, and taking into account that rules change over time, this paper puts forward a method for mining the tendency measure of dynamic association rules. First, a tendency measure threshold is used to eliminate useless rules, which reduces the candidate itemsets. Second, tendency meta-rules of the dynamic association rules are produced to find valuable rules and improve mining quality. Finally, an analysis of a transaction database characterized by increasing/decreasing and cyclical tendencies verifies the validity of the proposed method.

  16. A Quick Algorithm for Mining Exceptional Rules

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Exceptional rules are often ignored because of their small support. However, they have high confidence, so they can sometimes be useful. A new algorithm for mining exceptional rules is presented, which creates a large itemset from a relatively small database and scans the whole database only once to generate all exceptional rules. Its application to a mushroom database shows the algorithm to be quick and effective.

  17. RESEARCH ON ALL NEGATIVE ASSOCIATION RULES MINING IN A DATABASE

    Institute of Scientific and Technical Information of China (English)

    李红; 宗瑜; 解浚源

    2011-01-01

    In a database, association rule information is one of the forms in which knowledge is represented. Negative association rule mining is an important research topic in database association mining and has wide application value. Existing mining approaches cannot obtain all negative association rules from a database. This paper extracts all negative association rules from a database by: (1) scanning the database to build a database frequent pattern tree (DFP-tree); (2) obtaining all minimal infrequent itemsets (ASI) from the pruned DFP-tree; (3) obtaining all infrequent itemsets via the upward closure of the itemsets in ASI; and (4) on this basis, using correlation as one of the rule interestingness measures to extract negative association rules. Theory and experiments validate the correctness and efficiency of the algorithm.
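    Step (4) above uses correlation as an interestingness measure. A common formulation, assumed here because the abstract does not define it, is corr(A, B) = supp(A ∪ B) / (supp(A) supp(B)), with values below 1 indicating negative correlation; the rule A => not B then has support supp(A) - supp(A ∪ B). The sketch evaluates one candidate negative rule and does not reproduce the DFP-tree steps; the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)

def evaluate_negative_rule(a, b, transactions, min_confidence):
    """Score A => not B; return (support, confidence, correlation) or None."""
    s_a = support(a, transactions)
    s_b = support(b, transactions)
    if s_a == 0 or s_b == 0:
        return None
    s_ab = support(set(a) | set(b), transactions)
    corr = s_ab / (s_a * s_b)  # below 1: A and B co-occur less than independence predicts
    s_a_not_b = s_a - s_ab     # transactions with A that do not contain all of B
    conf = s_a_not_b / s_a
    if corr < 1 and conf >= min_confidence:
        return s_a_not_b, conf, corr
    return None

orders = [{"tea"}, {"tea"}, {"tea", "coffee"}, {"coffee"}, {"coffee"}, {"coffee"}]
print(evaluate_negative_rule({"tea"}, {"coffee"}, orders, min_confidence=0.6))
```

    In this toy data, tea and coffee appear together half as often as independence would predict (correlation 0.5), so "tea => not coffee" is kept with confidence 2/3.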

  18. Elimination Algorithm of Redundant Rules in Association Rule Mining Based on Domain Knowledge

    Institute of Scientific and Technical Information of China (English)

    张晶; 张斌; 胡学钢

    2011-01-01

    Many association rule mining algorithms have been developed to extract interesting patterns from large databases. However, much of the knowledge explicitly represented in domain knowledge (DK) has not been used to reduce the number of association rules. Association rule mining algorithms unnecessarily extract a significant number of well-known dependencies, producing hundreds or thousands of uninteresting, redundant rules. This paper presents DKARM, an algorithm that takes both the database and the relevant DK into account to eliminate all associations explicitly represented in the DK. Experiments show that the algorithm significantly reduces the number of rules and eliminates uninteresting rules.
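
    The idea of discarding rules that domain knowledge already explains can be sketched in Python. The subsumption test used here (a known dependency X => Y covers A => B when X ⊆ A and B ⊆ Y) is an assumption chosen for illustration; the abstract does not describe how DKARM matches rules against the DK, and the attribute names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def implied_by(rule: Rule, known: Rule) -> bool:
    """A known dependency X => Y implies A => B whenever X is a subset of A
    and B is a subset of Y."""
    (a, b), (x, y) = rule, known
    return x <= a and b <= y


def remove_known(rules: Iterable[Rule], knowledge: Iterable[Rule]) -> List[Rule]:
    """Keep only the rules that no known dependency already implies."""
    knowledge = list(knowledge)
    return [r for r in rules if not any(implied_by(r, k) for k in knowledge)]
```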

  19. Interestingness-Based Association Rule Mining for Vocational College Courses

    Institute of Scientific and Technical Information of China (English)

    董辉

    2012-01-01

    This paper studies association rule data mining, discusses the concept of interestingness, and designs an algorithm based on it. Taking a vocational college's grade database as the object of processing, it analyzes association rules between courses and, using interestingness as a constraint, eliminates deceptive, invalid associations to discover reliable and interesting association rules between courses. The results provide a reference for vocational college curriculum design and syllabus revision and also verify the effectiveness of the algorithm.

  20. A Reliability Evaluation System of Association Rules

    Science.gov (United States)

    Chen, Jiangping; Feng, Wanshu; Luo, Minghai

    2016-06-01

    In mining association rules, evaluating the rules is highly important because it directly affects the usability and applicability of the mining output. In this paper, the concept of reliability is introduced into association rule evaluation. The reliability of association rules is defined as the degree to which the rules accord with the mined data set, measured at three levels: accuracy, completeness, and consistency of the rules. To show its effectiveness, the "accuracy-completeness-consistency" reliability evaluation system was applied to two very different data sets: a simulated basket data set and a multi-source lightning data fusion problem. Results show that the system works well both on the simulated data set and on the actual problem. The three-dimensional reliability evaluation can effectively detect useless rules to be screened out and add missing rules, thereby improving the reliability of the mining results. Furthermore, the proposed system is applicable to many research fields, where it can help obtain more accurate, complete, and consistent association rules.

  1. Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis.

    Science.gov (United States)

    Gardiner, Eleanor J; Gillet, Valerie J

    2015-09-28

    Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability: when they are used to derive rules, for example in structure-activity relationships, the rules have a clear physical meaning. This review has shown that there are close relationships between the methods. Apparent differences often lie in the way the problem under investigation has been formulated, which can lead to the natural adoption of one method or another. For example, the idea of a structural alert, a structure that is present in toxic and absent in nontoxic compounds, leads naturally to an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine structural and nonstructural patterns for classifying active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development...

  2. Discovery of Association Rules from University Admission System Data

    Directory of Open Access Journals (Sweden)

    Abdul Fattah Mashat

    2013-05-01

    Association rule discovery is one of the vital data mining techniques. Interest in applying data mining to educational systems is growing, making educational data mining (EDM) a new and expanding research community. In this paper, we present a model for discovering association rules from King Abdulaziz University (KAU) admission system data. The main objective is to extract rules and relations between admission system attributes for better analysis. The model uses the Apriori algorithm for association rule mining. A detailed analysis and interpretation of the experimental results is presented from the admission office's perspective.
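
    A compact Python sketch of the Apriori algorithm the model relies on: level-wise candidate generation with subset pruning, followed by rule generation. This is a textbook version for illustration, not the paper's implementation, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def apriori(db: Iterable[Iterable[str]], min_support: float) -> Dict[Itemset, int]:
    """Level-wise Apriori: returns every frequent itemset with its count."""
    transactions = [frozenset(t) for t in db]
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[Itemset, int] = {}
    k = 1
    while current:
        counts = {c: 0 for c in current}
        for t in transactions:  # one database scan per level
            for c in current:
                if c <= t:
                    counts[c] += 1
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        keys = list(level)
        current = set()
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                cand = keys[i] | keys[j]
                if len(cand) == k + 1 and all(
                    frozenset(s) in level for s in combinations(cand, k)
                ):
                    current.add(cand)
        k += 1
    return frequent


def rules(frequent: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Generate X => Y with confidence count(X u Y) / count(X) >= min_conf."""
    out = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                x = frozenset(ante)
                conf = cnt / frequent[x]  # x is frequent by downward closure
                if conf >= min_conf:
                    out.append((x, itemset - x, conf))
    return out
```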

  3. Association Rule Pruning based on Interestingness Measures with Clustering

    Directory of Open Access Journals (Sweden)

    R. Bhaskaran

    2009-11-01

    Association rule mining plays a vital part in knowledge mining. The difficult task is discovering useful rules among the large number of rules generated when the support threshold is lowered. Several techniques are used to prune or group rules, such as rule structure cover methods, informative cover methods, and rule clustering. Another way of selecting association rules is based on interestingness measures such as support, confidence, and correlation. In this paper, we study how rule clusters of the pattern Xi -> Y are distributed over different interestingness measures.
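
    To show how rules sharing a consequent Y can be ordered differently depending on the interestingness measure, here is a small Python sketch. Lift is used as the correlation measure, which is an assumption; the paper's exact set of measures and its clustering procedure are not reproduced.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence


def measures(x: FrozenSet[str], y: FrozenSet[str], db: Sequence[FrozenSet[str]]) -> Dict[str, float]:
    """Support, confidence and lift (a correlation-style measure) of x => y."""
    n = len(db)
    cx = sum(1 for t in db if x <= t)
    cy = sum(1 for t in db if y <= t)
    cxy = sum(1 for t in db if (x | y) <= t)
    return {
        "support": cxy / n,
        "confidence": cxy / cx if cx else 0.0,
        "lift": cxy * n / (cx * cy) if cx and cy else 0.0,
    }


def rank_antecedents(
    antecedents: Iterable[FrozenSet[str]],
    y: FrozenSet[str],
    db: Sequence[FrozenSet[str]],
    measure: str,
) -> List[FrozenSet[str]]:
    """Order the rules X_i => Y (common consequent Y) by one measure, best first."""
    return sorted(antecedents, key=lambda x: measures(x, y, db)[measure], reverse=True)
```

    If X1 always co-occurs with Y while a more common X2 often appears without it, ranking by support can put X2 first while ranking by lift puts X1 first.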

  4. Effective Discovery of Exception Class Association Rules

    Institute of Scientific and Technical Information of China (English)

    周傲英; 魏藜; 俞舫

    2002-01-01

    In this paper, a new effective method is proposed to find class association rules (CAR), to obtain useful class association rules (UCAR) by removing spurious class association rules (SCAR), and to generate exception class association rules (ECAR) for each UCAR. CAR mining, which integrates classification and association techniques, has recently attracted great interest. However, it has two drawbacks: a large part of the CARs are spurious and may mislead users, and some important ECARs are difficult to find with traditional data mining techniques. The method introduced in this paper aims to overcome these flaws. With this approach, a user can retrieve correct information from the UCARs and learn the influence of different conditions by checking the corresponding ECARs. Experimental results demonstrate the effectiveness of the proposed approach.

  5. Research on an Association Rule Mining Algorithm Based on BDIF

    Institute of Scientific and Technical Information of China (English)

    郭昌建

    2015-01-01

    This article describes research on association rule mining and the classification of association rules, analyzes and evaluates the classic Apriori algorithm, and on that basis proposes BDIF (Based Transactional Databases Including Frequent Item Set), an efficient algorithm for generating frequent itemsets. By dividing the data into blocks and quickly searching for frequent itemsets, it reduces the number of data-block scans and improves efficiency. The algorithm was debugged and validated in the Borland C++Builder 6.0 development environment.
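
    The abstract does not detail how BDIF divides the data, but the underlying principle can be illustrated with the classic partition approach: any itemset that is frequent in the whole database must be frequent in at least one block, so the locally frequent itemsets form a complete candidate set that a single extra scan can verify. The Python sketch below assumes brute-force local counting capped at `max_k` items and is not the BDIF algorithm itself.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Set


def local_frequent(block: Sequence[Set[str]], min_support: float, max_k: int) -> Set[FrozenSet[str]]:
    """Itemsets (up to max_k items) that are frequent within one block."""
    counts: Counter = Counter()
    for t in block:
        items = sorted(t)
        for k in range(1, max_k + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c / len(block) >= min_support}


def partition_mine(
    db: Sequence[Set[str]], min_support: float, n_blocks: int, max_k: int = 3
) -> Dict[FrozenSet[str], int]:
    if not db:
        return {}
    size = -(-len(db) // n_blocks)  # ceiling division
    blocks = [db[i:i + size] for i in range(0, len(db), size)]
    # Phase 1: the union of locally frequent itemsets is a complete candidate set.
    candidates: Set[FrozenSet[str]] = set()
    for block in blocks:
        candidates |= local_frequent(block, min_support, max_k)
    # Phase 2: one more scan of the whole database gives the global counts.
    counts = {c: 0 for c in candidates}
    for t in db:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n / len(db) >= min_support}
```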

  6. Book Lending Data Mining Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    吴玉春; 龙小建

    2016-01-01

    Based on the actual business needs of university libraries, this article uses association rules to analyze students' book lending data. First, the library's historical lending data are preprocessed through data cleaning, data integration, data transformation, and construction of a transaction database. Then the MFP-Miner algorithm is applied to mine the transaction database for association rules among borrowed books, providing scientific data support for lending and book recommendation services and thereby enhancing the service quality of university libraries.
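
    The preprocessing step that turns lending records into a transaction database might look like the following Python sketch. The record layout (student ID, book title) and the cleaning rules (drop incomplete rows, normalize whitespace and case) are assumptions for illustration; MFP-Miner itself is not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple


def build_transactions(
    records: Iterable[Tuple[Optional[str], Optional[str]]]
) -> List[FrozenSet[str]]:
    """Group (student_id, book_title) lending records into one transaction per student."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for student_id, title in records:
        if not student_id or not title:  # cleaning: drop incomplete records
            continue
        # transformation: normalize whitespace and case so duplicate titles merge
        baskets[student_id.strip()].add(" ".join(title.split()).lower())
    return [frozenset(books) for _, books in sorted(baskets.items())]
```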

  7. Apriori Association Rule Algorithms using VMware Environment

    Directory of Open Access Journals (Sweden)

    R. Sumithra

    2014-07-01

    The aim of this study is to carry out research on distributed data mining using a cloud platform. With the development of network and distributed technology, distributed data mining has become a vital component of big data analytics, in which the Hadoop MapReduce framework is a widely known concept. Association rule mining is one of the popular data mining techniques; it finds relationships among items across transactions. Weighted Apriori and hash-T Apriori algorithms were executed for association rule mining on a Hadoop MapReduce framework using a retail data set of transactions. This study describes these concepts, explains the experiment carried out with the retail data set in a VMware environment, and compares the performance of the weighted Apriori and hash-T Apriori algorithms in terms of memory and time.
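
    Weighted Apriori variants differ in how they combine item weights with support. Here is a minimal Python sketch, assuming one common definition in which weighted support is the mean weight of the itemset's items multiplied by its ordinary support; the weights are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence


def weighted_support(
    itemset: FrozenSet[str], db: Sequence[FrozenSet[str]], weights: Dict[str, float]
) -> float:
    """Weighted support = (mean item weight) x (ordinary support).
    This is one common definition; weighted Apriori variants differ."""
    if not itemset or not db:
        return 0.0
    support = sum(1 for t in db if itemset <= t) / len(db)
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support
```

    Because adding a heavily weighted item can raise this value, weighted support is not downward closed, which is why weighted Apriori variants need an adjusted pruning rule rather than plain Apriori pruning.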

  8. Pair Triplet Association Rule Generation in Streams

    Directory of Open Access Journals (Sweden)

    Manisha Thool

    2013-08-01

    Many applications generate and analyze stream data, which flows dynamically into and out of an observation platform or window. Data streams have distinctive features: huge or possibly infinite volume, dynamic change, arrival in a fixed order, and support for only one or a few scans. An important problem in data stream mining is finding frequent items in the stream, with applications in financial systems, web traffic monitoring, internet advertising, retail, and e-business. These features raise new issues that must be considered when developing association rule mining techniques for stream data. The Space-Saving algorithm reports both frequent and top-k elements with tight error guarantees. We also develop the notion of association rules in streams of elements: the Streaming-Rules algorithm, integrated with Space-Saving, reports 1-1 association rules with tight error guarantees, using minimal space and limited processing per element. We use the Apriori algorithm to generate association rules from static datasets and implement the Streaming-Rules algorithm for pair and triplet association rules. We then compare the top-k rules of the static datasets with the output on the stream datasets and compute the percentage error.
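
    The Space-Saving algorithm named above keeps a fixed number of counters. When a new element arrives and all counters are in use, it replaces the element with the smallest count and records that count as the newcomer's maximum overestimation. A minimal Python sketch follows; it finds the minimum with a linear scan rather than the stream-summary structure of the original algorithm, and Streaming-Rules is not shown.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate frequent-item counter with a fixed number of counters."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated count (never an underestimate)
        self.errors: Dict[Hashable, int] = {}  # maximum overestimation of each count

    def add(self, item: Hashable) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the element with the smallest count; the newcomer inherits it.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        """The k monitored elements with the highest estimated counts."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
```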

  9. Indirect associations between multiple items and a mining algorithm

    Institute of Scientific and Technical Information of China (English)

    Ni Min; Xu Xiaofei; Deng Shengchun

    2005-01-01

    Indirect association is a high-level relationship between items and frequent itemsets in data. Current research on indirect association mining is limited to indirect associations between itempairs, which discovers too many rules from a dataset. A formal definition of indirect association between multiple items is presented, along with SET-NIA, an algorithm for mining such indirect associations based on their anti-monotonicity and a frequent itempair support matrix. While the discovered rules contain the same information as those found by itempair-based indirect association mining algorithms, this notion saves space in storing the rules and makes them easier for humans to understand and apply. Experiments on two real-world datasets show that SET-NIA finds fewer rules than existing algorithms that mine indirect associations between itempairs, and that it performs better than those algorithms.
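
    A simplified Python sketch of itempair indirect association using a pair support matrix: a pair (a, b) is reported when its own support is below a threshold `ts` while both a and b co-occur, with support at least `tf`, with a common mediator. This assumes single-item mediators and omits the dependence condition found in common definitions; it is not SET-NIA, which generalizes to multiple items.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def pair_support_matrix(db: Sequence[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Support of every co-occurring item pair, keyed by the sorted pair."""
    counts: Counter = Counter()
    for t in db:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c / len(db) for pair, c in counts.items()}


def indirect_pairs(db: Sequence[Set[str]], ts: float, tf: float) -> List[Tuple[str, str, List[str]]]:
    """Pairs (a, b) with support below ts that share at least one mediator y
    whose supports with a and with b are both at least tf."""
    matrix = pair_support_matrix(db)
    items = sorted({i for t in db for i in t})

    def s(x: str, y: str) -> float:
        return matrix.get((min(x, y), max(x, y)), 0.0)

    found = []
    for a, b in combinations(items, 2):
        if s(a, b) >= ts:
            continue
        mediators = [y for y in items if y not in (a, b) and s(a, y) >= tf and s(b, y) >= tf]
        if mediators:
            found.append((a, b, mediators))
    return found
```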

  10. Data Mining Algorithm of Association Rules Among Compatible Datasets in Big Data Environment

    Institute of Scientific and Technical Information of China (English)

    张春生

    2016-01-01

    Based on a thorough analysis of data sets that cannot be joined, this paper defines two kinds of such data sets, compatible data sets and incompatible data sets, and presents some basic theory of compatible data sets. On this basis, it proposes an association rule mining scheme for compatible data sets in which the rules mined from each compatible data set are merged directly to generate the association rules of the whole compatible data set, enabling association rule mining that ordinary data mining algorithms cannot perform. The scheme extends the application domain of association rule algorithms, improves mining efficiency, and to some extent also provides privacy protection.

  11. STATE OF THE ART - MODERN SEQUENTIAL RULE MINING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    Anjali Paliwal

    2015-10-01

    This paper surveys the state of the art in sequential rule mining algorithms. Extracting sequential rules is a very popular and computationally expensive task. We explain the fundamentals of sequential rule mining and describe current approaches. From the broad variety of efficient algorithms that have been developed, we compare the most important ones, systematize them, and analyze their performance based on both run-time measurements and theoretical considerations, investigating their strengths and weaknesses.
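
    As background for the survey, one common way of scoring a sequential rule X => Y is sketched below in Python, assuming each sequence is a list of itemsets: support is the fraction of sequences in which all items of X appear before all items of Y, and confidence divides that count by the number of sequences containing X. Other definitions exist, and the surveyed algorithms compute these values far more efficiently.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Itemsets = List[FrozenSet[str]]


def first_complete(seq: Itemsets, items: FrozenSet[str]) -> Optional[int]:
    """Index of the earliest itemset by which all of items have appeared."""
    seen: set = set()
    for i, itemset in enumerate(seq):
        seen |= itemset
        if items <= seen:
            return i
    return None


def sequential_rule_stats(
    seqs: Sequence[Itemsets], x: FrozenSet[str], y: FrozenSet[str]
) -> Tuple[float, float]:
    """Support and confidence of x => y: all items of x appear, and all items
    of y appear strictly afterwards, in the same sequence."""
    with_x = with_rule = 0
    for seq in seqs:
        p = first_complete(seq, x)
        if p is None:
            continue
        with_x += 1
        later = set().union(*seq[p + 1:])  # items after x is complete
        if y <= later:
            with_rule += 1
    support = with_rule / len(seqs) if seqs else 0.0
    confidence = with_rule / with_x if with_x else 0.0
    return support, confidence
```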

  12. State of The Art - Modern Sequential Rule Mining Techniques

    Directory of Open Access Journals (Sweden)

    Ms. Anjali Paliwal

    2014-08-01

    This paper surveys the state of the art in sequential rule mining algorithms. Extracting sequential rules is a very popular and computationally expensive task. We explain the fundamentals of sequential rule mining and describe current approaches. From the broad variety of efficient algorithms that have been developed, we compare the most important ones, systematize them, and analyze their performance based on both run-time measurements and theoretical considerations, investigating their strengths and weaknesses.

  13. New game - new rules: mining in the democratic South Africa

    Energy Technology Data Exchange (ETDEWEB)

    Motlatsi, J. [National Union of Mineworkers (South Africa)]

    1995-12-31

    Discusses the eight areas identified by the South African Union of Mineworkers as requiring new rules to improve safety and conditions in the South African mining industry. The areas are: improved health and safety; the elimination of racism; fair wages; decent living conditions; proper training; care for workers and areas affected by the downscaling of mining; development of an economically viable mining sector; and a mining sector run in a humane and participatory manner.

  14. A heuristic algorithm for quick hiding of association rules

    Directory of Open Access Journals (Sweden)

    Maryam Fouladfar

    Increasing use of data mining and of association rule extraction has led to the introduction of privacy-preserving data mining. Publishing a complete database is inconsistent with security policies and would result in disclosure of some ...

  15. Research on the Application of Association Rule Data Mining in Library Personalized Services

    Institute of Scientific and Technical Information of China (English)

    刘志勇; 王阿利; 魏迎; 郭轶

    2012-01-01

    With the rapid development of computer technology, network technology, and modern communication technology, data mining, an offshoot of fast-moving information technology, provides technical support for the effective management of digital knowledge resources. This article introduces association rule data mining techniques and library personalized services, discusses the application of association rule mining in digital libraries, and explains why the technique is necessary there and the important role it plays in improving library service quality and service levels.

  16. Recent Trends and Research Issues in Video Association Mining

    CERN Document Server

    V, Vijayakumar

    2011-01-01

    With ever-growing digital libraries and video databases, it is increasingly important to understand and mine knowledge from video databases automatically. Discovering association rules between items in a large video database plays a considerable role in video data mining research. Based on research and development in recent years, association rule mining is being applied in a growing range of domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, and personal and online media collections. The purpose of this paper is to provide a general framework for mining association rules from video databases. The article also presents the research issues in video association mining, followed by recent trends.

  17. Automatic Mining of Numerical Classification Rules with Parliamentary Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    KIZILOLUK, S.

    2015-11-01

    In recent years, classification rule mining has been one of the most important data mining tasks. In this study, the Parliamentary Optimization Algorithm (POA), one of the newest social-based metaheuristic methods, is used for the first time to automatically mine comprehensible and accurate classification rules from datasets with numerical attributes. Four numerical datasets were selected from the UCI data warehouse, and high-quality classification rules were obtained. Furthermore, the results of the designed POA were compared with those of four popular classification rule mining algorithms available in WEKA. Although POA is very new and has not yet been applied to complex data mining problems, the results seem promising. The objective function is very flexible, and many different objectives can easily be added to it. The intervals of the numerical attributes in the rules are found automatically, without the a priori preprocessing that other classification rule mining algorithms perform and that modifies the datasets.

  18. Revealing Significant Relations between Chemical/Biological Features and Activity: Associative Classification Mining for Drug Discovery

    Science.gov (United States)

    Yu, Pulan

    2012-01-01

    Classification, clustering and association mining are major tasks of data mining and have been widely used for knowledge discovery. Associative classification mining, the combination of both association rule mining and classification, has emerged as an indispensable way to support decision making and scientific research. In particular, it offers a…

  19. Mining Algorithm of Association Rules Based on Disk Table Resident FP-TREE

    Institute of Scientific and Technical Information of China (English)

    申彦; 宋顺林; 朱玉全

    2012-01-01

    As the databases to be mined keep growing, the available physical memory has become a bottleneck when using the FP-GROWTH algorithm for association rule mining. To escape this limit and mine association rules in very large databases, disk-based association rule mining has become an important research direction. This paper therefore improves the original FP-TREE data structure and presents a novel algorithm, DTRFP-GROWTH (disk table resident FP-TREE growth), which stores the FP-TREE in a disk table to reduce memory usage. When traditional FP-GROWTH uses too much memory for mining to proceed, DTRFP-GROWTH can continue mining association rules from huge databases, making it suitable when space performance is the priority. In addition, the algorithm integrates association rule mining with a relational database management system (RDBMS), overcoming the low efficiency and high development difficulty of file-system-based algorithms. Validation experiments and performance analysis on real data sets show that DTRFP-GROWTH is an effective disk-based association rule mining algorithm when memory space is limited.
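
    The disk-table idea can be sketched with Python's built-in `sqlite3` module: each FP-tree node becomes a row (node ID, item, count, parent), so the tree lives in a database table instead of memory. The table layout and function names are assumptions, not the DTRFP-GROWTH schema; passing a file path instead of `":memory:"` to `sqlite3.connect` keeps the table on disk.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple


def build_fp_table(conn: sqlite3.Connection, db: Iterable[Iterable[str]], min_count: int) -> None:
    """Store an FP-tree in table fptree; parent 0 denotes the virtual root."""
    db = [set(t) for t in db]
    freq = Counter(i for t in db for i in t)
    # Frequent items ordered by descending count, ties broken alphabetically.
    rank = {i: (-c, i) for i, c in freq.items() if c >= min_count}
    conn.execute(
        "CREATE TABLE fptree (node_id INTEGER PRIMARY KEY, item TEXT, cnt INTEGER, parent INTEGER)"
    )
    conn.execute("CREATE INDEX idx_child ON fptree (parent, item)")
    for t in db:
        parent = 0
        for item in sorted((i for i in t if i in rank), key=rank.__getitem__):
            row = conn.execute(
                "SELECT node_id FROM fptree WHERE parent = ? AND item = ?", (parent, item)
            ).fetchone()
            if row:  # shared prefix: increment the existing node
                conn.execute("UPDATE fptree SET cnt = cnt + 1 WHERE node_id = ?", (row[0],))
                parent = row[0]
            else:  # new branch: insert a child node
                cur = conn.execute(
                    "INSERT INTO fptree (item, cnt, parent) VALUES (?, 1, ?)", (item, parent)
                )
                parent = cur.lastrowid
    conn.commit()


def item_support(conn: sqlite3.Connection, item: str) -> int:
    """Summing the counts of an item's nodes gives its support count."""
    return conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) FROM fptree WHERE item = ?", (item,)
    ).fetchone()[0]


def prefix_paths(conn: sqlite3.Connection, item: str) -> List[Tuple[List[str], int]]:
    """Conditional pattern base: root-to-parent path and count of each item node."""
    paths = []
    nodes = conn.execute(
        "SELECT cnt, parent FROM fptree WHERE item = ? ORDER BY node_id", (item,)
    ).fetchall()
    for cnt, parent in nodes:
        path = []
        while parent != 0:
            name, parent = conn.execute(
                "SELECT item, parent FROM fptree WHERE node_id = ?", (parent,)
            ).fetchone()
            path.append(name)
        paths.append((path[::-1], cnt))
    return paths
```

    The recursive FP-GROWTH step that consumes `prefix_paths` is not shown.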

  20. A Preliminary Study of Diagnostic Rules for Peripheral Lung Cancer Based on Data Mining Techniques

    Institute of Scientific and Technical Information of China (English)

    Yongqian Qiang; Youmin Guo; Xue Li; Qiuping Wang; Hao Chen; Duwu Cui

    2007-01-01

    Objective: To discuss the clinical and imaging diagnostic rules of peripheral lung cancer using data mining techniques, to explore new ideas for its diagnosis, and to obtain early-stage technology and knowledge support for computer-aided detection (CAD). Methods: 58 cases of peripheral lung cancer confirmed by clinical pathology were collected. After the clinical and CT findings attributes were standardized, the data were imported into a database. The data were studied comparatively using Association Rules (AR) from the knowledge discovery process and the Rough Set (RS) reduction algorithm and Genetic Algorithm (GA) of the generic data analysis tool ROSETTA. Results: The genetic classification algorithm of ROSETTA generated about 5,000 diagnosis rules, the RS reduction algorithm (Johnson's algorithm) generated 51, and the AR algorithm generated 123. All three data mining methods essentially treat gender, age, cough, location, lobulation sign, shape, and ground-glass density as the main basis for diagnosing peripheral lung cancer. Conclusion: The diagnosis rules for peripheral lung cancer obtained with the three data mining techniques agree with clinical diagnostic rules and can also be used to build the knowledge base of an expert system. This study demonstrates the potential value of data mining technology in clinical imaging diagnosis and differential diagnosis.

  1. The Application of Clustering and Association Rule Mining to Fraud Information Identification

    Institute of Scientific and Technical Information of China (English)

    幸莉仙; 黄慧连

    2012-01-01

    Because electronic data are expanding rapidly, traditional audit approaches cannot cope with massive volumes of business data, so this paper introduces the clustering and association rule algorithms of data mining into the audit field. Building on a study of clustering and association rule mining and their representative algorithms, K-Means and Apriori, it proposes an audit model based on clustering and association rules. Taking the audit of urban medical insurance in a certain city as an example, it first uses cluster analysis to filter the data and then uses association rules to mine potential relationships within the massive data, providing audit priorities and clues. Through this case study, the article offers a reference for applying data mining to fraud information identification.

  2. Association Rules Mining Based on SVM and Its Application in Simulated Moving Bed PX Adsorption Process

    Institute of Scientific and Technical Information of China (English)

    张英; 苏宏业; 褚健

    2005-01-01

    In this paper, a novel data mining method is introduced to solve multi-objective optimization problems in the process industry. A hyperrectangle association rule mining (HARM) algorithm based on support vector machines (SVMs) is proposed. Hyperrectangle rules are constructed from prototypes and support vectors (SVs) under some heuristic limitations. The proposed algorithm is applied to a simulated moving bed (SMB) paraxylene (PX) adsorption process, yielding the relationships between key process variables and objective variables such as PX purity and recovery rate. Most of the obtained association rules can be explained using existing domain knowledge about the PX adsorption process.

  3. Mining Rules from Electrical Load Time Series Data Set

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    This paper discusses mining rules from the electrical load time series data collected by the EMS (Energy Management System). The EMS data are too large and complex for power system engineers to understand and use directly, although useful information is hidden in the electrical load data. The authors discuss using fuzzy linguistic summaries as a data mining method to induce rules from the electrical load time series, and also discuss the data preprocessing techniques involved.

  4. A new incremental updating algorithm for association rules

    Institute of Scientific and Technical Information of China (English)

    WANG Zuo-cheng; XUE Li-xia

    2007-01-01

    Incremental data mining is an attractive goal for many kinds of mining in large databases or data warehouses. A new incremental updating algorithm, the rule growing algorithm (RGA), is presented for efficiently maintaining discovered association rules when new transaction data are added to a transaction database. RGA uses previously discovered association rules as seed rules and, in most cases, can confirm whether the seed rules are still strong without scanning the entire transaction database. If items are not uniformly distributed in the transaction database, the inflexion of the robustness curve appears very quickly and RGA becomes highly efficient, saving considerable I/O time. Experiments validate the algorithm and show that it is efficient.
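
    The core saving, re-checking old rules from stored counts plus a scan of only the new transactions, can be sketched in Python. This assumes each seed rule keeps the counts of its antecedent and of the full rule; `SeedRule` and `update_seed_rules` are hypothetical names. The sketch does not reproduce RGA's robustness-curve analysis and cannot discover rules that were not already seeds.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass
class SeedRule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    count_xy: int  # transactions containing antecedent and consequent
    count_x: int   # transactions containing the antecedent


def update_seed_rules(
    seeds: List[SeedRule],
    old_n: int,
    increment: Sequence[FrozenSet[str]],
    min_support: float,
    min_conf: float,
) -> Tuple[int, List[Tuple[SeedRule, bool]]]:
    """Update each seed's counts (in place) from the new transactions only and
    re-check it. Returns the new database size and (rule, still_strong) pairs."""
    n = old_n + len(increment)
    checked = []
    for r in seeds:
        xy = r.antecedent | r.consequent
        r.count_xy += sum(1 for t in increment if xy <= t)
        r.count_x += sum(1 for t in increment if r.antecedent <= t)
        support = r.count_xy / n
        confidence = r.count_xy / r.count_x if r.count_x else 0.0
        checked.append((r, support >= min_support and confidence >= min_conf))
    return n, checked
```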

  5. Medical Image Data Mining Using a Classification Algorithm Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    邓薇薇; 卢延鑫

    2012-01-01

    Objective: To assist clinicians in diagnosing and treating brain disease, a classifier for medical images containing tumors was constructed using association rule data mining techniques. Methods: After the medical images were preprocessed, relevant features were extracted, discretized, and stored in a transaction database as input for association rule mining; the medical image classifier was then constructed with an improved Apriori algorithm. Results: The classifier was trained on medical images of known type to mine association rules satisfying the constraint conditions, and these rules were then used to classify the brain tumor in medical images of unknown type as benign or malignant. Conclusion: Classification based on association rules can effectively mine image features and construct an image classifier that identifies benign and malignant tumors.

  6. MAROR: Multi-Level Abstraction of Association Rule Using Ontology and Rule Schema

    Directory of Open Access Journals (Sweden)

    Salim Khiat

    2014-11-01

    Many large organizations have multiple databases distributed over different branches, and the number of such organizations is increasing over time, so it is necessary to study data mining on multiple databases. Most multi-database mining (MDBM) algorithms for association rules represent input patterns at a single level of abstraction. However, in many applications of association rules, e.g., industrial discovery, users need to explore a data set at multiple levels of abstraction and from different points of view. Each point of view corresponds to a set of beliefs (and representational commitments) regarding the domain of interest. Using domain ontologies, we strengthen the integration of user knowledge into the mining and post-processing tasks. Furthermore, an interactive and iterative framework is designed to assist the user throughout the analysis at different levels. This paper formalizes the problem of association rule mining using ontologies in multi-database mining, describes an ontology-driven association rule algorithm that discovers rules at multiple levels of abstraction, and presents preliminary results in the petroleum field that demonstrate the feasibility and applicability of the approach.
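
    A simple way to mine at several levels of abstraction is to add each item's ancestors from an is-a hierarchy to every transaction, so that support can be computed for general concepts as well as for leaf items. The Python sketch below assumes the ontology reduces to a child-to-parent dictionary with no cycles, and the petroleum-flavored item names are hypothetical; MAROR's rule schemas and multi-database integration are not shown.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence


def ancestors(item: str, parent: Dict[str, str]) -> List[str]:
    """All ancestors of item in an is-a hierarchy given as child -> parent."""
    out = []
    while item in parent:  # assumes the hierarchy has no cycles
        item = parent[item]
        out.append(item)
    return out


def extend_with_ancestors(
    db: Iterable[Iterable[str]], parent: Dict[str, str]
) -> List[FrozenSet[str]]:
    """Add every item's ancestors to its transaction so that itemsets can be
    counted at any level of the hierarchy."""
    return [frozenset(t) | {a for i in t for a in ancestors(i, parent)} for t in db]


def support(itemset: FrozenSet[str], db: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)
```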

  7. A Selection Model for Applied Tax Law Teaching Methods Based on Association Rule Mining

    Institute of Scientific and Technical Information of China (English)

    刘纯林; 孙睿潇

    2016-01-01

    Current teaching methods for tax law cannot achieve the goal of training application-oriented technical personnel, and the teaching methods that independent colleges choose for applied tax law cannot meet the needs of such training. This paper presents a selection model for applied tax-law teaching methods based on association rule mining optimized with fuzzy sets: the model is built on the principles of association rule mining algorithms, and fuzzy sets are used to improve its accuracy. Five methods (the topic-based method, the case method, the lecture method, the inductive comparison method and the "lecture, read, practice" method) were each used to teach applied tax law in five different classes. The model was then used to analyze the results, with the strength of the resulting association rules standing in for the relative merits of the teaching methods. Algorithm simulation results show that the optimized model is more accurate than the original algorithm.

  8. Data Mining of Front Pages of Medical Records Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    杜军; 郭慧敏; 杜静静; 李宁; 黄路非; 杨建南

    2016-01-01

    Objectives: To use the Apriori algorithm to find association rules among the indicators of discharged patients' information and to provide a theoretical basis for hospital management and decision-making. Methods: Apriori association analysis was performed with the arules package in R on patients discharged from one hospital in 2015, exploring association rules between discharge department and gender; between payment category, discharge department, length of stay and total expenses; and between payment category, discharge department and whether surgery was performed, and their causes were analyzed. Results: Analysis of the front pages of the medical records of 49,737 patients discharged in 2015 yielded the following rules: in the respiratory, digestive and general surgery wards, more male than female patients were discharged (confidence of the strong association rules 0.621, 0.531 and 0.518); in the neurology and ophthalmology wards, more female than male patients were discharged (confidence 0.565 and 0.561); hospitalization expenses were closely related to the length of stay (confidence 0.731, 0.649, 0.745 and 0.545); and whether surgery was performed was closely related to the department (confidence 0.951, 0.748, 0.985, 0.974 and 0.735). Conclusions: Association rule mining can uncover potential associations among different indicators and provide a basis for hospital management and policy decisions.
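
    Each confidence value reported for this study is the conditional frequency supp(X ∪ Y) / supp(X) of a rule X → Y. The study used the arules package in R; the following Python sketch only illustrates the two measures on a few hypothetical discharge records, not the study's data.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = supp(A u C) / supp(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Hypothetical discharge records encoded as attribute=value items.
records = [
    {"dept=respiratory", "sex=male"},
    {"dept=respiratory", "sex=male"},
    {"dept=respiratory", "sex=female"},
    {"dept=neurology", "sex=female"},
]
print(support(records, {"dept=respiratory"}))                   # 0.75
print(confidence(records, {"dept=respiratory"}, {"sex=male"}))  # 2/3, about 0.667
```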

  9. Price Adjustment by Mining Negative Association Rules

    Institute of Scientific and Technical Information of China (English)

    黄发良; 郑小建; 张师超

    2006-01-01

    Setting a good product price is a crucial problem in competitive markets. A novel pricing method based on negative association rules identified from historical data is proposed; it is easy to operate and easy to extend for end users. An optimal price can be generated with either of two strategies: a human-assisted pricing strategy or a fully automatic pricing strategy. In addition, an efficient algorithm for generating short negative association rules is devised. Experimental results show that the approach is promising and efficient.
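
    The abstract does not spell out the authors' rule definitions or pricing strategies. As a hedged illustration of the standard measures for a negative rule X → ¬Y, the sketch below uses the identities supp(X ∧ ¬Y) = supp(X) - supp(X ∪ Y) and conf(X → ¬Y) = 1 - conf(X → Y); item names such as `price=high` are hypothetical.

```python
def supp(transactions, items):
    """Fraction of transactions (sets) that contain all of `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def negative_rule_measures(transactions, x, y):
    """Support and confidence of the negative rule X -> not Y.

    supp(X and not Y) = supp(X) - supp(X u Y), and dividing by supp(X)
    gives conf(X -> not Y), which equals 1 - conf(X -> Y).
    """
    sx = supp(transactions, x)
    sxy = supp(transactions, set(x) | set(y))
    s_neg = sx - sxy
    c_neg = s_neg / sx if sx else 0.0
    return s_neg, c_neg
```

For example, if a high price appears in three of four transactions but only one of them records a purchase, the rule `price=high → ¬bought` has support 0.5 and confidence 2/3.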

  10. Research on risk web information mining technology based on improved association rules

    Institute of Scientific and Technical Information of China (English)

    黄宏本

    2016-01-01

    Web networks carry many different protocols and network channels, which give rise to hazardous information that threatens the security of cyberspace; mining hazardous Web information accurately helps purify cyberspace and ensure network security. Traditional methods classify and mine hazardous Web information with fuzzy association rule algorithms, but fuzzy clustering is easily disturbed by background interference, which makes it hard to establish effective association rules and lowers mining efficiency. This paper therefore proposes a hazardous Web information mining technique based on improved association rules. Before the association rules are established, Takens' theorem is used to reconstruct the phase space of the hazardous Web information data, a channel model for mining hazardous information in the Web network is built, and the multisource process of the hazardous Web information flow is classified. An adaptive IIR cascade filtering algorithm is designed to filter out data interference, improving the association rule process and enabling accurate mining of hazardous Web information. Simulation results show that the algorithm filters interference well and achieves high accuracy.
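
    Takens' theorem is the usual justification for reconstructing a state space from a scalar time series by delay coordinates. The minimal sketch below shows only that embedding step; how Web traffic is turned into a scalar series, and the embedding dimension `dim` and delay `tau`, are assumptions, and the paper's channel model and IIR filtering are not reproduced.

```python
def delay_embed(series, dim, tau):
    """Delay-coordinate embedding of a scalar series.

    Returns the vectors (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])
    for every t at which the whole vector fits inside the series.
    """
    span = (dim - 1) * tau
    return [tuple(series[t + k * tau] for k in range(dim))
            for t in range(len(series) - span)]
```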

  11. Recent Trends and Research Issues in Video Association Mining

    Directory of Open Access Journals (Sweden)

    Vijayakumar.V

    2011-12-01

    Full Text Available With ever-growing digital libraries and video databases, it is increasingly important to understand and mine knowledge from video databases automatically. Discovering association rules between items in a large video database plays a considerable role in video data mining research. Building on research and development in recent years, association rule mining is being applied in a growing range of domains such as surveillance, meetings, broadcast news, sports, archives, movies and medical data, as well as personal and online media collections. The purpose of this paper is to provide a general framework for mining association rules from video databases. The article also presents the research issues in video association mining, followed by recent trends.

  12. MINING METHOD OF SPATIAL ASSOCIATION RULE INTEGRATING CO-LOCATION PATTERNS

    Institute of Scientific and Technical Information of China (English)

    向俊; 王静

    2013-01-01

    In spatial data mining, traditional frequent itemset mining methods lead to duplicated counting of spatial entities; moreover, because they do not take into account the relations between spatial entities or the correlation between spatial entities and their surrounding environments, they produce a large amount of irrelevant spatial information. To overcome these deficiencies, this paper proposes a model and an algorithm for spatial association rule mining that integrate spatial co-location pattern mining, drawing on the ideas of "the first law of geography" and "biocoenosis construction". First, the distribution of spatial entity features is optimally partitioned and the continuous spatial distribution is discretized. Second, redundant spatial information is eliminated, and a spatial co-location algorithm is used to mine the implicit relationships between the features of different spatial entities; the spatial attributes, non-spatial attributes and spatial relationships of the entities are used to construct a spatial transaction database. Finally, spatial association rules are mined from the spatial transaction database. Experimental results show that the model and the algorithm are effective and take the correlation between spatial entity features into account: many significant spatial association rules can be discovered from the constructed spatial transaction database after the redundant information has been eliminated and the continuous spatial distribution discretized.
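
    Co-location mining commonly measures how often features occur near each other with a participation index: for each feature, the fraction of its instances that have a neighbor of the other feature, minimized over the features. The sketch below computes it for a pair of features under an assumed Euclidean distance threshold `d`; the feature names are hypothetical, and the paper's own partitioning and discretization steps are not shown.

```python
import math

def participation_index(instances, fa, fb, d):
    """Participation index of the size-2 co-location {fa, fb}.

    instances: dict mapping feature name -> list of (x, y) points.
    Two instances are neighbours when their Euclidean distance is <= d.
    """
    a_pts, b_pts = instances[fa], instances[fb]
    a_used, b_used = set(), set()
    for i, p in enumerate(a_pts):
        for j, q in enumerate(b_pts):
            if math.dist(p, q) <= d:
                a_used.add(i)  # instance i of fa takes part in the pattern
                b_used.add(j)  # instance j of fb takes part in the pattern
    pr_a = len(a_used) / len(a_pts)  # participation ratio of fa
    pr_b = len(b_used) / len(b_pts)  # participation ratio of fb
    return min(pr_a, pr_b)
```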

  13. Mining association rules based on cultured immune clone algorithm

    Institute of Scientific and Technical Information of China (English)

    杨光军

    2013-01-01

    A method for mining association rules based on a cultural immune clone algorithm is proposed. It embeds the immune clone algorithm in the framework of a cultural algorithm and uses a two-layer evolutionary mechanism, in which the intelligent search ability of the immune clone algorithm and the shared beliefs formed in the cultural algorithm's belief space jointly guide rule mining. Situational knowledge and historical knowledge in the cultural algorithm are redefined, and a new mutation operator is proposed that adaptively adjusts the mutation scale to improve the global search ability of the immune clone algorithm. Experiments show that the new algorithm outperforms the immune clone algorithm in running speed and in the accuracy of the mined rules.

  14. Research and Realization of Mining Association Rules in Book Circulation Based on Visual FoxPro

    Institute of Scientific and Technical Information of China (English)

    卢红杰

    2012-01-01

    The classic Apriori algorithm for mining association rules is studied in depth. The algorithm was implemented in Visual FoxPro and used to mine association rules from the past decade of book-borrowing data at Liaoning Shihua University, yielding the borrowing associations between subject-specific books. The results provide detailed data support for predicting readers' borrowing tendencies, assisting book purchasing decisions, and proactively pushing related information to readers.
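
    The paper implemented Apriori in Visual FoxPro; the following Python sketch shows the same classic level-wise procedure (join, prune, count), under the assumption that each borrowing record has been turned into a transaction of book categories. It favors clarity over speed: every level rescans all transactions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Count step: one scan over the data for this level's candidates.
        counts = {c: 0 for c in current}
        for t in transactions:
            for c in current:
                if c <= t:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            # Prune step: keep u only if every k-subset is frequent.
            if len(u) == k + 1 and all(frozenset(s) in level for s in combinations(u, k)):
                candidates.add(u)
        current = candidates
        k += 1
    return frequent
```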

  15. A Method for Hiding Association rules with Minimum Changes in Database

    Directory of Open Access Journals (Sweden)

    Zahra Sheykhinezhad

    Full Text Available Privacy-preserving data mining is a way to use data mining without disclosing private information. To prevent data mining techniques from disclosing sensitive information, the database must be modified. Association rules ...
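
    The truncated abstract does not describe the authors' method. A common baseline for hiding a sensitive itemset is to delete items from supporting transactions until its support falls below a threshold; the greedy sketch below is such a baseline, with hypothetical names, and it does not guarantee the minimum number of changes.

```python
def hide_itemset(transactions, sensitive, max_support):
    """Reduce the support of a sensitive itemset to below max_support.

    Greedy heuristic: while the itemset is still too frequent, take the
    shortest transaction that supports it and delete one sensitive item,
    so every step modifies exactly one item. Returns (new_data, changes).
    """
    if max_support <= 0:
        raise ValueError("max_support must be positive")
    data = [set(t) for t in transactions]  # work on copies of the input
    sensitive = set(sensitive)
    victim = min(sensitive)  # hypothetical choice: always drop the same item
    changes = 0
    while True:
        supporting = [t for t in data if sensitive <= t]
        if len(supporting) / len(data) < max_support:
            return data, changes
        target = min(supporting, key=len)
        target.discard(victim)
        changes += 1
```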

  16. An Efficient Algorithm to Automated Discovery of Interesting Positive and Negative Association Rules

    Directory of Open Access Journals (Sweden)

    Ahmed Abdul-WahabAl-Opahi

    2015-06-01

    Full Text Available Association rule mining is a very efficient technique for finding strong relations between correlated data, and the correlation of the data makes the extraction process meaningful. A variety of algorithms, such as the Apriori algorithm and tree-based algorithms, are used to discover frequent items and mine positive rules. However, these algorithms do not consider negated occurrences of attributes, and the rules they produce do not cover infrequent itemsets. Discovering infrequent itemsets is far more difficult than discovering their counterparts, frequent itemsets. The problems include discovering infrequent itemsets, generating interesting negative association rules, and coping with the huge number of negative rules compared with positive association rules. The discovery of interesting association rules is an important and active area within data mining research. In this paper, an efficient algorithm is proposed for discovering interesting positive and negative association rules from frequent and infrequent items. The experimental results show the usefulness and effectiveness of the proposed algorithm.

  17. Using Fuzzy Association Rules to Design E-commerce Personalized Recommendation System

    OpenAIRE

    Guofang Kuang; Yuanchen Li

    2013-01-01

    In order to improve the efficiency of fuzzy association rule mining, the paper defines redundant fuzzy association rules and the redundancy properties of strong fuzzy association rules. Gathering as much information as possible in the e-commerce environment, in the right form, is a prerequisite for personalized recommendation. Personalized recommendation technology is a core issue of automated e-commerce recommendation systems. With higher complexity than ordinary association rule algorithms, fuzzy assoc...

  18. Mass Data Mining System for Internet of Things Based on Association Rules Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    周芳

    2015-01-01

    The Internet of Things brings much convenience, but as it is used, the volume of data it generates keeps growing, which makes it harder to obtain useful information from that data. Mass data mining has therefore been a hot topic in Internet of Things research. For massive business data, the key problem is how to rapidly analyze, process, store and mine the data so as to extract useful information quickly and support business decision-making in the Internet of Things. This paper therefore designs a mass data mining system for the Internet of Things based on the association rule Apriori algorithm.

  19. RESEARCH ON ASSOCIATION RULE MINING IN BOOK SALES UNDER CLOUD COMPUTING ENVIRONMENT

    Institute of Scientific and Technical Information of China (English)

    郭健; 任永功

    2014-01-01

    With the advent of the big data era, people are overwhelmed by massive amounts of information. Cloud computing technology provides a new approach to efficiently mining valuable information from mass data. Taking advantage of its distributed processing and virtualization, we present a distributed association rule mining algorithm, MCM-Apriori, which combines the Map/Reduce programming model with a coding operation. We also design and implement an online bookstore sales system on the Hadoop cloud platform. To further verify the efficiency of the system, we use the MCM-Apriori algorithm to provide a book recommendation service in it. Comparative experimental results show that the system achieves fast analysis and querying as well as reliable storage, and can significantly improve the efficiency of association rule mining.
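
    MCM-Apriori's coding operation is not described in the abstract, so the sketch below shows only the generic Map/Reduce pattern for counting candidate itemsets, simulated in plain Python rather than with the Hadoop API: each mapper emits (itemset, 1) pairs for its data split, and the reducer sums them. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, k):
    """Emit (k-itemset, 1) for every k-item combination in one transaction."""
    for combo in combinations(sorted(transaction), k):
        yield combo, 1

def reduce_phase(pairs):
    """Sum the emitted counts per itemset (a dict stands in for the shuffle)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def count_k_itemsets(splits, k, min_count):
    """Count k-itemsets across data splits and keep those meeting min_count."""
    emitted = (pair for split in splits for t in split for pair in map_phase(t, k))
    totals = reduce_phase(emitted)
    return {key: c for key, c in totals.items() if c >= min_count}
```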

  20. Closed-set-based Discovery of Representative Association Rules Revisited

    CERN Document Server

    Balcázar, José L

    2010-01-01

    The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the "essential" or "representative" rules. We revisit the algorithm given by Kryszkiewicz (Int. Symp. Intelligent Data Analysis 2001, Springer-Verlag LNCS 2189, 350-359) for mining representative rules. We show that its output is sometimes incomplete, due to an oversight in its mathematical validation, and we propose an alternative complete generator that works within only slightly larger running times.

  1. ASSOCIATION RULE DISCOVERY FOR STUDENT PERFORMANCE PREDICTION USING METAHEURISTIC ALGORITHMS

    Directory of Open Access Journals (Sweden)

    Roghayeh Saneifar

    2015-11-01

    Full Text Available With the increasing use of data mining techniques to improve the operation of educational systems, Educational Data Mining has emerged as a new and fast-growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper, a new associative classification technique is proposed to predict students' final performance. Unlike other machine learning approaches such as ANNs and SVMs, associative classifiers maintain interpretability along with high accuracy. In this work, we employ Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules for student performance prediction, treated as a multi-objective classification problem. Results indicate that the proposed swarm-based algorithm outperforms well-known classification techniques on the student performance prediction problem.

  2. Dynamic Fast Database Mining Algorithm Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    王宗江

    2007-01-01

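    The Dynamic Fast Mining Algorithm (DFMA) for association rules does not need to rescan the original database, which overcomes weaknesses of the Apriori algorithm, the most representative association rule mining method, such as its high time cost and its inability to mine online. DFMA supports both online and incremental mining. Using its multi-level synchronous processing and updating together with a defined sensitivity index, DFMA can be used to mine real-time information that is useful to decision makers.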
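
    The abstract does not detail DFMA's multi-level synchronization or its sensitivity index. The minimal sketch below illustrates only the underlying idea of incremental mining, assuming counts for all itemsets up to a small size are kept in memory: new batches update the counts, and supports are recomputed without rescanning old transactions. The class name is hypothetical, and the memory cost grows quickly with `max_len`.

```python
from collections import Counter
from itertools import combinations

class IncrementalCounter:
    """Keep counts of all itemsets up to max_len so that new batches update
    supports without rescanning earlier transactions."""

    def __init__(self, max_len=2):
        self.max_len = max_len
        self.counts = Counter()
        self.n = 0  # number of transactions seen so far

    def add_batch(self, transactions):
        for t in transactions:
            items = sorted(set(t))
            for k in range(1, self.max_len + 1):
                for combo in combinations(items, k):
                    self.counts[combo] += 1
            self.n += 1

    def frequent(self, min_support):
        """Itemsets whose support over all data seen so far is >= min_support."""
        return {c: cnt / self.n for c, cnt in self.counts.items()
                if cnt / self.n >= min_support}
```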

  3. Image segmentation using association rule features.

    Science.gov (United States)

    Rushing, John A; Ranganath, Heggere; Hinke, Thomas H; Graves, Sara J

    2002-01-01

    A new type of texture feature based on association rules is described. Association rules have been used in applications such as market basket analysis to capture relationships present among items in large data sets. It is shown that association rules can be adapted to capture frequently occurring local structures in images. The frequency of occurrence of these structures can be used to characterize texture. Methods for segmentation of textured images based on association rule features are described. Simulation results using images consisting of man-made and natural textures show that association rule features perform well compared to other widely used texture features. Association rule features are used to detect cumulus cloud fields in GOES satellite images and are found to achieve higher accuracy than other statistical texture features for this problem.

  4. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  5. Personalized Learning Based on Association Rules Mining in the E-learning System

    Institute of Scientific and Technical Information of China (English)

    浦慧忠

    2013-01-01

    With the advent of the information age and the learning society, research on Internet-based e-Learning for personalized learning has received widespread attention. Based on the classic Apriori algorithm for association rules in Web mining, this paper mines students' frequent access paths and maximal forward access paths, adjusts the system structure accordingly, and thereby recommends personalized learning content to students.
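
    Maximal forward access paths are usually extracted by cutting a user's page sequence at each backward move, that is, each revisit of a page already on the current path. A minimal sketch of that idea follows, assuming one session's pages arrive as a list; the paper's session handling may differ.

```python
def maximal_forward_references(path):
    """Split one session's page sequence into maximal forward references.

    A backward move (revisiting a page already on the current forward path)
    ends the current forward reference; the path is then cut back to the
    revisited page and browsing continues from there.
    """
    refs = []
    current = []
    moving_forward = True
    for page in path:
        if page in current:
            if moving_forward:
                refs.append(list(current))  # the forward run just ended
            current = current[: current.index(page) + 1]
            moving_forward = False
        else:
            current.append(page)
            moving_forward = True
    if moving_forward and current:
        refs.append(list(current))  # session ended on a forward move
    return refs
```

For the session A B C D C B E G H G W A O U O V, this yields the forward paths ABCD, ABEGH, ABEGW, AOU and AOV, which can then be treated as transactions for Apriori.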

  6. Statistical Analysis and Association Rule Mining of Application in College Students’ Mental Health

    Institute of Scientific and Technical Information of China (English)

    亓文娟; 黄书城

    2014-01-01

    To better understand the main factors affecting college students' mental health and the relationships among psychological symptoms, this study uses psychological test data from a university's class of 2011 and applies two methods, statistical analysis and association rule mining, to analyze factors including gender, whether the student is a student cadre, whether the student is an only child, place of origin, family structure and monthly family income. The results help educators understand students' mental health problems more deeply and provide a basis for planning and decision-making in college students' mental health education.

  7. A New Approach to Generate Frequent Itemsets in Mining Image Association Rules

    Institute of Scientific and Technical Information of China (English)

    杜琳; 陈云亮; 朱静

    2011-01-01

    This paper proposes an approach to generating frequent itemsets when mining image association rules: the Frequent Item Tree. The bSQ image format is used to reorganize the image data so that the frequent item tree can be applied. The paper also proposes several optimization techniques, including frequent item tree pruning, semi-depth-first search, image masks and automatic multi-level gray-range generation, which reduce the time and space complexity of the algorithm and give it high efficiency and practical value.
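
    As we understand it, the bSQ (bit Sequential) format stores each bit position of an image band as its own bitmap. The sketch below, with hypothetical names, builds those bit planes as Python integers and shows why the layout helps: the support of a combination of bit values is a bitwise AND followed by a bit count. It does not implement the frequent item tree itself.

```python
def bit_planes(band, bits=8):
    """Split one band (pixel values in 0..2**bits - 1) into bit planes.

    Plane j is an int whose bit p equals bit j of pixel p, i.e. a bitmap
    over pixel positions (plane 0 holds the least significant bits).
    """
    planes = []
    for j in range(bits):
        mask = 0
        for p, value in enumerate(band):
            if (value >> j) & 1:
                mask |= 1 << p
        planes.append(mask)
    return planes

def support(bitmaps, n_pixels):
    """Fraction of pixels at which every given bitmap is set (bitwise AND)."""
    acc = (1 << n_pixels) - 1
    for b in bitmaps:
        acc &= b
    return bin(acc).count("1") / n_pixels
```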

  8. Discovering market basket patterns using hierarchical association rules

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2015-10-01

    Full Text Available Association rules are a data mining method for discovering patterns of frequent itemsets, such as products in a store that a customer frequently purchases at the same time (market basket analysis). A number of interestingness measures for association rules have been developed to date, but research has shown that no single measure is dominant. Authors have mostly used objective measures, whereas subjective measures have rarely been investigated. This paper aims to combine objective measures such as support, confidence and lift with a subjective approach based on selection by human experts in order to extract interesting rules from a real dataset collected from a large Croatian retail chain. Hierarchical association rules were used to make rule extraction more efficient. The results show that more interesting rules were extracted using the hierarchical method, and that a hybrid approach combining objective and subjective measures succeeds in extracting certain unexpected and actionable rules. The research can be useful for retail and marketing managers in planning marketing strategies, as well as for researchers investigating this field.
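
    Lift divides a rule's confidence by the support of its consequent, so values above 1 indicate a positive association. The sketch below computes support, confidence and lift, and adds a one-level product-to-category taxonomy (hypothetical) as a simple form of hierarchical generalization; the paper's actual hierarchy and thresholds are not given in the abstract.

```python
def measures(transactions, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent (sets of items)."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t) / n
    c = sum(1 for t in transactions if consequent <= t) / n
    ac = sum(1 for t in transactions if (antecedent | consequent) <= t) / n
    conf = ac / a if a else 0.0
    lift = conf / c if c else 0.0
    return ac, conf, lift

def generalize(transactions, taxonomy):
    """Replace each product by its category; unknown items are kept as-is."""
    return [{taxonomy.get(item, item) for item in t} for t in transactions]
```

Mining at the category level merges sparse product-level patterns into denser ones, which is one reason hierarchical rules can surface rules that individual products are too rare to support.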

  9. Fast Algorithms of Mining Probability Functional Dependency Rules in Relational Database

    Institute of Scientific and Technical Information of China (English)

    TAO Xiaopeng; ZHOU Aoying; HU Yunfa

    2000-01-01

    This paper defines a new kind of rule, the probability functional dependency rule, which can express the degree of functional dependency. Five algorithms, from simple to complex, are presented to mine this kind of rule under different conditions. The related theorems are proved to ensure the efficiency and correctness of these algorithms.

  10. Sequential association rules in atonal music

    NARCIS (Netherlands)

    A. Honingh; T. Weyde; D. Conklin

    2009-01-01

    This paper describes a preliminary study on the structure of atonal music. In the same way as sequential association rules of chords can be found in tonal music, sequential association rules of pitch class set categories can be found in atonal music. It has been noted before that certain pitch class
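
    A simple way to make sequential association rules concrete is to count which category immediately follows which in a sequence. The sketch below assumes adjacent-successor rules only and uses placeholder labels for pitch class set categories; the paper's rule form may be more general.

```python
from collections import Counter

def sequential_rules(sequence, min_conf=0.5):
    """Rules "a is immediately followed by b" with their confidence.

    confidence(a -> b) = count(a followed by b) / count(a with a successor)
    """
    pairs = Counter(zip(sequence, sequence[1:]))
    starts = Counter(sequence[:-1])
    rules = {}
    for (a, b), c in pairs.items():
        conf = c / starts[a]
        if conf >= min_conf:
            rules[(a, b)] = conf
    return rules
```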

  11. A Study of Frequent Cyclic Association Rule

    Institute of Scientific and Technical Information of China (English)

    黄益民

    2000-01-01

    One of the most important data mining problems is mining association rules. In this paper, we consider the problem of finding frequent cyclic association rules. By exploiting the relationship between cycles and large itemsets, we identify optimization techniques that minimize the unnecessary work performed during the data mining process. Furthermore, we demonstrate the effectiveness of these methods through a series of experiments.
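
    A cyclic rule can be described by a cycle (l, o): the rule holds in every time unit o, o + l, o + 2l, and so on. Assuming the per-unit result (whether the rule meets its thresholds in each time unit) has already been computed, the sketch below finds all such cycles by brute force; the paper's optimizations based on large itemsets are not shown.

```python
def find_cycles(holds, l_min, l_max):
    """Return every cycle (length, offset) for which the rule holds at all
    time units offset, offset + length, offset + 2 * length, ...

    holds: list of booleans; holds[t] is True when the rule satisfies its
    support and confidence thresholds in time unit t.
    """
    cycles = []
    # Cap the length at the sequence size so each cycle covers >= 1 unit.
    for length in range(max(l_min, 1), min(l_max, len(holds)) + 1):
        for offset in range(length):
            if all(holds[t] for t in range(offset, len(holds), length)):
                cycles.append((length, offset))
    return cycles
```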

  12. Rough Set Model for Discovering Hybrid Association Rules

    CERN Document Server

    Pandey, Anjana

    2009-01-01

    In this paper, the mining of hybrid association rules with a rough set approach is investigated through the RSHAR algorithm, which consists mainly of two steps. First, the participating tables are joined into a general table in order to generate rules expressing the relationships between two or more domains that belong to different tables in a database; a mapping code is then applied to the selected dimension, which can be added directly to the information system as an attribute. Second, to find the association rules, frequent itemsets are generated: candidate itemsets are generated through equivalence classes, and the mapping code is transformed back into the real dimensions. The search method for candidate itemsets is similar to the Apriori algorithm. The performance of the algorithm is analyzed.

  13. Research on Weighted Fuzzy Association Rules

    Institute of Scientific and Technical Information of China (English)

    陆建江

    2003-01-01

    Algorithms for mining quantitative association rules treat every attribute equally, but attributes usually differ in importance. Two algorithms for mining weighted fuzzy association rules are provided for two kinds of databases. The first algorithm effectively takes the importance of quantitative attributes into account and assumes that the importance of an association rule does not increase with the number of attributes in the rule. The second algorithm also takes the importance of quantitative attributes into account, but assumes that the importance of an association rule increases with the number of attributes in the rule.
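
    The sketch below illustrates one plausible reading of the first algorithm: fuzzy support is the average, over records, of the minimum membership of the rule's fuzzy items, scaled by the mean weight of the attributes involved, so adding attributes does not by itself raise a rule's importance. The triangular memberships, the min operator and the mean-weight scheme are assumptions, not the paper's stated formulas.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def weighted_fuzzy_support(records, itemset, weights):
    """itemset: list of (attribute, membership_function) pairs.

    Fuzzy support uses min as the conjunction of memberships; it is then
    scaled by the mean weight of the attributes in the itemset.
    """
    total = sum(min(fn(r[attr]) for attr, fn in itemset) for r in records)
    fuzzy_support = total / len(records)
    mean_weight = sum(weights[attr] for attr, _ in itemset) / len(itemset)
    return mean_weight * fuzzy_support
```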

  14. Using Fuzzy Association Rules to Design E-commerce Personalized Recommendation System

    Directory of Open Access Journals (Sweden)

    Guofang Kuang

    2013-09-01

    Full Text Available In order to improve the efficiency of fuzzy association rule mining, the paper defines redundant fuzzy association rules and the redundancy properties of strong fuzzy association rules. Gathering as much information as possible in the e-commerce environment, in the right form, is a prerequisite for personalized recommendation, and personalized recommendation technology is a core issue of automated e-commerce recommendation systems. Because fuzzy association rule algorithms are more complex than ordinary association rule algorithms, their low efficiency has become a bottleneck in practical applications. The paper uses fuzzy association rules to design an e-commerce personalized recommendation system. The experimental results show that the new algorithm improves execution efficiency.

  15. Clustering Association Rules with Fuzzy Concepts

    Science.gov (United States)

    Steinbrecher, Matthias; Kruse, Rudolf

    Association rules constitute a widely accepted technique to identify frequent patterns inside huge volumes of data. Practitioners prefer the straightforward interpretability of rules; however, depending on the nature of the underlying data, the number of induced rules can be intractably large. Even reasonably sized result sets may contain a large number of rules that are uninteresting to the user because they are too general, are already known, or do not match other user-related intuitive criteria. We allow the user to model his conception of interestingness by means of linguistic expressions on rule evaluation measures and compound propositions of higher order (i.e., temporal changes of rule properties). Multiple such linguistic concepts can be considered a set of fuzzy patterns (Fuzzy Sets and Systems 28(3):313-331, 1988) and allow for the partition of the initial rule set into fuzzy fragments that contain rules of similar membership to a user's concept (Höppner et al., Fuzzy Clustering, Wiley, Chichester, 1999; Computational Statistics and Data Analysis 51(1):192-214, 2006; Advances in Fuzzy Clustering and Its Applications, chap. 1, pp. 3-30, Wiley, New York, 2007). With appropriate visualization methods that extend previous rule set visualizations (Foundations of Fuzzy Logic and Soft Computing, Lecture Notes in Computer Science, vol. 4529, pp. 295-303, Springer, Berlin, 2007), we allow the user to instantly assess how well his concepts match the rule set.

  16. On construction of partial association rules

    KAUST Repository

    Moshkov, Mikhail

    2009-01-01

    This paper is devoted to the study of approximate algorithms for minimizing the length of partial association rules. It is shown that, under some natural assumptions on the class NP, a greedy algorithm is close to the best polynomial approximate algorithms for solving this NP-hard problem. The paper contains various bounds on the precision of the greedy algorithm, bounds on the minimal length of rules based on information obtained while the greedy algorithm runs, and results of the study of association rules for most binary information systems. © 2009 Springer Berlin Heidelberg.
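
    Greedy construction of a partial rule can be sketched as greedy set cover: rows whose target value differs from the chosen row must be separated by at least one condition of the rule, and at each step the attribute separating the most remaining rows is added. The α-tolerance (a fraction of rows allowed to stay unseparated) and all names are assumptions modeled on that idea; the paper's bounds are not reproduced.

```python
import math

def greedy_partial_rule(table, row_index, target, alpha=0.0):
    """Greedily choose attributes for the rule
    (a1 = r[a1]) and ... -> target = r[target], where r = table[row_index].

    Returns the chosen attribute indices in the order they were added.
    """
    r = table[row_index]
    attrs = [j for j in range(len(r)) if j != target]
    # Rows with a different target value that the rule must separate from r.
    unseparated = {i for i, row in enumerate(table) if row[target] != r[target]}
    allowed = math.floor(alpha * len(unseparated))  # rows we may leave unseparated
    chosen = []
    while len(unseparated) > allowed and attrs:
        best = max(attrs, key=lambda j: sum(table[i][j] != r[j] for i in unseparated))
        gained = {i for i in unseparated if table[i][best] != r[best]}
        if not gained:
            break  # remaining rows agree with r on every unused attribute
        chosen.append(best)
        attrs.remove(best)
        unseparated -= gained
    return chosen
```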

  17. Exception Rule Mining

    Institute of Scientific and Technical Information of China (English)

    印鉴; 周祥福

    2003-01-01

    Data mining is the process of discovering hidden structures or patterns in large quantities of data using various analytic tools. These structures or patterns can help decision makers take advantageous actions. This paper introduces the concepts of interestingness and reference rules, uses interestingness to estimate the information contained in a rule, and then presents a method for mining exception rules that computes interestingness with respect to the reference rules. Experimental comparisons with other methods show that the proposed method performs better.

  18. Rule of Remedy for Vomiting in Science of Traditional Chinese Medicine Formulas by Data Mining Based on Both Association and Correlation Rule

    Institute of Scientific and Technical Information of China (English)

    黄颖琦; 贾恒; 何前松; 冯泳

    2012-01-01

    Objective: To find the compatibility rules of medicines in historical prescriptions for treating vomiting, mine new knowledge in the science of traditional Chinese medicine (TCM) formulas, and provide support for developing new TCM remedies for vomiting. Method: A database of TCM formulas for treating vomiting was created from 985 ancient and modern formulas. A threshold-based data mining method using both association and correlation rules was applied to mine the compatibility rules in this database, with pruning used to select the most strongly associated data. Result: The most frequently used single drug is Zingiber officinale (fresh ginger), with a usage frequency of 61.23%. The most strongly associated and correlated drug pair is Poria cocos (Schw.) Wolf and Pinellia ternata (Thunb.) Breit., with a correlation-confidence of 0.1144. The most strongly associated and correlated drug group is Z. officinale, P. cocos and P. ternata, with a correlation-confidence of 0.2954. Conclusion: The group containing Z. officinale, P. ternata and P. cocos is the drug combination most often used to treat vomiting. The results indicate that adding P. ternata to the P. cocos decoction, a famous ancient formula created by ZHANG Zhong-jing, is the key drug combination for treating vomiting.

  19. Analysis of Distributed and Adaptive Genetic Algorithm for Mining Interesting Classification Rules

    Institute of Scientific and Technical Information of China (English)

    YI Yunfei; LIN Fang; QIN Jun

    2008-01-01

    A distributed genetic algorithm can be combined with an adaptive genetic algorithm to mine interesting and comprehensible classification rules. The paper gives a method for encoding the rules and designs the fitness function and the selection, crossover, mutation and migration operators for the distributed adaptive genetic algorithm (DAGA).
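    The abstract names the DAGA operators but does not define them. The sketch below shows two ingredients that such a design commonly combines, under stated assumptions. The first is an adaptive mutation probability in the style of Srinivas and Patnaik (an assumed scheme, not necessarily the paper's). The second is migration between islands in a ring topology (also an assumption). Rules are assumed to be encoded as bit lists, and all function names are hypothetical.

```python
import random

def adaptive_mutation_rate(f, f_avg, f_max, k_high=0.5, k_low=0.5):
    """Adaptive mutation probability (assumed Srinivas-Patnaik style scheme).

    Below-average individuals get the constant rate k_low; above-average ones
    get a smaller rate the closer their fitness f is to the best fitness f_max.
    """
    if f < f_avg:
        return k_low
    if f_max == f_avg:  # converged population: avoid division by zero
        return 0.0
    return k_high * (f_max - f) / (f_max - f_avg)

def mutate(bits, rate, rng):
    """Flip each bit of a (hypothetical) bit-string rule encoding with probability rate."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]

def ring_migration(islands, fitness, n_migrants=1):
    """Island i receives copies of the best n_migrants of island i-1 (wrapping
    around), which replace its own worst n_migrants individuals."""
    emigrants = [sorted(isl, key=fitness, reverse=True)[:n_migrants] for isl in islands]
    new_islands = []
    for i, isl in enumerate(islands):
        incoming = emigrants[i - 1]  # index -1 wraps to the last island
        survivors = sorted(isl, key=fitness, reverse=True)[: len(isl) - n_migrants]
        new_islands.append(survivors + list(incoming))
    return new_islands

rng = random.Random(0)
print(mutate([0, 1, 1, 0], adaptive_mutation_rate(0.6, 0.5, 1.0), rng))
print(ring_migration([[1, 2, 3], [10, 20, 30]], fitness=lambda x: x))
```

    Keeping the migration step separate from mutation mirrors the distributed/adaptive split in the abstract: each island can evolve independently, with migration run every few generations.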

  20. The Research of Intrusion Detection System Based on Improved Apriori Algorithm of Data Mining Association Rules

    Institute of Scientific and Technical Information of China (English)

    张浩; 景凤宣; 谢晓尧

    2011-01-01

    Among the many association rule mining algorithms, Apriori is the most classic, but it has three deficiencies: it scans the database many times, generates a large number of candidate itemsets, and finds frequent itemsets iteratively. This paper therefore proposes a method in which the candidate itemsets generated by Apriori are then checked against the database to determine whether they are frequent itemsets, thereby improving the efficiency of the algorithm. Finally, association rules are formed for an intrusion detection system (IDS). Experimental results show that the improved algorithm can effectively improve the efficiency of association rule mining.
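    For reference, the classic Apriori baseline that the paper sets out to improve can be sketched as follows. It makes one database scan per level, uses a join step to build candidate k-itemsets from frequent (k-1)-itemsets, and uses a prune step to drop candidates that have an infrequent subset. This is a minimal sketch of the textbook algorithm, not the paper's improved variant, whose lookup strategy the abstract does not specify. The function name and toy data are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Classic Apriori: return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is a fraction of the number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)
    # Level 1: count single items in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent (downward closure).
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # One more database scan per level to count surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, count in sorted(apriori(tx, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

    The repeated per-level scans and the potentially large candidate sets in this loop are exactly the costs the abstract lists as Apriori's deficiencies.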

  1. Mining for associations between text and brain activation in a functional neuroimaging database

    DEFF Research Database (Denmark)

    Nielsen, Finn Årup; Hansen, Lars Kai; Balslev, D.

    2004-01-01

    We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...... that the statistically motivated associations are well aligned with general neuroscientific knowledge....

  2. [Rules of acupoints combination of ancient acupuncture for Xiaoke based on data mining technology].

    Science.gov (United States)

    Xu, LinLing; Xu, Tianshu; Zhang, Jianbin

    2015-08-01

    This study explores the rules of acupoint combination in ancient acupuncture for Xiaoke. A database of acupuncture and moxibustion for Xiaoke was established by retrieving ancient literature, and association analysis between acupoints and symptoms was performed on it. The association analysis covered five databases: the general Xiaoke database and the Xiaoke databases of kidney deficiency, dry mouth and thirst, difficult urination, and drinking addiction. The results mainly feature symptom-differentiation combination, distal-local combination, local combination and front-back combination of acupoints, which can nourish yin and clear heat. Establishing databases of ancient TCM literature and exploring data mining technology is considered a promising research direction.

  3. Weighted Association Rules and Text Mining-Based Agent Realisation of Financial News Spreading

    Institute of Scientific and Technical Information of China (English)

    张人上; 曲开社

    2015-01-01

    Traditional financial prediction systems rely only on quantitative data such as stock prices and market indexes, and therefore cannot satisfy both real-time and high-accuracy requirements. To address this, we propose a news-spreading Agent based on weighted association rules and text mining. First, a Chinese knowledge and information processing system is used to segment each news headline into Chinese words. Then, a weighted association rule (WAR) algorithm detects multiple terms that frequently appear in the same news headlines, and nouns, verbs and compound words are extracted. Finally, weights are assigned to the extracted keywords according to the stock price index on the first trading day after the news reaches the market, and the degree of influence of each headline on stock prices is judged from its weight. Experiments on a news-headline feature database verify the feasibility of the method for real-time information delivery of financial news headlines, and show that it achieves higher prediction accuracy and recall than several other prediction methods.

  4. Recommendation System Based On Association Rules For Distributed E-Learning Management Systems

    Science.gov (United States)

    Mihai, Gabroveanu

    2015-09-01

    Traditional Learning Management Systems are installed on a single server where learning materials and user data are kept. To increase performance, a Learning Management System can be installed on multiple servers, with learning materials and user data distributed across them, yielding a Distributed Learning Management System. This paper proposes a prototype recommendation system based on association rules for a Distributed Learning Management System. Information from the LMS databases is analyzed with distributed data mining algorithms to extract association rules, which are then used as inference rules to provide personalized recommendations. The quality of the recommendations is improved because the rules used for inference are more accurate, since they aggregate knowledge from all e-Learning systems included in the Distributed Learning Management System.

  5. Mining for associations between text and brain activation in a functional neuroimaging database

    DEFF Research Database (Denmark)

    Nielsen, Finn Arup; Hansen, Lars Kai; Balslev, Daniela

    2004-01-01

    We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...

  6. Proceedings of the 2010 International Mine Water Association symposium : mine water and innovative thinking

    Energy Technology Data Exchange (ETDEWEB)

    Wolkersdorfer, C. [Cape Breton Univ., Sydney, NS (Canada); Freund, A. [CBU Press, Sydney, NS (Canada)] (eds.)

    2010-07-01

    Acid mine drainage is causing pollution in many waterways and ground water tables throughout the world. Hosted by the International Mine Water Association, this symposium examined issues related to acid mine drainage and explored various water treatment and water removal technologies and mine water chemistry analysis methods. Issues concerning the remediation and monitoring of abandoned mines were explored and recent innovations in geochemistry and geological engineering were presented. Water management issues in various types of geologic formations were included. The conference themes were: mine water issues and innovative mining methods; mine water engineering; mine water treatment, active systems; mine water treatment, passive systems; mine water geothermal, geochemistry and biochemistry uses; analysis of mine water and its chemistry; underground and surface coal mining; mine closures; legal and social aspects of mine water; mine tailings; the Cape Breton Development Corporation legacy; and the concept of a zero waste mine. The symposium featured 155 presentations, of which 32 have been catalogued separately for inclusion in this database. tabs., figs.

  7. Dynamic Programming Approach for Construction of Association Rule Systems

    KAUST Repository

    Alsolami, Fawaz

    2016-11-18

    This paper considers an application of the dynamic programming approach to optimizing association rules from the point of view of knowledge representation. The association rule set is optimized in two stages: first for minimum cardinality, and then for minimum length of rules. Experimental results present the cardinality of the set of association rules constructed for an information system, a lower bound on the minimum possible cardinality of the rule set based on information obtained during the algorithm's run, and the results obtained for rule length.

  8. From data mining rules to medical logical modules and medical advices.

    Science.gov (United States)

    Gomoi, Valentin; Vida, Mihaela; Robu, Raul; Stoicu-Tivadar, Vasile; Bernad, Elena; Lupşe, Oana

    2013-01-01

    Using data mining together with Clinical Decision Support Systems adds new knowledge to support medical diagnosis. This work presents a tool that translates data mining rules supporting the generation of medical advice into the Arden Syntax formalism. The system was tested with data on 2326 births that took place in 2010 at the Bega Obstetrics - Gynaecology Hospital, Timişoara. From these data, 14 medical rules regarding the Apgar score were generated and then translated into Arden Syntax.

  9. Overlying strata movement rules and safety mining technology for the shallow depth seam proximity beneath a room mining goaf

    Institute of Scientific and Technical Information of China (English)

    Wang Fangtian; Zhang Cun; Zhang Xiaogang; Song Qi

    2015-01-01

    When a shallow seam lying close beneath a room-mining goaf is extracted by longwall mining and is overlain by thin bedrock and thick loose sands, many accidents are likely to occur, including roof structure instability, roof step subsidence, damage to shield supports, and face bumps triggered by large-area roof weighting, seriously threatening the safety of underground miners and equipment. This paper analyses the overlying strata movement rules for such shallow seams using physical simulation, 3DEC numerical simulation and field measurements. The results show that in shallow seam mining the overburden movement forms a caved zone and a fractured zone, the cracks develop continuously and reach the surface as the face advances, and the development of surface cracks generally goes through four stages. By applying loose blasting of residual pillars, a reasonable mining height, and proper roof support and management, safe, efficient and high-recovery mining has been achieved in the shallow seam close beneath a room-mining goaf.

  10. [Acupoints selection rules analysis of ancient acupuncture for urinary incontinence based on data mining technology].

    Science.gov (United States)

    Zhang, Wei; Tan, Zhigao; Cao, Juanshu; Gong, Houwu; Qin, Zuoai; Zhong, Feng; Cao, Yue; Wei, Yanrong

    2015-12-01

    Based on the ancient acupuncture literature in the Canon of Chinese Medicine (4th edition), articles on acupuncture for urinary incontinence were retrieved and collected to establish a database. Using the Weka data mining software, multi-level association rule analysis was applied to analyze the characteristics and rules of acupoint selection in ancient acupuncture for urinary incontinence. In total, 356 articles on acupuncture for urinary incontinence were collected, involving 41 acupoints with a total frequency of 364. The results were as follows. (1) Acupoints on the yin meridians of the hand and foot were highly valued: the frequency of acupoints on yin meridians was 2.6 times that on yang meridians, and acupoints on the liver meridian of foot-jueyin were selected most often. (2) Acupoints on the bladder meridian of foot-taiyang were also highly valued: among the three yang meridians of the foot, the frequency of acupoints on the bladder meridian of foot-taiyang was 54, accounting for 65.85% (54/82). (3) Most selected acupoints were located on the lower limbs and abdomen. (4) Specific acupoints on the above meridians were mostly selected, accounting for 73.2% (30/41) of the acupoints and 79.4% (289/364) of the total frequency. (5) Zhongji (CV 3), the front-mu point of the bladder meridian, was seldom selected in the ancient literature, which differs from modern literature reports. The results show that urinary incontinence belongs to the external genitalia diseases, which should be treated from yin, suggesting greater use of yin meridians and a focus on specific acupoints. It is essential to focus on both inheritance and innovation in TCM clinical treatment, and applying data mining technology to the ancient acupuncture literature could provide a classical theoretical basis for TCM clinical treatment.

  11. Application Research in Medicine Based on Texture Features Association Rules Mining

    Institute of Scientific and Technical Information of China (English)

    于超; 王璐; 吴琼; 裴志松

    2012-01-01

    To meet the requirements of computer-aided diagnosis from medical images, we present a feature-fusion algorithm based on the Apriori algorithm that fuses image texture features with patients' natural features from the HIS (Hospital Information System). Combined with a pruning method, an association rule base is built, and a prototype system is implemented that automatically classifies CT (computed tomography) images into normal and abnormal categories. Evaluation experiments conducted with this system show that the association rule base built by the algorithm performs well in assisting doctors' diagnosis.

  12. Research and Application of Association Rule Mining in Software Sales Strategy

    Institute of Scientific and Technical Information of China (English)

    杨盛苑; 白粒沙

    2013-01-01

    To establish effective sales strategies for a software company's different products, this paper proposes using the Apriori algorithm to mine the relationships between customers and software products, and builds an association model between them. Applying this model to the sales data of a software enterprise reveals potential relationships between customers and software products, which can guide decision makers and salespeople in applying different marketing strategies to different customers, improving customer satisfaction and increasing software sales. An example shows that the model has a certain degree of reliability.
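    Once frequent customer-product itemsets are known, actionable "customers who buy X also buy Y" rules come from splitting each itemset into an antecedent and a consequent and keeping splits whose confidence is high enough. The sketch below illustrates that rule-generation step. It counts supports by brute force for clarity rather than reusing Apriori's counts. The product names and sales records are hypothetical, and the abstract does not give the paper's own thresholds.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def generate_rules(transactions, itemsets, min_conf):
    """For each itemset, emit rules X -> Y with X, Y non-empty, disjoint and
    X | Y == itemset, keeping those with support(X | Y) / support(X) >= min_conf.
    Returns a list of (X, Y, confidence) tuples."""
    transactions = [frozenset(t) for t in transactions]
    rules = []
    for itemset in map(frozenset, itemsets):
        both = support_count(itemset, transactions)
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                x_count = support_count(x, transactions)
                if x_count == 0:  # antecedent never occurs: confidence undefined
                    continue
                conf = both / x_count
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules

# Hypothetical per-customer purchase records.
sales = [
    {"ERP", "CRM"}, {"ERP", "CRM", "OA"}, {"ERP"}, {"CRM", "OA"}, {"ERP", "CRM"},
]
for x, y, conf in generate_rules(sales, [{"ERP", "CRM"}, {"CRM", "OA"}], 0.7):
    print(sorted(x), "->", sorted(y), f"confidence={conf:.2f}")
```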

  13. Realization of the English Assisted Learning System Based on Rule Mining

    Directory of Open Access Journals (Sweden)

    Li Kun

    2015-01-01

    This paper first briefly introduces the research progress of artificial intelligence, then introduces the basic structure of the whole English assisted learning system from the perspective of system functional requirements, and finally discusses how the functions of the English assisted learning system are realized with the support of rule-based data mining, aiming to attract more attention.

  14. FSRM: A Fast Algorithm for Sequential Rule Mining

    Directory of Open Access Journals (Sweden)

    Anjali Paliwal

    2014-10-01

    Recent developments in computing and automation technologies have led to the computerization of business and scientific applications in various areas. Turning the massive amounts of accumulated information into knowledge is attracting researchers in numerous domains, such as databases, machine learning and statistics. From the viewpoint of information researchers, the emphasis is on discovering meaningful patterns hidden in massive data sets. Hence, a central issue for knowledge discovery in databases, and the main focus of this paper, is to develop economical and scalable mining algorithms as integrated tools for management systems.

  15. DoS detections based on association rules and frequent itemsets

    Institute of Scientific and Technical Information of China (English)

    George S Oreku; LI Jian-zhong; Fredrick J Mtenzi

    2008-01-01

    To detect DoS attacks in networks using association rule mining techniques, we propose that association rules and frequent itemsets be used to find DoS patterns in packet streams that describe traffic and user behaviors. The method extracts information from the log analysis of submitted packets using an algorithm that depends on the definition of the intrusion. Large itemsets are extracted to represent the super facts from which the association analysis for the intrusion is built. Network data files were analysed in the experiments. The analysis and experimental results are encouraging, with performance improving as the packet frequency number increases.

  16. Integrated analysis of gene expression by association rules discovery

    Directory of Open Access Journals (Sweden)

    Carazo Jose M

    2006-02-01

    Background: Microarray technology is generating huge amounts of data about the expression levels of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge and fully understand such datasets, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most current approaches to analyzing microarray datasets focus mainly on the experimental data, and external biological information is incorporated only as a posterior process. Results: In this study we present a method for the integrative analysis of microarray data based on the Association Rules Discovery data mining technique. The approach integrates gene annotations and expression data to discover intrinsic associations between both data sources based on co-occurrence patterns. We applied the proposed methodology to gene expression datasets in which genes were annotated with metabolic pathways, transcriptional regulators and Gene Ontology categories. The automatically extracted associations revealed significant relationships among these gene attributes and expression patterns, many of which are clearly supported by recently reported work. Conclusion: The integration of external biological information and gene expression data can provide insights into the biological processes associated with gene expression programs. In this paper we show that the proposed methodology can integrate multiple gene annotations and expression data in the same analytic framework and extract meaningful associations among heterogeneous data sources. An implementation of the method is included in the Engene software package.

  17. Formal and Computational Properties of the Confidence Boost of Association Rules

    CERN Document Server

    Balcázar, José L

    2011-01-01

    Some existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases of absolutely minimum size. One can push the intuition of redundancy further and find an intuitive notion of interest of an association rule, in terms of its "novelty" with respect to other rules. Namely: an irredundant rule is so because its confidence is higher than what the rest of the rules would suggest; then, one can ask: how much higher? We propose to measure such a sort of "novelty" through the confidence boost of a rule, which encompasses two previous similar notions (confidence width and rule blocking, of which the latter is closely related to the earlier measure "improvement"). Acting as a complement to confidence and support, the confidence boost helps to obtain small and crisp sets of mined association rules, and solves the well-known problem that, in certain cases, rules of negative correlation may pass the confidence bound. We analyze the properties of two version...

  18. Greedy algorithms with weights for construction of partial association rules

    KAUST Repository

    Moshkov, Mikhail

    2009-09-10

    This paper is devoted to the study of approximate algorithms for minimizing the total weight of attributes occurring in partial association rules. We consider mainly greedy algorithms with weights for the construction of rules. The paper contains bounds on the precision of these algorithms and bounds on the minimal weight of partial association rules, based on information obtained during the greedy algorithm's run.
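    The abstract does not spell out the algorithms. Work in this line commonly reduces the construction of a partial association rule to a partial set cover problem. The rows that the rule must separate from the chosen row are the elements, and each attribute covers the rows on which it differs from that row. At most a fraction gamma of the rows may stay uncovered. Assuming that reduction, a weighted greedy step repeatedly picks the attribute with the smallest weight per newly covered row. The sketch below is a generic weighted greedy partial cover: the names and toy instance are hypothetical, and the paper's exact algorithms and bounds may differ.

```python
def greedy_partial_cover(universe, subsets, weights, gamma):
    """Weighted greedy partial set cover (assumed reduction, see lead-in).

    universe: set of elements to cover (e.g. rows a rule must separate)
    subsets:  {name: set of elements covered by name (e.g. an attribute)}
    weights:  {name: positive weight}
    gamma:    allowed uncovered fraction, 0 <= gamma < 1
    Returns the chosen names in selection order.
    """
    need = len(universe) - int(gamma * len(universe))  # = ceil((1 - gamma) * |U|)
    covered, chosen = set(), []
    while len(covered) < need:
        best, best_ratio = None, None
        for name, elems in subsets.items():
            gain = len((elems & universe) - covered)
            if name in chosen or gain == 0:
                continue
            ratio = weights[name] / gain  # weight per newly covered element
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:  # remaining elements cannot be covered by any subset
            raise ValueError("no partial cover exists")
        chosen.append(best)
        covered |= subsets[best] & universe
    return chosen

# Hypothetical instance: attribute "c" covers everything but is expensive.
U = {1, 2, 3, 4, 5, 6}
S = {"a": {1, 2, 3}, "b": {4, 5}, "c": {1, 2, 3, 4, 5, 6}}
W = {"a": 1, "b": 1, "c": 4}
print(greedy_partial_cover(U, S, W, 0.0))  # exact cover
print(greedy_partial_cover(U, S, W, 0.2))  # one row may stay uncovered
```

    Allowing gamma > 0 is what makes the rule "partial": in this instance, tolerating one uncovered row removes the expensive attribute from the result.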

  19. Association and Sequence Mining in Web Usage

    Directory of Open Access Journals (Sweden)

    Claudia Elena DINUCA

    2011-06-01

    Web servers worldwide generate a vast amount of information on web users' browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. With the continued growth and proliferation of e-commerce, Web services and Web-based information systems, the volumes of clickstream and user data collected by Web-based organizations in their daily operations have reached astronomical proportions. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. The discovered patterns are usually represented as collections of pages, objects or resources that are frequently accessed by groups of users with common needs or interests. This paper provides an overview of how to use frequent pattern techniques to discover different types of patterns in a Web log database, focusing on association finding as a data mining technique for extracting potentially useful knowledge from web usage data. I implemented in Java, using the NetBeans IDE, a program that identifies page associations from sessions. For exemplification, we used the log files from a commercial web site.
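    The paper's program is written in Java. As an illustration only, the Python sketch below counts pairs of pages that occur in the same session and keeps those that reach a minimum support, which is the core of identifying page associations from sessions. It assumes sessions have already been reconstructed from the access log (that step is not shown). The session data and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def page_pairs(sessions, min_support):
    """Return {(page_a, page_b): support} for page pairs visited in the same session.

    sessions: list of page sequences, one per user session (assumed already
    reconstructed from the log). Repeated visits within a session count once;
    support is the fraction of sessions containing both pages.
    """
    counts = Counter()
    for pages in sessions:
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical sessions.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about", "/home"],
    ["/products", "/cart"],
]
print(page_pairs(sessions, 0.5))
```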

  20. Research on e-commerce commodity recommendation system based on mining algorithm of weighted association rules

    Institute of Scientific and Technical Information of China (English)

    郝海涛; 马元元

    2016-01-01

    To solve the problem of quickly and accurately matching commodities between online shoppers and merchants, an e-commerce commodity recommendation system based on a weighted association rule mining algorithm is studied. To address the shortcomings of the classic Apriori algorithm, a new weighted fuzzy association rule mining algorithm is put forward that ensures the downward closure of frequent itemsets. The workflow of the recommendation system was tested through the structural design of the e-commerce recommendation system, the design of the data preprocessing module, and the design of the recommendation module. The hit rate is used as the evaluation criterion for the different recommendation models, and a comparative analysis of practically collected data was conducted with five-fold cross-validation. The experimental results show that the hit rate of the Top-N products from the association rule set is significantly higher than that of the interest-based and best-seller recommendation methods.
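    The abstract stresses downward closure but does not give the paper's weighting formula. For context, a naive weighted support (plain support multiplied by the average weight of the items) is not downward closed, because adding a heavy item can raise it. The sketch below shows one definition used in the literature that does stay downward closed: weights are attached to transactions rather than items. It is an assumed alternative, not the paper's weighted fuzzy measure. The orders and weights are hypothetical.

```python
def weighted_support(itemset, transactions, weights):
    """Transaction-weighted support: the total weight of transactions containing
    itemset, divided by the total weight of all transactions.

    Every transaction containing a superset also contains each of its subsets,
    so adding items can never increase this value (downward closure), and
    Apriori-style pruning remains valid.
    """
    itemset = frozenset(itemset)
    total = sum(weights)
    hit = sum(w for t, w in zip(transactions, weights) if itemset <= frozenset(t))
    return hit / total

# Hypothetical orders, weighted e.g. by order value.
orders = [{"phone", "case"}, {"phone"}, {"case", "charger"}, {"phone", "case", "charger"}]
order_weights = [3.0, 1.0, 1.0, 5.0]
for items in [{"phone"}, {"phone", "case"}, {"phone", "case", "charger"}]:
    print(sorted(items), weighted_support(items, orders, order_weights))
```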

  1. Mining Compatibility Rules from Irregular Chinese Traditional Medicine Database by Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper aims to mine knowledge and rules on drug compatibility from the prescriptions for curing arrhythmia in a Chinese traditional medicine database using the Apriori algorithm. For data preparation, 1,113 prescriptions for arrhythmia, involving 535 herbs (10,884 herb occurrences in total), were collected into the database. The prescription data were preprocessed through redundancy reduction, normalized storage and knowledge induction according to the preprocessing requirements of data mining. The Apriori algorithm was then used to analyze the data and form the related technical rules and treatment procedures. The experimental results on drug compatibility for curing arrhythmia show that the prescription compatibilities obtained by the Apriori algorithm generally accord with the basic principles of traditional Chinese medicine for arrhythmia. Some previously unreported compatibilities were also discovered, which may serve as a basis for developing new prescriptions for arrhythmia.

  2. Mining unexpected temporal associations: applications in detecting adverse drug reactions.

    Science.gov (United States)

    Jin, Huidong Warren; Chen, Jie; He, Hongxing; Williams, Graham J; Kelman, Chris; O'Keefe, Christine M

    2008-07-01

    In various real-world applications, it is very useful to mine unanticipated episodes in which certain event patterns unexpectedly lead to outcomes, e.g., taking two medicines together sometimes causes an adverse reaction. These unanticipated episodes are usually unexpected and infrequent, which makes existing data mining techniques, mainly designed to find frequent patterns, ineffective. In this paper, we propose unexpected temporal association rules (UTARs) to describe them. To handle the unexpectedness, we introduce a new interestingness measure, residual-leverage, and develop a novel case-based exclusion technique for its calculation. Combining it with an event-oriented data preparation technique to handle the infrequency, we develop a new algorithm, MUTARC, to find pairwise UTARs. MUTARC is applied to generate adverse drug reaction (ADR) signals from real-world healthcare administrative databases. It reliably shortlists not only six known ADRs but also another ADR, flucloxacillin possibly causing hepatitis, which the algorithm designers and experimenters had not known of before the experiments. MUTARC performs much more effectively than existing techniques. This paper illustrates the great potential of this new direction of ADR signal generation from healthcare administrative databases.
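    Residual-leverage builds on the classic leverage measure, P(X and Y) - P(X)P(Y), which is positive when X and Y co-occur more often than they would if independent. The sketch below computes only that standard leverage on hypothetical patient records. It ignores temporal order and does not reproduce the paper's residual-leverage or its case-based exclusion technique, which the abstract does not define.

```python
def leverage(transactions, x, y):
    """Standard leverage of rule X -> Y: P(X and Y) - P(X) * P(Y).

    transactions: list of sets of events (temporal order is ignored here).
    """
    n = len(transactions)
    x, y = frozenset(x), frozenset(y)
    p_x = sum(1 for t in transactions if x <= t) / n
    p_y = sum(1 for t in transactions if y <= t) / n
    p_xy = sum(1 for t in transactions if (x | y) <= t) / n
    return p_xy - p_x * p_y

# Hypothetical patient records: drugs taken and whether a reaction occurred.
records = [
    {"drugA", "drugB", "ADR"},
    {"drugA"},
    {"drugB"},
    {"drugA", "drugB", "ADR"},
    set(),
]
print(leverage(records, {"drugA", "drugB"}, {"ADR"}))
```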

  3. Mining tree-query associations in graphs

    CERN Document Server

    Hoekx, Eveline

    2010-01-01

    New applications of data mining, such as in biology, bioinformatics or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. What is novel about our class of patterns is that they can contain constants, and can contain existential nodes that are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions and citation analysis.

  4. A Novel Texture Classification Procedure by using Association Rules

    Directory of Open Access Journals (Sweden)

    L. Jaba Sheela

    2008-11-01

    Texture can be defined as a local statistical pattern of texture primitives in the observer's domain of interest. Texture classification aims to assign texture labels to unknown textures according to training samples and classification rules. Association rules have been used in various applications over the past decades. They capture both structural and statistical information, and automatically identify the structures that occur most frequently and the relationships that have significant discriminative power. Association rules can therefore be adapted to capture frequently occurring local structures in textures. This paper describes the use of association rules for the texture classification problem. Experimental studies show the effectiveness of the association rules, with an overall success rate of about 98%.

  5. Generalization-based discovery of spatial association rules with linguistic cloud models

    Institute of Scientific and Technical Information of China (English)

    杨斌; 田永青; 朱仲英

    2004-01-01

    Extraction of interesting and general spatial association rules from large spatial databases is an important task in the development of spatial database systems. In this paper, we investigate a generalization-based knowledge discovery mechanism that integrates attribute-oriented induction on nonspatial data with spatial merging and generalization on spatial data. Furthermore, we present linguistic cloud models for knowledge representation and uncertainty handling to enhance the current generalization-based method. With these models, spatial and nonspatial attribute values are well generalized at higher concept levels, allowing the discovery of strong spatial association rules. Combining the cloud-model-based generalization method with the Apriori algorithm for mining association rules from a spatial database shows benefits in effectiveness and flexibility.
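    A linguistic cloud model describes a concept such as "about 25" by three numbers: expectation Ex, entropy En and hyper-entropy He. The usual way to turn these into concrete values is the forward normal cloud generator sketched below. For each "cloud drop", it draws a perturbed entropy En' from N(En, He^2), then a value x from N(Ex, En'^2), and assigns the membership exp(-(x - Ex)^2 / (2 En'^2)). This shows only the standard generator. The paper's generalization procedure and its concept parameters are not reproduced, and the example concept is hypothetical.

```python
import math
import random

def normal_cloud(ex, en, he, n, seed=None):
    """Forward normal cloud generator: return n (x, membership) cloud drops.

    ex: expectation (centre of the linguistic concept)
    en: entropy (spread of the concept)
    he: hyper-entropy (uncertainty of the entropy itself)
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_i = rng.gauss(en, he)        # perturbed entropy for this drop
        x = rng.gauss(ex, abs(en_i))    # drop position
        if en_i == 0:
            mu = 1.0 if x == ex else 0.0
        else:
            mu = math.exp(-((x - ex) ** 2) / (2 * en_i ** 2))  # membership in (0, 1]
        drops.append((x, mu))
    return drops

# Hypothetical concept "temperature about 25".
for x, mu in normal_cloud(25.0, 3.0, 0.3, 5, seed=1):
    print(f"x={x:.2f} membership={mu:.3f}")
```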

  6. Rule Based System for Enhancing Recall for Feature Mining from Short Sentences in Customer Review Documents

    Directory of Open Access Journals (Sweden)

    Tanvir Ahmad

    2012-06-01

    This paper discovers rules for enhancing the recall of sentences containing opinions in customer review documents. It does so by mining features and opinions from different blogs, news sites and review sites. With the advent of numerous web sites posting online reviews and opinions, there has been exponential growth of user-generated content. Since almost all of this content is stored in unstructured or semi-structured formats, mining features and opinions from it has become a challenging task. The paper extracts features, and thereby opinion sentences, using semantic and linguistic analysis of text documents. The polarity of the extracted opinions is established using numeric scores obtained from SentiWordNet. The system shows that the rules discovered earlier are not sufficient to improve recall, because some opinions are expressed in sentences that are not linguistically correct yet still convey the main idea the writer wants to express about a particular product. Our experiment uses a method that first identifies short sentences and then applies rules to those sentences so that recall is enhanced. The paper also applies rules to sentences that are linguistically and syntactically incorrect. The efficacy of the system is established through experiments on customer reviews of four different digital camera models and the iPhone.

  7. Finding Exception For Association Rules Via SQL Queries

    Directory of Open Access Journals (Sweden)

    Luminita DUMITRIU

    2000-12-01

    Finding association rules is mainly based on generating larger and larger frequent set candidates, starting from the frequent attributes in the database. The frequent sets can be organised as part of a lattice of concepts according to the Formal Concept Analysis approach. Since the lattice construction depends on the database contents, the pseudo-intents (see Formal Concept Analysis) are avoided. Association rules between concept intents (closed sets) A=>B are partial implication rules, meaning that some data support A and (not B); fully explaining the data requires finding the exceptions to the association rules. The approach applies to Oracle databases, via SQL queries.
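One way to list the exceptions of a rule A=>B (the transactions that support A but not B) is a set-difference query. The sketch below is a hypothetical illustration that uses Python's built-in `sqlite3` module rather than Oracle, with an assumed `items(tid, item)` table holding one row per item in a transaction; it does not reproduce the concept-lattice construction.

```python
import sqlite3


def exceptions(conn, antecedent, consequent):
    """Transaction ids supporting A and (not B) for the rule A => B: rows that
    contain every antecedent item but not every consequent item.
    Assumes a hypothetical table items(tid, item)."""
    a, b = sorted(set(antecedent)), sorted(set(consequent))
    a_marks = ",".join("?" * len(a))
    b_marks = ",".join("?" * len(b))
    # First SELECT: transactions holding all of A; second: all of B.
    query = f"""
        SELECT tid FROM items WHERE item IN ({a_marks})
        GROUP BY tid HAVING COUNT(DISTINCT item) = ?
        EXCEPT
        SELECT tid FROM items WHERE item IN ({b_marks})
        GROUP BY tid HAVING COUNT(DISTINCT item) = ?
        ORDER BY tid
    """
    rows = conn.execute(query, [*a, len(a), *b, len(b)]).fetchall()
    return [tid for (tid,) in rows]


# Hypothetical toy data: one row per (transaction id, item).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a"), (3, "b"), (3, "c"), (4, "b")],
)
print(exceptions(conn, {"a"}, {"b"}))  # -> [2]
```

The confidence of A=>B then follows directly: of the three transactions containing `a`, one is an exception, giving 2/3.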

  8. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Science.gov (United States)

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify a special type of rules and potential biomarkers from biological datasets using integrated statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate insignificant, low-significance, or redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets, and the corresponding special rules are extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets; thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Each of the other rule mining methods or rule-based classifiers also starts from the same post-discretized data.
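StatBicRM works with maximal frequent closed itemsets rather than all frequent itemsets. The sketch below only illustrates the textbook definitions (closed: no proper superset has the same support; maximal: no proper superset is frequent), assuming the frequent itemsets and their support counts are already available. The biclustering, discretization, and homogeneity steps are not modeled, and the support counts are made up.

```python
def closed_and_maximal(frequent):
    """frequent: {frozenset: support_count} holding all frequent itemsets.
    Returns (closed, maximal) as sets of frozensets."""
    closed, maximal = set(), set()
    for itemset, sup in frequent.items():
        supersets = [s for s in frequent if itemset < s]  # proper supersets
        if not supersets:
            maximal.add(itemset)  # no frequent proper superset exists
        if all(frequent[s] != sup for s in supersets):
            closed.add(itemset)  # no proper superset has equal support
    return closed, maximal


# Hypothetical support counts for frequent itemsets over items a, b, c.
counts = {
    frozenset("a"): 4,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("ab"): 3,
    frozenset("ac"): 2,
}
closed, maximal = closed_and_maximal(counts)
print(sorted("".join(sorted(s)) for s in closed))   # closed itemsets
print(sorted("".join(sorted(s)) for s in maximal))  # maximal itemsets
```

Every maximal itemset is also closed, so keeping only maximal (or closed) itemsets shrinks the output while, for closed itemsets, preserving all support information.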

  9. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik


  10. An Object Extraction Model Using Association Rules and Dependence Analysis

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Extracting objects from legacy systems is a basic step in making a system object-oriented, improving its maintainability and understandability. A new object extraction model using association rules and dependence analysis is proposed. In this model, data are classified by association rules, and the corresponding operations are partitioned by dependence analysis.

  11. Multi-agent-based modeling for extracting relevant association rules using a multi-criteria analysis approach

    Directory of Open Access Journals (Sweden)

    Addi Ait-Mlouk

    2016-06-01

    Association rule mining plays a vital role in knowledge discovery in databases. In most cases, however, real datasets lead to a very large number of rules, which prevents users from selecting the most relevant ones themselves. The difficult task is mining useful, non-redundant rules. Several approaches have been proposed, such as rule clustering, the informative cover method, and quality measures. As another way of selecting relevant association rules, we believe it is necessary to integrate a decisional approach within the knowledge discovery process. Therefore, in this paper, we propose an approach to discover a category of relevant association rules based on multi-criteria analysis. In addition, since the general process of association rule extraction is becoming increasingly complex, we also propose a multi-agent system for modeling the different processes of our approach. We conclude with an empirical study on a set of banking data that illustrates the performance of our approach.

  12. Analysis 320 coal mine accidents using structural equation modeling with unsafe conditions of the rules and regulations as exogenous variables.

    Science.gov (United States)

    Zhang, Yingyu; Shao, Wei; Zhang, Mengjia; Li, Hejun; Yin, Shijiu; Xu, Yingjun

    2016-07-01

    Mining has historically been considered an inherently high-risk industry worldwide. In China, deaths caused by coal mine accidents exceed the combined deaths from all other accidents. Statistics on 320 coal mine accidents in Shandong province show that all of the accidents contain indicators of "unsafe conditions of the rules and regulations," with a frequency of 1590, accounting for 74.3% of the total frequency of 2140. "Unsafe behaviors of the operator" is another important contributory factor, mainly comprising "operator error" and "venturing into dangerous places." A systems analysis approach using structural equation modeling (SEM) was applied to examine the interactions between the contributory factors of coal mine accidents. The analysis leads to three conclusions. (i) "Unsafe conditions of the rules and regulations" affect the "unsafe behaviors of the operator," "unsafe conditions of the equipment," and "unsafe conditions of the environment." (ii) The three influencing factors of coal mine accidents (in descending order of the frequency of effect relations) are "lack of safety education and training," "rules and regulations of safety production responsibility," and "rules and regulations of supervision and inspection." (iii) The three influenced factors of coal mine accidents (in descending order of frequency) are "venturing into dangerous places," "poor workplace environment," and "operator error."

  13. Association Rule Extraction from XML Stream Data for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Juryon Paik

    2014-07-01

    With the advances in wireless sensor networks (WSNs), these networks yield massive volumes of disparate, dynamic, geographically distributed, and heterogeneous data. The data mining community has attempted to extract knowledge from the huge amount of data they generate. However, previous mining work in WSNs has focused on supporting simple relational data structures, such as one table per network, while more complex data structures are needed. This deficiency motivates the use in WSNs of XML, the current de facto format for data exchange and for modeling a wide variety of data sources over the web, in order to encourage the interchangeability of heterogeneous types of sensors and systems. However, mining XML data for WSNs poses two challenges: the endless data flow and the complex tree structure. In this paper, we present several new definitions and techniques related to association rule mining over XML data streams in WSNs. To the best of our knowledge, this work provides the first approach to mining XML stream data that generates frequent tree items without any redundancy.

  14. Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

    Institute of Scientific and Technical Information of China (English)

    Cai-Yan Jia; Xie-Ping Gao

    2005-01-01

    One obstacle to efficient association rule mining is the explosive growth of data sets, since it is costly or impossible to scan large databases, especially multiple times. A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to define effectively and estimate efficiently the error in the algorithm's outcome, and how to determine the required sample size, have remained difficult research questions. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate the sampling error. Then, a new adaptive, on-line, fast sampling strategy, multi-scaling sampling, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, is presented for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and an empirical study show that the sampling strategy achieves a very good speed-accuracy trade-off.
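The abstract does not give its PAC-based error estimator. As a swapped-in illustration of how a sample size can be tied to an error bound, the sketch below uses the standard two-sided Hoeffding inequality for a single itemset's support, n ≥ ln(2/δ) / (2ε²). It is not the multi-scaling method, and bounding many itemsets at once would need a union bound (dividing δ by the number of itemsets).

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that a support estimated from n uniform draws (with
    replacement) is within +/- epsilon of the true support with probability
    at least 1 - delta, from the Hoeffding bound 2*exp(-2*n*eps^2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def sampled_support(transactions, itemset, n, seed=0):
    """Estimate the support of itemset from n randomly drawn transactions."""
    rng = random.Random(seed)
    sample = rng.choices(transactions, k=n)
    return sum(1 for t in sample if itemset <= t) / n


print(hoeffding_sample_size(0.01, 0.05))  # -> 18445
```

With ε = 0.01 and δ = 0.05 the bound asks for 18,445 transactions regardless of how large the database is, which is why sampling can pay off on very large databases.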

  15. Interestingness Rule Mining Algorithm Based on Information Entropy

    Institute of Scientific and Technical Information of China (English)

    金洲; 王儒敬

    2014-01-01

    With the development of data collection and storage techniques, traditional association rule mining generates an excessive number of disordered rules that do not match users' interests. To solve this problem, an interestingness measure for association rules based on information entropy is proposed to mine interesting association rules. Correlation analysis of categorical variables is used to eliminate false and erroneous rules from the primitive rule set, and a framework for evaluating the interestingness of rules based on information entropy is proposed. Since the method does not depend on users' prior knowledge, it can represent the information hidden in the data accurately and without bias. Experiments on both real and synthetic datasets show that the proposed algorithm performs better than traditional algorithms and discovers interesting rules from large databases efficiently.
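The abstract does not state its exact entropy formula. One standard entropy-based way to score how much an antecedent tells you about a consequent is the mutual information between the two binary events "transaction contains X" and "transaction contains Y". The sketch below computes it in bits; treat it as an illustrative stand-in, not the authors' framework.

```python
import math


def rule_mutual_information(transactions, antecedent, consequent):
    """Mutual information (bits) between the binary events 'transaction
    contains antecedent' and 'transaction contains consequent'."""
    n = len(transactions)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for t in transactions:
        counts[(int(antecedent <= t), int(consequent <= t))] += 1
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # 0 * log(0) is taken as 0
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n  # marginal of X
        p_y = (counts[(0, y)] + counts[(1, y)]) / n  # marginal of Y
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# X and Y always occur together: 1 bit. X and Y independent: 0 bits.
print(rule_mutual_information([{"a", "b"}, {"c"}], {"a"}, {"b"}))
print(rule_mutual_information([{"a", "b"}, {"a"}, {"b"}, set()], {"a"}, {"b"}))
```

A rule whose antecedent and consequent are statistically independent scores zero here even if its confidence looks high, which is the kind of spurious rule an entropy-based filter is meant to remove.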

  16. [Analysis on medication rules of state medical master Yan Zhenghua from prescriptions with citri reticulatae pericarpium based on data mining].

    Science.gov (United States)

    Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Bing; Zhang, Xiao-Meng; Yang, Bing; Sheng, Xiao-Guang

    2014-02-01

    Prescriptions containing Citri Reticulatae Pericarpium formulated by Professor Yan were collected to build a database in the traditional Chinese medicine (TCM) inheritance assistance system. Data mining methods such as the Apriori algorithm were then used to obtain from the database the frequency of single medicines, the frequency of drug combinations, the association rules between drugs, and the core drug combinations. Analysis of 1,027 prescriptions containing Citri Reticulatae Pericarpium showed that they were commonly used to treat stomach aches, cough, and other syndromes. The most frequent drug combinations included "Citri Reticulatae Pericarpium-Poria" and "Paeoniae Radix Rubra-Citri Reticulatae Pericarpium". Drug association rules with a confidence of 1 included "Glycyrrhizae Radix ex Rhizoma --> Citri Reticulatae Pericarpium", "Paeoniae Alba Radix-Cyperi Rhizoma --> Citri Reticulatae Pericarpium", and "Poria --> Citri Reticulatae Pericarpium". The drugs in Professor Yan's prescriptions containing Citri Reticulatae Pericarpium mostly regulate the flow of Qi and invigorate blood circulation, reflecting clear reasoning in how the prescriptions were composed.
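The frequency and confidence computations described above can be sketched as follows. The herb labels "A", "B", "C" are hypothetical placeholders (not data from the study), antecedents are limited to one or two items to match the rule shapes quoted in the abstract, and the thresholds are assumptions.

```python
from collections import Counter
from itertools import combinations


def high_confidence_rules(prescriptions, min_count, min_conf):
    """Rules X -> y where X has one or two items, X plus y occurs in at least
    min_count prescriptions, and confidence count(X+y) / count(X) >= min_conf."""
    counts = Counter()
    for p in prescriptions:
        items = sorted(set(p))
        for size in (1, 2, 3):  # every itemset needed for 1- or 2-item antecedents
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    rules = []
    for itemset, c in counts.items():
        if len(itemset) < 2 or c < min_count:
            continue
        for y in itemset:
            x = itemset - {y}
            conf = c / counts[x]
            if conf >= min_conf:
                rules.append((tuple(sorted(x)), y, conf))
    return sorted(rules)


# Hypothetical placeholder prescriptions, one set of herbs each.
prescriptions = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"B"}]
print(high_confidence_rules(prescriptions, min_count=2, min_conf=1.0))
```

A confidence of 1 means every prescription that contains the antecedent also contains the consequent, which is how rules such as "Poria --> Citri Reticulatae Pericarpium" arise when one herb is present in nearly every prescription of the collection.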

  17. Research on spatial state conversion rule mining and stochastic predicting based on CA

    Science.gov (United States)

    Li, Xinyun; Kong, Xiangqiang

    2007-06-01

    Spatial dynamic prediction in GIS is the process of spatial calculation that infers future thematic maps from historical thematic maps; it is a space-time calculation from map to map. Spatial dynamic prediction has great application value in land planning, urban land-use planning, and town planning, but current methods and techniques remain imperfect. The main technical difficulty is the mining and expression of spatial state conversion rules. To address the deficiencies of spatial dynamic prediction using CA, a method that mines spatial state conversion rules through spatial data mining was put forward. A stochastic simulation mechanism was incorporated into the prediction calculation based on the state conversion rules. The resulting predictions were more rational, and the relation between the prediction steps and the time course was clearer. The method was applied to predicting the change in the spatial structure of urban land use in Jinan: urban land-use change maps for 2006 and 2010 were predicted from the land-use maps of 1998 and 2002, and analysis showed the results to be reasonable.

  18. Design and implementation of data mining tools

    CERN Document Server

    Thuraisingham, Bhavani; Awad, Mamoun

    2009-01-01

    DATA MINING TECHNIQUES AND APPLICATIONS: Introduction; Trends; Data Mining Techniques and Applications; Data Mining for Cyber Security: Intrusion Detection; Data Mining for Web: Web Page Surfing Prediction; Data Mining for Multimedia: Image Classification; Organization of This Book; Next Steps; Data Mining Techniques; Introduction; Overview of Data Mining Tasks and Techniques; Artificial Neural Networks; Support Vector Machines; Markov Model; Association Rule Mining (ARM); Multiclass Problem; Image Mining; Summary; Data Mining Applications; Introduction; Intrusion Detection; Web Page Surfing Prediction; Image Classification; Summary; DATA MI

  19. NIA2: A fast indirect association mining algorithm

    Institute of Scientific and Technical Information of China (English)

    NI Min; XU Xiao-fei; DENG Sheng-chun; WEN Xiao-xian

    2005-01-01

    Indirect association is a high-level relationship between items and frequent itemsets in data. Indirect associations have many potential applications, such as database marketing, intelligent data analysis, web-log analysis, and recommender systems. Existing indirect association mining algorithms are mostly based on post-processing the discovered frequent itemsets: all frequent itemsets must first be generated, and then they are filtered and joined to form indirect associations. We previously presented an indirect association mining algorithm (NIA) based on the anti-monotonicity of indirect associations, whereby candidate indirect associations of size k can be generated directly from candidate indirect associations of size k-1, without generating all frequent itemsets. We also used the frequent itempair support matrix to reduce the time and memory needed by the algorithm. In this paper, a novel algorithm (NIA2) is introduced that generates indirect association patterns between itempairs through one-item mediator sets taken from the frequent itempair support matrix. A notion of mediator set support threshold is also presented. NIA2 mines indirect association patterns directly from the dataset, without generating all frequent itemsets. The frequent itempair support matrix and the use of tm as the support threshold for mediator sets can significantly reduce the cost of join operations and of the search process compared with existing algorithms. Experiments on a real-world web log dataset show that NIA2 is one order of magnitude faster than existing algorithms.
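The basic idea of an indirect association can be illustrated with an itempair support matrix: a pair {a, b} that rarely co-occurs (count below `t_s`) although a and b each co-occur often (count at least `t_m`) with the same mediator item. This is a simplified sketch restricted to single-item mediators, with thresholds given as absolute counts and the usual dependence condition omitted; it is not NIA2 itself.

```python
from itertools import combinations


def pair_support_matrix(transactions):
    """Co-occurrence counts for every unordered item pair (a, b) with a < b."""
    matrix = {}
    for t in transactions:
        for a, b in combinations(sorted(set(t)), 2):
            matrix[(a, b)] = matrix.get((a, b), 0) + 1
    return matrix


def indirect_pairs(transactions, t_s, t_m):
    """(a, b, mediator) triples: {a, b} co-occurs fewer than t_s times, while
    a and b each co-occur with the same single mediator item at least t_m
    times. Simplified: one-item mediators, no dependence condition."""
    matrix = pair_support_matrix(transactions)
    items = sorted({i for t in transactions for i in t})

    def count(x, y):
        return matrix.get((min(x, y), max(x, y)), 0)

    result = []
    for a, b in combinations(items, 2):
        if count(a, b) >= t_s:
            continue  # the pair itself is frequent, so it is not indirect
        for m in items:
            if m not in (a, b) and count(a, m) >= t_m and count(b, m) >= t_m:
                result.append((a, b, m))
    return result


# Hypothetical baskets: x and y never co-occur, but both co-occur with m.
baskets = [{"x", "m"}, {"x", "m"}, {"y", "m"}, {"y", "m"}]
print(indirect_pairs(baskets, t_s=1, t_m=2))
```

All the lookups come from the pair matrix, so no frequent itemsets beyond pairs are ever materialized; that is the same motivation the abstract gives for working from the itempair support matrix.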

  20. Data Mining for Gene Networks Relevant to Poor Prognosis in Lung Cancer via Backward-Chaining Rule Induction

    Directory of Open Access Journals (Sweden)

    Zhihua Chen

    2007-01-01

    We use Backward Chaining Rule Induction (BCRI), a novel data mining method for hypothesizing causative mechanisms, to mine lung cancer gene expression array data for mechanisms that could impact survival. Initially, a supervised learning system is used to generate a prediction model in the form of “IF-THEN” style rules. Next, each antecedent (i.e., an IF condition) of a previously discovered rule becomes the outcome class for a subsequent application of supervised rule induction. This step is repeated until a termination condition is satisfied. “Chains” of rules are created by working backward from an initial condition (e.g., survival status). Through this iterative process of “backward chaining,” BCRI searches for rules that describe plausible gene interactions for subsequent validation. Thus, BCRI is a semi-supervised approach that constrains the search through the vast space of plausible causal mechanisms by using a top-level outcome to kick-start the process. We demonstrate the general BCRI task sequence, how to implement it, the validation process, and how BCRI rules discovered from lung cancer microarray data can be combined with prior knowledge to generate hypotheses about functional genomics.

  1. Optimising synaptic learning rules in linear associative memories.

    Science.gov (United States)

    Dayan, P; Willshaw, D J

    1991-01-01

    Associative matrix memories with real-valued synapses have been studied in many incarnations. We consider how the signal/noise ratio for associations depends on the form of the learning rule, and we show that a covariance rule is optimal. Two other rules, which have been suggested in the neurobiology literature, are asymptotically optimal in the limit of sparse coding. The results appear to contradict a line of reasoning particularly prevalent in the physics community. It turns out that the apparent conflict is due to the adoption of different underlying models. Ironically, they perform identically at their coincident optima. We give details of the mathematical results, and discuss some other possible derivations and definitions of the signal/noise ratio.

  2. Collaborative Data Mining Tool for Education

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; Gea, Miguel; de Castro, Carlos

    2009-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the continuous improvement of e-learning courses, allowing teachers with similar course profiles to share and score the discovered information. This mining tool is intended for instructors who are not experts in data mining, such that its…

  3. Flood risk zoning using a rule mining based on ant colony algorithm

    Science.gov (United States)

    Lai, Chengguang; Shao, Quanxi; Chen, Xiaohong; Wang, Zhaoli; Zhou, Xiaowen; Yang, Bing; Zhang, Lilan

    2016-11-01

    Risk assessment is a preliminary step in flood management and mitigation, and risk zoning provides a quantitative measure of flood risk. The difficulty in flood risk zoning lies in dealing with the complicated non-linear relationship between indices and risk levels. To solve this problem, this paper applies a rule-mining ant colony algorithm (Ant-Miner) to map regional flood risk at grid scale. For the case study in the Dongjiang River Basin in Southern China, 11 and 14 indices (without and with socio-economic indices, respectively) are chosen to construct the Ant-Miner-based zoning model. The results show that Ant-Miner achieves higher accuracy than the decision tree (DT) method and produces simpler rules that can be used to generate flood risk zoning maps quickly and easily; compared with random forest (RF) and fuzzy comprehensive evaluation (FCE), Ant-Miner has significant advantages in both reducing implementation steps and saving computing time. Although the comprehensive measure and the natural hazard measure of flood risk are distributed similarly over the entire region, the former, which considers socio-economic indices, is more reasonable in terms of real impact on the natural environment and the socio-economy. The high-risk areas obtained in this paper match well with the integrated risk zoning map and with the inundation areas of historical floods, suggesting that the proposed Ant-Miner method is capable of zoning flood risk at grid scale. This study shows the potential of a novel and successful approach to flood risk zoning. The evaluation results provide a reference for flood risk management, prevention, and reduction of natural disasters in the study basin.

  4. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal

  5. Study on the change rule of groundwater level and its impacts on vegetation at arid mining area

    Institute of Scientific and Technical Information of China (English)

    LEI Shao-gang; BIAN Zheng-fu; ZHANG Ri-chen; LI Lin

    2007-01-01

    The shallow groundwater in the Shendong mining area has been disrupted by large-scale underground mining activities. Taking the 32201 working face as the research area, the change rule of groundwater level and aquifer thickness under mining impact was analyzed using a large amount of water-level observation data. Then, the impacts of groundwater level change on vegetation were analyzed using the theory of the relationship between groundwater and vegetation in arid areas. The results show that water-proof mining changes the aquifer structure and the conditions of water supply, flow, and drainage. Over two years, the groundwater level recovered only slightly compared with the original level. However, the large change in groundwater level did not have a notable influence on the vegetation of this mining area, and further study indicates that groundwater level change affects vegetation only under certain conditions. When evaluating the influence of groundwater level change, the plant ecological water level, the warning water level, and the spatial distribution of both the original and the mining-impacted groundwater level should be considered together.

  6. Combinatorial Approach of Associative Classification

    OpenAIRE

    P. R. Pal; R.C. Jain

    2010-01-01

    Association rule mining and classification are two important data mining techniques in the knowledge discovery process. Integrating the two has produced class association rule mining, or associative classification, techniques, which in many cases have shown better classification accuracy than conventional classifiers. Motivated by this, in this paper we have explored and applied combinatorial mathematics in class association rule mining. Our algorithm is based on producing co...
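Associative classification can be sketched in the CBA style: mine class association rules (itemset -> class) with support and confidence thresholds, rank them, and predict with the first rule that matches. The paper's combinatorial algorithm is truncated in the abstract, so this is a generic, assumed illustration; the weather-style rows and thresholds are made up.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(rows, min_sup, min_conf, max_len=2):
    """rows: list of (set_of_items, class_label). Returns class association
    rules (antecedent, label, confidence, support_count), best first."""
    ante_counts, rule_counts = Counter(), Counter()
    for items, label in rows:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                ante = frozenset(combo)
                ante_counts[ante] += 1
                rule_counts[(ante, label)] += 1
    rules = []
    for (ante, label), c in rule_counts.items():
        conf = c / ante_counts[ante]
        if c >= min_sup and conf >= min_conf:
            rules.append((ante, label, conf, c))
    # Rank: higher confidence, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))
    return rules


def classify(rules, items, default):
    """Predict with the highest-ranked rule whose antecedent matches."""
    for ante, label, _, _ in rules:
        if ante <= items:
            return label
    return default


# Hypothetical training rows.
rows = [
    ({"sunny", "hot"}, "no"),
    ({"sunny", "mild"}, "no"),
    ({"rain", "mild"}, "yes"),
    ({"overcast", "hot"}, "yes"),
]
rules = mine_class_rules(rows, min_sup=2, min_conf=0.8)
print(rules)
print(classify(rules, {"sunny", "cool"}, default="yes"))
```

The default class covers records that no rule matches; CBA-style classifiers usually pick it from the training records left uncovered, while here it is simply passed in.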

  7. CTSS: A Tool for Efficient Information Extraction with Soft Matching Rules for Text Mining

    Directory of Open Access Journals (Sweden)

    A. Christy

    2008-01-01

    The abundance of information available digitally in the modern world has created a demand for structured information. The problem of text mining, which deals with discovering useful information from unstructured text, has attracted the attention of researchers. The role of Information Extraction (IE) software is to identify relevant information in texts, extracting information from a variety of sources and aggregating it to create a single view. Information extraction systems depend on particular corpora and have poor recall. Therefore, making the system domain-independent while improving recall is an important challenge for IE. In this research, the authors propose a domain-independent information extraction algorithm, called SOFTRULEMINING, for extracting the aim, methodology, and conclusion from technical abstracts. The algorithm was implemented by combining a trigram model with soft-matching rules. A tool, CTSS, was built using SOFTRULEMINING and tested on technical abstracts from www.computer.org and www.ansinet.org; the tool improved recall, and therefore precision, in comparison with other search engines.

  8. Application of rule-based data mining techniques to real time ATLAS Grid job monitoring data

    CERN Document Server

    Ahrens, R; The ATLAS collaboration; Kalinin, S; Maettig, P; Sandhoff, M; dos Santos, T; Volkmer, F

    2012-01-01

    The Job Execution Monitor (JEM) is job-centric grid job monitoring software developed at the University of Wuppertal and integrated into the pilot-based “PanDA” job brokerage system, which serves physics analysis and Monte Carlo event production for the ATLAS experiment on the Worldwide LHC Computing Grid (WLCG). With JEM, job progress and grid worker node health can be supervised in real time by users, site admins, and shift personnel. Imminent error conditions can be detected early, and countermeasures can be initiated immediately by the job’s owner. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job and grid worker node misbehaviour. Shifters can use the same aggregated data to react quickly to site error conditions and broken production tasks. In this work, the application of novel data-centric rule-based methods and data mining techniques to the real-time monitoring data is discussed. The usage of such automatic inference techniques on monitorin...

  9. Mining φ-Frequent Itemset Using FP-Tree

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. However, traditional association rule mining often derives many rules that users are not interested in. This paper reports a generalization of association rule mining called φ-association rule mining. It allows users to have different interests in different itemsets, as real applications require. It can also help derive interesting rules and substantially reduce the number of rules. An algorithm based on the FP-tree for mining φ-frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.

  10. A New Algorithm of Mining Classification Rules Based on Decision Table

    Institute of Scientific and Technical Information of China (English)

    谢娟英; 冯德民

    2003-01-01

    Mining classification rules is an important field in data mining. The decision table of rough set theory is an efficient tool for mining classification rules. This paper introduces the elementary concepts of the decision table in rough set theory. A new algorithm for mining classification rules based on the decision table is presented, along with a discernibility function for reducing attribute values and a new principle for rule accuracy. An example of its application to a car classification problem is included, and the accuracy of the discovered rules is analyzed. Potential fields of application in data mining are also discussed.

  11. Discovery of Web Topic-Specific Association Rules

    Institute of Scientific and Technical Information of China (English)

    杨沛; 郑启伦; 彭宏

    2003-01-01

    The topology of topic-specific websites contains rich hidden information for data mining. A new topic-specific association rule mining algorithm is proposed to further research in this area. The key idea is to analyze the frequent hyperlink relations between pages of different topics. In the topic-specific area, if pages of one topic are frequently hyperlinked by pages of another topic, we consider the two topics relevant. Likewise, if pages of two different topics are frequently hyperlinked together by pages of another topic, we consider those two topics relevant. Initial experiments show that this algorithm performs quite well in guiding a topic-specific crawling agent and that it can be applied to further discovery and mining on topic-specific websites.
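The two relevance heuristics in the abstract can be approximated by counting hyperlinks at the topic level. In this hypothetical sketch, `page_topic` maps each page to a topic; the first function counts direct cross-topic links, and the second counts how many source pages of a third topic link to pages of two other topics. The threshold and the toy site are assumptions.

```python
from collections import Counter
from itertools import combinations


def linked_topics(links, page_topic, min_count):
    """Topic pairs (A, B) where pages of topic A link to pages of topic B
    at least min_count times (first heuristic)."""
    counts = Counter()
    for src, dst in links:
        a, b = page_topic[src], page_topic[dst]
        if a != b:
            counts[(a, b)] += 1
    return sorted(pair for pair, c in counts.items() if c >= min_count)


def co_linked_topics(links, page_topic, min_count):
    """Topic pairs {A, B} co-linked by at least min_count source pages whose
    own topic is neither A nor B (second heuristic)."""
    targets = {}
    for src, dst in links:
        targets.setdefault(src, set()).add(page_topic[dst])
    counts = Counter()
    for src, topics in targets.items():
        others = sorted(t for t in topics if t != page_topic[src])
        for pair in combinations(others, 2):
            counts[pair] += 1
    return sorted(pair for pair, c in counts.items() if c >= min_count)


# Hypothetical toy site.
page_topic = {"p1": "sports", "p2": "health", "p3": "sports", "q1": "news", "q2": "news"}
links = [("p1", "p2"), ("p3", "p2"), ("q1", "p1"), ("q1", "p2"), ("q2", "p3"), ("q2", "p2")]
print(linked_topics(links, page_topic, 2))
print(co_linked_topics(links, page_topic, 2))
```

Aggregating from pages to topics is what turns sparse page-level links into frequency counts that a support-style threshold can act on.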

  12. DMET-Miner: Efficient discovery of association rules from pharmacogenomic data.

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro H; Cannataro, Mario

    2015-08-01

    Microarray platforms enable the investigation of allelic variants that may be correlated to phenotypes. Among those, the Affymetrix DMET (Drug Metabolism Enzymes and Transporters) platform enables the simultaneous investigation of all the genes that are related to drug absorption, distribution, metabolism and excretion (ADME). Although recent studies have demonstrated the effectiveness of DMET data for studying drug response or toxicity in clinical studies, there is a lack of tools for the automatic analysis of DMET data. In previous work we developed DMET-Analyzer, a methodology and supporting platform able to automate the statistical study of allelic variants, which has been validated in several clinical studies. Although DMET-Analyzer can correlate a single variant for each probe (related to a portion of a gene) through the Fisher test, it cannot discover multiple associations among allelic variants, because its underlying statistical analysis strategy focuses on one variant at a time. To overcome those limitations, we propose a new analysis methodology for DMET data based on association rule mining, and an efficient implementation of this methodology, named DMET-Miner. DMET-Miner extends the DMET-Analyzer tool with data mining capabilities and correlates the presence of a set of allelic variants with the conditions of patients' samples by exploiting association rules. To cope with the high number of frequent itemsets generated when considering large clinical studies based on DMET data, DMET-Miner uses an efficient data structure and implements an optimized search strategy that reduces the search space and the execution time. Preliminary experiments on synthetic DMET datasets show that DMET-Miner outperforms off-the-shelf data mining suites such as the FP-Growth implementations available in Weka and RapidMiner. To demonstrate the biological relevance of the extracted association rules and the effectiveness of the

  13. Mining Branching Rules from Past Survey Data with an Illustration Using a Geriatric Assessment Survey for Older Adults with Cancer

    Directory of Open Access Journals (Sweden)

    Daniel R. Jeske

    2016-05-01

    We construct a fast data mining algorithm that can be used to identify high-frequency response patterns in historical surveys. Identification of these patterns leads to the derivation of question branching rules that shorten the time required to complete a survey. The data mining algorithm allows the user to control the error rate that is incurred through the use of implied answers that go along with each branching rule. The context considered is binary response questions, which can be obtained from multi-level response questions through dichotomization. The algorithm is illustrated by the analysis of four sections of a geriatric assessment survey used by oncologists. Reductions in the number of questions that need to be asked in these four sections range from 33% to 54%.
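    The notion of a branching rule with a controlled error rate can be illustrated with a simplified pairwise sketch. This is an assumption for illustration, not the paper's algorithm: a rule reads "if Q_i is answered a, skip Q_j and imply answer b", where b is the majority answer to Q_j among past respondents who answered a to Q_i, and the rule is kept only if the implied answer is wrong for at most max_error of them. The function name and data layout are hypothetical.

```python
from itertools import permutations


def branching_rules(responses, max_error, min_count=1):
    """Return (i, a, j, b, error_rate) tuples for pairwise branching rules.

    responses: list of equal-length lists of 0/1 answers, one list per respondent.
    """
    n_questions = len(responses[0])
    rules = []
    for i, j in permutations(range(n_questions), 2):
        for a in (0, 1):
            group = [r for r in responses if r[i] == a]
            if len(group) < min_count:
                continue
            ones = sum(r[j] for r in group)
            b = 1 if ones * 2 >= len(group) else 0   # majority answer to Q_j (ties -> 1)
            errors = len(group) - ones if b == 1 else ones
            error_rate = errors / len(group)         # share of respondents the implied answer gets wrong
            if error_rate <= max_error:
                rules.append((i, a, j, b, error_rate))
    return rules
```

    For example, with answers [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]] and max_error=0.0, questions 0 and 1 always agree, so each can be skipped once the other is answered, while question 2 yields no rule. Raising max_error trades survey length against the accuracy of implied answers.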

  14. An application of improved association rules based on an association graph in a recommendation system

    Institute of Scientific and Technical Information of China (English)

    王林林; 石冰; 胡元; 邢海华

    2011-01-01

    This paper presents an improved association rule mining method for recommendation models, together with a custom definition of page weights. We improve the association-graph-based association rule mining algorithm by applying page weights to the mining of association rules. The algorithm mines rules from Web log data obtained after preprocessing; page weights are obtained by applying a normal distribution function to the processed data. The page weights are then used to recalculate support, and the resulting support is applied in the improved rule mining algorithm, yielding a weight-based association rule algorithm over an association graph.
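    The abstract does not give the exact weighting or support formulas, so the following is only a sketch under stated assumptions. Page weights are taken (hypothetically) as the normal CDF of each page's mean dwell time, fitted over all pages. Weighted support is taken as the mean weight of the pages in an itemset times the fraction of sessions containing it, a common weighted-support definition that may differ from the paper's. Function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def page_weights(durations):
    """Hypothetical weighting: pass each page's dwell time through a normal CDF fitted
    to all pages, so pages viewed longer than average get weights above 0.5.
    Requires at least two distinct durations (a zero spread is rejected)."""
    values = list(durations.values())
    dist = NormalDist(mean(values), pstdev(values))
    return {page: dist.cdf(t) for page, t in durations.items()}


def weighted_support(itemset, sessions, weights):
    """Mean weight of the itemset's pages times the fraction of sessions containing them all."""
    items = set(itemset)
    frequency = sum(1 for s in sessions if items <= set(s)) / len(sessions)
    return mean(weights[p] for p in items) * frequency
```

    With dwell times {"a": 10, "b": 20, "c": 30}, page b sits at the mean and gets weight 0.5, and because b appears in every session of [["a", "b"], ["b", "c"], ["b"]], its weighted support is 0.5. The weighted support would then replace plain support when the association graph is pruned.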

  15. A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems

    Directory of Open Access Journals (Sweden)

    Brian Foo

    2009-01-01

    Networks of classifiers can offer improved accuracy and scalability over single classifiers by utilizing distributed processing resources and analytics. However, they also pose a unique combination of challenges. First, classifiers may be located across different sites that are willing to cooperate to provide services, but are unwilling to reveal proprietary information about their analytics, or are unable to exchange their analytics due to the high transmission overheads involved. Furthermore, processing of voluminous stream data across sites often requires load shedding approaches, which can lead to suboptimal classification performance. Finally, real stream mining systems often exhibit dynamic behavior and thus necessitate frequent reconfiguration of classifier elements to ensure acceptable end-to-end performance and delay under resource constraints. Under such informational constraints, resource constraints, and unpredictable dynamics, utilizing a single, fixed algorithm for reconfiguring classifiers can often lead to poor performance. In this paper, we propose a new optimization framework aimed at developing rules for choosing algorithms to reconfigure the classifier system under such conditions. We provide an adaptive, Markov model-based solution for learning the optimal rule when stream dynamics are initially unknown. Furthermore, we discuss how rules can be decomposed across multiple sites and propose a method for evolving new rules from a set of existing rules. Simulation results are presented for a speech classification system to highlight the advantages of using the rules-based framework to cope with stream dynamics.

  16. NV - Assessment of wildlife hazards associated with mine pit lakes

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Several open pit mines in Nevada lower groundwater to mine ore below the water table. After mining, the pits partially fill with groundwater to form pit lakes. Water...

  17. Finding Influential Users in Social Media Using Association Rule Learning

    Directory of Open Access Journals (Sweden)

    Fredrik Erlandsson

    2016-04-01

    Influential users play an important role in online social networks, since users tend to have an impact on one another. Therefore, the proposed work analyzes users and their behavior in order to identify influential users and predict user participation. Normally, the success of a social media site depends on the activity level of its participating users. For both online social networking sites and individual users, it is of interest to find out whether a topic will be interesting. In this article, we propose association learning to detect relationships between users. To verify the findings, several experiments based on social network analysis were executed, in which the most influential users identified by association rule learning were compared with the results from Degree Centrality and PageRank Centrality. The results clearly indicate that it is possible to identify the most influential users using association rule learning. In addition, the results indicate a lower execution time compared to state-of-the-art methods.

  18. Finding Influential Users in Social Media Using Association Rule Learning

    Science.gov (United States)

    Erlandsson, Fredrik; Bródka, Piotr; Borg, Anton; Johnson, Henric

    2016-04-01

    Influential users play an important role in online social networks, since users tend to have an impact on one another. Therefore, the proposed work analyzes users and their behavior in order to identify influential users and predict user participation. Normally, the success of a social media site depends on the activity level of its participating users. For both online social networking sites and individual users, it is of interest to find out whether a topic will be interesting. In this article, we propose association learning to detect relationships between users. To verify the findings, several experiments based on social network analysis were executed, in which the most influential users identified by association rule learning were compared with the results from Degree Centrality and PageRank Centrality. The results clearly indicate that it is possible to identify the most influential users using association rule learning. In addition, the results indicate a lower execution time compared to state-of-the-art methods.

  19. 3D reconstruction method and connectivity rules of fracture networks generated under different mining layouts

    Institute of Scientific and Technical Information of China (English)

    Zhang Ru; Ai Ting; Li Hegui; Zhang Zetian; Liu Jianfeng

    2013-01-01

    In the current research, a series of triaxial tests simulating three typical mining layouts (i.e., top-coal caving, non-pillar mining and protected coal seam mining) were conducted on coal using the MTS815 Flex Test GT rock mechanics test system, and the fracture networks in the broken coal samples were investigated qualitatively and quantitatively using CT scanning and 3D reconstruction techniques. This work aimed to provide a detailed description of the micro-structure and fracture-connectivity characteristics of ruptured coal samples under different mining layouts. The results show that (i) under the protected coal seam mining layout, the coal specimens fail in a compression-shear manner, whereas (ii) tension-shear failure is observed under the top-coal caving and non-pillar mining layouts. Investigating the connectivity of the generated fractures in the direction of σ1 under the different mining layouts shows that the fracture connectivity of the samples corresponding to the non-pillar mining layout was the highest.

  20. Urban association rules: uncovering linked trips for shopping behavior

    CERN Document Server

    Yoshimura, Yuji; Hobin, Juan N Bautista; Ratti, Carlo; Blat, Josep

    2016-01-01

    In this article, we introduce the method of urban association rules and its uses for extracting frequently appearing combinations of stores that are visited together to characterize shoppers' behaviors. The Apriori algorithm is used to extract the association rules (i.e., if -> result) from customer transaction datasets in a market-basket analysis. An application to our large-scale and anonymized bank card transaction dataset enables us to output linked trips for shopping all over the city: the method enables us to predict the other shops most likely to be visited by a customer given a particular shop that was already visited as an input. In addition, our methodology can consider all transaction activities conducted by customers for a whole city in addition to the location of stores dispersed in the city. This approach enables us to uncover not only simple linked trips such as transition movements between stores but also the edge weight for each linked trip in the specific district. Thus, the proposed methodo...
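    Since the abstract names the Apriori algorithm applied to transaction "baskets", here is a compact, self-contained sketch of classic Apriori (level-wise candidate generation with subset pruning, followed by rule generation). The store names are hypothetical, and the paper's spatial extensions (district-level edge weights) are not shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    counts = {frozenset([i]): support(frozenset([i])) for t in transactions for i in t}
    current = {s for s, sup in counts.items() if sup >= min_support}
    frequent = {s: counts[s] for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        counts = {c: support(c) for c in candidates}
        current = {c for c, sup in counts.items() if sup >= min_support}
        frequent.update({c: counts[c] for c in current})
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Emit (antecedent, consequent, support, confidence) for each confident rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]  # antecedent is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), sup, confidence))
    return rules
```

    For the baskets [{"cafe", "bookstore"}, {"cafe", "bookstore", "bakery"}, {"cafe", "bakery"}, {"bookstore"}] with min_support=0.5, the pair {bookstore, bakery} is infrequent, so the triple is pruned without being counted; at min_confidence=0.9 the only surviving rule is bakery -> cafe, read as "a shopper who visits the bakery also visits the cafe".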

  1. Research on Association Rule Mining Combined with SOM

    Institute of Scientific and Technical Information of China (English)

    景波; 刘莹; 陈耿

    2014-01-01

    To rapidly discover audit trails in massive data, the FMA data mining algorithm is used to quickly extract association rules from the audited data and from an audit expert experience base. A CLARANS algorithm improved with a self-organizing neural network then partitions the rules extracted from the expert experience base into groups of similar rules. Finally, the association rule set of the audited entity is compared with the similar-rule groups from expert experience in terms of relative strength, approach rate and value rate, yielding the final set of audit trails.

  2. Algorithm for Generating Non-Redundant Association Rules

    Institute of Scientific and Technical Information of China (English)

    高峰; 谢剑英

    2001-01-01

    The discovery of association rules is an important research topic in data mining, but traditional algorithms generate a large number of redundant rules. This paper presents GNRR, a general algorithm that generates non-redundant association rules from maximal frequent itemsets. Using the redundancy relationships among rules, it mines different rules in a specific order, eliminating redundancy between rules and reducing the number of discovered rules exponentially.
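    The abstract does not define its redundancy relation, so this sketch uses one common notion (minimal antecedent, maximal consequent) and is not the GNRR algorithm itself. A rule A -> C is treated as redundant when another rule A2 -> C2 has the same support and confidence with A2 a subset of A and C a subset of C2, because the second rule says at least as much. The function name is hypothetical.

```python
from math import isclose


def remove_redundant(rules):
    """Drop rules implied by a rule with a smaller antecedent and a larger consequent.

    rules: list of (antecedent, consequent, support, confidence) with frozenset sides.
    """
    kept = []
    for a, c, sup, conf in rules:
        redundant = any(
            (a2, c2) != (a, c)
            and a2 <= a                # other rule needs no more premises
            and c <= c2                # ...and concludes at least as much
            and isclose(s2, sup)
            and isclose(f2, conf)
            for a2, c2, s2, f2 in rules
        )
        if not redundant:
            kept.append((a, c, sup, conf))
    return kept
```

    Given x -> {y, z}, x -> y and {x, y} -> z, all with support 0.4 and confidence 1.0, only x -> {y, z} survives. The other two follow from it. This is a quadratic post-filter; generating only the non-redundant rules directly from maximal frequent itemsets, as the paper describes, avoids producing the redundant ones at all.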

  3. An improved predictive association rule based classifier using gain ratio and T-test for health care data diagnosis

    Indian Academy of Sciences (India)

    M Nandhini; S N Sivanandam

    2015-09-01

    Health care data diagnosis is a significant task that must be executed precisely and requires considerable experience and domain knowledge. Traditional symptom-based disease diagnosis may lead to false presumptions. Recently, Associative Classification (AC), the combination of association rule mining and classification, has received attention in health care applications, which demand maximum accuracy. Although several AC techniques exist, they fall short in generating the quality rules needed to build an efficient associative classifier. This paper aims to enhance the accuracy of the existing CPAR (Classification based on Predictive Association Rules) algorithm by generating quality rules using Gain Ratio. Health care applications mostly deal with high-dimensional datasets, and high dimensionality leads to unreliable estimates in disease diagnosis. Dimensionality reduction is commonly applied as a preprocessing step before classification to improve classifier accuracy; it eliminates redundant and insignificant dimensions while keeping informative ones without information loss. In this work, dimensionality reduction by T-test and by reduct sets (or simply reducts) is performed as a preprocessing step before the CPAR and CPAR using Gain Ratio (CPAR-GR) algorithms. An investigation was also performed to determine the impact of the T-test and reducts on CPAR and CPAR-GR. This paper synthesizes existing work in AC and also discusses the factors that influence the performance of CPAR and CPAR-GR. Experiments were conducted using six health care datasets from the UCI machine learning repository. Based on the experiments, CPAR-GR with T-test yields better classification accuracy than CPAR.
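    Gain Ratio, the measure CPAR-GR uses to pick rule literals, is information gain normalized by the split information of the attribute. The standard computation for one categorical attribute is sketched below. How the paper plugs this score into CPAR's rule generation is not shown, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting `labels` by attribute `values`, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0
```

    An attribute that separates "sick" from "well" perfectly, such as values ["hi", "hi", "lo", "lo"] against labels ["sick", "sick", "well", "well"], scores 1.0, while an attribute independent of the label scores 0. Dividing by split information penalizes attributes with many distinct values, which plain information gain tends to favor.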

  4. Improving Leung's bidirectional learning rule for associative memories.

    Science.gov (United States)

    Lenze, B

    2001-01-01

    Leung (1994) introduced a perceptron-like learning rule to enhance the recall performance of bidirectional associative memories (BAMs). He proved that his so-called bidirectional learning scheme always yields a solution within a finite number of learning iterations whenever a solution exists. Unfortunately, in Leung's setting a solution exists only if the training set is strongly linearly separable by hyperplanes through the origin. We extend Leung's approach by considering conditionally strongly linearly separable sets, allowing separating hyperplanes that do not contain the origin. Moreover, we deal with BAMs generalized by so-called dilation and translation parameters, which enlarge their capacity while leaving their complexity almost unaffected. The whole approach leads to a generalized bidirectional learning rule that generates BAMs with dilation and translation which perform perfectly on the training set whenever the latter satisfies the conditionally strong linear separability assumption. Therefore, in the sense of Leung, we conclude with an optimal learning strategy that contains Leung's initial idea as a special case.

  5. Data mining algorithm for discovering matrix association regions (MARs)

    Science.gov (United States)

    Singh, Gautam B.; Krawetz, Shephan A.

    2000-04-01

    Lately, there has been considerable interest in applying data mining techniques to scientific and data analysis problems in bioinformatics. Data mining research is being fueled by novel application areas that are driving the development of new applied algorithms in bioinformatics, an emerging discipline that integrates the biological and information sciences. This is a paradigm shift from the earlier, and continuing, data mining efforts in marketing research and business intelligence support. The problem described in this paper opens a new dimension in DNA sequence analysis research and supplements the previously studied stochastic models of evolution and variability. The discovery of novel patterns in genetic databases is significant because biological patterns play an important role in a wide variety of cellular processes and constitute the basis for gene therapy. Biological databases containing the genetic codes of a wide variety of organisms, including humans, have continued their exponential growth over the last decade. At the time of this writing, the GenBank database contains over 300 million sequences and over 2.5 billion characters of sequenced nucleotides. The focus of this paper is on developing a general data mining algorithm for discovering regions of locus control, i.e., those regions that are instrumental in determining cell type. One such type of locus control element is the MAR (Matrix Association Region). Our limited knowledge about MARs has hampered their detection using classical pattern recognition techniques. Consequently, their detection is formulated using a statistical interestingness measure derived from a set of empirical features that are known to be associated with MARs. This paper presents a systematic approach for finding associations between such empirical features in genomic sequences, and for utilizing this knowledge to detect biologically interesting

  6. Exchange Rates: Predictable but not Explainable? Data Mining with Leading Indicators and Technical Trading Rules

    OpenAIRE

    Brandl, Bernd

    2005-01-01

    This paper presents a data mining approach to forecasting exchange rates. It is assumed that exchange rates are determined by both fundamental and technical factors. The balance of fundamental and technical factors varies for each exchange rate and frequency. It is difficult for forecasters to establish the relative relevance of different kinds of factors given this mixture; therefore the utilization of data mining algorithms is advantageous. The approach applied uses a genetic...

  7. Associations between rule-based parenting practices and child screen viewing: A cross-sectional study

    Directory of Open Access Journals (Sweden)

    Joanna M. Kesten

    2015-01-01

    Conclusions: Limit setting is associated with greater screen viewing (SV). Collaborative rule setting may be effective for managing boys' game-console use. More research is needed to understand rule-based parenting practices.

  8. Overview of the Texas Mining and Reclamation Association's education project

    Energy Technology Data Exchange (ETDEWEB)

    Hutchins, M.F. [Texas Mining and Reclamation Association, Austin, TX (United States)]

    1997-12-31

    The Texas Mining and Reclamation Association (TMRA) sponsors "Resources and the Environment," a teacher workshop held at a lignite mine each summer. Over a period of five years, more than two hundred science teachers have participated in the 4-day workshop, and through them approximately 50,000 middle school students have been exposed to the curriculum. The workshop was developed with a grant from the Phillips Petroleum Foundation, provided to the Center for Engineering Geosciences at Texas A&M University. The funding enabled the development of a program consisting of a science education curriculum addressing the earth-science concepts associated with lignite production and reclamation activities. The workshop is currently taught by Jim Luppens, Phillips Coal Company, and two assisting earth science specialists. The workshop includes classroom instruction, presentations by guest speakers, hands-on activities, and a tour of a lignite mine. It ends with a mock public hearing involving role-playing, with roles including mining personnel, regulatory agencies, local townspeople, and adjacent landowners. The curriculum is provided as a resource for teachers and includes 55 teaching units, each comprising a student story, a teacher outline, and classroom/lab activities. The objective of the curriculum is to give middle school students an opportunity to learn about earth science and apply that knowledge to a real situation. The unifying theme of the workshop is geology and the development of lignite coal resources, from the planning stages of a mine to final reclamation.

  9. Analysis of Relationships Between Patient Diseases at a Puskesmas (Community Health Center) Using the Association Rule Method

    Directory of Open Access Journals (Sweden)

    karina auliasari

    2016-08-01

    The data used in this research are the inpatient medical records of patients at the Brang Rea Puskesmas (community health center) from January to June 2015. The system was developed using the Visual Basic programming language, with Microsoft SQL Server 2008 as the database. Tests of the analysis module show that the system outputs "if ... then" rules describing disease associations, keeping only the rules whose values reach the minimum support and minimum confidence. The system tests show that the rules the Brang Rea health center can use to analyze inpatient diseases are those whose support and confidence equal or exceed the minimum values specified by the administrator.
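    The filtering step described above, which keeps "if disease X then disease Y" rules only when both support and confidence reach administrator-set minimums, can be sketched with simple pair counting. The original system was written in Visual Basic with SQL Server; this Python version, the record format and the diagnosis labels are hypothetical and cover only pairwise rules.

```python
from collections import Counter
from itertools import permutations


def disease_rules(records, min_support, min_confidence):
    """Pairwise rules 'if disease X then disease Y' from per-patient diagnosis sets.

    Returns (x, y, support, confidence) tuples, sorted by (x, y).
    """
    n = len(records)
    single = Counter(d for r in records for d in r)
    # Ordered pairs, so both X -> Y and Y -> X are counted.
    pair = Counter(p for r in records for p in permutations(sorted(r), 2))
    result = []
    for (x, y), count in sorted(pair.items()):
        support = count / n              # share of patients with both diagnoses
        confidence = count / single[x]   # share of X patients who also have Y
        if support >= min_support and confidence >= min_confidence:
            result.append((x, y, support, confidence))
    return result
```

    With four hypothetical patients diagnosed {respiratory infection, diarrhea}, {respiratory infection, diarrhea, typhoid}, {respiratory infection} and {typhoid}, thresholds of 0.5 support and 0.6 confidence keep diarrhea -> respiratory infection (confidence 1.0) and respiratory infection -> diarrhea (confidence 2/3). Every rule involving typhoid falls below the support threshold.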

  10. Data Mining Rules for Ultrasonic B-Type Detection and Diagnosis for Cholecystolithiasis

    Institute of Scientific and Technical Information of China (English)

    LOU Wei; YAN Li-min; HE Guo-sen

    2004-01-01

    This paper presents data mining on real data from B-type ultrasonic detection and diagnosis of cholecystolithiasis (gallbladder stones in the biliary tract) recorded over the past several years by a district central hospital in Shanghai. Computer simulation and modeling are described.

  11. DrugQuest - a text mining workflow for drug association discovery

    OpenAIRE

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A.; Theodosiou, Theodosios; Vizirianakis, Ioannis S.; Iliopoulos, Ioannis

    2016-01-01

    Background: Text mining and data integration methods are gaining ground in the health sciences due to the exponential growth of biomedical literature and of information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories, such as chemical databases. Results: Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based...

  12. An Efficient Method for Mining Event-Related Potential Patterns

    Directory of Open Access Journals (Sweden)

    Seyed Aliakbar Mousavi

    2011-11-01

    In the present paper, we propose a Neuroelectromagnetic Ontology Framework (NOF) for mining Event-Related Potential (ERP) patterns as well as the process. The aim of this research is to develop an infrastructure for mining, analyzing and sharing ERP domain ontologies. The outcome of this research is a Neuroelectromagnetic knowledge-based system. The framework has five stages: 1) data pre-processing and preparation; 2) data mining application; 3) rule comparison and evaluation; 4) association rule post-processing; 5) domain ontologies. In the fifth stage, a new set of hidden rules can be discovered by comparing association rules against domain ontologies and expert rules.

  13. An Efficient Method for Mining Event-Related Potential Patterns

    CERN Document Server

    Mousavi, Seyed Aliakbar; Mohamed, Hasimah Hj; Alomari, Saleh Ali

    2012-01-01

    In the present paper, we propose a Neuroelectromagnetic Ontology Framework (NOF) for mining Event-Related Potential (ERP) patterns as well as the process. The aim of this research is to develop an infrastructure for mining, analyzing and sharing ERP domain ontologies. The outcome of this research is a Neuroelectromagnetic knowledge-based system. The framework has five stages: 1) data pre-processing and preparation; 2) data mining application; 3) rule comparison and evaluation; 4) association rule post-processing; 5) domain ontologies. In the fifth stage, a new set of hidden rules can be discovered by comparing association rules against domain ontologies and expert rules.

  14. A Web Personalized Recommendation Method Based on Uncertain Consequent Association Rules

    Institute of Scientific and Technical Information of China (English)

    丁增喜; 王菊英; 王大玲; 鲍玉斌; 于戈

    2003-01-01

    Web usage mining plays an important part in supporting personalized recommendation on the Web, and association rules uncover interesting relations among items hidden in data. Based on an analysis of association rule characteristics, the paper proposes merging and deleting association rules and applies this as a rule preparation step before Web personalized recommendation. Furthermore, based on comparisons of recommendation precision, coverage and F1, and of the number of rules used, across three kinds of association rules, a Web personalized recommendation method based on uncertain consequents is put forward. An integrated analysis of several recommendation methods suggests that the proposed method is a good choice. Finally, several page-weighting techniques are introduced.

  15. Formal and Computational Properties of the Confidence Boost of Association Rules

    OpenAIRE

    Balcázar, José L.

    2011-01-01

    Some existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases of absolutely minimum size. One can push the intuition of redundancy further and find an intuitive notion of interest of an association rule, in terms of its "novelty" with respect to other rules. Namely: an irredundant rule is so because its confidence is higher than what the rest of the rules would suggest; then, one can ask: how much higher? We propose to mea...

  16. A potential causal association mining algorithm for screening adverse drug reactions in postmarketing surveillance.

    Science.gov (United States)

    Ji, Yanqing; Ying, Hao; Dews, Peter; Mansour, Ayman; Tran, John; Miller, Richard E; Massanari, R Michael

    2011-05-01

    Early detection of unknown adverse drug reactions (ADRs) in postmarketing surveillance saves lives and prevents harmful consequences. We propose a novel data mining approach to signaling potential ADRs from electronic health databases. More specifically, we introduce potential causal association rules (PCARs) to represent the potential causal relationship between a drug and ICD-9 (CDC. (2010). International Classification of Diseases, Ninth Revision (ICD-9). [Online]. Available: http://www.cdc.gov/nchs/icd/icd9.html) coded signs or symptoms representing potential ADRs. Due to the infrequent nature of ADRs, existing frequency-based data mining methods cannot effectively discover PCARs. We introduce a new interestingness measure, potential causal leverage, to quantify the degree of association of a PCAR. This measure is based on the computational, experience-based fuzzy recognition-primed decision (RPD) model that we developed previously (Y. Ji, R. M. Massanari, J. Ager, J. Yen, R. E. Miller, and H. Ying, "A fuzzy logic-based computational recognition-primed decision model," Inf. Sci., vol. 177, pp. 4338-4353, 2007) on the basis of the well-known, psychology-originated qualitative RPD model (G. A. Klein, "A recognition-primed decision making model of rapid decision making," in Decision Making in Action: Models and Methods, 1993, pp. 138-147). The potential causal leverage assesses the strength of the association of a drug-symptom pair given a collection of patient cases. To test our data mining approach, we retrieved electronic medical data for 16,206 patients treated with one or more of eight drugs of interest at the Veterans Affairs Medical Center in Detroit between 2007 and 2009. We selected enalapril as the target drug for this ADR signal generation study. We used our algorithm to preliminarily evaluate the associations between enalapril and all the ICD-9 codes associated with it. The experimental results indicate that our approach has a potential to

  17. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  18. Fast Vertical Mining Using Boolean Algebra

    Directory of Open Access Journals (Sweden)

    Hosny M. Ibrahim

    2015-01-01

    The vertical association rule mining algorithm is an efficient mining method that uses the support sets of frequent itemsets to calculate the support of candidate itemsets. It avoids the repeated database scans required by the Apriori algorithm. In vertical mining, frequent itemsets can be represented in memory as a set of bit vectors, which enables fast computation. The size of the bit vectors for itemsets is the algorithm's main space cost and restricts its scalability. Therefore, this paper presents an algorithm that compresses the bit vectors of frequent itemsets. The new bit vector scheme relies on Boolean algebra rules to compute the intersection of two compressed bit vectors without any costly decompression. The experimental results show that the proposed algorithm, the Vertical Boolean Mining (VBM) algorithm, outperforms both the Apriori algorithm and the classical vertical association rule mining algorithm in mining time and memory usage.
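    The core of vertical mining is that each item stores a bit vector of the transactions containing it, so the support of a larger itemset is the population count of a bitwise AND. The sketch below uses plain, uncompressed Python integers as bit vectors together with an Eclat-style depth-first search. It does not reproduce VBM's compressed representation, and the function names are hypothetical.

```python
def vertical_bitsets(transactions):
    """Map each item to an int whose bit `tid` is set when transaction `tid` contains it."""
    bits = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits


def support_count(itemset, bits):
    """Support count of an itemset = popcount of the AND of its items' bit vectors."""
    items = list(itemset)
    if not items:
        return 0
    vector = bits.get(items[0], 0)
    for item in items[1:]:
        vector &= bits.get(item, 0)
    return bin(vector).count("1")


def eclat(prefix, candidates, min_count, out):
    """Depth-first vertical mining: extend `prefix` with each (item, bit vector) candidate."""
    for i, (item, vec) in enumerate(candidates):
        count = bin(vec).count("1")
        if count < min_count:
            continue
        itemset = prefix | {item}
        out[itemset] = count
        # Intersect with the remaining items' vectors; no database rescan is needed.
        suffix = [(other, vec & other_vec) for other, other_vec in candidates[i + 1:]]
        eclat(itemset, suffix, min_count, out)
    return out
```

    Usage: bits = vertical_bitsets(transactions), then eclat(frozenset(), sorted(bits.items()), min_count, {}) returns a dictionary from frequent itemsets to their support counts. Each longer itemset costs one AND of two integers rather than a pass over the data; VBM's contribution is to perform that AND directly on compressed vectors.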

  19. A Remote-Sensing-Driven System for Mining Marine Spatiotemporal Association Patterns

    OpenAIRE

    Cunjin Xue; Qing Dong; Xiaohong Li; Xing Fan; Yilong Li; Shuchao Wu

    2015-01-01

    Remote sensing is widely used to analyze marine environments. While many effective and advanced methods have been developed, they are generally used independently of each other, despite the potential advantages of combining different modules into an integrated system. We develop here an image-driven remote-sensing mining system, RSMapMining (Remote Sensing driven Marine spatiotemporal Association Pattern Mining system), which consists of three modules. The image preprocessing module integrate...

  20. Text Association Analysis and Ambiguity in Text Mining

    Science.gov (United States)

    Bhonde, S. B.; Paikrao, R. L.; Rahane, K. U.

    2010-11-01

    Text mining is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. Research in text mining will enhance humans' ability to process massive quantities of information, and it has high commercial value. The paper first introduces text mining (TM) and its definition, then gives an overview of the text mining process and its applications. To date, little text mining research, especially in concept/entity extraction, has focused on the ambiguity problem. This paper addresses ambiguity issues in natural language texts and presents a new technique for resolving ambiguity when extracting concepts/entities from texts. Finally, it shows the importance of TM in knowledge discovery and highlights the upcoming challenges of document mining and the opportunities it offers.

  1. Research on the Development of Data Mining

    Institute of Scientific and Technical Information of China (English)

    张伟; 刘勇国; 彭军; 廖晓峰; 吴中福

    2001-01-01

    Mining knowledge from databases has been regarded as a key research issue in database systems, and researchers in many fields have shown great interest in data mining. In this paper, data mining techniques are introduced broadly, including their definition, purpose, characteristics, principal processes and classifications. As an example, studies on mining association rules are illustrated. Finally, some data mining prototypes are presented and several research trends in data mining are discussed.

  2. Design and Realization of User Behaviors Recommendation System Based on Association Rules under Cloud Environment

    Directory of Open Access Journals (Sweden)

    Wei Dai

    2013-07-01

    This study introduces the basic principles of association rules and the properties and advantages of the MapReduce model and HBase in the Hadoop ecosystem. It describes the design steps of the user-behavior recommendation system in detail, and repeated experiments show that combining association rule theory with cloud computing is successful and effective.
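
    The study pairs association rules with Hadoop's MapReduce model. The sketch below is plain Python, not Hadoop code. It simulates the map and reduce phases for the support-counting step of association rule mining, which here means counting co-occurring item pairs. Function names and the session data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Map step: emit ((item_a, item_b), 1) for every unordered item pair in one record."""
    items = sorted(set(transaction))
    for pair in combinations(items, 2):
        yield pair, 1


def reduce_phase(mapped):
    """Shuffle + reduce step: group emitted values by key and sum them."""
    counts = defaultdict(int)
    for key, value in mapped:
        counts[key] += value
    return dict(counts)


def pair_support_counts(transactions):
    """Support count of every item pair, computed map-then-reduce."""
    mapped = (kv for t in transactions for kv in map_phase(t))
    return reduce_phase(mapped)


# Hypothetical user-behaviour logs: one list of actions per session.
sessions = [["login", "search", "buy"], ["login", "search"], ["search", "buy"]]
counts = pair_support_counts(sessions)
# counts[("buy", "search")] == 2, counts[("login", "search")] == 2, counts[("buy", "login")] == 1
```

    On a real cluster the map step runs per input split and the framework performs the grouping. Only the summing logic of the reducer carries over.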

  3. The study of slip line field and upper bound method based on associated flow and non-associated flow rules

    Institute of Scientific and Technical Information of China (English)

    Zheng Yingren; Deng Chujian; Wang Jinglin

    2010-01-01

    At present, the associated flow rule of traditional plasticity theory is adopted in the slip line field theory and the upper bound method for geotechnical materials, so the stress characteristic lines coincide with the velocity lines. However, it has been shown that geotechnical materials do not obey the associated flow rule, so the stress characteristic lines cannot coincide with the velocity lines. Generalized plastic mechanics has shown theoretically that the plastic potential surface intersects the Mohr-Coulomb yield surface at an angle, so velocity lines must be studied using the non-associated flow rule. Based on limit analysis theory, this paper puts forward a slip line field theory and then obtains the ultimate bearing capacity of a strip footing separately under the associated and the non-associated flow rule. The two results are identical, since the ultimate bearing capacity is independent of the flow rule. In contrast, the velocity fields under the associated and non-associated flow rules differ, which shows that the velocity field based on the associated flow rule is incorrect.

  4. Rules Mining Research on the Eye Features Computation in the Spirit Diagnosing of TCM

    Institute of Scientific and Technical Information of China (English)

    郭锋; 李绍滋; 戴莹; 周昌乐; 林颖

    2011-01-01

    Quantitative features reflecting the human body's appearance are very helpful in diagnosing a person's state of health in Traditional Chinese Medicine (TCM). This paper presents a novel application of eye-feature computation to Spirit diagnosis, in which rules describing the state of "the Spirit" are mined from quantitative features of the human eyes. After a video capture method is described, a set of eye features is extracted from short videos of the eyes. Attribute intervals of the eye feature space are then generated by CAIM (class-attribute interdependence maximization) discretization. Next, a variety of candidate rules are mined by association rule mining based on the cloud model. Finally, three complementary rule-pruning methods are modified and combined to prune the candidate rules and form the final rule set. Cross-validation of the mined rules yields an average accuracy of 93%, which shows the good performance of the proposed method.
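
    CAIM scores a discretization by how strongly each interval is dominated by a single class. The sketch below only computes that criterion for a given set of cut points. It does not reproduce the full CAIM procedure, which greedily adds the cut point that most increases the score, or the paper's eye features. The data values and names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def caim(values, labels, cut_points):
    """CAIM criterion for one discretized attribute (higher is better).

    cut_points are sorted interior boundaries, giving n = len(cut_points) + 1 intervals.
    CAIM = (1/n) * sum over intervals r of max_r**2 / M_r, where max_r is the largest
    class count in interval r and M_r is the number of values falling in interval r.
    """
    n_intervals = len(cut_points) + 1
    # Quanta matrix stored column-wise: one class-count Counter per interval.
    quanta = [Counter() for _ in range(n_intervals)]
    for value, label in zip(values, labels):
        quanta[bisect_right(cut_points, value)][label] += 1
    total = 0.0
    for column in quanta:
        m_r = sum(column.values())
        if m_r:  # assumption: empty intervals are skipped (the formula is undefined there)
            total += max(column.values()) ** 2 / m_r
    return total / n_intervals


# Hypothetical feature values with two class labels.
vals = [1, 2, 3, 10, 11, 12]
labs = ["a", "a", "a", "b", "b", "b"]
print(caim(vals, labs, []), caim(vals, labs, [2.5]), caim(vals, labs, [5]))  # 1.5 2.125 3.0
```

    The cut at 5 separates the two classes cleanly and scores highest. This is the property a greedy CAIM search exploits.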

  5. [Analysis of medication rules in state medical master Yan Zhenghua's prescriptions containing Polygoni Multiflori Caulis based on data mining].

    Science.gov (United States)

    Wu, Jia-rui; Guo, Wei-xian; Zhang, Xiao-meng; Yang, Bing; Zhang, Bing; Zhao, Meng-di; Sheng, Xiao-guang

    2014-11-01

    Prescriptions containing Polygoni Multiflori Caulis written by Professor Yan were collected to build a database based on the traditional Chinese medicine (TCM) inheritance assistance system. Association rule mining with the Apriori algorithm was used to obtain the frequency of single medicines, the frequency of drug combinations, the association rules between drugs, and the core drug combinations. The data mining results indicated that, in prescriptions containing Polygoni Multiflori Caulis, the most frequently used drugs were parched Ziziphi Spinosae Semen, Ostreae Concha, Ossis Mastodi Fossilia, Salviae Miltiorrhizae Radix Et Rhizoma, Paeoniae Rubra Radix, and so on. The most frequent drug combinations were "Polygoni Multiflori Caulis-parched Ziziphi Spinosae Semen", "Ostreae Concha-Polygoni Multiflori Caulis", and "Polygoni Multiflori Caulis-Ossis Mastodi Fossilia". The drug association rules with a confidence of 1 were "Ostreae Concha-->Polygoni Multiflori Caulis", "Poria-->Polygoni Multiflori Caulis", "parched Ziziphi Spinosae Semen-->Polygoni Multiflori Caulis", and "Paeoniae Alba Radix-->Polygoni Multiflori Caulis". The core drug combinations for treating insomnia were Ossis Mastodi Fossilia, Polygoni Multiflori Caulis, Salviae Miltiorrhizae Radix et Rhizoma, Ostreae Concha, Polygalae Radix, Margaritifera Concha, Poria, and parched Ziziphi Spinosae Semen. The core drug combinations for treating obstruction of Qi in the chest were Salviae Miltiorrhizae Radix Et Rhizoma, Polygoni Multiflori Caulis, parched Ziziphi Spinosae Semen, Trichosanthis Fructus, Allii Macrostemonis Bulbus, and Paeoniae Rubra Radix.
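
    The abstract reports three quantities: single-drug frequencies, drug-pair frequencies, and rules with a confidence of 1. A minimal sketch of these for one-to-one rules follows. It uses toy data in which letters stand in for herb names, so it does not reproduce the study's prescriptions. It also enumerates pairs directly rather than running full Apriori, which is sufficient when only single-item rules are needed.

```python
from collections import Counter
from itertools import combinations


def item_and_pair_counts(prescriptions):
    """Frequency of each single herb and of each unordered herb pair."""
    singles, pairs = Counter(), Counter()
    for herbs in prescriptions:
        items = sorted(set(herbs))
        singles.update(items)
        pairs.update(combinations(items, 2))
    return singles, pairs


def pair_rules(prescriptions, min_conf=1.0):
    """Single-herb rules x -> y with confidence count(x, y) / count(x) >= min_conf."""
    singles, pairs = item_and_pair_counts(prescriptions)
    rules = []
    for (a, b), n_ab in pairs.items():
        for x, y in ((a, b), (b, a)):
            conf = n_ab / singles[x]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return sorted(rules)


# Toy data: letters stand in for herb names.
toy = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["B", "D"]]
print(pair_rules(toy))  # [('C', 'A', 1.0), ('C', 'B', 1.0)]
```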

  6. E-commerce Website Recommender System Based on Dissimilarity and Association Rule

    OpenAIRE

    MingWang Zhang; ShuWen Yang; LiFeng Zhang

    2013-01-01

    After analyzing current e-commerce recommendation algorithms, this paper proposes a recommendation algorithm that combines dissimilarity-based clustering with association rules. The algorithm first clusters website shoppers' data using dissimilarity and then applies association rule mining to the clustering results to make recommendations. Experiments show that, compared with the traditional clustering-association algorithm, it requires fewer iterations and improves...

  7. Mine dumps, wheeze, asthma, and rhinoconjunctivitis among adolescents in South Africa: any association?

    Science.gov (United States)

    Nkosi, Vusumuzi; Wichmann, Janine; Voyi, Kuku

    2015-01-01

    The study investigated the association between community proximity to mine dumps, and current wheeze, rhinoconjunctivitis, and asthma among adolescents. This study was conducted during May-November 2012 around five mine dumps in South Africa. Communities in close proximity to mine dumps had an increased likelihood of current wheeze OR 1.38 (95 % CI: 1.10-1.71), rhinoconjunctivitis OR 1.54 (95 % CI: 1.29-1.82), and a protective association with asthma OR 0.29 (95 % CI: 0.23-0.35). Factors associated with health outcomes included other indoor and outdoor pollution sources. Wheeze and rhinoconjunctivitis appear to be a public health problem in these communities. The findings of this study serve as a base for further detailed epidemiological studies for communities in close proximity to the mine dumps e.g. a planned birth cohort study.

  8. Data Mining Techniques: A Source for Consumer Behavior Analysis

    CERN Document Server

    Raorane, Abhijit

    2011-01-01

    Various studies on consumer purchasing behavior have been presented and applied to real problems. Data mining techniques are expected to be more effective tools for analyzing consumer behavior. However, data mining methods have disadvantages as well as advantages, so it is important to select appropriate techniques for mining databases. The objective of this paper is to understand consumer behavior and the consumer's psychological state at the time of purchase, and to show how a suitable data mining method can improve on conventional methods. In an experiment, association rule mining is used to extract rules for trusted customers from sales data in the supermarket industry.

  9. The Research for Multidimensional Association Rules Algorithm Based on MDPI

    Institute of Scientific and Technical Information of China (English)

    彭硕; 吴昊

    2011-01-01

    Mining multidimensional association rules is an important research direction in data mining. This paper proposes an efficient algorithm for mining multidimensional association rules. The algorithm combines data cube techniques with FP-Growth by constructing an MDPI-tree (multidimensional predicate index tree), and it can mine both inter-dimension and hybrid-dimension association rules. Finally, the algorithm is applied to a cross-selling model in mobile communications, and experiments verify its practicality and effectiveness.

  10. A Remote-Sensing-Driven System for Mining Marine Spatiotemporal Association Patterns

    Directory of Open Access Journals (Sweden)

    Cunjin Xue

    2015-07-01

    Remote sensing is widely used to analyze marine environments. While many effective and advanced methods have been developed, they are generally used independently of each other, despite the potential advantages of combining different modules into an integrated system. We develop here an image-driven remote-sensing mining system, RSMapMining (Remote Sensing driven Marine spatiotemporal Association Pattern Mining system), which consists of three modules. The image preprocessing module integrates image processing techniques and marine extraction methods to build a mining database. The pattern mining module integrates popular algorithms to implement the mining process according to the mining strategies. The third module, knowledge visualization, designs a series of interactive interfaces to visualize the marine data at a variety of scales, from global to grid pixel. The effectiveness of the integrated system is tested in a case study of the northwestern Pacific Ocean. The main contribution of this study is the development of a mining system to deal with marine remote sensing images by integrating popular techniques and methods ranging from information extraction, through visualization, to knowledge discovery.

  11. Mining and Visualizing Family History Associations in the Electronic Health Record: A Case Study for Pediatric Asthma.

    Science.gov (United States)

    Chen, Elizabeth S; Melton, Genevieve B; Wasserman, Richard C; Rosenau, Paul T; Howard, Diantha B; Sarkar, Indra Neil

    2015-01-01

    Asthma is the most common chronic childhood disease and has seen increasing prevalence worldwide. While there is existing evidence of familial and other risk factors for pediatric asthma, there is a need for further studies to explore and understand interactions among these risk factors. The goal of this study was to develop an approach for mining, visualizing, and evaluating association rules representing pairwise interactions among potential familial risk factors based on information documented as part of a patient's family history in the electronic health record. As a case study, 10,260 structured family history entries for a cohort of 1,531 pediatric asthma patients were extracted and analyzed to generate family history associations at different levels of granularity. The preliminary results highlight the potential of this approach for validating known knowledge and suggesting opportunities for further investigation that may contribute to improving prediction of asthma risk in children.

  12. Data Mining Approaches for Intrusion Detection

    Science.gov (United States)

    2007-11-02

    In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques...two general data mining algorithms that we have implemented: the association rules algorithm and the frequent episodes algorithm. These algorithms can

  13. Clinic-Genomic Association Mining for Colorectal Cancer Using Publicly Available Datasets

    Directory of Open Access Journals (Sweden)

    Fang Liu

    2014-01-01

    In recent years, a growing number of researchers have begun to focus on establishing associations between clinical and genomic data. However, there is still a lack of research on mining clinic-genomic associations by comprehensively analysing the available gene expression data for a single disease. Colorectal cancer is a malignant tumour, and a number of genetic syndromes have been shown to be associated with it. This paper presents our research on mining clinic-genomic associations for colorectal cancer in a biomedical big data environment. The proposed method combines multiple technologies. Clinical concepts are extracted using the Unified Medical Language System (UMLS), genes are extracted through literature mining, and clinic-genomic associations are mined through statistical analysis. We applied this method to datasets extracted from both the Gene Expression Omnibus (GEO) and the Genetic Association Database (GAD). A total of 23517 clinic-genomic associations between 139 clinical concepts and 7914 genes were obtained, of which 3474 associations between 31 clinical concepts and 1689 genes were identified as highly reliable. Evaluation and interpretation were performed using UMLS, KEGG, and Gephi, and potential new discoveries were explored. The proposed method is effective in mining valuable knowledge from available biomedical big data and performs well in bridging clinical and genomic data for colorectal cancer.

  14. Association of rule of law and health outcomes: an ecological study

    Science.gov (United States)

    Pinzon-Rondon, Angela Maria; Attaran, Amir; Botero, Juan Carlos; Ruiz-Sternberg, Angela Maria

    2015-01-01

    Objectives To explore whether the rule of law is a foundational determinant of health that underlies other socioeconomic, political and cultural factors that have been associated with health outcomes. Setting Global project. Participants Data set of 96 countries, comprising 91% of the global population. Primary and secondary outcome measures The following health indicators were included to explore their association with the rule of law: infant mortality rate, maternal mortality rate, life expectancy, and cardiovascular disease and diabetes mortality rate. We used a novel Rule of Law Index, gathered from survey sources, in a cross-sectional and ecological design. The Index is based on eight subindices: (1) Constraints on Government Powers; (2) Absence of Corruption; (3) Order and Security; (4) Fundamental Rights; (5) Open Government; (6) Regulatory Enforcement; (7) Civil Justice; and (8) Criminal Justice. Results The rule of law showed an independent association with infant mortality rate, maternal mortality rate, life expectancy, and cardiovascular disease and diabetes mortality rate. This held after adjusting for the countries' per capita income, health expenditure, level of political and civil freedom, Gini measure of inequality and women's status (p<0.05). Rule of law remained significant in all the multivariate models. After adjustment for potential confounders, it also remained robust for at least one of the health outcomes across all eight subindices of the rule of law. The findings show that the higher a country's adherence to the rule of law, the better the health of its population. Conclusions It is necessary to start considering a country's adherence to the rule of law as a foundational determinant of health. Health advocates should consider improving the rule of law as a tool to improve population health. Conversely, lack of progress in the rule of law may constitute a structural barrier to health improvement.

  15. Data Mining Foundations and Intelligent Paradigms Volume 1 Clustering, Association and Classification

    CERN Document Server

    Jain, Lakhmi

    2012-01-01

    Data mining is one of the most rapidly growing research areas in computer science and statistics. In Volume 1 of this three-volume series, we have brought together contributions from some of the most prestigious researchers in the fundamental data mining tasks of clustering, association and classification. Each chapter is self-contained. Theoreticians and applied scientists/engineers will find this volume valuable. It also provides a sourcebook for graduate students interested in current research directions in these aspects of data mining.

  16. Optimization of Association Rule Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    张青

    2015-01-01

    The Apriori algorithm is a classical method for mining association rules, but it is time-consuming and inefficient when dealing with massive sets of candidate itemsets. This paper proposes an optimized algorithm that partitions the data into segments. Experiments show that the algorithm not only significantly reduces the system resources consumed by data mining. It also solves the problem of generating locally frequent itemsets within each data partition and converting them into globally frequent itemsets for the whole database.
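
    The abstract describes three steps: splitting the data into segments, generating locally frequent itemsets in each segment, and converting these into globally frequent itemsets. This matches the classic two-phase Partition scheme. The sketch below implements that scheme on the assumption that the paper's optimization works similarly, since the abstract gives no details. The helper names are hypothetical, and the local miner is a plain level-wise Apriori search.

```python
from itertools import combinations
from math import ceil


def frequent_itemsets(transactions, min_frac):
    """Level-wise (Apriori-style) search for itemsets with relative support >= min_frac."""
    tx = [frozenset(t) for t in transactions]
    min_count = ceil(min_frac * len(tx))
    level = {frozenset([item]) for t in tx for item in t}
    frequent, k = set(), 1
    while level:
        current = {c for c in level if sum(1 for t in tx if c <= t) >= min_count}
        frequent |= current
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        level = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {c for c in level
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def partition_mining(transactions, min_frac, n_parts):
    """Two-phase Partition scheme: local mining per segment, then one global count."""
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase 1: a globally frequent itemset is locally frequent in at least one part,
    # so the union of the local results is a complete candidate set.
    candidates = set().union(*(frequent_itemsets(p, min_frac) for p in parts))
    # Phase 2: a single scan of the full data keeps only globally frequent candidates.
    tx = [frozenset(t) for t in transactions]
    min_count = ceil(min_frac * len(tx))
    return {c for c in candidates if sum(1 for t in tx if c <= t) >= min_count}


data = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
result = partition_mining(data, 0.5, 2)
# {a, b, c} is locally frequent in the second part but globally infrequent, so it is dropped.
```

    Each partition can be mined in memory and independently, which is where the resource savings come from. The price is a second scan to confirm the global counts.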

  17. Association Mining in Big Data Analysis

    Institute of Scientific and Technical Information of China (English)

    金宗泽; 冯亚丽; 纪博; 张希; 高快

    2014-01-01

    In this era of information explosion, big data is becoming ever closer to our lives. The paper first introduces where big data comes from and how it can be studied. It then describes the framework of the data analysis process and explains the importance of association mining in big data analysis. It proposes research approaches for association mining on big data. The system can analyze the association patterns, and users can also select parameters manually for analysis, screening and retention. Making better use of association rule mining in big data analysis will bring greater practical value.

  18. Prospectors and Developers Association of Canada Mining Matters: A Model of Effective Outreach

    Science.gov (United States)

    Hymers, L.; Heenan, S.

    2009-05-01

    Prospectors and Developers Association of Canada Mining Matters is a charitable organization whose mandate is to bring the wonders of Canada's geology and mineral resources to students, educators and industry. The organization provides current information about rocks, minerals, metals, and mining. It also offers exceptional educational resources, developed by teachers for teachers, that meet Junior, Intermediate and Senior Provincial Earth Science and Geography curriculum expectations. Since 1994, Mining Matters has reached more than 400,000 educators, students, industry representatives, and Aboriginal Youth through Earth Science resources. When the program began, members of the Prospectors and Developers Association of Canada (PDAC) realized that their mining and mineral industry expertise could help teachers and students. Consulting experts in education, government, and business worked together with the PDAC to develop the first Mining Matters Earth Science curriculum kit for Grade 6 and 7 teachers in Ontario. PDAC Mining Matters became the official educational arm of the Association and a charitable organization in 1997. Since then, the organization has partnered with government, industry, and educators to develop bilingual Earth science teaching units for Grades 4 and 7 and senior high school. The teaching units consist of kits containing curriculum-correlated lesson plans, information bulletins, genuine data sets, rock and mineral samples, equipment and additional instructional resources. Mining Matters offers instructional development workshops to train pre-service and in-service educators to use our teaching units in the classroom. The workshops are meant to give teachers the knowledge and confidence they need to use the units successfully in the classroom. Formal mechanisms for resource and workshop evaluations are in place. Overwhelmingly teacher feedback is positive, describing the excellence

  19. Spatial Data Mining using Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ch.N.Santhosh Kumar

    2012-09-01

    Data mining, also referred to as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown and potentially useful information from large databases, such as knowledge rules, descriptions, regularities, and major trends. Data mining has evolved into a multidisciplinary field that includes database technology, machine learning, artificial intelligence, neural networks, information retrieval, and so on. In principle, data mining should be applicable to the many kinds of data and databases used in different applications. These include relational databases, transactional databases, data warehouses and object-oriented databases, as well as special application-oriented databases such as spatial, temporal, multimedia and time-series databases. Spatial data mining, also called spatial mining, is data mining applied to spatial data or spatial databases. Spatial data are data that have a spatial or location component, and the information they carry is more complex than that of classical data. A spatial database stores spatial data represented by spatial data types and by spatial relationships among the data. Spatial data mining encompasses various tasks, including spatial classification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules, and trend detection. This paper presents how spatial data mining is achieved using clustering.

  20. Objective Evaluation Method of Association Rule Interestingness

    Institute of Scientific and Technical Information of China (English)

    亓文娟; 晏杰

    2013-01-01

    At present, the main criteria for measuring and generating association rules are support and confidence thresholds. In practice, however, mining by these criteria alone is insufficient, mainly because these evaluation criteria for association rules are not reasonable. This paper studies evaluation measures for association rules in depth, analyzes the limitations of the "support-confidence" framework, and proposes PS, a correlation-based interestingness measure. Based on its mathematical properties, the paper identifies its advantages and shortcomings, laying a theoretical foundation for improving the evaluation of mined association rules.
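
    The sketch below contrasts confidence with a correlation-based measure. It assumes that PS denotes the Piatetsky-Shapiro measure, also called leverage: PS(A -> B) = P(A and B) - P(A) * P(B). The abstract does not give the formula, so this reading is an assumption. The function name and the toy baskets are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and PS (assumed: leverage) of the rule antecedent -> consequent."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    a, c = frozenset(antecedent), frozenset(consequent)
    p_a = sum(a <= t for t in tx) / n
    p_c = sum(c <= t for t in tx) / n
    p_ac = sum((a | c) <= t for t in tx) / n
    return {
        "support": p_ac,
        "confidence": p_ac / p_a if p_a else 0.0,
        # PS: observed joint support minus the joint support expected under independence.
        "ps": p_ac - p_a * p_c,
    }


# Toy data: B is so common that a 75%-confidence rule A -> B is actually negatively correlated.
baskets = [["A", "B"]] * 3 + [["A"]] + [["B"]] * 4
print(rule_measures(baskets, ["A"], ["B"]))
# {'support': 0.375, 'confidence': 0.75, 'ps': -0.0625}
```

    This illustrates the limitation of the support-confidence framework that the abstract points to. A rule can pass a confidence threshold even though its antecedent makes the consequent slightly less likely.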

  1. NARG: An Algorithm for Extracting Non-redundant Association Rules from a Concept Lattice

    Institute of Scientific and Technical Information of China (English)

    苗茹; 沈夏炯; 胡小华

    2009-01-01

    Association rules are a very valuable kind of pattern in data mining. Ordinary mining algorithms usually generate a large number of rules from a database, and the number of association rules rises rapidly when the minimum support and minimum confidence are lowered. The key to eliminating redundant association rules is to reduce the rule set without losing information from the data. Based on concept lattice theory and the properties of redundant association rules, this paper presents a new algorithm, NARG, for extracting non-redundant association rules. The algorithm obtains the minimal non-redundant set of association rules without losing any information from the data, and it effectively improves the efficiency of rule extraction.
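
    The abstract does not spell out NARG's steps. One standard fact such methods build on is that the intents of a concept lattice are exactly the closed itemsets. These are obtained with the Galois closure, which returns the items common to every transaction containing a given itemset. Non-redundant rule bases are usually derived from these closed itemsets. The sketch computes only that building block, not NARG itself, and its names and data are hypothetical.

```python
def closure(itemset, transactions):
    """Galois closure: the items shared by every transaction that contains `itemset`."""
    target = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if target <= frozenset(t)]
    if not covering:  # convention: an itemset with empty extent closes to all items
        return frozenset().union(*map(frozenset, transactions))
    return frozenset.intersection(*covering)


def closed_itemsets(transactions):
    """Closed itemsets with non-zero support, i.e. the intents of the concept lattice.

    They are exactly the intersections of non-empty groups of transactions, so we
    keep intersecting until no new itemset appears.
    """
    closed = {frozenset(t) for t in transactions}
    frontier = set(closed)
    while frontier:
        new = {a & b for a in frontier for b in closed} - closed
        closed |= new
        frontier = new
    return closed


tx = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
print(sorted("".join(sorted(s)) for s in closed_itemsets(tx)))  # ['a', 'ab', 'abc', 'ac']
print(sorted(closure({"b"}, tx)))  # ['a', 'b']: every transaction with b also has a
```

    Any rule whose antecedent and antecedent-plus-consequent have the same closures as another rule's carries no extra information. This is the intuition behind pruning redundant rules on the lattice.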

  2. ON MINING ENTREPRENEURSHIP IN BANOVINA REGION (CROATIA)

    Directory of Open Access Journals (Sweden)

    Berislav Šebečić

    2000-12-01

    Mining of iron, copper, and lead(-silver) ores in Trgovska gora Mountain had been developed back in Illyrian and Roman times, as well as in the Middle Ages and more recent times. In Petrova gora Mountain, by contrast, the exploitation of iron ores and coal developed only in the 19th and 20th centuries. In the Middle Ages and more recent times, mining concessions were granted to the Croatian nobility (the counts of Zrinski and Keglević), later also to foreign nobility, and to foreign and domestic mining associations. From the mid-19th to the mid-20th century, the mining enterprise in the Banovina Region passed through various owners and managers. During Austro-Hungarian rule, the main mining concession was owned by »Gewerkschaft der Eisenbergwerke und Huttenwerke Petrova gora zu Topusko«, or in its shorter version »Petrova gora Gewerkschaft«. The major mining entrepreneurs on Trgovska gora Mountain at Bešlinac were Desire Gilain, Joseph Steinauer and Alois Frohm. World War I was followed by the confiscation of the properties of foreign mining associations and entrepreneurs. Several companies were then established and went bankrupt rather quickly: the Petrova gora Association of Mines and Foundry at Topusko, the Slavenska Bank Zagreb (until 1923), and the Iron Mine and Foundry Inc. at Topusko. After the bankruptcy of the National Industrial Enterprise Zagreb (1929), the Mining Association and (Iron) Foundry was founded at Bešlinac (1934). Other operators in the Banovina region were the Kupa-Glina Mining Association (also active during Austro-Hungarian rule), the Mineral Mining Association from Topusko, and the Iron Mine and Foundry Topusko-Vojnić Headquarters. All of these associations and entrepreneurs were confiscated by the Federal People's Republic of Yugoslavia in 1946.

  3. Improved Association Rules and Their Application in Computer Forensics

    Institute of Scientific and Technical Information of China (English)

    刘锋; 詹焰霞; 陈玉萍

    2012-01-01

    With the development of science and technology, computers have entered countless households. The resulting problems, such as computer crime, increasingly attract public attention, and computer forensics is a powerful tool for curbing such behavior. This paper combines computer forensics with association rule mining. It first introduces the concepts of data mining and association rules, then presents Apriori, the most typical association rule mining algorithm, and summarizes its shortcomings. To address these, it proposes an improved sorting-based Apriori algorithm that increases efficiency. The paper then applies the algorithm to computer forensics and verifies its feasibility with a concrete example.
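
    The abstract names a sorting-based improvement of Apriori but gives no details. The sketch below illustrates why sorted order helps, and it is not the paper's method. It shows the standard Apriori candidate-generation step (join plus prune), which keeps itemsets as lexicographically sorted tuples so that joinable itemsets sit next to each other. The function name and data are hypothetical.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Candidate k-itemsets from frequent (k-1)-itemsets given as sorted tuples."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Itemsets sharing a (k-2)-prefix are adjacent in sorted order, so stop
            # as soon as the prefix changes.
            if a[:-1] != b[:-1]:
                break
            candidate = a + (b[-1],)
            # Prune: every (k-1)-subset of the candidate must itself be frequent.
            if all(sub in prev_set for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates


frequent_2 = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
print(apriori_gen(frequent_2))  # [('a', 'b', 'c')]; ('b', 'c', 'd') is pruned since ('c', 'd') is infrequent
```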

  4. Objective novelty of association rules: measuring the confidence boost

    OpenAIRE

    Balcázar Navarro, José Luis

    2010-01-01

    It is well known that the confidence of association rules is not really satisfactory as a measure of interestingness. Rather than replacing it with other measures (or using it jointly with other measures), we propose to evaluate the novelty of each rule by comparing its confidence with that of stronger rules found in the same dataset. That is, we consider a "relative" confidence threshold instead of the usual absolute threshold. This idea ...

  5. Algorithm of Intrusion Detection Based on Data Mining and Its Implementation

    Institute of Scientific and Technical Information of China (English)

    SUN Hai-bin; XU Liang-xian; CHEN Yan-hua

    2004-01-01

    Intrusion detection is regarded as a classification problem in the data mining field. However, instead of directly mining classification rules, class association rules are mined from audit logs and then used to construct a classifier. Some attributes in audit logs are important for detecting intrusions, but their values have skewed distributions, and a relative support concept is proposed to handle this situation. To mine class association rules effectively, an algorithm based on the FP-tree is used. Experimental results show that this method performs better.

  6. A Survey of latest Algorithms for Frequent Itemset Mining in Data Stream

    Directory of Open Access Journals (Sweden)

    U.Chandrasekhar

    2013-03-01

    Association rule mining and finding frequent patterns in databases are long-standing research topics. With the advent of Big Data, the need for stream mining has increased. This paper therefore surveys recent frequent-pattern mining algorithms for data streams to understand the problems to be solved and each algorithm's shortcomings and advantages over the others.
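
    The abstract does not list the surveyed algorithms. Lossy Counting (Manku and Motwani) is shown here as an illustration of the one-pass, bounded-error approach that stream frequent-pattern mining builds on; whether this survey covers it is not stated. The sketch handles single items only, whereas itemset variants batch several buckets at a time.

```python
from math import ceil


def lossy_counting(stream, support, epsilon):
    """Lossy Counting for single items over one pass of a stream.

    Returns items whose estimated count is >= (support - epsilon) * N; each estimated
    count undercounts the true count by at most epsilon * N.
    """
    width = ceil(1 / epsilon)          # bucket width
    entries = {}                       # item -> [estimated count, maximum error]
    n = 0
    for item in stream:
        n += 1
        bucket = ceil(n / width)       # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:             # bucket boundary: drop entries that cannot be frequent
            for key in [k for k, (f, d) in entries.items() if f + d <= bucket]:
                del entries[key]
    return {k: f for k, (f, d) in entries.items() if f >= (support - epsilon) * n}


print(lossy_counting("abacadab", support=0.4, epsilon=0.25))  # {'a': 4}
```

    Memory stays bounded because infrequent items are evicted at every bucket boundary. The cost is that the output may include false positives with a true frequency between (support - epsilon) * N and support * N.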

  7. DISEASES: text mining and data integration of disease-gene associations.

    Science.gov (United States)

    Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

    2015-03-01

    Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.
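
    The abstract says the system combines a dictionary-based tagger with a scoring scheme that counts co-occurrences both within and between sentences. The actual DISEASES scoring formula is not given here. The sketch below therefore uses two hypothetical weights (w_doc, w_sent) and a naive single-token dictionary matcher without synonyms or multi-word names. It only illustrates the idea of rewarding same-sentence co-mentions more than same-abstract ones.

```python
import re
from collections import defaultdict
from itertools import product


def tag(text, lexicon):
    """Dictionary tagging: lexicon entries that appear as whole lowercase tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) & lexicon


def cooccurrence_scores(abstracts, genes, diseases, w_doc=1.0, w_sent=2.0):
    """Score gene-disease pairs by co-mentions; weights are hypothetical placeholders."""
    scores = defaultdict(float)
    for abstract in abstracts:
        # Co-mention anywhere in the abstract, including across sentences.
        for pair in product(tag(abstract, genes), tag(abstract, diseases)):
            scores[pair] += w_doc
        # Additional credit when both names occur in the same sentence.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            for pair in product(tag(sentence, genes), tag(sentence, diseases)):
                scores[pair] += w_sent
    return dict(scores)


docs = ["TP53 is mutated in glioma. Asthma was not examined."]
scores = cooccurrence_scores(docs, {"tp53"}, {"glioma", "asthma"})
# scores[("tp53", "glioma")] == 3.0 (same abstract + same sentence); scores[("tp53", "asthma")] == 1.0
```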

  8. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Science.gov (United States)

    Huang, Zhong

    2014-01-01

    Disease-associated gene discovery is a critical step toward realizing the future of personalized medicine. However, empirical and clinical validation of disease-associated genes is time-consuming and expensive. In silico discovery of disease-associated genes from the literature is therefore becoming the essential first step in biomarker discovery to…

  9. Health Effects Associated with Inhalation of Airborne Arsenic Arising from Mining Operations

    Directory of Open Access Journals (Sweden)

    Rachael Martin

    2014-08-01

    Arsenic in dust and aerosol generated by the mining, mineral processing and metallurgical extraction industries is a serious threat to human populations throughout the world. Major sources of contamination include smelting operations, coal combustion and hard rock mining, as well as their associated waste products, including fly ash, mine wastes and tailings. The number of uncontained arsenic-rich mine waste sites throughout the world is of growing concern, as is the number of people at risk of exposure. Inhalation exposures to arsenic-bearing dusts and aerosol, in both occupational and environmental settings, have been definitively linked to increased systemic uptake, as well as carcinogenic and non-carcinogenic health outcomes. It is therefore becoming increasingly important to identify human populations and sensitive sub-populations at risk of exposure, and to better understand the modes of action for pulmonary arsenic toxicity and carcinogenesis. In this paper we explore the contribution of smelting, coal combustion, hard rock mining and their associated waste products to atmospheric arsenic. We also report on the current understanding of the health effects of inhaled arsenic, citing results from various toxicological, biomedical and epidemiological studies. This review is particularly aimed at those researchers engaged in the distinct, but complementary areas of arsenic research within the multidisciplinary field of medical geology.

  10. Applied data mining for business and industry

    CERN Document Server

    Giudici, Paolo

    2009-01-01

    The increasing availability of data in our current, information-overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application-oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies a...

  11. MIDClass: microarray data classification by association rules and gene expression intervals.

    Directory of Open Access Journals (Sweden)

    Rosalba Giugno

    We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes in the same class. An extensive experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.

  12. Improved Personalized Recommendation Based on Causal Association Rule and Collaborative Filtering

    Science.gov (United States)

    Lei, Wu; Qing, Fang; Zhou, Jin

    2016-01-01

    Users of a recommender system usually evaluate only a limited number of resources, which results in an extremely sparse user rating matrix and greatly reduces the accuracy of personalized recommendation, especially for new users or new items. This paper presents a recommendation method based on rating prediction using causal association rules.…

  13. A study of trends in occupational risks associated with coal mining

    Energy Technology Data Exchange (ETDEWEB)

    Amoundru, C.

    1980-10-01

    The occupational risks associated with underground coal mining can be categorized as either industrial accidents or occupational diseases. Since 1957, the number of fatal accidents per million tons of coal produced has dropped by a factor of four. The number of industrial accidents in general decreased by 30% during 1967-75. The main occupational diseases affecting miners are arthrosis, deafness, and pneumoconiosis. To make an objective comparison with the health hazards from other sources of energy, the probable risks facing workers in a modern mine should be compared with those currently confronting workers in other industries.

  14. Impact of gold mining associated with mercury contamination in soil, biota sediments and tailings in Kenya.

    Science.gov (United States)

    Odumo, Benjamin Okang'; Carbonell, Gregoria; Angeyo, Hudson Kalambuka; Patel, Jayanti Purshottam; Torrijos, Manuel; Rodríguez Martín, José Antonio

    2014-11-01

    This work considered the environmental impact of artisanal gold mining activity in the Migori-Transmara area (Kenya). Artisanal gold mining releases mercury to the environment, contributing to the degradation of soil and water bodies. As expected, high mercury contents were quantified in soil (140 μg kg⁻¹), sediment (430 μg kg⁻¹) and tailings (8,900 μg kg⁻¹). The results reveal that the transport of mercury to the terrestrial ecosystem is associated with wet and dry deposition. Mercury levels in lichens and mosses, used as bioindicators of pollution, are related to proximity to mining areas: the further the distance from mining areas, the lower the mercury levels. This study also provides risk maps to evaluate potential negative repercussions. We conclude that the Migori-Transmara region can be considered a strongly polluted area with high mercury contents. The amalgamation process used to extract gold causes a high degree of mercury pollution around this gold mining area. Thus, alternative gold extraction methods should be considered to reduce the amount of mercury released to the environment.

  15. Integrated assessment of the impacts associated with uranium mining and milling

    Energy Technology Data Exchange (ETDEWEB)

    Parzyck, D.C.; Baes, C.F. III; Berry, L.G.

    1979-07-01

    The occupational health and safety impacts are assessed for domestic underground mining, open pit mining, and milling. Public health impacts are calculated for a population of 53,000 located within 88 km (55 miles) of a typical southwestern uranium mill. The collective annual dose would be 6.5 man-lung rem/year, 89% of which is from ²²²Rn emitted from mill tailings. The dose to the United States population is estimated to be 6 × 10⁴ man-lung rem from combined mining and milling operations. This may be compared with 5.7 × 10⁵ man-lung rem from domestic use of natural gas and 4.4 × 10⁷ man-lung rem from building interiors. Unavoidable adverse environmental impacts appear to be severe in a 250 ha area surrounding a mill site but negligible in the entire potentially impacted area (500,000 ha). The contemporary uranium resource and supply industry and its institutional settings are described in relation to the socio-economic impacts likely to emerge from high levels of uranium mining and milling. Radon and radon daughter monitoring techniques associated with uranium mining and milling are discussed.

  16. Associative text classification by mining significant itemsets

    Institute of Scientific and Technical Information of China (English)

    蔡金凤; 白清源

    2011-01-01

    In the classifier-construction stage, association-rule classification algorithms consider only whether feature words are present and ignore the weights of text features. To address this problem, building on the association-rule-based text classification method ARC-BC, we propose the ISARC (ItemSet Significance-based ARC) algorithm, which improves the accuracy of associative text classification. The algorithm uses feature-term weights to define the significance of a k-itemset, generates association rules by mining significant itemsets, and takes into account the influence of lift on the texts to be classified. Experimental results show that the ISARC algorithm, by mining significant itemsets, improves the accuracy of associative text classification.

    Text classification is an important foundation of information retrieval and text mining; its main task is to assign each text a category from a given category set. Text classification has a wide range of applications in natural language processing and understanding, information organization and management, information filtering and other areas. At present, text classification methods can be divided mainly into three groups: statistical methods, connectionist methods and rule-based methods. The basic idea of the traditional associative text classification algorithm ARC-BC (associative rule-based classifier by category) is to use the Apriori association rule mining algorithm to generate frequent items, i.e. feature terms or itemsets that appear frequently, and then to use these frequent items as rule antecedents and the category as the rule consequent, forming a rule set that constitutes the classifier. When a test sample is classified, for every rule whose antecedent the sample matches, the rule's confidence is added to the cumulative-confidence counter of the category the rule belongs to; the test sample is assigned to the category whose counter holds the maximum confidence. However, the ARC-BC algorithm has two main drawbacks: (1) during classifier construction, it considers only the existence of feature words and ignores the weight of text features when mining frequent itemsets and generating association rules
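    The ARC-BC classification step described above (add each matching rule's confidence to its category's counter, then pick the category with the largest total) can be sketched in Python as follows. This is a minimal sketch assuming the rules have already been mined and are given as `(antecedent_terms, category, confidence)` triples; the function name `arc_bc_classify` is hypothetical, and the sketch does not implement ISARC's weight-based itemset significance or its use of lift, which the abstract does not specify in detail.

```python
from collections import defaultdict


def arc_bc_classify(doc_terms, rules):
    """Score a document in the ARC-BC style.

    rules: iterable of (antecedent_terms, category, confidence) triples,
    assumed to be mined beforehand (e.g. with Apriori).
    Returns (best_category, per-category scores); (None, {}) if no rule fires.
    """
    terms = set(doc_terms)
    scores = defaultdict(float)
    for antecedent, category, confidence in rules:
        # A rule fires when all of its antecedent terms occur in the document.
        if set(antecedent) <= terms:
            scores[category] += confidence  # accumulate confidence per category
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)
```

    For example, with hypothetical rules {goal, match} → sports (0.9), {team} → sports (0.6) and {vote, match} → politics (0.3), a document containing "match", "goal", "vote" and "team" accumulates 1.5 for sports and 0.3 for politics and is labeled sports.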

  17. E-commerce Website Recommender System Based on Dissimilarity and Association Rule

    Directory of Open Access Journals (Sweden)

    MingWang Zhang

    2013-07-01

    Based on an analysis of current e-commerce recommendation algorithms, this paper proposes a recommendation algorithm that combines dissimilarity-based clustering with association rules. The algorithm first clusters the shopping data of website users using a dissimilarity measure and then applies an association rule algorithm to the clustering results to make association-based recommendations. Experiments show that, compared with the traditional clustering-plus-association algorithm, the proposed algorithm needs fewer iterations and runs more efficiently, and tests of the recommendations against actual user purchases provide evidence of the algorithm's effectiveness.

  18. Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices

    CERN Document Server

    Kaosar, Md Golam

    2010-01-01

    In this paper, a homomorphic privacy-preserving association rule mining algorithm is proposed that can be deployed in resource-constrained devices (RCD). Privacy-preserving exchange of itemset counts among distributed mining sites is a vital part of the association rule mining process. Existing cryptography-based privacy-preserving solutions consume a lot of computation because of the complex mathematical operations involved. Therefore, privacy solutions with lower computational cost are needed to deploy mining applications in RCD. In this algorithm, a semi-trusted mixer is used to unify the itemset counts encrypted by all mining sites without revealing individual values. The proposed algorithm is built on count distribution (CD), a well-known communication-efficient association rule mining algorithm. Security proofs, along with performance analysis and comparison, show the acceptability and effectiveness of the proposed algorithm. Efficient and straightforward privacy model and satisfactory perf...
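    The abstract does not name the encryption scheme, so the following only illustrates the additive-homomorphic idea behind a count-combining mixer, using textbook Paillier encryption as an assumed stand-in with insecure toy primes. In count distribution, each site counts candidate itemsets locally and the global support count is the sum of the local counts; in this sketch each site encrypts its local count, the mixer multiplies the ciphertexts (which adds the hidden counts), and only the combined total is decrypted. All function names are hypothetical, and who holds the decryption key in the actual protocol is not stated in the abstract. Requires Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier keys from two primes (tiny primes here: NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1 is used
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt a local count m (0 <= m < n) with fresh randomness r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then multiply by mu


def mixer_combine(n, ciphertexts):
    """Hypothetical mixer step: multiplying ciphertexts adds the hidden counts.

    The mixer never decrypts, so it does not learn any single site's count.
    """
    n2 = n * n
    total = 1
    for c in ciphertexts:
        total = (total * c) % n2
    return total
```

    For instance, if three sites hold local counts 120, 75 and 310 for the same itemset, decrypting the mixer's combined ciphertext yields the global count 505 while each individual ciphertext stays opaque to the mixer.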

  19. EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    D.Kerana Hanirex

    2011-03-01

    Nowadays, association rules play an important role. The purchase of one product when another product is purchased is an example of an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. This algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on similarity measures between the transactions. The algorithm then finds the frequent itemsets from the transactions in the clusters directly, using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency.
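    Since this abstract builds on the Apriori algorithm, a minimal sketch of classic Apriori (level-wise candidate generation, downward-closure pruning, one counting pass per level) may help. It does not include PAFI's clustering of transactions or its specific Apriori improvements, which the abstract does not detail; the function name is illustrative and `min_support` is assumed to be an absolute count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic level-wise Apriori.

    Returns {frozenset(itemset): support_count} for every itemset whose
    count is at least min_support (an absolute count).
    """
    txns = [frozenset(t) for t in transactions]
    # Level 1: count single items in one pass.
    counts = {}
    for t in txns:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One database scan per level to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in txns:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

    The pruning step is what lets Apriori avoid counting most candidates: a k-itemset can only be frequent if all of its (k-1)-subsets are.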

  20. Mining geographic episode association patterns of abnormal events in global earth science data

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Abnormal events in earth science have great influence on both the natural environment and human society. Finding association patterns among these events is of great significance. Because earth science data are characterized by massive volume, high dimensionality, spatial autocorrelation and time delay, existing mining technologies cannot be applied to them directly. We propose a RSNN (range-based searching nearest neighbors) spatial clustering algorithm to reduce the data size and autocorrelation. Based on the clustered data, we propose a GEAM (geographic episode association pattern mining) algorithm, which can deal with event time lags and find interesting patterns with specific constraints, to mine the association patterns. We carried out experiments on global climate datasets and found many interesting association patterns. Some of the patterns are consistent with known knowledge in climate science, which indicates the correctness and feasibility of our methods; the others were previously unknown to us and may provide new information to this research field.

  1. A Meta-information-Based Method for Rough Sets Rule Parallel Mining

    Institute of Scientific and Technical Information of China (English)

    苏健; 高济

    2003-01-01

    Rough sets are an important data mining method. Data mining processes such great quantities of data in large databases that the speed of the rough sets data mining algorithm is critical to a data mining system. Utilizing network computing resources is an effective approach to improving the performance of a data mining system. This paper proposes the concept of meta-information, which is used to describe the result of rough sets data mining in an information system, and a meta-information-based method for parallel rule mining. This method decomposes the information system into many sub-information systems, dispatches the task of generating the meta-information of each sub-information system to task performers in the network, and lets them compute the meta-information in parallel; the task synthesizer then combines the meta-information of the sub-information systems into the meta-information of the whole information system and finally produces the rules from this meta-information.
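    The decompose, compute-in-parallel, synthesize pattern above can be sketched as follows. The abstract does not define the meta-information precisely, so this sketch assumes, hypothetically, that it is the count of each decision value per equivalence class (each combination of condition-attribute values); because such counts are additive, merging them reproduces the whole system's result exactly, and rules are read off the consistent classes (those in a lower approximation). Threads stand in for network task performers; all names are illustrative.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def local_meta(rows, cond_attrs, dec_attr):
    """Hypothetical meta-information for one sub-information system:
    for each combination of condition-attribute values (an equivalence
    class), a Counter of the decision values seen with it."""
    meta = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        meta[key][row[dec_attr]] += 1
    return meta


def merge_meta(metas):
    """Synthesizer step: counts are additive, so merging is exact."""
    merged = defaultdict(Counter)
    for meta in metas:
        for key, counter in meta.items():
            merged[key].update(counter)
    return merged


def certain_rules(meta, cond_attrs):
    """Emit a rule for every consistent equivalence class (single decision),
    i.e. classes in the lower approximation of a decision class."""
    rules = []
    for key, counter in meta.items():
        if len(counter) == 1:
            decision = next(iter(counter))
            rules.append((dict(zip(cond_attrs, key)), decision, counter[decision]))
    return rules


def parallel_rules(rows, cond_attrs, dec_attr, n_parts=2):
    """Split rows into sub-systems, compute meta-information in parallel, merge, derive rules."""
    parts = [rows[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        metas = list(pool.map(lambda p: local_meta(p, cond_attrs, dec_attr), parts))
    return certain_rules(merge_meta(metas), cond_attrs)
```

    An equivalence class whose rows disagree on the decision (for example, the same temperature and wind values leading to both "play" and "no play") produces no certain rule, since it lies in the boundary region.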

  2. Environmental risks associated to wind erosion in a metal mining area from SE Spain

    Energy Technology Data Exchange (ETDEWEB)

    Garcia Fernandez, G.; Romero Diaz, A.

    2009-07-01

    Soils and mining wastes from the Mediterranean mining area of the Sierra Minera Mountains are highly enriched in heavy metals such as lead and zinc, but also in other elements such as cadmium and arsenic. Wind erosion in this area can be considered extremely high, and the hazards associated with the eroded sediments appear to be high because of the huge amount of metals present in these wastes. The combination of high erosion rates and high metal concentrations in these mining wastes therefore means that the environmental risks can be considered high, not only for the surrounding ecosystems but also for the public health of nearby villages and towns. To study these wind erosion processes over the mining materials, experiments evaluating the transport of soil particles were carried out. Erosion rates in this area are particularly important during the spring months, when increased activity of the eastern winds brings intense soil dragging, with strong effects on metal dispersion, including the massive removal of sediments. (Author) 16 refs.

  3. Research of Improved FP-Growth Algorithm in Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Yi Zeng

    2015-01-01

    with FP-Growth algorithm. Experimental results show that the Painting-Growth algorithm suits data volumes above 1050 and the N Painting-Growth algorithm data volumes below 10000; the performance of both improved algorithms is better than that of the FP-Growth algorithm.
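    For comparison with the improved variants discussed here, a compact sketch of the baseline FP-Growth algorithm follows: build an FP-tree of frequency-ordered transactions, then recursively mine the conditional pattern base of each item. It is a plain textbook version assuming a small in-memory dataset of string items; it does not implement Painting-Growth or N Painting-Growth, whose details are not in this excerpt, and all names are illustrative.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_txns, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and a header table."""
    support = defaultdict(int)
    for items, cnt in weighted_txns:
        for item in set(items):
            support[item] += cnt
    support = {i: s for i, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, cnt in weighted_txns:
        # Keep frequent items only, ordered by descending support (ties by name).
        path = sorted((i for i in set(items) if i in support),
                      key=lambda i: (-support[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += cnt
            node = child
    return support, header


def fp_growth(weighted_txns, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    result = {}
    support, header = build_fp_tree(weighted_txns, min_support)
    for item, sup in support.items():
        pattern = suffix | {item}
        result[pattern] = sup
        # Conditional pattern base: the prefix path above every node of `item`.
        cond_base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        result.update(fp_growth(cond_base, min_support, pattern))
    return result
```

    Plain transactions are passed with weight 1, e.g. `fp_growth([(t, 1) for t in transactions], 3)`; unlike Apriori, no candidate itemsets are generated, and the database is scanned only to build the tree.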

  4. A Hybrid Web Recommendation System based on the Improved Association Rule Mining Algorithm

    OpenAIRE

    Wanaskar, Ujwala; Vij, Sheetal; Mukhopadhyay, Debajyoti

    2013-01-01

    Given the growing interest in web recommendation systems that deliver customized data to their users, we started working on such a system. Recommendation systems are generally divided into two major categories: collaborative recommendation systems and content-based recommendation systems. Collaborative recommendation systems try to seek out users who share the same tastes as a given user and recommend websites according to the liking given us...

  5. COLLABORATIVE WEB RECOMMENDATION SYSTEMS BASED ON AN EFFECTIVE FUZZY ASSOCIATION RULE MINING ALGORITHM (FARM)

    OpenAIRE

    Dr. P. THAMBIDURAI; A. KUMAR

    2010-01-01

    With the increasing popularity of web-based systems applied in many different areas, these systems tend to deliver customized information to their users by means of recommendation methods. Recommendation systems are mainly classified into two groups: content-based recommendation and collaborative recommendation. Content-based recommendation tries to recommend web sites similar to those the user has liked, whereas collaborative recommendation tries to find som...

  6. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition discusses both the theoretical foundations and practical applications of data mining in a wide range of fields, including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data types, data categories and applications of data mining. The second chapter briefly reviews data visualization technology and its importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and a novel algorithm for sample covariance is derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithms is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  7. An Extensive Review of Significant Researches in Data Mining

    Directory of Open Access Journals (Sweden)

    Paul P. Mathai

    2014-06-01

    Data mining is defined as the process of extracting novel, nontrivial information hidden in large databases. Traditional data mining methods have focused mostly on finding statistical connections between items that occur frequently in transactional databases. Data mining is now commonly used in many different fields, such as medicine and marketing. Several methods and techniques have been developed to mine information from databases. In this study, we provide a comprehensive survey of existing methods for utility-based and frequency-based itemset mining and of research on association rule mining, along with a brief introduction to data mining and its advantages. We also present a concise description of data mining techniques, a performance review, and directions for future research.
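    The difference between frequency-based and utility-based itemset mining surveyed here can be shown with a minimal sketch. It assumes the common high-utility formulation in which each transaction records purchased quantities and each item has a unit profit; the function names are illustrative, not taken from the survey.

```python
def itemset_frequency(transactions, itemset):
    """Number of transactions containing every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def itemset_utility(transactions, unit_profit, itemset):
    """Sum of quantity * unit profit of the itemset's items, over the
    transactions that contain the whole itemset.

    transactions: list of dicts mapping item -> purchased quantity.
    """
    items = set(itemset)
    total = 0
    for t in transactions:
        if items <= set(t):
            total += sum(t[i] * unit_profit[i] for i in items)
    return total
```

    With three transactions {a: 2, b: 1}, {a: 1, c: 5} and {b: 4, c: 1} and unit profits a = 5, b = 3, c = 1, the items a and c each appear in two transactions, so frequency alone cannot separate them, but their utilities are 15 and 6 respectively.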

  8. Study on Association between Spatial Distribution of Metal Mines and Disease Mortality: A Case Study in Suxian District, South China

    Directory of Open Access Journals (Sweden)

    Wei Chen

    2013-10-01

    Metal mines release toxic substances into the environment and can therefore negatively impact the health of residents in nearby regions. This paper sought to investigate whether there was excess disease mortality in populations in the vicinity of the mining area in Suxian District, South China. The spatial distribution of metal mining and related activities from 1985 to 2012, derived from remote sensing imagery, was overlaid on disease mortality data. Three hotspot areas with high disease mortality were identified around the Shizhuyuan mine sites, the Dengjiatang metal smelting sites, and the Xianxichong mine sites. Disease mortality decreased with distance from the mining and smelting areas. Population exposure to pollution was estimated on the basis of the distance from the town of residence to the pollution source. The risk of dying according to disease mortality rates was analyzed within 7–25 km buffers. The results suggested that there was a close relationship between the risk of disease mortality and proximity to the Suxian District mining industries. These associations depended on the type and scale of mining activities, the area influenced by mining, and other factors.

  9. An assessment of microbial communities associated with surface mining-disturbed overburden.

    Science.gov (United States)

    Poncelet, Dominique M; Cavender, Nicole; Cutright, Teresa J; Senko, John M

    2014-03-01

    To assess the microbiological changes that occur during the maturation of overburden that has been disturbed by surface mining of coal, a surface mining-disturbed overburden unit in southeastern Ohio, USA was characterized. Overburden from the same unit that had been disturbed for 37 and 16 years was compared to undisturbed soil from the same region. Overburden and soil samples were collected as shallow subsurface cores from each subregion of the mined area (i.e., land 16 years and 37 years post-mining, and unmined land). Chemical and mineralogical characteristics of overburden samples were determined, as were microbial respiration rates. The composition of microbial communities associated with overburden and soil was determined using culture-independent, nucleic acid-based approaches. Chemical and mineralogical evaluation of overburden suggested that weathering of disturbed overburden gave rise to a setting with lower pH and more oxidized chemical constituents. Overburden-associated microbial biomass and respiration rates increased with time after overburden disturbance. Evaluation of 16S rRNA gene libraries that were produced by "next-generation" sequencing technology revealed that recently disturbed overburden contained an abundance of phylotypes attributable to sulfur-oxidizing Limnobacter spp., but with increasing time post-disturbance, overburden-associated microbial communities developed a structure similar to that of undisturbed soil, but retained characteristics of more recently disturbed overburden. Our results indicate that over time, the biogeochemical weathering of disturbed overburden leads to the development of geochemical conditions and microbial communities that approximate those of undisturbed soil, but that this transition is incomplete after 37 years of overburden maturation.

  10. Mining-induced fault reactivation associated with the main conveyor belt roadway and safety of the Barapukuria Coal Mine in Bangladesh: Constraints from BEM simulations

    Energy Technology Data Exchange (ETDEWEB)

    Islam, Md. Rafiqul; Shinjo, Ryuichi [Department of Physics and Earth Sciences, University of the Ryukyus, Okinawa, 903-0213 (Japan)]

    2009-09-01

    Fault reactivation during underground mining is a critical problem in coal mines worldwide. This paper investigates the mining-induced reactivation of faults associated with the main conveyor belt roadway (CBR) of the Barapukuria Coal Mine in Bangladesh. The stress characteristics and deformation around the faults were investigated by boundary element method (BEM) numerical modeling. The model consists of a simple geometry with two faults (Fb and Fb1) near the CBR and the surrounding rock strata. A Mohr-Coulomb failure criterion with bulk rock properties is applied to analyze the stability and safety around the fault zones, as well as for the entire mining operation. The simulation results illustrate that the mining-induced redistribution of stresses causes significant deformation within and around the two faults. The horizontal and vertical stresses influence the faults, and higher stresses are concentrated near the ends of the two faults. Higher vertical tensional stress is prominent at the upper end of fault Fb. High deviatoric stress values concentrated at the ends of faults Fb and Fb1 indicate a tendency towards block failure around the fault zones. The deviatoric stress patterns imply that the reinforcement strength to support the roof of the roadway should be greater than 55 MPa along the fault core zone, and should be more than 20 MPa adjacent to the damage zone of the fault. Failure trajectories that extend towards the roof and left side of fault Fb indicate that mining-induced reactivation of faults is not sufficient to generate water inflow into the mine. However, if movement of strata occurs along the fault planes due to regional earthquakes, and if the faults intersect the overlying Lower Dupi Tila aquiclude, then liquefaction could occur along the fault zones and enhance water inflow into the mine. The study also reveals that the hydraulic gradient and the general direction of groundwater flow are almost at right angles with the trends of
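    For readers outside rock mechanics, the Mohr-Coulomb criterion and the deviatoric stress mentioned in this abstract are commonly written as below, with cohesion c, internal friction angle φ, normal stress σ_n and principal stresses σ₁ ≥ σ₃ (compression taken as positive). These are the standard textbook forms, not the parameter values or exact formulation used in the study.

```latex
% Shear strength on a plane carrying normal stress \sigma_n
\tau_f = c + \sigma_n \tan\varphi

% The same criterion in principal stresses (failure when \sigma_1 reaches this value)
\sigma_1 = \frac{1 + \sin\varphi}{1 - \sin\varphi}\,\sigma_3 + \frac{2c\cos\varphi}{1 - \sin\varphi}

% Deviatoric stress tensor (mean stress removed)
s_{ij} = \sigma_{ij} - \tfrac{1}{3}\,\sigma_{kk}\,\delta_{ij}
```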

  11. DDMGD: the database of text-mined associations between genes methylated in diseases from different species

    KAUST Repository

    Raies, A. B.

    2014-11-14

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82%, as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Users can also submit new associations to DDMGD. A comparison of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.

  12. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function modules: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principles and knowledge representation of some of the data mining function modules are described. The tool is implemented in Visual C++ under Windows 2000. Nonmonotonicity in data mining is dealt with by concept hierarchy and layered mining. The tool has been satisfactorily applied to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems and to the diagnosis of brain glioma.

  13. Enterprise Human Resources Information Mining Based on Improved Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Lei He

    2013-05-01

    With the continuous development of information technology in today's modern society, enterprises' demand for human resources information mining keeps growing. Based on the current state of enterprise human resources information mining, this paper proposes an enterprise human resources information mining model based on an improved Apriori algorithm. The model introduces data mining technology and the traditional Apriori algorithm and improves on that basis: it divides the association rule mining task of the original algorithm into two subtasks, producing frequent itemsets and producing rules, uses SQL to generate frequent itemsets directly, and builds charts to extract the information that customers are interested in. Experimental results show that the model based on the improved Apriori algorithm is more efficient than the original algorithm, and practical application tests show that the improved algorithm is practical and effective.
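    The idea of generating frequent itemsets directly with SQL can be sketched using Python's built-in `sqlite3` module. This assumes transactions are stored one row per (transaction id, item); frequent single items come from `GROUP BY ... HAVING`, and frequent pairs from a self-join restricted to frequent items. The table and function names are hypothetical, this is not the paper's actual SQL, and larger itemsets would need additional self-joins.

```python
import sqlite3


def frequent_itemsets_sql(transactions, min_support):
    """Return (frequent single items, frequent pairs) computed in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?)",
        [(tid, item) for tid, items in enumerate(transactions) for item in set(items)],
    )
    # Level 1: support counts per item.
    singles = conn.execute(
        "SELECT item, COUNT(*) FROM txn GROUP BY item "
        "HAVING COUNT(*) >= ? ORDER BY item",
        (min_support,),
    ).fetchall()
    # Level 2: self-join on transaction id; a.item < b.item avoids duplicates,
    # and only frequent items are joined (the Apriori pruning idea).
    pairs = conn.execute(
        """
        SELECT a.item, b.item, COUNT(*)
        FROM txn AS a JOIN txn AS b ON a.tid = b.tid AND a.item < b.item
        WHERE a.item IN (SELECT item FROM txn GROUP BY item HAVING COUNT(*) >= :s)
          AND b.item IN (SELECT item FROM txn GROUP BY item HAVING COUNT(*) >= :s)
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= :s
        ORDER BY a.item, b.item
        """,
        {"s": min_support},
    ).fetchall()
    conn.close()
    return singles, pairs
```

    Pushing the counting into `GROUP BY` lets the database engine do the scan in one statement per level, which is the efficiency argument the abstract makes for using SQL.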

  14. Data Mining for Quality Prediction in Textile Engineering

    Institute of Scientific and Technical Information of China (English)

    YANG Jian-guo; LI Bei-zhi; ZHAO Ya-mei

    2006-01-01

    A data mining method for quality prediction using association rules (DMAR) is presented in this paper. Association rules are used to mine valuable relations among items in large amounts of textile process data for an ANN prediction model. DMAR consists of three main steps: (1) setting up the knowledge data set; (2) cleaning and converting the data; (3) finding the itemsets with large support and generating the expected rules. DMAR effectively improves the precision of yarn-breakage prediction. It quickly eliminates the negative influence of training parameters on the prediction model, so a more satisfactory quality prediction result can be reached.

  15. Shopping-Basket Association Rule Analysis Based on the Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    赵祖应; 丁勇; 邓平

    2012-01-01

    Data mining is a new discipline that emerged from the need to retrieve information from the immense amounts of data stored in databases. It draws on statistics, machine learning, database technology, pattern recognition, artificial intelligence and other fields. Competition in the IT job market is intense, and data mining, the core technique of data processing, is receiving more and more attention. Association rules are commonly used to discover relationships between different items in transactional databases and, from these, to find customers' purchasing behavior patterns, for example, how buying one kind of product influences a customer's purchase of other products. Such rules can be applied in supermarkets to shelf design, product placement and classifying customers by their purchasing patterns. Discovering association rules helps in understanding and anticipating the development and trends of the underlying objects. Data mining plays an important role in marketing and business investment.
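    The purchase-influence idea in this abstract is usually quantified with support, confidence and lift. A minimal sketch, assuming transactions are given as collections of items; the function name is illustrative and not taken from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    txns = [set(t) for t in transactions]
    n = len(txns)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in txns if a <= t)          # baskets containing the antecedent
    n_c = sum(1 for t in txns if c <= t)          # baskets containing the consequent
    n_ac = sum(1 for t in txns if (a | c) <= t)   # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

    For example, in five baskets where diapers appear 4 times, beer 3 times and both together 3 times, the rule diaper → beer has support 0.6, confidence 0.75 and lift 1.25; a lift above 1 indicates that buying the antecedent raises the likelihood of buying the consequent.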

  16. Intrusion detection: a novel approach that combines boosting genetic fuzzy classifier and data mining techniques

    Science.gov (United States)

    Ozyer, Tansel; Alhajj, Reda; Barker, Ken

    2005-03-01

    This paper proposes an intelligent intrusion detection system (IDS), an integrated approach that employs fuzziness and two well-known data mining techniques: classification and association rule mining. Combining these two techniques, we adopt iterative rule learning to extract rules from the data set. Our final intention is to predict different behaviors in networked computers. To achieve this, we propose to use a fuzzy rule-based genetic classifier. Our approach has two main stages. First, fuzzy association rule mining is applied and a large number of candidate rules are generated for each class. The rules then pass through a pre-screening mechanism to reduce the fuzzy rule search space. Candidate rules obtained after pre-screening are used in a genetic fuzzy classifier to generate rules for the specified classes. The classes are Normal, PRB (probe), DOS (denial of service), U2R (user to root) and R2L (remote to local). Second, an iterative rule learning mechanism is employed for each class to find the fuzzy rules required to classify its data; each time a fuzzy rule is extracted, it is included in the system. A boosting mechanism evaluates the weight of each data item to help the rule extraction mechanism focus more on data having relatively higher weight. Finally, the extracted fuzzy rules with their corresponding weight values are aggregated per class to find the vote of each class label for each data item.
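    The final class-vote aggregation can be illustrated with a small sketch: each fuzzy rule is assumed to carry a weight (such as the boosting step produces), its firing strength on a record is taken as the minimum of its membership degrees, and weighted firing strengths are summed per class. Triangular membership functions, the min t-norm, the attribute name `conn_rate` and all function names are assumptions for illustration; the paper's genetic rule learning and boosting are not reproduced here.

```python
def triangular(a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def firing_strength(record, antecedent):
    """Degree to which a record satisfies all fuzzy conditions (min t-norm)."""
    return min(mu(record[attr]) for attr, mu in antecedent)


def class_votes(record, rules):
    """Sum weight * firing strength over rules, per class label.

    rules: list of (antecedent, label, weight), where antecedent is a list of
    (attribute name, membership function) pairs.
    """
    votes = {}
    for antecedent, label, weight in rules:
        votes[label] = votes.get(label, 0.0) + weight * firing_strength(record, antecedent)
    return votes


def classify(record, rules):
    """Predict the class label with the largest aggregated vote."""
    votes = class_votes(record, rules)
    return max(votes, key=votes.get)
```

    For instance, with a "low" set triangular(0, 10, 20) voting Normal at weight 1.0 and a "high" set triangular(10, 20, 30) voting DOS at weight 0.8, a record with `conn_rate` 18 receives votes of 0.2 for Normal and 0.64 for DOS, so it is labeled DOS.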

  17. A Case Investigation of Product Structure Complexity in Mass Customization Using a Data Mining Approach

    DEFF Research Database (Denmark)

    2014-01-01

    This paper presents a data mining method for analyzing historical configuration data providing a number of opportunities for improving mass customization capabilities. The overall objective of this paper is to investigate how specific quantitative analyses, more specifically the association rule...

  18. The Stability of Memory Rules Associative with the Mathematical Thinking Core

    Directory of Open Access Journals (Sweden)

    Xiuzhen Wang

    2011-02-01

    Brain activation related to how and where arithmetic operations are represented has been observed in various number-processing tasks. However, it remains poorly understood whether the stabilized memory of Boolean rules is associated with background knowledge. The present study reviewed behavioral and imaging evidence demonstrating that Boolean problem-solving abilities depend on the core systems of number processing. The core systems account for a mathematical cultural background and serve as the foundation for sophisticated mathematical knowledge. The Ebbinghaus paradigm was used to investigate learning-induced changes by functional magnetic resonance imaging (fMRI) in a retrieval task of Boolean rules. Functional imaging data revealed a common activation pattern in the left inferior parietal lobule and left inferior frontal gyrus during all Boolean tasks, a pattern that former studies have linked to number processing. All other regional activations were task-specific and prominently distributed in the left thalamus, bilateral parahippocampal gyrus, bilateral occipital lobe and other subcortical regions when contrasting stabilized memory retrieval of Boolean tasks with number-processing tasks. The present results largely verify previous studies suggesting that activation patterns due to number processing appear to reflect a basic anatomical substrate of the stability of Boolean rule memory, which is derived from a network originally related to the core systems of number processing.

  19. Mining Classifying Rules from Tumor Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    马猛; 汝颖; 马腾; 钮俊清; 李龙澍; 王煦法

    2009-01-01

    Using information science methods and technology to build tumor prediction and classification models from tumor gene expression data is valuable for research on identifying tumor gene expression patterns, as well as for tumor diagnosis and recognition. This paper presents a method that constructs a tumor classifier from classifying rules mined directly from tumor gene expression data. In this method, we extracted the experimental sample dataset and then searched it for classifying features that respectively mark tumor and normal samples. From the mined classifying features, classifying rules were generated, and each unknown sample was predicted by the rule with the highest confidence. Experiments on the prostate cancer gene expression data from the Broad Institute showed that the prediction accuracy of this method was over 90%, and many classifying rules with a transparent prediction structure were generated at the same time. The experimental results demonstrate the feasibility and effectiveness of this method.
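
    Prediction "according to the principle of highest confidence" can be sketched as follows: among all rules whose conditions hold for a sample, the class of the most confident one wins. This assumes expression values have already been discretised into levels such as "high"/"low"; the gene names, rules and confidences are hypothetical:

```python
def rule_fires(conditions, sample):
    """True if every (gene, level) condition of the rule holds for the sample."""
    return all(sample.get(gene) == level for gene, level in conditions.items())


def predict(sample, rules, default="unknown"):
    """Return the class of the highest-confidence rule that fires on `sample`.

    `rules` is a list of (conditions, class_label, confidence) triples.
    """
    fired = [(conf, label) for cond, label, conf in rules if rule_fires(cond, sample)]
    if not fired:
        return default
    return max(fired, key=lambda pair: pair[0])[1]


# Hypothetical rules over discretised expression levels of made-up genes.
rules = [
    ({"GENE_A": "high"}, "tumor", 0.92),
    ({"GENE_B": "low"}, "normal", 0.85),
    ({"GENE_A": "high", "GENE_C": "high"}, "tumor", 0.97),
]
print(predict({"GENE_A": "high", "GENE_B": "low", "GENE_C": "low"}, rules))  # tumor
print(predict({"GENE_A": "low", "GENE_B": "low"}, rules))                    # normal
```

    Because each prediction is traced back to a single readable rule, this style of classifier keeps the "transparent prediction structure" the abstract mentions.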

  20. Classification Rule Mining Based on Improved Ant-miner Algorithm

    Institute of Scientific and Technical Information of China (English)

    肖菁; 梁燕辉

    2012-01-01

    In order to improve the prediction accuracy of classification rules produced by the classical Ant-miner algorithm, this paper proposes an improved Ant-miner algorithm for classification rule mining. A heuristic function based on sample density and sample proportion is constructed to avoid the algorithmic bias that arises when several candidate conditions have the same selection probability. Pruned rules undergo single-point mutation according to a mutation coefficient, which expands the rule search space and improves rule accuracy. An evaporation coefficient is added to Ant-miner's pheromone update formula, making it closer to the foraging behavior of real ants and preventing premature convergence. Experimental results on UCI datasets show that the proposed algorithm achieves higher prediction accuracy than the original Ant-miner algorithm.
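
    The abstract does not give the modified update formula, so the sketch below shows only the baseline Ant-miner pheromone update that the paper builds on: terms (attribute = value conditions) used in the current rule are reinforced in proportion to rule quality, measured as sensitivity times specificity, and all levels are then normalised. The paper's evaporation coefficient is deliberately not reproduced; the attribute names are hypothetical:

```python
def rule_quality(tp, fp, fn, tn):
    """Ant-miner rule quality: sensitivity * specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (fp + tn) if fp + tn else 0.0
    return sensitivity * specificity


def update_pheromone(tau, rule_terms, quality):
    """Baseline Ant-miner update (without the paper's evaporation coefficient).

    Terms used in the rule are reinforced by tau * quality; all levels are then
    normalised to sum to 1, which implicitly lowers the unused terms.
    """
    reinforced = {
        term: level + level * quality if term in rule_terms else level
        for term, level in tau.items()
    }
    total = sum(reinforced.values())
    return {term: level / total for term, level in reinforced.items()}


# Hypothetical terms: (attribute, value) pairs, starting from uniform pheromone.
terms = [("outlook", "sunny"), ("outlook", "rain"), ("humidity", "high"), ("humidity", "normal")]
tau = {t: 1 / len(terms) for t in terms}
q = rule_quality(tp=8, fp=2, fn=2, tn=8)  # 0.8 * 0.8
tau = update_pheromone(tau, {("outlook", "sunny"), ("humidity", "high")}, q)
```

    An explicit evaporation factor would be applied inside `update_pheromone`; its exact placement in the improved algorithm is not stated in the abstract.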

  1. Could parental rules play a role in the association between short sleep and obesity in young children?

    Science.gov (United States)

    Jones, Caroline H D; Pollard, Tessa M; Summerbell, Carolyn D; Ball, Helen

    2014-05-01

    Short sleep duration is associated with obesity in young children. This study develops the hypothesis that parental rules play a role in this association. Participants were 3-year-old children and their parents, recruited at nursery schools in socioeconomically deprived and non-deprived areas of a North-East England town. Parents were interviewed to assess their use of sleep, television-viewing and dietary rules, and were given diaries to document their child's sleep for 4 days/5 nights. Children were measured for height, weight, waist circumference and triceps and subscapular skinfold thicknesses. One hundred and eight families participated (84 with complete sleep data and 96 with complete body composition data). Parental rules were significantly associated with one another, were associated with longer night-time sleep and were more prevalent in the non-deprived-area group than in the deprived-area group. Television-viewing and dietary rules were associated with leaner body composition. Parental rules may in part confound the association between night-time sleep duration and obesity in young children, as rules cluster together across behavioural domains and are associated with both sleep duration and body composition. This hypothesis should be tested rigorously in large representative samples.

  2. Study on the Application of Association Rules in the TCM Diagnosis and Treatment of Alzheimer Disease

    Institute of Scientific and Technical Information of China (English)

    杨婕

    2013-01-01

    Objectives: To find the relationship between TCM syndromes and symptoms in Alzheimer disease. Methods: Data from 109 clinical cases were mined with the Apriori association rule algorithm; in addition, improvements to the Apriori algorithm were proposed to suit the particular characteristics of TCM data. Results: A series of association rules were mined, which provide an important basis for the definite diagnosis of Alzheimer disease. Conclusions: Association rules are suitable for studying the internal principles of TCM treatment based on syndrome differentiation in Alzheimer disease and provide a reliable basis for its definite diagnosis.
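
    For readers unfamiliar with Apriori, a compact sketch of the plain algorithm follows: it grows frequent itemsets level by level, prunes candidates that have an infrequent subset, and then derives rules above a confidence threshold. The paper's TCM-specific improvements are not described in the abstract and are not included; the symptom and syndrome labels are hypothetical:

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while current:
        # Count each candidate's support in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def generate_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules above min_conf."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out


# Hypothetical case records mixing symptoms and a TCM syndrome label.
records = [
    {"forgetfulness", "fatigue", "kidney_deficiency"},
    {"forgetfulness", "kidney_deficiency"},
    {"forgetfulness", "insomnia", "phlegm_stasis"},
    {"fatigue", "kidney_deficiency"},
]
freq = apriori(records, min_support=0.5)
rules = generate_rules(freq, min_conf=0.6)
```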

  3. An Improved Image Mining Technique For Brain Tumour Classification Using Efficient Classifier

    OpenAIRE

    Rajendran, P.; Madheswaran, M.

    2010-01-01

    An improved image mining technique for brain tumor classification using pruned association rules with the MARI algorithm is presented in this paper. The proposed method uses an association rule mining technique to classify CT scan brain images into three categories, namely normal, benign and malignant. It combines low-level features extracted from images with high-level knowledge from specialists. The developed algorithm can assist physicians in efficient classification with multiple ...

  4. DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

    Science.gov (United States)

    Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

    2016-01-01

    The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
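
    To make "extracting mutation mentions" concrete, here is a deliberately simplified sketch: a hypothetical regular expression for protein point mutations such as "V600E" (optionally written "p.V600E"), plus naive same-sentence co-mention with a disease list. This is not DiMeX's method, which relies on syntactic and semantic patterns precisely because plain co-occurrence has low precision:

```python
import re

# Hypothetical pattern: <wild-type residue><position><mutant residue>, e.g.
# "V600E", optionally prefixed with "p.". Real systems use far richer patterns.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"\b(?:p\.)?([{AMINO}])(\d+)([{AMINO}])\b")


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples found in `text`."""
    return [(wt, int(pos), mt) for wt, pos, mt in POINT_MUTATION.findall(text)]


def co_mentions(sentence, diseases):
    """Naively pair every mutation in a sentence with every disease it names."""
    lower = sentence.lower()
    found = [d for d in diseases if d.lower() in lower]
    return [(m, d) for m in find_mutations(sentence) for d in found]


sentence = "The BRAF p.V600E mutation is frequent in melanoma and also seen in thyroid cancer."
print(co_mentions(sentence, ["melanoma", "thyroid cancer", "glioma"]))
```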

  5. MeInfoText: associated gene methylation and cancer information from text mining

    Directory of Open Access Journals (Sweden)

    Juan Hsueh-Fen

    2008-01-01

    Background: DNA methylation is an important epigenetic modification of the genome. Abnormal DNA methylation may result in silencing of tumor suppressor genes and is common in a variety of human cancer cells. As more epigenetics research is published electronically, it is desirable to extract relevant information from biological literature. To facilitate epigenetics research, we have developed a database called MeInfoText to provide gene methylation information from text mining. Description: MeInfoText presents comprehensive association information about gene methylation and cancer, the profile of gene methylation among human cancer types and the gene methylation profile of a specific cancer type, based on association mining from large amounts of literature. In addition, MeInfoText offers integrated protein-protein interaction and biological pathway information collected from the Internet. MeInfoText also provides pathway cluster information regarding a set of genes that may contribute to the development of cancer due to aberrant methylation. The extracted evidence with highlighted keywords and the gene names identified from each methylation-related abstract are also retrieved. The database is now available at http://mit.lifescience.ntu.edu.tw/. Conclusion: MeInfoText is a unique database that provides comprehensive gene methylation and cancer association information. It will complement existing DNA methylation information and will be useful in epigenetics research and the prevention of cancer.

  6. A Knowledge Mining Model for Ranking Institutions using Rough Computing with Ordering Rules and Formal Concept Analysis

    Directory of Open Access Journals (Sweden)

    D P Acharjya

    2011-03-01

    The emergence of computers and the information technology revolution have brought tremendous changes to the real world and opened a new dimension for intelligent data analysis. Well-formed facts, delivered at the right time and place, lead to better knowledge. However, challenges arise when large volumes of inconsistent data must be used for decision making and knowledge extraction. To handle such imprecise data, researchers have recently developed several important mathematical tools, namely fuzzy sets, intuitionistic fuzzy sets, rough sets, formal concept analysis and ordering rules. Many information systems also contain numerical attribute values that are almost, rather than exactly, similar. To handle such information systems, this paper uses two processes: a pre-process and a post-process. In the pre-process, we use rough sets on an intuitionistic fuzzy approximation space with ordering rules to find knowledge, whereas in the post-process we use formal concept analysis to explore better knowledge and the vital factors affecting decisions.
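
    The core idea of rough computing is to approximate a target set from below and above using classes of indiscernible objects. The paper works on an intuitionistic fuzzy approximation space; the sketch below shows only the classical crisp (Pawlak) special case, in which objects are indiscernible when their attribute values are exactly equal. The institutions, attributes and target set are hypothetical:

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group objects whose values are identical on all given attributes."""
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(name)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Classical lower and upper approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:
            lower |= block  # certainly in the target
        if block & target:
            upper |= block  # possibly in the target
    return lower, upper


# Hypothetical institutions described by two discretised attributes.
institutions = {
    "I1": {"faculty": "high", "placement": "high"},
    "I2": {"faculty": "high", "placement": "high"},
    "I3": {"faculty": "high", "placement": "low"},
    "I4": {"faculty": "low", "placement": "low"},
}
lower, upper = approximations(institutions, ["faculty", "placement"], {"I1", "I3"})
```

    I1 and I2 cannot be told apart, so I1 lands only in the upper approximation; the fuzzy variant relaxes exact equality to handle the "almost similar" numerical values the abstract mentions.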

  7. Polypharmacy in older adults: Association Rule and Frequent-Set Analysis to evaluate concomitant medication use.

    Science.gov (United States)

    Held, Fabian; Le Couteur, David G; Blyth, Fiona M; Hirani, Vasant; Naganathan, Vasi; Waite, Louise M; Seibel, Markus J; Handelsman, David J; Cumming, Robert G; Allore, Heather G; Gnjidic, Danijela

    2017-02-01

    The aim of this study was to apply Association Rule and Frequent-Set analysis, and novel means of data visualisation to ascertain patterns of medication use and medication combinations contributing to medication group clusters according to geriatric syndrome status in older adults. Participants were community-dwelling men (aged ≥70 years, n=1686), Sydney, Australia. Medication exposure was categorised at medication class level and data were analysed according to geriatric syndrome status (presence of at least one syndrome including frailty, falls, cognitive impairment and urinary incontinence). Association Rule and Frequent-Set analysis were performed to identify "interesting" patterns of medication combinations that occur together. This analysis involves advanced computer algorithms that investigated all possible combinations of medications in the dataset in order to identify those which are observed more or much less frequently than expected. Frequent-Set Analysis demonstrated one unexpected medication combination, antiulcer and antidiabetic medications (3.5% of participants) in the overall population (n=1687). Frequency of medication combinations was similar in participants with (n=666) and without (n=1020) geriatric syndromes. Among participants with geriatric syndromes, the most frequent combinations included antigout with lipid-lowering agents (5.7%) followed by angiotensin II and diuretics combination (22%). This novel methodology can be used to detect common medication combinations overall by data visualisation, and against specific adverse drug reactions such as geriatric syndromes. This methodology may be a valuable pharmacovigilance approach to monitor large databases for the safety of medications.
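
    "Observed more or much less frequently than expected" is commonly measured by lift: the observed co-occurrence count of two items divided by the count expected if they were independent. A minimal sketch over medication classes, with made-up patient records (not study data):

```python
from collections import Counter
from itertools import combinations


def pair_lift(records):
    """Lift of each co-occurring pair: observed count / count expected if independent."""
    n = len(records)
    single = Counter(m for meds in records for m in set(meds))
    pairs = Counter(p for meds in records for p in combinations(sorted(set(meds)), 2))
    return {(a, b): count / (single[a] * single[b] / n) for (a, b), count in pairs.items()}


# Made-up medication-class records, one set per participant.
patients = [
    {"antiulcer", "antidiabetic"},
    {"antiulcer", "antidiabetic", "diuretic"},
    {"diuretic"},
    {"antiulcer"},
]
lifts = pair_lift(patients)
```

    Lift above 1 flags pairs that occur together more often than chance would predict and lift below 1 flags pairs that occur less often; in practice a minimum support threshold is applied first so that rare pairs do not dominate.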

  8. A Chaotic Home Environment Accounts for the Association between Respect for Rules Disposition and Reading Comprehension: A Twin Study.

    Science.gov (United States)

    Taylor, Jeanette; Hart, Sara A

    2014-10-01

    This study examined the association between socioemotional dispositions from the developmental propensity model and reading comprehension and whether those associations could be accounted for by level of chaos in the home. Data from 342 monozygotic and 333 same-sex dizygotic twin pairs age 7-13 years were used. A parent rated the twins on sympathy, respect for rules, negative emotionality, and daring and level of chaos in the twins' home. Reading comprehension was measured using a state-wide school assessment. Only respect for rules significantly and uniquely predicted reading comprehension. Biometric models indicated that respect for rules was positively associated with reading comprehension via the shared environment and home chaos accounted for a significant amount of that shared environmental variance even after controlling for family income. Children with higher respect for rules have better reading comprehension scores in school and this relationship owes partly to the level of chaos in the family home.

  9. FP-tree association rules algorithm in recommendation system

    Institute of Scientific and Technical Information of China (English)

    刘华; 张亚昕

    2015-01-01

    In the current era of information explosion, recommendation systems have become an effective tool for addressing network information overload. Targeting the sales characteristics of an online bookstore e-commerce site, this paper presents a detailed design of a recommendation system and uses the FP-tree association rule algorithm to perform the data mining, successfully implementing online recommendation.
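
    The FP-tree at the heart of this approach compresses transactions into a prefix tree ordered by item frequency, from which each item's conditional pattern base (the prefix paths that lead to it) can be read off. A minimal sketch of tree construction and pattern-base extraction; the recursive FP-growth mining step and the paper's recommendation logic are omitted, and the book orders are hypothetical:

```python
from collections import Counter


class Node:
    """FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree; return (root, header) where header maps item -> nodes."""
    freq = Counter(item for t in transactions for item in set(t))
    keep = {i for i, c in freq.items() if c >= min_count}
    root, header = Node(None, None), {}
    for t in transactions:
        # Frequent items only, in descending frequency (ties broken by name).
        items = sorted((i for i in set(t) if i in keep), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(header, item):
    """Prefix paths leading to `item`, each paired with that item node's count."""
    base = []
    for node in header.get(item, []):
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base


# Hypothetical bookstore orders.
orders = [["bookA", "bookB", "bookC"], ["bookA", "bookB"], ["bookA", "bookC"], ["bookB", "bookD"]]
root, header = build_fp_tree(orders, min_count=2)
print(conditional_pattern_base(header, "bookC"))
```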

  10. Radio-Ecological Situation in the Area of the Priargun Production Mining and Chemical Association - 13522

    Energy Technology Data Exchange (ETDEWEB)

    Semenova, M.P.; Seregin, V.A.; Kiselev, S.M.; Titov, A.V. [FSBI SRC A.I. Burnasyan Federal Medical Biophysical Center of FMBA of Russia, Zhivopisnaya Street, 46, Moscow (Russian Federation)]; Zhuravleva, L.A. [FSHE 'Centre of Hygiene and Epidemiology no. 107' under FMBA of Russia (Russian Federation)]; Marenny, A.M. [Ltd 'Radiation and Environmental Researches' (Russian Federation)]

    2013-07-01

    'The Priargun Production Mining and Chemical Association' (hereinafter referred to as PPMCA) is a diversified mining company which, in addition to underground mining of uranium ore, refines such ores in a hydrometallurgical process to produce natural uranium oxide. The PPMCA facilities are sources of radiation and chemical contamination of the environment in the areas where they are located. In order to establish the strategy and develop criteria for site remediation, independent radiation hygienic monitoring has been carried out for several years. In particular, this monitoring includes determining the concentration of the main dose-forming nuclides in the environmental media. The subjects of research include soil, grass and local foodstuffs (milk and potatoes), as well as the media of open ponds (water, bottom sediments, water vegetation). We also measured the radon activity concentration inside surface workshops and auxiliary facilities. We determined the specific activity of the following natural radionuclides: U-238, Th-232, K-40, Ra-226. The studies showed that in soil, vegetation, groundwater and local foods sampled in the vicinity of the uranium mines, there is a significant excess of ²²⁶Ra and ²³²Th content compared to areas outside the zone of influence of uranium mining. The ecological and hygienic situation is as follows: - in the health protection zone (HPZ), the outdoor gamma dose rate varies from 0.11 to 5.4 μSv/h (the mean value in the reference (background) settlement, Soktui-Molozan village, is 0.14 μSv/h); - the gamma dose rate in workshops within the HPZ varies over the range 0.14 - 4.3 μSv/h; - the specific activity of natural radionuclides in soil in the HPZ reaches 12800 Bq/kg and 510 Bq/kg for Ra-226 and Th-232, respectively; - beyond the HPZ, elevated values of ²²⁶Ra have been registered near Lantsovo Lake (430 Bq/kg); - the radon activity concentration in workshops within the HPZ varies over the range 22 - 10800 Bq

  11. Ecological and human health risks associated with abandoned gold mine tailings contaminated soil

    Science.gov (United States)

    Ngole-Jeme, Veronica Mpode; Fantke, Peter

    2017-01-01

    Gold mining is a major source of metal and metalloid emissions into the environment. Studies were carried out in Krugersdorp, South Africa, to evaluate the ecological and human health risks associated with exposure to metals and metalloids in mine tailings contaminated soils. Concentrations of arsenic (As), cadmium (Cd), chromium (Cr), cobalt (Co), copper (Cu), lead (Pb), manganese (Mn), nickel (Ni) and zinc (Zn) in soil samples from the area varied, with the highest contamination factors (expressed as the ratio of metal or metalloid concentration in the tailings contaminated soil to that of the control site) observed for As (3.5×10²), Co (2.8×10²) and Ni (1.1×10²). Potential ecological risk index values for metals and metalloids, determined from soil metal and metalloid concentrations and their respective risk factors, were correspondingly highest for As (3.5×10³) and Co (1.4×10³), whereas Mn (0.6) presented the lowest ecological risk. Human health risk was assessed using the Hazard Quotient (HQ), Chronic Hazard Index (CHI) and carcinogenic risk levels, where values of HQ > 1, CHI > 1 and carcinogenic risk values > 1×10⁻⁴ represent elevated risks. HQ values indicated high exposure-related risk for As (53.7), Cr (14.8), Ni (2.2), Zn (2.64) and Mn (1.67). Children were more at risk from heavy metal and metalloid exposure than adults. Cancer-related risks associated with metal and metalloid exposure were also higher among children than adults, with cancer risk values of 3×10⁻² and 4×10⁻² for As and Ni, respectively, among children, and 5×10⁻³ and 4×10⁻³ for As and Ni, respectively, among adults. There is significant potential ecological and human health risk associated with metal and metalloid exposure from contaminated soils around gold mine tailings dumps. This could be a potential contributing factor to a setback in the health of residents in informal settlements dominating this mining area as the immune systems of some of these residents are already
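
    The screening quantities named above reduce to simple ratios and sums: the contamination factor is site concentration over control concentration, HQ is exposure dose over reference dose, CHI sums the HQs, and carcinogenic risk is dose times a cancer slope factor. A minimal sketch, assuming doses are already expressed in mg/kg/day; all numbers below are illustrative inputs, not values from the study:

```python
def contamination_factor(c_site, c_control):
    """Ratio of concentration at the contaminated site to the control site."""
    return c_site / c_control


def hazard_quotient(dose, reference_dose):
    """HQ = average daily dose / reference dose (same units, e.g. mg/kg/day)."""
    return dose / reference_dose


def chronic_hazard_index(doses, reference_doses):
    """CHI: sum of the hazard quotients of all elements."""
    return sum(hazard_quotient(doses[el], reference_doses[el]) for el in doses)


def cancer_risk(dose, slope_factor):
    """Excess lifetime cancer risk = dose * cancer slope factor."""
    return dose * slope_factor


# Illustrative inputs only (not values from the study).
doses = {"As": 6e-4, "Zn": 0.15}           # mg/kg/day
reference_doses = {"As": 3e-4, "Zn": 0.3}  # mg/kg/day
hq_as = hazard_quotient(doses["As"], reference_doses["As"])  # about 2.0: above 1
chi = chronic_hazard_index(doses, reference_doses)           # about 2.5: above 1
risk_as = cancer_risk(doses["As"], slope_factor=1.5)         # about 9e-4: above 1e-4
```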

  12. Hierarchical Approach for Online Mining--Emphasis towards Software Metrics

    CERN Document Server

    Saradhi, M V Vijaya; Satish, P

    2010-01-01

    Several multi-pass algorithms have been proposed for association rule mining from static repositories. However, such algorithms are incapable of online processing of transaction streams. In this paper we introduce an efficient single-pass algorithm for mining association rules, given a hierarchical classification among items. Processing efficiency is achieved by two optimizations, hierarchy-aware counting and transaction reduction, which become possible in the context of hierarchical classification. This paper also considers the problem of integrating constraints that are Boolean expressions over the presence or absence of items into the association discovery algorithm. It presents three integrated algorithms for mining association rules with item constraints and discusses their trade-offs. It is concluded that the variation in complexity depends on the measures DIT (Depth of Inheritance Tree) and NOC (Number of Children) in the context of hierarchical classification.
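
    Hierarchy-aware counting can be sketched as a single pass in which each transaction is extended with the ancestors of its items, so generalised items (such as a whole category) are counted together with leaf items without rescanning the stream. The sketch also computes the DIT and NOC measures for a taxonomy node. The taxonomy and transactions are hypothetical, and transaction reduction and item constraints from the paper are not shown:

```python
from collections import Counter


def ancestors(item, parent):
    """All ancestors of `item` in a taxonomy given as a child -> parent map."""
    out = []
    while item in parent:
        item = parent[item]
        out.append(item)
    return out


def count_single_pass(stream, parent):
    """One pass over a transaction stream, counting items plus their ancestors."""
    counts = Counter()
    for transaction in stream:
        extended = set(transaction)
        for item in transaction:
            extended.update(ancestors(item, parent))
        counts.update(extended)  # a set, so each ancestor counts once per transaction
    return counts


def dit(item, parent):
    """Depth of Inheritance Tree: number of ancestors above `item`."""
    return len(ancestors(item, parent))


def noc(item, parent):
    """Number of Children: direct children of `item` in the taxonomy."""
    return sum(1 for p in parent.values() if p == item)


# Hypothetical taxonomy and transaction stream.
taxonomy = {"milk": "dairy", "cheese": "dairy", "dairy": "food",
            "bread": "bakery", "bakery": "food"}
stream = [["milk", "bread"], ["cheese"], ["milk", "cheese"]]
counts = count_single_pass(stream, taxonomy)
```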

  13. Data Mining for Identifying Novel Associations and Temporal Relationships with Charcot Foot

    Directory of Open Access Journals (Sweden)

    Michael E. Munson

    2014-01-01

    Introduction. Charcot foot is a rare and devastating complication of diabetes. While some risk factors are known, debate continues regarding etiology. Elucidating other associated disorders and their temporal occurrence could lead to a better understanding of its pathogenesis. We applied a large data mining approach to Charcot foot for elucidating novel associations. Methods. We conducted an association analysis using ICD-9 diagnosis codes for every patient in our health system (n = 1.6 million with 41.2 million time-stamped ICD-9 codes). For the current analysis, we focused on the 388 patients with Charcot foot (ICD-9 713.5). Results. We found 710 associations, 676 (95.2%) of which had a P value for the association less than 1.0×10⁻⁵ and 603 (84.9%) of which had an odds ratio > 5.0. There were 111 (15.6%) associations with a significant temporal relationship (P < 1.0×10⁻³). The three novel associations with the strongest temporal component were cardiac dysrhythmia, pulmonary eosinophilia, and volume depletion disorder. Conclusion. We identified novel associations with Charcot foot in the context of pathogenesis models that include neurotrophic, neurovascular, and microtraumatic factors mediated through inflammatory cytokines. Future work should focus on confirmatory analyses. These novel areas of investigation could lead to prevention or earlier diagnosis.
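
    Odds ratios like those reported above come from a 2×2 table of patients with and without the condition, split by presence of a co-occurring diagnosis code. The abstract does not state the exact statistical procedure, so this is a generic sketch with a 95% Wald confidence interval; it assumes all four cells are non-zero, and the counts are hypothetical (only the totals echo the cohort sizes above):

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table with a 95% Wald confidence interval.

    a: cases with the co-occurring code     b: cases without it
    c: non-cases with the co-occurring code d: non-cases without it
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)


# Hypothetical counts: 388 cases and about 1.6 million non-cases in total.
or_, (lo, hi) = odds_ratio(a=40, b=348, c=8000, d=1_591_612)
```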

  14. [Research on medical data mining and its applications].

    Science.gov (United States)

    Liu, Chanzhen; Wang, Youjun

    2014-10-01

    With the development of computer technology, medical data have shifted from traditional paper records to electronic form, which can effectively promote medical development. This paper first presents the status and characteristics of medical data mining. It then discusses the key methods of medical data mining for classification, clustering and prediction. The paper focuses on the application and assessment of five types of algorithms used in medical data mining: decision trees, cluster analysis, association rules, intelligent algorithms and hybrid algorithms. Finally, it gives an outlook on data mining applications in the medical domain.

  15. The speed of learning instructed stimulus-response association rules in human: experimental data and model.

    Science.gov (United States)

    Bugmann, Guido; Goslin, Jeremy; Duchamp-Viret, Patricia

    2013-11-01

    Humans can learn associations between visual stimuli and motor responses from just a single instruction. This is known to be a fast process, but how fast is it? To answer this question, we asked participants to learn a briefly presented (200 ms) stimulus-response rule, which they then had to apply rapidly after a variable delay of between 50 and 1300 ms. Participants showed longer and more variable response times for short delays. The error rate was low and did not vary with the delay, showing that participants were able to encode the rule correctly in less than 250 ms. This time is close to the fastest synaptic learning speed deemed possible by diffusive influx of AMPA receptors. Learning continued at a slower pace in the delay period and was fully completed on average 900 ms after rule presentation onset, when response latencies dropped to levels consistent with basic reaction times. A neural model was proposed that explains the reduction of response times and of their variability with the delay by (i) a random synaptic learning process that generates weights whose average values increase with the learning time, followed by (ii) random crossing of the firing threshold by a leaky integrate-and-fire neuron model, and (iii) assuming that the behavioural response is initiated when all neurons in a pool of m neurons have fired their first spike after input onset. Values of m = 2 or 3 were consistent with the experimental data. The proposed model is the simplest solution consistent with neurophysiological knowledge. Additional experiments are suggested to test the hypothesis underlying the model and also to explore forgetting effects, for which there were indications in the longer delay conditions. This article is part of a Special Issue entitled Neural Coding 2012.
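
    The three model ingredients can be illustrated with a small simulation: (i) synaptic weights drawn at random with a mean that grows with learning time, (ii) noisy leaky integrate-and-fire neurons that spike when they first cross threshold, and (iii) a response that starts once all m neurons in the pool have fired. Every parameter here (time constant, threshold, noise level, weight growth rate) is a hypothetical choice for illustration, not a value fitted in the paper:

```python
import random


def first_spike_time(weight, tau=20.0, threshold=1.0, noise=0.05, dt=1.0,
                     t_max=1000.0, rng=random):
    """First spike time (ms) of a noisy leaky integrate-and-fire neuron.

    The membrane potential v relaxes towards `weight` with time constant tau,
    plus Gaussian noise of size `noise` per time step; returns t_max if the
    threshold is never crossed.
    """
    v, t = 0.0, 0.0
    while t < t_max:
        v += dt * (-v + weight) / tau + noise * rng.gauss(0.0, 1.0)
        t += dt
        if v >= threshold:
            return t
    return t_max


def response_time(learning_time, m=2, rng=random):
    """Response begins when all m pool neurons have fired (the slowest one)."""
    # Hypothetical learning rule: mean weight grows linearly with learning time (ms).
    mean_w = 1.0 + learning_time / 500.0
    weights = [max(0.0, rng.gauss(mean_w, 0.3)) for _ in range(m)]
    return max(first_spike_time(w, rng=rng) for w in weights)


rng = random.Random(0)
early = sum(response_time(50, m=2, rng=rng) for _ in range(200)) / 200
late = sum(response_time(1300, m=2, rng=rng) for _ in range(200)) / 200
```

    With longer learning, stronger weights drive the membrane to threshold sooner and with less noise-driven spread, which qualitatively mirrors the shorter, less variable response times reported for longer delays.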

  16. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    Directory of Open Access Journals (Sweden)

    Joanna F Dipnall

    Atheoretical large-scale data mining techniques using machine learning algorithms show promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9, and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016) and current smokers (p < 0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling

  17. Seismic failure mechanisms for loaded slopes with associated and nonassociated flow rules

    Institute of Scientific and Technical Information of China (English)

    YANG Xiao-li; SUI Zhi-rong

    2008-01-01

    Seismic failure mechanisms were investigated for soil slopes subjected to a strip load, using the upper bound method of limit analysis and the finite difference method of numerical simulation, and considering the influence of associated and nonassociated flow rules. A quasi-static representation of soil inertia effects using a seismic coefficient concept was adopted for the seismic failure analysis. A numerical study was conducted to investigate the influence of the dilation angle and earthquakes on the seismic failure mechanisms of the loaded slope, and the failure mechanisms for different dilation angles were compared. The results show that the dilation angle influences the seismic failure surfaces, and that both the seismic maximum displacement vector and the seismic maximum shear strain rate decrease as the dilation angle increases.

  18. Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the automated assignment of natural language texts to predefined categories based on their content. Text classification is a primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions or extracting data. Nowadays the demand for text classification is increasing tremendously. With this demand in mind, new and updated techniques are being developed for automated text classification. This paper presents a new algorithm for text classification. Instead of individual words, word relations, i.e. association rules, are used to derive the feature set from pre-classified text documents. A Naive Bayes classifier is then applied to the derived features, and finally a genetic algorithm is added for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental ...
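
    The feature idea (word relations instead of single words, fed to Naive Bayes) can be sketched as follows. As a simplifying assumption, the "association" features here are unordered word pairs that co-occur in at least `min_df` training documents, scored with multinomial Naive Bayes and add-one smoothing; the paper's exact rule derivation and its genetic algorithm stage are not reproduced, and the training texts are made up:

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pair_features(text):
    """Unordered pairs of distinct words that co-occur in one text."""
    return set(combinations(sorted(set(text.lower().split())), 2))


def train(docs, min_df=2):
    """docs: list of (text, label). Keeps word pairs seen in >= min_df documents."""
    feats_per_doc = [(pair_features(text), label) for text, label in docs]
    df = Counter(f for feats, _ in feats_per_doc for f in feats)
    vocab = {f for f, c in df.items() if c >= min_df}
    priors = Counter(label for _, label in docs)
    counts = defaultdict(Counter)
    for feats, label in feats_per_doc:
        counts[label].update(feats & vocab)
    return priors, counts, vocab


def nb_classify(text, model):
    """Multinomial Naive Bayes over pair features with add-one smoothing."""
    priors, counts, vocab = model
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for label, prior in priors.items():
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(prior / total_docs)
        for f in pair_features(text) & vocab:
            score += math.log((counts[label][f] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


docs = [
    ("stock market prices rise", "finance"),
    ("market prices fall sharply", "finance"),
    ("team wins the match", "sport"),
    ("the team lost the match", "sport"),
]
model = train(docs)
print(nb_classify("market prices today", model))  # finance
```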

  19. Data mining for identifying novel associations and temporal relationships with Charcot foot.

    Science.gov (United States)

    Munson, Michael E; Wrobel, James S; Holmes, Crystal M; Hanauer, David A

    2014-01-01

    INTRODUCTION. Charcot foot is a rare and devastating complication of diabetes. While some risk factors are known, debate continues regarding its etiology. Elucidating other associated disorders and their temporal occurrence could lead to a better understanding of its pathogenesis. We applied a large-scale data mining approach to Charcot foot to elucidate novel associations. METHODS. We conducted an association analysis using ICD-9 diagnosis codes for every patient in our health system (n = 1.6 million with 41.2 million time-stamped ICD-9 codes). For the current analysis, we focused on the 388 patients with Charcot foot (ICD-9 713.5). RESULTS. We found 710 associations, 676 (95.2%) of which had a P value for the association less than 1.0 × 10⁻⁵ and 603 (84.9%) of which had an odds ratio > 5.0. There were 111 (15.6%) associations with a significant temporal relationship (P Charcot foot in the context of pathogenesis models that include neurotrophic, neurovascular, and microtraumatic factors mediated through inflammatory cytokines. Future work should focus on confirmatory analyses. These novel areas of investigation could lead to prevention or earlier diagnosis.
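
    The odds ratios reported above are computed from 2x2 tables of a diagnosis code versus Charcot foot. This minimal sketch shows the standard calculation with a Woolf (log-scale) 95% confidence interval. The record does not state the paper's exact statistical method, and the cell counts in the example are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio of a 2x2 table with a Woolf (log-scale) confidence interval.

    a: cases with the code      b: cases without the code
    c: non-cases with the code  d: non-cases without the code
    All four cells must be non-zero for this simple form.
    """
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts for one diagnosis code among 388 cases.
ratio, lower, upper = odds_ratio(40, 348, 10_000, 1_590_000)
print(f"OR = {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```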

  20. Pollution of the stream waters and sediments associated with the Crucea uranium mine (East Carpathians, Romania)

    Science.gov (United States)

    Petrescu, L.; Bilal, E.; Iatan, E. L.

    2009-04-01

    standards limits. The uranium concentration ranged from 0.016 mg·L⁻¹ to 1.43 mg·L⁻¹, with a mean of 0.365 mg·L⁻¹. A remarkably good correlation exists between dissolved U and the total anion concentrations, indicating that uranium in these stream waters derives mainly from oxidation of uraniferous bitumen and/or dissolution of carbonates. Based on the correlation (r = 0.69) between U and the sum of the major cations Ca + Mg + K + Na, and the linear correlation (r = 0.70) between U and silica, we identify silicate weathering as an additional source of soluble uranium. The concentrations of dissolved Th are quite low, with a median value of 0.015 mg·L⁻¹. The linear variation of dissolved thorium concentration with carbonate alkalinity (r = 0.86) strongly suggests that these concentrations are due to increased alkalinity. The release of metals (U, Th and Pb) is amplified by mining activities. The pollution degree of the sediments was classified using the index of geo-accumulation (Igeo). The Igeo values of U, Th and Pb are medium, with localized high values corresponding to strongly to extremely polluted sediments (Igeo > 6), while the remaining elements have concentrations close to or below background values. Of the uranium in bottom sediments, 71% is present in primary fractions and 21% is associated with carbonates. Thorium proved even more insoluble (94% in primary fractions). In view of the substantial mobility and bioavailability of the fractions, this is not an alarming feature. Although neither U nor Th has an appreciable "exchangeable" fraction, the isolation of specific U- and Th-rich sediment fractions helped to identify connections between bioavailability and the genesis of sediments, which control the ecosystem cycling of U and Th. Measurements carried out in the surroundings of a local uranium mine show that the impact of the Crucea mine on water quality downstream of the mining area is insignificant.
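
    The index of geo-accumulation used above is commonly defined (after Müller) as Igeo = log2(Cn / (1.5 Bn)). Here Cn is the measured concentration, Bn is the geochemical background, and 1.5 is a factor allowing for natural background variation. This small sketch implements that formula; the example concentrations are hypothetical, not the study's data.

```python
import math


def igeo(concentration, background):
    """Index of geo-accumulation: log2(Cn / (1.5 * Bn)); both in the same units."""
    return math.log2(concentration / (1.5 * background))


# Hypothetical values in mg/kg: a sample at 30 against a background of 10.
print(igeo(30, 10))  # 1.0
```

    Values at or below 0 mean the sample does not exceed 1.5 times the background.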

  1. Mercury contamination associated with small-scale gold mining in Tanzania and Zimbabwe.

    Science.gov (United States)

    van Straaten, P

    2000-10-01

    Mercury contamination associated with small-scale gold mining and processing represents a major environmental and human health concern in Eastern and Southern Africa. Approximately 200,000-300,000 persons are involved in small-scale gold mining activities in Tanzania and > 200,000 persons in Zimbabwe. Mercury (Hg) is used mainly for the processing of primary gold quartz veins and supergene gold mineralizations. Gravimetric material flow analyses show that 70-80% of the Hg is lost to the atmosphere during processing and 20-30% is lost to tailings, soils, stream sediments and water. For every 1 g Au produced, 1.2-1.5 g Hg are lost to the environment. Cumulatively, the anthropogenic Hg released annually into the atmosphere is approximately 3-4 t in the whole Lake Victoria Goldfields of Tanzania and > 3 t in Zimbabwe. Tailings are local 'hot spots' with high concentrations of As, Pb, Cu and Hg. Lateral and vertical dispersion of Hg lost to soils and stream sediments is very limited (laterally Dispersion of mercury from tailings is low because Hg is transported largely in the elemental, metallic form. In addition, Fe-oxide-rich laterites and swamps appear to be natural barriers to the dispersion of metals in soils and streams. Ground and surface water quality data indicate very low dispersion rates during the dry season.

  2. Effects of mining-associated lead and zinc soil contamination on native floristic quality

    Science.gov (United States)

    Struckhoff, Matthew A.; Stroh, Esther D.; Grabner, Keith W.

    2013-01-01

    We assessed the quality of plant communities across a range of lead (Pb) and zinc (Zn) soil concentrations at a variety of sites associated with Pb mining in southeast Missouri, USA. In a novel application, two standard floristic quality measures, Mean Coefficient of Conservatism (Mean C) and Floristic Quality Index (FQI), were examined in relation to concentrations of Pb and Zn, soil nutrients, and other soil characteristics. Nonmetric Multidimensional Scaling and Regression Tree Analyses identified soil Pb and Zn concentrations as primary explanatory variables for plant community composition and indicated negative relationships between soil metals concentrations and both Mean C and FQI. Univariate regression also demonstrated significant negative relationships between metals concentrations and floristic quality. The negative effects of metals in native soils with otherwise relatively undisturbed conditions indicate that elevated soil metals concentrations adversely affect native floristic quality where no other human disturbance is evident.
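
    Mean C is the average coefficient of conservatism of the native species recorded at a site. FQI is conventionally Mean C multiplied by the square root of the number of native species. This sketch computes both measures; the coefficient values are hypothetical.

```python
import math


def floristic_quality(coefficients):
    """Return (Mean C, FQI) from the coefficients of conservatism of native species."""
    n = len(coefficients)
    mean_c = sum(coefficients) / n
    return mean_c, mean_c * math.sqrt(n)


print(floristic_quality([4, 6, 2, 8]))  # (5.0, 10.0)
```

    Because FQI scales with the square root of species richness, two sites with the same Mean C can differ in FQI.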

  3. Characterization and resource recovery potential of precipitates associated with abandoned coal mine drainage

    Energy Technology Data Exchange (ETDEWEB)

    Kairies, C.L.; Watzlaf, G.R.; Hedin, R.S.; Capo, R.C. [University of Pittsburgh, Pittsburgh, PA (United States). Dept. of Geology and Planetary Science]

    2001-07-01

    Sludge samples from untreated and passively treated coal mine drainage discharges were characterized using NAA, ICP-AES, XRD and SEM. Iron content ranges from 25 to 68 dry wt%, and goethite is the dominant mineral (40-90 dry wt%). Most particles have a spiky spherical morphology (0.5-2.0 μm in diameter). Within several passive treatment systems, iron content remains relatively constant and the concentrations of Mn, Co, Ni and Zn increase, while the As concentration decreases. Initial findings indicate that some sludges are suitable for industrial and manufacturing uses, although high concentrations of trace elements such as As may prevent their use in cosmetics or foods. These associations could be related to the depositional environment of the coal seam from which the discharge originates. Subsurface cation exchange and sorption processes can influence the trace elements that accumulate in the sludge. 5 refs., 1 tab.

  4. Domain-oriented evaluation method of association rules and its application

    Institute of Scientific and Technical Information of China (English)

    陈鹏; 谭励; 于重重

    2011-01-01

    To address problems with the support-confidence evaluation criteria in association rule mining, such as the lack of analysis for specific application domains and the difficulty of using mining results for decision-making, a domain-oriented method for evaluating association rules is proposed. Taking domain knowledge as its basis, the method identifies rules that satisfy both technical and commercial interestingness. Experiments and analysis were carried out on actual survey data from 40 healthy-housing pilot projects of the National Housing Engineering Center. On this basis, a data mining system for the healthy-living domain was designed and developed. The system uses a multi-level software architecture with several modules, including knowledge base management, mining data selection, data preprocessing, domain-driven mining and result evaluation. The experimental results and the application system demonstrate the effectiveness of the proposed method.
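
    The support-confidence framework that this method builds on can be stated in a few lines. This sketch shows only those baseline measures plus lift, not the paper's technical and commercial interestingness degrees. The survey attributes are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support (1.0 means independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical survey attributes, one set per dwelling.
surveys = [{"ventilation", "daylight"}, {"ventilation", "noise"},
           {"ventilation", "daylight", "noise"}, {"daylight"}]
print(support({"ventilation", "daylight"}, surveys))       # 0.5
print(confidence({"ventilation"}, {"daylight"}, surveys))  # 0.666...
```

    A rule can clear both thresholds and still have lift below 1, which is one reason support and confidence alone are considered insufficient for decision-making.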

  5. Application of Data Mining Techniques to a Selected Business Organisation with Special Reference to Buying Behaviour

    CERN Document Server

    Hilage, Tejaswini

    2011-01-01

    Data mining is a new concept: the exploration and analysis of large data sets in order to discover meaningful patterns and rules. Many organizations now use data mining techniques to find meaningful patterns in their databases. This paper studies how data mining techniques can be applied to a large database to reveal behavioral patterns, and the results of the analysis are useful to the organization. It examines the results of applying association rule mining, rule induction and the Apriori algorithm to the database of a shopping mall. Market basket analysis is performed with these techniques, and important results, such as buying behavior, are found.

  6. Application of data mining techniques to a selected business organization with special reference to buying behavior

    Directory of Open Access Journals (Sweden)

    Tejaswini Abhijit Hilage

    2011-12-01

    Data mining is a new concept: the exploration and analysis of large data sets in order to discover meaningful patterns and rules. Many organizations now use data mining techniques to find meaningful patterns in their databases. This paper studies how data mining techniques can be applied to a large database to reveal behavioral patterns, and the results of the analysis are useful to the organization. It examines the results of applying association rule mining, rule induction and the Apriori algorithm to the database of a shopping mall. Market basket analysis is performed with these techniques, and important results, such as buying behavior, are found.
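
    The Apriori algorithm named in these two records is a level-wise search. It counts candidate k-itemsets and keeps the frequent ones. It then joins them into (k+1)-itemsets and prunes any candidate that has an infrequent subset. This is a compact, unoptimized sketch: `min_support` is an absolute transaction count, and the basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate k-itemset with one pass over the data.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join: unite pairs of frequent (k-1)-itemsets that differ by one item.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for itemset, count in sorted(apriori(baskets, 2).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

    In this example, {bread, milk, butter} is never counted: its subset {milk, butter} appears only once, so the prune step discards it.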

  7. Pushing Multiple Convertible Constraints into Frequent Itemsets Mining

    Institute of Scientific and Technical Information of China (English)

    SONG Baoli; QIN Zheng

    2006-01-01

    Constraint-pushing techniques have been developed for mining frequent patterns and association rules. However, existing frequent pattern mining techniques cannot handle multiple constraints. In this paper, a new algorithm, MCFMC (mining the complete set of frequent itemsets with multiple constraints), is introduced. The algorithm takes advantage of the fact that a convertible constraint can be pushed into the mining algorithm to reduce the search space. Using a sample database, the algorithm selects an optimal method for converting multiple constraints into multiple convertible constraints, combined by conjunction and/or disjunction, and then partitions these constraints into two parts. One part is pushed deep inside the mining process to reduce the search space for frequent itemsets; the other part, which cannot be pushed into the algorithm, is used to filter the complete set of frequent itemsets to obtain the final result. Detailed experimental results show the feasibility and effectiveness of the algorithm.
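
    A convertible constraint becomes prunable once items are visited in a fixed order. This is a single-constraint illustration, not the MCFMC algorithm itself. It pushes avg(value) >= min_avg into a depth-first itemset search by visiting items in descending value order. Under that order the constraint behaves anti-monotonically, so the extensions of a failing itemset can be skipped. Item values and data are hypothetical.

```python
def mine_avg_constrained(transactions, value, min_support, min_avg):
    """Frequent itemsets whose average item value is >= min_avg (depth-first search).

    Items are visited in descending value order, under which avg(value) >= min_avg
    is convertible anti-monotone: once an itemset fails, extending it with later
    (cheaper) items can only lower the average, so the branch is skipped.
    """
    transactions = [frozenset(t) for t in transactions]
    order = sorted(value, key=lambda item: -value[item])
    results = {}

    def extend(prefix, total, start):
        for idx in range(start, len(order)):
            item = order[idx]
            itemset = prefix | {item}
            new_total = total + value[item]
            if new_total / len(itemset) < min_avg:
                continue  # constraint pruning: no extension of this itemset can pass
            count = sum(itemset <= t for t in transactions)
            if count < min_support:
                continue  # support pruning: supersets are no more frequent
            results[itemset] = count
            extend(itemset, new_total, idx + 1)

    extend(frozenset(), 0, 0)
    return results


data = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
prices = {"a": 10, "b": 6, "c": 1}
print(mine_avg_constrained(data, prices, min_support=2, min_avg=5))
```

    The search remains complete for this one constraint. Every itemset that passes also has prefixes in the descending order that pass, so the depth-first search reaches it.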

  8. Disease Prediction in Data Mining Technique – A Survey

    Directory of Open Access Journals (Sweden)

    S. Sudha

    2013-01-01

    Data mining is defined as sifting through very large amounts of data for useful information. Some of the most important and popular data mining techniques are association rules, classification, clustering, prediction and sequential patterns. Data mining techniques are used for a variety of applications. In the health care industry, data mining plays an important role in predicting diseases. Detecting a disease normally requires a number of tests on the patient, but data mining techniques can reduce the number of tests required, which saves time and improves performance. These techniques have both advantages and disadvantages. This research paper analyzes how data mining techniques are used to predict different types of diseases, reviewing research papers that mainly concentrate on predicting heart disease, diabetes and breast cancer.

  9. An Improved Image Mining Technique For Brain Tumour Classification Using Efficient Classifier

    Directory of Open Access Journals (Sweden)

    P. Rajendran

    2009-12-01

    An improved image mining technique for brain tumor classification using pruned association rules with the MARI algorithm is presented in this paper. The proposed method uses association rule mining to classify CT scan brain images into three categories, namely normal, benign and malignant. It combines low-level features extracted from images with high-level knowledge from specialists. The developed algorithm can assist physicians in efficient classification, using multiple keywords per image to improve accuracy. Experiments on a pre-diagnosed database of brain images showed 96% sensitivity and 93% accuracy.
    Keywords: Data mining; Image mining; Association rule mining; Medical imaging; Medical image diagnosis; Classification

  10. Goal directed worry rules are associated with distinct patterns of amygdala functional connectivity and vagal modulation during perseverative cognition

    Directory of Open Access Journals (Sweden)

    Frances Meeten

    2016-11-01

    Excessive and uncontrollable worry is a defining feature of Generalized Anxiety Disorder (GAD). An important endeavor in the treatment of pathological worry is to understand why some people are unable to stop worrying once they have started. Worry perseveration is associated with a tendency to deploy goal-directed worry rules (known as ‘as many as can’ worry rules; AMA). These require attention to the goal of the worry task and continuation of worry until the aims of the ‘worry bout’ are achieved. This study examined the association between the tendency to use AMA worry rules and neural and autonomic responses to a perseverative cognition induction. To differentiate processes underlying AMA worry rule use from trait worry, we also examined the relationship between scores on the Penn State Worry Questionnaire (PSWQ) and neural and autonomic responses following the same induction. We used resting-state functional magnetic resonance imaging while measuring emotional bodily arousal from heart rate variability (HRV; decreased HRV indicates stress-related parasympathetic withdrawal) in 19 patients with GAD and 21 control participants. Seed-based analyses were conducted to quantify changes in brain functional connectivity with the amygdala. The tendency to adopt an AMA worry rule was associated with validated measures of worry, anxiety, depression and rumination. AMA worry rule endorsement predicted a stronger decrease in HRV and was positively associated with increased connectivity between the right amygdala and the locus coeruleus, a brainstem noradrenergic projection nucleus. Higher AMA scores were also associated with increased connectivity between the amygdala and the rostral superior frontal gyrus. Higher PSWQ scores amplified decreases in functional connectivity between the right amygdala and the subcallosal cortex, bilateral inferior frontal gyrus, middle frontal gyrus and areas of parietal cortex. Our results identify neural mechanisms underlying the deployment of

  11. 75 FR 1426 - National Futures Association; Notice of Filing and Immediate Effectiveness of Proposed Rule...

    Science.gov (United States)

    2010-01-11

    ... advertising of security futures products. The NFA believes the proposed rule change accomplishes this by... From the Federal Register Online via the Government Publishing Office SECURITIES AND EXCHANGE... Section 19b(7) of the Securities Exchange Act of 1934 (``Act''),\\1\\ and Rule 19b-7 under the...

  12. An Updating Algorithm for Spatial and Temporal Data Association Rules

    Institute of Scientific and Technical Information of China (English)

    刘伯红; 王娟娟

    2015-01-01

    Most existing algorithms for updating association rules generate a large number of candidate sets and scan the database multiple times, and little research has addressed spatial and temporal data. To solve this problem, this paper proposes an association rule updating algorithm based on a sliding window. The algorithm run-length encodes the accessed data, stores it in memory, and then mines only the encoded data in memory, without repeatedly reading the database. Meanwhile, when generating candidate sets from frequent itemsets, the algorithm adds spatial constraints that filter out irrelevant spatial data, improving execution speed and processing performance. Experimental results show that the algorithm has higher mining efficiency and has important application value for intelligent transportation, command and control, and similar fields.
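
    The core of a sliding-window update is incremental: counts are added for the arriving transaction and subtracted for the expiring one, so the database is never rescanned. This minimal sketch shows only that bookkeeping. It omits the paper's run-length encoding, and the `keep` predicate is a hypothetical stand-in for the spatial constraint. Class and parameter names are assumptions.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Support counts of itemsets (up to `max_len` items) over the last `size` transactions."""

    def __init__(self, size, max_len=2, keep=lambda item: True):
        self.size = size
        self.max_len = max_len
        self.keep = keep          # hypothetical stand-in for a spatial constraint
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(i for i in transaction if self.keep(i))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Slide the window by one transaction, updating counts incrementally."""
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.size:
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))

    def frequent(self, min_support):
        return {s: c for s, c in self.counts.items() if c >= min_support}


w = SlidingWindowCounter(size=2)
for t in [{"A", "B"}, {"A", "C"}, {"A", "B"}]:
    w.add(t)
print(w.frequent(2))  # {('A',): 2}
```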

  13. Associated rules between microstructure characterization parameters and contact characteristic parameters of two cylinders

    Institute of Scientific and Technical Information of China (English)

    周炜; 唐进元; 何艳飞; 廖东日

    2015-01-01

    The contact strength calculation for two curved rough surfaces is a frontier issue in Hertz contact theory and methods. This work mainly studies the associated rules between rough surface characterization parameters (correlation length and root mean square deviation) and contact characteristic parameters (contact area, maximum contact pressure, contact number and contact width) of two rough cylinders. The contact model of rough cylinders is derived from the Greenwood-Williamson (GW) model. As there is no analytical solution for the pressure distribution equation, an approximate iterative solution method is adopted. Furthermore, the quantitative relationships among the correlation length, the root mean square deviation, the asperity radius of curvature and the asperity density are obtained by numerical simulation. The maximum contact pressure and the contact number decrease as the correlation length increases, while the contact width and the contact area increase. As the root mean square deviation increases, the contact width increases while the maximum contact pressure, the contact area and the contact number decrease.

  15. Analysis on Composition Rules of TCM Tranquilizer Based on Association Rules and Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    吴嘉瑞; 金燕萍; 张晓朦; 张冰; 盛晓光

    2015-01-01

    Objective: To explore the composition rules of TCM tranquilizer prescriptions. Methods: The tranquilizer prescriptions in “The New National Medicine” were collected to build a database based on the traditional Chinese medicine inheritance assist system. Association rule mining with the Apriori algorithm and complex system entropy clustering were used to determine the frequency of each medicine and the association rules between drugs. Results: The data-mining results indicated that the most frequently used drugs in the tranquilizer prescriptions were Poria Cocos Wolff, Radix Glycyrrhizae, Angelica sinensis, Radix Ophiopogonis and Cinnabaris. The most frequent drug combinations were “Angelica sinensis, Poria Cocos Wolff”, “Poria Cocos Wolff, Parched Semen Ziziphi Spinosae” and “Radix Glycyrrhizae, Poria Cocos Wolff”. Association rules with high confidence included “Calculus Bovis, Cinnabaris” and “Semen Ziziphi Spinosae, Poria Cocos Wolff”. The new prescriptions contained Poria Cocos Wolff, Parched Semen Ziziphi Spinosae, Radix Rehmanniae Preparata, Fructus Schisandrae Chinensis, Radix Salviae Miltiorrhizae, Radix Ophiopogonis and Radix Rehmanniae Exsiccata. Conclusion: Drugs in Chinese medicine tranquilizer prescriptions usually have the effects of nourishing the blood, calming the mind, benefiting the qi, replenishing the yin and quieting the spirit.

  16. GFExtractor: an algorithm for effectively mining non-redundant episode rules in event sequences

    Institute of Scientific and Technical Information of China (English)

    袁红娟

    2013-01-01

    Mining episode rules in an event sequence aims to discover causal relationships between episodes. To mine non-redundant episode rules in event sequences, the GFExtractor algorithm is proposed, based on a support definition using non-overlapping minimal occurrences and a depth-first search strategy. GFExtractor uses a pruning technique to eliminate non-generator episodes, and forward and backward extension checks to eliminate non-closed episodes. Finally, non-redundant episode rules are generated between the set of episode generators (Gen) and the set of frequent closed episodes (FCE). Experimental results confirm the effectiveness of the algorithm in mining non-redundant episode rules in event sequences.
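
    GFExtractor's support measure counts non-overlapping minimal occurrences. This minimal sketch shows that counting step for a serial episode (event types in a fixed order), assuming numeric timestamps and an optional user-chosen `max_span`. GFExtractor's generator and closed-episode pruning is not shown, and the function names are hypothetical. For each episode prefix, the sketch tracks the latest start time of a partial match. Each completed match is therefore the tightest window ending at that event, and the matcher restarts after every counted occurrence so that counted windows do not overlap.

```python
def count_non_overlapping(sequence, episode, max_span=None):
    """Count non-overlapping minimal occurrences of a serial episode.

    sequence: list of (time, event_type) pairs sorted by time
    episode:  tuple of event types that must occur in this order
    max_span: optional limit on end_time - start_time of an occurrence
    """
    k = len(episode)
    starts = [None] * k  # starts[j]: latest start of a partial match of episode[:j + 1]
    count = 0
    for t, event in sequence:
        # Walk the episode backwards so one event never advances two stages at once.
        for j in range(k - 1, -1, -1):
            if event != episode[j]:
                continue
            if j == 0:
                starts[0] = t
            elif starts[j - 1] is not None:
                starts[j] = starts[j - 1]
        if starts[-1] is not None:
            if max_span is None or t - starts[-1] <= max_span:
                count += 1
                starts = [None] * k  # the next occurrence must start after time t
            else:
                starts[-1] = None    # tightest window ending at t is too wide
    return count


def rule_confidence(sequence, antecedent, full_episode, max_span=None):
    """Confidence of the episode rule antecedent => full_episode (antecedent is a prefix)."""
    return (count_non_overlapping(sequence, full_episode, max_span)
            / count_non_overlapping(sequence, antecedent, max_span))


events = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B"), (6, "A"), (7, "B")]
print(count_non_overlapping(events, ("A", "B")))              # 3
print(count_non_overlapping(events, ("A", "B"), max_span=1))  # 2
```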

  17. Application of a New Probabilistic Model for Mining Implicit Associated Cancer Genes from OMIM and Medline

    Directory of Open Access Journals (Sweden)

    Shanfeng Zhu

    2006-01-01

    An important issue in current medical science research is to find the genes that are strongly related to an inherited disease. A particular focus is placed on cancer-gene relations, since some types of cancer are inherited. As biomedical databases have grown rapidly in recent years, an informatics approach to predicting such relations from currently available databases should be developed. Our objective is to find implicitly associated cancer genes from biomedical databases, including the literature database. Co-occurrence of biological entities has been shown to be a popular and efficient technique in biomedical text mining. We applied a new probabilistic model, called the mixture aspect model (MAM) [48], to combine different types of co-occurrences of genes and cancer derived from Medline and OMIM (Online Mendelian Inheritance in Man). We trained the probability parameters of MAM using a learning method based on the EM (Expectation-Maximization) algorithm. We examined the performance of MAM by predicting associated cancer-gene pairs. Through cross-validation, prediction accuracy was shown to improve when gene-gene co-occurrences from Medline were added to the cancer-gene co-occurrences in OMIM. Further experiments showed that MAM found new cancer-gene relations that are not known in the literature. Supplementary information can be found at http://www.bic.kyotou.ac.jp/pathway/zhusf/CancerInformatics/Supplemental2006.html

  18. Spatiotemporal Data Mining: Issues, Tasks And Applications

    Directory of Open Access Journals (Sweden)

    K.Venkateswara Rao

    2012-03-01

    Spatiotemporal data usually contain the states of an object, an event or a position in space over a period of time. Vast amounts of spatiotemporal data can be found in several application fields, such as traffic management, environment monitoring and weather forecasting. These datasets might be collected at different locations, at various points in time and in different formats. Representing, processing, analyzing and mining such datasets poses many challenges because of the complex structure of spatiotemporal objects and the relationships among them in both the spatial and temporal dimensions. In this paper, the issues and challenges related to spatiotemporal data representation, analysis, mining and knowledge visualization are presented. Various data mining tasks, such as association rules, classification and clustering, for discovering knowledge from spatiotemporal datasets are examined and reviewed. System functional requirements for this kind of knowledge discovery and the database structure are discussed. Finally, applications of spatiotemporal data mining are presented.

  19. Habituation: a non-associative learning rule design for spiking neurons and an autonomous mobile robots implementation.

    Science.gov (United States)

    Cyr, André; Boukadoum, Mounir

    2013-03-01

    This paper presents a novel bio-inspired habituation function for robots controlled by an artificial spiking neural network. This non-associative learning rule is modelled at the synaptic level and validated through robotic behaviours in reaction to different stimulus patterns in a dynamic virtual 3D world. Habituation is minimally represented as an attenuated response after exposure to and perception of persistent external stimuli. Based on current neuroscience research, the originality of this rule lies in its modulated response to variable frequencies of the captured stimuli. Filtering out repetitive data through the natural habituation mechanism has been demonstrated to be a key factor in the attention phenomenon. Inserting such a rule, operating at multiple temporal dimensions of stimuli, increases a robot's adaptive behaviours by letting it ignore contextually irrelevant information.
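
    The authors' spiking-network rule is not specified in this record, so what follows is only a generic first-order habituation sketch, a hypothetical model rather than the paper's. Each stimulus depresses a synaptic weight multiplicatively, and the weight recovers toward its baseline at every step. This reproduces the two behaviours described above: an attenuated response to persistent stimuli, and recovery after a pause. Parameter names and values are assumptions.

```python
def simulate_habituation(stimuli, w0=1.0, decay=0.3, recovery=0.05):
    """Discrete-time habituating synaptic efficacy (hypothetical generic model).

    stimuli:  sequence of stimulus intensities, one per time step (0 = no stimulus)
    decay:    fraction of the weight lost per unit of stimulus
    recovery: per-step relaxation rate of the weight back toward its baseline w0
    """
    w = w0
    responses = []
    for s in stimuli:
        responses.append(w * s)    # response to the current stimulus
        w -= decay * w * s         # depression caused by the stimulus
        w += recovery * (w0 - w)   # spontaneous recovery toward baseline
    return responses


persistent = simulate_habituation([1] * 5)
with_pause = simulate_habituation([1, 1, 1, 0, 0, 0, 0, 0, 1])
print([round(r, 3) for r in persistent])
```

    Higher stimulus frequencies leave less time for recovery between stimuli, so in this toy model the response attenuates faster.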

  20. Association Mining on Massive Text under Full Confidence Based on Incremental Queue

    Institute of Scientific and Technical Information of China (English)

    刘炜

    2015-01-01

    Association mining is an important data analysis method. This article proposes an incremental-queue association mining algorithm model under full confidence. Using full-confidence rules in the traditional FP-Growth and PF-Tree association mining algorithms improves the adaptability of the algorithm. On this basis, the article proposes the FP4W-Growth algorithm and applies it to association computation on text data and to association mining of incremental data. Verification experiments show the feasibility and optimization of this algorithm and model. The article provides a scientific decision-making approach for discovering hidden, previously unknown but potentially useful information and patterns in large amounts of text data.
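
    This sketch assumes that the "full confidence" of this record corresponds to the all-confidence measure. For an itemset X, that measure is sup(X) divided by the largest single-item support in X, which equals the minimum confidence over rules with one item on the left-hand side. The measure is downward closed, which is what lets it be pushed into FP-Growth-style mining. The sketch only computes the measure; it is not FP4W-Growth, and the data are hypothetical.

```python
def all_confidence(itemset, transactions):
    """sup(X) / max over items i in X of sup({i}); assumes every item occurs at least once."""
    transactions = [frozenset(t) for t in transactions]
    itemset = frozenset(itemset)
    sup_x = sum(itemset <= t for t in transactions)
    max_item_sup = max(sum(i in t for t in transactions) for i in itemset)
    return sup_x / max_item_sup


docs = [{"data", "mining"}, {"data", "mining"}, {"data", "text"}, {"mining"}]
print(all_confidence({"data", "mining"}, docs))  # 0.666...
```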

  1. Association between Benzodiazepine Use and Dementia: Data Mining of Different Medical Databases.

    Science.gov (United States)

    Takada, Mitsutaka; Fujimoto, Mai; Hosomi, Kouichi

    2016-01-01

    Purpose: Some studies have suggested that the use of benzodiazepines in the elderly is associated with an increased risk of dementia. However, this association might be due to confounding by indication and reverse causation. To examine the association between benzodiazepine anxiolytic drug use and the risk of dementia, we conducted data mining of a spontaneous reporting database and a large organized database of prescriptions. Methods: Data from the US Food and Drug Administration Adverse Event Reporting System (FAERS) from the first quarter of 2004 through the end of 2013 and data from the Canada Vigilance Adverse Reaction Online Database from the first quarter of 1965 through the end of 2013 were used for the analyses. The reporting odds ratio (ROR) and information component (IC) were calculated. In addition, prescription sequence symmetry analysis (PSSA) was performed to identify the risk of dementia after using benzodiazepine anxiolytic drugs over the period of January 2006 to May 2014. Results: Benzodiazepine use was found to be associated with dementia in analyses using the FAERS database (ROR: 1.63, 95% CI: 1.61-1.64; IC: 0.66, 95% CI: 0.65-0.67) and the Canada Vigilance Adverse Reaction Online Database (ROR: 1.88, 95% CI: 1.83-1.94; IC: 0.85, 95% CI: 0.80-0.89). ROR and IC values increased with the duration of action of benzodiazepines. In the PSSA, a significant association was found, with adjusted sequence ratios of 1.24 (1.05-1.45), 1.20 (1.06-1.37), 1.23 (1.11-1.37), 1.34 (1.23-1.47), 1.41 (1.29-1.53), and 1.44 (1.33-1.56) at intervals of 3, 6, 12, 24, 36, and 48 months, respectively. Furthermore, the additional PSSA, in which patients who initiated a new treatment with benzodiazepines and anti-dementia drugs within 12- and 24-month periods were excluded from the analysis, demonstrated significant associations of benzodiazepine use with dementia risk. 
Conclusion: Multi-methodological approaches using different methods, algorithms, and databases suggest
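
    The disproportionality measures named above can be computed from a 2x2 table of spontaneous reports. This minimal sketch covers the reporting odds ratio with its usual log-scale 95% CI, a crude information component (log2 of observed over expected reports), and the crude PSSA sequence ratio. Two simplifications are assumptions: the BCPNN form of the IC adds Bayesian shrinkage, and the adjusted sequence ratios in the abstract also correct for prescribing trends; neither is reproduced here. The counts are hypothetical.

```python
import math


def ror(a, b, c, d, z=1.96):
    """Reporting odds ratio with a log-scale confidence interval.

    a: target drug & target event   b: target drug & other events
    c: other drugs & target event   d: other drugs & other events
    """
    value = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return value, math.exp(math.log(value) - z * se), math.exp(math.log(value) + z * se)


def information_component(a, b, c, d):
    """Crude IC = log2(observed / expected), where expected = (a + b)(a + c) / N."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2(a / expected)


def crude_sequence_ratio(index_first, marker_first):
    """Patients starting the index drug first / patients starting the marker drug first."""
    return index_first / marker_first


value, lower, upper = ror(10, 90, 20, 880)
print(f"ROR {value:.2f} ({lower:.2f}-{upper:.2f}), IC {information_component(10, 90, 20, 880):.2f}")
```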

  2. Association between Benzodiazepine Use and Dementia: Data Mining of Different Medical Databases

    Science.gov (United States)

    Takada, Mitsutaka; Fujimoto, Mai; Hosomi, Kouichi

    2016-01-01

    Purpose: Some studies have suggested that the use of benzodiazepines in the elderly is associated with an increased risk of dementia. However, this association might be due to confounding by indication and reverse causation. To examine the association between benzodiazepine anxiolytic drug use and the risk of dementia, we conducted data mining of a spontaneous reporting database and a large organized database of prescriptions. Methods: Data from the US Food and Drug Administration Adverse Event Reporting System (FAERS) from the first quarter of 2004 through the end of 2013 and data from the Canada Vigilance Adverse Reaction Online Database from the first quarter of 1965 through the end of 2013 were used for the analyses. The reporting odds ratio (ROR) and information component (IC) were calculated. In addition, prescription sequence symmetry analysis (PSSA) was performed to identify the risk of dementia after using benzodiazepine anxiolytic drugs over the period of January 2006 to May 2014. Results: Benzodiazepine use was found to be associated with dementia in analyses using the FAERS database (ROR: 1.63, 95% CI: 1.61-1.64; IC: 0.66, 95% CI: 0.65-0.67) and the Canada Vigilance Adverse Reaction Online Database (ROR: 1.88, 95% CI: 1.83-1.94; IC: 0.85, 95% CI: 0.80-0.89). ROR and IC values increased with the duration of action of benzodiazepines. In the PSSA, a significant association was found, with adjusted sequence ratios of 1.24 (1.05-1.45), 1.20 (1.06-1.37), 1.23 (1.11-1.37), 1.34 (1.23-1.47), 1.41 (1.29-1.53), and 1.44 (1.33-1.56) at intervals of 3, 6, 12, 24, 36, and 48 months, respectively. Furthermore, the additional PSSA, in which patients who initiated a new treatment with benzodiazepines and anti-dementia drugs within 12- and 24-month periods were excluded from the analysis, demonstrated significant associations of benzodiazepine use with dementia risk. 
Conclusion: Multi-methodological approaches using different methods, algorithms, and databases suggest
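The two disproportionality measures named in this abstract can be computed from a 2×2 table of report counts. The sketch below is an illustration, not the authors' code. It assumes the usual cell layout (a = reports with the drug and the event, b = the drug with other events, c = other drugs with the event, d = all other reports), a Wald-type 95% interval for the ROR, and the simple observed-to-expected form of the IC. Pharmacovigilance tools often apply Bayesian shrinkage to the IC, and this sketch omits it.

```python
import math

def reporting_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """ROR with a Wald-type confidence interval from a 2x2 table of reports.

    a: target drug & target event    b: target drug & other events
    c: other drugs & target event    d: other drugs & other events
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper

def information_component(a: int, b: int, c: int, d: int) -> float:
    """Point estimate IC = log2(observed / expected), without Bayesian shrinkage."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # count expected if drug and event were independent
    return math.log2(a / expected)
```

With the hypothetical counts a=20, b=80, c=100, d=800, the ROR is 2.0 and the IC is log2(20/12) ≈ 0.74. A table in which the drug and event are independent gives ROR = 1 and IC = 0.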

  3. The study of personalized information recommendation system based on data mining

    Science.gov (United States)

    Chen, Ke; Ke, Wende; Li, Sansi

    2011-12-01

    To address the contradictions and difficulties of current Internet information access, this study draws on data mining and recommender-system techniques to propose and implement a personalized Internet information recommendation system based on data mining. The system is divided into an offline part and an online part. The offline part uses association rule mining to extract, from the site server's log files, the transaction patterns needed for the online intelligent personalized recommendation service. The online part provides personalized intelligent recommendation based on the mined association rules. Experiments on the system confirmed its feasibility and effectiveness.
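The online step described above, which matches a visitor's current session against rules mined offline from server logs, can be sketched as follows. This is an illustrative assumption and does not reproduce the authors' system. Rules are assumed to be (antecedent page set, recommended page, confidence) triples already produced by an offline miner, and candidate pages are ranked by the highest confidence of any rule that fires. The function name and page paths are hypothetical.

```python
def recommend(session_pages, rules, top_n=3):
    """Rank pages to recommend for the current session.

    rules: iterable of (antecedent frozenset of pages, recommended page, confidence),
    assumed to have been mined offline from the server access logs.
    """
    scores = {}
    for antecedent, page, conf in rules:
        # A rule fires when every page in its antecedent was visited in this session.
        if antecedent <= session_pages and page not in session_pages:
            scores[page] = max(scores.get(page, 0.0), conf)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]
```

Pages already in the session are never recommended. A page supported by several rules keeps its best confidence.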

  4. A Frame Work for Frequent Pattern Mining Using Dynamic Function

    Directory of Open Access Journals (Sweden)

    Sunil Joshi

    2011-05-01

    Discovering frequent objects (itemsets, sequential patterns) is one of the most vital fields in data mining. It is well understood that generating candidates requires running time and memory, and this has motivated the development of a large number of algorithms. Frequent pattern mining is a major research issue in association rule analysis. Apriori is the standard algorithm for association rule mining, and many algorithms for mining association rules, along with their variants, have been proposed on the foundation of Apriori. Most earlier studies adopted Apriori-like algorithms, which are based on a generate-and-test candidate scheme and on improving the algorithm's approach and formulation, but none paid attention to the structure of the database. Several modifications of Apriori focus on algorithmic strategy, but none emphasizes databases with few transactions and many attributes. We present a new research direction in frequent pattern mining that generates transaction pairs to relieve current methods of the traditional bottleneck, providing scalability to massive data sets and improving response time. To mine patterns in databases with more columns than rows, we propose a complete framework for frequent pattern mining. The simple idea is that when attributes far outnumber transactions, generating pairs of transactions instead of item IDs makes mining very fast. Recently, several works proposed new ways to mine patterns in transposed databases, i.e., databases with thousands of attributes but only tens of objects. We suggest a novel dynamic algorithm for frequent pattern mining that generates transaction pairs and finds frequent patterns through the longest common subsequence computed with a dynamic function. Our solution produces results more rapidly. 
A quantitative investigation of these tradeoffs is conducted through an extensive experimental study on artificial and
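For comparison with the transaction-pair approach, the Apriori generate-and-test baseline described in this abstract can be sketched as below. It is a minimal, unoptimized illustration that uses absolute support counts and a full database scan per level. It does not implement the paper's dynamic-function algorithm.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Generate: join pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every (k-1)-subset of a candidate must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Test: one scan of the database counts every surviving candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Because `min_support` is an absolute count, a candidate such as {bread, milk, diaper} is generated only if all three of its 2-item subsets are frequent. This pruning step keeps the candidate set small, but each level still requires a full database scan.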

  5. Explore the medication rules of Chinese medicine of Professor Huang Wenzheng for the treatment of chronic kidney disease based on data mining

    Institute of Scientific and Technical Information of China (English)

    Shao-Ning Dong; Yao-Guang Wang; Bing Xu; Liang Jin; Cui-Han Wang

    2016-01-01

    In recent years, chronic kidney disease (CKD) has received increasing attention because of its high prevalence and poor prognosis. Current treatment options for CKD are limited, and effective curative treatments are lacking, so many patients seek alternative therapies such as traditional Chinese medicine (TCM), which has been used to treat CKD for thousands of years. The purpose of this study is to use data mining to summarize the medication patterns of the renowned senior TCM physician Professor Huang Wenzheng in treating CKD. Data on 131 patients with CKD (1,909 visits) treated by Prof. Huang were collected to build a structured case database, and complex network analysis was applied to systematically analyze the principles and characteristics of his prescriptions. Homogeneous network analysis identified the backbone network and the three-level network of Chinese medicines for the treatment of CKD. The five most commonly used herbs were, in order, Dan Shen (Radix Salviae Miltiorrhizae), Fu Ling (Poria), Huang Qin (Radix Scutellariae), Sheng Di Huang (Radix Rehmanniae Recens) and Sha Ren (Fructus Amomi Villosi).

  6. Natural radioactivity level of associated bone-coal mining area in Zhejiang Province

    Institute of Scientific and Technical Information of China (English)

    YE Ji-Da; ZHENG Hui-Di; SONG Wei-Li; ZENG Guang-Jian; WANG Sha-Ling; WU Zong-Mei

    2005-01-01

    The geographic distribution, γ-radiation level and specific activity of radionuclides of the bone-coal mines in Zhejiang Province are reported. The weighted average γ-radiation dose rate over 107 main bone-coal mines is 566 nGy/h. The weighted mean activities of 238U, 226Ra, 232Th and 40K in 171 bone-coal samples are 949, 918, 34 and 554 Bq/kg, respectively.

  7. The effect of the depth and groundwater on the formation of sinkholes or ground subsidence associated with abandoned room and pillar lignite mines under static and dynamic conditions

    Science.gov (United States)

    Aydan, Ö.; Ito, T.

    2015-11-01

    It is well known that sinkholes or ground subsidence occur from time to time in areas where abandoned room-and-pillar mines exist. The author has been involved in assessing the stability of abandoned mines beneath urbanized residential areas in the Tokai region, where there is great concern about the stability of these mines both during large earthquakes and in the long term. The 2003 Miyagi Hokubu and 2011 Great East Japan earthquakes caused great damage to abandoned mines and resulted in many collapses. In this paper, the author presents the effect of depth and groundwater on the formation of sinkholes or ground subsidence associated with abandoned room-and-pillar lignite mines under static and dynamic conditions, and discusses the implications for areas above abandoned lignite mines.

  8. Application of text mining for customer evaluations in commercial banking

    Science.gov (United States)

    Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

    2015-07-01

    Customer attrition is an increasingly serious problem for commercial banks. To address it comprehensively, mining customer evaluation texts is as important as mining structured customer data. Textual feature selection, classification and association rule mining are necessary techniques for extracting hidden information from customer evaluations. This paper applies all three techniques, using Chinese word segmentation, C5.0 and Apriori, and runs a set of experiments on a collection of real textual data comprising 823 customer evaluations from a Chinese commercial bank. The results, the resulting solutions and some advice for the commercial bank are given in this paper.

  9. DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    Directory of Open Access Journals (Sweden)

    Wu Cathy H

    2005-08-01

    Background: A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results: We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options, such as sorting tree nodes or changing the orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion: We have presented a standalone package, DynGO, that provides functionalities to search and browse GO and its association databases, as well as several additional functions such as batch retrieval and semantic retrieval. The complete
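For illustration only, semantic retrieval of genes that share similar GO annotations can be approximated by comparing the sets of GO terms attached to each gene. The abstract does not state DynGO's actual similarity measure. The sketch below therefore assumes Jaccard similarity, and the `threshold` parameter, the function names, and the toy gene and GO identifiers are all hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two GO term sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def similar_genes(query, annotations, threshold):
    """Return (gene, similarity) pairs whose GO term sets overlap the query's by >= threshold."""
    q = annotations[query]
    scored = [(gene, jaccard(q, terms)) for gene, terms in annotations.items() if gene != query]
    # Most similar first; ties broken alphabetically by gene name.
    return sorted([(g, s) for g, s in scored if s >= threshold], key=lambda x: (-x[1], x[0]))
```

A set-overlap measure like this ignores the GO hierarchy. Measures that also credit shared ancestor terms would fit DynGO's tree-organized display more closely.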

  10. A fuzzy hill-climbing algorithm for the development of a compact associative classifier

    Science.gov (United States)

    Mitra, Soumyaroop; Lam, Sarah S.

    2012-02-01

    Classification, a data mining technique, has widespread applications, including medical diagnosis and targeted marketing. Knowledge discovery from databases in the form of association rules is one of the important data mining tasks. An integrated approach, classification based on association rules, has drawn the attention of the data mining community over the last decade. While attention has mainly focused on increasing classifier accuracy, little effort has been devoted to building interpretable and less complex models. This paper discusses the development of a compact associative classification model using a hill-climbing approach and fuzzy sets. The proposed methodology builds the rule base by selecting rules that contribute to increasing training accuracy, thus balancing classification accuracy against the number of classification association rules. The results indicate that the proposed associative classification model can achieve competitive accuracy on benchmark datasets with continuous attributes and offers better interpretability than other rule-based systems.
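The rule-selection idea of keeping a candidate rule only if it raises training accuracy can be sketched as a single greedy hill-climbing pass. This simplified version is an assumption for illustration, not the paper's method. It uses crisp attribute-value conditions rather than fuzzy sets, classifies by the first matching rule with a majority-class default, and considers candidate rules in the order given.

```python
from collections import Counter

def matches(rule, record):
    """A crisp rule fires when every attribute in its antecedent has the required value."""
    antecedent, _ = rule
    return all(record.get(attr) == val for attr, val in antecedent.items())

def predict(rules, record, default):
    for rule in rules:
        if matches(rule, record):
            return rule[1]
    return default

def accuracy(rules, data, default):
    hits = sum(predict(rules, rec, default) == label for rec, label in data)
    return hits / len(data)

def hill_climb_select(candidate_rules, data):
    """Greedily append candidate rules, accepting only moves that raise training accuracy."""
    default = Counter(label for _, label in data).most_common(1)[0][0]
    selected = []
    best = accuracy(selected, data, default)
    for rule in candidate_rules:
        trial = selected + [rule]
        score = accuracy(trial, data, default)
        if score > best:  # uphill step: keep the rule
            selected, best = trial, score
    return selected, best
```

A rule is accepted only on a strict improvement, so redundant rules that leave training accuracy unchanged are discarded. This is what keeps the rule base compact.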

  11. Energy analysis of stability on shallow tunnels based on non-associated flow rule and non-linear failure criterion

    Institute of Scientific and Technical Information of China (English)

    张佳华; 王成洋

    2015-01-01

    On the basis of the upper bound theorem, a non-associated flow rule and a non-linear failure criterion were considered together. Modified shear strength parameters of the materials were obtained using the tangent method. Employing the virtual power principle and the strength reduction technique, the study examined how material dilatancy, the non-linear failure criterion, pore water pressure, surface loads and burial depth affect the stability of shallow tunnels. To validate the proposed approach, its solutions were compared with existing results. They agree well when the non-associated flow rule reduces to the associated flow rule and the non-linear failure criterion degenerates to the linear failure criterion. Compared with material dilatancy, the non-linear failure criterion has a greater impact on the stability of shallow tunnels. As the dilatancy coefficient decreases, the safety factor of shallow tunnels decreases and the failure surface expands outward. Increases in the nonlinear coefficient, the pore water pressure coefficient, the surface load and the burial depth all result in a smaller safety factor. Therefore, both dilatancy and the non-linear failure criterion should be taken into account in the design of shallow tunnel support structures. The support structure must be reinforced promptly to prevent potential mud inrush or collapse accidents in areas with abundant pore water, large surface loads or large burial depth.

  12. Study on the multi-feature remote sensing data classification based on ACO rule mining algorithm

    Institute of Scientific and Technical Information of China (English)

    戴芹; 刘建波

    2009-01-01

    Remote sensing data classification is an important source of land cover maps, and research on image classification has long attracted the attention of the remote sensing community. Over several decades, remote sensing data classification technology has achieved a great deal. However, with increasingly multi-source and multi-dimensional data, conventional classification methods based on statistical theory show some weaknesses. For instance, when remote sensing data do not follow the assumed normal distribution, the result of the Maximum Likelihood Classifier (MLC) deviates from the actual situation and the classification accuracy is unsatisfactory. In recent years, therefore, many artificial intelligence techniques have been applied to remote sensing data classification to reduce the undesired limitations of conventional methods. As a novel intelligent optimization algorithm, the ant colony algorithm has been used successfully in many fields, but its application to remote sensing data processing is a new research topic. The ant colony rule mining algorithm classifies data by mining classification rules and can handle multi-feature data. This paper therefore applies it to multi-feature remote sensing data classification, using Landsat TM and Envisat ASAR data of the Beijing area as experimental data. The results were compared with those of the maximum likelihood classifier and C4.5. The analysis shows that the ant colony rule mining algorithm (1) is a non-parametric intelligent classification method with good robustness, (2) can mine relatively simple classification rules, and (3) can make full use of multi-source remote sensing data. It can thus fully exploit multi-feature data for land cover classification and improve classification efficiency.

  13. Mining The Data From Distributed Database Using An Improved Mining Algorithm

    CERN Document Server

    Renjit, J Arokia

    2010-01-01

    Association rule mining (ARM) is an active data mining research area, and most ARM algorithms cater to a centralized environment. Centralized mining for useful patterns in distributed databases is not always feasible, because merging data sets from different sites incurs huge network communication costs. This paper proposes an improved, well-performing algorithm for distributed data mining. Each local site runs the improved LMatrix algorithm to calculate local support counts. The local sites also identify a centre site that manages every message exchanged in order to obtain all globally frequent itemsets. Using LMatrix also reduces the time needed to scan the partitioned database, which increases the performance of the algorithm. The aim of the research is therefore a distributed algorithm for geographically distributed data sets that offers lower communication costs, superior running efficiency and stronger scalability than direct application of a sequential algorithm in d...
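In the division of labour described above, local sites compute support counts and a centre site combines them into globally frequent itemsets. This can be illustrated with a plain count-distribution sketch. The sketch does not implement the LMatrix structure; it simply enumerates the k-itemsets at each site. The function names and the fractional `min_support_ratio` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def local_counts(partition, k):
    """Run at each site: count every k-itemset occurring in the local transactions."""
    counts = Counter()
    for t in partition:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts, len(partition)

def global_frequent(site_results, min_support_ratio):
    """Run at the centre site: merge local counts and keep the globally frequent itemsets."""
    total = Counter()
    n_transactions = 0
    for counts, n in site_results:
        total.update(counts)  # only counts cross the network, never raw transactions
        n_transactions += n
    threshold = min_support_ratio * n_transactions
    return {itemset: c for itemset, c in total.items() if c >= threshold}
```

Only the per-site `Counter` objects and transaction totals need to cross the network. This is the communication saving the abstract emphasizes.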

  14. Regional scale selenium loading associated with surface coal mining, Elk Valley, British Columbia, Canada.

    Science.gov (United States)

    Wellen, Christopher C; Shatilla, Nadine J; Carey, Sean K

    2015-11-01

    Selenium (Se) concentrations in surface water downstream of surface mining operations have been reported at levels in excess of water quality guidelines for the protection of wildlife. Previous research in surface mining environments has focused on downstream water quality impacts, yet little is known about the fundamental controls on Se loading. This study investigated the relationship between mining practices, stream flows and Se concentrations using a SPAtially Referenced Regression On Watershed attributes (SPARROW) model. This work is part of an R&D program examining the influence of surface coal mining on hydrological and water quality responses in the Elk Valley, British Columbia, Canada, aimed at informing effective management responses. Results indicate that waste rock volume, a product of mining activity, accounted for roughly 80% of the Se load from the Elk Valley, while background sources accounted for roughly 13%. Wet years were characterized by more than twice the Se load of dry years. A number of variables regarding placement of waste rock within the catchments, length of buried streams, and the construction of rock drains did not significantly influence the Se load. The age of the waste rock, the proportion of waste rock surface reclaimed, and the ratio of waste rock pile side area to top area all varied inversely with the Se load from watersheds containing waste rock. These results suggest operational practices that are likely to reduce the release of Se to surface waters.

  15. Research on Overlying Strata and Surface Movement Rule in Repeated Mining

    Institute of Scientific and Technical Information of China (English)

    栾元重; 李静涛; 刘娜; 刘阳; 栾亨宣; 马德鹏

    2012-01-01

    To address the problems of repeated mining of the upper and lower No. 3 coal seams in a mine beneath Weishan Lake, Shandong Province, we measured ground movement and deformation in the lake area with GPS-RTK technology. We also monitored the heights of the water-conducting fractured zone during mining of both seams using a double-ended water-plugging device in upward boreholes drilled underground, and established fitted functions for surface subsidence and horizontal movement. Moreover, using FLAC3D numerical simulation, we set up functional equations for the intermediate rock layer and the strata movement boundary in the repeated mining of the upper and lower No. 3 seams. These equations reveal the characteristics of surface movement and strata deformation in repeated mining in the southern Shandong mining area. The study is significant for safe mining under water bodies and offers reference value for mining areas with similar geological and mining conditions.

  16. The spatial decision-supporting system combination of RBR & CBR based on artificial neural network and association rules

    Science.gov (United States)

    Tian, Yangge; Bian, Fuling

    2007-06-01

    Artificial intelligence techniques should be introduced on top of geographic information systems to build spatial decision support systems (SDSS). The paper discusses the structure of an SDSS. After comparing the characteristics of rule-based reasoning (RBR) and case-based reasoning (CBR), it proposes a framework for a spatial decision system that combines RBR and CBR to take advantage of both. The paper also discusses CBR in agricultural spatial decisions, the application of artificial neural networks (ANN) in CBR, and the enrichment of the inference rule base with association rules. Finally, the design of the system is tested and verified with examples evaluating crop adaptability.

  17. Numerical analysis and geotechnical assessment of mine scale model

    Institute of Scientific and Technical Information of China (English)

    Khanal Manoj; Adhikary Deepak; Balusu Rao

    2012-01-01

    Various numerical methods are available to model, simulate, analyse and interpret results; however, a major task is to select a reliable and appropriate tool to perform a realistic assessment of a given problem. For a model to represent a realistic mining scenario, a verified tool must be chosen to assess mine roof support requirements and address the geotechnical risks associated with longwall mining. Dependable tools provide a safe working environment, increased production and efficient management of resources, and they reduce the environmental impacts of mining. Although various methods, for example analytical, experimental and empirical ones, are adopted in mining, numerical tools have recently become popular owing to advances in computer hardware and numerical methods. Empirical rules based on past experience do provide a general guide. However, because of the heterogeneous nature of mine geology (i.e., no two mine sites are identical), numerical simulations of site-specific conditions can give better insight into some underlying issues. The paper highlights the use of a continuum-mechanics-based tool in coal mining with a mine-scale model. Continuum modelling can provide nearly accurate stress fields and deformation. The paper describes the use of existing mine data to calibrate and validate the model parameters, which are then used to assess geotechnical issues related to installing a new high-capacity longwall at the mine site. A variety of parameters, for example chock convergence, caveability of the overlying sandstones, and abutment and vertical stresses, have been estimated.

  18. A Recent Review on XML data mining and FFP

    Directory of Open Access Journals (Sweden)

    Amit Kumar Mishra, Hitesh Gupta

    2013-01-01

    The goal of data mining is to extract, or "mine", knowledge from large amounts of data. Emerging technologies for semi-structured data have attracted wide attention in networking, e-commerce, information retrieval and databases. XML has become very popular for representing semi-structured data and is a standard for data exchange over the web. Mining XML data from the web is becoming increasingly important. However, the structure of XML data can be complex and irregular. Association rule mining plays a key role in mining data for frequent pattern matching. First Frequent Pattern growth mines the complete set of frequent patterns by pattern fragment growth. First Frequent Pattern-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and it uses a partition-based, divide-and-conquer method. This paper presents a complete review of XML data mining using Fast Frequent Pattern mining in various domains.
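The pattern-fragment-growth idea can be sketched as a compact, unoptimized FP-growth in Python. Rather than generating candidates, it builds a prefix tree and then recursively mines conditional pattern bases. The abstract does not specify the review's "FFP" variants, so this sketch follows the textbook FP-growth procedure. Transactions are assumed to be passed as a list of (items, count) pairs, and items are assumed to be sortable strings.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its accumulated count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree; return (header table: item -> nodes, frequent item counts)."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in items:
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep only frequent items, in descending frequency order (ties broken by name).
        path = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq

def fp_growth(weighted_transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    header, freq = build_tree(weighted_transactions, min_support)
    patterns = {}
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        itemset = (item,) + suffix
        patterns[frozenset(itemset)] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Grow the pattern fragment by mining the conditional base recursively.
        patterns.update(fp_growth(base, min_support, itemset))
    return patterns
```

For a list of item sets `db`, each transaction is given a weight of 1, for example `fp_growth([(t, 1) for t in db], min_support=3)`. The weights let the same function mine the conditional bases, where each prefix path carries the count of the node it leads to.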

  19. Applying WebMining on KM system

    Science.gov (United States)

    Shimazu, Keiko; Ozaki, Tomonobu; Furukawa, Koichi

    KM (Knowledge Management) systems have recently been adopted in enterprise management. At the same time, data mining technology is widely acknowledged within the R&D divisions of information systems. In particular, acquiring meaningful information from Web usage data has become one of the most exciting areas. In this paper, we employ a Web-based KM system and propose a framework for applying Web Usage Mining technology to KM data. It turns out that task duration varies with user operations such as referencing a table-of-contents page, downloading a target file, and writing to a bulletin board. This in turn makes it possible to easily predict the purpose of a user's task. Taking these observations into account, we segmented access log data manually and compared the results with those obtained by applying the constant interval method. Next, we obtained a segmentation rule for Web access logs by applying a machine-learning algorithm to the manually segmented access logs as training data. The newly obtained segmentation rule was then compared with other known methods, including the time interval method, by evaluating their segmentation results in terms of recall and precision, and our rule attained the best results on both measures. Furthermore, the segmented data were fed to an association rule miner, and the resulting association rules were used to modify the Web structure.
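The interval-based baseline and the recall/precision comparison mentioned in this abstract can be sketched as follows. Several details are assumptions made for illustration: the gap threshold `max_gap`, numeric timestamps in seconds, and precision and recall measured over session start points. The sketch does not reproduce the paper's learned segmentation rule.

```python
def segment_by_interval(timestamps, max_gap):
    """Split one user's sorted access times into sessions wherever the gap exceeds max_gap."""
    sessions, current = [], []
    for t in timestamps:
        if current and t - current[-1] > max_gap:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

def boundary_precision_recall(predicted, reference):
    """Compare two segmentations by the timestamps at which a new session starts."""
    def starts(sessions):
        return {s[0] for s in sessions[1:]}  # the very first access is not a boundary
    p, r = starts(predicted), starts(reference)
    hits = len(p & r)
    precision = hits / len(p) if p else 1.0
    recall = hits / len(r) if r else 1.0
    return precision, recall
```

A single fixed gap cannot distinguish a long file download from the end of a task. The abstract's observation that task duration varies with the type of operation motivates replacing it with a learned rule.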

  20. Compass: A hybrid method for clinical and biobank data mining

    DEFF Research Database (Denmark)

    Krysiak-Baltyn, Konrad; Petersen, Thomas Nordahl; Audouze, Karine Marie Laure

    2014-01-01

    We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods; Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply...... Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining; it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as “hotspots” for statistically...... significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical type questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we...

  1. Object-oriented High Resolution Image Classification based on Association-rule

    Institute of Scientific and Technical Information of China (English)

    张扬; 周子勇

    2012-01-01

    Using GeoEye-1 high-resolution remote sensing imagery of Changping District, Beijing, as test data, this paper explores a high-resolution image classification method that combines association rule mining with the object-oriented method. First, the principle of Classification Based on Associations (CBA) is discussed, together with a modified classifier builder. Second, object-oriented high-resolution image classification is achieved through image segmentation, feature extraction, association rule extraction and classifier building. Class Association Rules (CARs) were mined with the CBA-RG procedure, and these rules were shown to correspond to the features of ground objects. A modified classifier was built from these rules according to the order "confidence → spectrum complexity → support → generation sequence". Finally, the classification accuracy was evaluated and compared with that of the K-Nearest Neighbors method. The experiment shows that the accuracy is relatively high and that the method reduces, to some extent, the dependence of land-cover classification on expert knowledge.
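The rule ordering described above can be sketched as a sort key followed by first-match classification, as in CBA-style classifiers. The abstract does not define "spectrum complexity", so the numeric `complexity` field is hypothetical and is assumed to be better when lower. The encoding of object features as (feature, value) pairs and the class names used in the test are likewise toy assumptions.

```python
from dataclasses import dataclass

@dataclass
class ClassRule:
    antecedent: frozenset  # (feature, value) conditions on an image object
    label: str             # land-cover class predicted by the rule
    confidence: float
    complexity: float      # hypothetical "spectrum complexity" score, lower assumed better
    support: float
    order: int             # generation sequence of the rule

def build_classifier(rules):
    """Order rules: confidence desc, complexity asc, support desc, generation order asc."""
    return sorted(rules, key=lambda r: (-r.confidence, r.complexity, -r.support, r.order))

def classify(ranked_rules, features, default):
    """Return the class of the first ranked rule whose conditions all hold."""
    for rule in ranked_rules:
        if rule.antecedent <= features:
            return rule.label
    return default
```

When an object matches two rules of equal confidence, the rule with the lower assumed complexity wins. Objects that match no rule fall back to the default class.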

  2. Improvement and Design of IDS System Data Rules Bank Based on Data Mining

    Institute of Scientific and Technical Information of China (English)

    林建伟; 郭彩虹; 许臻

    2013-01-01

    Network attacks are becoming more and more frequent, the detection and analysis of existing IDS systems lack precision, and the defenses of the IDS system database can no longer meet the needs of intrusion prevention. To address this, the C4.5 algorithm and sequential pattern mining algorithms from data mining are used to mine the data packets captured by the system. C4.5 is applied to data describing system defects and known attack methods, and sequential pattern mining is applied to system call sequence data, with the goal of improving the accuracy of data analysis. Experiments show that these improvements to the IDS system's data rule base greatly improve the accuracy of the system's intrusion data analysis.

  3. Detection of Independent Associations of Plasma Lipidomic Parameters with Insulin Sensitivity Indices Using Data Mining Methodology

    Science.gov (United States)

    Schuhmann, Kai; Xu, Aimin; Schulte, Klaus-Martin; Simeonovic, Charmaine J.; Schwarz, Peter E. H.; Bornstein, Stefan R.; Shevchenko, Andrej; Graessler, Juergen

    2016-01-01

    Objective: Glucolipotoxicity is a major pathophysiological mechanism in the development of insulin resistance and type 2 diabetes mellitus (T2D). We aimed to detect subtle changes in the circulating lipid profile by shotgun lipidomics analyses and to associate them with four different insulin sensitivity indices. Methods: The cross-sectional study comprised 90 men with a broad range of insulin sensitivity, including normal glucose tolerance (NGT, n = 33), impaired glucose tolerance (IGT, n = 32) and newly detected T2D (n = 25). Before the oral glucose challenge, plasma was obtained and quantitatively analyzed for 198 lipid molecular species from 13 different lipid classes, including triacylglycerols (TAGs), phosphatidylcholine plasmalogens/ethers (PC O-s), sphingomyelins (SMs) and lysophosphatidylcholines (LPCs). To identify a lipidomic signature of individual insulin sensitivity, we applied three data mining approaches, namely the least absolute shrinkage and selection operator (LASSO), Support Vector Regression (SVR) and Random Forests (RF). These were applied to the following insulin sensitivity indices: homeostasis model of insulin resistance (HOMA-IR), glucose insulin sensitivity index (GSI), insulin sensitivity index (ISI) and disposition index (DI). The LASSO procedure offers high prediction accuracy and easier interpretability than SVR and RF. Results: After LASSO selection, the plasma lipidome explained from 3% (DI) up to 53% (HOMA-IR) of the variability of the sensitivity indices. Among the lipid species with the highest positive LASSO regression coefficients were TAG 54:2 (HOMA-IR), PC O- 32:0 (GSI) and SM 40:3:1 (ISI). The highest negative regression coefficients were obtained for LPC 22:5 (HOMA-IR), TAG 51:1 (GSI) and TAG 58:6 (ISI). Conclusion: Although a substantial proportion of lipid molecular species showed a significant correlation with insulin sensitivity indices, we were able to identify a limited number of lipid metabolites of particular importance based on the LASSO approach. 
These
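The LASSO selection step minimizes the objective (1/2n)·||y - Xβ||² + λ·||β||₁, which can be solved by cyclic coordinate descent with soft-thresholding. The sketch below is a textbook version and not the study's software. It assumes centred columns and response (hence no intercept) and no all-zero columns, and it runs a fixed number of sweeps instead of testing for convergence.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (n samples x p features); columns and y are assumed centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            # Keep the residual consistent with the updated coefficient.
            for i in range(n):
                residual[i] -= col[i] * (new - beta[j])
            beta[j] = new
    return beta
```

Raising λ drives more coefficients exactly to zero, which is what yields a short, interpretable list of selected lipid species. For example, on two orthogonal toy features with least-squares coefficients 2 and 1, λ = 0.25 gives 1.5 and 0.5, and λ = 1 gives 0 for both.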

  4. Spatial distribution of environmental risk associated to a uranium abandoned mine (Central Portugal)

    Science.gov (United States)

    Antunes, I. M.; Ribeiro, A. F.

    2012-04-01

    The abandoned uranium mine of Canto do Lagar is located at Arcozelo da Serra, central Portugal. The mine was exploited as an open pit and produced about 12,430 kg of uranium oxide (U3O8) between 1987 and 1988. The dominant geological unit is a porphyritic coarse-grained two-mica granite, with biotite > muscovite. The uranium deposit consists of two crushed zones (gaps) parallel to the coarse-grained porphyritic granite, with an average direction of N30°E. These zones are silicified, sericitized and reddish-jasperized, with a width of approximately 10 meters. The gaps are accompanied by two thin veins of white quartz (70°-80° WNW), ferruginous and jasperized with chalcedony, red jasper and opal. The veins are about 6 meters apart and contain secondary U-phosphate phases such as autunite and torbernite. Rejected materials (1,000,000 t) were deposited on two dumps, and a lake formed in the open pit. To assess the environmental risk of the abandoned uranium mine of Canto do Lagar, 70 samples of stream sediments, soils and mine tailings were collected and analysed. Relations among sample compositions were tested using Principal Component Analysis (PCA), a multivariate method, and their spatial distribution was assessed using indicator kriging. The spatial distribution of stream sediments shows that the probability of expression of principal component 1 (explaining Y, Zr, Nb, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Hf, Th and U contents) decreases along the SE-NW direction. This component is explained by the samples located inside the mine's influence. The probability of expression of principal component 2 (explaining Be, Na, Al, Si, P, K, Ca, Ti, Mn, Fe, Co, Ni, Cu, As, Rb, Sr, Mo, Cs, Ba, Tl and Bi contents) increases toward the middle of the stream line. This component is explained by the samples located outside the mine's influence. The spatial distribution of soils shows that the probability of expression of principal component 1 (explaining Mg, P, Ca, Ge, Sr, Y, Zr, La, Ce, Pr

  5. Time Series Rule Discovery: Tough, not Meaningless

    NARCIS (Netherlands)

    Struzik, Z.R.

    2003-01-01

    'Model-free' rule discovery from data has recently been subject to considerable criticism, which has cast a shadow over the emerging discipline of time series data mining. However, other than in data mining, rule discovery has long been the subject of research in statistical physics of complex pheno

  6. Quantifying Associations between Environmental Stressors and Demographic Factors

    Science.gov (United States)

    Association rule mining (ARM) [1-3], also known as frequent item set mining [4] or market basket analysis [1], has been widely applied in many different areas, such as business product portfolio planning [5], intrusion detection infrastructure design [6], gene expression analysis...

  7. Associate editors' foreword: entrepreneurship in health education and health promotion: five cardinal rules.

    Science.gov (United States)

    Cottrell, Randall R; Cooper, Hanna

    2009-07-01

    A career in health education or health promotion (HE/HP) can be developed in many ways. In past editions of this department, career development has been discussed in relation to distance (Balonna, 2001), consulting (Bookbinder, 2001), certifications (Hayden, 2005), graduate school (Cottrell & Hayden, 2007), and many other topics. This article looks at a less traditional means of career development: entrepreneurship. Health education is a field ripe with opportunities for consulting and for selling health-related products and services. Entrepreneurship can not only create financial rewards but also provide high visibility and networking contacts that can advance one's career. This article combines theory and practical applications to help readers develop entrepreneurial activities. The authors are experienced in entrepreneurial development and use that expertise to provide relevant examples and to develop a framework of "five cardinal rules" for establishing an entrepreneurial enterprise in HE/HP.

  8. Processing of audiovisual associations in the human brain: dependency on expectations and rule complexity

    Directory of Open Access Journals (Sweden)

    Riikka eLindström

    2012-05-01

    In order to respond to environmental changes appropriately, the human brain must not only be able to detect environmental changes but also to form expectations of forthcoming events. Events in the external environment often have a number of multisensory features such as pitch and form. Integrated percepts of objects and events require crossmodal processing and crossmodally induced expectations of forthcoming events. The aim of the present study was to determine whether expectations created by visual stimuli can modulate deviance detection in the auditory modality, as reflected by auditory event-related potentials (ERPs). Additionally, we studied whether the complexity of the rules linking auditory and visual stimuli affects this process. The N2 deflection of the ERP was observed in response to violations of the subjects' expectation of a forthcoming tone. Both temporal aspects and cognitive demands during the audiovisual deviance detection task modulated the brain processes involved.

  9. System and Empirical Study on Adverse Drug Reaction Warning Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    冯秀珍; 贺小红; 冯变玲

    2012-01-01

    To address current shortcomings in adverse drug reaction (ADR) warning in China, the paper introduced multidimensional association rule mining based on data cubes into the field of ADR warning, proposed a framework for an ADR warning system based on association rules, and carried out an empirical analysis with real data from ADR spontaneous reporting systems (SRSs). Based on support and confidence, the system issues warnings along two dimensions, drugs and patients. This provides a new method for ADR prediction and warning and offers decision support for physicians' prescribing.
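Support and confidence, the two measures this abstract relies on, can be illustrated with a minimal Python sketch. It assumes each spontaneous report is a set of items tagged by dimension (hypothetical labels such as "drug:A", "adr:rash" or "age:65+"), so the same functions serve both the drug and the patient dimension. The data-cube construction of the original system is not reproduced, and `flag_signals` and its thresholds are illustrative names only.

```python
from itertools import product


def support(reports, items):
    """Fraction of reports that contain every item in `items`."""
    items = set(items)
    return sum(1 for r in reports if items <= r) / len(reports)


def confidence(reports, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the reports."""
    s_ante = support(reports, antecedent)
    if s_ante == 0:
        return 0.0
    return support(reports, set(antecedent) | set(consequent)) / s_ante


def flag_signals(reports, drugs, reactions, min_sup, min_conf):
    """Return (drug, reaction, support, confidence) for rules drug -> reaction
    that pass both the support and the confidence threshold."""
    signals = []
    for drug, reaction in product(drugs, reactions):
        sup = support(reports, {drug, reaction})
        if sup < min_sup:
            continue  # too rare to warn on
        conf = confidence(reports, {drug}, {reaction})
        if conf >= min_conf:
            signals.append((drug, reaction, sup, conf))
    return signals
```

With four toy reports in which drug A appears three times (twice with a rash) and drug B once (with nausea), thresholds of 0.2 support and 0.6 confidence keep "drug:A -> adr:rash" (support 0.5, confidence about 0.67) and "drug:B -> adr:nausea", while "drug:A -> adr:nausea" is dropped for low confidence.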

  10. Application of Improved Association Rules in Personalized Literature Search

    Institute of Scientific and Technical Information of China (English)

    郑羽洁; 章杰鑫

    2011-01-01

    Because current university library literature search systems cannot provide personalized search services for different readers, this paper studies personalized literature search. It proposes applying association rules to sort the original search results in a personalized way according to reader level and, using data from a university library as an example, describes in detail how an improved association rule algorithm mines historical borrowing data and how the mining results are then used for sorting. Theoretical analysis and experiments verify the feasibility of applying association rules to personalized literature search.

  11. In Situ Generated Colloid Transport of Cu and Zn in Reclaimed Mine Soil Profiles Associated with Biosolids Application

    Directory of Open Access Journals (Sweden)

    Jarrod O. Miller

    2011-01-01

    Areas reclaimed for agricultural use following coal mining often receive biosolids applications to increase organic matter and fertility. Transport of heavy metals within these soils may be enhanced by the additional presence of biosolids colloids. Intact monoliths from reclaimed and undisturbed soils in Virginia and Kentucky were leached to observe Cu and Zn mobility with and without biosolids application. Transport of Cu and Zn was observed in both solution and colloid-associated phases in reclaimed and undisturbed forest soils, where the presence of unweathered spoil material and biosolids amendments contributed to higher metal release in solution fractions. Up to 81% of mobile Cu was associated with the colloid fraction, particularly when gibbsite was present, while only up to 18% of mobile Zn was associated with the colloid fraction. The colloid-bound Cu was exchangeable by ammonium acetate, suggesting that it will be released into groundwater resources.

  12. Effective Rule Based Classifier using Multivariate Filter and Genetic Miner for Mammographic Image Classification

    Directory of Open Access Journals (Sweden)

    Nirase Fathima Abubacker

    2015-06-01

    Mammography is an important examination for the early detection of breast abnormalities. Automatic classification of mammogram images as normal, benign or malignant would help radiologists diagnose breast cancer. This study investigates the effectiveness of rule-based classifiers with a multivariate filter and a genetic miner for classifying mammogram images. The method discovers association rules with the classes as consequents and classifies each image by the Highest Average Confidence (HAvC) of the association rules matched for each class. In the association rule mining stage, Correlation-based Feature Selection (CFS), which significantly reduces the complexity of the image mining process, is used for feature selection, and a modified genetic association rule mining technique, GARM, is used to discover the rules. The method is evaluated on a mammogram dataset of 240 images taken from DDSM, and its performance is compared against other classifiers such as SMO, Naïve Bayes and J48. The proposed method is promising, achieving 88% accuracy and outperforming the other classifiers for mammogram image classification.
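The HAvC decision step described above can be sketched in Python as follows. This is a minimal illustration, assuming rules have already been mined (by GARM or any other miner) and that each image is represented as a set of discretized feature labels; the `Rule` type, feature labels and the `default` fallback are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class Rule(NamedTuple):
    antecedent: frozenset  # discretized image features that must all be present
    label: str             # class predicted by the rule, e.g. "benign"
    confidence: float


def havc_classify(features: set, rules: Iterable[Rule],
                  default: Optional[str] = None) -> Optional[str]:
    """Return the class whose matching rules have the highest average confidence."""
    matched = defaultdict(list)
    for rule in rules:
        if rule.antecedent <= features:  # rule fires if all its features are present
            matched[rule.label].append(rule.confidence)
    if not matched:
        return default  # no rule covers this image
    averages = {label: sum(c) / len(c) for label, c in matched.items()}
    return max(averages, key=averages.get)
```

Averaging differs from picking the single most confident rule: if one "malignant" rule with confidence 0.9 and another with 0.6 both fire (average 0.75) alongside one "benign" rule with 0.8, the sketch predicts "benign".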

  13. [Comment on applications of data mining used in studies of heritage of experiences of national medical masters].

    Science.gov (United States)

    Wu, Jia-Rui; Tang, Shi-Huan; Guo, Wei-Xian; Zhang, Xiao-Meng; Zhang, Bing

    2014-02-01

    Data mining, also known as knowledge discovery in databases, is a non-trivial process of revealing implicit, previously unknown and potentially useful information from massive data. In recent years, applications of data mining in pharmaceutical research on traditional Chinese medicine have become widespread, and data mining plays an especially important role in passing on the experience of national medical masters. In this study, we describe the data mining methods used in studies of the heritage of national medical masters' experience, such as association rules, Bayesian networks, neural networks and decision trees, and analyze their advantages and disadvantages.

  14. Ventilation problems of diesel self-propelled mining machines with special regard to shuttle services

    Energy Technology Data Exchange (ETDEWEB)

    Benke, L.; Buocz, Z.

    1985-01-01

    The basic problems associated with the ventilation of diesel-powered self-propelled equipment used in underground mines are summarized. The composition of exhaust gases and its dependence on various conditions are investigated. After an overview of ventilation regulation rules, the principles of mine air volume determination are discussed. Next, the ventilation problems of diesel vehicles used for shuttle services are considered. The main results are presented in the form of diagrams for the determination of air volume and air flow.

  15. Market Basket Analysis for a Supermarket based on Frequent Itemset Mining

    Directory of Open Access Journals (Sweden)

    Loraine Charlet Annie M.C.

    2012-09-01

    Market basket analysis is an important component of analytical systems in retail organizations: it is used to determine the placement of goods and to design sales promotions for different customer segments, improving customer satisfaction and hence the profit of the supermarket. These issues are addressed here for a leading supermarket using frequent itemset mining. The frequent itemsets are mined from the market basket database using the efficient K-Apriori algorithm, and association rules are then generated from them.
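The two stages named above, mining frequent itemsets and then generating association rules, can be sketched with plain Apriori in Python. This is not K-Apriori, whose extra steps the abstract does not describe; it is a minimal sketch of the classic level-wise algorithm, with illustrative function names and toy baskets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}  # level-1 candidates
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent (downward closure)
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules_from(frequent, min_conf):
    """Generate rules X -> Y (X, Y disjoint) with confidence >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                conf = sup / frequent[ante]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules
```

On the baskets {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk} with a minimum support of 0.5, {milk, butter} is pruned (support 0.25), and at a minimum confidence of 0.9 the only surviving rule is butter -> bread.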

  16. Socioeconomic inequality of cancer mortality in the United States: a spatial data mining approach

    Directory of Open Access Journals (Sweden)

    Lam Nina SN

    2006-02-01

    Background: The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988–1992 cancer mortality rates for colorectal, lung, breast and prostate cancers, defined at the Health Service Area level, and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data, which were defined at different spatial resolutions, and to visualize and analyze the results of the association rule mining process. Results: Health Service Areas with high rates of low education, high unemployment and low-paying jobs were found to be associated with higher rates of cancer mortality. Conclusion: Association rule mining combined with geographic information technology helps reveal the spatial patterns of socioeconomic inequality in cancer mortality in the United States and identify regions that need further attention.

  17. Towards a New Approach for Mining Frequent Itemsets on Data Stream

    Directory of Open Access Journals (Sweden)

    Shailendra Jain

    2012-12-01

    Since the advent of association rule mining, it has become one of the most researched areas of data exploration. In recent years, applying association rule mining methods to extract rules from a continuous flow of voluminous data, known as a data stream, has generated immense interest because of emerging applications such as network-traffic analysis and sensor-network data analysis. In such application domains, the ability to process enormous amounts of stream data in a single pass is critical. Many organizations now generate and use vast data streams (Huang, 2002). Applying data mining to such massive streams can unearth real-time trends and patterns that support dynamic, timely decisions. Mining high-speed, enormous data streams differs significantly from traditional data mining in several ways. First, the response time of the mining algorithm should be as small as possible because of the online nature of the data and the limited resources dedicated to mining (Charikar, 2004). Second, the underlying data is highly volatile and subject to change over time (Chang, 2003). Moreover, since there is no time to preprocess the data to remove noise, the streamed data can contain inherent noise. Because of these problems, data stream mining is receiving increasing attention, and current research focuses on efficient solutions to them. Although the field of data stream mining is being heavily investigated, there is still no holistic and generic approach for mining association rules from data streams. This research attempts to fill that gap by integrating ideas from previous work in data stream mining. The investigation focuses on the effectiveness of combining a probabilistic approach to sampling the data stream with an incremental approach to maintenance of frequent
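The abstract does not specify its sampling scheme, so the following Python sketch swaps in a well-known one, reservoir sampling (Algorithm R), to show how single-pass sampling and incremental maintenance of item counts can fit together. The class name and parameters are hypothetical; support is estimated only within the sample.

```python
import random
from collections import Counter


class StreamSample:
    """Reservoir sample of stream transactions with incrementally maintained item counts.

    Reservoir sampling (Algorithm R) is an assumed stand-in for the
    paper's unspecified probabilistic sampling approach.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.reservoir = []       # sampled transactions (frozensets)
        self.counts = Counter()   # item -> number of sampled transactions containing it
        self.seen = 0             # transactions observed so far
        self._rng = random.Random(seed)

    def add(self, transaction):
        """Process one transaction in a single pass."""
        self.seen += 1
        items = frozenset(transaction)
        if len(self.reservoir) < self.capacity:
            self.reservoir.append(items)
            self.counts.update(items)
            return
        j = self._rng.randrange(self.seen)  # uniform in [0, seen)
        if j < self.capacity:
            # Evict the old transaction and update counts incrementally, no rescan
            evicted = self.reservoir[j]
            self.counts.subtract(evicted)
            for item in evicted:
                if self.counts[item] <= 0:
                    del self.counts[item]
            self.reservoir[j] = items
            self.counts.update(items)

    def frequent_items(self, min_support):
        """Items whose support within the current sample is >= min_support."""
        n = len(self.reservoir)
        if n == 0:
            return {}
        return {i: c / n for i, c in self.counts.items() if c / n >= min_support}
```

Each transaction is touched once, and the counts never require rescanning the stream; extending the counts from single items to itemsets would follow the same add/evict pattern.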

  18. Associations between Functions and Flavors of TCM Based on Classification Association Rules

    Institute of Scientific and Technical Information of China (English)

    杨雪梅; 赖新梅; 陈梅妹; 林端宜

    2013-01-01

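    Objective: To lay a large-sample data-mining foundation for a comprehensive summary of the five-flavors theory of TCM drug properties, and to provide theoretical clues for determining the flavors of Chinese medicines in the development of new TCM resources and in clinical use. Methods: The five-flavor data of 8,980 Chinese medicines recorded in Chinese Herbal Medicine (Zhonghua Bencao) and the associated function index data were selected as the data set. The Apriori model on the IBM SPSS Clementine 14.1 data-mining platform was used to mine classification association rules, with the minimum support threshold of the rule antecedent set to 0.5% and the minimum confidence threshold set to 80%. Results: Twenty-one classification association rules involving three flavors (sweet, pungent and bitter) were found. Medicines with the functions of promoting fluid production to quench thirst, tonifying qi, tonifying yin, moistening the lung, tonifying the lung, promoting fluid production to quench thirst & clearing heat, moistening the lung to stop cough, tonifying qi & tonifying blood, moistening dryness, relieving restlessness, and tonifying the spleen and replenishing qi were mostly "sweet"; medicines with the functions of dispersing wind-cold, releasing the exterior, warming the middle, and dispersing cold to relieve pain were mostly "pungent"; and medicines with the functions of reducing swelling and relieving pain & clearing heat and detoxifying, clearing heat and purging fire, clearing heat and drying dampness, resolving stasis and stopping bleeding & clearing heat and detoxifying, killing parasites & clearing heat and detoxifying, and relieving pain & clearing heat and detoxifying were mostly "bitter". Conclusion: The associations found between functions and the sweet, pungent and bitter flavors are based entirely on a large amount of TCM data; further experimental validation from multiple angles is still needed to build a complete theoretical system of TCM drug properties.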
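Classification association rules differ from ordinary ones in that the consequent is restricted to a class (here, a flavor), and this study thresholds support on the antecedent alone. A minimal Python sketch of that setup follows, assuming each record is a pair (set of functions, set of flavors) and limiting antecedents to at most two functions, as in the combined functions ("A & B") reported above. It is not the Clementine Apriori node, and the example thresholds are toy values, not the study's 0.5% and 80%.

```python
from collections import Counter
from itertools import combinations


def class_rules(records, min_ante_support, min_conf, max_len=2):
    """Mine rules (functions) -> flavor.

    records: list of (functions: set[str], flavors: set[str]) pairs.
    Support is measured on the antecedent alone; confidence is
    P(flavor | antecedent).
    """
    n = len(records)
    ante_counts = Counter()   # antecedent tuple -> number of records containing it
    joint_counts = Counter()  # (antecedent, flavor) -> co-occurrence count
    for functions, flavors in records:
        for k in range(1, max_len + 1):
            for ante in combinations(sorted(functions), k):
                ante_counts[ante] += 1
                for flavor in flavors:
                    joint_counts[(ante, flavor)] += 1
    rules = []
    for (ante, flavor), joint in joint_counts.items():
        if ante_counts[ante] / n < min_ante_support:
            continue  # antecedent too rare
        conf = joint / ante_counts[ante]
        if conf >= min_conf:
            rules.append((ante, flavor, conf))
    return sorted(rules, key=lambda r: -r[2])  # most confident first
```

With five toy records and an antecedent-support threshold of 0.4, only antecedents seen in at least two records survive, so a rule such as ("tonify qi",) -> "sweet" can be kept while a one-off combination cannot, however confident it is.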

  19. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia

    DEFF Research Database (Denmark)

    Chen, X; Lee, G; Maher, B S

    2011-01-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed...... bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE...... in our Irish samples and was dropped without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11,380 cases and 15,021 controls), we...

  20. Modeling of Erosion on Jelateng Watershed Using USLE Method, Associated with an Illegal Mining Activities (PETI)

    Science.gov (United States)

    Ananda, I. N.; Aswari, F. V.; Narmaningrum, D. A.; Nugraha, A. S. A.; Asidiqi, M. A. A.; Setiawan, Y.

    2016-11-01

    The Indonesian archipelago has abundant mineral resources, which has led to extensive mining activity. Mineral resources are natural resources that cannot be renewed. An abandoned mining pit leaves a hole in the land surface and increases erosion severity during the rainy season. This erosion carries sediment to the sea, damaging coastal ecosystems. Erosion in the Jelateng watershed was modeled temporally using remote sensing image data from Landsat-5 (1995) and Landsat-8 (2015), supported by field data. The erosion parameters were rasterized and used as inputs to the USLE erosion model in the IDRISI software. The results showed that in 1995 the majority of the area had a low level of erosion: low erosion rates were below 183.67 tons/hectare/year and high erosion rates ranged from 408.34 to 633 tons/hectare/year. By comparison, the 2015 erosion model showed that erosion was most prevalent in the upstream area of the Jelateng watershed, with low erosion rates below 432.2 tons/hectare/year and high erosion rates ranging from 615.64 to 1448.31 tons/hectare/year.
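The abstract names the USLE without stating it. For reference, the standard Universal Soil Loss Equation is a product of five factors, evaluated cell by cell when the inputs are raster layers (cell indices $i, j$). The factor values used for Jelateng are not given in the abstract.

```latex
A_{ij} = R_{ij} \times K_{ij} \times LS_{ij} \times C_{ij} \times P_{ij}
```

Here $A$ is the average annual soil loss (t ha$^{-1}$ yr$^{-1}$, the tons/hectare/year reported above), $R$ the rainfall-runoff erosivity, $K$ the soil erodibility, $LS$ the combined slope length and steepness factor, $C$ the cover-management factor and $P$ the support-practice factor. Because the model is multiplicative, a change in land cover such as an abandoned pit (which raises $C$) scales the predicted loss in that cell proportionally.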

  1. DEVELOPMENT OF PLASTICITY MODEL USING NON-ASSOCIATED FLOW RULE FOR HCP MATERIALS INCLUDING ZIRCONIUM FOR NUCLEAR APPLICATIONS

    Energy Technology Data Exchange (ETDEWEB)

    Michael V. Glazoff; Jeong-Whan Yoon

    2013-08-01

    In this report (prepared in collaboration with Prof. Jeong Whan Yoon, Deakin University, Melbourne, Australia), a non-associated flow rule was developed for zirconium. Because Zr is a hexagonal close-packed (hcp) material, its plastic response under arbitrary loading conditions cannot be described by any associated flow rule (e.g., von Mises). Because of the strong tension-compression asymmetry of its yield stress and its anisotropy, zirconium displays plastic behavior that requires a more sophisticated approach. Consequently, a new general asymmetric yield function has been developed that mathematically accommodates the four directional anisotropies along 0 degrees, 45 degrees, 90 degrees and biaxial, under both tension and compression. Stress anisotropy has been completely decoupled from the r-value by using non-associated flow plasticity, in which the yield function and the plastic potential are treated separately to capture the directionalities of stress and r-value, respectively. This theoretical development has been verified using Zr alloys at room temperature as an example, as these materials show a very strong SD (strength differential) effect. The proposed yield function models the evolution of yield surfaces for a zirconium clock-rolled plate during in-plane and through-thickness compression reasonably well. The function was found to predict tension-compression asymmetry mathematically without any numerical tolerance and to show significant improvement over previously reported functions. Finally, at the end of the report, a program of further research is outlined aimed at constructing tensorial relationships for the temperature- and fluence-dependent creep surfaces of Zr, Zircaloy 2 and Zircaloy 4.
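The distinction between associated and non-associated flow can be stated compactly in the standard generic form of rate-independent plasticity. This is not the report's specific asymmetric yield function, whose expression the abstract does not give. Here $f$ is the yield function, $g$ the plastic potential, $\bar{\sigma}$ the hardening flow stress and $d\lambda \ge 0$ the plastic multiplier:

```latex
\begin{aligned}
&\text{yield condition:} && f(\boldsymbol{\sigma}) - \bar{\sigma}(\bar{\varepsilon}^{p}) = 0 \\
&\text{associated flow:} && d\boldsymbol{\varepsilon}^{p} = d\lambda \, \frac{\partial f}{\partial \boldsymbol{\sigma}} \\
&\text{non-associated flow:} && d\boldsymbol{\varepsilon}^{p} = d\lambda \, \frac{\partial g}{\partial \boldsymbol{\sigma}}, \qquad g \neq f
\end{aligned}
```

With $g \neq f$, $f$ can be calibrated to the directional yield stresses (including tension-compression asymmetry) while $g$ is calibrated independently to the directional r-values, which is the decoupling the abstract describes.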

  2. Comprehensive screening for reg1α gene rules out association with tropical calcific pancreatitis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    AIM: To investigate the allelic and haplotypic association of the reg1α gene with tropical calcific pancreatitis (TCP). Since TCP is known to have a variable genetic basis, we also investigated the interaction between mutations in the susceptibility genes SPINK1 and CTSB and reg1α polymorphisms. METHODS: We analyzed polymorphisms in the reg1α gene by sequencing the gene, including its promoter region, in 195 TCP patients and 150 ethnically matched controls, and compared their allele and haplotype frequencies and their association with pathogenesis and pancreaticolithiasis in TCP and fibrocalculous pancreatic diabetes. RESULTS: We found 8 reported and 2 novel polymorphisms, including an insertion-deletion polymorphism in the promoter region of reg1α. None of the 5' UTR variants altered any known transcription factor binding sites, and none showed a statistically significant association with TCP. No association with any reg1α variant was observed when patients were dichotomized by their N34S SPINK1 or L26V CTSB status. CONCLUSION: Polymorphisms in the reg1α gene, including the regulatory variants, singly or in combination with the known mutations in the SPINK1 and/or CTSB genes, are not associated with tropical calcific pancreatitis.

  3. A new algorithm to extract hidden rules of gastric cancer data based on ontology.

    Science.gov (United States)

    Mahmoodi, Seyed Abbas; Mirzaie, Kamal; Mahmoudi, Seyed Mostafa

    2016-01-01

    Cancer is the leading cause of death in economically developed countries and the second leading cause of death in developing countries. Gastric cancers are among the most devastating and incurable forms of cancer, and their treatment may be excessively complex and costly. Data mining, a technology used to produce analytically useful information, has been employed successfully with medical data. Although traditional data mining techniques such as association rules help extract knowledge from large data sets, the number of rules obtained from a data set is sometimes so large that it becomes a major problem. In particular, one disadvantage of this technique is that it produces many meaningless and redundant rules because it ignores the concepts and meanings of the items or samples. This paper presents a new method that uses an ontology to discover association rules and address these problems. It reports ontology-based data mining on a medical database containing clinical data on patients who visited the Imam Reza Hospital in Tabriz. The data set was gathered from 490 random visitors to the hospital who were suspected of having gastric cancer. The proposed ontology-based algorithm makes rules more intuitive, appealing and understandable, eliminates redundant and useless rules and, as a secondary result, significantly reduces the running time of the Apriori algorithm. The experimental results confirm the efficiency and advantages of this algorithm.

  4. A Novel Approach for Web Page Set Mining

    CERN Document Server

    Geeta, R B; Totad, Shasikumar G; D, Prasad Reddy P V G

    2011-01-01

    One of the most time-consuming steps in association rule mining is computing the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database into a hash index tree by scanning the transaction database only once. Whenever a user requests any Uniform Resource Locator (URL), the request entry is stored in the server's log file. This paper presents the hash index table structure, a general and dense structure that supports web page set extraction from the server log file. The hash table provides information about the original database. Web page set mining (WPs-Mine) provides a complete representation of the original database. The approach works well for both sparse and dense data distributions. Web page set mining supported by the hash table index shows performance always comparable with, and often better than, algorithms that access data in flat files. Incremental update is feasible without reaccessing the original transactional databa...
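The abstract does not detail the hash index tree, so the Python sketch below substitutes a plain dictionary (`Counter`) keyed by frozensets of pages to show the two properties it claims: counts are built in a single scan of the transactions, and new transactions can be folded in without reaccessing old ones. The class name, the `max_size` cap and the page URLs are hypothetical; enumerating every subset of a session is exponential in session length, hence the cap.

```python
from collections import Counter
from itertools import combinations


class PageSetIndex:
    """Hash index from page sets (up to max_size pages) to their counts.

    A dict-based stand-in for the paper's hash index tree.
    """

    def __init__(self, max_size=3):
        self.max_size = max_size
        self.counts = Counter()  # frozenset of pages -> number of transactions containing it
        self.n = 0               # number of transactions indexed

    def add(self, pages):
        """Index one transaction (e.g. the pages of one log session); no rescan needed."""
        self.n += 1
        pages = sorted(set(pages))
        for k in range(1, min(self.max_size, len(pages)) + 1):
            for subset in combinations(pages, k):
                self.counts[frozenset(subset)] += 1

    def frequent(self, min_support):
        """Page sets whose support is >= min_support, read directly from the index."""
        if self.n == 0:
            return {}
        return {s: c / self.n for s, c in self.counts.items() if c / self.n >= min_support}
```

After indexing the sessions ["/home", "/news"], ["/home", "/sports"] and ["/home", "/news", "/sports"] with `max_size=2`, {"/news", "/sports"} is below a 0.6 support threshold; adding one more session containing both pages raises it to 0.5 without touching the earlier data.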

  5. Rules of chemokine receptor association with T cell polarization in vivo

    OpenAIRE

    2001-01-01

    Current concepts of chemokine receptor (CKR) association with Th1 and Th2 cell polarization and effector function have largely ignored the diverse nature of effector and memory T cells in vivo. Here, we systematically investigated the association of 11 CKRs, singly or in combination, with CD4 T cell polarization. We show that Th1, Th2, Th0, and nonpolarized T cells in blood and tissue can express any of the CKRs studied but that each CKR defines a characteristic pool of polarized and nonpolar...

  6. Sequential Extraction Results and Mineralogy of Mine Waste and Stream Sediments Associated With Metal Mines in Vermont, Maine, and New Zealand

    Science.gov (United States)

    Piatak, N.M.; Seal, R.R.; Sanzolone, R.F.; Lamothe, P.J.; Brown, Z.A.; Adams, M.

    2007-01-01

    We report results from sequential extraction experiments and the quantitative mineralogy for samples of stream sediments and mine wastes collected from metal mines. Samples were from the Elizabeth, Ely Copper, and Pike Hill Copper mines in Vermont, the Callahan Mine in Maine, and the Martha Mine in New Zealand. The extraction technique targeted the following operationally defined fractions and solid-phase forms: (1) soluble, adsorbed, and exchangeable fractions; (2) carbonates; (3) organic material; (4) amorphous iron- and aluminum-hydroxides and crystalline manganese-oxides; (5) crystalline iron-oxides; (6) sulfides and selenides; and (7) residual material. For most elements, the sum of an element over all extraction steps correlated well with the original unleached concentration. Also, comparing the quantitative mineralogy of the original material with that of the residues from two extraction steps gave insight into the effectiveness of the reagents at dissolving targeted phases. The data are presented here with minimal interpretation or discussion; further analyses and interpretation will be presented elsewhere.

  7. Research on Methods for Mining Association Rules in Image Databases

    Institute of Scientific and Technical Information of China (English)

    王远敏

    2012-01-01

    In multimedia applications, the use of image databases is increasingly widespread. To use image databases more effectively, many data mining techniques have been applied to them. This paper uses association rule mining to further improve the performance of image databases and, on that basis, constructs a new image database system that uses the FP-growth algorithm, built on the FP-tree, to mine association rules from image data.
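The FP-tree that FP-growth relies on can be sketched in Python as below. This covers only tree construction and the extraction of a conditional pattern base, the input to FP-growth's recursive mining step; the full recursion is omitted. How the paper encodes images as items is not stated, so the single-letter items are hypothetical.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count):
    """Two passes: count items, then insert each transaction's frequent items
    in descending-frequency order so that common prefixes share a path."""
    freq = Counter(i for t in transactions for i in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header = {i: [] for i in freq}  # item -> every node holding that item
    for t in transactions:
        # Tie-break equal counts by item name so the ordering is deterministic
        items = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for i in items:
            child = node.children.get(i)
            if child is None:
                child = FPNode(i, node)
                node.children[i] = child
                header[i].append(child)
            child.count += 1
            node = child
    return root, header, freq


def conditional_pattern_base(header, item):
    """Prefix paths leading to `item`, each weighted by that node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base
```

For the transactions [a, b], [b, c, d], [a, c, d, e], [a, d, e] and [a, b, c] with a minimum count of 2, the conditional pattern base of e is [(["a", "c", "d"], 1), (["a", "d"], 1)]; FP-growth would next build a smaller FP-tree from these weighted paths.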

  8. REx: An Efficient Rule Generator

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

    This paper describes an efficient algorithm, REx, for generating symbolic rules from artificial neural networks (ANNs). Classification rules are sought in many areas, from automatic knowledge acquisition to data mining and ANN rule extraction, because they possess some attractive features: they are explicit, understandable and verifiable by domain experts, and can be modified, extended and passed on as modular knowledge. REx exploits the first-order information in the data and finds the shortest sufficient conditions for a rule of a class that can differentiate it from patterns of other classes. It can generate concise and perfect rules, in the sense that the error rate of the rules is no worse than the inconsistency rate found in the original data. An important feature of REx is its recursive nature. The extracted rules are concise, comprehensible and order-insensitive, and do not involve any weight values. Extensive experimental studies on several benchmark classification problems, s...

  9. Analysis of Composition Rules of Spleen-Tonifying Chinese Patent Drugs Based on Association Rules and Entropy Clustering

    Institute of Scientific and Technical Information of China (English)

    金燕萍; 吴嘉瑞; 张冰; 杨冰; 周唯; 张晓朦

    2015-01-01

    Objective: To investigate the composition rules of commonly used Chinese patent drugs for tonifying the spleen. Methods: The prescriptions of spleen-tonifying Chinese patent drugs in "The New National Medicine" were collected to build a database. Association rule mining with the Apriori algorithm and complex-system entropy clustering were used to obtain the frequency of use of each drug and the association rules between drugs. Results: The most frequently used drugs were Poria Cocos Wolff, Rhizoma Atractylodis Macrocephalae, Radix Glycyrrhizae, Radix Codonopsis and Pericarpium Citri Reticulatae. The most common drug combinations were "Rhizoma Atractylodis Macrocephalae, Poria Cocos Wolff", "Radix Glycyrrhizae, Poria Cocos Wolff" and "Radix Glycyrrhizae, Rhizoma Atractylodis Macrocephalae". Association rules with high confidence included "Pericarpium Citri Reticulatae -> Semen Dolichoris" and "Pericarpium Citri Reticulatae -> Pinellia ternata". Conclusion: Besides the common spleen-tonifying drugs, the prescriptions also include some qi-regulating drugs, digestive drugs and other drugs with a spleen-tonifying effect.

  10. 78 FR 39531 - Mine Rescue Teams

    Science.gov (United States)

    2013-07-01

    ... Rescue Teams; CFR Correction Federal Register / Vol. 78, No. 126 / Monday, July 1, 2013 / Rules... Rescue Teams CFR Correction In Title 30 of the Code of Federal Regulations, Parts 1 to 199, revised as of... Miner Act Requirements for Underground Coal Mine Operators and Mine Rescue Teams Type of mine...

  11. Data mining in radiology.

    Science.gov (United States)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-04-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps improve patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software that assesses relationships and agreement in the available information. Using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, clusters, associations, sequential patterns, classification, prediction and decision trees are among the various types of data mining. Data mining has the potential to make the delivery of health care affordable and to ensure that the best imaging practices are followed. It is also a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure its success.

  12. Data mining in radiology

    Directory of Open Access Journals (Sweden)

    Amit T Kharat

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps improve patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software that assesses relationships and agreement in the available information. Using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, clusters, associations, sequential patterns, classification, prediction and decision trees are among the various types of data mining. Data mining has the potential to make the delivery of health care affordable and to ensure that the best imaging practices are followed. It is also a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure its success.

  13. Mining Method of Implied Association Page

    Institute of Scientific and Technical Information of China (English)

    Xu Hao; Xie Wenge

    2014-01-01

    Clickstream data is key to analyzing the psychological tendencies of Internet users, and the associations among groups of pages that interest users are hidden in web logs. Implied associations between web pages can be found by analyzing clickstream data. This paper puts forward a method for mining associated pages. The associated-page discovery algorithm adopts a model similar to Apriori. The algorithm overcomes some shortcomings of previous associated-page algorithms and adapts better to the complex Internet environment.
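
    The abstract says only that the page-association algorithm adopts an Apriori-like model. For reference, a minimal sketch of textbook Apriori over clickstream sessions (each session treated as a set of visited pages) might look like the following; the function name, session format and min_support threshold are illustrative assumptions, and this is not the authors' algorithm.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Textbook Apriori (illustrative, not the paper's variant).

    sessions: iterable of page collections, one per user session.
    Returns {frozenset(pages): support} for every page set whose support
    (fraction of sessions containing all of its pages) >= min_support.
    """
    baskets = [frozenset(s) for s in sessions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single pages.
    current = keep_frequent({frozenset([p]) for b in baskets for p in b})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-sets into k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent
```

    The prune step is what keeps the search tractable: a page set cannot be frequent unless all of its subsets are, so candidates with an infrequent subset are discarded before their support is counted.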

  14. Rules of chemokine receptor association with T cell polarization in vivo

    Science.gov (United States)

    Kim, Chang H.; Rott, Lusijah; Kunkel, Eric J.; Genovese, Mark C.; Andrew, David P.; Wu, Lijun; Butcher, Eugene C.

    2001-01-01

    Current concepts of chemokine receptor (CKR) association with Th1 and Th2 cell polarization and effector function have largely ignored the diverse nature of effector and memory T cells in vivo. Here, we systematically investigated the association of 11 CKRs, singly or in combination, with CD4 T cell polarization. We show that Th1, Th2, Th0, and nonpolarized T cells in blood and tissue can express any of the CKRs studied but that each CKR defines a characteristic pool of polarized and nonpolarized CD4 T cells. Certain combinations of CKRs define populations that are markedly enriched in major subsets of Th1 versus Th2 cells. For example, although Th0, Th1, and Th2 cells are each found among blood CD4 T cells coordinately expressing CXCR3 and CCR4, Th1 but not Th2 cells can be CXCR3+CCR4–, and Th2 but only rare Th1 cells are CCR4+CXCR3–. Contrary to recent reports, although CCR7– cells contain a higher frequency of polarized CD4 T cells, most Th1 and Th2 effector cells are CCR7+ and thus may be capable of lymphoid organ homing. Interestingly, Th1-associated CKRs show little or no preference for Th1 cells except when they are coexpressed with CXCR3. We conclude that the combinatorial expression of CKRs, which allow tissue- and subset-dependent targeting of effector cells during chemotactic navigation, defines physiologically significant subsets of polarized and nonpolarized T cells. PMID:11696578

  15. New Procedure to Derive the Performance Indices Associated with Reservoir Operation Rule

    Institute of Scientific and Technical Information of China (English)

    WANG Jin-wen; ZHANG Yong-chuan; ZHANG You-quan

    2002-01-01

    Stochastic dynamic programming (SDP) is extensively used to optimize long-term reservoir operation. Generally, both the steady-state optimal policy and its associated performance indices (PIs) are of prime importance for a multipurpose reservoir. There are two typical ways to derive the PIs: simulation and probability formulas. One disadvantage is that both approaches require a pre-specified operating policy. Inspired by the convergence of the objective function in SDP, a new approach is proposed to determine the desired PIs; its advantage is that it can be applied concomitantly with solving the SDP. Its efficiency is also tested in practice in a case study.
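
    As a point of reference, the conventional two-step route that the abstract contrasts with can be sketched as follows: solve a discretized SDP by value iteration, then fix the resulting policy and compute a reliability PI from the long-run storage distribution. Everything here (integer storage units, the inflow distribution, the quadratic shortfall penalty, the discount factor, and reliability as the PI) is an illustrative assumption; this is not the authors' concomitant method.

```python
def solve_sdp(capacity, inflows, demand, discount=0.95, tol=1e-9):
    """Value iteration for a toy reservoir (illustrative assumptions only).

    capacity: maximum storage in integer units.
    inflows: list of (volume, probability) pairs for one period.
    demand: target release per period.
    Returns (value, policy), where policy[s] is the release chosen at storage s.
    """
    def reward(release):
        # Quadratic penalty on unmet demand; releasing more than demand earns nothing.
        shortfall = max(demand - release, 0)
        return -(shortfall ** 2)

    def q_value(s, r, value):
        # Release r from storage s, then add inflow; water above capacity spills.
        future = sum(p * value[min(capacity, s - r + q)] for q, p in inflows)
        return reward(r) + discount * future

    value = [0.0] * (capacity + 1)
    while True:
        new_value = [max(q_value(s, r, value) for r in range(s + 1))
                     for s in range(capacity + 1)]
        converged = max(abs(a - b) for a, b in zip(new_value, value)) < tol
        value = new_value
        if converged:
            break
    policy = [max(range(s + 1), key=lambda r: q_value(s, r, value))
              for s in range(capacity + 1)]
    return value, policy


def reliability(capacity, inflows, demand, policy, steps=5000):
    """Long-run probability that the fixed policy meets demand, found by
    repeatedly propagating the storage distribution of the Markov chain the
    policy induces (assumes the chain settles to a steady state)."""
    dist = [1.0 / (capacity + 1)] * (capacity + 1)
    for _ in range(steps):
        nxt = [0.0] * (capacity + 1)
        for s, prob in enumerate(dist):
            for q, p in inflows:
                nxt[min(capacity, s - policy[s] + q)] += prob * p
        dist = nxt
    return sum(prob for s, prob in enumerate(dist) if policy[s] >= demand)
```

    The second function only becomes usable once the first has produced a policy, which is exactly the dependence on a pre-specified policy that the abstract identifies as a drawback.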

  16. Image Mining Using Texture and Shape Feature

    Directory of Open Access Journals (Sweden)

    Prof. Rupali Sawant

    2010-12-01

    Discovering knowledge from data stored in typical alphanumeric databases, such as relational databases, has been the focal point of most work in database mining. However, with advances in secondary and tertiary storage capacity, coupled with relatively low storage costs, more and more non-standard data (in the form of images) is being accumulated. This vast collection of image data can also be mined to discover new and valuable knowledge. During image mining, concepts and their relationships are extracted at different hierarchies and granularities, and association rule mining and concept clustering are then carried out. Concepts are generalized and specialized across hierarchies: lower-layer concepts can be upgraded to upper-layer concepts, and upper-layer concepts guide the extraction of lower-layer concepts. The process moves from image data to image information, from image information to image knowledge, and from lower-layer concepts to upper-layer concepts; an approach based on concept lattice and cloud model theory is proposed. The methods of image mining from image texture and shape features introduced here include the following basic steps: first, preprocess the images; second, use the cloud model to extract concepts; and finally, use the concept lattice to extract a series of image knowledge.
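
    The concept-lattice step can be illustrated with formal concept analysis: given a binary context of objects (here, images) and attributes (here, texture or shape labels), a formal concept is a pair (extent, intent) in which the intent is exactly the set of attributes shared by the extent and the extent is exactly the set of objects having the intent. Below is a brute-force sketch that enumerates attribute subsets and is suitable only for small contexts; the cloud-model concept extraction is not shown, and the feature labels are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a small binary context.

    context: dict mapping each object to the set of attributes it has.
    Returns a set of (extent, intent) pairs of frozensets.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    attrs = sorted(attributes)
    # Every attribute subset X yields the concept (X', X''); all concepts arise this way.
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            ext = extent(frozenset(subset))
            concepts.add((ext, intent(ext)))
    return concepts
```

    Ordering the resulting concepts by extent inclusion gives the concept lattice, in which moving up generalizes (larger extent, fewer shared features) and moving down specializes.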

  17. Mine soils associated with open-cast coal mining in Spain: a review

    Energy Technology Data Exchange (ETDEWEB)

    Arranz-Gonzalez, J. C.

    2011-07-01

    The different situations that may be found after the closure of coal mines range from the simple abandonment of pits and spoil tips to areas where reclamation work has led to the creation of artificial soils on a reconstituted surface composed of layers of rock and soil or both types of material. Soils of this type are known as mine soils, amongst which those generated by coal mining have been studied most extensively, both to assess their potential for reclamation and to learn more about their pedogenetic evolution. We present here a review of some of the more important works devoted to this subject. We have found evidence to show that in Spain, just as in other countries, the physical and chemical properties of these anthropogenic soils are changing rapidly and so the mine-soil profiles described can be considered as belonging to very young soils still undergoing incipient but rapid development. We have also found that an analysis of information obtained from the soil parameters of surface samples and its interpretation is of great practical use in restoration processes. Nevertheless, the sampling and description of soil profiles has proved to be of much greater interest, allowing us to reach a clearer understanding of the internal processes and properties that are unique to these types of anthropogenic soil. (Author) 64 refs.

  18. Identifying Engineering Students' English Sentence Reading Comprehension Errors: Applying a Data Mining Technique

    Science.gov (United States)

    Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon

    2016-01-01

    The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…

  19. Identifying users of traditional and Internet-based resources for meal ideas: An association rule learning approach.

    Science.gov (United States)

    Doub, Allison E; Small, Meg L; Levin, Aron; LeVangie, Kristie; Brick, Timothy R

    2016-08-01

    Increasing home cooking while decreasing the consumption of food prepared away from home is a commonly recommended weight management strategy; however, research on where individuals obtain ideas about meals to cook at home is limited. This study examined the characteristics of individuals who reported using traditional and Internet-based resources for meal ideas. A total of 583 participants who were at least 50% responsible for household meal planning were recruited to approximate the 2014 United States Census distribution on sex, age, race/ethnicity, and household income. In an online survey, participants reported their demographic characteristics, home cooking frequency, and use of 4 traditional resources for meal ideas (e.g., cookbooks) and 7 Internet-based resources for meal ideas (e.g., Pinterest). Independent samples t-tests compared home cooking frequency by resource use. Association rule learning identified the demographic characteristics that were significantly associated with resource use. Family and friends (71%), food community websites (45%), and cookbooks (41%) were the most commonly reported resources. Cookbook users reported preparing more meals at home per week (M = 9.65, SD = 5.28) than non-cookbook users (M = 8.11, SD = 4.93; t = -3.55, p …). Resource use was generally higher among parents and varied systematically with demographic characteristics. Findings suggest that home cooking interventions may benefit from modifying the resources used by their target population.

  20. HST observations rule out the association between Cir X-1 and SNR G321.9-0.3

    CERN Document Server

    Mignani, R P; Caraveo, P A; Mirabel, I F

    2002-01-01

    Cir X-1 is one of the most intriguing galactic X-ray sources. It is an X-ray/radio source variable on a ~16.6-day period, a type-I X-ray burster, and a QPO emitter. In spite of an uncertain classification of its optical counterpart, all these properties identify the source as an LMXB. The morphology of the surrounding radio nebula has suggested an association with the nearby (~25 arcmin) SNR G321.9-0.3, implying that Cir X-1 is a runaway binary that originated in the supernova explosion 10^5 years ago. To investigate this hypothesis, we measured the proper motion of the m ~ 19 optical counterpart of Cir X-1 using a set of HST/WFC and WFPC2 observations taken ~8.6 years apart. We obtained a 3σ upper limit of ~5 mas/yr on the source proper motion. Since the runaway hypothesis would imply a proper motion due North of between 15 and 75 mas/yr, depending on the actual age of the SNR, our result definitively rules out the association between Cir X-1 and SNR G321.9-0.3.
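
    The lower end of the quoted range can be recovered with a back-of-envelope estimate, under the simplifying assumptions that Cir X-1 moved in a straight line at constant speed from the SNR centre, that the ~25 arcmin separation is its full angular path, and that the SNR is ~10^5 yr old. This is a rough check, not the authors' calculation:

```latex
\mu \approx \frac{\theta}{t}
  = \frac{25\,\text{arcmin}}{10^{5}\,\text{yr}}
  = \frac{1.5\times10^{6}\,\text{mas}}{10^{5}\,\text{yr}}
  = 15\ \text{mas yr}^{-1}
\\[4pt]
\Delta\theta_{\text{expected}} \gtrsim 15\ \text{mas yr}^{-1}\times 8.6\ \text{yr} \approx 129\ \text{mas},
\qquad
\Delta\theta_{3\sigma} \lesssim 5\ \text{mas yr}^{-1}\times 8.6\ \text{yr} \approx 43\ \text{mas}
```

    Under the same assumptions, a younger remnant implies faster motion: 75 mas/yr corresponds to an age of about 2×10^4 yr. Even the slowest runaway scenario would have shifted the counterpart about three times farther than the 3σ limit allows over the 8.6-year baseline.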

  1. 78 FR 48591 - Refuge Alternatives for Underground Coal Mines

    Science.gov (United States)

    2013-08-08

    ... Refuge Alternatives for Underground Coal Mines; Proposed Rules Federal Register / Vol. 78, No. 153... 30 CFR Part 75 RIN 1219-AB84 Refuge Alternatives for Underground Coal Mines AGENCY: Mine Safety and... alternatives in underground coal mines. The U.S. Court of Appeals for the District of Columbia Circuit...

  2. 78 FR 48593 - Refuge Alternatives for Underground Coal Mines

    Science.gov (United States)

    2013-08-08

    ... Coal Mines AGENCY: Mine Safety and Health Administration, Labor. ACTION: Request for information... the existing rule during underground coal mine emergencies. The Agency continues to reiterate that in the event of an underground coal mine emergency, a miner should seek escape as the first line...

  3. Ectomycorrhizal fungal communities associated with Masson pine (Pinus massoniana Lamb.) in Pb-Zn mine sites of central south China.

    Science.gov (United States)

    Huang, Jian; Nara, Kazuhide; Lian, Chunlan; Zong, Kun; Peng, Kejian; Xue, Shengguo; Shen, Zhenguo

    2012-11-01

    To advance our understanding of ectomycorrhizal fungal communities in mining areas, the diversity and composition of ectomycorrhizal fungi associated with Masson pine (Pinus massoniana Lamb.) and soil chemistry were investigated in Taolin lead-zinc (Pb-Zn) mine tailings (TLT), two fragmented forest patches in a Huayuan Pb-Zn mineland (HY1 and HY2), and a non-polluted forest in Taolin in central south China. Ectomycorrhizal fungal species were identified by morphotyping and sequence analyses of the internal transcribed spacer (ITS) regions of ribosomal DNA. The two study sites in the Huayuan mineland (HY1 and HY2) differed significantly in soil Pb, Zn, and cadmium (Cd) concentrations, but no significant difference was observed in ectomycorrhizal colonization, ectomycorrhizal fungal richness, diversity, or rank-abundance. In addition, the similarity of ectomycorrhizal fungal communities between HY1 and HY2 was quite high (Sørensen similarity index = 0.47). Thus, heavy metal concentrations may not be the determining factors in the structure of these communities. In the tailings, however, significantly lower ectomycorrhizal colonization and ectomycorrhizal fungal richness were observed. The amounts of Pb and Zn in the tailing sand were higher than in the non-polluted forest but far lower than in HY1. Thus, these heavy metals did not account for the reduced colonization and ectomycorrhizal fungal richness in TLT. The ectomycorrhizal fungal community in TLT was dominated by four pioneer species (Rhizopogon buenoi, Tomentella ellisii, Inocybe curvipes, and Suillus granulatus), which collectively accounted for 93.2 % of root tip colonization. The immature soil conditions in the tailings (low N and P, sand texture, and lack of organic matter) may allow only certain pioneer ectomycorrhizal fungal species to colonize the site. When soil samples from four sites were combined, we found that the occurrences of major ectomycorrhizal fungal taxa were not clearly related to the
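
    The Sørensen similarity index compares two communities as 2C / (A + B), where A and B are the numbers of species recorded at each site and C is the number of species they share. A minimal sketch on presence/absence lists follows; the abstract does not state whether the incidence-based or an abundance-based form was used, and any species sets passed in are hypothetical.

```python
def sorensen_index(species_a, species_b):
    """Incidence-based Sørensen similarity: 2C / (A + B).

    Returns 0.0 for communities with no shared species and 1.0 for identical ones.
    """
    a, b = set(species_a), set(species_b)
    if not a and not b:
        raise ValueError("both communities are empty; the index is undefined")
    return 2 * len(a & b) / (len(a) + len(b))
```

    Because only presence and absence enter the formula, two sites can score highly even when the dominant species differ in abundance.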

  4. The Most Advantageous Bangla Keyboard Layout Using Data Mining Technique

    CERN Document Server

    Masum, Abdul Kadar Muhammad; Kamruzzaman, S M

    2010-01-01

    The Bangla alphabet has a large number of letters, which makes it difficult to type quickly on a Bangla keyboard. The proposed keyboard will maximize operator speed because both hands can type in parallel. Association rule mining is used here to distribute the Bangla characters across the keyboard. The frequencies of monographs, digraphs and trigraphs derived from a data warehouse are analyzed, and association rules are then used to distribute the Bangla characters in the layout. Experimental results on several datasets show the effectiveness and better performance of the proposed approach. This paper presents an optimal Bangla keyboard layout that distributes the load equally on both hands, maximizing ease and minimizing effort.

  5. Optimal Bangla Keyboard Layout using Data Mining Technique

    CERN Document Server

    Kamruzzaman, S M; Masum, Abdul Kadar Muhammad; Hassan, Md Mahadi

    2010-01-01

    This paper presents an optimal Bangla keyboard layout that distributes the load equally on both hands, maximizing ease and minimizing effort. The Bangla alphabet has a large number of letters, which makes it difficult to type quickly on a Bangla keyboard. Our proposed keyboard will maximize operator speed because both hands can type in parallel. Here we use association rule mining to distribute the Bangla characters across the keyboard. First, we analyze the frequencies of monographs, digraphs and trigraphs derived from a data warehouse, and then use association rules to distribute the Bangla characters in the layout. Experimental results on several datasets show the effectiveness and better performance of the proposed approach.
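
    The first step described, counting monographs, digraphs and trigraphs, is plain n-gram counting. The sketch below pairs it with a simple greedy split of characters between the two hands by frequency. The greedy split is a hypothetical stand-in for illustration, not the association-rule placement the authors describe. It works on Bangla text unchanged because Python strings are Unicode.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping n-character sequences
    (n=1 monographs, n=2 digraphs, n=3 trigraphs)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def split_by_load(char_counts):
    """Greedy load balancing (illustrative only): take characters from most to
    least frequent and give each to the hand with the lighter total load so far.
    Ties go to the left hand."""
    hands = {"left": [], "right": []}
    load = {"left": 0, "right": 0}
    for ch, count in sorted(char_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        hand = min(load, key=lambda h: (load[h], h))
        hands[hand].append(ch)
        load[hand] += count
    return hands, load
```

    Digraph and trigraph counts would feed the later placement step, for example to favour alternating hands on frequent pairs, which the sketch leaves out.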

  6. Methods for the Extension Rules of Ontology Based on Multidimensional Association Rules

    Institute of Scientific and Technical Information of China (English)

    Dong Jun; Wang Suoping; Xiong Fanlun; Zhang Youhua

    2009-01-01

    Currently, the extension and enrichment of ontologies have significant limitations. Therefore, an approach is presented that extends ontology rules using multidimensional association rule technology. The concept ontology is enriched and extended through ontology rule extraction, consistency processing under the guidance of the ontology, establishment of rule mappings, and re-identification and updating of the concept ontology. Experimental results on a tea disease and pest prediction ontology show that the proposed approach is easy to implement and has good feasibility and validity.

  7. New insight into genes in association with asthma: literature-based mining and network centrality analysis

    Institute of Scientific and Technical Information of China (English)

    LIANG Rui; WANG Lei; WANG Gang

    2013-01-01

    Background: Asthma is a heterogeneous disease for which a strong genetic basis has been firmly established. Until now, no studies have systematically explored the network of asthma-related genes using an internally developed literature-based discovery approach. This study aimed to explore asthma-related genes using literature-based mining and network centrality analysis. Methods: Literature on asthma-related genes from 2001 to 2011 was searched in PubMed. Natural language processing was integrated with network centrality analysis to identify asthma susceptibility genes and their interaction network. Asthma susceptibility genes were classified into three functional groups by gene ontology (GO) analysis, and the key genes were confirmed by establishing asthma-related networks and pathways. Results: Three hundred and twenty-six genes related to asthma, such as IGHE (IgE), interleukin (IL)-4, 5, 6, 10, 13, 17A, and tumor necrosis factor (TNF)-alpha, were identified. GO analysis indicated biological processes (developmental processes, signal transduction, death, etc.), cellular components (non-structural extracellular, plasma membrane and extracellular matrix), and molecular functions (signal transduction activity) involved in asthma. Furthermore, 22 asthma-related pathways, such as the Toll-like receptor signaling pathway, hematopoietic cell lineage, JAK-STAT signaling pathway, chemokine signaling pathway, and cytokine-cytokine receptor interaction, and 17 hub genes, such as JAK3, CCR1-3, CCR5-7, and CCR8, were found. Conclusions: Our study provides a remarkably detailed and comprehensive picture of asthma susceptibility genes and their interacting network. Further identification of these genes and molecular pathways may play a prominent role in establishing rational therapeutic approaches for asthma.
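
    The abstract does not specify which centrality measure identified the 17 hub genes. One common choice is normalized degree centrality: a node's number of distinct neighbours divided by n - 1, which can rank hubs in an undirected interaction network. In the sketch below, the edge list and node names are hypothetical, not data from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (u, v) pairs.

    Self-loops and duplicate edges are ignored; nodes that appear in no edge
    are not part of the graph.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def top_hubs(edges, k):
    """The k highest-centrality nodes; ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]
```

    Other measures, such as betweenness or closeness, can rank the same network differently. Which hubs emerge therefore depends on the measure chosen.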

  8. Decoding Metal Associations in an Arid Urban Environment with Active and Legacy Mining: the Case of Copiapó, Chile

    Science.gov (United States)

    Pasten, P.; Moya, P.; Coquery, M.; Bonilla, C. A.; Vega, A.; Carkovic, A.; Calcagni, M.

    2015-12-01

    The urban and periurban area of Copiapó in the arid Atacama desert has more than 30 abandoned mine tailings, one active copper smelter, and 150,000 inhabitants. Fast development of the mining industry during the 19th century and unplanned growth have led to public concern about the presence of metals in soils and street dust. Recent floods and mud currents in the Copiapó watershed have introduced new solid material in about 40% of the urban area. We conducted a geochemical screening before and after the disaster in March 2015. We found concentrations as high as 1000 mg/kg of copper and 180 mg/kg of arsenic in urban soils. Since effective control measures require connecting sites of metal enrichment with the possible sources, we performed a statistical analysis of metal association and complemented it with other analyses such as x-ray diffraction. Cluster analyses of elemental compositions suggest that mud and tailings have origins different from those of the other matrices, while soils and street dust share a similar origin. Some clusters contain a mix of matrices, which suggests anthropogenic enrichment of some areas of Copiapó. Our initial results indicate that a correlation between the observed enrichment and the copper smelter can be hypothesized for Cu, Pb, and Zn. Further spatial, statistical, and chemical analyses are needed to confirm these findings, complemented by a thorough analysis of the baseline values that could be considered representative of the area. Future work includes Principal Component Analysis (PCA) and Positive Matrix Factorization (PMF) to test the link between contaminant sources and metal occurrence, while scanning electron microscopy can be used to identify the presence of smelter-related particles. The information generated by this research will be a necessary input for defining urban planning strategies and land use guidelines, defining health risk assessment studies, and evaluating future intervention priorities. Acknowledgements: Proyecto

  9. Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics.

    Science.gov (United States)

    Cao, Hui; Markatou, Marianthi; Melton, Genevieve B; Chiang, Michael F; Hripcsak, George

    2005-01-01

    This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, χ² statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by χ² statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-chi2, KB-PCI). An extrinsic evaluation showed that both KB-chi2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.
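
    For one disease-finding pair, the χ² statistic compares the observed co-occurrence counts in a 2×2 table with the counts expected under independence. Below is a minimal sketch that assumes Pearson's statistic without continuity correction; the abstract does not say which variant was used, any counts passed in are hypothetical, and the paper's heuristic cutoff values are not given.

```python
def chi_square_2x2(both, disease_only, finding_only, neither):
    """Pearson chi-square (no continuity correction) for the 2x2 table

                    finding        no finding
        disease     both           disease_only
        no disease  finding_only   neither
    """
    table = [[both, disease_only], [finding_only, neither]]
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        raise ValueError("a row or column total is zero; chi-square is undefined")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat
```

    A pair is kept when its statistic exceeds the chosen cutoff. For reference, 3.841 is the 5% critical value for one degree of freedom, although the authors describe their cutoffs as heuristic.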

  10. Methods and costs of thin-seam mining. Final report, 25 September 1977-24 January 1979. [Thin seam in association with a thick seam]

    Energy Technology Data Exchange (ETDEWEB)

    Finch, T.E.; Fidler, E.L.

    1981-02-01

    This report defines the state of the art (circa 1978) in removing thin coal seams associated with vastly thicker seams found in the surface coal mines of the western United States. New techniques are evaluated and an innovative method and machine is proposed. Western states resource recovery regulations are addressed and representative mining operations are examined. Thin seam recovery is investigated through its effect on (1) overburden removal, (2) conventional seam extraction methods, and (3) innovative techniques. Equations and graphs are used to accommodate the variable stratigraphic positions in the mining sequence on which thin seams occur. Industrial concern and agency regulations provided the impetus for this study of total resource recovery. The results are a compendium of thin seam removal methods and costs. The work explains how the mining industry recovers thin coal seams in western surface mines where extremely thick seams naturally hold the most attention. It explains what new developments imply and where to look for new improvements and their probable adaptability.

  11. Two non-synonymous markers in PTPN21, identified by genome-wide association study data-mining and replication, are associated with schizophrenia.

    LENUS (Irish Health Repository)

    Chen, Jingchun

    2011-09-01

    We conducted data-mining analyses of genome-wide association (GWA) studies of the CATIE and MGS-GAIN datasets, and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r² > 0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P = 1.10 × 10⁻³ and rs2274736, P = 1.21 × 10⁻³). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR = 0.92, 95% CI: 0.86-0.97, P = 5.45 × 10⁻³ and rs2401751, OR = 0.92, 95% CI: 0.86-0.97, P = 5.29 × 10⁻³). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR = 1.08, 95% CI: 1.02-1.14, P = 6.43 × 10⁻³). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype-conditioned analyses indicated that the association signals observed between rs2274736/rs2401751 and rs7147796 are statistically independent. Given that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.
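
    The Bonferroni criterion declares test i significant when p_i ≤ α/m. The resulting threshold depends directly on the number of tests m, which the abstract does not state. For illustration only, assume a family-wise α = 0.05 and a correction over the three replicated markers:

```latex
p_i \le \frac{\alpha}{m}
\quad\Longrightarrow\quad
\frac{0.05}{3} \approx 1.67\times10^{-2},
\qquad
\max\{5.29,\ 5.45,\ 6.43\}\times10^{-3} = 6.43\times10^{-3} < 1.67\times10^{-2}
```

    The threshold tightens in proportion to m, so the same P values can pass or fail depending on how many tests the correction covers.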

  12. 75 FR 20918 - High-Voltage Continuous Mining Machine Standard for Underground Coal Mines

    Science.gov (United States)

    2010-04-22

    ... From the Federal Register Online via the Government Publishing Office DEPARTMENT OF LABOR Mine Safety and Health Administration 30 CFR Parts 18 and 75 RIN 1219-AB34 High-Voltage Continuous Mining Machine Standard for Underground Coal Mines Correction In rule document 2010-7309 beginning on page...

  13. Exploring the potential of data mining techniques for the analysis of accident patterns

    DEFF Research Database (Denmark)

    Prato, Carlo Giacomo; Bekhor, Shlomo; Galtzur, Ayelet

    2010-01-01

    Research in road safety faces major challenges: individuation of the most significant determinants of traffic accidents, recognition of the most recurrent accident patterns, and allocation of resources necessary to address the most relevant issues. This paper intends to comprehend which data mining...... and association rules) data mining techniques are implemented for the analysis of traffic accidents that occurred in Israel between 2001 and 2004. Results show that descriptive techniques are useful to classify the large amount of analyzed accidents, even though they introduce problems with respect to the clear...... importance of input and intermediate neurons, and the relative importance of hundreds of association rules. Further research should investigate whether limiting the analysis to fatal accidents would simplify the task of data mining techniques in recognizing accident patterns without the “noise” probably...

  14. A comprehensive review on privacy preserving data mining.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Razzaque, Mohammad Abdur

    2015-01-01

    Preservation of privacy in data mining has emerged as an absolute prerequisite for exchanging confidential information in terms of data analysis, validation, and publishing. Ever-escalating internet phishing poses a severe threat to the widespread propagation of sensitive information over the web. Conversely, doubts and disputes over whether data will be reliably protected from disclosure make various information providers unwilling to share, which often results in outright refusal to share data or in the sharing of incorrect information. This article provides a panoramic overview, with a new perspective and systematic interpretation, of the published literature, meticulously organized into subcategories. The fundamental notions of existing privacy preserving data mining methods, together with their merits and shortcomings, are presented. Current privacy preserving data mining techniques are classified into distortion, association rule, association rule hiding, taxonomy, clustering, associative classification, outsourced data mining, distributed, and k-anonymity approaches, and their notable advantages and disadvantages are emphasized. This careful scrutiny reveals past developments, present research challenges, future trends, and existing gaps and weaknesses. Further significant enhancements for more robust privacy protection and preservation are affirmed to be mandatory.
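
    Of the categories listed, k-anonymity has a particularly compact definition: a released table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. Below is a minimal checker. It assumes records are dictionaries and that the caller supplies the quasi-identifier column names; both conventions, and the sample values in any example, are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least
    k records (an empty table is trivially k-anonymous)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())
```

    This only verifies the property. Producing a k-anonymous release also requires generalizing or suppressing values, for example coarsening ZIP codes or age ranges, which the sketch does not attempt.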

  15. [Applying association rules to analyze adverse drug reactions to Shuxuening injection based on spontaneous reporting system data].

    Science.gov (United States)

    Yang, Wei; Xie, Yan-Ming; Xiang, Yong-Yang

    2014-09-01

    This research is based on an analysis of spontaneous reporting system (SRS) data comprising 9,601 case reports of adverse drug reactions (ADRs) to Shuxuening injection received by the national adverse drug reaction monitoring center during 2005-2012. Association rules were applied to analyze the relationship between Shuxuening injection ADRs and the characteristics of the ADR reports. The most common ADR combinations were "nausea + breath + chills + vomiting" and "nausea + chills + vomiting + palpitations", each with a confidence level of 100%. The most common combinations of ADRs and case report information were "itching, and glucose and sodium chloride injection, and general ADR report, and normal dosage", "palpitation, and glucose and sodium chloride injection, and normal dosage, and new report", and "chills, and general ADR report, and normal dosage, and 0.9% sodium chloride injection", again each with a confidence level of 100%. The results showed that most ADRs in patients using Shuxuening injection were systemic damage, damage to the skin and its appendages, and digestive system damage. Most cases were general and new reports, and the patients had received normal dosages. The occurrence of ADRs had little relation to the solvent, which indicates that the ADRs of Shuxuening injection are mainly related to the drug's composition. Therefore, clinical use of Shuxuening injection requires close observation with attention to ADRs, together with sound drug risk management.
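
    The 100% confidence levels quoted above follow the usual definitions: support(X) is the fraction of reports that contain every item of X, and confidence(X → Y) = support(X ∪ Y) / support(X). Below is a minimal sketch over reports represented as sets of items; the item labels and reports are hypothetical.

```python
def support(reports, itemset):
    """Fraction of reports that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(r) for r in reports) / len(reports)


def confidence(reports, antecedent, consequent):
    """confidence(X -> Y): among reports containing all of X,
    the fraction that also contain all of Y."""
    ante = set(antecedent)
    both = ante | set(consequent)
    n_ante = sum(ante <= set(r) for r in reports)
    if n_ante == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return sum(both <= set(r) for r in reports) / n_ante
```

    A rule can reach 100% confidence when its antecedent appears in only a handful of reports. For that reason, confidence is normally reported together with support.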

  16. Human exposure and risk assessment associated with mercury contamination in artisanal gold mining areas in the Brazilian Amazon.

    Science.gov (United States)

    Castilhos, Zuleica; Rodrigues-Filho, Saulo; Cesar, Ricardo; Rodrigues, Ana Paula; Villas-Bôas, Roberto; de Jesus, Iracina; Lima, Marcelo; Faial, Kleber; Miranda, Antônio; Brabo, Edilson; Beinhoff, Christian; Santos, Elisabeth

    2015-08-01

    Mercury (Hg) contamination is an issue of concern in the Amazon region due to potential health effects associated with Hg exposure in artisanal gold mining areas. The study presents a human health risk assessment associated with Hg vapor inhalation and MeHg-contaminated fish ingestion, as well as Hg determination in urine, blood, and hair, of human populations (about 325 miners and 321 non-miners) from two gold mining areas in the Brazilian Amazon (São Chico and Creporizinho, Pará State). In São Chico and Creporizinho, respectively, 73 fish specimens of 13 freshwater species and 161 specimens of 11 species were collected for total Hg determination. The hazard quotient (HQ), a risk indicator defined as the ratio of the exposure level to the toxicological reference dose, was applied to determine the threat of MeHg exposure. The mean Hg concentrations in fish from São Chico and Creporizinho were 0.83 ± 0.43 and 0.36 ± 0.33 μg/g, respectively. More than 60 and 22 % of fish collected in São Chico and Creporizinho, respectively, were above the Hg limit (0.5 μg/g) recommended by WHO for human consumption. For all sampling sites except the reference area, HQ ranged from 1.5 to 28.5. In Creporizinho, HQ values are close to 2 for most sites, whereas in São Chico there is a hot spot of MeHg contamination in fish (A2-São Chico Reservoir) with the highest risk level (HQ = 28) associated with its human consumption. Mean Hg concentrations in urine, blood, and hair samples indicated that the miners group (in São Chico: urine = 17.37 μg/L; blood = 27.74 μg/L; hair = 4.50 μg/g and in Creporizinho: urine = 13.75 μg/L; blood = 25.23 μg/L; hair: 4.58 μg/g) was more exposed to mercury compared to non-miners (in São Chico: urine = 5.73 μg/L; blood = 16.50 μg/L; hair = 3.16 μg/g and in Creporizinho: urine = 3.91 μg/L; blood = 21.04 μg/L, hair = 1.88 μg/g). These high Hg levels (found
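
    The abstract defines HQ as exposure divided by the toxicological reference dose. Below is a minimal sketch that assumes exposure is estimated as a daily intake per kilogram of body weight from fish concentration, ingestion rate and body weight. This is a common formulation, but the abstract does not spell it out. The 100 g/day intake and 70 kg body weight used in any example are hypothetical. The 0.1 µg/kg/day reference dose is the US EPA oral RfD for methylmercury and may differ from the value the authors used. Total Hg in fish is treated as MeHg here for simplicity.

```python
def estimated_daily_intake(conc_ug_per_g, intake_g_per_day, body_weight_kg):
    """Daily intake in µg per kg body weight per day:
    fish concentration (µg/g) x fish eaten (g/day) / body weight (kg)."""
    return conc_ug_per_g * intake_g_per_day / body_weight_kg


def hazard_quotient(daily_intake, reference_dose):
    """HQ = exposure / reference dose (same units); HQ > 1 flags potential risk."""
    return daily_intake / reference_dose
```

    With the São Chico mean of 0.83 µg/g and the hypothetical 100 g/day and 70 kg, the intake is about 1.19 µg/kg/day and HQ is about 12 against a 0.1 µg/kg/day reference dose. This only illustrates the arithmetic; it is not a reproduction of the study's site values.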

  17. Mining Long, Sharable Patterns in Trajectories of Moving Objects

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2009-01-01

    The efficient analysis of spatio-temporal data, generated by moving objects, is an essential requirement for intelligent location-based services. Spatio-temporal rules can be found by constructing spatio-temporal baskets, from which traditional association rule mining methods can discover spatio-temporal rules. When the items in the baskets are spatio-temporal identifiers and are derived from trajectories of moving objects, the discovered rules represent frequently travelled routes. For some applications, e.g., an intelligent ridesharing application, these frequent routes are only interesting...... the generation of the exponential number of sub-routes of long routes. Considering alternative modelling options for trajectories leads to the development of two effective variants of the method. SQL-based implementations are described, and extensive experiments on both real-life and large-scale synthetic data......

  18. Mining Long, Sharable Patterns in Trajectories of Moving Objects

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2006-01-01

    The efficient analysis of spatio-temporal data, generated by moving objects, is an essential requirement for intelligent location-based services. Spatio-temporal rules can be found by constructing spatio-temporal baskets, from which traditional association rule mining methods can discover spatio-temporal rules. When the items in the baskets are spatio-temporal identifiers and are derived from trajectories of moving objects, the discovered rules represent frequently travelled routes. For some applications, e.g., an intelligent ridesharing application, these frequent routes are only interesting ...... the generation of the exponential number of subroutes of long routes. An SQL-based implementation is described, and experiments on real-life data show the effectiveness of the method.
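    The basket idea above can be sketched as follows. This is an illustration under stated assumptions, not the authors' SQL-based method: each trip is a sequence of hypothetical (grid cell, time slot) identifiers, and a sub-route counts once per trip that contains it. Treating each trip as an unordered basket would make every subset a candidate itemset, which is where the blow-up for long routes comes from; this sketch instead counts only contiguous sub-routes of a fixed length.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def frequent_subroutes(trips: Iterable[Sequence[Hashable]], length: int, min_support: int) -> dict:
    """Contiguous sub-routes of `length` identifiers supported by at least `min_support` trips."""
    support = Counter()
    for trip in trips:
        # Use a set so a trip that repeats a sub-route still counts only once.
        seen = {tuple(trip[i:i + length]) for i in range(len(trip) - length + 1)}
        support.update(seen)
    return {route: n for route, n in support.items() if n >= min_support}


# Hypothetical trips as (grid_cell, time_slot) identifiers.
trips = [
    [("c1", 8), ("c2", 8), ("c3", 9), ("c4", 9)],
    [("c1", 8), ("c2", 8), ("c3", 9)],
    [("c5", 8), ("c2", 8), ("c3", 9), ("c4", 9)],
]
print(frequent_subroutes(trips, length=2, min_support=2))
```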

  19. High resolution microgravity investigations for the detection and characterisation of subsidence associated with abandoned coal, chalk and salt mines

    Energy Technology Data Exchange (ETDEWEB)

    Styles, P.; Toon, S.; Branston, M.; England, R. [Keele Univ., Applied And Environmental Geophysics Group, School of Physical and Geographical Sciences (United Kingdom); Thomas, E.; Mcgrath, R. [Geotechnology, Neath (United Kingdom)]

    2005-07-01

    The closure and decay of industrial activity involving mining has scarred the landscape of urban areas, and geo-hazards posed by subsurface cavities are ubiquitous throughout Europe. Features of concern consist of natural solution cavities (e.g. swallow holes and sinkholes in limestone, gypsum and chalk) and man-made cavities (mine workings, shafts) in a great variety of post-mining environments, including coal, salt, gypsum, anhydrite, tin and chalk. These problems restrict land utilisation, hinder regeneration, pose a threat to life, seriously damage property and services and blight property values. This paper outlines the application of microgravity techniques to characterise abandoned mining hazards in case studies from coal, chalk and salt mining environments in the UK. (authors)

  20. Towards a database for genotype-phenotype association research: mining data from encyclopaedia

    NARCIS (Netherlands)

    V.S. Pajić; G.M. Pavlović-Lažetić; M.V. Beljanski; B.W. Brandt; M.B. Pajić

    2013-01-01

    To associate phenotypic characteristics of an organism with molecules encoded by its genome, well-structured genotype and phenotype data are needed. We use a novel method for extracting data on phenotype and genotype characteristics of microorganisms from text. As a resource, we use an encycl

  1. Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level

    DEFF Research Database (Denmark)

    Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene

    2014-01-01

    occurring phytochemical-disease pairs and we identified 20,654 phytochemicals from 16,102 plants associated with 1,592 human disease phenotypes. We selected colon cancer as a case study and analyzed our results in three directions: i) a one-stop legacy knowledge shop for the effect of food on disease, ii...

  2. Longwall mining

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1995-03-14

    As part of EIA's program to provide information on coal, this report, Longwall Mining, describes longwall mining and compares it with other underground mining methods. Using data from EIA and private sector surveys, the report describes major changes in the geologic, technological, and operating characteristics of longwall mining over the past decade. Most important, the report shows how these changes led to dramatic improvements in longwall mining productivity. For readers interested in the history of longwall mining and greater detail on recent developments affecting longwall mining, the report includes a bibliography.

  3. Management of mining-related damages in abandoned underground coal mine areas using GIS

    Energy Technology Data Exchange (ETDEWEB)

    Lee, U.J.; Kim, J.A.; Kim, S.S. [Coal Industry Promotion Board, Seoul (Korea, Republic of); Kim, W.K.; Yoon, S.H.; Choi, J.K. [Ssangyong Information and Communication Corp., Seoul (Korea, Republic of)

    2005-07-01

    Mining-related damage such as ground subsidence, acid mine drainage (AMD), and deforestation in abandoned underground coal mine areas has become a matter of public concern. A system to manage mining-related damage is therefore needed to drive rehabilitation activities effectively. The GIS-based management system for abandoned underground coal mines includes a database of mining records and information associated with mining-related damage, as well as application programs that support mine damage prevention work. The system would also support rehabilitation policy decisions and provide basic geological data for regional construction works in abandoned underground coal mine areas. (authors)

  4. DESTAF: A database of text-mined associations for reproductive toxins potentially affecting human fertility

    KAUST Repository

    Dawe, Adam Sean

    2012-01-01

    The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10,500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly. © 2011 Elsevier Inc.

  5. An evaluation of biotic integrity associated with coal mine reclamation in the Dry Creek drainage basin, Tennessee

    Energy Technology Data Exchange (ETDEWEB)

    Brookens, A.M.; DeAngelo, P.J.; Stearns, M.W. [Skelly and Loy, Inc., Hagerstown, MD (United States)

    2001-07-01

    Sequatchie Valley Coal Corporation has mined bituminous coal reserves and conducted reclamation in the Dry Creek drainage basin on the Cumberland Plateau of Tennessee over the last twenty years. The Dry Creek basin has historically been affected by discharges from numerous adjacent abandoned mine lands. During operations, benthic macroinvertebrate communities within these drainage basins have been monitored to evaluate the probable hydrologic consequences of proposed mining and reclamation activities. Baseline monitoring prior to active mining and reclamation determined that portions of these drainage basins were already heavily impaired by acid rock drainage from abandoned mine lands. These reference sections provided a means of establishing best attainable conditions for biotic integrity. Passive treatment systems have been used during the reclamation process to mitigate the effects of abandoned mine drainage. Biological monitoring since 1994 has illustrated the effectiveness of passive treatment methodologies; however, the re-establishment of biotic integrity within the receiving drainage basin has not been observed. Macroinvertebrate community integrity continues to be compromised by water quality impairment and by extensive physical habitat impairment from metal hydroxide precipitation and sedimentation from abandoned mine lands elsewhere in the drainage basin. As mandated by NPDES permit conditions for the reclamation of Sequatchie Valley Coal Corporation operations, evaluations of biotic integrity within the Dry Creek basin using macroinvertebrate communities will continue. 21 refs., 4 tabs.

  6. Mechanisms of rule acquisition and rule following in inductive reasoning.

    Science.gov (United States)

    Crescentini, Cristiano; Seyed-Allaei, Shima; De Pisapia, Nicola; Jovicich, Jorge; Amati, Daniele; Shallice, Tim

    2011-05-25

    Despite the recent interest in the neuroanatomy of inductive reasoning processes, the regional specificity within prefrontal cortex (PFC) for the different mechanisms involved in induction tasks remains to be determined. In this study, we used fMRI to investigate the contribution of PFC regions to rule acquisition (rule search and rule discovery) and rule following. Twenty-six healthy young adult participants were presented with a series of images of cards, each consisting of a set of circles numbered in sequence with one colored blue. Participants had to predict the position of the blue circle on the next card. The rules that had to be acquired pertained to the relationship among succeeding stimuli. Responses given by subjects were categorized in a series of phases either tapping rule acquisition (responses given up to and including rule discovery) or rule following (correct responses after rule acquisition). Mid-dorsolateral PFC (mid-DLPFC) was active during rule search and remained active until successful rule acquisition. By contrast, rule following was associated with activation in temporal, motor, and medial/anterior prefrontal cortex. Moreover, frontopolar cortex (FPC) was active throughout the rule acquisition and rule following phases before a rule became familiar. We attributed activation in mid-DLPFC to hypothesis generation and in FPC to integration of multiple separate inferences. The present study provides evidence that brain activation during inductive reasoning involves a complex network of frontal processes and that different subregions respond during rule acquisition and rule following phases.

  7. Evaluating the role of vegetation on the transport of contaminants associated with a mine tailing using the Phyto-DSS

    Energy Technology Data Exchange (ETDEWEB)

    Cano-Resendiz, Omar [Departamento de Ingenieria Quimica, Universidad de Guanajuato, Noria Alta s/n, CP 36050 Guanajuato (Mexico); Rosa, Guadalupe de la, E-mail: delarosa@quijote.ugto.mx [Departamento de Ingenieria Quimica, Universidad de Guanajuato, Noria Alta s/n, CP 36050 Guanajuato (Mexico); Cruz-Jimenez, Gustavo [Departamento de Farmacia, Universidad de Guanajuato, Noria Alta s/n, CP 36050 Guanajuato (Mexico); Gardea-Torresdey, Jorge L. [Chemistry Department and Environmental Science and Engineering, Ph.D. Program, The University of Texas at El Paso, 500 W. University Ave., 79968 El Paso, TX (United States); Robinson, Brett H. [Agriculture and Life Sciences, Lincoln University, P.O. Box 84 Lincoln, Canterbury 7646 (New Zealand)

    2011-05-15

    We identified contaminants associated with the Cata mine tailing depot located in the outskirts of the city of Guanajuato, Mexico. We also investigated strategies for their phytomanagement. Silver and antimony were present at 39 and 31 mg kg⁻¹, respectively, some twofold higher than the Dutch Intervention Values. Total and extractable boron (B) occurred at concentrations of 301 and 6.3 mg L⁻¹, respectively. Concentrations of B in soil solution above 1.9 mg L⁻¹ have been shown to be toxic to plants. Plant growth may also be inhibited by the low concentrations of extractable plant nutrients. Analysis of the aerial portions of Aloe vera (L. Burm.f.) revealed that this plant accumulates negligible concentrations of the identified contaminants. Calculations using a whole system model (Phyto-DSS) showed that establishing a crop of A. vera would have little effect on the drainage or leaching from the site. However, this plant would reduce wind and water erosion and potentially produce valuable cosmetic products. In contrast, crops of poplar, a species that is tolerant to high soil B concentrations, would mitigate leaching from this site. Alternate rows of trees could be periodically harvested and be used for timber or bioenergy.

  8. Evaluating the role of vegetation on the transport of contaminants associated with a mine tailing using the Phyto-DSS.

    Science.gov (United States)

    Cano-Reséndiz, Omar; de la Rosa, Guadalupe; Cruz-Jiménez, Gustavo; Gardea-Torresdey, Jorge L; Robinson, Brett H

    2011-05-15

    We identified contaminants associated with the Cata mine tailing depot located in the outskirts of the city of Guanajuato, Mexico. We also investigated strategies for their phytomanagement. Silver and antimony were present at 39 and 31 mg kg⁻¹, respectively, some twofold higher than the Dutch Intervention Values. Total and extractable boron (B) occurred at concentrations of 301 and 6.3 mg L⁻¹, respectively. Concentrations of B in soil solution above 1.9 mg L⁻¹ have been shown to be toxic to plants. Plant growth may also be inhibited by the low concentrations of extractable plant nutrients. Analysis of the aerial portions of Aloe vera (L. Burm.f.) revealed that this plant accumulates negligible concentrations of the identified contaminants. Calculations using a whole system model (Phyto-DSS) showed that establishing a crop of A. vera would have little effect on the drainage or leaching from the site. However, this plant would reduce wind and water erosion and potentially produce valuable cosmetic products. In contrast, crops of poplar, a species that is tolerant to high soil B concentrations, would mitigate leaching from this site. Alternate rows of trees could be periodically harvested and be used for timber or bioenergy.

  9. The Association between Noise, Cortisol and Heart Rate in a Small-Scale Gold Mining Community—A Pilot Study

    Directory of Open Access Journals (Sweden)

    Allyson Green

    2015-08-01

    We performed a cross-sectional pilot study on salivary cortisol, heart rate, and personal noise exposures in a small-scale gold mining village in northeastern Ghana in 2013. Cortisol level changes between morning and evening among participants showed a relatively low decline in cortisol through the day (−1.44 ± 4.27 nmol/L, n = 18), a pattern consistent with chronic stress. A multiple linear regression, adjusting for age, sex, smoking status, and time between samples, indicated a significant increase of 0.25 nmol/L cortisol from afternoon to evening per 1 dBA increase in equivalent continuous noise exposure (Leq) over that period (95% CI: 0.08–0.42, Adj. R² = 0.502, n = 17). A mixed-effect linear regression model adjusting for age and sex indicated a significant increase of 0.29 heart beats per minute (BPM) for every 1 dB increase in Leq. Using standard deviations (SDs) as measures of variation, and adjusting for age and sex over the sampling period, we found that a 1 dBA increase in noise variation over time (Leq SD) was associated with a 0.5 BPM increase in heart rate SD (95% CI: 0.04–0.9, Adj. R² = 0.229, n = 16). Noise levels were consistently high, with 24-hour average Leq exposures ranging from 56.9 to 92.0 dBA and a mean daily Leq of 82.2 ± 7.3 dBA (mean monitoring duration 22.1 ± 1.9 hours, n = 22). Ninety-five percent of participants had 24-hour average Leq noise levels above the 70 dBA World Health Organization (WHO) guideline level for prevention of hearing loss. These findings suggest that small-scale mining communities may face multiple, potentially additive health risks that are not yet well documented, including hearing loss and cardiovascular effects of stress and noise.
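    The equivalent continuous level (Leq) used throughout this abstract is an energy average, not an arithmetic mean of decibel readings. A minimal sketch of the standard definition, assuming equal-duration A-weighted samples; the readings are hypothetical, not the study's data.

```python
import math
from typing import Sequence


def leq(levels_dba: Sequence[float]) -> float:
    """Equivalent continuous sound level from equal-duration samples (energy average)."""
    if not levels_dba:
        raise ValueError("need at least one sample")
    # Convert each dB value to relative energy, average, then convert back to dB.
    mean_energy = sum(10 ** (level / 10) for level in levels_dba) / len(levels_dba)
    return 10 * math.log10(mean_energy)


# Hypothetical hourly readings: a single loud hour dominates the result.
readings = [60.0, 60.0, 60.0, 90.0]
print(round(leq(readings), 1))
```

    Because of the logarithmic averaging, short loud periods raise Leq far more than their share of time suggests, which is one reason daily averages in mining villages can exceed the 70 dBA guideline.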

  10. Graph Based New Approach for Frequent Pattern Mining

    Directory of Open Access Journals (Sweden)

    Anurag Choubey

    2012-03-01

    Association rule mining is a function within the data mining research domain, and frequent pattern mining is an essential part of it. Most previous studies on mining frequent patterns are based on the Apriori approach, which requires many database scans and operations for counting pattern supports. Because each set of transactions may be so massive that traditional data mining tasks become difficult to perform, this research proposes a graph structure that captures only those itemsets needed to condense a very large dataset into a submatrix of important weights, leaving no room for outliers. We have devised a strategy that covers the significant facts of the data by drilling the large data down into a succinct adjacency matrix at different stages of the mining process. The graph structure is designed to be easily maintained, and the trade-off in compressing large data values is reduced. Experimental results show the effectiveness of our graph-based approach.
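    To illustrate why an adjacency matrix can replace repeated database scans, here is a generic sketch (not the authors' structure, whose weighting and pruning are not specified in the abstract): one pass fills an item co-occurrence matrix, after which all 2-itemset supports are read directly from it. The transactions are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(transactions, items):
    """One pass over the data: matrix[i][j] = transactions containing both items i and j."""
    index = {item: k for k, item in enumerate(items)}
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for t in transactions:
        present = sorted({index[x] for x in t if x in index})
        for k in present:
            matrix[k][k] += 1  # the diagonal holds single-item support
        for a, b in combinations(present, 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def frequent_pairs(matrix, items, min_support):
    """Read frequent 2-itemsets from the upper triangle of the matrix."""
    n = len(items)
    return [(items[a], items[b], matrix[a][b])
            for a in range(n) for b in range(a + 1, n) if matrix[a][b] >= min_support]


transactions = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"butter", "milk"}, {"bread"}]
items = ["bread", "butter", "milk"]
m = cooccurrence_matrix(transactions, items)
print(frequent_pairs(m, items, min_support=2))
```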

  11. Text Classification using Data Mining

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content, that is, the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for automatic text classification need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of single words, word relations, i.e. association rules among these words, are used to derive the feature set from pre-classified text documents. A Naive Bayes classifier is then applied to the derived features, and finally a single Genetic Algorithm step is added for the final classification. A system based on the...
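    The Naive Bayes stage mentioned above can be sketched on pre-extracted features. This is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, offered as an assumption about the classifier's general form; the association-rule feature extraction and the Genetic Algorithm step are not reproduced, and the word-pair features are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over feature tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, features):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior P(class)
            for f in features:
                # Smoothed log likelihood; unseen features get count 0 + 1.
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical word-pair features standing in for rule-derived features.
docs = [
    ["goal+match", "team+win"],
    ["team+win", "league+cup"],
    ["vote+party", "election+poll"],
    ["party+leader", "vote+party"],
]
labels = ["sport", "sport", "politics", "politics"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["team+win", "goal+match"]))
```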

  12. Novel LanT associated lantibiotic clusters identified by genome database mining.

    Directory of Open Access Journals (Sweden)

    Mangal Singh

    BACKGROUND: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotics are ribosomally synthesized antimicrobial peptides against which bacteria are not able to develop resistance, making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic in food preservation, and no significant resistance against it has developed. Given their antimicrobial potential and the limited number known, there is a need to identify novel lantibiotics. METHODOLOGY/FINDINGS: Identification of novel lantibiotic biosynthetic clusters from the ever-increasing database of bacterial genomes can provide a major lead in this direction. To achieve this, novel lantibiotic biosynthetic clusters were sought by screening sequenced genomes for homologs of LanT, a conserved lantibiotic transporter specific to type IB clusters. This strategy identified 54 bacterial strains that contain LanT homologs but are not known lantibiotic producers. Of these, 24 strains were subjected to detailed bioinformatic analysis to identify genes encoding precursor peptides, modification enzymes, and immunity and quorum sensing proteins. Eight clusters with two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters with a single LanM determinant, as in the mersacidin biosynthetic cluster. In addition, orphan LanT homologs were identified that might be associated with novel bacteriocins encoded elsewhere in the genome. Three identified gene clusters had a LanT transporter containing a C39 domain, associated with LanBC proteins and double-glycine-type precursor peptides; the only known example of such a cluster is that of salivaricin. 
CONCLUSION: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and

  13. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia.

    LENUS (Irish Health Repository)

    Chen, X

    2011-11-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and the molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets, and performed bioinformatic prioritization for all markers with P-values ≤0.05 in both data sets. In this process, we found two non-synonymous markers in the CMYA5 gene, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both samples, rs4704591 was identified as the most significant marker in the gene. Linkage disequilibrium (LD) analyses indicated that these markers were in low LD (rs3828611–rs10043986, r²=0.008; rs10043986–rs4704591, r²=0.204). In addition, CMYA5 was reported to interact physically with DTNBP1, a promising candidate gene for schizophrenia, suggesting that CMYA5 may be involved in the same biological pathway and process. On this basis, we performed replication studies for these three single-nucleotide polymorphisms. rs3828611 gave conflicting results in our Irish samples and was dropped without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11 380 cases and 15 021 controls), we found that both markers are significantly associated with schizophrenia (rs10043986, odds ratio (OR)=1.11, 95% confidence interval (CI)=1.04-1.18, P=8.2 × 10⁻⁴ and rs4704591, OR=1.07, 95% CI=1.03-1.11, P=3.0 × 10⁻⁴). The results were also significant for the 22 Caucasian replication samples (rs10043986, OR=1.11, 95% CI=1.03-1.17, P=0.0026 and rs4704591, OR=1.07, 95% CI=1.02-1.11, P=0.0015). Furthermore, haplotype conditioned analyses indicated that the association
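    Pooling odds ratios across replication samples is commonly done on the log scale with inverse-variance weights. The abstract does not state which meta-analytic model was used or how family and case-control samples were combined, so this is a sketch of a plain fixed-effect model, recovering each standard error from a reported 95% CI. The per-study values are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR), recovered from a reported 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_pool(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of (odds_ratio, ci_lower, ci_upper) tuples."""
    weights = [1 / se_from_ci(lo, hi, z) ** 2 for _, lo, hi in studies]
    log_ors = [math.log(or_) for or_, _, _ in studies]
    pooled_log = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z_stat = pooled_log / pooled_se
    p_two_sided = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal P-value
    ci = (math.exp(pooled_log - z * pooled_se), math.exp(pooled_log + z * pooled_se))
    return math.exp(pooled_log), ci, p_two_sided


# Hypothetical per-sample estimates (not the study's data).
studies = [(1.10, 0.95, 1.27), (1.15, 0.98, 1.35), (1.08, 0.90, 1.30)]
pooled_or, pooled_ci, p = fixed_effect_pool(studies)
print(f"OR = {pooled_or:.3f}, 95% CI = {pooled_ci[0]:.3f}-{pooled_ci[1]:.3f}, P = {p:.3g}")
```

    Individually non-significant samples (each CI above includes 1) can yield a narrower pooled CI, which is the main reason large replication sets are combined.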

  14. Investigation of migratory bird mortality associated with exposure to Soda Ash Mine tailings water in southwestern Wyoming

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Soda ash is a pulverized mineral, commonly referred to as “trona”, and harvested from underground deposits in southwestern Wyoming. Four companies own 5 mining...

  15. Priority pollutants and associated constituents in untreated and treated discharges from coal mining or processing facilities in Pennsylvania, USA

    Science.gov (United States)

    Cravotta, III, Charles A.; Brady, Keith B.C.

    2015-01-01

    Clean sampling and analysis procedures were used to quantify more than 70 inorganic constituents, including 35 potentially toxic or hazardous constituents, organic carbon, and other characteristics of untreated (influent) and treated (effluent) coal-mine discharges (CMD) at 38 permitted coal-mining or coal-processing facilities in the bituminous coalfield and 4 facilities in the anthracite coalfield of Pennsylvania. Of the 42 facilities sampled during 2011, 26 were surface mines, 11 were underground mines, and 5 were coal refuse disposal operations. Treatment of CMD with caustic soda (NaOH), lime (CaO or Ca(OH)₂), flocculant, or limestone was ongoing at 21%, 40%, 6%, and 4% of the facilities, respectively; no chemicals were added at the remaining facilities. All facilities with CMD treatment incorporated structures for active or passive aeration and settling of metal-rich precipitate.

  16. Feedback Analysis of Real Estate Advertising Effectiveness Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    王正友; 刘倩

    2012-01-01

    The idea of using data mining for advertising media selection was put forward, and association rules were applied to the decision-oriented relational database of a real estate company to obtain valuable information for evaluating advertising performance. This provides a theoretical basis and practical guidance for the quantitative analysis of current real estate advertising effects and for finding cost-effective advertising models.
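    Association rules of the kind described above are usually judged by support, confidence and lift. A minimal sketch of those standard measures for a rule X → Y; the customer records and item names (advertising channels, outcomes) are hypothetical, not drawn from the company database in the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0  # > 1 means positive association
    return support, confidence, lift


# Hypothetical customer records: advertising channels seen and the outcome.
records = [
    {"newspaper", "site_visit", "purchase"},
    {"newspaper", "site_visit"},
    {"online", "purchase"},
    {"newspaper", "purchase"},
    {"online"},
]
print(rule_metrics(records, {"newspaper"}, {"purchase"}))
```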

  17. Development of a geochemical model to predict leachate water quality associated with coal mining practices / Karl Nicolaus van Zweel

    OpenAIRE

    2015-01-01

    South Africa mines coal to supply the growing energy demands of the country. Most of these mines are opencast, resulting in backfilled pits and above-ground disposal facilities. Leachate emanating from these disposal sites is saline and in most cases highly acidic. Currently, the standard testing procedures to quantify expected leachate quality include Acid Base Accounting (ABA), the Net Acid Generating (NAG) test, and static and kinetic leaching tests. The aim of this study is to model standa...
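    Acid Base Accounting, listed above among the standard tests, reduces to simple arithmetic once the laboratory values are known. A sketch under the common convention (assumed here, not stated in the abstract) that acid potential is estimated from total sulfur with a factor of 31.25 kg CaCO₃ per tonne per % S; the sample values are hypothetical.

```python
def acid_base_accounting(total_sulfur_pct: float, np_kg_caco3_per_t: float):
    """Return (AP, NNP, NPR) using the common 31.25 factor for acid potential."""
    ap = 31.25 * total_sulfur_pct  # acid potential, kg CaCO3 equivalent per tonne
    nnp = np_kg_caco3_per_t - ap   # net neutralising potential
    npr = np_kg_caco3_per_t / ap if ap > 0 else float("inf")  # neutralising potential ratio
    return ap, nnp, npr


# Hypothetical discard sample: 1.2 % total S and a measured NP of 20 kg CaCO3/t.
print(acid_base_accounting(1.2, 20.0))
```

    A negative NNP (or an NPR below 1) indicates more acid-generating than neutralising capacity, which is why static ABA is usually paired with kinetic tests for longer-term prediction.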

  18. QTL detection and elite alleles mining for stigma traits in Oryza sativa by association mapping

    Directory of Open Access Journals (Sweden)

    Xiaojing Dang

    2016-08-01

    Stigma traits are very important for hybrid seed production in Oryza sativa, which is a self-pollinated crop; however, the genetic mechanism controlling the traits is poorly understood. In this study, we investigated the phenotypic data of 227 accessions across two years and assessed their genotypic variation with 249 simple sequence repeat (SSR) markers. By combining phenotypic and genotypic data, a genome-wide association (GWA) map was generated. Large phenotypic variations in stigma length (STL), stigma brush-shaped part length (SBPL) and stigma non-brush-shaped part length (SNBPL) were found. Significant positive correlations were identified among stigma traits. In total, 2,072 alleles were detected among 227 accessions, with an average of 8.3 alleles per SSR locus. GWA mapping detected 6 quantitative trait loci (QTLs) for the STL, 2 QTLs for the SBPL and 7 QTLs for the SNBPL. Eleven, 5, and 12 elite alleles were found for the STL, SBPL and SNBPL, respectively. Optimal cross designs were predicted for improving the target traits. The detected genetic variation in stigma traits and QTLs provides helpful information for cloning candidate STL genes and breeding rice cultivars with longer STLs in the future.

  19. An improved association-mining research for exploring Chinese herbal property theory: based on data of the Shennong's Classic of Materia Medica.

    Science.gov (United States)

    Jin, Rui; Lin, Zhi-jian; Xue, Chun-miao; Zhang, Bing

    2013-09-01

    Knowledge Discovery in Databases is gaining attention and raising new hopes among traditional Chinese medicine (TCM) researchers as a useful tool for understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper applied an improved association rule learning method to analyze the semistructured text of the book Shennong's Classic of Materia Medica. The text was first annotated and transformed into well-structured multidimensional data. Subsequently, an Apriori algorithm was employed to produce association rules after a sensitivity analysis of its parameters. From the 120 confirmed rules describing the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were identified and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provides new knowledge about CHPT that should be helpful for its modern research.
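    The Apriori step named above can be sketched as level-wise frequent-itemset search with subset pruning. This is a plain Apriori, not the authors' "improved" variant, and their parameter sensitivity analysis is not reproduced; the herb records and labels such as "qi:cold" are hypothetical encodings of annotated text.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent (k-1)-subset.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical annotated herb records.
herbs = [
    {"qi:cold", "flavor:bitter", "efficacy:clear-heat"},
    {"qi:cold", "flavor:bitter", "efficacy:clear-heat"},
    {"qi:cold", "flavor:sweet"},
    {"qi:warm", "flavor:acrid"},
    {"qi:cold", "flavor:bitter"},
]
result = apriori(herbs, min_support=2)
# Confidence of {qi:cold, flavor:bitter} -> {efficacy:clear-heat}.
triple = frozenset({"qi:cold", "flavor:bitter", "efficacy:clear-heat"})
confidence = result[triple] / result[frozenset({"qi:cold", "flavor:bitter"})]
print(len(result), confidence)
```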

  20. Prevention and Detection of Financial Statement Fraud – An Implementation of Data Mining Framework

    Directory of Open Access Journals (Sweden)

    Rajan Gupta

    2012-08-01

    News of financial statement fraud appears every day, and such fraud adversely affects economies worldwide. Given the losses incurred through fraud, effective measures and methods should be employed to prevent and detect financial statement fraud. Data mining methods could assist auditors in preventing and detecting fraud, because data mining can use past cases of fraud to build models that identify and detect fraud risk, and can help design new techniques for preventing fraudulent financial reporting. In this study we implement a data mining methodology for preventing fraudulent financial reporting in the first place and for detecting fraud that has already been perpetrated. The association rules generated in this study are expected to be of great importance to both researchers and practitioners in preventing fraudulent financial reporting. The decision rules produced in this research complement the prevention mechanism by detecting financial statement fraud.

  1. Study on Halitosis Medication Rules by Traditional Chinese Medicine Based on Data Mining

    Institute of Scientific and Technical Information of China (English)

    孙红艳; 吕安坤

    2014-01-01

    This study aimed to explore the flavor, nature and meridian entry of the Chinese medicinals used to treat halitosis in traditional Chinese medicine (TCM). Text data mining of the TCM literature on halitosis treatment was used as the basis of the study. Herbs from the collected TCM halitosis treatment literature that met the inclusion criteria were classified according to their efficacy, and their nature, flavor and meridian entry were analyzed. Excel was used for the descriptive analysis summarizing the halitosis medication rules. The results covered 796 occurrences involving 149 herbs. The main nature of halitosis medication was cold, followed by warm and even (neutral); together these accounted for 97.49% of the total frequency. Sweet was the most frequently used flavor, followed by bitter and acrid; these three flavors accounted for 90.71%. The medicinals mainly entered the stomach meridian, followed by the lung, spleen, heart and liver meridians. Classified by efficacy, the main effect was antipyretic (heat-clearing), followed by tonification and dampness removal. It was concluded from the text data mining that TCM halitosis medication is mainly cold in nature, with the treatment principles of heat-clearing and dampness-removing as well as spleen-strengthening and stomach-harmonizing. These rules provide a reliable reference for clinical differentiation and medication.

  2. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    Science.gov (United States)

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Extensive research is ongoing to identify genes, including their gene products, and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view of current OS knowledge that is quickly and easily accessible to researchers in the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts in the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  3. Research on Cube Gradient Mining

    Institute of Scientific and Technical Information of China (English)

    刘玉葆; 冯玉才; 王元珍; 冯剑琳

    2003-01-01

    With the rapid development of data warehouse and OLAP techniques, researchers have begun to pay attention to data mining in the data cube. Recently, T. Imielinski et al. first presented the problem of cube gradient mining, which is a generalization of association rules in the data cube. In this paper, we first introduce, with emphasis, the related concepts of the data cube and the condensed cube. Then we introduce some interesting problems related to cube gradient mining, including constrained cube gradient mining and a query language for cube gradients. Finally, we introduce several issues in combining cube gradients with the condensed cube, namely cube gradient mining in the materialized data cube and the integration of cube gradient mining with cube browsing.
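    Cube gradient mining compares a cell's aggregate measure with that of related cells. As a minimal sketch, assuming a tiny hypothetical fact table and the average as the measure, the code below compares a probe cell with its sibling cells (cells with one dimension value changed) and reports those whose ratio to the probe reaches a threshold in either direction. The names FACTS, cell_average and sibling_gradients are illustrative only; this is not the authors' method or query language.

```python
# Hypothetical fact table: (region, product, year, sales).
FACTS = [
    ("east", "tv", 2002, 100.0),
    ("east", "tv", 2003, 180.0),
    ("west", "tv", 2002, 90.0),
    ("west", "tv", 2003, 95.0),
    ("east", "radio", 2002, 40.0),
    ("east", "radio", 2003, 42.0),
]


def cell_average(facts, cell):
    """Average measure over all facts matching a cell; '*' matches any value."""
    values = [f[-1] for f in facts
              if all(c == "*" or c == v for c, v in zip(cell, f[:-1]))]
    return sum(values) / len(values) if values else None


def sibling_gradients(facts, probe, min_ratio):
    """Sibling cells (one fixed dimension value swapped) whose average is at
    least min_ratio times, or at most 1/min_ratio times, the probe's average."""
    base = cell_average(facts, probe)
    results = []
    for i, value in enumerate(probe):
        if value == "*":
            continue  # only fixed dimensions can be swapped
        for other in sorted({f[i] for f in facts} - {value}, key=str):
            sibling = probe[:i] + (other,) + probe[i + 1:]
            avg = cell_average(facts, sibling)
            if avg is None:
                continue
            ratio = avg / base
            if ratio >= min_ratio or ratio <= 1 / min_ratio:
                results.append((sibling, round(ratio, 2)))
    return results


print(sibling_gradients(FACTS, ("east", "tv", "*"), 1.5))
# [(('west', 'tv', '*'), 0.66), (('east', 'radio', '*'), 0.29)]
```

    Constrained cube gradient mining, as described above, would add conditions on which probe and gradient cells are considered; the sketch applies only a ratio threshold.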

  4. Potential ecological and human health risks of heavy metals in surface soils associated with iron ore mining in Pahang, Malaysia.

    Science.gov (United States)

    Diami, Siti Merryan; Kusin, Faradiella Mohd; Madzin, Zafira

    2016-10-01

    The composition of heavy metals (and a metalloid) in surface soils of iron ore mine-impacted areas was evaluated for its potential ecological and human health risks. The mining areas included seven selected locations in the vicinity of active and abandoned iron ore-mining sites in Pahang, Malaysia. Heavy metals such as Fe, Mn, Cu, Zn, Co, Pb, Cr, Ni, and Cd and the metalloid As were present in the mining soils of the studied area, and Cu exceeded the soil guideline value at all sampling locations. However, the potential ecological risk index (RI) indicated low ecological risk (RI between 44 and 128) with respect to Cd, Pb, Cu, As, Zn, Co, and Ni in the surface soils. The contributions of the potential ecological risk of individual metal elements to the total RI were evident for Cd, As, Pb, and Cu. The contribution of Cu appears to be consistently greater in the abandoned mining area than at the active iron ore-mining site. For non-carcinogenic risk, no significant potential health risk was found for either children or adults, as the hazard indices (HIs) were all below 1. The lifetime cancer risk (LCR) indicated that As has a greater potential carcinogenic risk than other metals that may induce carcinogenic effects, such as Pb, Cr, and Cd, while the LCR of As for children fell within the tolerable range for regulatory purposes. Irrespective of carcinogenic or non-carcinogenic risk, greater potential health risk was found among children (by an order of magnitude for most metals) than among adults. The hazard quotient (HQ) and cancer risk indicated that the exposure pathways ranked in the order ingestion > dermal > inhalation. Overall, the findings showed that some metals and the metalloid were still present at comparable concentrations even long after the cessation of iron ore-mining activities.
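    The hazard index used above aggregates hazard quotients across exposure pathways. A minimal sketch, assuming the common convention HQ = estimated daily dose / reference dose and HI = sum of pathway HQs; the doses and reference doses below are hypothetical placeholders, not values from the study, whose intake equations are not given here.

```python
def hazard_quotient(dose, reference_dose):
    """HQ = estimated daily dose / reference dose (both in mg/kg/day)."""
    return dose / reference_dose


def hazard_index(doses, reference_doses):
    """HI = sum of the pathway HQs; HI < 1 is commonly read as no significant
    non-carcinogenic risk."""
    return sum(hazard_quotient(doses[p], reference_doses[p]) for p in doses)


# Hypothetical doses and reference doses for one metal and one receptor group.
DOSES = {"ingestion": 2e-3, "dermal": 4e-4, "inhalation": 1e-6}
REFERENCE_DOSES = {"ingestion": 4e-2, "dermal": 1.2e-2, "inhalation": 4e-2}

hi = hazard_index(DOSES, REFERENCE_DOSES)
# Rank pathways by their individual HQ, largest first.
ranked = sorted(DOSES, key=lambda p: hazard_quotient(DOSES[p], REFERENCE_DOSES[p]),
                reverse=True)
print(round(hi, 4), ranked)  # 0.0834 ['ingestion', 'dermal', 'inhalation']
```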

  5. A cross-sectional survey on knowledge and perceptions of health risks associated with arsenic and mercury contamination from artisanal gold mining in Tanzania

    Directory of Open Access Journals (Sweden)

    Charles Elias

    2013-01-01

    Background An estimated 0.5 to 1.5 million informal miners, of whom 30-50% are women, rely on artisanal mining for their livelihood in Tanzania. Mercury, used in processing gold ore, and arsenic, which is a constituent of some ores, are common occupational exposures that frequently result in widespread environmental contamination. Mining activities are frequently conducted haphazardly, without regard for environmental, occupational, or community exposure. The primary objective of this study was to assess community risk knowledge and perception of potential mercury and arsenic toxicity and/or exposure from artisanal gold mining in Rwamagasa in northwestern Tanzania. Methods A cross-sectional survey of respondents in five sub-villages in the Rwamagasa Village, located in Geita District in northwestern Tanzania near Lake Victoria, was conducted. This area has a history of artisanal gold mining, and many residents continue to work as miners. Using a clustered random selection approach for recruitment, a total of 160 individuals over 18 years of age completed a structured interview. Results The interviews revealed wide variations in knowledge and risk perceptions concerning mercury and arsenic exposure, with 40.6% (n=65) and 89.4% (n=143) not aware of the health effects of mercury and arsenic exposure, respectively. Males were significantly more knowledgeable (n=59, 36.9%) than females (n=36, 22.5%) with regard to mercury (χ²=3.99, p …; χ²=22.82, p= …). Conclusions The knowledge of individuals living in Rwamagasa, Tanzania, an area with a history of artisanal gold mining, varied widely with regard to the health hazards of mercury and arsenic. In these communities there was limited awareness of the threats to health associated with exposure to mercury and arsenic. 
This lack of knowledge, combined with minimal environmental monitoring and controlled waste management practices, highlights the need for health education, surveillance, and policy

  6. An association between dietary habits and traffic accidents in patients with chronic liver disease: A data-mining analysis.

    Science.gov (United States)

    Kawaguchi, Takumi; Suetsugu, Takuro; Ogata, Shyou; Imanaga, Minami; Ishii, Kumiko; Esaki, Nao; Sugimoto, Masako; Otsuyama, Jyuri; Nagamatsu, Ayu; Taniguchi, Eitaro; Itou, Minoru; Oriishi, Tetsuharu; Iwasaki, Shoko; Miura, Hiroko; Torimura, Takuji

    2016-05-01

    The incidence of traffic accidents in patients with chronic liver disease (CLD) is high in the USA. However, patient characteristics, including dietary habits, differ between Japan and the USA. The present study investigated the incidence of traffic accidents in CLD patients and the clinical profiles associated with traffic accidents in Japan using a data-mining analysis. A cross-sectional study was performed and 256 subjects [148 CLD patients (CLD group) and 106 patients with other digestive diseases (disease control group)] were enrolled; 2 patients were excluded. The incidence of traffic accidents was compared between the two groups. Independent factors for traffic accidents were analyzed using logistic regression and decision-tree analyses. The incidence of traffic accidents did not differ between the CLD and disease control groups (8.8 vs. 11.3%). The logistic regression analysis showed that yoghurt consumption was the only independent factor for traffic accidents (odds ratio, 0.37; 95% confidence interval, 0.16-0.85; P=0.0197). Similarly, the decision-tree analysis showed that yoghurt consumption was the initial divergence variable. In patients who consumed yoghurt habitually, the incidence of traffic accidents was 6.6%, while that in patients who did not consume yoghurt was 16.0%. CLD was not identified as an independent factor in the logistic regression or decision-tree analyses. In conclusion, the difference in the incidence of traffic accidents in Japan between the CLD and disease control groups was not significant. Furthermore, yoghurt consumption was an independent negative risk factor for traffic accidents in patients with digestive diseases, including CLD.
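    An odds ratio below 1 is what makes yoghurt consumption a negative (protective) factor. As a rough sanity check only, the crude, unadjusted odds ratio can be computed directly from the two incidences reported above; this is not the study's logistic regression and does not reproduce its confidence interval, although it happens to land on the same rounded value.

```python
def crude_odds_ratio(incidence_exposed, incidence_unexposed):
    """Odds ratio from two incidence proportions: [p1/(1-p1)] / [p0/(1-p0)]."""
    odds_exposed = incidence_exposed / (1 - incidence_exposed)
    odds_unexposed = incidence_unexposed / (1 - incidence_unexposed)
    return odds_exposed / odds_unexposed


# Incidences reported above: 6.6% with habitual yoghurt, 16.0% without.
print(round(crude_odds_ratio(0.066, 0.160), 2))  # 0.37
```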

  7. CONFLICTS ASSOCIATED WITH LARGE-SCALE MINING IN ANTIOQUIA

    Directory of Open Access Journals (Sweden)

    Alfonso Insuasty Rodriguez.

    2013-12-01

    This article is the first output of the research project “Conflicts over the territory associated with large-scale mining in Antioquia, Colombia.” It presents the conclusions of the first phase, which gives an account of the extractive economic dynamics that Colombia has adopted over the last 10 years as a strategic route responding to the demand for available, low-cost natural resources created by the current crisis of international capital. These decisions favor foreign interests while involving and jeopardizing the cultural logics, autonomy, sovereignty, life, dignity and natural environment of the inhabitants of the territories targeted for these large natural-resource extraction projects.

  8. Evaluation of Rule Extraction Algorithms

    Directory of Open Access Journals (Sweden)

    Tiruveedula GopiKrishna

    2014-05-01

    For the data mining domain, the lack of explanation facilities seems to be a serious drawback for techniques based on Artificial Neural Networks or, for that matter, any technique producing opaque models. In particular, the ability to generate even limited explanations is absolutely crucial for user acceptance of such systems. Since the purpose of most data mining systems is to support decision making, the need for explanation facilities in these systems is apparent. The task for the data miner is thus to identify the complex but general relationships that are likely to carry over to production data, and the explanation facility makes this easier. The study also focuses on the quality of the extracted rules, i.e., how well they provide the required explanation. This research discusses several important rule extraction algorithms and identifies their algorithmic complexity, i.e., how efficient the underlying rule extraction algorithm is.
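    One common way to measure the quality of extracted rules is fidelity: the share of inputs on which the rules agree with the opaque model. A minimal sketch, assuming a hypothetical thresholded weighted sum as the "opaque" model and a hand-written rule set; exhaustively enumerating all 2^n binary inputs, as done here, also illustrates why rule extraction complexity grows quickly with the number of inputs.

```python
from itertools import product


def opaque_model(x):
    """Hypothetical stand-in for an opaque model: a thresholded weighted sum."""
    weights = (0.6, 0.5, -0.4)
    return int(sum(w * xi for w, xi in zip(weights, x)) > 0.5)


# Hypothetical extracted rules: "IF x0 AND x1 THEN 1" and "IF x0 AND NOT x2 THEN 1".
RULES = [
    lambda x: x[0] == 1 and x[1] == 1,
    lambda x: x[0] == 1 and x[2] == 0,
]

# All 2^3 binary inputs; exhaustive enumeration is exponential in the input count.
INPUTS = list(product((0, 1), repeat=3))


def rules_predict(rules, x):
    """Predict 1 if any rule fires, else 0."""
    return int(any(rule(x) for rule in rules))


def fidelity(model, rules, inputs):
    """Fraction of inputs on which the rule set agrees with the model."""
    agree = sum(model(x) == rules_predict(rules, x) for x in inputs)
    return agree / len(inputs)


print(fidelity(opaque_model, RULES, INPUTS))      # 1.0
print(fidelity(opaque_model, RULES[:1], INPUTS))  # 0.875
```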

  9. Discovery of Patterns and Evaluation of Clustering Algorithms in Social Network Data (Facebook 100 Universities) through Data Mining Techniques and Methods

    Directory of Open Access Journals (Sweden)

    Nancy.P

    2012-10-01

    Data mining involves the use of advanced data analysis tools to find new, useful patterns and to project relationships among patterns that were not previously known. In data mining, association rule learning is a popular, well-known method for discovering new relations between variables in large databases. One of the emerging research areas in data mining is social networks. This paper focuses on formulating association rules that can support decisions for future endeavors. This research applies the Apriori algorithm, one of the classical algorithms for deriving association rules. The algorithm is applied to the Facebook 100 university dataset, which originated from Adam D’Angelo of Facebook. It contains self-defined characteristics of each person, including variables such as residence, year, major, second major, gender, and school. To begin with, the research uses only ten universities; it highlights the formation of association rules between the attributes or variables, explores the association rules between course and gender, and examines the influence of gender on studying a course. This paper also attempts to cover the main algorithms used for clustering, with a brief and simple description of each. Previous research with this dataset applied only regression models; this is the first time association rules have been applied to it.
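    The Apriori algorithm named above works level by level: count candidate itemsets, keep those meeting minimum support, join survivors into larger candidates, and prune any candidate with an infrequent subset. A minimal textbook sketch follows, run on a few hypothetical attribute=value records in the spirit of the gender/major analysis; it is not the paper's code or data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count step: support of each size-k candidate.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent (downward closure).
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), round(confidence, 2)))
    return found


# Hypothetical records; each is a set of attribute=value items.
RECORDS = [
    {"gender=F", "major=biology"},
    {"gender=F", "major=biology"},
    {"gender=M", "major=physics"},
    {"gender=F", "major=physics"},
    {"gender=M", "major=physics"},
]

for lhs, rhs, conf in generate_rules(apriori(RECORDS, 0.4), 0.6):
    print(lhs, "->", rhs, conf)
```

    Rule order in the output depends on set iteration order; with these records the rules found are major=biology -> gender=F (1.0), gender=F -> major=biology (0.67), gender=M -> major=physics (1.0) and major=physics -> gender=M (0.67).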

  10. Exploring the challenges associated with the greening of supply chains in the South African manganese and phosphate mining industry

    Directory of Open Access Journals (Sweden)

    R.I. David Pooe

    2014-03-01

    As with most mining activities, the mining of manganese and phosphate has serious consequences for the environment. Despite a largely adequate and progressive framework for environmental governance developed since 1994, few mines have integrated systems into their supply chain processes to minimise environmental risks and ensure the achievement of acceptable standards. Indeed, few mines have been able to implement green supply chain management (GrSCM). The purpose of this article was to explore challenges related to the implementation of GrSCM and to provide insight into how GrSCM can be implemented in the South African manganese and phosphate industry. This article reports the findings of a qualitative study involving interviews with 12 participants from the manganese and phosphate industry in South Africa, selected through purposive sampling. Six themes emerged from the study, all of which were identified as key challenges in implementing GrSCM in the manganese and phosphate mining industry: the operationalisation of environmental issues, lack of collaboration and knowledge sharing, proper application of monitoring and control systems, lack of clear policy and legislative direction, the cost of implementing GrSCM practices, and the need for strong leadership and management of change. On the basis of the literature reviewed and the empirical findings, conclusions were drawn and policy and management recommendations were made accordingly.

  11. Research and Application of OLAM-Based Clustering Association Mining in Cross-Selling

    Institute of Scientific and Technical Information of China (English)

    王万川; 吴陈; 陆在研

    2012-01-01

    OLAM (On-Line Analytical Mining) is currently a popular technology: a data mining technique that combines the advantages of online analytical processing (OLAP) and data mining. Targeting the cross-selling problem in business, this paper proposes an OLAM mining model of clustering association rules based on a multidimensional sales cube and implements it with the data mining tools of the SQL Server Analysis Services (SSAS) platform. Cross-selling recommendations for customers are then obtained from the association patterns produced by the OLAM mining model.
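    The cluster-then-associate idea can be illustrated without SSAS. A minimal sketch, assuming baskets already carry a customer cluster label from an earlier clustering step (not shown), and using simple pairwise confidence within each cluster to rank items to recommend; the data, labels and function names are hypothetical, and this is not the paper's OLAM model.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical baskets tagged with a customer cluster label.
BASKETS = [
    ("c1", {"printer", "paper"}),
    ("c1", {"printer", "paper", "ink"}),
    ("c1", {"printer", "ink"}),
    ("c2", {"phone", "case"}),
    ("c2", {"phone", "case"}),
    ("c2", {"phone"}),
]


def pair_confidences(baskets):
    """Per cluster, confidence(A -> B) = count(A and B) / count(A)."""
    item_counts = defaultdict(Counter)
    pair_counts = defaultdict(Counter)
    for cluster, items in baskets:
        item_counts[cluster].update(items)
        pair_counts[cluster].update(permutations(items, 2))
    return {
        cluster: {(a, b): n / item_counts[cluster][a] for (a, b), n in pairs.items()}
        for cluster, pairs in pair_counts.items()
    }


def recommend(conf, cluster, bought, min_confidence):
    """Items not yet bought, reachable by a rule from a bought item that meets
    min_confidence, ranked by their best confidence (ties broken by name)."""
    best = {}
    for (a, b), c in conf.get(cluster, {}).items():
        if a in bought and b not in bought and c >= min_confidence:
            best[b] = max(best.get(b, 0.0), c)
    return sorted(best, key=lambda item: (-best[item], item))


conf = pair_confidences(BASKETS)
print(recommend(conf, "c1", {"printer"}, 0.6))  # ['ink', 'paper']
print(recommend(conf, "c2", {"case"}, 0.6))     # ['phone']
```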

  12. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  13. Case studies and analysis of mine shafts incidents in Europe

    OpenAIRE

    Lecomte, Amélie; Salmon, Romuald; Yang, W.; Marshall, Alec; Purvis, M.; Prusek, S.; Bock, Slawomir; Gajda, L.; Dziura, J.; Munos Niharra, Agustin

    2012-01-01

    Entry to mine workings is normally gained by means of vertical shafts or horizontal or inclined tunnels called adits. Other mining objects such as fan drifts and wheel pits are often associated with mine shafts. Such mining objects may or may not have been filled, wholly or partially, or otherwise sealed to prevent entry when the mine was abandoned. Nowadays mine entries are usually adequately protected on abandonment to prevent accidental ingress. Many earlier mine en...

  14. Leaf-mining by Phyllonorycter blancardella reprograms the host-leaf transcriptome to modulate phytohormones associated with nutrient mobilization and plant defense.

    Science.gov (United States)

    Zhang, Hui; Dugé de Bernonville, Thomas; Body, Mélanie; Glevarec, Gaëlle; Reichelt, Michael; Unsicker, Sybille; Bruneau, Maryline; Renou, Jean-Pierre; Huguet, Elisabeth; Dubreuil, Géraldine; Giron, David

    2016-01-01

    Phytohormones have long been hypothesized to play a key role in the interactions between plant-manipulating organisms and their host plants, such as insect-plant interactions that lead to gall or 'green-islands' induction. However, mechanistic understanding of how phytohormones operate in these plant reconfigurations is lacking due to limited information on the molecular and biochemical phytohormonal modulation following attack by plant-manipulating insects. In an attempt to fill this gap, the present study provides an extensive characterization of how the leaf-miner Phyllonorycter blancardella modulates the major phytohormones and the transcriptional activity of plant cells in leaves of Malus domestica. We show here that cytokinins strongly accumulate in mined tissues despite a weak expression of plant cytokinin-related genes. Leaf-mining is also associated with enhanced biosynthesis of jasmonic acid precursors but not the active form, a weak alteration of the salicylic acid pathway and a clear inhibition of the abscisic acid pathway. Our study consolidates previous results suggesting that insects may produce and deliver cytokinins to the plant as a strategy to manipulate the physiology of the leaf to create a favorable nutritional environment. We also demonstrate that leaf-mining by P. blancardella leads to a strong reprogramming of the plant phytohormonal balance associated with increased nutrient mobilization, inhibition of leaf senescence and mitigation of plant direct and indirect defense.

  15. Advance Mining of Temporal High Utility Itemset

    Directory of Open Access Journals (Sweden)

    Swati Soni

    2012-04-01

    The stock market domain is a dynamic and unpredictable environment. Traditional techniques, such as fundamental and technical analysis, can provide investors with some tools for managing their stocks and predicting their prices. However, these techniques cannot discover all the possible relations between stocks, so a different approach that provides a deeper kind of analysis is needed. Data mining can be used extensively in financial markets and can help in stock-price forecasting. Therefore, we propose in this paper a portfolio management solution with business intelligence characteristics. Temporal high utility itemsets are the itemsets whose utility exceeds a pre-specified threshold in the current time window of a data stream. Discovering temporal high utility itemsets is an important step in mining interesting patterns, such as association rules, from data streams. We propose a novel algorithm for temporal association mining with a utility approach, which finds temporal high utility itemsets while generating fewer candidate itemsets.
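    The abstract does not describe its algorithm, so the following is only a generic sketch of high utility itemset mining over the current window of a stream, in the well-known two-phase style: candidates are pruned by transaction-weighted utility (TWU), an anti-monotone upper bound on utility, and survivors are then checked against their exact utility. The unit profits, stream and window size are hypothetical.

```python
from itertools import combinations

# Hypothetical unit profits (external utility) per item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def contains(tx, itemset):
    return all(item in tx for item in itemset)


def utility(itemset, tx):
    """Utility of an itemset in one transaction (tx maps item -> quantity)."""
    return sum(tx[i] * PROFIT[i] for i in itemset) if contains(tx, itemset) else 0


def high_utility_itemsets(window, min_util):
    """Two-phase style search restricted to the current window of transactions."""
    tx_utils = [utility(tuple(tx), tx) for tx in window]

    def twu(itemset):
        # Transaction-weighted utility: an anti-monotone upper bound on utility.
        return sum(u for tx, u in zip(window, tx_utils) if contains(tx, itemset))

    # Phase 1: level-wise candidate generation pruned by TWU.
    level = [(i,) for i in sorted({i for tx in window for i in tx}) if twu((i,)) >= min_util]
    candidates = []
    while level:
        candidates.extend(level)
        k = len(level[0]) + 1
        pool = sorted({i for c in level for i in c})
        level = [c for c in combinations(pool, k)
                 if all(s in level for s in combinations(c, k - 1))
                 and twu(c) >= min_util]
    # Phase 2: keep candidates whose exact utility in the window meets the threshold.
    exact = {c: sum(utility(c, tx) for tx in window) for c in candidates}
    return {c: u for c, u in exact.items() if u >= min_util}


STREAM = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 5},
]
window = STREAM[-3:]  # current time window: the three most recent transactions
print(high_utility_itemsets(window, 15))
# {('a',): 15, ('a', 'c'): 21, ('b', 'c'): 18}
```

    Note that ("b",) and ("c",) survive the TWU filter but fail the exact utility check, which is why a second phase is needed.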

  16. Subway Construction Accident Analysis Based on Multidimensional Association Rules

    Institute of Scientific and Technical Information of China (English)

    陈伟珂; 李金玲; 聂凌毅

    2011-01-01

    With the rapid development of metro construction, subway construction accidents occur frequently. Given the increasingly complex relationships among subway construction accident data, this paper puts forward a method for applying multidimensional association rules to subway construction accident data. Using this multidimensional association rule tool, the potential relations behind subway construction accidents are uncovered, showing specifically how strong association rules between "person-instrument-environment-management" factors and accident types are mined. Strong association rules for construction collapse accidents are identified by searching for combinations of frequent factors that may lead to subway construction accidents. Furthermore, on the basis of evaluating these strong association rules, the underlying patterns of subway construction accidents are found, and managers can use them as a basis for accident-prevention safety measures in practice.
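    In a multidimensional association rule, each predicate on the left-hand side comes from a different dimension, and a rule is "strong" when it meets both a minimum support and a minimum confidence. A minimal sketch, assuming hypothetical accident records with three of the four factor dimensions named above (person, environment, management) and the accident type as the rule consequent; it is not the paper's data or procedure.

```python
from itertools import combinations

# Hypothetical accident records; each maps a dimension to a value.
RECORDS = [
    {"person": "untrained", "environment": "soft_soil", "management": "no_inspection", "type": "collapse"},
    {"person": "untrained", "environment": "soft_soil", "management": "inspected", "type": "collapse"},
    {"person": "trained", "environment": "soft_soil", "management": "no_inspection", "type": "collapse"},
    {"person": "trained", "environment": "rock", "management": "inspected", "type": "fall"},
    {"person": "untrained", "environment": "rock", "management": "no_inspection", "type": "fall"},
]


def strong_rules(records, target, min_support, min_confidence, max_len=2):
    """Rules (dim1=v1 AND ...) -> type=target with at most one predicate per
    dimension, kept when both support and confidence meet their thresholds."""
    n = len(records)
    dims = sorted(k for k in records[0] if k != "type")
    found = []
    for size in range(1, max_len + 1):
        for chosen in combinations(dims, size):
            for values in {tuple(r[d] for d in chosen) for r in records}:
                lhs = dict(zip(chosen, values))
                matches = [r for r in records if all(r[d] == v for d, v in lhs.items())]
                hits = sum(1 for r in matches if r["type"] == target)
                support, confidence = hits / n, hits / len(matches)
                if support >= min_support and confidence >= min_confidence:
                    found.append((lhs, round(support, 2), round(confidence, 2)))
    return found


for lhs, support, confidence in strong_rules(RECORDS, "collapse", 0.4, 0.8):
    print(lhs, "-> type=collapse", support, confidence)
```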

  17. Design of Intrusion Detection Model Based on FP-Growth and Dynamic Rule Generation with Clustering

    Directory of Open Access Journals (Sweden)

    Manish Somani

    2013-06-01

    Intrusion detection is the process used to identify intrusions. In the current scenario, several new intrusions cannot be prevented by previous algorithms, so an intrusion detection system (IDS) is introduced to detect possible violations of a security policy by continuously monitoring system activities and responding to them. If the attack type can be detected in a particular communication environment, a response can be initiated to prevent or minimize damage to the system, so this is a crucial concern. We present an efficient framework for intrusion detection based on Association Rule Mining (ARM) and K-Means clustering. K-Means clustering is used to separate similar elements, after which association rule mining is used for better detection. Detection Rate (DR), False Positive Rate (FPR), and False Negative Rate (FNR) are used to measure performance and analyze the experimental results.
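    The three evaluation measures named above can be computed from a confusion matrix. A minimal sketch, assuming the usual definitions DR = TP/(TP+FN), FPR = FP/(FP+TN) and FNR = FN/(TP+FN); the labels below are hypothetical, and the paper's exact definitions are not given in the abstract.

```python
def ids_metrics(actual, predicted):
    """actual/predicted: sequences of booleans, True = attack."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    fp = sum(p and not a for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    dr = tp / (tp + fn)   # detection rate: share of attacks flagged
    fpr = fp / (fp + tn)  # share of normal traffic wrongly flagged
    fnr = fn / (tp + fn)  # share of attacks missed (equals 1 - dr)
    return dr, fpr, fnr


# Hypothetical labels: 4 attacks and 6 normal connections.
actual = [True] * 4 + [False] * 6
predicted = [True, True, True, False, False, False, False, False, True, False]
print(ids_metrics(actual, predicted))
```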

  19. Study on Medication Rules of the Dunhuang Manuscript Fu Xing Jue Based on Data Mining

    Institute of Scientific and Technical Information of China (English)

    李廷保

    2016-01-01

    Objective To provide references for the clinical application of Dunhuang prescriptions for treating internal medicine diseases by analyzing the medication rules of the Dunhuang manuscript Fu Xing Jue with a data mining method. Methods TCM prescriptions for internal medicine diseases in the Dunhuang manuscript Fu Xing Jue were entered into a computer, and Excel 2003 software was used to establish the relevant database. A data mining method was used to analyze the medication rules. Results There were 61 TCM prescriptions in the Dunhuang manuscript Fu Xing Jue, including 66 kinds of Chinese herbal medicine with a total usage frequency of 336. The core single herbs used were Glycyrrhizae Radix et Rhizoma, Paeoniae Radix Alba, Zingiberis Rhizoma, Zingiberis Rhizoma Recens, Scutellariae Radix, Inulae Flos, Ginseng Radix et Rhizoma, Jujubae Fructus, Lophatheri Herba, Schisandrae Chinensis Fructus, and Cinnamomi Ramulus. The medicine types were tonifying-deficiency medicines, heat-clearing medicines, exterior-syndrome-relieving medicines, interior-warming medicines, and antitussive and antiasthmatic medicines, with a cumulative frequency of 80.66%; the medicine flavors were bitter, sweet and pungent, with a cumulative frequency of 83.91%; the medicine properties were cold, warm and neutral, with a cumulative frequency of 87.95%; and the channel tropisms were stomach, lung, spleen, heart, kidney and liver, with a cumulative frequency of 86.15%. The clinical compatibility of medicines for treating internal medicine diseases in Dunhuang medical prescriptions mainly consisted of qi-tonifying medicines (Glycyrrhizae Radix et Rhizoma, Ginseng Radix et Rhizoma and Jujubae Fructus), blood-replenishing medicine (Paeoniae Radix Alba), and nourishing medicines (Ophiopogonis Radix), supplemented by heat-clearing medicines (Lophatheri Herba and Scutellariae Radix), with exterior-syndrome-relieving medicines (Zingiberis Rhizoma Recens and Cinnamomi Ramulus), interior-warming medicine (Zingiberis Rhizoma
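    Cumulative frequencies like those reported above are obtained by ranking categories by usage count and accumulating their share of all usages. A minimal sketch, assuming a few hypothetical prescriptions built from herb names listed in the abstract; the counts are illustrative and do not reproduce the study's figures.

```python
from collections import Counter


def cumulative_share(prescriptions, top_n):
    """Top herbs by usage frequency, with the cumulative percentage of all
    herb usages covered up to and including each one."""
    counts = Counter(herb for p in prescriptions for herb in p)
    total = sum(counts.values())
    running, table = 0, []
    for herb, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]:
        running += n
        table.append((herb, n, round(100 * running / total, 2)))
    return table


# Hypothetical prescriptions (lists of herbs).
PRESCRIPTIONS = [
    ["Glycyrrhizae Radix et Rhizoma", "Paeoniae Radix Alba", "Zingiberis Rhizoma"],
    ["Glycyrrhizae Radix et Rhizoma", "Ginseng Radix et Rhizoma"],
    ["Glycyrrhizae Radix et Rhizoma", "Paeoniae Radix Alba"],
    ["Jujubae Fructus"],
]
print(cumulative_share(PRESCRIPTIONS, 2))
# [('Glycyrrhizae Radix et Rhizoma', 3, 37.5), ('Paeoniae Radix Alba', 2, 62.5)]
```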

  20. Mercury and trace element contents of Donbas coals and associated mine water in the vicinity of Donetsk, Ukraine

    Energy Technology Data Exchange (ETDEWEB)

    Kolker, Allan [U.S. Geological Survey, 956 National Center, Reston, VA 20192 (United States); Panov, Boris S.; Panov, Yuri B.; Korchemagin, Viktor A.; Shendrik, Tatiana [Department of Mineral Deposits and Ecological Geology, Donetsk National Technical University, Donetsk, 83000 (Ukraine); Landa, Edward R.; Conko, Kathryn M. [U.S. Geological Survey, 430 National Center, Reston, VA 20192 (United States); McCord, Jamey D. [U.S. Geological Survey, 973 Denver Federal Center, Denver, CO 80225 (United States)

    2009-08-01

    Mercury-rich coals in the Donets Basin (Donbas region) of Ukraine were sampled in active underground mines to assess the levels of potentially harmful elements and the potential for dispersion of metals through use of this coal. For 29 samples representing c₁₁ to m₃ Carboniferous coals, mercury contents range from 0.02 to 3.5 ppm (whole-coal dry basis). Mercury is well correlated with pyritic sulfur (0.01 to 3.2 wt.%), with an r² of 0.614 (one outlier excluded). Sulfides in these samples show enrichment of minor constituents in late-stage pyrite formed as a result of interaction of coal with hydrothermal fluids. Mine water sampled at depth and at surface collection points does not show enrichment of trace metals at harmful levels, indicating pyrite stability at subsurface conditions. Four samples of coal exposed in the defunct open-cast Nikitovka mercury mines in Gorlovka have extreme mercury contents of 12.8 to 25.5 ppm. This coal was formerly produced as a byproduct of extracting sandstone-hosted cinnabar ore. Access to these workings is unrestricted and small amounts of extreme mercury-rich coal are collected for domestic use, posing a limited human health hazard. More widespread hazards are posed by the abandoned Nikitovka mercury processing plant, the extensive mercury mine tailings, and mercury enrichment of soils extending into residential areas of Gorlovka. (author)