WorldWideScience

Sample records for association rule mining

  1. A Collaborative Educational Association Rule Mining Tool

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; de Castro, Carlos

    2011-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the ongoing improvement of e-learning courses and allowing teachers with similar course profiles to share and score the discovered information. The mining tool is oriented to be used by non-expert instructors in data mining so its internal…

  2. Efficient Mining of Intertransaction Association Rules

    NARCIS (Netherlands)

    Tung, A.K.H.; Lu, H.J.; Han, J.W.; Feng, L.

    Most of the previous studies on mining association rules are on mining intratransaction associations, i.e., the associations among items within the same transaction where the notion of the transaction could be the items bought by the same customer, the events that happened on the same day, etc. In this

  3. Online association rule mining over fast data

    OpenAIRE

    Ölmezoğulları, E.; Arı, İsmail

    2013-01-01

    To extract useful and actionable information in real-time, the information technology (IT) world is coping with big data problems today. In this paper, we present implementation details and performance results of ReCEPtor, our system for "online" Association Rule Mining (ARM) over big and fast data streams. Specifically, we added Apriori and two different FP-Growth algorithms insi...

  4. Privacy-preserving distributed mining of association rules using ...

    Indian Academy of Sciences (India)

    Harendra Chahar

    2017-11-17

    ... hamper global mining result. In addition, it should have low communication and computational cost. 2.3 Privacy-preserving distributed association rules mining: existing proposals. There have been several works to date for privacy-preserving distributed association rule mining. Existing approaches can be ...

  5. AN ALGORITHM FOR GENERATING SINGLE DIMENSIONAL FUZZY ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    Rolly Intan

    2006-01-01

    Association rule mining searches for interesting relationships among items in a large data set. Market basket analysis, a typical example of association rule mining, analyzes the buying habits of customers by finding associations between the different items that customers put in their shopping cart (basket). The Apriori algorithm is an influential algorithm for mining frequent itemsets for generating association rules. In some respects, the Apriori algorithm is not based on human intuition. To provide a more human-based concept, this paper proposes an alternative algorithm for generating association rules by utilizing fuzzy sets in market basket analysis.
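As a point of reference for the Apriori algorithm mentioned in several of these records, the following is a minimal sketch of classical (crisp, non-fuzzy) level-wise frequent itemset mining followed by rule generation. It is not the fuzzy algorithm proposed in the record above; the function names `apriori` and `rules` and the toy baskets in the usage note are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples derived from frequent itemsets."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already available in `frequent`.
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, supp, confidence))
    return found
```

A call such as `rules(apriori(baskets, 0.5), 0.9)` on a small list of item sets returns the rules whose itemset appears in at least half of the baskets and whose confidence is at least 0.9. The prune step is what keeps the candidate set small: an itemset is only counted if all of its smaller subsets were already found frequent.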

  6. Boosting association rule mining in large datasets via Gibbs sampling

    Science.gov (United States)

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-01-01

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling–induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm. PMID:27091963

  7. A RESEARCH ON SPATIAL TOPOLOGICAL ASSOCIATION RULES MINING

    Directory of Open Access Journals (Sweden)

    J. Chen

    2012-07-01

    Spatial association rules mining is a process of acquiring information and knowledge from large databases. Due to the nature of geographic space and the complexity of spatial objects and relations, the classical association rule mining methods are not suitable for spatial association rule mining. Classical association rule mining treats all input data as independent, while spatial association rules often show high autocorrelation among nearby objects. The contiguous, adjacent and neighboring relations between spatial objects are important topological relations. In this paper a new approach based on topological predications to discover spatial association rules is presented. First, we develop a fast method to get the topological relationship of spatial data with its algebraic structure. Then the spatial objects of interest are selected, using topological relations combined with distance. In this step, the frequent topological predications are obtained. Next, the attribute datasets of the selected spatial objects are mined with the Apriori algorithm. Finally, the spatial topological association rules are obtained. The presented approach has been implemented and tested on data for GDP per capita, railroads and roads in China in 2005 at county level. The results of the experiments show that the approach is effective and valid.

  8. Loss profit estimation using association rule mining with clustering

    Directory of Open Access Journals (Sweden)

    Mandeep Mittal

    2015-02-01

    Data mining is the technique of finding hidden patterns in a very large volume of historical data. Association rule mining is a type of data mining that correlates one set of items or events with another set of items or events. Another data mining strategy is clustering, which is used to create partitions so that all members of each set are similar according to a specified set of metrics. Both association rule mining and clustering help in more effective individual and group decision making for optimal inventory control. Accordingly, association rules are mined from each cluster to find frequent items and then loss profit is calculated for each frequent item. Initially, the clustering algorithm is used to partition the transactional database into different clusters. Apriori, a classic data mining algorithm, is utilized for mining association rules from each cluster to find frequent items. Later the loss profit is calculated for each frequent item. The obtained loss profit is used to rank frequent items in each cluster. Thus, the ranking of frequent items in each cluster using the proposed approach greatly facilitates optimal inventory control. An example is illustrated to validate the results.

  9. Association Rule Mining from an Intelligent Tutor

    Science.gov (United States)

    Dogan, Buket; Camurcu, A. Yilmaz

    2008-01-01

    Educational data mining is a very novel research area, offering fertile ground for many interesting data mining applications. Educational data mining can extract useful information from educational activities for better understanding and assessment of the student learning process. In this way, it is possible to explore how students learn topics in…

  10. Gain ratio based fuzzy weighted association rule mining classifier for ...

    Indian Academy of Sciences (India)

    ... The health care environment still needs knowledge based discovery for handling wealth of data. ... approach, called gain ratio based fuzzy weighted association rule mining, is thus proposed for distinct diseases and also increase the learning time of the previous one.

  11. Konstruksi Bayesian Network Dengan Algoritma Bayesian Association Rule Mining Network

    OpenAIRE

    Octavian

    2015-01-01

    In recent years, the Bayesian Network has become a popular concept used in many areas of life, such as decision making and determining the probability that an event will occur. Unfortunately, constructing the structure of a Bayesian Network is itself not a simple matter. Therefore, this study introduces the Bayesian Association Rule Mining Network algorithm to make it easier to construct a Bayesian Network based on data ...

  12. Promoter Sequences Prediction Using Relational Association Rule Mining

    Science.gov (United States)

    Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely

    2012-01-01

    In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233

  13. Fast rule-based bioactivity prediction using associative classification mining

    Directory of Open Access Journals (Sweden)

    Yu Pulan

    2012-11-01

    Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community, but so far have not been applied widely in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high speed mining. Additionally, they provide comparable accuracy and efficiency to the commonly used Bayesian and support vector machines (SVM) methods, and produce highly interpretable models.

  14. Fast rule-based bioactivity prediction using associative classification mining.

    Science.gov (United States)

    Yu, Pulan; Wild, David J

    2012-11-23

    Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community, but so far have not been applied widely in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high speed mining. Additionally, they provide comparable accuracy and efficiency to the commonly used Bayesian and support vector machines (SVM) methods, and produce highly interpretable models.

  15. Feasibility study for banking loan using association rule mining classifier

    Directory of Open Access Journals (Sweden)

    Agus Sasmito Aribowo

    2015-03-01

    The problem of bad loans in a koperasi (cooperative) can be reduced if the koperasi can detect whether a member will be able to pay off the mortgage debt or not. The method used in this study to identify characteristic patterns of prospective lenders is called the Association Rule Mining Classifier. Patterns of credit members are converted into knowledge and used to classify other creditors. The classification process separates creditors into two groups: good credit and bad credit. The research uses prototyping to implement the design as an application using a programming language and development tool. The association rule mining process uses the Weighted Itemset Tidset (WIT)-tree method. The results show that the method can predict the credit of prospective customers. The training data set consists of 120 customers whose credit history is already known. The test data consists of 61 customers who applied for credit. The results conclude that 42 customers would pay off their loans and 19 would not.

  16. Parametric Rough Sets with Application to Granular Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Xu He

    2013-01-01

    Granular association rules reveal patterns hidden in many-to-many relationships which are common in relational databases. In recommender systems, these rules are appropriate for cold-start recommendation, where a customer or a product has just entered the system. An example of such rules might be “40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol.” Mining such rules is a challenging problem due to pattern explosion. In this paper, we build a new type of parametric rough sets on two universes and propose an efficient rule mining algorithm based on the new model. Specifically, the model is deliberately defined such that the parameter corresponds to one threshold of rules. The algorithm benefits from the lower approximation operator in the new model. Experiments on two real-world data sets show that the new algorithm is significantly faster than an existing algorithm, and the performance of recommender systems is stable.

  17. COLLABORATIVE NETWORK SECURITY MANAGEMENT SYSTEM BASED ON ASSOCIATION MINING RULE

    Directory of Open Access Journals (Sweden)

    Nisha Mariam Varughese

    2014-07-01

    Security is one of the major challenges in open networks. There are many types of attacks, which either follow fixed patterns or frequently change their patterns. It is difficult to find malicious attacks that do not have any fixed pattern. Distributed Denial of Service (DDoS) attacks like Botnets are used to slow down system performance. To address such problems, a Collaborative Network Security Management System (CNSMS) is proposed along with the association mining rule. The CNSMS consists of a collaborative Unified Threat Management (UTM), a cloud based security centre and a traffic prober. The traffic prober captures the internet traffic and passes it to the collaborative UTM. Traffic is analysed by the collaborative UTM to determine whether it contains any malicious attack. If any security event occurs, it is reported to the cloud based security centre. The security centre generates security rules based on the association mining rule and distributes them to the network. The cloud based security centre is used to store the huge amount of traffic, its logs and the security rules generated. The feedback is evaluated and the invalid rules are eliminated to improve the system efficiency.

  18. Beyond Intra-Transaction Association Analysis: Mining Multi-Dimensional Inter-Transaction Association Rules

    NARCIS (Netherlands)

    Lu, H.J.; Feng, L.; Han, J.W.

    2000-01-01

    In this paper, we extend the scope of mining association rules from traditional single-dimensional intratransaction associations, to multidimensional intertransaction associations. Intratransaction associations are the associations among items with the same transaction, where the notion of the

  19. A partition enhanced mining algorithm for distributed association rule mining systems

    Directory of Open Access Journals (Sweden)

    A.O. Ogunde

    2015-11-01

    The extraction of patterns and rules from large distributed databases through existing Distributed Association Rule Mining (DARM) systems is still faced with enormous challenges such as high response times, high communication costs and inability to adapt to constantly changing databases. In this work, a Partition Enhanced Mining Algorithm (PEMA) is presented to address these problems. In PEMA, the Association Rule Mining Coordinating Agent receives a request and decides the appropriate data sites, partitioning strategy and mining agents to use. The mining process is divided into two stages. In the first stage, the data agents horizontally segment the databases with small average transaction length into relatively smaller partitions based on the number of available sites and the available memory. Databases with relatively large average transaction length, on the other hand, are vertically partitioned. After this, Mobile Agent-Based Association Rule Mining-Agents, which are the mining agents, carry out the discovery of the local frequent itemsets. In the second stage, the local frequent itemsets are incrementally integrated from one data site to another to get the global frequent itemsets. This reduces the response time and communication cost in the system. Results from experiments conducted on real datasets showed that the average response time of PEMA improved over existing algorithms. Similarly, PEMA incurred lower communication costs, with a lower average size of messages exchanged when compared with benchmark DARM systems. This result showed that PEMA could be efficiently deployed for discovery of valuable knowledge in distributed databases.
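The two-stage flow described in the record above (mine each partition locally, then combine the local results into global frequent itemsets) resembles the classic partition-based scheme, which can be sketched as follows. This is not PEMA itself: there are no agents, no vertical partitioning and no incremental site-to-site integration here, and the names `local_frequent`, `global_frequent` and the `max_size` cap are illustrative assumptions. The sketch relies on one fact: an itemset that is frequent in the whole database must be frequent in at least one horizontal partition.

```python
from itertools import combinations


def local_frequent(partition, min_support, max_size=3):
    """Itemsets (as sorted tuples) that are frequent within one partition.

    Brute-force counting, capped at `max_size` items per itemset.
    """
    counts = {}
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] = counts.get(itemset, 0) + 1
    threshold = min_support * len(partition)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def global_frequent(partitions, min_support, max_size=3):
    """Stage 1: union of local results. Stage 2: verify against all data."""
    candidates = set().union(
        *(local_frequent(p, min_support, max_size) for p in partitions)
    )
    total = sum(len(p) for p in partitions)
    result = {}
    for itemset in candidates:
        # One counting pass over every partition for each global candidate.
        count = sum(
            1 for p in partitions for t in p if set(itemset) <= set(t)
        )
        if count >= min_support * total:
            result[itemset] = count
    return result
```

Only the candidate itemsets, not the raw transactions, need to cross partition boundaries in stage 1, which is the usual argument for why this style of algorithm keeps communication costs down.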

  20. Penguins Search Optimisation Algorithm for Association Rules Mining

    Directory of Open Access Journals (Sweden)

    Youcef Gheraibia

    2016-06-01

    Association Rules Mining (ARM) is one of the most popular and well-known approaches for the decision-making process. All existing ARM algorithms are time consuming and generate a very large number of association rules with high overlapping. To deal with this issue, we propose a new ARM approach based on the penguins search optimization algorithm (Pe-ARM for short). Moreover, an efficient measure is incorporated into the main process to evaluate the amount of overlapping among the generated rules. The proposed approach also ensures a good diversification over the whole solution space. To demonstrate the effectiveness of the proposed approach, several experiments have been carried out on different datasets, specifically on biological ones. The results reveal that the proposed approach outperforms the well-known ARM algorithms in both execution time and solution quality.

  1. An Algorithm of Association Rule Mining for Microbial Energy Prospection

    Science.gov (United States)

    Shaheen, Muhammad; Shahbaz, Muhammad

    2017-01-01

    The presence of hydrocarbons beneath earth’s surface produces some microbiological anomalies in soils and sediments. The detection of such microbial populations involves pure biochemical processes which are specialized, expensive and time consuming. This paper proposes a new algorithm of context based association rule mining on non-spatial data. The algorithm is a modified form of an already developed algorithm which was for spatial databases only. The algorithm is applied to mine context based association rules on a microbial database to extract interesting and useful associations of microbial attributes with the existence of a hydrocarbon reserve. The surface and soil manifestations caused by the presence of hydrocarbon oxidizing microbes are selected from existing literature and stored in a shared database. The algorithm is applied on the said database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability of hydrocarbon’s existence. The numerical evaluation shows better accuracy for non-spatial data as compared to conventional algorithms at generating reliable and robust rules. PMID:28393846

  2. Mining Association Rules in the BCCA Liver Cancer Data Set.

    Science.gov (United States)

    Pinheiro, Fabiola; Kuo, Mu-Hsing; Thomo, Alex; Barnett, Jeff

    2015-01-01

    The objective of this study is to apply data mining techniques to determine factors that are commonly associated with liver cancer incidence, using an anonymized data set of 6064 patients from the British Columbia Cancer Agency (BCCA). The association rules indicate that in BC the patient demographic factors associated with increased liver cancer include: age ranges 60-69, male gender, and geographic location in the Greater Vancouver area. The main factors associated with decreased survivability in BC were being male and in the age range 70-79. In the Yukon, being male and in the age range 60-69 was the main factor associated with both increased incidence of liver cancer and decreased survivability.

  3. A novel association rule mining approach using TID intermediate itemset.

    Science.gov (United States)

    Aqra, Iyad; Herawan, Tutut; Abdul Ghani, Norjihan; Akhunzada, Adnan; Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold, either to minimize or maximize the output knowledge, forces the extant state-of-the-art algorithms to rescan the entire database. The process therefore incurs heavy computation cost and is not feasible for real-time applications. The paper efficiently addresses the problem of dynamic threshold updating for a given purpose. The paper contributes by presenting a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values. This improves the overall efficiency, as we no longer need to scan the whole database. After the entire itemset is built, we are able to obtain the real support without the need to rebuild the itemset (e.g. the itemset list is intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of our proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.
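The remark in the record above that an itemset list "is intersected to obtain the actual support" can be illustrated with a generic vertical (transaction-ID list) layout. The sketch below is not the authors' TID intermediate itemset structure; it only shows, under that assumption, why a changed threshold does not require rescanning the database. The names `build_tidlists`, `support_count` and `frequent_at` are hypothetical.

```python
from collections import defaultdict


def build_tidlists(transactions):
    """One pass over the data: map each item to the set of transaction IDs containing it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support_count(itemset, tidlists):
    """Support count of an itemset = size of the intersection of its items' TID sets."""
    lists = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*lists)) if lists else 0


def frequent_at(candidates, tidlists, min_count):
    """Filter candidates by a threshold using only the TID lists (no database rescan)."""
    result = {}
    for itemset in candidates:
        count = support_count(itemset, tidlists)
        if count >= min_count:
            result[itemset] = count
    return result
```

Once `build_tidlists` has run, `frequent_at` can be called repeatedly with different `min_count` values; each call touches only the in-memory TID sets, which is the property that makes threshold changes cheap in a vertical layout.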

  4. Mining Association Rules in Dengue Gene Sequence with Latent Periodicity

    Directory of Open Access Journals (Sweden)

    Marimuthu Thangam

    2015-01-01

    The mining of periodic patterns in dengue database is an interesting research problem that can be used for predicting the future evolution of dengue viruses. In this paper, we propose an algorithm called Recurrence Finder (RECFIN) that uses the suffix tree for detecting the periodic patterns of dengue gene sequence. Also, the RECFIN finds the presence of palindrome which indicates the possibilities of formation of proteins. Further, this paper computes the periodicity of nucleic acid and amino acid sequences of any length. The periodicity based association rules are used to diagnose the type of dengue. The time complexity of the proposed algorithm is O(n^2). We demonstrate the effectiveness of the proposed approach by comparing the experimental results performed on dengue virus serotypes dataset with NCBI-BLAST algorithm.

  5. Action Rules Mining

    CERN Document Server

    Dardzinska, Agnieszka

    2013-01-01

    We are surrounded by data, numerical, categorical and otherwise, which must be analyzed and processed to convert it into information that instructs, answers or aids understanding and decision making. Data analysts in many disciplines such as business, education or medicine, are frequently asked to analyze new data sets which are often composed of numerous tables possessing different properties. They try to find completely new correlations between attributes and show new possibilities for users. Action Rules Mining discusses some data mining and knowledge discovery principles and then describes representative concepts, methods and algorithms connected with action. The author introduces the formal definition of an action rule, the notion of a simple association action rule and a representative action rule, and the cost of an association action rule, and gives a strategy for constructing simple association action rules of the lowest cost. A new approach for generating action rules from datasets with numerical attributes...

  6. MINING ASSOCIATION RULES TO EVALUATE CONSUMER PERCEPTION: A NEW FP-TREE APPROACH

    OpenAIRE

    Nandini Das; Avishek Ghosh; Prasun Das

    2011-01-01

    Association rule mining finds interesting relationships among large set of data items. While finding the important (or, frequent) relations from the set of consumer survey data, a modified algorithm based on frequent pattern growth is developed in this work. The sensitivity of support and confidence used for rule mining on the data is tested. The interaction between the order of the attributes and the confidence used is observed in terms of the number of rules mined. The impact of the produc...

  7. Validity of association rules extracted by healthcare-data-mining.

    Science.gov (United States)

    Takeuchi, Hiroshi; Kodama, Naoki

    2014-01-01

    A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study has verified that the rules extracted from daily time-series data stored over half a year by volunteer users of this system are valid.

  8. Gain ratio based fuzzy weighted association rule mining classifier for ...

    Indian Academy of Sciences (India)

    2: 271–277. Chen C-H, Tseng V S and Hong T-P 2008 Cluster-based evaluation in fuzzy-genetic data mining. IEEE. Trans. Fuzzy Syst. 16(1): 249 del Jesus M J, González P, Herrera F and Mesonero M 2007 Evolutionary fuzzy rule induction process for subgroup discovery: A case study in marketing. IEEE Trans. Fuzzy Syst.

  9. Discovery of association rules between syntactic variables. Data mining the Syntactic Atlas of the Dutch dialects.

    NARCIS (Netherlands)

    Spruit, M.R.; Dirix, P.; Schuurman, I.; Vandeghinste, V.; Van Eynde, F.

    2007-01-01

    This research applies an association rule mining technique to purely syntactic dialect data. The paper answers the research question of how relevant associations between syntactic variables can be discovered. The method calculates the proportional overlap between geographical distributions of

  10. RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

    Science.gov (United States)

    Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2015-01-01

    Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules of items (or, genes) evolved by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique (say, RANWAR or rank-based weighted association rule-mining) to rank the rules using two novel rule-interestingness measures, viz., rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc), to bypass the problem. These measures depend on the rank of items (genes). Using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than the state-of-the-art association rule mining algorithms. Thus, it saves execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontologies (GOs) and KEGG pathway analyses. Many top ranked rules extracted from RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules evolved from RANWAR that are not in Apriori are reported.

  11. Mining Interesting XML-Enabled Association Rules with Templates

    NARCIS (Netherlands)

    Feng, L.; Dillon, T.

    2004-01-01

    XML-enabled association rule framework [FDWC03] extends the notion of associated items to XML fragments to present associations among trees rather than simple-structured items of atomic values. They are more flexible and powerful in representing both simple and complex structured association

  12. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets

    Directory of Open Access Journals (Sweden)

    Michael Hahsler

    2005-09-01

    Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.

  13. PARM--an efficient algorithm to mine association rules from spatial data.

    Science.gov (United States)

    Ding, Qin; Ding, Qiang; Perrizo, William

    2008-12-01

    Association rule mining, originally proposed for market basket data, has potential applications in many areas. Spatial data, such as remote sensed imagery (RSI) data, is one of the promising application areas. Extracting interesting patterns and rules from spatial data sets, composed of images and associated ground data, can be of importance in precision agriculture, resource discovery, and other areas. However, in most cases, the sizes of the spatial data sets are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an efficient approach to derive association rules from spatial data using Peano Count Tree (P-tree) structure. P-tree structure provides a lossless and compressed representation of spatial data. Based on P-trees, an efficient association rule mining algorithm PARM with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-tree based Association Rule Mining (PARM) algorithm is implemented and compared with FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data.

  14. MINING ASSOCIATION RULES TO EVALUATE CONSUMER PERCEPTION: A NEW FP-TREE APPROACH

    Directory of Open Access Journals (Sweden)

    Nandini Das

    2011-06-01

    Full Text Available Association rule mining finds interesting relationships among a large set of data items. To find the important (or frequent) relations in a set of consumer survey data, a modified algorithm based on frequent pattern growth is developed in this work. The sensitivity of the support and confidence used for rule mining on the data is tested. The interaction between the order of the attributes and the confidence used is observed in terms of the number of rules mined. The impact of the product features on the level of consumer perception is thoroughly studied.

  15. EOQ estimation for imperfect quality items using association rule mining with clustering

    Directory of Open Access Journals (Sweden)

    Mandeep Mittal

    2015-09-01

    Full Text Available Timely identification of newly emerging trends is needed in business processes. Data mining techniques such as clustering, association rule mining, and classification are very important for business support and decision making. This paper presents a method for redesigning the ordering policy by including the cross-selling effect. Initially, association rules are mined on the transactional database and EOQ is estimated with revenue earned. Then, transactions are clustered to obtain homogeneous clusters, and association rules are mined in each cluster to estimate EOQ with revenue earned for each cluster. Further, this paper compares ordering policies for imperfect quality items developed by applying rules derived from the Apriori algorithm, viz. (a) without clustering the transactions, and (b) after clustering the transactions. A numerical example is given to validate the results.

  16. From Intra-transaction to Generalized Inter-transaction: Landscaping Multidimensional Contexts in Association Rule Mining

    NARCIS (Netherlands)

    Li, Q; Feng, L.; Wong, A.K.Y.

    The problem of mining multidimensional inter-transactional association rules was recently introduced in [ACM Trans. Inform. Syst. 18(4) (2000) 423; Proc. of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Seattle, Washington, June 1998, p. 12:1]. It extends the

  17. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Science.gov (United States)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways of learning and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a serious problem that causes great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners, and so on. Therefore, this paper focuses on a new data mining algorithm that combines the advantages of the genetic algorithm and the simulated annealing algorithm, called ARGSA (Association rules based on an improved Genetic Simulated Annealing Algorithm), to mine the association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Annealing Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments are carried out to show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  18. Cross-Ontology Multi-level Association Rule Mining in the Gene Ontology

    Science.gov (United States)

    Manda, Prashanti; Ozkan, Seval; Wang, Hui; McCarthy, Fiona; Bridges, Susan M.

    2012-01-01

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method on publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross ontology rules from the two datasets respectively. We show that our approach discovers more and higher quality association rules from the GO as evaluated by biologists in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms. PMID:23071802

  19. Co-Operative Coevolutionary Neural Networks for Mining Functional Association Rules.

    Science.gov (United States)

    Wang, Bing; Merrick, Kathryn E; Abbass, Hussein A

    2017-06-01

    In this paper, we introduce a novel form of association rules (ARs) that does not require discretization of continuous variables or the use of intervals on either side of the rule. This rule form captures nonlinear relationships among variables and provides an alternative pattern representation for mining essential relations hidden in a given data set. We refer to the new rule form as a functional AR (FAR). A new neural network-based, co-operative, coevolutionary algorithm is presented for FAR mining. The algorithm is applied to both synthetic and real-world data sets, and its performance is analyzed. The experimental results show that the proposed mining algorithm is able to discover valid and essential underlying relations in the data. Comparison experiments are also carried out with the two state-of-the-art AR mining algorithms that can handle continuous variables to demonstrate the competitive performance of the proposed method.

  20. Spatio-Temporal Rule Mining

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2005-01-01

    Recent advances in communication and information technology, such as the increasing accuracy of GPS technology and the miniaturization of wireless communication devices, pave the road for Location-Based Services (LBS). To achieve high quality for such services, spatio-temporal data mining techniques are needed. In this paper, we describe experiences with spatio-temporal rule mining in a Danish data mining company. First, a number of real world spatio-temporal data sets are described, leading to a taxonomy of spatio-temporal data. Second, the paper describes a general methodology that transforms the spatio-temporal rule mining task to the traditional market basket analysis task and applies it to the described data sets, enabling traditional association rule mining methods to discover spatio-temporal rules for LBS. Finally, unique issues in spatio-temporal rule mining are identified and discussed.

  1. Negative and positive association rules mining from text using frequent and infrequent itemsets.

    Science.gov (United States)

    Mahmood, Sajid; Shahbaz, Muhammad; Guergachi, Aziz

    2014-01-01

    Association rule mining research typically focuses on positive association rules (PARs), generated from frequently occurring itemsets. However, in recent years, significant research has focused on finding interesting infrequent itemsets, leading to the discovery of negative association rules (NARs). The discovery of infrequent itemsets is far more difficult than that of their counterparts, that is, frequent itemsets. The problems include the discovery of infrequent itemsets, the generation of accurate NARs, and their huge number as compared with positive association rules. In medical science, for example, one is interested in factors that can either confirm the presence of a disease or rule out its possibility. The vivid positive symptoms are often obvious; however, negative symptoms are subtler and more difficult to recognize and diagnose. In this paper, we propose an algorithm for discovering positive and negative association rules among frequent and infrequent itemsets. We identify associations among medications, symptoms, and laboratory results using state-of-the-art data mining technology.
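
    To make the notion of a negative association rule concrete, the sketch below computes support and confidence for a rule of the form "A implies NOT B" using the standard identity supp(A and not B) = supp(A) - supp(A and B). It is not the algorithm proposed in the paper; the function names and the toy symptom records are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def negative_rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule: antecedent => NOT consequent.

    "NOT consequent" here means the consequent itemset is not fully present
    in the transaction.
    """
    supp_a = support(transactions, antecedent)
    supp_ab = support(transactions, set(antecedent) | set(consequent))
    # Transactions with A either contain B or do not, so the counts subtract.
    supp_a_not_b = supp_a - supp_ab
    conf = supp_a_not_b / supp_a if supp_a > 0 else 0.0
    return supp_a_not_b, conf


# Hypothetical records, for illustration only.
records = [
    {"fever", "cough"},
    {"fever"},
    {"fever", "rash"},
    {"cough"},
]
supp, conf = negative_rule_stats(records, {"fever"}, {"rash"})
```

    Because the negative rule is derived from two ordinary support counts, its confidence is simply one minus the confidence of the corresponding positive rule.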

  2. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

    Science.gov (United States)

    Tandon, Disha; Haque, Mohammed Monzoorul; Mande, Sharmila S

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithm i.e. the 'Apriori algorithm' for deriving 'microbial association rules' from the taxonomic profile of given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects. A Linux implementation of the Association Rule Mining (ARM

  3. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques

    Science.gov (United States)

    Mande, Sharmila S.

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithm i.e. the 'Apriori algorithm' for deriving 'microbial association rules' from the taxonomic profile of given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects. A Linux implementation of the Association Rule Mining (ARM

  4. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

    Directory of Open Access Journals (Sweden)

    Disha Tandon

    Full Text Available The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithm i.e. the 'Apriori algorithm' for deriving 'microbial association rules' from the taxonomic profile of given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects. A Linux implementation of the Association Rule

  5. MINING MULTIDIMENSIONAL FUZZY ASSOCIATION RULES FROM A DATABASE OF MEDICAL RECORD PATIENTS

    Directory of Open Access Journals (Sweden)

    Rolly Intan

    2008-01-01

    Full Text Available Mining association rules is one of the important tasks in data mining applications. In general, the input used for generating rules is taken from a single data table in which the corresponding values of every data domain are correlated with one another as given in the table. A problem arises when we need to generate rules expressing the relationship between two or more domains that belong to several different tables in a normalized database. To overcome the problem, before generating rules it is necessary to join the participating tables into a general table by a process called the Denormalization Process. This paper shows a process for mining Multidimensional Fuzzy Association Rules from a normalized database of patient medical records. The process consists of two sub-processes, namely the sub-process of joining tables (Denormalization Process) and the sub-process of generating fuzzy rules. In general, the process of generating the fuzzy rules has been discussed in our previous papers [1, 2, 3, 4]. In addition to the process of generating fuzzy rules, this paper proposes a correlation measure of the rules as an additional consideration for evaluating the interestingness of the generated rules.
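
    The join step described above can be sketched as follows. This is a minimal illustration of denormalizing two tables into one general table with an inner join on a shared key; the table names, column names, and the denormalize helper are hypothetical and are not taken from the paper, and the fuzzy rule generation step is not shown.

```python
# Two hypothetical normalized tables sharing the key "patient_id".
patients = [
    {"patient_id": 1, "age": 34},
    {"patient_id": 2, "age": 67},
]
diagnoses = [
    {"patient_id": 1, "disease": "flu"},
    {"patient_id": 2, "disease": "diabetes"},
    {"patient_id": 2, "disease": "hypertension"},
]


def denormalize(left, right, key):
    """Inner-join two lists of row dictionaries on a shared key column."""
    # Index the right-hand table by key so each left row is matched quickly.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            # Merge the columns of both rows into one row of the general table.
            joined.append({**left_row, **right_row})
    return joined


general_table = denormalize(patients, diagnoses, "patient_id")
```

    Each row of the resulting general table can then be treated as one transaction whose items are attribute-value pairs drawn from both source tables.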

  6. Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

    Directory of Open Access Journals (Sweden)

    Dinesh J. Prajapati

    2017-06-01

    Full Text Available Nowadays, there is an increasing demand for mining interesting patterns from big data. Analyzing such a huge amount of data is a computationally complex task when using traditional methods. The overall purpose of this paper is twofold. First, this paper presents a novel approach to identify consistent and inconsistent association rules from sales data located in a distributed environment. Second, the paper overcomes the main memory bottleneck and computing time overhead of a single computing system by distributing computations over a multi-node cluster. The proposed method initially extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a MapReduce-based frequent pattern mining algorithm with the Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) algorithms. The associations generated from frequent itemsets are so numerous that they become complex to analyze. Thus, a MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm is proposed to detect the consistent and inconsistent rules from big data and provide useful and actionable knowledge to domain experts. These pruned interesting rules also give useful knowledge for better marketing strategy. The extracted consistent and inconsistent rules are evaluated and compared based on different interestingness measures, presented together with experimental results that lead to the final conclusions.

  7. Interestingness of association rules in data mining: Issues relevant ...

    Indian Academy of Sciences (India)

    In this level playing field, firms are forced to compete on the basis of knowledge. Data mining tools and techniques provide e-commerce applications with novel and significant knowledge. This knowledge can be leveraged to gain competitive advantage. However, the automated nature of data mining algorithms may result in ...

  8. Mining Association Rules among Gene Functions in Clusters of Similar Gene Expression Maps.

    Science.gov (United States)

    An, Li; Obradovic, Zoran; Smith, Desmond; Bodenreider, Olivier; Megalooikonomou, Vasileios

    2009-11-01

    Association rules mining methods have been recently applied to gene expression data analysis to reveal relationships between genes and different conditions and features. However, not much effort has focused on detecting the relation between gene expression maps and related gene functions. Here we describe such an approach to mine association rules among gene functions in clusters of similar gene expression maps on mouse brain. The experimental results show that the detected association rules make sense biologically. By inspecting the obtained clusters and the genes having the gene functions of frequent itemsets, interesting clues were discovered that provide valuable insight to biological scientists. Moreover, discovered association rules can be potentially used to predict gene functions based on similarity of gene expression maps.

  9. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Ou

    2014-01-01

    Full Text Available This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM), which was announced by the IMO in 2010. The goal of this paper is to mine the association rules among the items of BRM and vessel accidents. However, because only indirect data can be collected, which seem useless for analyzing the relationship between the items of BRM and the accidents, cross-level association rules need to be studied to build the relation between the indirect data and the items of BRM. In this paper, firstly, a cross-level coding scheme for mining the multilevel association rules is proposed. Secondly, we execute the immune genetic algorithm with the coding scheme for analyzing BRM. Thirdly, based on the basic maritime investigation reports, some important association rules of the items of BRM are mined and studied. Finally, according to the results of the analysis, we provide suggestions for seafarer training, assessment, and management.

  10. Improving Intrusion Detection System Based on Snort Rules for Network Probe Attacks Detection with Association Rules Technique of Data Mining

    Directory of Open Access Journals (Sweden)

    Nattawat Khamphakdee

    2015-07-01

    Full Text Available The intrusion detection system (IDS) is an important network security tool for securing computer and network systems. It is able to detect and monitor network traffic data. Snort IDS is an open-source network security tool. It can search and match rules with network traffic data in order to detect attacks, and generate an alert. However, the Snort IDS can detect only known attacks. Therefore, we have proposed a procedure for improving Snort IDS rules, based on the association rules data mining technique for detection of network probe attacks. We employed the MIT-DARPA 1999 data set for the experimental evaluation. Since behavior pattern traffic data are both normal and abnormal, the abnormal behavior data is detected by way of the Snort IDS. The experimental results showed that the proposed Snort IDS rules, based on data mining detection of network probe attacks, proved more efficient than the original Snort IDS rules, as well as icmp.rules and icmp-info.rules of Snort IDS. The suitable parameters for the proposed Snort IDS rules are defined as follows: Min_sup set to 10%, and Min_conf set to 100%, and through the application of eight variable attributes. As more suitable parameters are applied, higher accuracy is achieved.
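
    The effect of the Min_sup and Min_conf thresholds can be illustrated with the sketch below, which enumerates candidate rules by brute force and keeps only those meeting both thresholds. It is not the authors' procedure and does not produce Snort rule syntax; the mine_rules function, the max_len limit, and the attribute-value strings are assumptions made for this example.

```python
from itertools import combinations


def mine_rules(transactions, min_sup, min_conf, max_len=3):
    """Brute-force rule mining: return (lhs, rhs, support, confidence) tuples.

    Suitable only for tiny inputs; real miners prune the candidate space.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = supp(itemset)
            if s == 0 or s < min_sup:
                continue
            # Split the frequent itemset into every antecedent/consequent pair.
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    conf = s / supp(lhs)
                    if conf >= min_conf:
                        rules.append((lhs, itemset - lhs, s, conf))
    return rules


# Hypothetical traffic records as attribute=value items.
records = [
    {"proto=icmp", "type=8", "label=probe"},
    {"proto=icmp", "type=8", "label=probe"},
    {"proto=icmp", "type=0", "label=normal"},
    {"proto=tcp", "flag=SYN", "label=normal"},
]
rules = mine_rules(records, min_sup=0.10, min_conf=1.0)
```

    With the confidence threshold at 100%, only rules that hold in every record matching the antecedent survive, which is why a rule such as proto=icmp implies label=probe is rejected here while type=8 implies label=probe is kept.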

  11. Interestingness of association rules in data mining: Issues relevant ...

    Indian Academy of Sciences (India)

    domain expert since they represent common place knowledge. Researchers in the data mining community have acknowledged the importance of ... Personalization or one-to-one marketing is the delivery of a targeted solution to a customer by using the customer's information such as likes, dislikes, preferences, etc (Murthi ...

  12. Clustering and summarising association rules mined from phenotype, genotype and environmental data concerning age-related hearing impairment.

    Science.gov (United States)

    Iltanen, Kati; Kiviharju, Sami; Ao, Lida; Juhola, Martti; Pyykkö, Ilmari

    2013-01-01

    In this study, we examine the applicability of association rules for analysing high-dimensional data concerning age-related hearing impairment (ARHI). The ARHI data of the study contain hundreds of variables concerning phenotype, genotype and environmental factors. The number of association rules produced from the data is too large for manual exploration of the raw rules and, furthermore, the rules overlap. Thus, the focus of our study is to develop an approach to cluster association rules into subsets and to summarise and represent the found rule subsets for easier exploration of rules. The results show that it is possible to efficiently extract rules representing interesting environmental factor-gene or gene-gene interactions. Finding suitable parameters for the association rule mining and the possibility to post-process the mined rules are essential. The developed approach facilitates rule exploration by grouping rules with items concerning the same phenomenon into the same subset and by revealing overlapping rules.

  13. The application of data mining to explore association rules between metabolic syndrome and lifestyles.

    Science.gov (United States)

    Huang, Yi Chao

    2013-01-01

    This study used an efficient data mining algorithm, called DCIP (the data cutting and inner product method), to explore association rules between the lifestyles of factory workers in Taiwan and the metabolic syndrome. A total of 1,216 workers in four companies completed a lifestyle questionnaire. Results of the questionnaire survey were integrated into the workers' health examination reports to form an attribute database of the metabolic syndrome. Among the association rules derived by DCIP, 80% of those on the list of the top 15 highest support counts are corroborated by medical literature or by healthcare professionals. These findings prove that data mining is a valid and effective research method, and that larger sample sizes will likely produce more accurate associations connecting the metabolic syndrome to specific lifestyles. The rules already verified can serve as a reference guide for the health management of factory workers. The remaining 20%, while still lacking hard evidence, provide fertile ground for future research.

  14. Association rule mining of cellular responses induced by metal and metal oxide nanoparticles.

    Science.gov (United States)

    Liu, Rong; France, Bryan; George, Saji; Rallo, Robert; Zhang, Haiyuan; Xia, Tian; Nel, Andre E; Bradley, Kenneth; Cohen, Yoram

    2014-03-07

    Relationships among fourteen different biological responses (including ten signaling pathway activities and four cytotoxicity effects) of murine macrophage (RAW264.7) and bronchial epithelial (BEAS-2B) cells exposed to six metal and metal oxide nanoparticles (NPs) were analyzed using both statistical and data mining approaches. Both the pathway activities and cytotoxicity effects were assessed using high-throughput screening (HTS) over an exposure period of up to 24 h and concentration range of 0.39-200 mg L⁻¹. HTS data were processed by outlier removal, normalization, and hit-identification (for significantly regulated cellular responses) to arrive at reliable multiparametric bioactivity profiles for the NPs. Association rule mining was then applied to the bioactivity profiles followed by a pruning process to remove redundant rules. The non-redundant association rules indicated that "significant regulation" of one or more cellular responses implies regulation of other (associated) cellular response types. Pairwise correlation analysis (via Pearson's χ² test) and self-organizing map clustering of the different cellular response types indicated consistency with the identified non-redundant association rules. Furthermore, in order to explore the potential use of association rules as a tool for data-driven hypothesis generation, specific pathway activity experiments were carried out for ZnO NPs. The experimental results confirmed the association rule identified for the p53 pathway and mitochondrial superoxide levels (via MitoSox reagent) and further revealed that blocking of the transcriptional activity of p53 lowered the MitoSox signal. The present approach of using association rule mining for data-driven hypothesis generation has important implications for streamlining multi-parameter HTS assays, improving the understanding of NP toxicity mechanisms, and selection of endpoints for the development of nanomaterial structure-activity relationships.

  15. Association rule mining data for census tract chemical exposure analysis

    Data.gov (United States)

    U.S. Environmental Protection Agency — Chemical concentration, exposure, and health risk data for U.S. census tracts from National Scale Air Toxics Assessment (NATA). This dataset is associated with the...

  16. Weighted Association Rule Mining for Item Groups with Different Properties and Risk Assessment for Networked Systems

    Science.gov (United States)

    Kim, Jungja; Ceong, Heetaek; Won, Yonggwan

    In market-basket analysis, weighted association rule (WAR) discovery can mine the rules that include more beneficial information by reflecting item importance for special products. In the point-of-sale database, each transaction is composed of items with similar properties, and item weights are pre-defined and fixed by a factor such as the profit. However, when items are divided into more than one group and the item importance must be measured independently for each group, traditional weighted association rule discovery cannot be used. To solve this problem, we propose a new weighted association rule mining methodology. The items should be first divided into subgroups according to their properties, and the item importance, i.e. item weight, is defined or calculated only with the items included in the subgroup. Then, transaction weight is measured by appropriately summing the item weights from each subgroup, and the weighted support is computed as the fraction of the transaction weights that contains the candidate items relative to the weight of all transactions. As an example, our proposed methodology is applied to assess the vulnerability to threats of computer systems that provide networked services. Our algorithm provides both quantitative risk-level values and qualitative risk rules for the security assessment of networked computer systems using WAR discovery. Also, it can be widely used for new applications with many data sets in which the data items are distinctly separated.
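
    The weighted support described above can be sketched as follows. The abstract says only that item weights are "appropriately" summed per subgroup, so this example assumes a plain sum; the function names, the group names, and the weight values are hypothetical and serve only to show the ratio being computed.

```python
def transaction_weight(transaction, group_weights):
    """Weight of one transaction.

    Assumption: a plain sum of the per-group item weights for the items
    present in the transaction (the source does not specify the exact sum).
    """
    return sum(
        weights.get(item, 0.0)
        for weights in group_weights.values()
        for item in transaction
    )


def weighted_support(transactions, candidate, group_weights):
    """Weight of transactions containing the candidate over the total weight."""
    candidate = set(candidate)
    total = sum(transaction_weight(t, group_weights) for t in transactions)
    covered = sum(
        transaction_weight(t, group_weights)
        for t in transactions
        if candidate <= set(t)
    )
    return covered / total if total else 0.0


# Hypothetical item groups; weights are defined independently within each group.
group_weights = {
    "service": {"http": 0.5, "ssh": 0.5},
    "vulnerability": {"weak_password": 0.75, "open_port": 0.25},
}
observations = [
    {"http", "open_port"},
    {"ssh", "weak_password"},
    {"ssh", "open_port"},
    {"http", "weak_password"},
]
ws_password = weighted_support(observations, {"weak_password"}, group_weights)
ws_ssh = weighted_support(observations, {"ssh"}, group_weights)
```

    In this toy data both candidates appear in half of the transactions, yet the candidate with the heavier item weight receives the higher weighted support, which is the behaviour that distinguishes weighted from unweighted support.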

  17. Study on the Method of Association Rules Mining Based on Genetic Algorithm and Application in Analysis of Seawater Samples

    Directory of Open Access Journals (Sweden)

    Qiuhong Sun

    2014-04-01

    Full Text Available This paper briefly introduces data mining based on genetic algorithms and discusses the two important theories underlying genetic algorithms, the template (schema) principle and implicit parallelism. Focusing on the application of genetic algorithms to association rule mining, the paper proposes improvements to the genetic algorithm, such as the structure of the fitness function and the data encoding. In particular, building on earlier studies, an improved adaptive Pc, Pm algorithm is applied to the genetic algorithm, thereby improving the efficiency of the algorithm. Finally, the genetic algorithm based association rule mining algorithm is applied to data mining in a seawater sample database and shown to be effective.

  18. An Associate Rules Mining Algorithm Based on Artificial Immune Network for SAR Image Segmentation

    Directory of Open Access Journals (Sweden)

    Mengling Zhao

    2015-01-01

    Full Text Available As a computational intelligence method, the artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In the existing artificial immune network algorithms, the affinity used for classification is computed from a distance measure, which may lead to unsatisfactory results when dealing with data with nominal attributes. To overcome this shortcoming, association rules are introduced into the AIN algorithm, and we propose a new classification algorithm, an associate rules mining algorithm based on artificial immune network (ARM-AIN). The new method uses association rules to represent immune cells and mines the best association rules rather than searching for optimal clustering centers. The proposed algorithm has been extensively compared with the artificial immune network classification (AINC) algorithm, the artificial immune network classification algorithm based on self-adaptive PSO (SPSO-AINC), and PSO-AINC over several large-scale data sets, target recognition of remote sensing images, and segmentation of three different SAR images. The experimental results indicate the superiority of ARM-AIN in classification accuracy and running time.

  19. Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences.

    Science.gov (United States)

    Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou

    2006-06-15

    The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the number of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungus genes. The approach involves elucidating the enzyme classifying rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm, Apriori, is utilized to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. Five datasets were collected from Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a similar rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation flowchart.

  20. Effect of temporal relationships in associative rule mining for web log data.

    Science.gov (United States)

    Khairudin, Nazli Mohd; Mustapha, Aida; Ahmad, Mohd Hanif

    2014-01-01

    The advent of web-based applications and services has created such diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper attempts to investigate the effect of temporal attribute in relational rule mining for web log data. We incorporated the characteristics of time in the rule mining process and analysed the effect of various temporal parameters. The rules generated from temporal relational rule mining are then compared against the rules generated from the classical rule mining approach such as the Apriori and FP-Growth algorithms. The results showed that by incorporating the temporal attribute via time, the number of rules generated is subsequently smaller but is comparable in terms of quality.

  1. Effect of Temporal Relationships in Associative Rule Mining for Web Log Data

    Science.gov (United States)

    Mohd Khairudin, Nazli; Mustapha, Aida

    2014-01-01

    The advent of web-based applications and services has created such diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper attempts to investigate the effect of temporal attribute in relational rule mining for web log data. We incorporated the characteristics of time in the rule mining process and analysed the effect of various temporal parameters. The rules generated from temporal relational rule mining are then compared against the rules generated from the classical rule mining approach such as the Apriori and FP-Growth algorithms. The results showed that by incorporating the temporal attribute via time, the number of rules generated is subsequently smaller but is comparable in terms of quality. PMID:24587757

  2. Association rule mining in the US Vaccine Adverse Event Reporting System (VAERS).

    Science.gov (United States)

    Wei, Lai; Scott, John

    2015-09-01

    Spontaneous adverse event reporting systems are critical tools for monitoring the safety of licensed medical products. Commonly used signal detection algorithms identify disproportionate product-adverse event pairs and may not be sensitive to more complex potential signals. We sought to develop a computationally tractable multivariate data-mining approach to identify product-multiple adverse event associations. We describe an application of stepwise association rule mining (Step-ARM) to detect potential vaccine-symptom group associations in the US Vaccine Adverse Event Reporting System. Step-ARM identifies strong associations between one vaccine and one or more adverse events. To reduce the number of redundant association rules found by Step-ARM, we also propose a clustering method for the post-processing of association rules. In sample applications to a trivalent intradermal inactivated influenza virus vaccine and to measles, mumps, rubella, and varicella (MMRV) vaccine and in simulation studies, we find that Step-ARM can detect a variety of medically coherent potential vaccine-symptom group signals efficiently. In the MMRV example, Step-ARM appears to outperform univariate methods in detecting a known safety signal. Our approach is sensitive to potentially complex signals, which may be particularly important when monitoring novel medical countermeasure products such as pandemic influenza vaccines. The post-processing clustering algorithm improves the applicability of the approach as a screening method to identify patterns that may merit further investigation. Copyright © 2015 John Wiley & Sons, Ltd.

  3. A mutual-information-based mining method for marine abnormal association rules

    Science.gov (United States)

    Cunjin, Xue; Wanjiao, Song; Lijuan, Qin; Qing, Dong; Xiaoyang, Wen

    2015-03-01

    Long time series of remote sensing images are a key source of data for exploring large-scale marine abnormal association patterns, but pose significant challenges for traditional approaches to spatiotemporal analysis. This paper proposes a mutual-information-based quantitative association rule-mining algorithm (MIQarma) to address these challenges. MIQarma comprises three key steps. First, MIQarma calculates the asymmetrical mutual information between items with one scan of the database, and extracts pair-wise related items according to the user-specified information threshold. Second, a linking-pruning-generating recursive loop generates (k+1)-dimensional candidate association rules from k-dimensional rules on the basis of the user-specified minimum support threshold, and this step is repeated until no more candidate association rules are generated. Finally, strong association rules are generated according to the user-specified minimum evaluation indicators. To demonstrate the feasibility and efficiency of MIQarma, we present two case studies: one considers performance analysis and the other identifies marine abnormal association relationships.
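
    The linking-pruning-generating loop described in this abstract follows the general level-wise pattern of Apriori-style mining. The sketch below is a generic illustration of that pattern on plain itemsets, not the MIQarma algorithm itself: the mutual-information filtering step is omitted, and the function name `frequent_itemsets` and its parameters are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        # Linking: join pairs of k-itemsets whose union has k + 1 items.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k + 1}
        # Pruning: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k))}
        # Generating: keep candidates that reach the support threshold.
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return result
```

    The loop stops as soon as a level produces no surviving candidates, which mirrors the "repeated until no more candidate association rules are generated" condition in the abstract.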

  4. Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor.

    Science.gov (United States)

    Sengupta, Dipankar; Sood, Meemansa; Vijayvargia, Poorvika; Hota, Sunil; Naik, Pradeep K

    2013-01-01

    The healthcare sector is generating a large amount of information corresponding to diagnosis, disease identification and treatment of an individual. Mining knowledge and providing scientific decision-making for the diagnosis & treatment of disease from the clinical dataset is therefore increasingly becoming necessary. The aim of this study was to assess the applicability of knowledge discovery in a brain tumor data warehouse, applying data mining techniques for investigation of clinical parameters that can be associated with occurrence of brain tumor. In this study, a brain tumor warehouse was developed comprising clinical data for 550 patients. The Apriori association rule algorithm was applied to discover associative rules among the clinical parameters. The rules discovered in the study suggest that high values of Creatinine, Blood Urea Nitrogen (BUN), SGOT & SGPT are directly associated with tumor occurrence for patients in the primary stage, with at least 85% confidence and more than 50% support. A normalized regression model is proposed based on these parameters along with Haemoglobin content, Alkaline Phosphatase and Serum Bilirubin for prediction of occurrence of STATE (brain tumor) as 0 (absent) or 1 (present). The results indicate that the methodology followed will be of good value for the diagnostic procedure of brain tumor, especially when large data volumes are involved, and that screening based on the discovered parameters would allow clinicians to detect tumors at an early stage of development.

  5. Association rule mining on grid monitoring data to detect error sources

    Science.gov (United States)

    Maier, Gerhild; Schiffers, Michael; Kranzlmueller, Dieter; Gaidioz, Benjamin

    2010-04-01

    Error handling is a crucial task in an infrastructure as complex as a grid. There are several monitoring tools put in place, which report failing grid jobs including exit codes. However, the exit codes do not always denote the actual fault, which caused the job failure. Human time and knowledge is required to manually trace back errors to the real fault underlying an error. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. Therewith, problematic grid components are located automatically and this information - expressed by association rules - is visualized in a web interface. This work achieves a decrease in time for fault recovery and yields an improvement of a grid's reliability.

  6. Association rule mining on grid monitoring data to detect error sources

    CERN Document Server

    Maier, G; Kranzlmueller, D; Gaidioz, B

    2010-01-01

    Error handling is a crucial task in an infrastructure as complex as a grid. There are several monitoring tools put in place, which report failing grid jobs including exit codes. However, the exit codes do not always denote the actual fault, which caused the job failure. Human time and knowledge is required to manually trace back errors to the real fault underlying an error. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. Therewith, problematic grid components are located automatically and this information – expressed by association rules – is visualized in a web interface. This work achieves a decrease in time for fault recovery and yields an improvement of a grid's reliability.

  7. Association rule mining on grid monitoring data to detect error sources

    Energy Technology Data Exchange (ETDEWEB)

    Maier, Gerhild; Gaidioz, Benjamin [CERN, Geneva (Switzerland); Schiffers, Michael; Kranzlmueller, Dieter, E-mail: Gerhild.Maier@cern.c [Ludwig Maximilian University, Munich (Germany)

    2010-04-01

    Error handling is a crucial task in an infrastructure as complex as a grid. There are several monitoring tools put in place, which report failing grid jobs including exit codes. However, the exit codes do not always denote the actual fault, which caused the job failure. Human time and knowledge is required to manually trace back errors to the real fault underlying an error. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. Therewith, problematic grid components are located automatically and this information - expressed by association rules - is visualized in a web interface. This work achieves a decrease in time for fault recovery and yields an improvement of a grid's reliability.

  8. Discovering protein–DNA binding sequence patterns using association rule mining

    Science.gov (United States)

    Wong, Ka-Chun; Chan, Tak-Ming; Wong, Man-Hon; Lee, Kin-Hong; Lau, Chi-Kong; Tsui, Stephen K. W.

    2010-01-01

    Protein–DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles for protein–DNA bindings. However, it is considered that there are no simple one-to-one rules between amino acids and nucleotides. Many methods impose complicated features beyond sequence patterns. Protein–DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF–TFBS binding sequence patterns in the most explicit and interpretable form from TRANSFAC. The framework is based on association rule mining with the Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verification from the literature, the Protein Data Bank and homology modeling, there is strong evidence that the patterns discovered reveal real TF–TFBS bindings across different TFs and TFBSs, which can drive further work to better understand TF–TFBS bindings. PMID:20529874

  9. Mining association rules between abnormal health examination results and outpatient medical records.

    Science.gov (United States)

    Chao Huang, Yi

    Currently, interpretation of health examination reports relies primarily on the physician's own experience. If health screening data could be integrated with outpatient medical records to uncover correlations between disease and abnormal test results, the physician could benefit from having additional reference resources for medical examination report interpretation and clinic diagnosis. This study used the medical database of a regional hospital in Taiwan to illustrate how association rules can be found between abnormal health examination results and outpatient illnesses. The rules can help to build up a disease-prevention knowledge database that assists healthcare providers in follow-up treatment and prevention. Furthermore, this study proposes a new algorithm, the data cutting and sorting method, or DCSM, in place of the traditional Apriori algorithm. DCSM significantly improves the mining performance of Apriori by reducing the time to scan health examination and outpatient medical records, both of which are databases of immense sizes.

  10. [Exploration on eighteen incompatible medicaments of chest pain prescriptions based on association rules mining].

    Science.gov (United States)

    Zhang, Yuhua; Hua, Haoming; Fan, Xinsheng; Wang, Chongjun; Duan, Jinao

    2011-12-01

    To investigate the laws of eighteen incompatible medicaments of the chest pain prescriptions based on association rules mining. The database of chest pain prescription was established and then the chest pain prescriptions composed of eighteen incompatible medicaments were screened. The dynasty, couplet medicines, the property and flavor of drugs and preparation form were analyzed with the frequent item sets and corresponding analysis methods. Eight hundred and fifty chest pain prescriptions were collected, and 88 of them contained eighteen incompatible medicaments, taking 10.3% of all; the applications of ancient and modern chest pain prescriptions containing eighteen incompatible medicaments are significant difference (P rules for application of anti-drug compatibility to treat chest pain.

  11. Mining health care administrative data with temporal association rules on hybrid events.

    Science.gov (United States)

    Concaro, S; Sacchi, L; Cerra, C; Fratino, P; Bellazzi, R

    2011-01-01

    The analysis of administrative health care data can be helpful to conveniently assess health care activities. In this context temporal data mining techniques can be suitably exploited to get a deeper insight into the processes underlying health care delivery. In this paper we present an algorithm for the extraction of temporal association rules (TARs) on sequences of hybrid events and its application on health care administrative databases. We propose a method that extends TAR mining by managing hybrid events, namely events characterized by a heterogeneous temporal nature. Hybrid events include both point-like events (e.g. ambulatory visits) and interval-like events (e.g. drug consumption). User-defined rule templates can optionally be used to constrain the search to a subset of interesting rules. A TAR post-pruning strategy, based on a case-control approach, is also presented. We analyzed the administrative database of diabetic patients in the care of the regional health care agency (ASL) of Pavia. TAR mining made it possible to find patterns specifically related to the diabetic population in comparison with a control group, as well as to check the compliance of the actual clinical careflow with the ASL recommendations. The experimental results highlighted the main potentials of the algorithm, such as the opportunity to detect interesting temporal relationships between diagnostic or therapeutic patterns, to check the adherence of past temporal behaviors to specific expected paths (e.g. guidelines), or to discover new knowledge that could be implicitly hidden in the data.

  12. Identifying the Combinatorial Effects of Histone Modifications by Association Rule Mining in Yeast

    Science.gov (United States)

    Wang, Jiang; Dai, Xianhua; Xiang, Qian; Deng, Yangyang; Feng, Jihua; Dai, Zhiming; He, Caisheng

    2010-01-01

    Eukaryotic genomes are packaged into chromatin by histone proteins whose chemical modification can profoundly influence gene expression. The histone modifications often act in combinations, which exert different effects on gene expression. Although a number of experimental techniques and data analysis methods have been developed to study histone modifications, it is still very difficult to identify the relationships among histone modifications on a genome-wide scale. We proposed a method to identify the combinatorial effects of histone modifications by association rule mining. The method first identified Functional Modification Transactions (FMTs) and then employed an association rule mining algorithm and statistical methods to identify histone modification patterns. We applied the proposed methodology to Pokholok et al.’s data with eight sets of histone modifications and Kurdistani et al.’s data with eleven histone acetylation sites. Our method succeeds in revealing two different global views of histone modification landscapes on two datasets and identifying a number of modification patterns, some of which are supported by previous studies. We concentrate on combinatorial effects of histone modifications which significantly affect gene expression. Our method succeeds in identifying known interactions among histone modifications and uncovering many previously unknown patterns. After in-depth analysis of the possible mechanisms by which histone modification patterns can alter transcriptional states, we infer three possible modification pattern reading mechanisms (‘redundant’, ‘trivial’, ‘dominative’). Our results demonstrate several histone modification patterns which show significant correspondence between yeast and human cells. PMID:21037963

  13. Efficient mining of association rules for the early diagnosis of Alzheimer's disease.

    Science.gov (United States)

    Chaves, R; Górriz, J M; Ramírez, J; Illán, I A; Salas-Gonzalez, D; Gómez-Río, M

    2011-09-21

    In this paper, a novel technique based on association rules (ARs) is presented in order to find relations among activated brain areas in single photon emission computed tomography (SPECT) imaging. In this sense, the aim of this work is to discover associations among attributes which characterize the perfusion patterns of normal subjects and to make use of them for the early diagnosis of Alzheimer's disease (AD). Firstly, voxel-as-feature-based activation estimation methods are used to find the tridimensional activated brain regions of interest (ROIs) for each patient. Secondly, these ROIs serve as input for mining ARs with a minimum support and confidence among activation blocks by using a set of controls. In this context, support and confidence measures are related to the proportion of functional areas which are singularly and mutually activated across the brain. Finally, we perform image classification by comparing the number of ARs verified by each subject under test to a given threshold that depends on the number of previously mined rules. Several classification experiments were carried out in order to evaluate the proposed methods using a SPECT database that consists of 41 controls (NOR) and 56 AD patients labeled by trained physicians. The proposed methods were validated by means of the leave-one-out cross validation strategy, yielding up to 94.87% classification accuracy, thus outperforming recently developed methods for computer aided diagnosis of AD.
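
    The final classification step in this abstract, counting how many previously mined rules a subject verifies and comparing that count to a threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: the rule representation as (antecedent, consequent) pairs of ROI labels, the names `count_verified_rules` and `classify`, the `roi_*` labels, and the direction of the decision (many verified control-derived rules means NOR) are all hypothetical and not taken from the paper.

```python
def count_verified_rules(rules, activated):
    """Count rules whose antecedent and consequent ROIs are all activated."""
    return sum(1 for antecedent, consequent in rules
               if antecedent <= activated and consequent <= activated)


def classify(rules, activated, threshold):
    """Return 'NOR' if enough control-derived rules hold, otherwise 'AD'.

    Assumption: rules were mined from controls, so a subject who verifies
    at least `threshold` of them is treated as normal.
    """
    verified = count_verified_rules(rules, activated)
    return "NOR" if verified >= threshold else "AD"


# Hypothetical rules over made-up ROI labels.
rules = [
    (frozenset({"roi_1"}), frozenset({"roi_2"})),
    (frozenset({"roi_2"}), frozenset({"roi_3"})),
    (frozenset({"roi_1", "roi_3"}), frozenset({"roi_4"})),
]
```

    For instance, `classify(rules, {"roi_1", "roi_2"}, threshold=2)` verifies only the first rule and so falls below the threshold. In the abstract the threshold depends on the number of mined rules; here it is simply passed in.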

  14. Mining Context-Aware Association Rules Using Grammar-Based Genetic Programming.

    Science.gov (United States)

    Luna, Jose Maria; Pechenizkiy, Mykola; Del Jesus, Maria Jose; Ventura, Sebastian

    2017-09-25

    Real-world data usually comprise features whose interpretation depends on some contextual information. Such context-sensitive features and patterns are of high interest to be discovered and analyzed in order to obtain the right meaning. This paper formulates the problem of mining context-aware association rules, which refers to the search for associations between itemsets such that the strength of their implication depends on a contextual feature. For the discovery of this type of association, a model that restricts the search space and includes syntax constraints by means of a grammar-based genetic programming methodology is proposed. Grammars can be considered a useful way of introducing subjective knowledge into the pattern mining process, as they are highly related to the background knowledge of the user. The performance and usefulness of the proposed approach are examined by considering synthetically generated datasets. A posteriori analysis on different domains is also carried out to demonstrate the utility of this kind of association. For example, in educational domains, it is essential to identify and understand contextual and context-sensitive factors that affect overall and individual student behavior and performance. The results of the experiments suggest that the approach is feasible and that it automatically identifies interesting context-aware associations from real-world datasets.

  15. Toxicity prediction from toxicogenomic data based on class association rule mining.

    Science.gov (United States)

    Nagata, Keisuke; Washio, Takashi; Kawahara, Yoshinobu; Unami, Akira

    2014-01-01

    While the recent advent of new technologies in biology such as DNA microarray and next-generation sequencer has given researchers a large volume of data representing genome-wide biological responses, it is not necessarily easy to derive knowledge that is accurate and understandable at the same time. In this study, we applied the Classification Based on Association (CBA) algorithm, one of the class association rule mining techniques, to the TG-GATEs database, where both toxicogenomic and toxicological data of more than 150 compounds in rat and human are stored. We compared the generated classifiers between CBA and linear discriminant analysis (LDA) and showed that CBA is superior to LDA in terms of both predictive performances (accuracy: 83% for CBA vs. 75% for LDA, sensitivity: 82% for CBA vs. 72% for LDA, specificity: 85% for CBA vs. 75% for LDA) and interpretability.
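
    The accuracy, sensitivity and specificity figures quoted in this abstract are standard quantities derived from a binary confusion matrix. A minimal sketch of the arithmetic follows; the function name `binary_metrics` and the example counts are hypothetical and are not data from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Made-up counts for illustration only.
example = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

    Reporting all three together, as the abstract does, shows whether a gain in accuracy comes from better detection of positive cases, of negative cases, or of both.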

  16. Toxicity prediction from toxicogenomic data based on class association rule mining

    Directory of Open Access Journals (Sweden)

    Keisuke Nagata

    2014-01-01

    While the recent advent of new technologies in biology such as DNA microarray and next-generation sequencer has given researchers a large volume of data representing genome-wide biological responses, it is not necessarily easy to derive knowledge that is accurate and understandable at the same time. In this study, we applied the Classification Based on Association (CBA) algorithm, one of the class association rule mining techniques, to the TG-GATEs database, where both toxicogenomic and toxicological data of more than 150 compounds in rat and human are stored. We compared the generated classifiers between CBA and linear discriminant analysis (LDA) and showed that CBA is superior to LDA in terms of both predictive performances (accuracy: 83% for CBA vs. 75% for LDA, sensitivity: 82% for CBA vs. 72% for LDA, specificity: 85% for CBA vs. 75% for LDA) and interpretability.

  17. Inferring characteristic phenotypes via class association rule mining in the bone dysplasia domain.

    Science.gov (United States)

    Paul, Razan; Groza, Tudor; Hunter, Jane; Zankl, Andreas

    2014-04-01

    Finding, capturing and describing characteristic features represents a key aspect in disorder definition, diagnosis and management. This process is particularly challenging in the case of rare disorders, due to the sparse nature of data and expertise. From a computational perspective, finding characteristic features is associated with some additional major challenges, such as formulating a computationally tractable definition, devising appropriate inference algorithms or defining sound validation mechanisms. In this paper we aim to deal with each of these problems in the context provided by the skeletal dysplasia domain. We propose a clear definition for characteristic phenotypes, we experiment with a novel, class association rule mining algorithm and we discuss our lessons learned from both an automatic and human-based validation of our approach. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. A Stock Trading Recommender System Based on Temporal Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Binoy B. Nair

    2015-04-01

    Recommender systems capable of discovering patterns in stock price movements and generating stock recommendations based on the patterns thus discovered can significantly supplement the decision-making process of a stock trader. Such recommender systems are of great significance to a layperson who wishes to profit by stock trading even while not possessing the skill or expertise of a seasoned trader. A genetic algorithm optimized Symbolic Aggregate approXimation (SAX)–Apriori based stock trading recommender system, which can mine temporal association rules from the stock price data set to generate stock trading recommendations, is presented in this article. The proposed system is validated on 12 different data sets. The results indicate that the proposed system significantly outperforms the passive buy-and-hold strategy, offering scope for a layperson to successfully invest in capital markets.

  19. Leveraging Bibliographic RDF Data for Keyword Prediction with Association Rule Mining (ARM)

    Directory of Open Access Journals (Sweden)

    Nidhi Kushwaha

    2014-11-01

    The Semantic Web (Web 3.0) has been proposed as an efficient way to access the increasingly large amounts of data on the internet. The Linked Open Data Cloud project is at present the major effort to implement the concepts of the Semantic Web, addressing the problems of inhomogeneity and large data volumes. RKBExplorer is one of many repositories implementing Open Data and contains considerable bibliographic information. This paper discusses bibliographic data, an important part of cloud data. Effective searching of bibliographic datasets can be a challenge, as many of the papers residing in these databases do not have sufficient or comprehensive keyword information. In these cases, a search engine based on RKBExplorer can retrieve papers only by author names and paper titles, without keywords. In this paper we attempt to address this problem by using the data mining algorithm Association Rule Mining (ARM) to develop keywords based on features retrieved from Resource Description Framework (RDF) data within a bibliographic citation. We demonstrate the applicability of this method for predicting missing keywords for bibliographic entries in several typical databases. Paper presented at the 1st International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2014), March 27-28, 2014. Organized by VIT University, Chennai, India. Sponsored by BRNS.

  20. Ontology in association rules

    National Research Council Canada - National Science Library

    Ferraz, Inhaúma Neves; Garcia, Ana Cristina Bicharra

    2013-01-01

    .... Although most data mining techniques, such as the use of association rules, may substantially reduce the search effort over large data sets, often, the consequential outcomes surpass the amount...

  1. Diagnostic Analysis of Patients with Essential Hypertension Using Association Rule Mining

    Science.gov (United States)

    Shin, A Mi; Lee, In Hee; Lee, Gyeong Ho; Park, Hyung Seop; Yoon, Kyung Il; Lee, Jung Jeung; Kim, Yoon Nyun

    2010-01-01

    Objectives: The purpose of this study was to analyze the records of patients diagnosed with essential hypertension using association rule mining (ARM). Methods: Patients with essential hypertension (ICD code, I10) were extracted from a hospital's data warehouse and a data mart constructed for analysis. Apriori modeling of the ARM method and web node in the Clementine 12.0 program were used to analyze patient data. Results: Patients diagnosed with essential hypertension totaled 5,022 and the diagnostic data extracted from those patients numbered 53,994. As a result of the web node, essential hypertension, non-insulin dependent diabetes mellitus (NIDDM), and cerebral infarction were shown to be associated. Based on the results of ARM, NIDDM (support, 35.15%; confidence, 100%) and cerebral infarction (support, 21.21%; confidence, 100%) were determined to be important diseases associated with essential hypertension. Conclusions: Essential hypertension was strongly associated with NIDDM and cerebral infarction. This study demonstrated the practicality of ARM in co-morbidity studies using a large clinic database. PMID:21818427
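
    The support and confidence values reported in this abstract follow the usual definitions: support is the fraction of records containing all items of a rule, and confidence is that fraction divided by the support of the antecedent. The sketch below computes both on a toy cohort; the record contents, the labels `HTN`, `NIDDM` and `CI`, and the function names are hypothetical, and the rule direction is an assumption rather than something stated in the abstract.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = support(records, set(antecedent) | set(consequent))
    return both / support(records, antecedent)


# Toy cohort in which every patient has hypertension (HTN), as in a
# cohort selected on that diagnosis.
records = [
    {"HTN", "NIDDM"},
    {"HTN", "CI"},
    {"HTN", "NIDDM", "CI"},
    {"HTN"},
]
```

    In a cohort selected on one diagnosis, any rule with that diagnosis as its consequent has confidence 1 by construction, because every record contains it. Here `confidence(records, {"NIDDM"}, {"HTN"})` is 1.0 while the reverse rule gives 0.5. Whether this explains the 100% confidence values reported above is an assumption, since the abstract does not state the rule direction.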

  2. An association rule mining-based framework for understanding lifestyle risk behaviors.

    Directory of Open Access Journals (Sweden)

    So Hyun Park

    OBJECTIVES: This study investigated the prevalence and patterns of lifestyle risk behaviors in Korean adults. METHODS: We utilized data from the Fourth Korea National Health and Nutrition Examination Survey for 14,833 adults (>20 years of age). We used association rule mining to analyze patterns of lifestyle risk behaviors by characterizing non-adherence to public health recommendations related to the Alameda 7 health behaviors. The study variables were current smoking, heavy drinking, physical inactivity, obesity, inadequate sleep, breakfast skipping, and frequent snacking. RESULTS: Approximately 72% of Korean adults exhibited two or more lifestyle risk behaviors. Among women, current smoking, obesity, and breakfast skipping were associated with inadequate sleep. Among men, breakfast skipping with additional risk behaviors such as physical inactivity, obesity, and inadequate sleep was associated with current smoking. Current smoking with additional risk behaviors such as inadequate sleep or breakfast skipping was associated with physical inactivity. CONCLUSION: Lifestyle risk behaviors are intercorrelated in Korea. Information on patterns of lifestyle risk behaviors could assist in planning interventions targeted at multiple behaviors simultaneously.

  3. Quality prediction modeling for multistage manufacturing based on classification and association rule mining

    Directory of Open Access Journals (Sweden)

    Kao Hung-An

    2017-01-01

    For manufacturing enterprises, product quality is a key factor in assessing production capability and increasing core competence. To reduce external failure cost, many research efforts and methodologies have been introduced to improve process yield rate, such as TQC/TQM, the Shewhart Cycle, Deming's 14 Points, etc. Nowadays, impressive progress has been made in process monitoring and industrial data analysis because of the Industry 4.0 trend. Industries have started to utilize quality control (QC) methodology to lower inspection overhead and internal failure cost. Currently, the focus of QC is mostly on the inspection of a single workstation and the final product. However, for multistage manufacturing, many factors (like equipment, operators, parameters, etc.) can have cumulative and interactive effects on the final quality. When failure occurs, it is difficult to resume the original settings for cause analysis. To address these problems, this research proposes a combination of principal components analysis (PCA) with classification and association rule mining algorithms to extract features representing the relationship of multiple workstations, predict final product quality, and analyze the root cause of product defects. The method is demonstrated on a semiconductor data set.

  4. Business rule mining from spreadsheets

    NARCIS (Netherlands)

    Roy, S.

    2015-01-01

    Business rules represent the knowledge that guides the operations of a business organization. They are implemented in software applications used by organizations, and the activity of extracting them from software is known as business rule mining. It has various purposes amongst which migration and

  5. Fuzzy association rule mining and classification for the prediction of malaria in South Korea.

    Science.gov (United States)

    Buczak, Anna L; Baugher, Benjamin; Guven, Erhan; Ramac-Thomas, Liane C; Elbert, Yevgeniy; Babin, Steven M; Lewis, Sheri H

    2015-06-18

    Malaria is the world's most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality. We describe an application of a method for creating prediction models utilizing Fuzzy Association Rule Mining to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships are in the form of rules, from which the best set of rules is automatically chosen and forms a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as Low, Medium or High, where these classes are defined as a total of 0-2, 3-16, and above 17 cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, HIGH is considered an outbreak. Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7-8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the HIGH classes. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the HIGH classes. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the Medium class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3. A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures. The South Korea malaria prediction models predict Low, Medium or High cases 7-8 weeks in the future. 
This paper
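
    The F0.5 and F3 scores quoted for the HIGH class follow from the reported PPV and Sensitivity through the standard F-beta formula, F_beta = (1 + beta^2) * PPV * Sensitivity / (beta^2 * PPV + Sensitivity). A minimal Python check is sketched below; the function name `f_beta` is an illustrative assumption, while the two input values are taken from the abstract.

```python
def f_beta(ppv, sensitivity, beta):
    """Standard F-beta score combining PPV (precision) and sensitivity (recall)."""
    b2 = beta * beta
    denominator = b2 * ppv + sensitivity
    if denominator == 0:
        return 0.0
    return (1 + b2) * ppv * sensitivity / denominator


# Values reported for the HIGH class, predictions made 7-8 weeks in advance.
ppv_high = 0.842
sensitivity_high = 0.681

f_half = f_beta(ppv_high, sensitivity_high, 0.5)   # weights PPV more heavily
f_three = f_beta(ppv_high, sensitivity_high, 3.0)  # weights sensitivity more heavily
```

    Rounded to three decimals, these reproduce the 0.804 and 0.694 stated in the abstract.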

  6. Mining association rules between stroke risk factors based on the Apriori algorithm.

    Science.gov (United States)

    Li, Qin; Zhang, Yiyan; Kang, Hongyu; Xin, Yi; Shi, Caicheng

    2017-07-20

    Stroke is a frequently-occurring disease and is a severe threat to human health. We aimed to explore the associations between stroke risk factors. Subjects aged 40 or above were asked to complete a unified questionnaire and undergo laboratory examinations. The Apriori algorithm was applied to find meaningful association rules. Selected association rules were divided into 8 groups by the number of former (antecedent) items. The rules with the higher confidence degree in every group were viewed as the meaningful rules. The training set used in association analysis consists of a total of 985,325 samples, with 15,835 stroke patients (1.65%) and 941,490 without stroke (98.35%). Based on the threshold we set for the Apriori algorithm, eight meaningful association rules were obtained between stroke and its high risk factors, and 25 meaningful association rules were obtained between the high risk factors themselves. Based on the Apriori algorithm, meaningful association rules between the high risk factors of stroke were found, proving a feasible way to reduce the risk of stroke with early intervention.
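
    For readers unfamiliar with the Apriori algorithm named above, the following is a minimal Python sketch of its level-wise frequent itemset search. It is a textbook-style illustration, not the authors' implementation; the risk-factor records and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that are frequent.
        level = {}
        for candidate in current:
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                level[candidate] = count / n
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical survey records (one set of risk factors per subject).
survey = [
    {"hypertension", "smoking", "stroke"},
    {"hypertension", "smoking"},
    {"hypertension", "stroke"},
    {"smoking"},
    {"hypertension", "smoking", "stroke"},
]
frequent_itemsets = apriori(survey, min_support=0.5)
```

    Rules are then generated from each frequent itemset and ranked by confidence, which is the step the abstract describes as grouping rules by the number of former items.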

  7. Using GO-WAR for mining cross-ontology weighted association rules.

    Science.gov (United States)

    Agapito, Giuseppe; Cannataro, Mario; Guzzi, Pietro Hiram; Milano, Marianna

    2015-07-01

    The Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products. The process of association is referred to as annotation. The relevance and the specificity of both GO terms and annotations are evaluated by a measure defined as information content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. Different approaches to analysis exist. Among them, the use of association rules (AR) may provide useful knowledge, and it has been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of an annotation nor its importance, which leads to the generation of candidate rules with low IC. This paper presents GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules. GO-WAR can extract association rules with a high level of IC without loss of support and confidence from a dataset of annotated data. A case study on the use of GO-WAR on publicly available GO annotation datasets is used to demonstrate that our method outperforms current state of the art approaches. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  8. [Exploring the association rules of clinical application of shenmai injection through text mining].

    Science.gov (United States)

    Zhang, Lin-Lin; Guo, Hong-Tao; Zheng, Guang; Liu, Li-Mei; Song, Zhi-Qian; Lu, Ai-Ping; Liu, Zhen-Li

    2013-07-01

    To explore the rules of clinical application of Shenmai Injection (SI). The data sets of SI were downloaded from the CBM database by literature retrieval covering Jan. 1980 to May 2012. Rules of Chinese medical patterns, diseases, symptoms, Chinese patent medicines (CPM), and Western medicine (WM) were mined by a data slicing algorithm, and they were presented in frequency tables and a two-dimensional network. A total of 3,159 articles were included. Results showed that SI was most frequently correlated with stasis syndrome and deficiency syndrome. Heart failure, arrhythmia, myocarditis, myocardial infarction, and shock were core diseases treated by SI. Symptoms such as angina pectoris, fatigue, and chest tightness/pain were mainly relieved by SI. For CPM, SI was most commonly used with Compound Danshen Injection, Astragalus Injection, and so on. As for WM, SI was most commonly used with nitroglycerin, fructose, captopril, and so on. The syndrome types and mining results of SI were consistent with its instructions. Stasis syndrome was the potential Chinese medical pattern of SI. Heart failure, arrhythmia, and myocardial infarction were potential diseases treated by SI. For CPM, SI was most commonly used with Danshen Injection, Compound Danshen Injection, and so on. And for WM, SI was most commonly used with nitroglycerin, fructose, captopril, and so on.

  9. Business Rule Mining from Spreadsheets

    OpenAIRE

    Roy, Sohon

    2015-01-01

    Business rules represent the knowledge that guides the operations of a business organization. They are implemented in software applications used by organizations, and the activity of extracting them from software is known as business rule mining. It has various purposes amongst which migration and generating documentation are the most common. However, apart from conventional software, organizations also use spreadsheets for a large part of their operations and decision-making activities. Ther...

  10. Ontology in association rules.

    Science.gov (United States)

    Ferraz, Inhaúma Neves; Garcia, Ana Cristina Bicharra

    2013-01-01

    Data mining has emerged to address the problem of transforming data into useful knowledge. Although most data mining techniques, such as the use of association rules, may substantially reduce the search effort over large data sets, often, the consequential outcomes surpass the amount of information humanly manageable. On the other hand, important association rules may be overlooked owing to the setting of the support threshold, which is a very subjective metric, but rooted in most data mining techniques. This paper presents a study on the effects, in terms of precision and recall, of using a data preparation technique, called SemPrune, which is built on domain ontology. SemPrune is intended for pre- and post-processing phases of data mining. Identifying generalization/specialization relations, as well as composition/decomposition relations, is the key to successfully applying SemPrune.

  11. GenMiner: mining non-redundant association rules from integrated gene expression data and annotations.

    Science.gov (United States)

    Martinez, Ricardo; Pasquier, Nicolas; Pasquier, Claude

    2008-11-15

    GenMiner is an implementation of association rule discovery dedicated to the analysis of genomic data. It allows the analysis of datasets integrating multiple sources of biological data represented as both discrete values, such as gene annotations, and continuous values, such as gene expression measures. GenMiner implements the new NorDi (normal discretization) algorithm for normalizing and discretizing continuous values and takes advantage of the Close algorithm to efficiently generate minimal non-redundant association rules. Experiments show that execution time and memory usage of GenMiner are significantly smaller than those of the standard Apriori-based approach, as well as the number of extracted association rules. The GenMiner software and supplementary materials are available at http://bioinfo.unice.fr/publications/genminer_article/ and http://keia.i3s.unice.fr/?Implementations:GenMiner . Supplementary data are available at Bioinformatics online.

  12. A PROPOSAL OF FUZZY MULTIDIMENSIONAL ASSOCIATION RULES

    OpenAIRE

    Rolly Intan

    2006-01-01

    Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. Rather than searching for frequent itemsets (as is done in mining single-dimensional association rules), in multidimensional association rules, we search for frequent predicate sets. In general, there are two types of multidimensional association rules, namely interdimension association rules and hybrid-dimension association rules. Interdimension association rules are mul...

  13. A Proposal of Fuzzy Multidimensional Association Rules

    OpenAIRE

    Intan, Rolly

    2006-01-01

    Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. Rather than searching for frequent itemsets (as is done in mining single-dimensional association rules), in multidimensional association rules, we search for frequent predicate sets. In general, there are two types of multidimensional association rules, namely interdimension association rules and hybrid-dimension association rules. Interdimension association rules are mul...

  14. Study on the networks of "Nature-Family-Component" of Chinese medicinal herbs based on association rules mining.

    Science.gov (United States)

    Fu, Xian-jun; Wang, Zhen-guo; Qu, Yi; Wang, Peng; Zhou, Yang; Yu, Hua-yun

    2013-09-01

    To explore appropriate methods for research on the theory of Chinese medicine nature property and find the relationship between Nature-Family-Component of Chinese herbs. From the perspective of systems biology, we used Associate Network to identify useful relationships among "Nature-Family-Component" of herbs. In this work, Associate Network combines the association rules mining method and the network construction method to evaluate the complicated relationship among "Nature-Family-Component" of the herbs screened. The results of association rules mining showed that the families had a close relationship with nature properties of herbs. For example, the families Magnoliaceae and Araceae had a close relationship with hot nature with confidence of 100%, and the family Cucurbitaceae had a close relationship with cold nature with confidence of 90.91%. Moreover, the results of the constructed Associate Network implied that herbs belonging to the same families generally had the same natures. In addition, some herbs belonging to different families may also have the same natures when they contain the same main components. These results implied that the main components of herbs might affect their natures; the relationships between families and natures were based on the main compounds of herbs.

  15. Mining Research on Vibration Signal Association Rules of Quayside Container Crane Hoisting Motor Based on Apriori Algorithm

    Science.gov (United States)

    Yang, Chencheng; Tang, Gang; Hu, Xiong

    2017-07-01

    In daily operation, a shore-hoisting motor produces a large amount of vibration signal data. To analyze the correlation among the data and discover faults and potential safety hazards of the motor, the data are first discretized, and then the Apriori algorithm is used to mine the strong association rules among the data. The results show that day 1 and day 16 are the most closely related, which can guide staff to analyze the operation of the motor on these two days in order to find and resolve faults and safety problems.

  16. Exploration of the association rules mining technique for the signal detection of adverse drug events in spontaneous reporting systems.

    Science.gov (United States)

    Wang, Chao; Guo, Xiao-Jing; Xu, Jin-Fang; Wu, Cheng; Sun, Ya-Lin; Ye, Xiao-Fei; Qian, Wei; Ma, Xiu-Qiang; Du, Wen-Min; He, Jia

    2012-01-01

    The detection of signals of adverse drug events (ADEs) has increased because of the use of data mining algorithms in spontaneous reporting systems (SRSs). However, different data mining algorithms have different traits and conditions for application. The objective of our study was to explore the application of association rule (AR) mining in ADE signal detection and to compare its performance with that of other algorithms. Monte Carlo simulation was applied to generate drug-ADE reports randomly according to the characteristics of SRS datasets. One thousand simulated datasets were mined by AR and other algorithms. On average, 108,337 reports were generated by the Monte Carlo simulation. Based on the predefined criterion that 10% of the drug-ADE combinations were true signals, with RR equal to 10, 4.9, 1.5, and 1.2, AR detected, on average, 284 suspected associations with a minimum support of 3 and a minimum lift of 1.2. The area under the receiver operating characteristic (ROC) curve of the AR was 0.788, which was equivalent to that shown for other algorithms. Additionally, AR was applied to reports submitted to the Shanghai SRS in 2009. Five hundred seventy combinations were detected using AR from 24,297 SRS reports, and they were compared with recognized ADEs identified by clinical experts and various other sources. AR appears to be an effective method for ADE signal detection, both in simulated and real SRS datasets. The limitations of this method exposed in our study, i.e., non-uniform threshold settings and redundant rules, require further research.
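
    The thresholding step described above (minimum support of 3, minimum lift of 1.2) can be sketched in Python as follows. This is an illustrative assumption about how such a filter might look, not the study's code: "support" is interpreted here as a raw count of reports, lift is computed as P(drug, ADE) / (P(drug) * P(ADE)), and the reports and the function name `detect_signals` are hypothetical.

```python
from collections import Counter


def detect_signals(reports, min_count=3, min_lift=1.2):
    """Return {(drug, ade): lift} for pairs passing both thresholds.

    `reports` is a list of (drug, ade) tuples, one per spontaneous report.
    """
    n = len(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    ade_counts = Counter(ade for _, ade in reports)
    pair_counts = Counter(reports)
    signals = {}
    for (drug, ade), count in pair_counts.items():
        # lift = P(drug, ade) / (P(drug) * P(ade)), written with counts.
        lift = (count * n) / (drug_counts[drug] * ade_counts[ade])
        if count >= min_count and lift >= min_lift:
            signals[(drug, ade)] = lift
    return signals


# Hypothetical reports: drugA co-occurs mostly with rash, drugB with nausea.
toy_reports = (
    [("drugA", "rash")] * 4
    + [("drugA", "nausea")]
    + [("drugB", "nausea")] * 4
    + [("drugB", "rash")]
)
signals = detect_signals(toy_reports)
```

    A lift above 1 means the pair is reported together more often than independence would predict; the count threshold guards against pairs with a high lift that rest on only one or two reports.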

  17. Mining Association Rules for Neurobehavioral and Motor Disorders in Children Diagnosed with Cerebral Palsy.

    Science.gov (United States)

    Cheng, Chihwen; Burns, T G; Wang, May D

    2013-09-01

    Children diagnosed with cerebral palsy (CP) appear to be at high risk for developing neurobehavioral and motor disorders. The most common disorders for these children are impaired visual-perception skills and motor planning. Besides, they often have impaired executive functions, which can contribute to problematic emotional adjustment such as depression. Additionally, literature suggests that the tendency to develop these cognitive impairments and emotional abnormalities in pediatric CP is influenced by age and IQ. Because there are many other medical co-morbidities that can occur with CP (e.g., seizures and shunt placement), prediction of what percentages of patients will incur cognitive impairment and emotional abnormality is a difficult task. The purpose of this study was to investigate the associations between possible factors mentioned above, and neurobehavioral and motor disorders from a clinical database of pediatric subjects diagnosed with CP. The study resulted in 22 rules that can predict negative outcomes. These rules reinforced the growing body of literature supporting a link between CP, executive dysfunction, and subsequent neurobehavioral problems. The antecedents and consequents of some association rules were single factors, while other statistical associations were interactions of factor combinations. Further research is needed to include children's comprehensive treatment and medication history in order to determine additional impacts on their neurobehavioral and motor disorders.

  18. Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining.

    Directory of Open Access Journals (Sweden)

    Imane Boudellioua

    The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We carried out an evaluation study of our system performance using cross-validation technique. We found that it achieved very promising results in pathway identification with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, where 436,510 of them lacked any previous pathway annotations.

  19. Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining

    KAUST Repository

    Boudellioua, Imene

    2016-07-08

    The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We carried out an evaluation study of our system performance using cross-validation technique. We found that it achieved very promising results in pathway identification with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, where 436,510 of them lacked any previous pathway annotations.

  20. Characteristics of cyclist crashes in Italy using latent class analysis and association rule mining.

    Directory of Open Access Journals (Sweden)

    Gabriele Prati

    The factors associated with the severity of bicycle crashes may differ across different bicycle crash patterns. Therefore, it is important to identify distinct bicycle crash patterns with homogeneous attributes. The current study aimed at identifying subgroups of bicycle crashes in Italy and analyzing separately the different bicycle crash types. The present study focused on bicycle crashes that occurred in Italy during the period between 2011 and 2013. We analyzed categorical indicators corresponding to the characteristics of infrastructure (road type, road signage, and location type), road user (i.e., opponent vehicle and cyclist's maneuver, type of collision, age and gender of the cyclist), vehicle (type of opponent vehicle), and the environmental and time period variables (time of the day, day of the week, season, pavement condition, and weather). To identify homogeneous subgroups of bicycle crashes, we used latent class analysis. Using latent class analysis, the bicycle crash data set was segmented into 19 classes, which represent 19 different bicycle crash types. Logistic regression analysis was used to identify the association between class membership and severity of the bicycle crashes. Finally, association rule mining was conducted for each of the latent classes to uncover the factors associated with an increased likelihood of severity. Association rules highlighted different crash characteristics associated with an increased likelihood of severity for each of the 19 bicycle crash types.

  1. In-Depth Analysis of Energy Efficiency Related Factors in Commercial Buildings Using Data Cube and Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Byeongjoon Noh

    2017-11-01

    Significant amounts of energy are consumed in the commercial building sector, resulting in various adverse environmental issues. To reduce energy consumption and improve energy efficiency in commercial buildings, it is necessary to develop effective methods for analyzing building energy use. In this study, we propose a data cube model combined with association rule mining for more flexible and detailed analysis of building energy consumption profiles using the Commercial Buildings Energy Consumption Survey (CBECS) dataset, which covers over 6700 existing commercial buildings across the U.S.A. Based on the data cube model, a multidimensional commercial sector building energy analysis was performed based upon on-line analytical processing (OLAP) operations to assess the energy efficiency according to building factors with various levels of abstraction. Furthermore, the proposed analysis system provided useful information that represented a set of energy efficient combinations by applying the association rule mining method. We validated the feasibility and applicability of the proposed analysis model by structuring a building energy analysis system and applying it to different building types, weather conditions, composite materials, and heating/cooling systems of the multitude of commercial buildings classified in the CBECS dataset.

  2. Evolving Temporal Association Rules with Genetic Algorithms

    Science.gov (United States)

    Matthews, Stephen G.; Gongora, Mario A.; Hopgood, Adrian A.

    A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining; we show the efficacy of extending this to another variant - temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of the proposed framework isolates target temporal itemsets in synthetic datasets. The Iterative Rule Learning method successfully discovers these targets in datasets with varying levels of difficulty.

  3. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  4. A PROPOSAL OF FUZZY MULTIDIMENSIONAL ASSOCIATION RULES

    Directory of Open Access Journals (Sweden)

    Rolly Intan

    2006-01-01

    Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. Rather than searching for frequent itemsets (as is done in mining single-dimensional association rules), in multidimensional association rules, we search for frequent predicate sets. In general, there are two types of multidimensional association rules, namely interdimension association rules and hybrid-dimension association rules. Interdimension association rules are multidimensional association rules with no repeated predicates. This paper introduces a method for generating interdimension association rules. More meaningful association rules can be obtained by generalizing crisp attribute values to fuzzy values. To generate multidimensional association rules involving fuzzy values, this paper introduces an alternative method for mining the rules by searching for the predicate sets.
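
    The idea of generalizing crisp attribute values to fuzzy values can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration rather than the paper's method: the membership functions `young` and `high_income`, the toy records, and the choice of the minimum operator to combine memberships across dimensions (a common convention for fuzzy support). Each predicate refers to a different attribute, matching the interdimension case with no repeated predicates.

```python
def young(age):
    """Hypothetical membership: 1 up to age 25, falling linearly to 0 at 45."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def high_income(income):
    """Hypothetical membership: 0 up to 30 (thousand), rising to 1 at 70."""
    if income <= 30:
        return 0.0
    if income >= 70:
        return 1.0
    return (income - 30) / 40


def fuzzy_support(records, predicates):
    """Mean over records of the minimum membership across the predicate set.

    `predicates` is a list of (attribute_name, membership_function) pairs.
    """
    total = sum(min(mu(r[attr]) for attr, mu in predicates) for r in records)
    return total / len(records)


people = [
    {"age": 25, "income": 70},
    {"age": 35, "income": 50},
    {"age": 45, "income": 30},
    {"age": 30, "income": 40},
]

antecedent = [("age", young)]
rule_items = [("age", young), ("income", high_income)]

rule_support = fuzzy_support(people, rule_items)
rule_confidence = rule_support / fuzzy_support(people, antecedent)
```

    Here a record contributes partially to the support of "age is young AND income is high" in proportion to how well it matches both fuzzy predicates, instead of counting as a full match or no match.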

  5. Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms.

    Science.gov (United States)

    Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed in order to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we proposed a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining was used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming was formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem was used to identify the most suitable travel path. Finally, the Genetic Algorithm was applied for sequencing the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  6. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Amir Hossein Azadnia

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed in order to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we proposed a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining was used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming was formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem was used to identify the most suitable travel path. Finally, the Genetic Algorithm was applied for sequencing the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  7. Applying negative rule mining to improve genome annotation

    Directory of Open Access Journals (Sweden)

    Frishman Goar

    2007-07-01

    Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
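
    The notion of "exceptions from strong negative association rules" can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the PEDANT pipeline: a negative rule "A implies not B" is taken to be strong when its confidence meets a hypothetical threshold, and the records that nevertheless contain both A and B are flagged for inspection. The annotation labels and function names are invented for the example.

```python
def negative_rule_confidence(records, a, b):
    """Confidence of the rule 'a implies NOT b' over a list of annotation sets."""
    with_a = [r for r in records if a in r]
    if not with_a:
        raise ValueError("attribute %r never occurs" % (a,))
    return sum(1 for r in with_a if b not in r) / len(with_a)


def flag_exceptions(records, a, b, min_confidence=0.8):
    """Indices of records violating a strong negative rule 'a implies NOT b'.

    Returns an empty list when the rule is not strong enough to trust.
    """
    if negative_rule_confidence(records, a, b) < min_confidence:
        return []
    return [i for i, r in enumerate(records) if a in r and b in r]


# Hypothetical protein annotations: nine consistent entries, one suspicious.
annotations = [{"bacterial", "membrane"} for _ in range(9)]
annotations.append({"bacterial", "nucleus"})

rule_confidence = negative_rule_confidence(annotations, "bacterial", "nucleus")
suspicious = flag_exceptions(annotations, "bacterial", "nucleus")
```

    In this toy set the rule "bacterial implies not nucleus" holds for nine of ten entries, so the single entry carrying both labels is returned as a candidate annotation error.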

  8. Scalable Association Rule Mining with Predicates on Semantic Representations of Data

    Energy Technology Data Exchange (ETDEWEB)

    Tsay, Li-Shiang [ORNL; Sukumar, Sreenivas R [ORNL; Roberts, Larry W [ORNL

    2015-01-01

    Finding semantic associations from a vast amount of heterogeneous data is an important and useful task in various applications. We present a framework to extract semantic association patterns directly from a very large graph dataset without the extra step of converting graph data into transaction data.

  9. Application of Fuzzy Association Rule Mining for Analysing Students Academic Performance

    OpenAIRE

    Olufunke O. Oladipupo; Olanrewaju. J. Oyelade; Dada. O. Aborisade

    2012-01-01

    This study examines the relationship between students' preadmission academic profile and academic performance. A data sample of students in the Department of Computer Science in one of Nigeria's private universities was used. The preadmission academic profile considered includes 'O' level grades, University Matriculation Examination (UME) scores, and Post-UME scores. The academic performance is defined using students' Grade Point Average (GPA) at the end of a particular session. Fuzzy Association R...

  10. Literature mining of protein-residue associations with graph rules learned through distant supervision

    Directory of Open Access Journals (Sweden)

    Ravikumar KE

    2012-10-01

    Full Text Available Abstract Background We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. Results The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. Conclusions The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  11. Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis.

    Science.gov (United States)

    Gardiner, Eleanor J; Gillet, Valerie J

    2015-09-28

    Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one or other method. For example, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine both structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development

  12. Target-Based Maintenance of Privacy Preserving Association Rules

    Science.gov (United States)

    Ahluwalia, Madhu V.

    2011-01-01

    In the context of association rule mining, the state-of-the-art in privacy preserving data mining provides solutions for categorical and Boolean association rules but not for quantitative association rules. This research fills this gap by describing a method based on discrete wavelet transform (DWT) to protect input data privacy while preserving…

  13. Class Association Rule Pada Metode Associative Classification

    Directory of Open Access Journals (Sweden)

    Eka Karyawati

    2011-11-01

    Full Text Available Frequent pattern (itemset) discovery is an important problem in associative classification rule mining. Different approaches have been proposed, such as the Apriori-like, Frequent Pattern (FP)-growth, and Transaction Data Location (Tid-list) Intersection algorithms. This paper focuses on surveying and comparing the state-of-the-art associative classification techniques with regard to the rule generation phase of associative classification algorithms. This phase includes frequent itemset discovery and rule mining/extraction methods to generate the set of class association rules (CARs). Several techniques have been proposed to improve the rule generation method. A technique utilizing the concept of the discriminative power of itemsets can reduce the number of frequent itemsets by pruning the useless ones. The closed frequent itemset concept can be utilized to compress the rules into compact rules, which may reduce the number of generated rules. Another technique concerns determining the support threshold value of the itemset: specifying not a single but multiple support threshold values with regard to the class label frequencies can give a more appropriate support threshold value, which may generate more accurate rules. An alternative technique for generating rules utilizes the vertical layout to represent the dataset. This method is very effective because it needs only one scan over the dataset, compared with other techniques that need multiple scans. However, one problem with these approaches is that the initial set of tid-lists may be too large to fit into main memory, so more sophisticated techniques are required to compress the tid-lists.
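
    The vertical (tid-list) layout described in this abstract can be illustrated with a minimal sketch. It assumes a toy in-memory dataset; the function names `build_tidlists` and `support_count` are hypothetical and are not taken from the surveyed paper. Each item maps to the set of transaction IDs that contain it, so the support of an itemset is the size of the intersection of its items' tid-lists, and the dataset itself is scanned only once.

```python
def build_tidlists(transactions):
    """Scan the dataset once and map each item to the set of
    transaction IDs (tids) in which it occurs."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support_count(itemset, tidlists):
    """Support count of a non-empty itemset: the number of tids
    shared by the tid-lists of all its items."""
    tids = None
    for item in itemset:
        item_tids = tidlists.get(item, set())
        tids = set(item_tids) if tids is None else tids & item_tids
    return len(tids) if tids is not None else 0


# Hypothetical toy transactions, for illustration only.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
tidlists = build_tidlists(transactions)
ab_support = support_count({"a", "b"}, tidlists)
ac_support = support_count({"a", "c"}, tidlists)
```

    The memory concern raised in the abstract is visible here: `tidlists` holds one entry per (item, transaction) occurrence, which is why compressed tid-list representations are used for large datasets.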

  14. Analyzing microarray data using quantitative association rules.

    Science.gov (United States)

    Georgii, Elisabeth; Richter, Lothar; Rückert, Ulrich; Kramer, Stefan

    2005-09-01

    We tackle the problem of finding regularities in microarray data. Various data mining tools, such as clustering, classification, Bayesian networks and association rules, have been applied so far to gain insight into gene-expression data. Association rule mining techniques used so far work on discretizations of the data and cannot account for cumulative effects. In this paper, we investigate the use of quantitative association rules that can operate directly on numeric data and represent cumulative effects of variables. Technically speaking, this type of quantitative association rules based on half-spaces can find non-axis-parallel regularities. We performed a variety of experiments testing the utility of quantitative association rules for microarray data. First of all, the results should be statistically significant and robust against fluctuations in the data. Next, the approach should be scalable in the number of variables, which is important for such high-dimensional data. Finally, the rules should make sense biologically and be sufficiently different from rules found in regular association rule mining working with discretizations. In all of these dimensions, the proposed approach performed satisfactorily. Therefore, quantitative association rules based on half-spaces should be considered as a tool for the analysis of microarray gene-expression data. The code is available from the authors on request.

  15. Secure association rule sharing

    OpenAIRE

    Oliveira, Stanley R. de M.; Zaïane, Osmar R.; Saygın, Yücel; Saygin, Yucel

    2004-01-01

    The sharing of association rules is often beneficial in industry, but requires privacy safeguards. One may decide to disclose only part of the knowledge and conceal strategic patterns which we call restrictive rules. These restrictive rules must be protected before sharing since they are paramount for strategic decisions and need to remain private. To address this challenging problem, we propose a unified framework for protecting sensitive knowledge before sharing. This framework encompasses:...

  16. Rule Mining Techniques to Predict Prokaryotic Metabolic Pathways

    KAUST Repository

    Saidi, Rabie

    2017-08-28

    It is becoming more evident that computational methods are needed for the identification and the mapping of pathways in new genomes. We introduce an automatic annotation system (ARBA4Path: Association Rule-Based Annotator for Pathways) that utilizes rule mining techniques to predict metabolic pathways across a wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine the pathways in which proteins are involved and thus provide information that lets us very accurately assign pathway membership (with precision of 0.999 and recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate proteins with unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add absent pathway information to many proteins in the UniProtKB/TrEMBL database.

  17. Using association rules mining to explore pattern of Chinese medicinal formulae (prescription) in treating and preventing breast cancer recurrence and metastasis.

    Science.gov (United States)

    He, Yanhua; Zheng, Xiao; Sit, Cindy; Loo, Wings T Y; Wang, ZhiYu; Xie, Ting; Jia, Bo; Ye, Qiaobo; Tsui, Kamchuen; Chow, Louis W C; Chen, Jianping

    2012-09-19

    Chinese herbal medicine is increasingly widely used as a complementary approach for control of breast cancer recurrence and metastasis. In this paper, we examined the implicit prescription patterns behind the Chinese medicinal formulae, so as to explore the Chinese medicinal compatibility patterns or rules in the treatment or control of breast cancer recurrence and metastasis. This study was based on the herbs recorded in Pharmacopoeia of the People's Republic of China, and the literature sources from Chinese Journal Net and China Master Dissertations Full-text Database (1990 - 2010) to analyze the compatibility rule of the prescription. Each Chinese herb was listed according to the selected medicinal formulae and the added information was organized to establish a database. The frequency and the association rules of the prescription patterns were analyzed using the SPSS Clementine Data Mining System. An initial statistical analysis was carried out to categorize the herbs according to their medicinal types and dosage, natures, flavors, channel tropism, and functions. Based on the categorization, the frequencies of occurrence were computed. The main prescriptive features from the selected formulae of the mining data are: (1) warm or cold herbs in the Five Properties category, sweet or bitter herbs in the Five Flavors category, and herbs with affinity to the liver meridian are the most frequently prescribed in the 96 medicinal formulae; (2) herbs with tonifying and replenishing, blood-activating and stasis-resolving, spleen-strengthening and dampness-resolving or heat-clearing and detoxicating functions are frequently prescribed; (3) herbs with blood-tonifying, yin-tonifying, spleen-strengthening and dampness-resolving, heat-clearing and detoxicating, and blood-activating with stasis-resolving functions are interrelated and prescribed in combination with qi-tonifying herbs. The results indicate that there is a close relationship between recurrence and metastasis

  18. Evaluation of rational nonsteroidal anti-inflammatory drugs and gastro-protective agents use; association rule data mining using outpatient prescription patterns.

    Science.gov (United States)

    Pattanaprateep, Oraluck; McEvoy, Mark; Attia, John; Thakkinstian, Ammarin

    2017-07-04

    Nonsteroidal anti-inflammatory drugs (NSAIDs) and gastro-protective agents should be co-prescribed following a standard clinical practice guideline; however, adherence to this guideline in routine practice is unknown. This study applied an association rule model (ARM) to estimate rational NSAIDs and gastro-protective agents use in an outpatient prescriptions dataset. A database of hospital outpatients from October 1st, 2013 to September 30th, 2015 was searched for any of the following drugs: oral antacids (A02A), peptic ulcer and gastro-oesophageal reflux disease drugs (GORD, A02B), and anti-inflammatory and anti-rheumatic products, non-steroids or NSAIDs (M01A). Data including patient demographics, diagnoses, and drug utilization were also retrieved. An association rule model was used to analyze co-prescription of the same drug class (i.e., prescriptions within A02A-A02B, M01A) and between drug classes (A02A-A02B & M01A) using the Apriori algorithm in R. The lift value was calculated as the ratio of confidence to expected confidence, which gave information about the association between drugs in the prescription. We identified a total of 404,273 patients with 2,575,331 outpatient visits in 2 fiscal years. Mean age was 48 years and 34% were male. Among A02A, A02B and M01A drug classes, 12 rules of associations were discovered with support and confidence thresholds of 1% and 50%. The highest lift was between Omeprazole and Ranitidine (340 visits); about one-third of these visits (118) were prescriptions to non-GORD patients, contrary to guidelines. Another finding was the concomitant use of COX-2 inhibitors (Etoricoxib or Celecoxib) and PPIs. 35.6% of these were for patients aged less than 60 years with no GI complication and no Aspirin, inconsistent with guidelines. Around one-third of occasions where these medications were co-prescribed were inconsistent with guidelines. With the rapid growth of health datasets, data mining methods may help assess quality of care and
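
    The support, confidence and lift measures mentioned in this abstract can be illustrated with a small sketch. The function `rule_metrics` and the four toy "visits" are hypothetical and carry no data from the study; the study itself used the Apriori algorithm in R, which is not reproduced here. For a rule A -> C, support is the share of visits containing both A and C, confidence is that count divided by the number of visits containing A, expected confidence is the share of visits containing C, and lift is confidence divided by expected confidence.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)

    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    expected_confidence = count_c / n  # how often C occurs regardless of A
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return support, confidence, lift


# Hypothetical visits, each a set of prescribed drugs (illustration only).
visits = [
    {"omeprazole", "ranitidine"},
    {"omeprazole", "ranitidine"},
    {"omeprazole"},
    {"celecoxib"},
]
support, confidence, lift = rule_metrics(visits, {"omeprazole"}, {"ranitidine"})
```

    A lift above 1 means the two drugs appear together more often than would be expected if they were prescribed independently; thresholds such as the 1% support and 50% confidence quoted above would be applied to the first two values.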

  19. New game - new rules: mining in the democratic South Africa

    Energy Technology Data Exchange (ETDEWEB)

    Motlatsi, J. [National Union of Mineworkers (South Africa)

    1995-12-31

    Discusses the eight areas identified by the South African Union of Mineworkers as requiring new rules to improve safety and conditions in the South African mining industry. The areas are: improved health and safety; the elimination of racism; fair wages; decent living conditions; proper training; care for workers and areas affected by the downscaling of mining; development of an economically viable mining sector; and a mining sector run in a humane and participatory manner.

  20. Assessing Lightning and Wildfire Hazard by Land Properties and Cloud to Ground Lightning Data with Association Rule Mining in Alberta, Canada

    Directory of Open Access Journals (Sweden)

    DongHwan Cha

    2017-10-01

    Full Text Available Hotspot analysis was implemented to find regions in the province of Alberta (Canada) with high frequency Cloud to Ground (CG) lightning strikes clustered together. Generally, hotspot regions are located in the central, central east, and south central regions of the study region. About 94% of annual lightning occurred during warm months (June to August) and the daily lightning frequency was influenced by the diurnal heating cycle. The association rule mining technique was used to investigate frequent CG lightning patterns, which were verified by similarity measurement to check the patterns’ consistency. The similarity coefficient values indicated that there were high correlations throughout the entire study period. Most wildfires (about 93%) in Alberta occurred in forests, wetland forests, and wetland shrub areas. It was also found that lightning and wildfires occur in two distinct areas: frequent wildfire regions with a high frequency of lightning, and frequent wildfire regions with a low frequency of lightning. Further, the preference index (PI) revealed locations where the wildfires occurred more frequently than in other class regions. The wildfire hazard area was estimated with the CG lightning hazard map and specific land use types.

  1. Assessing Lightning and Wildfire Hazard by Land Properties and Cloud to Ground Lightning Data with Association Rule Mining in Alberta, Canada.

    Science.gov (United States)

    Cha, DongHwan; Wang, Xin; Kim, Jeong Woo

    2017-10-23

    Hotspot analysis was implemented to find regions in the province of Alberta (Canada) with high frequency Cloud to Ground (CG) lightning strikes clustered together. Generally, hotspot regions are located in the central, central east, and south central regions of the study region. About 94% of annual lightning occurred during warm months (June to August) and the daily lightning frequency was influenced by the diurnal heating cycle. The association rule mining technique was used to investigate frequent CG lightning patterns, which were verified by similarity measurement to check the patterns' consistency. The similarity coefficient values indicated that there were high correlations throughout the entire study period. Most wildfires (about 93%) in Alberta occurred in forests, wetland forests, and wetland shrub areas. It was also found that lightning and wildfires occur in two distinct areas: frequent wildfire regions with a high frequency of lightning, and frequent wild-fire regions with a low frequency of lightning. Further, the preference index (PI) revealed locations where the wildfires occurred more frequently than in other class regions. The wildfire hazard area was estimated with the CG lightning hazard map and specific land use types.

  2. Automatic Mining of Numerical Classification Rules with Parliamentary Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    KIZILOLUK, S.

    2015-11-01

    Full Text Available In recent years, classification rules mining has been one of the most important data mining tasks. In this study, one of the newest social-based metaheuristic methods, the Parliamentary Optimization Algorithm (POA), is used for the first time for automatic mining of comprehensible and accurate classification rules within datasets which have numerical attributes. Four different numerical datasets have been selected from the UCI data warehouse and classification rules of high quality have been obtained. Furthermore, the results obtained from the designed POA have been compared with the results obtained from four different popular classification rules mining algorithms used in WEKA. Although POA is very new and no applications in complex data mining problems have been performed, the results seem promising. The objective function used is very flexible and many different objectives can easily be added to it. The intervals of the numerical attributes in the rules have been automatically found without any a priori process, as done in other classification rules mining algorithms, which causes the modification of datasets.

  3. Observational Calculi and Association Rules

    CERN Document Server

    Rauch, Jan

    2013-01-01

    Observational calculi were introduced in the 1960’s as a tool of logic of discovery. Formulas of observational calculi correspond to assertions on analysed data. Truthfulness of suitable assertions can lead to acceptance of new scientific hypotheses. The general goal was to automate the process of discovery of scientific knowledge using mathematical logic and statistics. The GUHA method for producing true formulas of observational calculi relevant to the given problem of scientific discovery was developed. Theoretically interesting and practically important results on observational calculi were achieved. Special attention was paid to formulas - couples of Boolean attributes derived from columns of the analysed data matrix. Association rules introduced in the 1990’s can be seen as a special case of such formulas. New results on logical calculi and association rules were achieved. They can be seen as a logic of association rules. This can contribute to solving contemporary challenging problems of data minin...

  4. An improved predictive association rule based classifier using gain ...

    Indian Academy of Sciences (India)

    Health care data diagnosis is a significant task that needs to be executed precisely, which requires much experience and domain-knowledge. Traditional symptoms-based disease diagnosis may perhaps lead to false presumptions. In recent times, Associative Classification (AC), the combination of association rule mining ...

  5. Exploring Consumer Behavior: Use of Association Rules

    Directory of Open Access Journals (Sweden)

    Pavel Turčínek

    2015-01-01

    Full Text Available This paper focuses on the use of association rules in exploring consumer behavior and presents selected results of data analyses applied to data collected via a questionnaire survey on a sample of 1127 Czech respondents with a structure close to a representative sample of the population of the Czech Republic. The questionnaire survey deals with shopping for meat products. The objective was to explore the possibilities of less frequently used data-mining techniques in processing customer preferences. For the data analyses, two methods for generating association rules are used: the Apriori algorithm and the FP-growth algorithm. Both of them were executed in Weka software. The Apriori algorithm seemed to be a better tool, because it provided finer data, since the FP-growth algorithm needed the preference scale to be reduced to only two extreme values, because its input data must be binary. For consumer preferences we also calculated their means. This paper explores the different preferences and expectations of what customers’ favorite outlet should provide and offer. Customers were divided into five segments based on the type of their outlet loyalty and explored in more detail. Some of the best association rules found suggest similar patterns across the whole sample, e.g. the results suggest that respondents for whom the quality of merchandise is a very important factor typically also base their outlet selection on the freshness of products. This finding applies to all types of retail loyalty categories. Other rules seem to indicate behavior more specific to a particular segment of customers. The results suggest that the application of association rules in customer research can provide more insight and can be a good supplementary analysis for consumer data exploration when Likert scales are used.
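
    The Apriori algorithm named in this abstract can be sketched as a level-wise search for frequent itemsets. This is a minimal illustration and not the Weka implementation the authors ran; the function name `apriori_frequent_itemsets`, the toy answers, and the threshold are assumptions made for the example. Candidates of size k+1 are built only from frequent itemsets of size k, relying on the property that every subset of a frequent itemset is itself frequent.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets whose relative
    support is at least min_support (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the threshold.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        # Prune step: drop candidates with an infrequent k-subset.
        k += 1
        current = set()
        for x, y in combinations(list(level), 2):
            union = x | y
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


# Hypothetical respondents, each a set of factors rated "very important".
answers = [
    {"quality", "freshness", "price"},
    {"quality", "freshness"},
    {"quality", "freshness"},
    {"price"},
]
frequent = apriori_frequent_itemsets(answers, min_support=0.5)
```

    In this toy data, "quality" and "freshness" co-occur in three of four answers, which is the kind of pattern behind the quality-and-freshness rule reported above. Rules are then derived from the frequent itemsets by applying a confidence threshold.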

  6. Soil quality assessment using weighted fuzzy association rules

    Science.gov (United States)

    Xue, Yue-Ju; Liu, Shu-Guang; Hu, Yue-Ming; Yang, Jing-Feng

    2010-01-01

    Fuzzy association rules (FARs) can be powerful in assessing regional soil quality, a critical step prior to land planning and utilization; however, traditional FARs mined from soil quality database, ignoring the importance variability of the rules, can be redundant and far from optimal. In this study, we developed a method applying different weights to traditional FARs to improve accuracy of soil quality assessment. After the FARs for soil quality assessment were mined, redundant rules were eliminated according to whether the rules were significant or not in reducing the complexity of the soil quality assessment models and in improving the comprehensibility of FARs. The global weights, each representing the importance of a FAR in soil quality assessment, were then introduced and refined using a gradient descent optimization method. This method was applied to the assessment of soil resources conditions in Guangdong Province, China. The new approach had an accuracy of 87%, when 15 rules were mined, as compared with 76% from the traditional approach. The accuracy increased to 96% when 32 rules were mined, in contrast to 88% from the traditional approach. These results demonstrated an improved comprehensibility of FARs and a high accuracy of the proposed method.

  7. [Mining rules on determination of four properties based on traditional Chinese medicine functional combination].

    Science.gov (United States)

    Yang, Xue-Mei; Lin, Duan-Yi; Lai, Xin-Mei; Chen, Mei-Mei; Huang, Lu-Qi

    2013-05-01

    This study laid the foundation for large-sample data mining toward a comprehensive summary of the four properties theory of traditional Chinese medicine (TCM), and also provided theoretical clues on the determination of four properties for new resource development of TCM and the clinical use of Chinese medicine. Four properties data of 8 980 Chinese medicines from "Chinese herbal medicine (CHM)" and associated function index data were chosen as data sets. Then, the IBM SPSS Clementine 14.1 data mining platform and the Apriori model were adopted to mine classification-association rules, setting the minimum support threshold of the rule antecedent and the minimum confidence threshold to 0.5% and 60%, respectively. Eleven classification-association rules involving warm, cold and mild natures were found. It was discovered that TCM with the functions of "dispelling wind-cold, warming the middle, stopping pain and expelling wind-removing dampness, tonifying kidney yang, unblocking meridians and expelling wind-removing dampness, dispelling cold to stop pain, strengthening sinews-bones and expelling wind-removing dampness" was likely warm-natured, with the function of "tonifying the lung" was likely mild-natured, and with the functions of "clearing heat and drying dampness, clearing heat and purging fire, eliminating restlessness" was likely cold-natured.

  8. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  9. Subcellular localization prediction through boosting association rules.

    Science.gov (United States)

    Yoon, Yongwook; Lee, Gary Geunbae

    2012-01-01

    Computational methods for predicting protein subcellular localization have used various types of features, including N-terminal sorting signals, amino acid compositions, and text annotations from protein databases. Our approach does not use biological knowledge such as the sorting signals or homologues, but uses only protein sequence information. The method divides a protein sequence into short $k$-mer sequence fragments which can be mapped to word features in document classification. A large number of class association rules are mined from the protein sequence examples that range from the N-terminus to the C-terminus. Then, a boosting algorithm is applied to those rules to build up a final classifier. Experimental results using benchmark datasets show that our method is excellent in terms of both the classification performance and the test coverage. The result also implies that the $k$-mer sequence features which determine subcellular locations do not necessarily exist in specific positions of a protein sequence. An online prediction service implementing our method is available at http://isoft.postech.ac.kr/research/BCAR/subcell.
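
    The $k$-mer fragmentation step described in this abstract can be illustrated with a short sketch. It assumes overlapping sliding windows, which the abstract does not specify; the function name `kmer_features` and the example sequence are hypothetical. Each fragment plays the role of a "word", so a sequence becomes a bag of words on which class association rules could then be mined.

```python
from collections import Counter


def kmer_features(sequence, k):
    """Split a sequence into overlapping k-mers (assumed sliding
    window of step 1); returns an empty list if the sequence is
    shorter than k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical amino acid sequence, for illustration only.
fragments = kmer_features("MKTAYMKT", 3)
# Position-independent word counts, analogous to a document's bag of words.
bag_of_words = Counter(fragments)
```

    Counting fragments without their positions matches the abstract's observation that informative $k$-mers need not sit at fixed positions in the sequence.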

  10. Using Machine Learning Methods Jointly to Find Better Set of Rules in Data Mining

    Directory of Open Access Journals (Sweden)

    SUG Hyontai

    2017-01-01

    Full Text Available Rough set-based data mining algorithms are among the widely accepted machine learning technologies because of their strong mathematical background and capability of finding optimal rules based on given data sets only, without room for prejudiced views to be inserted on the data. However, because the algorithms find rules very precisely, we may be confronted with the overfitting problem. On the other hand, association rule algorithms find rules of association, where the association resides between sets of items in a database. The algorithms find itemsets that occur more often than a given minimum support, so that they can find the itemsets in reasonable time even for very large databases when the minimum support is supplied appropriately. In order to overcome the overfitting problem in rough set-based algorithms, we first find large itemsets and then select attributes that cover the large itemsets. By using the selected attributes only, we may find a better set of rules based on rough set theory. Results from experiments support our suggested method.

  11. Significant cancer prevention factor extraction: an association rule discovery approach.

    Science.gov (United States)

    Nahar, Jesmin; Tickle, Kevin S; Ali, A B M Shawkat; Chen, Yi-Ping Phoebe

    2011-06-01

    Cancer is increasing the total number of unexpected deaths around the world. Until now, cancer research could not significantly contribute to a proper solution for the cancer patient, and as a result, the high death rate is uncontrolled. The present research aim is to extract the significant prevention factors for particular types of cancer. To find out the prevention factors, we first constructed a prevention factor data set with an extensive literature review on bladder, breast, cervical, lung, prostate and skin cancer. We subsequently employed three association rule mining algorithms, Apriori, Predictive apriori and Tertius algorithms in order to discover most of the significant prevention factors against these specific types of cancer. Experimental results illustrate that Apriori is the most useful association rule-mining algorithm to be used in the discovery of prevention factors.

  12. Association rule interestingness: measure and statistical validation

    OpenAIRE

    Lallich, Stéphane; Teytaud, Olivier; Prudhomme, Elie

    2006-01-01

    The search for interesting Boolean association rules is an important topic in knowledge discovery in databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, they may produce a large number of rules, many of which are uninteresting. One has to resolve a two-tier problem: choosing the measures best suited to the problem at hand, then validating the interesting rules agai...

  13. Discovering market basket patterns using hierarchical association rules

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2015-10-01

    Full Text Available Association rules are a data mining method for discovering patterns of frequent item sets, such as products in a store that are frequently purchased at the same time by a customer (market basket analysis). A number of interestingness measures for association rules have been developed to date, but research has shown that a dominant measure does not exist. Authors have mostly used objective measures, whereas subjective measures have rarely been investigated. This paper aims to combine objective measures such as support, confidence and lift with a subjective approach based on human expert selection in order to extract interesting rules from a real dataset collected from a large Croatian retail chain. Hierarchical association rules were used to enhance the efficiency of the extraction rule. The results show that rules that are more interesting were extracted using the hierarchical method, and that a hybrid approach of combining objective and subjective measures succeeds in extracting certain unexpected and actionable rules. The research can be useful for retail and marketing managers in planning marketing strategies, as well as for researchers investigating this field.
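
As a rough illustration of the objective measures named in the abstract above (support, confidence and lift), the following Python sketch computes them for a single rule over a handful of made-up baskets. The function name `rule_measures` and the basket contents are hypothetical; this is not the authors' code or data.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy baskets, for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
support, confidence, lift = rule_measures(baskets, {"bread"}, {"milk"})
```

On these toy baskets the rule bread -> milk has a lift below 1, so a lift-based filter would discard it even though its confidence is fairly high; this is one reason several measures are usually inspected together.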

  14. Rules of meridians and acupoints selection in treatment of Parkinson's disease based on data mining techniques.

    Science.gov (United States)

    Li, Zhe; Hu, Ying-Yu; Zheng, Chun-Ye; Su, Qiao-Zhen; An, Chang; Luo, Xiao-Dong; Liu, Mao-Cai

    2018-01-15

    To help select appropriate meridians and acupoints in clinical practice and experimental study for Parkinson's disease (PD), the rules of meridian and acupoint selection in acupuncture and moxibustion were analyzed in domestic and foreign clinical treatment for PD based on data mining techniques. Literature about PD treated by acupuncture and moxibustion in China and abroad was searched and selected from China National Knowledge Infrastructure and MEDLINE. Then the data from all eligible articles were extracted to establish the database of acupuncture-moxibustion for PD. The association rules of data mining techniques were used to analyze the rules of meridian and acupoint selection. In total, 168 eligible articles were included and 184 acupoints were applied. The total frequency of acupoint application was 1,090 times. Those acupoints were mainly distributed in the head and neck and the extremities. Among all, Taichong (LR 3), Baihui (DU 20), Fengchi (GB 20), Hegu (LI 4) and Chorea-tremor Controlled Zone were the top five acupoints that had been used. Superior-inferior acupoint matching was utilized the most. As to the meridians involved, Du Meridian, Dan (Gallbladder) Meridian, Dachang (Large Intestine) Meridian, and Gan (Liver) Meridian were the most popular. The application of meridians and acupoints for PD treatment lays emphasis on the acupoints on the head, attaches importance to extinguishing Gan wind, tonifying qi and blood, and nourishing sinews, and makes good use of superior-inferior acupoint matching.

  15. Mining Staff Assignment Rules from Event-Based Data

    NARCIS (Netherlands)

    Ly, L.T.; Rinderle, S.B.; Dadam, P.; Reichert, M.U.

    2006-01-01

    Process mining offers methods and techniques for capturing process behaviour from log data of past process executions. Although many promising approaches on mining the control flow have been published, no attempt has been made to mine the staff assignment situation of business processes. In this

  16. Sequential association rules in atonal music

    NARCIS (Netherlands)

    Honingh, A.; Weyde, T.; Conklin, D.; Chew, E.; Childs, A.; Chuan, C.-H.

    2009-01-01

    This paper describes a preliminary study on the structure of atonal music. In the same way as sequential association rules of chords can be found in tonal music, sequential association rules of pitch class set categories can be found in atonal music. It has been noted before that certain pitch class

  17. Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

    Science.gov (United States)

    Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-01-01

    Gene Ontology (GO) is a structured repository of concepts (GO Terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. There are different approaches to analysis; among them is the use of association rules (AR), which provides useful knowledge by discovering biologically relevant associations between GO terms that were not previously known. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. We here adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a deep performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing how GO-WAR outperforms current state of the art approaches.

  18. Dynamic association rules for gene expression data analysis.

    Science.gov (United States)

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, the Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. Four gene expression datasets showed that the proposed
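
The abstract above mentions judging a rule with a one-sided confidence interval but does not give the formula. The Python sketch below shows one common way such a check could look, using a normal-approximation (Wald) lower bound on a rule's confidence. The choice of the Wald bound, the function names and the counts are all assumptions for illustration and should not be read as the DAR algorithm itself.

```python
from math import sqrt
from statistics import NormalDist


def confidence_lower_bound(n_antecedent, n_both, level=0.95):
    """One-sided lower bound on rule confidence (Wald approximation, assumed here)."""
    p_hat = n_both / n_antecedent            # observed confidence of the rule
    z = NormalDist().inv_cdf(level)          # one-sided critical value
    return p_hat - z * sqrt(p_hat * (1 - p_hat) / n_antecedent)


def rule_is_meaningful(n_antecedent, n_both, min_confidence, level=0.95):
    """Accept the rule only if the lower bound clears the chosen threshold."""
    return confidence_lower_bound(n_antecedent, n_both, level) > min_confidence


# Hypothetical counts: the antecedent holds in 100 samples, the full rule in 90.
lower = confidence_lower_bound(100, 90)
accepted_at_080 = rule_is_meaningful(100, 90, min_confidence=0.80)
accepted_at_088 = rule_is_meaningful(100, 90, min_confidence=0.88)
```

With these counts the observed confidence is 0.90, but the lower bound is roughly 0.85, so the rule passes a 0.80 threshold and fails a 0.88 one. The Wald bound is known to behave poorly for small samples or proportions near 0 or 1.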

  19. [Exploring application of data mining technology in researching composition rules of Tibetan medical formulas].

    Science.gov (United States)

    Cairang, Nanjia; Renzeng, Duojie; Duojie, Cairang; Luosang, Dongzhi; Li, Xianjia

    2012-08-01

    There are thousands of medicinal formulas in the ancient Tibetan medical texts. Researching the composition rules of Tibetan medical formulas is a very important step in the study and practice of Tibetan medicine. In order to explore the composition rules of Tibetan medical formulas this article draws on the research methods utilized in related fields of traditional Chinese medicine adapted to the unique characteristics of Tibetan medicine. This is the first time the utilization of data mining methods has been proposed for the research of Tibetan medical formulas. It is believed that data mining techniques can aid researchers in discovering the composition rules of Tibetan medical formulas in accordance with Tibetan medical theory.

  20. On construction of partial association rules

    KAUST Repository

    Moshkov, Mikhail

    2009-01-01

    This paper is devoted to the study of approximate algorithms for minimization of partial association rule length. It is shown that under some natural assumptions on the class NP, a greedy algorithm is close to the best polynomial approximate algorithms for solving this NP-hard problem. The paper contains various bounds on the precision of the greedy algorithm, bounds on the minimal length of rules based on information obtained during the work of the greedy algorithm, and results of the study of association rules for most binary information systems. © 2009 Springer Berlin Heidelberg.

  1. Efficient learning of microbial genotype-phenotype association rules.

    Science.gov (United States)

    MacDonald, Norman J; Beiko, Robert G

    2010-08-01

    Finding biologically causative genotype-phenotype associations from whole-genome data is difficult due to the large gene feature space to mine, the potential for interactions among genes and phylogenetic correlations between genomes. Associations within phylogenetically distinct organisms with unusual molecular mechanisms underlying their phenotype may be particularly difficult to assess. We have developed a new genotype-phenotype association approach that uses Classification based on Predictive Association Rules (CPAR), and compare it with NETCAR, a recently published association algorithm. Our implementation of CPAR gave on average slightly higher classification accuracy, with approximately 100 times faster running times. Given the influence of phylogenetic correlations in the extraction of genotype-phenotype association rules, we furthermore propose a novel measure for downweighting the dependence among samples by modeling shared ancestry using conditional mutual information, and demonstrate its complementary nature to traditional mining approaches. Software implemented for this study is available under the Creative Commons Attribution 3.0 license from the author at http://kiwi.cs.dal.ca/Software/PICA

  2. The Effect of Correction Factor in Synthesizing Global Rules in a Multi-Database Mining Scenario

    Directory of Open Access Journals (Sweden)

    Rengaramanujam Srinivasan

    2009-01-01

    Full Text Available Recently, multi-database mining using local pattern analysis has been identified as an efficient strategy for mining multiple data sources of an interstate business organization. Using this approach, frequent patterns from the individual sites are synthesized and forwarded to the central head. Various synthesizing models [5,7] have been proposed to form global patterns from the forwarded high-frequent rules. Earlier we had proposed a model for synthesizing high-frequent rules on the basis of transaction population of the sites, support and confidence of the rule in the respective sites. The rules that are forwarded by the local sites are “strong” rules which satisfy the minimum support and confidence thresholds at respective sites. It is desired that the synthesized rules from such forwarded patterns must closely match with the mono-mining results, i.e. the results that would be obtained if all the databases are put together and mining has been done. When the rule is present in the site but fails to satisfy the minimum support threshold value, it is not allowed to take part in the rule synthesizing process. In such situations the correction factor “h” plays a vital role in inferring the global support and confidence values. A suitable choice of correction factor “h” enables the domain expert to reap the valid synthesized result. In this paper, the impact of correction factor in obtaining synthesized results close to the mono-mining results is brought out.
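
The abstract above does not state the synthesizing formula, so the Python sketch below only illustrates the general idea under an explicit assumption: each site's support is weighted by its transaction count, and a site that did not forward the rule is assumed to hold it at `h * min_support`. The function name `synthesize_support`, that imputation rule, and the numbers are hypothetical and not the authors' model.

```python
def synthesize_support(sites, min_support, h):
    """Estimate the global support of one rule from per-site reports.

    sites: list of (n_transactions, reported_support or None).
    None means the rule fell below min_support at that site and was not forwarded.
    """
    total_transactions = sum(n for n, _ in sites)
    weighted_sum = 0.0
    for n, reported in sites:
        # Assumption for this sketch: an unreported rule is credited with
        # h * min_support, where 0 <= h <= 1 is the correction factor.
        estimate = reported if reported is not None else h * min_support
        weighted_sum += n * estimate
    return weighted_sum / total_transactions


# Hypothetical sites: two forward the rule, one does not.
sites = [(1000, 0.30), (3000, 0.20), (1000, None)]
without_correction = synthesize_support(sites, min_support=0.10, h=0.0)
with_correction = synthesize_support(sites, min_support=0.10, h=0.5)
```

Setting `h` to 0 treats the rule as absent from the silent site, which underestimates the global support whenever the rule was present there but just under the threshold; a nonzero `h` moves the estimate toward what pooled mining would report.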

  3. How to Mine Information from Each Instance to Extract an Abbreviated and Credible Logical Rule

    Directory of Open Access Journals (Sweden)

    Limin Wang

    2014-10-01

    Full Text Available Decision trees are particularly promising in symbolic representation and reasoning due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, their drawbacks, caused by the single-tree structure, cannot be ignored. A rigid decision path may cause the majority class to overwhelm other classes when dealing with imbalanced data sets, and pruning removes not only superfluous nodes, but also subtrees. The proposed learning algorithm, flexible hybrid decision forest (FHDF), mines information implicated in each instance to form logical rules on the basis of a chain rule of local mutual information, then forms different decision tree structures and decision forests later. The most credible decision path from the decision forest can be selected to make a prediction. Furthermore, functional dependencies (FDs), which are extracted from the whole data set based on association rule analysis, perform embedded attribute selection to remove nodes rather than subtrees, thus helping to achieve different levels of knowledge representation and improve model comprehension in the framework of semi-supervised learning. Naive Bayes replaces the leaf nodes at the bottom of the tree hierarchy, where the conditional independence assumption may hold. This technique reduces the potential for overfitting and overtraining and improves the prediction quality and generalization. Experimental results on UCI data sets demonstrate the efficacy of the proposed approach.

  4. [Medication rules for prescriptions containing Pterocephali Herba based on data mining].

    Science.gov (United States)

    Zuo, Fang; Wei, Zhi-Cheng; Tang, Ce; Wang, Wen-Qian; Tong, Dong; Meng, Xian-Li; Zhang, Yi

    2017-08-01

    This study aimed to discuss and analyze the medication rules for prescriptions containing Pterocephali Herba in Chinese Medical Encyclopedia - Tibetan Medicine, Tibetan Medicine Prescription Modern Research and Clinical Application, and Interpretation of Common Tibetan Medicines based on the collection of Pterocephali Herba and by using the "Traditional Chinese Medicine Inheritance Support system (V2.0.1)", with the use of association rules, the apriori algorithm and other data mining methods. The frequency of single drugs, the frequency of drug combinations, the association rules and the combinations of core drugs were analyzed. Through collection of the prescriptions, a total of 215 prescriptions were included, involving a total of 376 herbs. According to the "frequency statistics", the prescriptions containing Pterocephali Herba were commonly used to treat cold fever, distemper virus and arthritis. The most frequently used drugs (frequency≥15) were Corydalis Herba, Lagotidis Herba, and Gentianae Macrophyllae Radix, et al. The most frequently used drug combinations were "Pterocephali Herba, Corydalis Herba", "Pterocephali Herba, Lagotidis Herba", and "Pterocephali Herba, Gentianae Macrophyllae Radix" et al. The prescriptions containing Pterocephali Herba were used to primarily treat disease for Tourette syndrome caused by the dampness heat toxin, fever, arthritis etc, such as pestilent toxicity, pneumonia and influenza, rheumatoid arthritis etc. The drugs in the prescriptions mostly had the effects of heat-clearing and detoxifying, anti-inflammatory, dispelling wind and dampness, often in compatible use with heat-clearing drugs. The drug use was concentrated and reflected the clear thought of prescription statutes. Copyright© by the Chinese Pharmaceutical Association.

  5. Integrating association rules and case-based reasoning to predict retinopathy

    Directory of Open Access Journals (Sweden)

    Vimala Balakrishnan

    Full Text Available This study proposes a retinopathy prediction system based on data mining, particularly association rules using Apriori algorithm, and case-based reasoning. The association rules are used to analyse patterns in the data set and to calculate retinopathy probability whereas case-based reasoning is used to retrieve similar cases. This paper discusses the proposed system. It is believed that great improvements can be provided to medical practitioners and also to diabetics with the implementation of this system.

  6. [Acupoints selection rules analysis of ancient acupuncture for urinary incontinence based on data mining technology].

    Science.gov (United States)

    Zhang, Wei; Tan, Zhigao; Cao, Juanshu; Gong, Houwu; Qin, Zuoai; Zhong, Feng; Cao, Yue; Wei, Yanrong

    2015-12-01

    Based on ancient literature of acupuncture in Canon of Chinese Medicine (4th edition), the articles regarding acupuncture for urinary incontinence were retrieved and collected to establish a database. Using Weka data mining software, the multi-level association rules analysis method was applied to analyze the acupoint selection characteristics and rules of ancient acupuncture for treatment of urinary incontinence. In total, 356 articles of acupuncture for urinary incontinence were collected, involving 41 acupoints with a total frequency of 364. As a result, (1) the acupoints in the yin-meridians of hand and foot were highly valued, as the frequency of acupoints in yin-meridians was 2.6 times that in yang-meridians, and the frequency of acupoints selected was highest in the liver meridian of foot-jueyin; (2) the acupoints in the bladder meridian of foot-taiyang were also highly valued, and among the three yang-meridians of foot, the frequency of acupoints in the bladder meridian of foot-taiyang was 54, accounting for 65.85% (54/82); (3) more of the acupoints selected were located in the lower limbs and abdomen; (4) specific acupoints in the above meridians were mostly selected, accounting for 73.2% (30/41) of the acupoints and 79.4% (289/364) of the total frequency, respectively; (5) Zhongji (CV 3), the front-mu point of the bladder meridian, was seldom selected in the ancient acupuncture literature, which was different from modern literature reports. The results show that urinary incontinence belongs to external genitalia diseases, which should be treated from yin, indicating that more yin-meridians should be used and special acupoints should be focused on. It is essential to focus on inheritance and innovation in TCM clinical treatment, and applying data mining technology to ancient literature of acupuncture could provide a classic theoretical basis for TCM clinical treatment.

  7. Analysis on composition rules of Chinese patent drugs treating pain-related diseases based on data mining method.

    Science.gov (United States)

    Tang, Shi-Huan; Shen, Dan; Yang, Hong-Jun

    2017-08-24

    To analyze the composition rules of oral prescriptions in the treatment of headache, stomachache and dysmenorrhea recorded in National Standard for Chinese Patent Drugs (NSCPD) enacted by Ministry of Public Health of China and then make comparison between them to better understand pain treatment in different regions of the human body. The NSCPD database was constructed in 2014. Prescriptions treating the three pain-related diseases were searched and screened from the database. Then data mining methods such as association rules analysis and the complex system entropy method, integrated in the data mining software Traditional Chinese Medicine Inheritance Support System (TCMISS), were applied to process the data. The top 25 drugs with high frequency in the treatment of each disease were selected, and 51, 33 and 22 core combinations treating headache, stomachache and dysmenorrhea respectively were mined out as well. The composition rules of the oral prescriptions for treating headache, stomachache and dysmenorrhea recorded in NSCPD have been summarized. Although there were similarities between them, formulas varied according to the location of pain. This can serve as evidence and reference for clinical treatment and new drug development.

  8. Mining for associations between text and brain activation in a functional neuroimaging database

    DEFF Research Database (Denmark)

    Nielsen, Finn Årup; Hansen, Lars Kai; Balslev, D.

    2004-01-01

    We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach coo...... that the statistically motivated associations are well aligned with general neuroscientific knowledge....

  9. Association-rule based information source selection

    OpenAIRE

    Yang, Hui; Zhang, Minjie; Shi, Zhongzhi

    2004-01-01

    The proliferation of information sources available on the World Wide Web has resulted in a need for database selection tools to locate the potentially useful information sources with respect to the user's information need. Current database selection tools always treat each database independently, ignoring the implicit, useful associations between distributed databases. To overcome this shortcoming, in this paper, we introduce a data-mining approach to assist the process of database selection by...

  10. Inter-transactional association rules for multi-dimensional contexts for prediction and their application to studying meteorological data

    NARCIS (Netherlands)

    Chen, P.P.; Feng, L.; Dillon, Tharam; Liu, James

    Inter-transactional association rules, first presented in our early work [H. Lu, J. Han, L. Feng, Stock movement prediction and n-dimensional inter-transaction association rules, in: Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Seattle,

  11. Induction and pruning of classification rules for prediction of microseismic hazards in coal mines

    Energy Technology Data Exchange (ETDEWEB)

    Sikora, M. [Silesian Technical University, Gliwice (Poland)

    2011-06-15

    The paper presents results of applying a rule induction and pruning algorithm to the classification of microseismic hazard states in coal mines. Due to the imbalanced distribution of examples describing the states 'hazardous' and 'safe', a special algorithm was used for rule induction and pruning. The algorithm selects optimal values of the parameters influencing rule induction and pruning based on training and tuning sets. The basic parameter of the algorithm is a rule quality measure, which determines the form and classification abilities of the induced rules. The specificity and sensitivity of a classifier were used to evaluate its quality. The tests conducted show that the adopted method of rule induction and classifier quality evaluation gives better results in classifying microseismic hazards than the methods currently used in mining practice. Results obtained by the rule-based classifier were also compared with results obtained by a decision tree induction algorithm and by a neuro-fuzzy system.

  12. Recommendation System Based On Association Rules For Distributed E-Learning Management Systems

    Science.gov (United States)

    Mihai, Gabroveanu

    2015-09-01

    Traditional Learning Management Systems are installed on a single server where learning materials and user data are kept. To increase its performance, the Learning Management System can be installed on multiple servers; learning materials and user data can then be distributed across these servers, yielding a Distributed Learning Management System. This paper proposes a prototype of a recommendation system based on association rules for Distributed Learning Management Systems. Information from LMS databases is analyzed using distributed data mining algorithms in order to extract the association rules. Then the extracted rules are used as inference rules to provide personalized recommendations. The quality of the provided recommendations is improved because the rules used to make the inferences are more accurate, since these rules aggregate knowledge from all e-Learning systems included in the Distributed Learning Management System.

  13. Mining for associations between text and brain activation in a functional neuroimaging database

    DEFF Research Database (Denmark)

    Nielsen, Finn Arup; Hansen, Lars Kai; Balslev, Daniela

    2004-01-01

    We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...

  14. Identification of temporal association rules from time-series microarray data sets.

    Science.gov (United States)

    Nam, Hojung; Lee, KiYoung; Lee, Doheon

    2009-03-19

    One of the most challenging problems in mining gene expression data is to identify how the expression of any particular gene affects the expression of other genes. To elucidate the relationships between genes, an association rule mining (ARM) method has been applied to microarray gene expression data. However, a conventional ARM method has a limit on extracting temporal dependencies between gene expressions, though the temporal information is indispensable to discover underlying regulation mechanisms in biological pathways. In this paper, we propose a novel method, referred to as temporal association rule mining (TARM), which can extract temporal dependencies among related genes. A temporal association rule has the form [gene A ↑, gene B ↓] → (7 min) [gene C ↑], which represents that a high expression level of gene A and significant repression of gene B are followed by significant expression of gene C after 7 minutes. The proposed TARM method is tested with a Saccharomyces cerevisiae cell cycle time-series microarray gene expression data set. In the parameter fitting phase of TARM, the fitted parameter set [threshold = ±0.8, support ≥ 3 transactions, confidence ≥ 90%] with the best precision score for the KEGG cell cycle pathway has been chosen for the rule mining phase. With the fitted parameter set, numbers of temporal association rules with five transcriptional time delays (0, 7, 14, 21, 28 minutes) are extracted from gene expression data of 799 genes, which are pre-identified cell cycle relevant genes. From the extracted temporal association rules, associated genes, which play the same role in biological processes within a short transcriptional time delay, and some temporal dependencies between genes with specific biological processes are identified. In this work, we proposed TARM, which is an applied form of conventional ARM. TARM showed a higher precision score than Dynamic Bayesian network and Bayesian network. Advantages of TARM are
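
To make the rule form in the abstract above concrete, here is a small Python sketch that discretizes expression values at a threshold of 0.8 and counts how often an antecedent pattern at one time point is followed by a consequent pattern a fixed number of time steps later. How the paper actually builds its transactions is not stated in the abstract, so the counting scheme, the function names and the expression values are assumptions for illustration only.

```python
def discretize(value, threshold=0.8):
    """Map an expression value to 'up', 'down' or 'none' using a symmetric threshold."""
    if value >= threshold:
        return "up"
    if value <= -threshold:
        return "down"
    return "none"


def temporal_rule_stats(series, antecedent, consequent, delay_steps):
    """Return (support count, confidence) for antecedent -> (delay) consequent.

    series: dict gene -> list of expression values at equally spaced time points.
    antecedent, consequent: dict gene -> 'up' or 'down'.
    """
    length = len(next(iter(series.values())))
    antecedent_hits = 0
    rule_hits = 0
    for t in range(length - delay_steps):
        if all(discretize(series[g][t]) == s for g, s in antecedent.items()):
            antecedent_hits += 1
            later = t + delay_steps
            if all(discretize(series[g][later]) == s for g, s in consequent.items()):
                rule_hits += 1
    confidence = rule_hits / antecedent_hits if antecedent_hits else 0.0
    return rule_hits, confidence


# Hypothetical expression values sampled every 7 minutes.
series = {
    "A": [1.0, 0.9, 0.1, 1.2, 0.0],
    "B": [-1.0, -0.9, 0.0, -1.1, 0.0],
    "C": [0.0, 1.0, 0.2, 0.2, 1.5],
}
support_count, confidence = temporal_rule_stats(
    series, {"A": "up", "B": "down"}, {"C": "up"}, delay_steps=1
)
```

In this toy series the antecedent holds at three time points and gene C is up one step later in two of them, so the rule would fail both the support and the confidence thresholds quoted in the abstract.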

  15. Dynamic Programming Approach for Construction of Association Rule Systems

    KAUST Repository

    Alsolami, Fawaz

    2016-11-18

    In the paper, an application of dynamic programming approach for optimization of association rules from the point of view of knowledge representation is considered. The association rule set is optimized in two stages, first for minimum cardinality and then for minimum length of rules. Experimental results present cardinality of the set of association rules constructed for information system and lower bound on minimum possible cardinality of rule set based on the information obtained during algorithm work as well as obtained results for length.

  16. Integrated Association Rules Complete Hiding Algorithms

    Directory of Open Access Journals (Sweden)

    Mohamed Refaat Abdellah

    2017-01-01

    Full Text Available This paper presents database security approach for complete hiding of sensitive association rules by using six novel algorithms. These algorithms utilize three new weights to reduce the needed database modifications and support complete hiding, as well as they reduce the knowledge distortion and the data distortions. Complete weighted hiding algorithms enhance the hiding failure by 100%; these algorithms have the advantage of performing only a single scan for the database to gather the required information to form the hiding process. These proposed algorithms are built within the database structure which enables the sanitized database to be generated on run time as needed.

  17. Success Rules of OSS Projects using Datamining 3-Itemset Association Rule

    OpenAIRE

    Andi Wahju Rahardjo Emanuel; Retantyo Wardoyo; Jazi Eko Istiyanto; Khabib Mustofa

    2010-01-01

    We present research to find the success rules of 134,549 Open Source Software (OSS) Projects at the Sourceforge portal using Datamining 3-Itemset Association Rule. Seventeen types of OSS Project data are collected, classified, and then analyzed using the Weka datamining tool. The Datamining 3-Itemset Association Rule is used to find the success rules of these projects by assuming that the success of these projects is reflected by the number of downloads. The results are formulated into 9 success ...

  18. Performance analysis of modified algorithm for finding multilevel association rules

    OpenAIRE

    Shrivastava, Arpna; Jain, R. C.

    2013-01-01

    Multilevel association rules explore the concept hierarchy at multiple levels which provides more specific information. Apriori algorithm explores the single level association rules. Many implementations are available of Apriori algorithm. Fast Apriori implementation is modified to develop new algorithm for finding multilevel association rules. In this study the performance of this new algorithm is analyzed in terms of running time in seconds.
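
The idea of mining at several levels of a concept hierarchy can be sketched in a few lines of Python. The example below counts frequent item pairs by brute force, first on the raw items and then on the same transactions generalized to their parent categories. It is not the modified Fast Apriori algorithm the abstract refers to (there is no candidate pruning here), and the hierarchy, item names and threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count):
    """Return item pairs that occur together in at least min_count transactions."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


def generalize(transactions, parent_of):
    """Replace every item by its parent category in the concept hierarchy."""
    return [{parent_of[item] for item in t} for t in transactions]


# Hypothetical two-level hierarchy and transactions.
parent_of = {
    "skim milk": "milk",
    "whole milk": "milk",
    "white bread": "bread",
    "wheat bread": "bread",
}
transactions = [
    {"skim milk", "white bread"},
    {"whole milk", "wheat bread"},
    {"skim milk", "wheat bread"},
    {"whole milk"},
]
item_level = frequent_pairs(transactions, min_count=2)
category_level = frequent_pairs(generalize(transactions, parent_of), min_count=2)
```

No specific pair of products reaches the threshold, yet the category pair (bread, milk) does. Multilevel approaches often use a lower minimum support at the more specific levels for exactly this reason.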

  19. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik

    Full Text Available Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post

  20. Mining amino acid association patterns in class B GPCRs.

    Science.gov (United States)

    Kumari, Tannu; Pardasani, Kamal Raj

    2015-01-01

    The class B GPCR family is a small group of receptors which are activated by peptides of intermediate length that range from 30 to 40 amino acid residues, including hormones, neuropeptides and autocrine factors that mediate diverse physiological functions. They are involved in physiological processes like glucose homeostasis (glucagon and glucagon-like peptide-1), calcium homeostasis and bone turnover (parathyroid hormone and calcitonin), and control of the stress axis (corticotropin-releasing factor). Most of the GPCR structures and their functions are still unknown. Thus, the study of amino acid association patterns can be useful in the prediction of their structure and functions. In view of the above, in this paper, an attempt has been made to explore amino acid association patterns in class B GPCRs and their relationships with secondary structures and physicochemical properties. Fuzzy association rule mining is employed to handle the uncertainty due to variation in the length of sequences. The association rules have been generated with the help of patterns discovered in the sequences.

  1. Association and Sequence Mining in Web Usage

    Directory of Open Access Journals (Sweden)

    Claudia Elena DINUCA

    2011-06-01

    Web servers worldwide generate a vast amount of information on web users’ browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volume of clickstream and user data collected by Web-based organizations in their daily operations has reached astronomical proportions. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common needs or interests. The focus of this paper is to provide an overview of how to use frequent pattern techniques for discovering different types of patterns in a Web log database. In this paper we focus on finding associations as a data mining technique to extract potentially useful knowledge from web usage data. I implemented in Java, using NetBeans IDE, a program for identifying page associations from sessions. For exemplification, we used the log files from a commercial web site.
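
    The idea of finding page associations from sessions can be sketched as follows. This is a minimal illustration in Python, not the Java program the abstract mentions; the function name, the (session_id, page) record layout and the sample paths are assumptions made for the example, and the log is assumed to be already split into sessions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def page_pair_rules(log_records, min_support=0.5):
    """Return (page_a, page_b, support, confidence) for rules page_a -> page_b.

    log_records is a hypothetical, already sessionized clickstream:
    an iterable of (session_id, page) tuples.
    """
    # Group the pages visited in each session (duplicates collapse in the set).
    sessions = defaultdict(set)
    for session_id, page in log_records:
        sessions[session_id].add(page)
    n = len(sessions)

    # Count in how many sessions each page and each unordered page pair occurs.
    page_counts = Counter(p for pages in sessions.values() for p in pages)
    pair_counts = Counter(
        pair
        for pages in sessions.values()
        for pair in combinations(sorted(pages), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Confidence of a -> b is P(b in session | a in session).
            rules.append((a, b, support, count / page_counts[a]))
            rules.append((b, a, support, count / page_counts[b]))
    return sorted(rules)


sample_log = [
    ("s1", "/home"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/cart"), ("s2", "/help"),
    ("s3", "/home"),
    ("s4", "/cart"), ("s4", "/home"),
]
rules = page_pair_rules(sample_log, min_support=0.5)
```

    Only page pairs are counted here to keep the sketch short; longer page sets would need a level-wise search over larger itemsets.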

  2. Semantic interestingness measures for discovering association rules in the skeletal dysplasia domain.

    Science.gov (United States)

    Paul, Razan; Groza, Tudor; Hunter, Jane; Zankl, Andreas

    2014-02-05

    Lately, ontologies have become a fundamental building block in the process of formalising and storing complex biomedical information. With the currently existing wealth of formalised knowledge, the ability to discover implicit relationships between different ontological concepts becomes particularly important. One of the most widely used methods to achieve this is association rule mining. However, while previous research exists on applying traditional association rule mining on ontologies, no approach has, to date, exploited the advantages brought by using the structure of these ontologies in computing rule interestingness measures. We introduce a method that combines concept similarity metrics, formulated using the intrinsic structure of a given ontology, with traditional interestingness measures to compute semantic interestingness measures in the process of association rule mining. We apply the method in our domain of interest - bone dysplasias - using the core ontologies characterising it and an annotated dataset of patient clinical summaries, with the goal of discovering implicit relationships between clinical features and disorders. Experimental results show that, using the above mentioned dataset and a voting strategy classification evaluation, the best scoring traditional interestingness measure achieves an accuracy of 57.33%, while the best scoring semantic interestingness measure achieves an accuracy of 64.38%, both at the recall cut-off point 5. Semantic interestingness measures outperform the traditional ones, and hence show that they are able to exploit the semantic similarities inherently present between ontological concepts. Nevertheless, this is dependent on the domain, and implicitly, on the semantic similarity metric chosen to model it.

  3. [Analysis of Meridians and Acupoints Rules in Acupuncture Treatment of Dysmenorrhea Based on Data Mining].

    Science.gov (United States)

    Chen, Wei-Hao; Lin, Shu-Jun; Zhang, Yi-Min; Zhang, Yu-Juan; Lin, Han-Yu

    2017-10-25

    To determine the rules of acupoint and meridian selection for dysmenorrhea based on data mining. The literature on acupuncture treatment of dysmenorrhea was reviewed and a database of dysmenorrhea prescriptions regarding the main points of acupuncture was established with Excel 2003 software, using the association rule and cluster analysis methods in data mining technology to analyze the characteristics and laws in acupuncture prescription. One hundred and fourteen acupuncture prescriptions were included. The highest frequency of acupoint, meridian and location was Sanyinjiao (SP 6), the Spleen Meridian, and the lower limb at and below the knee, respectively. The results of the association rule analysis indicated that the highest confidence for an acupoint combination was SP 6-Taichong (LR 3) and the highest support for an acupoint combination was SP 6-Guanyuan (CV 4), and the results of the cluster analysis showed that there were three effective cluster groups. The combination of SP 6-LR 3-CV 4 can be applied in the clinic to treat dysmenorrhea, and Zusanli (ST 36), Ciliao (BL 32) and Zhongji (CV 3) can be matched based on syndrome differentiation.

  4. Greedy algorithms with weights for construction of partial association rules

    KAUST Repository

    Moshkov, Mikhail

    2009-09-10

    This paper is devoted to the study of approximate algorithms for minimization of the total weight of attributes occurring in partial association rules. We consider mainly greedy algorithms with weights for construction of rules. The paper contains bounds on precision of these algorithms and bounds on the minimal weight of partial association rules based on an information obtained during the greedy algorithm run.
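
    The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of a greedy procedure with weights: the table layout, the parameter alpha (the fraction of rows that may remain unseparated) and the cost ratio used to pick attributes are all assumptions made for illustration, borrowed from the usual weighted greedy heuristic for partial covers.

```python
import math


def greedy_partial_rule(rows, target_row, target_attr, weights, alpha):
    """Greedily pick left-hand-side attributes for a partial rule.

    rows: list of dicts (a hypothetical table, one dict per row).
    The rule describes rows[target_row] and predicts target_attr.
    alpha: tolerated fraction of rows with a different target value
           that the chosen attributes may fail to separate.
    Returns (chosen_attributes, reached_required_separation).
    """
    r = rows[target_row]
    # Rows with a different target value must be told apart from r.
    to_separate = {i for i, row in enumerate(rows)
                   if row[target_attr] != r[target_attr]}
    needed = math.ceil((1 - alpha) * len(to_separate))

    chosen, separated = [], set()
    candidates = [a for a in r if a != target_attr]
    while len(separated) < needed:
        best, best_score, best_gain = None, math.inf, set()
        for a in candidates:
            if a in chosen:
                continue
            # Rows newly separated from r if attribute a is added.
            gain = {i for i in to_separate - separated if rows[i][a] != r[a]}
            if not gain:
                continue
            # Weight paid per useful newly separated row.
            score = weights[a] / min(len(gain), needed - len(separated))
            if score < best_score:
                best, best_score, best_gain = a, score, gain
        if best is None:  # no attribute helps any more
            break
        chosen.append(best)
        separated |= best_gain
    return chosen, len(separated) >= needed


table = [
    {"a": 0, "b": 0, "c": 0, "d": 1},  # row the rule is built for
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0},
    {"a": 0, "b": 0, "c": 0, "d": 1},
]
attr_weights = {"a": 1, "b": 1, "c": 5}
exact_rule = greedy_partial_rule(table, 0, "d", attr_weights, alpha=0.0)
partial_rule = greedy_partial_rule(table, 0, "d", attr_weights, alpha=0.5)
```

    With alpha = 0 the sketch needs both cheap attributes, while tolerating half of the conflicting rows lets it stop after one attribute, which is the trade-off between precision and total weight that a partial rule expresses.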

  5. A Survey of Mining Associated Rockbursts

    Science.gov (United States)

    1988-03-02

    1st International Congress on Rockbursts and Seismicity in Mines, Johannesburg, 1982, SAIMM. Johannesburg, 1984. 28. Herget, G., Mackintosh, A.D. (1987...presented at Fred Leighton Memorial Workshop on Mining Induced Seismicity, 29-30 August 1987, Montreal, Quebec. 30 Monitoring: Various digital arrays. Dates...Johannesburg, 1982, SAIMM, Johannesburg, 1984. 54 References 28. Herget, G., Mackintosh, A.D. (1987) Mining Induced Stresses in Saskatchewan Potash Mines

  6. A novel artificial immune clonal selection classification and rule mining with swarm learning model

    Science.gov (United States)

    Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.

    2013-06-01

    Metaheuristic optimisation algorithms have become a popular choice for solving complex problems. By integrating the Artificial Immune clonal selection algorithm (CSA) and the particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel computation merit of Clonal Selection and the speed and self-organisation merits of Particle Swarm by sharing information between the clonal selection population and the particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm required less training time and fewer memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a multiobjective optimisation problem with predictive accuracy. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared the classification accuracy of our proposed algorithm, CS2, with five commonly used CSAs, namely AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA, using eight benchmark datasets. We also compared the classification accuracy of CS2 with five other methods, namely Naïve Bayes, SVM, MLP, CART, and RFB. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation, built of CSA and PSO, can develop respective merit, compensate opponent defect, and make search-optimal effect and speed better.

  7. On the use of genetic programming for mining comprehensible rules in subgroup discovery.

    Science.gov (United States)

    Luna, José María; Romero, José Raúl; Romero, Cristóbal; Ventura, Sebastián

    2014-12-01

    This paper proposes a novel grammar-guided genetic programming algorithm for subgroup discovery. This algorithm, called comprehensible grammar-based algorithm for subgroup discovery (CGBA-SD), combines the requirements of discovering comprehensible rules with the ability to mine expressive and flexible solutions owing to the use of a context-free grammar. Each rule is represented as a derivation tree that shows a solution described using the language denoted by the grammar. The algorithm includes mechanisms to adapt the diversity of the population by self-adapting the probabilities of recombination and mutation. We compare the approach with existing evolutionary and classic subgroup discovery algorithms. CGBA-SD appears to be a very promising algorithm that discovers comprehensible subgroups and behaves better than other algorithms, as measures of complexity, interest, and precision indicate. The results obtained were validated by means of a series of nonparametric tests.

  8. The spatiotemporal variation rules of Songzao coal mining subsidence based on numerical simulation

    Science.gov (United States)

    Lu, J.; Li, Y.; Cheng, H.; Tang, Z.

    2015-11-01

    .0 m in 1999 was more than twice the area affected by subsidence in 2004. This in return, it was more than 7 times larger than the area affected by subsidence in 2009 of the one affected by subsidence in 2004. The extent of the area affected by the 2.5 m subsidence has also enlarged rapidly: in 2009 it was about 40 times its value in 2004. In addition, the area with subsidence of 3.0 m reached about 0.44 hm2 in 2009, up from zero. Finally, the fifth finding indicated that the overall extent of the mining subsidence was much more serious on the southern than on the northern side of the Songzao Mine. Moreover, it was indicated that the rate of increase of mining subsidence on the western side of the study area was as large as on the eastern side between 1999 and 2009. The spatiotemporal variation rules of Songzao coal mining subsidence based on numerical simulation could provide a reference for subsequent subsidence prevention and land consolidation.

  9. The spatiotemporal variation rules of Songzao coal mining subsidence based on numerical simulation

    Directory of Open Access Journals (Sweden)

    J. Lu

    2015-11-01

    affected by the subsidence 2.0 m in 1999 was more than twice the area affected by subsidence in 2004. This in return, it was more than 7 times larger than the area affected by subsidence in 2009 of the one affected by subsidence in 2004. The extent of the area affected by the 2.5 m subsidence has also enlarged rapidly: in 2009 it was about 40 times its value in 2004. In addition, the area with subsidence of 3.0 m reached about 0.44 hm2 in 2009, up from zero. Finally, the fifth finding indicated that the overall extent of the mining subsidence was much more serious on the southern than on the northern side of the Songzao Mine. Moreover, it was indicated that the rate of increase of mining subsidence on the western side of the study area was as large as on the eastern side between 1999 and 2009. The spatiotemporal variation rules of Songzao coal mining subsidence based on numerical simulation could provide a reference for subsequent subsidence prevention and land consolidation.

  10. Incremental Maintenance Of Association Rules Under Support Threshold Change

    OpenAIRE

    Tobji, Mohamed Anis Bach; Gouider, Mohamed Salah

    2017-01-01

    Maintenance of association rules is an interesting problem. Several incremental maintenance algorithms have been proposed since the work of (Cheung et al, 1996). The majority of these algorithms maintain rule bases assuming that the support threshold doesn't change. In this paper, we present an incremental maintenance algorithm under support threshold change. This solution allows users to maintain their rule base under any support threshold.

  11. Testing genotypes-phenotype relationships using permutation tests on association rules.

    Science.gov (United States)

    Shaikh, Mateen; Beyene, Joseph

    2015-02-01

    Association rule mining is a knowledge discovery technique which informs researchers about relationships between variables in data. These relationships can be focused to a specific set of response variables. We propose an augmented version of this method to discover groups of genotypes which relate to specific outcomes. We derive the methodology to find these candidate groups of genotypes and illustrate how the method works on data regarding neuroinvasive complications of West Nile virus and through simulation.
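
    A permutation test on a single rule can be sketched in Python as below. This is a generic illustration and not necessarily the authors' procedure: the choice of confidence as the test statistic, the function names, the marker labels and the toy data are all assumptions.

```python
import random


def rule_confidence(genotypes, outcomes, pattern):
    """Confidence of the rule pattern -> outcome, with outcomes coded 0/1."""
    carriers = [o for g, o in zip(genotypes, outcomes) if pattern <= g]
    return sum(carriers) / len(carriers) if carriers else 0.0


def permutation_p_value(genotypes, outcomes, pattern, n_perm=999, seed=0):
    """Estimate how often shuffled outcomes give a confidence at least as high."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = rule_confidence(genotypes, outcomes, pattern)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any real genotype-outcome link.
        rng.shuffle(shuffled)
        if rule_confidence(genotypes, shuffled, pattern) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 10 carriers of both markers with the outcome,
# 10 non-carriers without it.
toy_genotypes = [{"snp1_AA", "snp2_GG"}] * 10 + [{"snp1_AG"}] * 10
toy_outcomes = [1] * 10 + [0] * 10
observed_conf, p_value = permutation_p_value(
    toy_genotypes, toy_outcomes, {"snp1_AA", "snp2_GG"}
)
```

    When many candidate genotype groups are tested at once, the resulting p-values would also need a multiple-testing adjustment, which this sketch leaves out.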

  12. A Novel Texture Classification Procedure by using Association Rules

    Directory of Open Access Journals (Sweden)

    L. Jaba Sheela

    2008-11-01

    Texture can be defined as a local statistical pattern of texture primitives in the observer’s domain of interest. Texture classification aims to assign texture labels to unknown textures, according to training samples and classification rules. Association rules have been used in various applications during the past decades. Association rules capture both structural and statistical information, and automatically identify the structures that occur most frequently and relationships that have significant discriminative power. So, association rules can be adapted to capture frequently occurring local structures in textures. This paper describes the usage of association rules for the texture classification problem. The experimental studies performed show the effectiveness of the association rules. The overall success rate is about 98%.

  13. A Template Model for Multidimensional Inter-Transactional Association Rules

    NARCIS (Netherlands)

    Feng, L.; Yu, J.X.; Lu, H.J.; Han, J.W.

    2002-01-01

    Multidimensional inter-transactional association rules extend the traditional association rules to describe more general associations among items with multiple properties across transactions. “After McDonald and Burger King open branches, KFC will open a branch two months later and one mile away”

  14. Fast association-rule-based similarity search in 3D models

    Science.gov (United States)

    Dua, Sumeet; Jain, Vineet

    2004-11-01

    Advances in automated data collection tools in design and manufacturing have far exceeded our capacity to analyze this data for novel information. Techniques of data mining and knowledge discovery in large databases promise computationally efficient and accurate means to analyze such data for patterns and similar structures. In this paper, we present a unique data mining approach for finding similarities in classes of 3D models, using discovery of association rules. PCA is first performed on the 3D model to transform it along the first principal axis. The transformed 3D model is then sliced and segmented along multiple principal axes, such that each slice can be interpreted as a transaction in a transaction database. Association-rule discovery is performed on this transaction space for multiple models, and common association rules among those transactions are stored as a representative of a class of models. We have evaluated the performance of association rules for efficient representation of classes of shape models. The method is time and space efficient, besides presenting a novel paradigm for searching content dependencies in a database of 3D models.

  15. Analysis of Frequent Item set Mining on Variant Datasets

    OpenAIRE

    Henry Alexander; Rohit Bansal; Robin Singh Bhadoria

    2011-01-01

    Association rule mining is the process of discovering relationships among the data items in large databases. It is one of the most important problems in the field of data mining. Finding frequent itemsets is one of the most computationally expensive tasks in association rule mining. The classical frequent itemset mining approaches mine the frequent itemsets from a database where the presence of an item in a transaction is certain. Frequent itemset mining under the uncertain data model is a new area ...
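
    The classical setting, where each item is either present in a transaction or not, can be illustrated with a short level-wise (Apriori-style) search in Python. This is a minimal sketch for small in-memory data, not the specific algorithms analysed in the paper; the function name and the toy basket data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    # Level 1 candidates: every single item that occurs in the data.
    current = [frozenset([i]) for i in sorted({i for t in tsets for i in t})]
    result = {}
    k = 1
    while current:
        # Count the support of each candidate of size k.
        level = {}
        for cand in current:
            support = sum(1 for t in tsets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets = frequent_itemsets(baskets, min_support=0.5)
```

    The repeated passes over all transactions are what makes this step expensive on large databases, which is the cost the abstract refers to.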

  16. Non-redundant association rules between diseases and medications: an automated method for knowledge base construction.

    Science.gov (United States)

    Séverac, François; Sauleau, Erik A; Meyer, Nicolas; Lefèvre, Hassina; Nisand, Gabriel; Jay, Nicolas

    2015-04-15

    The widespread use of electronic health records (EHRs) has generated massive clinical data storage. Association rules mining is a feasible technique to convert this large amount of data into usable knowledge for clinical decision making, research or billing. We present a data driven method to create a knowledge base linking medications to pathological conditions through their therapeutic indications from elements within the EHRs. Association rules were created from the data of patients hospitalised between May 2012 and May 2013 in the department of Cardiology at the University Hospital of Strasbourg. Medications were extracted from the medication list, and the pathological conditions were extracted from the discharge summaries using a natural language processing tool. Association rules were generated along with different interestingness measures: chi square, lift, conviction, dependency, novelty and satisfaction. All medication-disease pairs were compared to the Summary of Product Characteristics, which is the gold standard. A score based on the other interestingness measures was created to filter the best rules, and the indices were calculated for the different interestingness measures. After the evaluation against the gold standard, a list of accurate association rules was successfully retrieved. Dependency represents the best recall (0.76). Our score exhibited higher exactness (0.84) and precision (0.27) than all of the other interestingness measures. Further reductions in noise produced by this method must be performed to improve the classification precision. Association rules mining using the unstructured elements of the EHR is a feasible technique to identify clinically accurate associations between medications and pathological conditions.
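
    Two of the interestingness measures named above, lift and conviction, follow directly from rule counts, as the Python sketch below shows using their standard textbook definitions. The record layout and the item names drug_x and condition_y are hypothetical, and the paper's combined score is not reproduced here.

```python
import math


def rule_measures(records, antecedent, consequent):
    """Support, confidence, lift and conviction of antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(records)
    n_a = sum(1 for r in records if a <= set(r))
    n_c = sum(1 for r in records if c <= set(r))
    n_ac = sum(1 for r in records if (a | c) <= set(r))

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    p_c = n_c / n
    # Lift > 1 means the consequent is more likely when the antecedent holds.
    lift = confidence / p_c if p_c else 0.0
    # Conviction compares expected and observed rates of wrong predictions.
    conviction = (1 - p_c) / (1 - confidence) if confidence < 1 else math.inf
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Hypothetical patient records: items found in one hospital stay.
stays = [
    {"drug_x", "condition_y"},
    {"drug_x", "condition_y"},
    {"drug_x"},
    {"condition_y"},
    {"condition_z"},
]
measures = rule_measures(stays, {"drug_x"}, {"condition_y"})
```

    Here the rule holds in 2 of the 5 stays and in 2 of the 3 stays with drug_x, so lift and conviction both land slightly above 1, a weak positive association.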

  17. Study of the factors associated with substance use in adolescence using Association Rules.

    Science.gov (United States)

    García, Elena Gervilla; Blasco, Berta Cajal; López, Rafael Jiménez; Pol, Alfonso Palmer

    2010-01-01

    The aim of this study is to analyse the factors related to the use of addictive substances in adolescence using association rules, descriptive tools included in Data Mining. Thus, we have a database referring to the consumption of addictive substances in adolescence, and use the arules package of the freely distributed program R (version 2.10.0). The sample was made up of 9,300 students between the ages of 14 and 18 (47.1% boys and 52.9% girls) with an average age of 15.6 (SE=1.2). The adolescents answered an anonymous questionnaire on personal, family and environmental risk factors related to substance use. The best rules obtained with regard to substance use relate the consumption of alcohol to perceived parenting style and peer consumption (confidence = 0.8528), the use of tobacco (smoking), cannabis and cocaine to perceived parental action and illegal behaviour (confidence = 0.8032, 0.8718 and 1.0000, respectively), and the use of ecstasy to peer consumption (confidence = 1.0000). In general, the association rules show in a simple manner the relationship between certain patterns of perceived parental action, behaviours that deviate from social behavioural norms, peer consumption and the use of different legal and illegal drugs of abuse in adolescence. The implications of the results obtained are described, together with the usefulness of this new methodology of analysis.

  18. Finding Exception For Association Rules Via SQL Queries

    Directory of Open Access Journals (Sweden)

    Luminita DUMITRIU

    2000-12-01

    Finding association rules is mainly based on generating larger and larger frequent set candidates, starting from frequent attributes in the database. The frequent sets can be organised as a part of a lattice of concepts according to the Formal Concept Analysis approach. Since the lattice construction is database contents-dependent, the pseudo-intents (see Formal Concept Analysis) are avoided. Association rules between concept intents (closed sets), A=>B, are partial implication rules, meaning that there is some data supporting A and (not B); fully explaining the data requires finding exceptions for the association rules. The approach applies to Oracle databases, via SQL queries.

  19. [Analysis on medication rules of state medical master Yan Zhenghua from prescriptions with citri reticulatae pericarpium based on data mining].

    Science.gov (United States)

    Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Bing; Zhang, Xiao-Meng; Yang, Bing; Sheng, Xiao-Guang

    2014-02-01

    The prescriptions containing pericarpium citri reticulatae formulated by Professor Yan were collected to build a database based on the traditional Chinese medicine (TCM) inheritance assist system. Through data mining methods such as the apriori algorithm, the frequency of single medicines, the frequency of drug combinations, the association rules between drugs and the core drug combinations can be obtained from the database. The analysis of 1 027 prescriptions with pericarpium citri reticulatae showed that these prescriptions were commonly used to treat stomach aches, cough and other syndromes. The most frequent drug combinations were "Citri Reticulatae Pericarpium-Poria", "Paeoniae Radix Rubra-Citri Reticulatae Pericarpium" and so on. The drug association rules with a confidence of 1 were "Glycyrrhizae Radix et Rhizoma --> Citri Reticulatae Pericarpium", "Paeoniae Alba Radix-Cyperi Rhizoma --> Citri Reticulatae Pericarpium", "Poria --> Citri Reticulatae Pericarpium", and so on. The drugs in the prescriptions containing pericarpium citri reticulatae formulated by Professor Yan mostly had the effects of regulating the flow of Qi and invigorating blood circulation, which reflected clear thinking in formulating prescriptions.

  20. PubMedMiner: Mining and Visualizing MeSH-based Associations in PubMed.

    Science.gov (United States)

    Zhang, Yucan; Sarkar, Indra Neil; Chen, Elizabeth S

    2014-01-01

    The exponential growth of biomedical literature provides the opportunity to develop approaches for facilitating the identification of possible relationships between biomedical concepts. Indexing by Medical Subject Headings (MeSH) represents high-quality summaries of much of this literature that can be used to support hypothesis generation and knowledge discovery tasks using techniques such as association rule mining. Based on a survey of literature mining tools, a tool implemented using Ruby and R - PubMedMiner - was developed in this study for mining and visualizing MeSH-based associations for a set of MEDLINE articles. To demonstrate PubMedMiner's functionality, a case study was conducted that focused on identifying and comparing comorbidities for asthma in children and adults. Relative to the tools surveyed, the initial results suggest that PubMedMiner provides complementary functionality for summarizing and comparing topics as well as identifying potentially new knowledge.

  1. Mining functional information associated with expression arrays.

    Science.gov (United States)

    Blaschke, C; Oliveros, J C; Valencia, A

    2001-03-01

    Deciphering the networks of interactions between molecules in biological systems has gained momentum with the monitoring of gene expression patterns at the genomic scale. Expression array experiments provide vast amounts of experimental data about these networks, the analysis of which requires new computational methods. In particular, issues related to the extraction of biological information are key for the end users. We propose here a strategy, implemented in a system called GEISHA (gene expression information system for human analysis) and able to detect biological terms significantly associated to different gene expression clusters by mining collections of Medline abstracts. GEISHA is based on a comparison of the frequency of abstracts linked to different gene clusters and containing a given term. Interpretation by the end user of the biological meaning of the terms is facilitated by embedding them in the corresponding significant sentences and abstracts and by establishing relations with other, equally significant terms. The information provided by GEISHA for the available yeast expression data compares favorably with the functional annotations provided by human experts, demonstrating the potential value of GEISHA as an assistant for the analysis of expression array experiments.

  2. Investigation of work zone crash casualty patterns using association rules.

    Science.gov (United States)

    Weng, Jinxian; Zhu, Jia-Zheng; Yan, Xuedong; Liu, Zhiyuan

    2016-07-01

    Investigation of the casualty crash characteristics and contributory factors is one of the high-priority issues in traffic safety analysis. In this paper, we propose a method based on association rules to analyze the characteristics and contributory factors of work zone crash casualties. A case study is conducted using the Michigan M-94/I-94/I-94BL/I-94BR work zone crash data from 2004 to 2008. The obtained association rules are divided into two parts, rules with high lift and rules with high support, for further analysis. The results show that almost all the high-lift rules contain either environmental or occupant characteristics. The majority of association rules are centered on specific characteristics, such as drunk driving, highways with more than 4 lanes, speed limits over 40 mph and non-use of traffic control devices. It should be pointed out that some more strongly associated rules were found in the high-support part. With network visualization, the association rule method can provide more understandable results for investigating the patterns of work zone crash casualties.

  3. Association rules to identify complications of cerebral infarction in patients with atrial fibrillation.

    Science.gov (United States)

    Jung, Sun-Ju; Son, Chang-Sik; Kim, Min-Soo; Kim, Dae-Joon; Park, Hyoung-Seob; Kim, Yoon-Nyun

    2013-03-01

    The purpose of this study was to find risk factors that are associated with complications of cerebral infarction in patients with atrial fibrillation (AF) and to discover useful association rules among these factors. The risk factors with respect to cerebral infarction were selected using logistic regression analysis with the Wald's forward selection approach. The rules to identify the complications of cerebral infarction were obtained by using the association rule mining (ARM) approach. We observed that 4 independent factors, namely, age, hypertension, initial electrocardiographic rhythm, and initial echocardiographic left atrial dimension (LAD), were strong predictors of cerebral infarction in patients with AF. After the application of ARM, we obtained 4 useful rules to identify complications of cerebral infarction: age (>63 years) and hypertension (Yes) and initial ECG rhythm (AF) and initial Echo LAD (>4.06 cm); age (>63 years) and hypertension (Yes) and initial Echo LAD (>4.06 cm); hypertension (Yes) and initial ECG rhythm (AF) and initial Echo LAD (>4.06 cm); age (>63 years) and hypertension (Yes) and initial ECG rhythm (AF). Among the induced rules, 3 factors (the initial ECG rhythm [i.e., AF], initial Echo LAD, and age) were strongly associated with each other.

  4. Association Rule Extraction from XML Stream Data for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Juryon Paik

    2014-07-01

    Advances in wireless sensor networks (WSNs) yield massive volumes of disparate, dynamic, geographically distributed and heterogeneous data. The data mining community has attempted to extract knowledge from the huge amount of data that they generate. However, previous mining work in WSNs has focused on supporting simple relational data structures, like one table per network, while there is a need for more complex data structures. This deficiency motivates the use of XML, the current de facto format for the data exchange and modeling of a wide variety of data sources over the web, in WSNs in order to encourage the interchangeability of heterogeneous types of sensors and systems. However, mining XML data for WSNs has two challenging issues: one is the endless data flow, and the other is the complex tree structure. In this paper, we present several new definitions and techniques related to association rule mining over XML data streams in WSNs. To the best of our knowledge, this work provides the first approach to mining XML stream data that generates frequent tree items without any redundancy.

  5. Association rule extraction from XML stream data for wireless sensor networks.

    Science.gov (United States)

    Paik, Juryon; Nam, Junghyun; Kim, Ung Mo; Won, Dongho

    2014-07-18

    Advances in wireless sensor networks (WSNs) yield massive volumes of disparate, dynamic, geographically distributed and heterogeneous data. The data mining community has attempted to extract knowledge from the huge amount of data that they generate. However, previous mining work in WSNs has focused on supporting simple relational data structures, like one table per network, while there is a need for more complex data structures. This deficiency motivates the use of XML, the current de facto format for the data exchange and modeling of a wide variety of data sources over the web, in WSNs in order to encourage the interchangeability of heterogeneous types of sensors and systems. However, mining XML data for WSNs has two challenging issues: one is the endless data flow, and the other is the complex tree structure. In this paper, we present several new definitions and techniques related to association rule mining over XML data streams in WSNs. To the best of our knowledge, this work provides the first approach to mining XML stream data that generates frequent tree items without any redundancy.

  6. Fuzzy association rules for biological data analysis: A case study on yeast

    Directory of Open Access Journals (Sweden)

    Cano Carlos

    2008-02-01

    Background: The mapping of diverse genomes in recent years has generated huge amounts of biological data, which are currently dispersed across many databases. Integrating the information available in these databases is required to unveil possible associations among already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modeling imprecise data, while association rules are well suited to integrating heterogeneous data. Results: In this work we propose a novel methodology based on a fuzzy association rule mining method for biological knowledge extraction. We apply this methodology to a yeast genome dataset containing heterogeneous information regarding structural and functional genome features. A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones. Conclusion: An integrative approach such as the one carried out in this work can unveil significant knowledge which is currently hidden and dispersed across the existing biological databases. It is shown that fuzzy association rules can model this knowledge in an intuitive way by using linguistic labels and a few easily understandable parameters.

  7. Fuzzy association rules for biological data analysis: a case study on yeast.

    Science.gov (United States)

    Lopez, Francisco J; Blanco, Armando; Garcia, Fernando; Cano, Carlos; Marin, Antonio

    2008-02-19

    The mapping of diverse genomes in recent years has generated huge amounts of biological data, which are currently dispersed across many databases. Integrating the information available in these databases is required to unveil possible associations among already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modeling imprecise data, while association rules are well suited to integrating heterogeneous data. In this work we propose a novel methodology based on a fuzzy association rule mining method for biological knowledge extraction. We apply this methodology to a yeast genome dataset containing heterogeneous information regarding structural and functional genome features. A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones. An integrative approach such as the one carried out in this work can unveil significant knowledge which is currently hidden and dispersed across the existing biological databases. It is shown that fuzzy association rules can model this knowledge in an intuitive way by using linguistic labels and a few easily understandable parameters.
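
    As a rough illustration of how a fuzzy association rule differs from a crisp one, the sketch below computes fuzzy support and confidence from membership degrees in [0, 1]. The linguistic labels, the membership values and the use of the minimum as the conjunction operator are assumptions made for this example; the abstract does not state which operators or parameters the authors used.

```python
# Each record maps a linguistic label to a membership degree in [0, 1].
# Labels and degrees are hypothetical, chosen only for illustration.
records = [
    {"expression=HIGH": 0.9, "length=SHORT": 0.8},
    {"expression=HIGH": 0.2, "length=SHORT": 0.7},
    {"expression=HIGH": 0.6, "length=SHORT": 0.5},
    {"expression=HIGH": 0.0, "length=SHORT": 0.1},
]


def fuzzy_support(items, data):
    """Mean over records of the minimum membership among the given labels."""
    total = sum(min(rec.get(item, 0.0) for item in items) for rec in data)
    return total / len(data)


def fuzzy_confidence(antecedent, consequent, data):
    """Fuzzy support of antecedent and consequent together, divided by that of the antecedent."""
    sup_antecedent = fuzzy_support(antecedent, data)
    if sup_antecedent == 0.0:
        return 0.0
    return fuzzy_support(antecedent + consequent, data) / sup_antecedent


sup = fuzzy_support(["expression=HIGH", "length=SHORT"], records)
conf = fuzzy_confidence(["expression=HIGH"], ["length=SHORT"], records)
```

    A crisp rule would first threshold each value into 0 or 1 and lose the partial memberships; here a record that is "somewhat high" still contributes proportionally.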

  8. Biomedical application of fuzzy association rules for identifying breast cancer biomarkers.

    Science.gov (United States)

    Lopez, F J; Cuadros, M; Cano, C; Concha, A; Blanco, A

    2012-09-01

    Current breast cancer research involves the study of many different prognostic factors: primary tumor size, lymph node status, tumor grade, tumor receptor status, p53, and ki67 levels, among others. High-throughput microarray technologies are making it possible to better understand and identify prognostic factors in breast cancer. But the massive amounts of data derived from these technologies require the use of efficient computational techniques to unveil new and relevant biomedical knowledge. Furthermore, integrative tools are needed that effectively combine heterogeneous types of biomedical data, such as prognostic factors and expression data. The objective of this study was to integrate information from the main prognostic factors in breast cancer with whole-genome microarray data to identify potential associations among them. We propose the application of a data mining approach, called fuzzy association rule mining, to automatically unveil these associations. This paper describes the proposed methodology and illustrates how it can be applied to different breast cancer datasets. The obtained results support known associations involving the number of copies of chromosome-17, HER2 amplification, or the expression level of estrogen and progesterone receptors in breast cancer patients. They also confirm the correspondence between the HER2 status predicted by different testing methodologies (immunohistochemistry and fluorescence in situ hybridization). In addition, other interesting rules involving CDC6, SOX11, and EFEMP1 genes are identified, although further detailed studies are needed to statistically confirm these findings. As part of this study, a web platform implementing the fuzzy association rule mining approach has been made freely available at: http://www.genome2.ugr.es/biofar.

  9. Association Rule Analysis for Tour Route Recommendation and Application to WCTSNOP

    Science.gov (United States)

    Fang, H.; Chen, C.; Lin, J.; Liu, X.; Fang, D.

    2017-09-01

    A growing number of e-tourism systems provide intelligent tour recommendations for tourists. In this context, a recommender system can make personalized suggestions and provide satisfactory information associated with the tourists' tour cycle. Data mining is a suitable tool for extracting potentially useful information from large databases for making strategic decisions. In this study, association rule analysis based on the FP-growth algorithm is applied to find association relationships among scenic spots in different cities as a basis for tour route recommendation. To identify valuable rules, the Kulczynski interestingness measure is adopted and the imbalance ratio is computed. The proposed scheme was evaluated on the Wangluzhe cultural tourism service network operation platform (WCTSNOP), where it was verified that the scheme can quickly recommend tour routes and rapidly enhance recommendation quality.
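
    The two measures named in this abstract have standard textbook definitions, which the following minimal sketch computes from support counts. The tour data and spot names are invented for illustration, and the counting is done by brute force rather than with FP-growth, so this is not the authors' pipeline.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def kulczynski(sup_a, sup_b, sup_ab):
    """Average of the two conditional probabilities P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def imbalance_ratio(sup_a, sup_b, sup_ab):
    """|sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A and B))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical data: each transaction is the set of spots one tourist visited.
tours = [
    {"Temple", "Lake"},
    {"Temple", "Lake", "Museum"},
    {"Temple", "Museum"},
    {"Lake"},
    {"Temple", "Lake"},
]

sup_temple = support_count({"Temple"}, tours)
sup_lake = support_count({"Lake"}, tours)
sup_both = support_count({"Temple", "Lake"}, tours)

kulc = kulczynski(sup_temple, sup_lake, sup_both)
ir = imbalance_ratio(sup_temple, sup_lake, sup_both)
```

    A Kulczynski value near 0.5 is usually read as neutral, and the imbalance ratio indicates how lopsided the two directions of the rule are, which is why the two are typically reported together.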

  10. ASSOCIATION RULE ANALYSIS FOR TOUR ROUTE RECOMMENDATION AND APPLICATION TO WCTSNOP

    Directory of Open Access Journals (Sweden)

    H. Fang

    2017-09-01

    A growing number of e-tourism systems provide intelligent tour recommendations for tourists. In this context, a recommender system can make personalized suggestions and provide satisfactory information associated with the tourists' tour cycle. Data mining is a suitable tool for extracting potentially useful information from large databases for making strategic decisions. In this study, association rule analysis based on the FP-growth algorithm is applied to find association relationships among scenic spots in different cities as a basis for tour route recommendation. To identify valuable rules, the Kulczynski interestingness measure is adopted and the imbalance ratio is computed. The proposed scheme was evaluated on the Wangluzhe cultural tourism service network operation platform (WCTSNOP), where it was verified that the scheme can quickly recommend tour routes and rapidly enhance recommendation quality.

  11. Collaborative Data Mining Tool for Education

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; Gea, Miguel; de Castro, Carlos

    2009-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the continuous improvement of e-learning courses, allowing teachers with similar course profiles to share and score the discovered information. This mining tool is intended to be used by instructors who are not experts in data mining such that, its…

  12. Application of a hybrid association rules/decision tree model for drought monitoring

    Science.gov (United States)

    Nourani, Vahid; Molajou, Amir

    2017-12-01

    Previous research has shown that incorporating oceanic-atmospheric climate phenomena such as Sea Surface Temperature (SST) into hydro-climatic models can provide important predictive information about hydro-climatic variability. In this paper, a hybrid application of two data mining techniques (decision tree and association rules) is proposed to discover the relationship between drought at the Tabriz and Kermanshah synoptic stations (located in Iran) and de-trended SSTs of the Black, Mediterranean and Red Seas. The two major steps of the proposed model were classifying the de-trended SST data and selecting the most effective groups, and extracting the hidden information contained in the data. Decision tree techniques, which can identify good traits in a data set for classification purposes, were used for classification and for selecting the most effective groups, and association rules were employed to extract the hidden predictive information from the large observed data set. To examine the accuracy of the rules, confidence and Heidke Skill Score (HSS) measures were calculated and compared for the different lag times considered. The computed measures confirm the reliable performance of the proposed hybrid data mining method in forecasting drought, and the results show a relative correlation between the Mediterranean, Black and Red Sea de-trended SSTs and drought at the Tabriz and Kermanshah synoptic stations, such that the confidence between the monthly Standardized Precipitation Index (SPI) values and the de-trended SSTs of the seas is higher than 70% and 80% for the Tabriz and Kermanshah synoptic stations, respectively.
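
    For readers unfamiliar with the two accuracy measures mentioned here, the sketch below evaluates a single rule against a 2x2 contingency table using the usual definitions of rule confidence and the Heidke Skill Score. The counts are made up, and treating "rule fired" as the forecast and "drought observed" as the outcome is an assumption of this example rather than a detail taken from the paper.

```python
def confidence(hits, false_alarms):
    """Fraction of cases where the rule fired and the consequent actually occurred."""
    return hits / (hits + false_alarms)


def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """HSS = 2(ad - bc) / ((a + c)(c + d) + (a + b)(b + d)) for a 2x2 table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    numerator = 2 * (a * d - b * c)
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    return numerator / denominator


# Hypothetical monthly counts for one rule at one lag time.
conf = confidence(hits=30, false_alarms=10)
hss = heidke_skill_score(hits=30, false_alarms=10, misses=5, correct_negatives=55)
```

    An HSS of 1 corresponds to a perfect forecast and 0 to no improvement over chance, so it complements confidence, which ignores misses and correct negatives.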

  13. Discovering Prerequisite Structure of Skills through Probabilistic Association Rules Mining

    Science.gov (United States)

    Chen, Yang; Wuillemin, Pierre-Henr; Labat, Jean-Marc

    2015-01-01

    Estimating the prerequisite structure of skills is a crucial issue in domain modeling. Students usually learn skills in sequence since the preliminary skills need to be learned prior to the complex skills. The prerequisite relations between skills underlie the design of learning sequence and adaptation strategies for tutoring systems. The…

  14. Privacy-preserving distributed mining of association rules using ...

    Indian Academy of Sciences (India)

    The first protocol uses the notion of Elliptic-curve-based Paillier cryptosystem, which helps in achieving the integrity and authenticity of the messages exchanged among involving sites over the insecure communication channel. It offers privacy of individual site's information against the involving sites and an external ...

  15. Application of rule-based data mining techniques to real time ATLAS Grid job monitoring data

    CERN Document Server

    Ahrens, R; The ATLAS collaboration; Kalinin, S; Maettig, P; Sandhoff, M; dos Santos, T; Volkmer, F

    2012-01-01

    The Job Execution Monitor (JEM) is a job-centric grid job monitoring software developed at the University of Wuppertal and integrated into the pilot-based “PanDA” job brokerage system leveraging physics analysis and Monte Carlo event production for the ATLAS experiment on the Worldwide LHC Computing Grid (WLCG). With JEM, job progress and grid worker node health can be supervised in real time by users, site admins and shift personnel. Imminent error conditions can be detected early and countermeasures can be initiated by the job’s owner immediately. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job and Grid worker node misbehaviour. Shifters can use the same aggregated data to quickly react to site error conditions and broken production tasks. In this work, the application of novel data-centric rule based methods and data-mining techniques to the real time monitoring data is discussed. The usage of such automatic inference techniques on monitorin...

  17. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal

  18. Use HypE to Hide Association Rules by Adding Items.

    Science.gov (United States)

    Cheng, Peng; Lin, Chun-Wei; Pan, Jeng-Shyang

    2015-01-01

    During business collaboration, partners may benefit from sharing data. People may use data mining tools to discover useful relationships in shared data. However, some relationships are sensitive to the data owners, who hope to conceal them before sharing. In this paper, we address this problem in the form of association rule hiding. A hiding method based on evolutionary multi-objective optimization (EMO) is proposed, which performs the hiding task by selectively inserting items into the database to decrease the confidence of sensitive rules below specified thresholds. The side effects generated during the hiding process are taken as optimization goals to be minimized. HypE, a recently proposed EMO algorithm, is utilized to identify promising transactions for modification to minimize side effects. Results on real datasets demonstrate that the proposed method can effectively perform sanitization with less damage to the non-sensitive knowledge in most cases.
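
    The mechanism described above, lowering a rule's confidence by inserting items, can be seen in a small sketch. It picks transactions greedily in list order, which stands in for (and is much cruder than) the HypE-based selection in the paper; the database, the rule and the threshold are hypothetical.

```python
def confidence(db, antecedent, consequent):
    """conf(X -> Y) = count(X and Y) / count(X) over a list of item sets."""
    n_antecedent = sum(1 for t in db if antecedent <= t)
    n_both = sum(1 for t in db if (antecedent | consequent) <= t)
    return n_both / n_antecedent if n_antecedent else 0.0


def hide_by_insertion(db, antecedent, consequent, threshold):
    """Insert the antecedent into transactions lacking the consequent
    until the rule's confidence drops below the threshold (greedy, in order)."""
    sanitized = [set(t) for t in db]  # copy so the original database is untouched
    for t in sanitized:
        if confidence(sanitized, antecedent, consequent) < threshold:
            break
        if not antecedent <= t and not consequent <= t:
            t |= antecedent  # raises count(X) without raising count(X and Y)
    return sanitized


# Hypothetical database; the sensitive rule is {a} -> {b}.
database = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}, {"c", "d"}, {"d"}]
before = confidence(database, {"a"}, {"b"})
sanitized = hide_by_insertion(database, {"a"}, {"b"}, threshold=0.6)
after = confidence(sanitized, {"a"}, {"b"})
```

    Every insertion also changes the support of other itemsets, which is the kind of side effect the paper treats as an objective to minimize.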

  19. Multimode Retrieval of Mammography Based on Association Rules

    Directory of Open Access Journals (Sweden)

    LV Ya-na

    2017-04-01

    A mammogram case comprises low-level image features and semantic features. To achieve efficient retrieval of breast imaging cases and enhance the certainty of computer-aided diagnosis, a multi-mode retrieval method based on association rules is proposed in this paper. First, a feature selection algorithm based on association rules is used to select the low-level features associated with image semantic features, achieving dimension reduction. The association rules between the selected features and the semantic features are mined using the Apriori algorithm. Then, an associative classifier engine is used to build an associative classification model from these rules to capture the visual semantic features. Finally, the semantics obtained from the associative classification are taken as input semantics and combined with the low-level features of the image to implement multi-mode retrieval of mammogram cases. We conducted experiments comparing precision, recall, average relevance ranking and other measures. The results show that the multi-mode retrieval method proposed in this paper, which supplements an image's low-level features with visual semantic features, can effectively improve the performance of breast imaging case retrieval. Multi-mode retrieval reduced the semantic gap between low-level image features and visual semantic features, improved the accuracy of image retrieval and provided more meaningful decision support for doctors.

  20. Multi-level Association Rules and Directed Graphs for the Lagrangian Analysis of the Mediterranean Ocean Forecasting System (MFS)

    Science.gov (United States)

    Petelin, B.; Malacic, V.; Malej, A.; Kukar, M.; Kononenko, I.

    2012-04-01

    The Lagrangian method is one of the basic methods for modeling the transport of water parcels and the dispersion of biological species. Lagrangian data analysis uses various tools, including classical statistics; however, a visual inspection of individual trajectories is also important for a first impression of the underlying dynamics. The difficulty of analyzing a large number of trajectories and presenting them visually implies the need for more sophisticated methods. In this study we propose a new methodology which combines data mining and different visualization techniques, namely association rules and directed graphs. Association rule mining is a representative unsupervised data mining method, used to find interesting and important relationships between subsets of attributes in large databases. Oceanographic data exhibit strong spatial and temporal dependencies, so we have extended basic association rule discovery to spatial and temporal association rule mining. In addition, we need efficient methods for the visualization of the rules, and thus we suggest a novel method which uses multi-level graphs with different levels of space and time granularity. Moreover, we can intertwine the knowledge from various disciplines related to oceanography, e.g. marine ecology, and form the graphs of connections among quantities with different granularity and refinement. The motivation for our work comes from the modeling of marine meta-populations, where the persistence of local populations strongly depends on the topology and cycles of the connectivity networks. The results of first experiments with the Lagrangian trajectories obtained from the climatologically averaged results of the Adriatic Sea Forecasting System (AFS) show many similarities with previous findings concerning the circulation in the Adriatic Sea, especially regarding the currents along the Italian coast and cyclonic circulation in the southern Adriatic. In this study we present a case study on

  1. Parental rules and communication: their association with adolescent smoking

    NARCIS (Netherlands)

    Harakeh, Z.; Scholte, R.H.J.; Vries, H. de; Engels, R.C.M.E.

    2005-01-01

    Aims - To examine the association between parental rules and communication (also referred to as antismoking socialization) and adolescents’ smoking. Design and participants - A cross-sectional study including 428 Dutch two-parent families with at least two adolescent children (aged

  2. Generation of Acid Mine Lakes Associated with Abandoned Coal Mines in Northwest Turkey.

    Science.gov (United States)

    Sanliyuksel Yucel, Deniz; Balci, Nurgul; Baba, Alper

    2016-05-01

    A total of five acid mine lakes (AMLs) located in northwest Turkey were investigated using combined isotope, molecular, and geochemical techniques to identify geochemical processes controlling and promoting acid formation. All of the investigated lakes showed typical characteristics of an AML with low pH (2.59-3.79) and high electrical conductivity values (1040-6430 μS/cm), in addition to high sulfate (594-5370 mg/l) and metal (aluminum [Al], iron [Fe], manganese [Mn], nickel [Ni], and zinc [Zn]) concentrations. Geochemical and isotope results showed that the acid-generation mechanism and source of sulfate in the lakes can change and depends on the age of the lakes. In the relatively older lakes (AMLs 1 through 3), biogeochemical Fe cycles seem to be the dominant process controlling metal concentration and pH of the water unlike in the younger lakes (AMLs 4 and 5). Bacterial species determined in an older lake (AML 2) indicate that biological oxidation and reduction of Fe and S are the dominant processes in the lakes. Furthermore, O and S isotopes of sulfate indicate that sulfate in the older mine lakes may be a product of much more complex oxidation/dissolution reactions. However, the major source of sulfate in the younger mine lakes is in situ pyrite oxidation catalyzed by Fe(III) produced by way of oxidation of Fe(II). Consistent with this, insignificant fractionation between δ(34) [Formula: see text] and δ(34) [Formula: see text] values indicated that the oxidation of pyrite, along with dissolution and precipitation reactions of Fe(III) minerals, is the main reason for acid formation in the region. Overall, the results showed that acid generation during early stage formation of an AML associated with pyrite-rich mine waste is primarily controlled by the oxidation of pyrite with Fe cycles becoming the dominant processes regulating pH and metal cycles in the later stages of mine lake development.

  3. Exploring factors associated with pressure ulcers: a data mining approach.

    Science.gov (United States)

    Raju, Dheeraj; Su, Xiaogang; Patrician, Patricia A; Loan, Lori A; McCarthy, Mary S

    2015-01-01

    Pressure ulcers are associated with a nearly three-fold increase in in-hospital mortality. It is essential to investigate how other factors besides the Braden scale could enhance the prediction of pressure ulcers. Data mining modeling techniques can be beneficial for conducting this type of analysis. Data mining techniques have been applied extensively in health care, but are not widely used in nursing research. To remedy this methodological gap, this paper reviews, explains, and compares several data mining models to examine patient-level factors associated with pressure ulcers, based on a four-year study from military hospitals in the United States. The variables included in the analysis are easily accessible demographic information and medical measurements. Logistic regression, decision trees, random forests, and multivariate adaptive regression splines were compared based on their performance and interpretability. The random forests model had the highest accuracy (C-statistic), with the following variables, in order of importance, ranked highest in predicting pressure ulcers: days in the hospital, serum albumin, age, blood urea nitrogen, and total Braden score. Data mining techniques, particularly random forests, are useful in predictive modeling. It is important for hospitals and health care systems to use their own data over time for pressure ulcer risk prediction, and to develop risk models that are based upon more than the total Braden score and are specific to their patient population. Copyright © 2014 Elsevier Ltd. All rights reserved.
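
    The C-statistic used above to compare models is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative one. The sketch below computes it directly from that definition; the labels and the two sets of scores are invented and do not come from the study.

```python
def c_statistic(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = pressure ulcer) and predicted risks from two models.
outcomes = [1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.2, 0.1]
model_b = [0.8, 0.7, 0.3, 0.2, 0.1]

c_a = c_statistic(outcomes, model_a)
c_b = c_statistic(outcomes, model_b)
```

    Because it depends only on the ranking of predictions, the same function can score any of the model types compared in the abstract.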

  4. Feed Forward Neural Network Algorithm for Frequent Patterns Mining

    OpenAIRE

    Dr. K.R.Pardasani; Sanjay Sharma; Amit Bhagat

    2010-01-01

    Association rule mining is used to find relationships among items in large data sets. Frequent pattern mining is an important aspect of association rule mining. In this paper, an efficient algorithm named Apriori-Feed Forward (AFF), based on the Apriori algorithm and the Feed Forward Neural Network, is presented to mine frequent patterns. The Apriori algorithm scans the database many times to generate frequent itemsets, whereas the Apriori-Feed Forward (AFF) algorithm scans the database only once. Computational resu...
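
    To make the "many scans" remark concrete, here is a minimal sketch of classical level-wise Apriori, which performs one pass over the database for each itemset size. It illustrates only the baseline algorithm, not the AFF variant proposed in the paper, and the basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frozenset: count} for every itemset meeting the minimum count."""
    transactions = [frozenset(t) for t in transactions]

    # Scan 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        previous = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        # Scan k: one more full pass over the database to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support_count=3)
```

    The repeated passes in the `while` loop are the cost that single-scan approaches aim to avoid.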

  5. DMET-Miner: Efficient discovery of association rules from pharmacogenomic data.

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro H; Cannataro, Mario

    2015-08-01

    Microarray platforms enable the investigation of allelic variants that may be correlated to phenotypes. Among those, the Affymetrix DMET (Drug Metabolism Enzymes and Transporters) platform enables the simultaneous investigation of all the genes that are related to drug absorption, distribution, metabolism and excretion (ADME). Although recent studies demonstrated the effectiveness of using DMET data for studying drug response or toxicity in clinical studies, there is a lack of tools for the automatic analysis of DMET data. In a previous work we developed DMET-Analyzer, a methodology and a supporting platform able to automate the statistical study of allelic variants, which has been validated in several clinical studies. Although DMET-Analyzer is able to correlate a single variant for each probe (related to a portion of a gene) through the use of the Fisher test, it is unable to discover multiple associations among allelic variants, because its underlying statistical analysis strategy focuses on a single variant at a time. To overcome those limitations, here we propose a new analysis methodology for DMET data based on association rule mining, and an efficient implementation of this methodology, named DMET-Miner. DMET-Miner extends the DMET-Analyzer tool with data mining capabilities and correlates the presence of a set of allelic variants with the conditions of patients' samples by exploiting association rules. To cope with the high number of frequent itemsets generated when considering large clinical studies based on DMET data, DMET-Miner uses an efficient data structure and implements an optimized search strategy that reduces the search space and the execution time. Preliminary experiments on synthetic DMET datasets show that DMET-Miner outperforms off-the-shelf data mining suites such as the FP-Growth algorithms available in Weka and RapidMiner. To demonstrate the biological relevance of the extracted association rules and the effectiveness of the

  6. ANALYSIS OF DEVELOPMENT RESULT DATA PATTERNS IN KABUPATEN MALANG USING THE ASSOCIATION RULE METHOD

    Directory of Open Access Journals (Sweden)

    Dewi Sibagariang

    2013-10-01

    Data on development results in an area are divided into several sectors. Each sector has commodities, and the government uses these data to determine potential commodities within a small coverage area. This paper is based on our research using the association rule method, which is commonly used in data mining to discover patterns in large data sets. Apriori is the algorithm implemented in the application developed in this research; it is used to generate strong association information (strong linkage between commodities in each sector). The application displays the support and confidence values and the relationships between commodities in the 33 districts of Kabupaten Malang. The test results showed that higher confidence and support values indicate stronger relationships between commodities. The minimum support limit cannot exceed 33, because the transaction data are calculated from the total of 33 districts in Malang.

  7. NV - Assessment of wildlife hazards associated with mine pit lakes

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Several open pit mines in Nevada lower groundwater to mine ore below the water table. After mining, the pits partially fill with groundwater to form pit lakes. Water...

  8. MiningABs: mining associated biomarkers across multi-connected gene expression datasets

    Science.gov (United States)

    2014-01-01

    Background: Human disease often arises as a consequence of alterations in a set of associated genes rather than alterations to a set of unassociated individual genes. Most previous microarray-based meta-analyses identified disease-associated genes or biomarkers independent of genetic interactions. Therefore, in this study, we present the first meta-analysis method capable of taking gene combination effects into account to efficiently identify associated biomarkers (ABs) across different microarray platforms. Results: We propose a new meta-analysis approach called MiningABs to mine ABs across different array-based datasets. The similarity between paired probe sequences is quantified as a bridge to connect these datasets together. The ABs can be subsequently identified from an “improved” common logit model (c-LM) by combining several sibling-like LMs in a heuristic genetic algorithm selection process. Our approach is evaluated with two sets of gene expression datasets: i) 4 esophageal squamous cell carcinoma and ii) 3 hepatocellular carcinoma datasets. Based on an unbiased reciprocal test, we demonstrate that each gene in a group of ABs is required to maintain high cancer sample classification accuracy, and we observe that ABs are not limited to genes common to all platforms. Investigating the ABs using Gene Ontology (GO) enrichment, literature survey, and network analyses indicated that our ABs are not only strongly related to cancer development but also highly connected in a diverse network of biological interactions. Conclusions: The proposed meta-analysis method called MiningABs is able to efficiently identify ABs from different independently performed array-based datasets, and we show its validity in cancer biology via GO enrichment, literature survey and network analyses. We postulate that the ABs may facilitate novel target and drug discovery, leading to improved clinical treatment. Java source code, tutorial, example and related materials are available at

  9. Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics.

    Science.gov (United States)

    Hauben, Manfred

    2004-09-01

    To compare the results from one frequently cited data mining algorithm with those from a study, which was published in a peer-reviewed journal, that examined the association of pancreatitis with selected atypical antipsychotics observed by traditional rule-based methods of signal detection. Retrospective pharmacovigilance study. The widely studied data mining algorithm known as the Multi-item Gamma Poisson Shrinker (MGPS) was applied to adverse-event reports from the United States Food and Drug Administration's Adverse Event Reporting System database through the first quarter of 2003 for clozapine, olanzapine, and risperidone to determine if a significant signal of pancreatitis would have been generated by this method in advance of their review or the addition of these events to the respective product labels. Data mining was performed by using nine preferred terms relevant to drug-induced pancreatitis from the Medical Dictionary for Regulatory Activities (MedDRA). Results from a previous study on the antipsychotics were reviewed and analyzed. Physicians' Desk References (PDRs) starting from 1994 were manually reviewed to determine the first year that pancreatitis was listed as an adverse event in the product label for each antipsychotic. This information was used as a surrogate marker of the timing of initial signal detection by traditional criteria. Pancreatitis was listed as an adverse event in a PDR for all three atypical antipsychotics. Despite the presence of up to 88 reports/drug-event combination in the Food and Drug Administration's Adverse Event Reporting System database, the MGPS failed to generate a signal of disproportional reporting of pancreatitis associated with the three antipsychotics despite the signaling of these drug-event combinations by traditional rule-based methods, as reflected in product labeling and/or the literature. These discordant findings illustrate key principles in the application of data mining algorithms to drug safety

  10. Injury Profiles Associated with Artisanal and Small-Scale Gold Mining in Tarkwa, Ghana

    National Research Council Canada - National Science Library

    Calys-Tagoe, Benedict N L; Ovadje, Lauretta; Clarke, Edith; Basu, Niladri; Robins, Thomas

    2015-01-01

    Artisanal and small-scale gold mining (ASGM) is inherently risky, but little is known about mining-associated hazards and injuries despite the tremendous growth worldwide of ASGM and the benefits it offers...

  11. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  12. Data mining and visualization techniques

    Science.gov (United States)

    Wong, Pak Chung [Richland, WA]; Whitney, Paul [Richland, WA]; Thomas, Jim [Richland, WA]

    2004-03-23

    Disclosed are association rule identification and visualization methods, systems, and apparatus. An association rule in data mining is an implication of the form X → Y, where X is a set of antecedent items and Y is the consequent item. A unique visualization technique that provides multiple antecedent, consequent, confidence, and support information is disclosed to facilitate better presentation of large quantities of complex association rules.
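
    The support and confidence figures attached to a rule of this form can be illustrated with a minimal sketch. The transactions, item names, and function names below are hypothetical and are not taken from the record; only the standard textbook definitions of support and confidence are assumed.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions (assumption, not from the record).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str], consequent: str) -> float:
    """support(X and Y) / support(X) for the rule X -> Y."""
    x = frozenset(antecedent)
    supp_x = support(transactions, x)
    if supp_x == 0:
        return 0.0
    return support(transactions, x | {consequent}) / supp_x


if __name__ == "__main__":
    print(support(TRANSACTIONS, {"bread", "milk"}))
    print(confidence(TRANSACTIONS, {"bread"}, "milk"))
```

    Here the antecedent is a set and the consequent a single item, matching the form described in the abstract.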

  13. Rule-based statistical data mining agents for an e-commerce application

    Science.gov (United States)

    Qin, Yi; Zhang, Yan-Qing; King, K. N.; Sunderraman, Rajshekhar

    2003-03-01

    Intelligent data mining techniques have useful e-Business applications. Because an e-Commerce application is related to multiple domains such as statistical analysis, market competition, price comparison, profit improvement and personal preferences, this paper presents a hybrid knowledge-based e-Commerce system fusing intelligent techniques, statistical data mining, and personal information to enhance QoS (Quality of Service) of e-Commerce. A Web-based e-Commerce application software system, eDVD Web Shopping Center, is successfully implemented using Java servlets and an Oracle81 database server. Simulation results have shown that the hybrid intelligent e-Commerce system is able to make smart decisions for different customers.

  14. Finding Influential Users in Social Media Using Association Rule Learning

    Directory of Open Access Journals (Sweden)

    Fredrik Erlandsson

    2016-04-01

    Full Text Available Influential users play an important role in online social networks since users tend to have an impact on one another. Therefore, the proposed work analyzes users and their behavior in order to identify influential users and predict user participation. Normally, the success of a social media site is dependent on the activity level of the participating users. For both online social networking sites and individual users, it is of interest to find out if a topic will be interesting or not. In this article, we propose association learning to detect relationships between users. In order to verify the findings, several experiments were executed based on social network analysis, in which the most influential users identified from association rule learning were compared to the results from Degree Centrality and Page Rank Centrality. The results clearly indicate that it is possible to identify the most influential users using association rule learning. In addition, the results also indicate a lower execution time compared to state-of-the-art methods.

  15. Mining multi-item drug adverse effect associations in spontaneous reporting systems

    Directory of Open Access Journals (Sweden)

    Chase Herbert S

    2010-10-01

    Full Text Available Background: Multi-item adverse drug event (ADE) associations are associations relating multiple drugs to possibly multiple adverse events. The current standard in pharmacovigilance is bivariate association analysis, where each single drug-adverse effect combination is studied separately. The importance and difficulty of detecting multi-item ADE associations was noted in several prominent pharmacovigilance studies. In this paper we examine the application of a well-established data mining method known as association rule mining, which we tailored to the above problem, and demonstrate its value. The method was applied to the FDA's spontaneous adverse event reporting system (AERS) with minimal restrictions and expectations on its output, an experiment that has not been previously done on the scale and generality proposed in this work. Results: Based on a set of 162,744 reports of suspected ADEs reported to AERS and published in the year 2008, our method identified 1167 multi-item ADE associations. A taxonomy that characterizes the associations was developed based on a representative sample. A significant number (67% of the total) of potential multi-item ADE associations identified were characterized and clinically validated by a domain expert as previously recognized ADE associations. Several potentially novel ADEs were also identified. A smaller proportion (4%) of associations were characterized and validated as known drug-drug interactions. Conclusions: Our findings demonstrate that multi-item ADEs are present and can be extracted from the FDA's adverse effect reporting system using our methodology, suggesting that our method is a valid approach for the initial identification of multi-item ADEs. The study also revealed several limitations and challenges that can be attributed to both the method and quality of data.

  16. Rule-Mining for the Early Prediction of Chronic Kidney Disease Based on Metabolomics and Multi-Source Data.

    Directory of Open Access Journals (Sweden)

    Margaux Luck

    Full Text Available 1H Nuclear Magnetic Resonance (NMR)-based metabolic profiling is very promising for the diagnosis of the stages of chronic kidney disease (CKD). Because of the high dimension of NMR spectra datasets and the complex mixture of metabolites in biological samples, the identification of discriminant biomarkers of a disease is challenging. None of the widely used chemometric methods in NMR metabolomics performs a local exhaustive exploration of the data. We developed a descriptive and easily understandable approach that searches for discriminant local phenomena using an original exhaustive rule-mining algorithm in order to predict two groups of patients: 1) patients having low to mild CKD stages with no renal failure and 2) patients having moderate to established CKD stages with renal failure. Our predictive algorithm explores the m-dimensional variable space to capture the local overdensities of the two groups of patients in the form of easily interpretable rules. Afterwards, an L2-penalized logistic regression on the discriminant rules was used to build predictive models of the CKD stages. We explored a complex multi-source dataset that included the clinical, demographic, clinical chemistry, renal pathology and urine metabolomic data of a cohort of 110 patients. Given this multi-source dataset and the complex nature of metabolomic data, we analyzed 1- and 2-dimensional rules in order to integrate the information carried by the interactions between the variables. The results indicated that our local algorithm is a valuable analytical method for the precise characterization of multivariate CKD stage profiles and is as efficient as the classical global model using chi2 variable selection, with approximately 70% good classification. The resulting predictive models predominantly identify urinary metabolites (such as 3-hydroxyisovalerate, carnitine, citrate, dimethylsulfone, creatinine and N-methylnicotinamide) as relevant variables indicating that

  17. Urban association rules: uncovering linked trips for shopping behavior

    CERN Document Server

    Yoshimura, Yuji; Hobin, Juan N Bautista; Ratti, Carlo; Blat, Josep

    2016-01-01

    In this article, we introduce the method of urban association rules and its uses for extracting frequently appearing combinations of stores that are visited together to characterize shoppers' behaviors. The Apriori algorithm is used to extract the association rules (i.e., if -> result) from customer transaction datasets in a market-basket analysis. An application to our large-scale and anonymized bank card transaction dataset enables us to output linked trips for shopping all over the city: the method enables us to predict the other shops most likely to be visited by a customer given a particular shop that was already visited as an input. In addition, our methodology can consider all transaction activities conducted by customers for a whole city in addition to the location of stores dispersed in the city. This approach enables us to uncover not only simple linked trips such as transition movements between stores but also the edge weight for each linked trip in the specific district. Thus, the proposed methodo...
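
    The idea of predicting the shops most likely to be visited given one already-visited shop can be sketched as follows. This is a simplified illustration restricted to pairs of shops (it is not the full Apriori procedure the authors use), and the shop names, baskets, threshold, and function names are all hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical card-transaction "baskets": shops visited by one customer.
BASKETS: List[List[str]] = [
    ["bakery", "grocer", "pharmacy"],
    ["bakery", "grocer"],
    ["bakery", "bookshop"],
    ["grocer", "pharmacy"],
    ["bakery", "grocer", "bookshop"],
]


def linked_trip_rules(baskets: List[List[str]],
                      min_support: float = 0.3) -> Dict[Tuple[str, str], float]:
    """Return {(a, b): confidence of a -> b} for shop pairs with enough support."""
    n = len(baskets)
    single: Counter = Counter()
    pair: Counter = Counter()
    for basket in baskets:
        shops = sorted(set(basket))
        single.update(shops)
        pair.update(permutations(shops, 2))  # ordered pairs (a, b) and (b, a)
    return {
        (a, b): count / single[a]
        for (a, b), count in pair.items()
        if count / n >= min_support
    }


def predict_next(rules: Dict[Tuple[str, str], float],
                 visited: str, top: int = 3) -> List[Tuple[str, float]]:
    """Rank other shops by the confidence of the rule visited -> shop."""
    candidates = [(b, conf) for (a, b), conf in rules.items() if a == visited]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))[:top]


RULES = linked_trip_rules(BASKETS)

if __name__ == "__main__":
    print(predict_next(RULES, "bakery"))
```

    The confidence values can also serve as a rough stand-in for the edge weight of each linked trip, although the paper's own weighting may differ.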

  18. Analysis of Medical Opinions about the Nonrealization of Autopsies in a Mexican Hospital Using Association Rules and Bayesian Networks

    Directory of Open Access Journals (Sweden)

    Elayne Rubio Delgado

    2018-01-01

    Full Text Available This research identifies the factors influencing the reduction of autopsies in a hospital of Veracruz. The study is based on the application of data mining techniques such as association rules and Bayesian networks in data sets obtained from opinions of physicians. We analyzed, for the exploration and extraction of the knowledge, algorithms like Apriori, FPGrowth, PredictiveApriori, Tertius, J48, NaiveBayes, MultilayerPerceptron, and BayesNet, all of them provided by the API of WEKA. To generate mining models and present the new knowledge in natural language, we also developed a web application. The results presented in this study are those obtained from the best-evaluated algorithms, which have been validated by specialists in the field of pathology.

  19. Improving Personalized Clinical Risk Prediction Based on Causality-Based Association Rules.

    Science.gov (United States)

    Cheng, Chih-Wen; Wang, May D

    2015-09-01

    Developing clinical risk prediction models is one of the main tasks of healthcare data mining. Advanced data collection techniques in the current Big Data era have created an emerging and urgent need for scalable, computer-based data mining methods. These methods can turn data into useful, personalized decision support knowledge in a flexible, cost-effective, and productive way. In our previous study, we developed a tool, called icuARM-II, that can generate personalized clinical risk prediction evidence using a temporal rule mining framework. However, the generation of the final risk prediction possibility with icuARM-II still relied on human interpretation, which was subjective and, most of the time, biased. In this study, we propose a new mechanism to improve icuARM-II's rule selection by including the concept of causal analysis. The generated risk prediction is quantitatively assessed using calibration statistics. To evaluate the performance of the new rule selection mechanism, we conducted a case study to predict short-term intensive care unit mortality based on personalized lab testing abnormalities. Our results demonstrated a better-calibrated ICU risk prediction using the new causality-based rule selection solution compared with conventional confidence-only rule selection methods.

  20. Associations between rule-based parenting practices and child screen viewing: A cross-sectional study

    Directory of Open Access Journals (Sweden)

    Joanna M. Kesten

    2015-01-01

    Conclusions: Limit setting is associated with greater SV. Collaborative rule setting may be effective for managing boys' game-console use. More research is needed to understand rule-based parenting practices.

  1. State Identification of Hoisting Motors Based on Association Rules for Quayside Container Crane

    Science.gov (United States)

    Li, Q. Z.; Gang, T.; Pan, H. Y.; Xiong, H.

    2017-07-01

    The hoisting motor of a quay container crane is a complex system whose running status evolves over the long term according to regular patterns that can be exploited. This paper introduces similarity into association rule analysis and applies it to status identification of quay container crane hoisting motors. An example shows that the changes in some rules are small in amplitude and hard to find through regular monitoring, yet it is precisely these small changes that lead to mechanical failure. Therefore, using changes in association rules to monitor motor status has strong practical significance.

  2. Mining the human genome after Association for Molecular Pathology v. Myriad Genetics.

    Science.gov (United States)

    Evans, Barbara J

    2014-07-01

    The Supreme Court's recent decision in Association for Molecular Pathology v. Myriad Genetics portrays the human genome as a product of nature. This frames medical genetics as an extractive industry that mines a natural resource to produce valuable goods and services. Natural resource law offers insights into problems medical geneticists can expect after this decision and suggests possible solutions. Increased competition among clinical laboratories offers various benefits but threatens to increase fragmentation of genetic data resources, potentially causing waste in the form of lost opportunities to discover the clinical significance of particular gene variants. The solution lies in addressing legal barriers to appropriate data sharing. Sustainable discovery in the field of medical genetics can best be achieved through voluntary data sharing rather than command-and-control tactics, but voluntary mechanisms must be conceived broadly to include market-based approaches as well as donative and publicly funded data commons. The recently revised Health Insurance Portability and Accountability Act Privacy Rule offers an improved--but still imperfect--framework for market-oriented data sharing. This article explores strategies for addressing the Privacy Rule's remaining defects. America is close to having a legal framework that can reward innovators, protect privacy, and promote needed data sharing to advance medical genetics.

  3. [Analysis on medication rules of state medical master Yan Zhenghua's prescriptions that include Polygoni Multiflori Caulis based on data mining].

    Science.gov (United States)

    Wu, Jia-rui; Guo, Wei-xian; Zhang, Xiao-meng; Yang, Bing; Zhang, Bing; Zhao, Meng-di; Sheng, Xiao-guang

    2014-11-01

    The prescriptions including Polygoni Multiflori Caulis that were formulated by Prof. Yan were collected to build a database based on the traditional Chinese medicine (TCM) inheritance assist system. Association rule mining with the Apriori algorithm was used to obtain the frequency of single medicines, the frequency of drug combinations, association rules between drugs, and core drug combinations. The data mining results indicated that in the prescriptions that include Polygoni Multiflori Caulis, the most frequently used drugs were parched Ziziphi Spinosae Semen, Ostreae Concha, Ossis Mastodi Fossilia, Salviae Miltiorrhizae Radix Et Rhizoma, Paeoniae Rubra Radix, and so on. The most frequent drug combinations were "Polygoni Multiflori Caulis-parched Ziziphi Spinosae Semen", "Ostreae Concha-Polygoni Multiflori Caulis", and "Polygoni Multiflori Caulis-Ossis Mastodi Fossilia". The drug association rules with a confidence coefficient of 1 were "Ostreae Concha-->Polygoni Multiflori Caulis", "Poria-->Polygoni Multiflori Caulis", "parched Ziziphi Spinosae Semen-->Polygoni Multiflori Caulis", and "Paeoniae Alba Radix-->Polygoni Multiflori Caulis". The core drug combinations in the treatment of insomnia were Ossis Mastodi Fossilia, Polygoni Multiflori Caulis, Salviae Miltiorrhizae Radix Et Rhizoma, Ostreae Concha, Polygalae Radix, Margaritifera Concha, Poria, and parched Ziziphi Spinosae Semen. The core drug combinations in the treatment of obstruction of Qi in the chest were Salviae Miltiorrhizae Radix Et Rhizoma, Polygoni Multiflori Caulis, parched Ziziphi Spinosae Semen, Trichosanthis Fructus, Allii Macrostemonis Bulbus, and Paeoniae Rubra Radix.

  4. Text Association Analysis and Ambiguity in Text Mining

    Science.gov (United States)

    Bhonde, S. B.; Paikrao, R. L.; Rahane, K. U.

    2010-11-01

    Text Mining is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. Research in Text Mining will enhance humans' ability to process massive quantities of information, and it has high commercial value. First, the paper introduces Text Mining (TM) and its definition, and then gives an overview of the text mining process and its applications. Up to now, not much research in text mining, especially in concept/entity extraction, has focused on the ambiguity problem. This paper addresses ambiguity issues in natural language texts and presents a new technique for resolving the ambiguity problem in extracting concepts/entities from texts. In the end, it shows the importance of TM in knowledge discovery and highlights the upcoming challenges of document mining and the opportunities it offers.

  5. Health concerns associated with unconventional gas mining in rural Australia.

    Science.gov (United States)

    Haswell, Melissa R; Bethmont, Anna

    2016-01-01

    Many governments globally are investigating the benefits and risks associated with unconventional gas mining for shale, tight and coal seam gas (coalbed methane) to determine whether the industry should proceed in their jurisdiction. Most locations likely to be developed are in rural areas, with potential impact on farmers and small communities. Despite significant health concerns, public health knowledge and growing evidence are often overlooked in decision-making. It is difficult to gain a broad but accurate understanding of the health concerns for rural communities because the evidence has grown very recently and rapidly, is complex and largely based in the USA, where the industry is advanced. In 2016, a concerned South Australian beef and lamb farmer in an area targeted for potential unconventional gas development organised visits to homes in developed unconventional gas areas of Pennsylvania and forums with leading researchers and lawyers in Pennsylvania and New York. Guided by priorities identified during this trip, this communication concisely distils the research evidence on these key concerns, highlighting the Australian situation where evidence exists. It summarises key information of particular concern to rural regions, using Australia as an example, to assist rural health professionals to be better prepared to engage in decision-making and address the challenges associated with this new industry. Discussions with communities and experts, supported by the expanding research from the USA and Australia, revealed increasing health concerns in six key areas. These are absence of a safe solution to the toxic wastewater management problems, air pollution, land and water competition, mental health and psychosocial wellbeing risks, fugitive methane emissions and lack of proven regulatory regimes. 
    Emerging epidemiological studies suggesting interference with foetal development and birth outcomes, and exacerbation of asthma conditions, are particularly concerning

  6. Prediction of autism susceptibility genes based on association rules.

    Science.gov (United States)

    Gong, Lejun; Yan, Yunyang; Xie, Jianming; Liu, Hongde; Sun, Xiao

    2012-06-01

    Autism is a complex neuropsychiatric disorder with high heritability and an unclear etiology. The identification of key genes related to autism may elucidate its etiology. The current study provides an approach to predicting autism susceptibility genes. Genes are first extracted from the biomedical literature, and some autism susceptibility genes are then recognized as seeds by the prior knowledge. As candidates, the remaining genes are predicted by creating association rules between the seeds and candidates. In an evaluated data set, 27 autism susceptibility genes (type "Y") are extracted and 43 possible autism susceptibility genes (type "P") are predicted. The sum of "Y" and "P" genes accounts for 93.3% of the data set that are not contained in the typical database of autism susceptibility genes. Our approach can effectively extract and predict autism susceptibility genes from the biomedical literature. These predicted results complement the typical database of autism susceptibility genes. The web portal for the predicted results, which is freely available at http://biolab.hyit.edu.cn/ar, can be a valuable resource in studies of diseases related to genes. Copyright © 2012 Wiley Periodicals, Inc.

  7. Parallel Tree Projection Algorithm for Sequence Mining

    Science.gov (United States)

    2001-03-29

    HPMA+00] was developed by extending the tree-projection algorithm [AAP00]. Even though, sequential association rule discovery algorithms based on tree...Kumar. Scalable parallel data mining for association rules. IEEE Transactions on Knowledge and Data Eng. (accepted for publication), 1999. [HPMA+00] J

  8. An inductive database prototype based on virtual mining views

    OpenAIRE

    Blockeel, Hendrik; Calders, Toon; Fromont, Elisa; Goethals, Bart; Prado, Adriana; Robardet, Céline

    2008-01-01

    We present a prototype of an inductive database. Our system enables the user to query not only the data stored in the database but also generalizations (e.g. rules or trees) over these data through the use of virtual mining views. The mining views are relational tables that virtually contain the complete output of data mining algorithms executed over a given dataset. The prototype implemented into PostgreSQL currently integrates frequent itemset, association rule and d...

  9. PMCR-Miner: parallel maximal confident association rules miner algorithm for microarray data set.

    Science.gov (United States)

    Zakaria, Wael; Kotb, Yasser; Ghaleb, Fayed F M

    2015-01-01

    The MCR-Miner algorithm aims to mine all maximal high-confidence association rules from the microarray up/down-expressed genes data set. This paper introduces two new algorithms: IMCR-Miner and PMCR-Miner. The IMCR-Miner algorithm is an extension of the MCR-Miner algorithm with some improvements. These improvements implement a novel way to store the samples of each gene in a list of unsigned integers in order to benefit from bitwise operations. In addition, the IMCR-Miner algorithm overcomes the drawbacks faced by the MCR-Miner algorithm by setting some restrictions to ignore repeated comparisons. The PMCR-Miner algorithm is a parallel version of the newly proposed IMCR-Miner algorithm. The PMCR-Miner algorithm is based on shared-memory systems and task parallelism, where no time is needed in the process of sharing and combining data between processors. The experimental results on real microarray data sets show that the PMCR-Miner algorithm is more efficient and scalable than its counterparts.
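
    Storing each gene's samples as a list of unsigned integers so that bitwise operations can be used might look like the sketch below. The 32-bit word size, the sample indices, and the function names are assumptions made for illustration; the abstract does not describe the actual layout used by IMCR-Miner.

```python
from typing import Iterable, List

WORD_BITS = 32  # assumed word size; not stated in the record


def pack(sample_ids: Iterable[int], n_samples: int) -> List[int]:
    """Encode the samples in which a gene is up-expressed as 32-bit words."""
    words = [0] * ((n_samples + WORD_BITS - 1) // WORD_BITS)
    for s in sample_ids:
        words[s // WORD_BITS] |= 1 << (s % WORD_BITS)
    return words


def popcount(words: List[int]) -> int:
    """Number of set bits, i.e. number of samples encoded."""
    return sum(bin(w).count("1") for w in words)


def common_count(a: List[int], b: List[int]) -> int:
    """Samples shared by two genes: one bitwise AND per word."""
    return sum(bin(x & y).count("1") for x, y in zip(a, b))


def rule_confidence(antecedent: List[int], consequent: List[int]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    total = popcount(antecedent)
    return common_count(antecedent, consequent) / total if total else 0.0


# Hypothetical genes over 40 samples.
GENE_A_UP = pack([0, 3, 5, 33, 39], 40)
GENE_B_UP = pack([3, 5, 7, 33], 40)

if __name__ == "__main__":
    print(GENE_A_UP, common_count(GENE_A_UP, GENE_B_UP))
    print(rule_confidence(GENE_A_UP, GENE_B_UP))
```

    Each intersection costs one AND per word rather than one comparison per sample, which is the kind of saving the abstract alludes to.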

  10. Associations between rule-based parenting practices and child screen viewing : a cross-sectional study

    OpenAIRE

    Kesten, Joanna M.; Sebire, Simon J.; Turner, Katrina M; Stewart-Brown, Sarah L.; Bentley, Georgina F.; Jago, Russell

    2015-01-01

    Background: Child screen viewing (SV) is positively associated with poor health indicators. Interventions addressing rule-based parenting practices may offer an effective means of limiting SV. This study examined associations between rule-based parenting practices (limit and collaborative rule setting) and SV in 6-8-year-old children. Methods: An online survey of 735 mothers in 2011 assessed: time that children spent engaged in SV activities; and the use of limit and collaborati...

  11. Interest Measures for Fuzzy Association Rules Based on Expectations of Independence

    Directory of Open Access Journals (Sweden)

    Michal Burda

    2014-01-01

    Full Text Available Lift, leverage, and conviction are three of the best-known interest measures for crisp association rules. All of them are based on a comparison of observed support and the support that is expected if the antecedent and consequent part of the rule were stochastically independent. The aim of this paper is to provide a correct definition of lift, leverage, and conviction measures for fuzzy association rules and to study some of their interesting mathematical properties.
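
    For crisp rules, the comparison between observed support and the support expected under independence can be written out as a short sketch. It uses the commonly cited textbook definitions of the three measures; the fuzzy definitions proposed in the paper are not reproduced here, and the function name and input values are hypothetical.

```python
import math
from typing import Dict


def interest_measures(supp_x: float, supp_y: float, supp_xy: float) -> Dict[str, float]:
    """Lift, leverage and conviction of a crisp rule X -> Y.

    supp_x, supp_y: supports of antecedent and consequent.
    supp_xy: observed support of X and Y together.
    """
    expected = supp_x * supp_y           # support expected under independence
    confidence = supp_xy / supp_x
    lift = supp_xy / expected            # ratio of observed to expected
    leverage = supp_xy - expected        # difference of observed and expected
    if confidence == 1.0:
        conviction = math.inf            # rule is never violated
    else:
        conviction = (1.0 - supp_y) / (1.0 - confidence)
    return {"lift": lift, "leverage": leverage, "conviction": conviction}


# Hypothetical supports chosen for illustration.
EXAMPLE = interest_measures(supp_x=0.5, supp_y=0.5, supp_xy=0.375)

if __name__ == "__main__":
    print(EXAMPLE)
```

    When X and Y are independent, lift and conviction equal 1 and leverage equals 0, which is the baseline all three measures compare against.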

  12. Association Rule-based Predictive Model for Machine Failure in Industrial Internet of Things

    Science.gov (United States)

    Kwon, Jung-Hyok; Lee, Sol-Bee; Park, Jaehoon; Kim, Eui-Jik

    2017-09-01

    This paper proposes an association rule-based predictive model for machine failure in the industrial Internet of things (IIoT), which can accurately predict machine failure in a real manufacturing environment by investigating the relationship between the cause and type of machine failure. To develop the predictive model, we consider three major steps: 1) binarization, 2) rule creation, 3) visualization. The binarization step translates item values in a dataset into one or zero, then the rule creation step creates association rules as IF-THEN structures using the Lattice model and Apriori algorithm. Finally, the created rules are visualized in various ways for users’ understanding. An experimental implementation was conducted using R Studio version 3.3.2. The results show that the proposed predictive model realistically predicts machine failure based on association rules.
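
    The binarization and rule creation steps can be illustrated with a small sketch. The authors worked in R; Python is used here only for illustration, and the sensor fields, thresholds, records, and function names are hypothetical. The rule is scored directly instead of being generated with the Lattice model and Apriori.

```python
from typing import Dict, List, Tuple

# Hypothetical machine log: sensor readings plus a 0/1 failure flag.
RECORDS: List[Dict[str, float]] = [
    {"temperature": 92, "vibration": 0.8, "failure": 1},
    {"temperature": 95, "vibration": 0.2, "failure": 1},
    {"temperature": 70, "vibration": 0.9, "failure": 0},
    {"temperature": 65, "vibration": 0.1, "failure": 0},
    {"temperature": 91, "vibration": 0.3, "failure": 0},
]
THRESHOLDS = {"temperature": 90, "vibration": 0.5}  # assumed cut-offs


def binarize(records: List[Dict[str, float]],
             thresholds: Dict[str, float]) -> List[Dict[str, int]]:
    """Step 1: map each value to 1 (above threshold) or 0; flags pass through."""
    return [
        {k: int(v > thresholds[k]) if k in thresholds else int(v)
         for k, v in record.items()}
        for record in records
    ]


def rule_stats(rows: List[Dict[str, int]], cause: str, effect: str) -> Tuple[float, float]:
    """Step 2: support and confidence of the rule IF cause THEN effect."""
    both = sum(1 for r in rows if r[cause] and r[effect])
    cause_count = sum(1 for r in rows if r[cause])
    support = both / len(rows)
    confidence = both / cause_count if cause_count else 0.0
    return support, confidence


ROWS = binarize(RECORDS, THRESHOLDS)

if __name__ == "__main__":
    print(rule_stats(ROWS, "temperature", "failure"))
```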

  13. Mining literatures to discover novel multiple biological associations in a disease context.

    Science.gov (United States)

    Faro, Alberto; Giordano, Daniela; Maiorana, Francesco

    2015-01-01

    The text mining methods proposed to discover associations between pairs of biological entities by mining a scientific literature often extract associations already existing in the literature, whereas their extensions over-supervise the discovery process with heuristics and ontologies that limit the research space. On the other hand, the methods that search for novel associations by applying text mining to two literatures do not avoid the risk of discovering syllogisms based on faulty premises. For this reason, the paper proposes a method that helps users discover associations among biological entities by mining the literature using an unsupervised clustering approach. The discovered multiple associations are derived from binary associations to limit the computational load without compromising the accuracy of the methodology. A case study demonstrates how the tool derived from the methodology works in practice. A comparison between this tool and other tools available in the literature highlights the effectiveness of the methodology.

  14. YAGM: a web tool for mining associated genes in yeast based on diverse biological associations.

    Science.gov (United States)

    Wu, Wei-Sheng; Wang, Chung-Ching; Jhou, Meng-Jhun; Wang, Yu-Cheng

    2015-01-01

    Investigating association between genes can be used in understanding the relations of genes in biological processes. STRING and GeneMANIA are two well-known web tools which can provide a list of associated genes of a query gene based on diverse biological associations such as co-expression, co-localization, co-citation and so on. However, the transcriptional regulation association and mutant phenotype association have not been used in these two web tools. Since the comprehensive transcription factor (TF)-gene binding data, TF-gene regulation data and mutant phenotype data are available in yeast, we developed a web tool called YAGM (Yeast Associated Genes Miner) which constructed the transcriptional regulation association, mutant phenotype association and five commonly used biological associations to mine a list of associated genes of a query yeast gene. In YAGM, we collected seven kinds of datasets including TF-gene binding (TFB) data, TF-gene regulation (TFR) data, mutant phenotype (MP) data, functional annotation (FA) data, physical interaction (PI) data, genetic interaction (GI) data, and literature evidence (LE) data. Then by using the hypergeometric test to calculate the association scores of all gene pairs in yeast, we constructed seven biological associations including two transcriptional regulation associations (TFB association and TFR association), MP association, FA association, PI association, GI association, and LE association. Moreover, the expression profile association from SPELL database was also included in YAGM. When using YAGM, users can input a query gene and choose any possible subsets of the eight biological associations, then a list of associated genes of the query gene will be returned based on the chosen biological associations. In this study, we presented the YAGM which provides eight biological associations for mining associated genes of a query gene in yeast. Among the eight biological associations constructed in YAGM, three (TFB
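
    A hypergeometric test for a gene pair can be sketched as below. The abstract states only that the hypergeometric test is used to score gene pairs; the framing in terms of shared annotations, the negative log10 transform, the numbers, and the function names are assumptions made for illustration.

```python
from math import comb, log10


def hypergeom_pvalue(population: int, a_size: int, b_size: int, overlap: int) -> float:
    """P(X >= overlap) when b_size features are drawn from `population`,
    of which a_size belong to gene A (one-sided enrichment test)."""
    upper = min(a_size, b_size)
    total = comb(population, b_size)
    tail = sum(
        comb(a_size, i) * comb(population - a_size, b_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def association_score(population: int, a_size: int, b_size: int, overlap: int) -> float:
    """Assumed scoring: larger score means a less likely chance overlap."""
    return -log10(hypergeom_pvalue(population, a_size, b_size, overlap))


# Hypothetical example: 10 transcription factors in total, gene A bound by 4,
# gene B bound by 3, and 2 of them bind both genes.
P_VALUE = hypergeom_pvalue(10, 4, 3, 2)

if __name__ == "__main__":
    print(P_VALUE, association_score(10, 4, 3, 2))
```

    Requires Python 3.8 or later for `math.comb`.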

  15. Air Pollution Monitoring & Tracking System Using Mobile Sensors and Analysis of Data Using Data Mining

    OpenAIRE

    Umesh M. Lanjewar, J. J. Shah

    2012-01-01

    This study proposes an air pollution monitoring system and analysis of pollution data using the association rule data mining technique. The association rule data mining technique aims at finding association patterns among various parameters. In this paper, association rule mining is presented for finding association patterns among various air pollutants. For this, the Apriori algorithm of association rule data mining is used. Apriori is characterized as a level-by-level complete search algorithm. This algorithm is ...
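
    The level-by-level search that characterizes Apriori can be sketched as follows. This is a minimal illustration, not the authors' implementation; the pollutant names, the daily exceedance sets, and the support threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical data: pollutants that exceeded a limit on each day.
DAYS: List[set] = [
    {"NO2", "SO2", "PM10"},
    {"NO2", "PM10"},
    {"NO2", "SO2"},
    {"PM10"},
    {"NO2", "SO2", "PM10"},
]


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset with its support, found level by level."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)

    def keep_frequent(candidates):
        scored = {c: sum(1 for r in rows if c <= r) / n for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: single items.
    level = keep_frequent({frozenset([i]) for r in rows for i in r})
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
    return frequent


FREQUENT = apriori(DAYS, min_support=0.6)

if __name__ == "__main__":
    for itemset, supp in sorted(FREQUENT.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), supp)
```

    The prune step is what makes the search complete yet tractable: a candidate is counted only if all of its smaller subsets were already found frequent at the previous level.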

  16. 17 CFR 240.17a-1 - Recordkeeping rule for national securities exchanges, national securities associations...

    Science.gov (United States)

    2010-04-01

    ... national securities exchanges, national securities associations, registered clearing agencies and the... Certain Stabilizing Activities § 240.17a-1 Recordkeeping rule for national securities exchanges, national...) Every national securities exchange, national securities association, registered clearing agency and the...

  17. How hard do mineworkers work? An assessment of workplace stress associated with routine mining activities

    CSIR Research Space (South Africa)

    Schutte, PC

    2012-03-01

    Full Text Available Mining operations are frequently associated with difficult working conditions and high levels of workplace stress. Workplace stress can be defined as the harmful physical and emotional responses that occur when the psychological and...

  18. Discovering relational-based association rules with multiple minimum supports on microarray datasets.

    Science.gov (United States)

    Liu, Yu-Cheng; Cheng, Chun-Pei; Tseng, Vincent S

    2011-11-15

    Association rule analysis methods are important techniques applied to gene expression data for finding expression relationships between genes. However, previous methods implicitly assume that all genes have similar importance, or they ignore the individual importance of each gene. The relation intensity between any two items has never been taken into consideration. Therefore, we proposed a technique named REMMAR (RElational-based Multiple Minimum supports Association Rules) algorithm to tackle this problem. This method adjusts the minimum relation support (MRS) for each gene pair depending on the regulatory relation intensity to discover more important association rules with stronger biological meaning. In the actual case study of this research, REMMAR utilized the shortest distance between any two genes in the Saccharomyces cerevisiae gene regulatory network (GRN) as the relation intensity to discover the association rules from two S. cerevisiae gene expression datasets. Under experimental evaluation, REMMAR can generate more rules with stronger relation intensity, and filter out rules without biological meaning in the protein-protein interaction network (PPIN). Furthermore, according to a literature survey of the discovered rules, the proposed method has a higher precision (100%) than the reference Apriori method (87.5%). Therefore, the proposed REMMAR algorithm can discover stronger association rules in biological relationships dissimilated by traditional methods to assist biologists in complicated genetic exploration.

  19. Arsenic and antimony geochemistry of mine wastes, associated waters and sediments at the Giant Mine, Yellowknife, Northwest Territories, Canada

    Science.gov (United States)

    Fawcett, Skya E.; Jamieson, Heather E.; Nordstrom, D. Kirk; McCleskey, R. Blaine

    2015-01-01

    Elevated levels of arsenic (As) and antimony (Sb) in water and sediments are legacy residues found downstream from gold-mining activities at the Giant Mine in Yellowknife, Northwest Territories (NWT), Canada. To track the transport and fate of As and Sb, samples of mine-waste from the mill, and surface water, sediment, pore-water, and vegetation downstream of the mine were collected. Mine waste, pore-water, and sediment samples were analyzed for bulk chemistry, and aqueous and solid-state speciation. Sediment and vegetation chemistry were evaluated using scanning electron microscope imaging, synchrotron-based element mapping and electron microprobe analysis. The distributions of As and Sb in sediments were similar, yet their distributions in the corresponding pore-waters were mostly dissimilar, and the mobility of As was greater than that of Sb. Competition for sorption sites is the most likely cause of elevated Sb concentrations in relatively oxidized pore-water and surface water. The aqueous and solid-state speciation of As and Sb also differed. In pore-water, As(V) dominated in oxidizing environments and As(III) in reducing environments. In contrast, the Sb(V) species dominated in all but one pore-water sample, even under reducing conditions. Antimony(III) appears to preferentially precipitate or adsorb onto sulfides as evidenced by the prevalence of an Sb(III)-S secondary solid-phase and the lack of Sb(III)(aq) in the deeper zones. The As(V)–O solid phase became depleted with depth below the sediment–water interface, and the Sb(V)–O phase persisted under relatively reducing conditions. In the surficial zone at a site populated by Equisetum fluviatile (common horsetail), As and Sb were associated with organic material and appeared mobile in the root zone. In the zone below active plant growth, As and Sb were associated primarily with inorganic phases suggesting a release and reprecipitation of these elements upon plant death. The co-existence of reduced

  20. [Application of association rule in mental health test for employees in a petrochemical enterprise].

    Science.gov (United States)

    Zhang, L F; Zhang, D N; Wang, Z P

    2017-10-20

    Objective: To investigate the occurrence rules of common psychological abnormalities in petrochemical workers using association rules. Methods: From July to September 2014, the Symptom Checklist-90 (SCL-90) was used for a general survey of mental health among all employees in a petrochemical enterprise. The Apriori association rule algorithm was used to analyze the SCL-90 data and investigate the occurrence rules of psychological abnormalities in petrochemical workers of different sexes, ages, or nationalities. Results: A total of 8 248 usable questionnaires were collected. The SCL-90 analysis showed that 1623 petrochemical workers (19.68%) had positive results, among whom 567 (34.94%) had one positive factor and 1056 (65.06%) had two or more positive factors. A total of 7 strong association rules were identified and all of them included obsessive-compulsive symptom and depression. Male ({obsessive-compulsive symptom, anxiety} => {depression}) and female workers ({somatization, depression} => {obsessive-compulsive symptom}) had their own special association rules. The workers aged 35-44 years had 17 special association rules, and ethnic minorities had 5 special association rules. Conclusion: Employees in the petrochemical enterprise have multiple positive factors in SCL-90, and employees aged 35-44 years and ethnic minorities have a rich combination of psychological symptoms and need special attention during mental health intervention.
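
    How a rule such as {obsessive-compulsive symptom, anxiety} => {depression} is scored can be illustrated with the two standard measures, support and confidence. The sketch below uses a handful of made-up records, not the survey data; the helper names are hypothetical.

```python
def support(records, itemset):
    """Fraction of records that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(records, antecedent)
    joint = support(records, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


# Made-up positive SCL-90 factors for five respondents (illustration only).
records = [
    frozenset({"obsessive-compulsive", "anxiety", "depression"}),
    frozenset({"obsessive-compulsive", "anxiety", "depression"}),
    frozenset({"obsessive-compulsive", "anxiety"}),
    frozenset({"somatization", "depression"}),
    frozenset({"depression"}),
]
antecedent = {"obsessive-compulsive", "anxiety"}
consequent = {"depression"}

rule_support = support(records, antecedent | consequent)
rule_confidence = confidence(records, antecedent, consequent)
print(rule_support, rule_confidence)
```

    Apriori keeps only rules whose support and confidence both exceed user-chosen thresholds; the thresholds used in the study are not given in the abstract.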

  1. Association between clean indoor air laws and voluntary smokefree rules in homes and cars.

    Science.gov (United States)

    Cheng, Kai-Wen; Okechukwu, Cassandra A; McMillen, Robert; Glantz, Stanton A

    2015-03-01

    This study examines the influence that smokefree workplaces, restaurants and bars have on the adoption of smokefree rules in homes and cars, and whether there is an association with adopting smokefree rules in homes and cars. Bivariate probit models were used to jointly estimate the likelihood of living in a smokefree home and having a smokefree car as a function of law coverage and other variables. Household data were obtained from the nationally representative Social Climate Survey of Tobacco Control 2001, 2002 and 2004-2009; clean indoor air law data were from the American Nonsmokers' Rights Foundation Tobacco Control Laws Database. 'Full coverage' and 'partial coverage' smokefree legislation is associated with an increased likelihood of having voluntary home and car smokefree rules compared with 'no coverage'. The association between 'full coverage' and smokefree rule in homes and cars is 5% and 4%, respectively, and the association between 'partial coverage' and smokefree rules in homes and cars is 3% and 4%, respectively. There is a positive association between the adoption of smokefree rules in homes and cars. Clean indoor air laws provide the additional benefit of encouraging voluntary adoption of smokefree rules in homes and cars. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  2. Clinic-Genomic Association Mining for Colorectal Cancer Using Publicly Available Datasets

    Directory of Open Access Journals (Sweden)

    Fang Liu

    2014-01-01

    In recent years, a growing number of researchers began to focus on how to establish associations between clinical and genomic data. However, up to now, there is a lack of research mining clinic-genomic associations by comprehensively analysing available gene expression data for a single disease. Colorectal cancer is one of the malignant tumours. A number of genetic syndromes have been proven to be associated with colorectal cancer. This paper presents our research on mining clinic-genomic associations for colorectal cancer under biomedical big data environment. The proposed method is engineered with multiple technologies, including extracting clinical concepts using the unified medical language system (UMLS), extracting genes through the literature mining, and mining clinic-genomic associations through statistical analysis. We applied this method to datasets extracted from both gene expression omnibus (GEO) and genetic association database (GAD). A total of 23517 clinic-genomic associations between 139 clinical concepts and 7914 genes were obtained, of which 3474 associations between 31 clinical concepts and 1689 genes were identified as highly reliable ones. Evaluation and interpretation were performed using UMLS, KEGG, and Gephi, and potential new discoveries were explored. The proposed method is effective in mining valuable knowledge from available biomedical big data and achieves a good performance in bridging clinical data with genomic data for colorectal cancer.

  3. Big data mining analysis method based on cloud computing

    Science.gov (United States)

    Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

    2017-08-01

    In the era of information explosion, the very large, discrete and non- or semi-structured nature of big data has gone far beyond what traditional data management can handle. With the arrival of the cloud computing era, cloud computing provides a new technical approach to mining massive data, which can effectively solve the problem that traditional data mining methods cannot adapt to massive data. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs an association rule mining algorithm based on the MapReduce parallel processing architecture, and carries out experimental verification. The parallel association rule mining algorithm based on a cloud computing platform can greatly improve the execution speed of data mining.
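
    The abstract does not describe the algorithm itself, so the following is only a generic sketch of how itemset counting fits the MapReduce pattern: each partition is counted independently (map), and the partial counts are then summed (reduce). It runs in a single process; the function names and the toy baskets are assumptions for illustration.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_partition(transactions, k):
    """Map step: count every k-item combination inside one data partition."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def frequent_itemsets(partitions, k, min_count):
    # On a real cluster each map_partition call would run on a different node.
    mapped = [map_partition(p, k) for p in partitions]
    total = reduce(reduce_counts, mapped, Counter())
    return {s: c for s, c in total.items() if c >= min_count}


# Toy transactions split into two partitions.
partitions = [
    [{"milk", "bread"}, {"milk", "bread", "eggs"}],
    [{"bread", "eggs"}, {"milk", "bread"}],
]
frequent = frequent_itemsets(partitions, k=2, min_count=2)
print(frequent)
```

    Because the map step touches only its own partition, the counting work scales out with the number of partitions; only the small count tables need to be exchanged.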

  4. Alkemio: association of chemicals with biomedical topics by text and data mining

    OpenAIRE

    Gijon-Correas, J.A.; Andrade-Navarro, M. A.; Fontaine, J F

    2014-01-01

    The PubMed(R) database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness...

  5. Arsenic associated with historical gold mining in the Sierra Nevada foothills: Case study and field trip guide for Empire Mine State Historic Park, California

    Science.gov (United States)

    Alpers, Charles N.; Myers, Perry A; Millsap, Daniel; Regnier, Tamsen B; Bowell, Robert J.; Alpers, Charles N.; Jamieson, Heather E.; Nordstrom, D. Kirk; Majzlan, Juraj

    2014-01-01

    The Empire Mine, together with other mines in the Grass Valley mining district, produced at least 21.3 million troy ounces (663 tonnes) of gold (Au) during the 1850s through the 1950s, making it the most productive hardrock Au mining district in California history (Clark 1970). The Empire Mine State Historic Park (Empire Mine SHP or EMSHP), established in 1975, provides the public with an opportunity to see many well-preserved features of the historic mining and mineral processing operations (CDPR 2014a). A legacy of Au mining at Empire Mine and elsewhere is contamination of mine wastes and associated soils, surface waters, and groundwaters with arsenic (As), mercury (Hg), lead (Pb), and other metals. At EMSHP, As has been the principal contaminant of concern and the focus of extensive remediation efforts over the past several years by the State of California, Department of Parks and Recreation (DPR) and Newmont USA, Ltd. In addition, the site is the main focus of a multidisciplinary research project on As bioavailability and bioaccessibility led by the California Department of Toxic Substances Control (DTSC) and funded by the U.S. Environmental Protection Agency’s (USEPA’s) Brownfields Program. This chapter was prepared as a guide for a field trip to EMSHP held on June 14, 2014, in conjunction with a short course on “Environmental Geochemistry, Mineralogy, and Microbiology of Arsenic” held in Nevada City, California on June 15–16, 2014. This guide contains background information on geological setting, mining history, and environmental history at EMSHP and other historical Au mining districts in the Sierra Nevada, followed by descriptions of the field trip stops.

  6. THE CONTRIBUTION OF „RUDA 12 APOSTOLI” MINING ASSOCIATION IN BRAD TO THE DEVELOPMENT OF TRANSYLVANIAN GOLD MINING BETWEEN 1884 – 1921

    Directory of Open Access Journals (Sweden)

    MIRCEA BARON

    2012-01-01

    One of the major gold mining regions in Romania is part of the gold rectangle in the Apuseni Mountains and lies around the town of Brad. It is here that the ”Ruda 12 Apostoli” Mining Association of cuxas was established at the end of the XVIIIth century. This association was to become the most important unit for the mining of precious metals in the entire Austrian – Hungarian Empire after 1884, when it was taken over by the German company ”Harkortschen Bergwerke und Chemische Fabriken zu Schwelm und Harkorten A.G. zu Gotha”, preserving its status in interwar Romania as a component of the ”Mica” Mining company. This mining complex had a production of 27,919.520 kg of gold between 1884 – July 1, 1911.

  7. Unexpected rules using a conceptual distance based on fuzzy ontology

    Directory of Open Access Journals (Sweden)

    Mohamed Said Hamani

    2014-01-01

    One of the major drawbacks of data mining methods is that they generate a notably large number of rules that are often obvious or useless or, occasionally, out of the user’s interest. To address such drawbacks, we propose in this paper an approach that detects a set of unexpected rules in a discovered association rule set. Generally speaking, the proposed approach investigates the discovered association rules using the user’s domain knowledge, which is represented by a fuzzy domain ontology. Next, we rank the discovered rules according to the conceptual distances of the rules.
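
    Ranking rules by conceptual distance can be illustrated with a much simpler stand-in than the paper's fuzzy ontology: a crisp concept tree in which the distance between two concepts is the number of edges on the path through their lowest common ancestor. The ontology, the rules and the function names below are hypothetical, and fuzzy membership degrees are deliberately left out.

```python
def ancestors(parent, concept):
    """Path from a concept up to the ontology root, starting with the concept."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path


def conceptual_distance(parent, a, b):
    """Number of edges between two concepts via their lowest common ancestor."""
    depth_b = {c: i for i, c in enumerate(ancestors(parent, b))}
    for i, c in enumerate(ancestors(parent, a)):
        if c in depth_b:
            return i + depth_b[c]
    raise ValueError("concepts do not share a root")


# Hypothetical crisp ontology given as child -> parent links.
parent = {
    "milk": "dairy", "cheese": "dairy", "dairy": "food",
    "bread": "bakery", "bakery": "food", "food": "product",
    "soap": "hygiene", "hygiene": "product",
}

# Rules written as (antecedent concept, consequent concept).
rules = [("milk", "cheese"), ("milk", "bread"), ("milk", "soap")]

# Larger distance = less expected given the domain knowledge, so rank it first.
ranked = sorted(rules, key=lambda r: conceptual_distance(parent, *r), reverse=True)
print(ranked)
```

    Under this simplification a rule linking conceptually distant items (milk and soap) ranks above one linking siblings (milk and cheese), which matches the intuition that obvious rules are the least interesting.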

  8. The Most Advantageous Bangla Keyboard Layout Using Data Mining Technique

    OpenAIRE

    Masum, Abdul Kadar Muhammad; Hassan, Mohammad Mahadi; Kamruzzaman, S. M.

    2010-01-01

    The Bangla alphabet has a large number of letters, which makes it difficult to type quickly on a Bangla keyboard. The proposed keyboard will maximize the speed of operators, as they can type with both hands in parallel. Association rule mining is used here to distribute the Bangla characters over the keyboard. The frequencies of monographs, digraphs and trigraphs, derived from a data warehouse, are analyzed, and association rule mining is then used to distribute th...

  9. Applied data mining for business and industry

    CERN Document Server

    Giudici, Paolo

    2009-01-01

    The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies a...

  10. Association of rule of law and health outcomes: an ecological study.

    Science.gov (United States)

    Pinzon-Rondon, Angela Maria; Attaran, Amir; Botero, Juan Carlos; Ruiz-Sternberg, Angela Maria

    2015-10-29

    To explore whether the rule of law is a foundational determinant of health that underlies other socioeconomic, political and cultural factors that have been associated with health outcomes. Global project. Data set of 96 countries, comprising 91% of the global population. The following health indicators, infant mortality rate, maternal mortality rate, life expectancy, and cardiovascular disease and diabetes mortality rate, were included to explore their association with the rule of law. We used a novel Rule of Law Index, gathered from survey sources, in a cross-sectional and ecological design. The Index is based on eight subindices: (1) Constraints on Government Powers; (2) Absence of Corruption; (3) Order and Security; (4) Fundamental Rights; (5) Open Government; (6) Regulatory Enforcement; (7) Civil Justice; and (8) Criminal Justice. The rule of law showed an independent association with infant mortality rate, maternal mortality rate, life expectancy, and cardiovascular disease and diabetes mortality rate, after adjusting for the countries' level of per capita income, their expenditures in health, their level of political and civil freedom, their Gini measure of inequality and women's status (p<0.05). Rule of law remained significant in all the multivariate models, and the following adjustment for potential confounders remained robust for at least one or more of the health outcomes across all eight subindices of the rule of law. Findings show that the higher the country's level of adherence to the rule of law, the better the health of the population. It is necessary to start considering the country's adherence to the rule of law as a foundational determinant of health. Health advocates should consider the improvement of rule of law as a tool to improve population health. Conversely, lack of progress in rule of law may constitute a structural barrier to health improvement. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a

  11. Association of rule of law and health outcomes: an ecological study

    Science.gov (United States)

    Pinzon-Rondon, Angela Maria; Attaran, Amir; Botero, Juan Carlos; Ruiz-Sternberg, Angela Maria

    2015-01-01

    Objectives To explore whether the rule of law is a foundational determinant of health that underlies other socioeconomic, political and cultural factors that have been associated with health outcomes. Setting Global project. Participants Data set of 96 countries, comprising 91% of the global population. Primary and secondary outcome measures The following health indicators, infant mortality rate, maternal mortality rate, life expectancy, and cardiovascular disease and diabetes mortality rate, were included to explore their association with the rule of law. We used a novel Rule of Law Index, gathered from survey sources, in a cross-sectional and ecological design. The Index is based on eight subindices: (1) Constraints on Government Powers; (2) Absence of Corruption; (3) Order and Security; (4) Fundamental Rights; (5) Open Government; (6) Regulatory Enforcement; (7) Civil Justice; and (8) Criminal Justice. Results The rule of law showed an independent association with infant mortality rate, maternal mortality rate, life expectancy, and cardiovascular disease and diabetes mortality rate, after adjusting for the countries’ level of per capita income, their expenditures in health, their level of political and civil freedom, their Gini measure of inequality and women's status (p<0.05). Rule of law remained significant in all the multivariate models, and the following adjustment for potential confounders remained robust for at least one or more of the health outcomes across all eight subindices of the rule of law. Findings show that the higher the country's level of adherence to the rule of law, the better the health of the population. Conclusions It is necessary to start considering the country's adherence to the rule of law as a foundational determinant of health. Health advocates should consider the improvement of rule of law as a tool to improve population health. Conversely, lack of progress in rule of law may constitute a structural barrier to health improvement. PMID:26515684

  12. Ecological and human health risks associated with abandoned gold mine tailings contaminated soil

    DEFF Research Database (Denmark)

    Ngole-Jeme, Veronica Mpode; Fantke, Peter

    2017-01-01

    Gold mining is a major source of metal and metalloid emissions into the environment. Studies were carried out in Krugersdorp, South Africa, to evaluate the ecological and human health risks associated with exposure to metals and metalloids in mine tailings contaminated soils. Concentrations...... of arsenic (As), cadmium (Cd), chromium (Cr), cobalt (Co), copper (Cu), lead (Pb), manganese (Mn), nickel (Ni), and zinc (Zn) in soil samples from the area varied with the highest contamination factors (expressed as ratio of metal or metalloid concentration in the tailings contaminated soil......×10−2 for As and Ni respectively among children, and 5×10−3 and 4×10−3 for As and Ni respectively among adults. There is significant potential ecological and human health risk associated with metal and metalloid exposure from contaminated soils around gold mine tailings dumps. This could be a potential contributing...

  13. DISEASES: text mining and data integration of disease-gene associations.

    Science.gov (United States)

    Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

    2015-03-01

    Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Retromer association with membranes: plants have their own rules!

    Science.gov (United States)

    Zelazny, Enric; Santambrogio, Martina; Gaude, Thierry

    2013-09-01

    The retromer is an endosome-localized complex involved in protein trafficking. To better understand its function and regulation in plants, we recently investigated how Arabidopsis retromer subunits assemble and are targeted to endosomal membranes and highlighted original features compared with mammals. We characterized the Arabidopsis vps26 null mutant and showed that it displays severe developmental defects similar to those observed in the vps29 mutant. Here, we go further by describing new phenotypic defects associated with loss of VPS26 function, such as inhibition of lateral root initiation. Recently, we showed that the VPS35 subunit plays a crucial role in the recruitment of the plant retromer to endosomes, probably through an interaction with the Rab7 homolog RABG3f. In this work, we now show that contrary to mammals, Arabidopsis Rab5 homologs do not seem to be necessary for the recruitment of the core retromer to endosomal membranes, which highlights a new specificity of the plant retromer.

  15. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Science.gov (United States)

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  16. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition discusses both theoretical foundation and practical applications of data mining in a web field including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data type, data category and applications of data mining. The second chapter briefly reviews data visualization technology and importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithm for sample covariants are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithm is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  17. A study of trends in occupational risks associated with coal mining

    Energy Technology Data Exchange (ETDEWEB)

    Amoundru, C.

    1980-10-01

    The occupational risks associated with underground coal mining can be categorized as either industrial accidents or occupational diseases. Since 1957, the number of fatal accidents per million tons of coal produced has dropped by a factor of four. The number of industrial accidents in general decreased by 30% during 1967-75. The main occupational diseases affecting miners are arthrosis, deafness, and pneumoconiosis. To make an objective comparison with the health hazards from other sources of energy, the probable risks facing workers in a modern mine should be compared with those currently confronting workers in other industries.

  18. Efficiency and acceptance of new water allocation rules - The case of an agricultural water users association.

    Science.gov (United States)

    Goetz, Renan U; Martínez, Yolanda; Xabadia, Àngels

    2017-12-01

    Water scarcity is one of the major environmental problems in Southern Europe. High levels of water stress and increasing frequency of droughts, along with a greater environmental protection, make it necessary to design water management strategies that are allocative efficient and balance supply and demand. When functioning markets cannot be developed, the allocation rules proposed in the literature of social choice have been recognized as a suitable alternative. However, the application of new water allocation rules can be impaired by a lack of acceptance and implementation problems. This paper examines these obstacles for the case of an agricultural water users association (WUA), situated in the basin of the River Ebro, in relation to the governance structure and collective decision rule of the WUA. It analyzes the extent to which the gains and losses of the farmers affect their acceptance, and examines conditions for building agreements with side payments that provide incentives for the majority of the farmers to form part of a possible agreement. The results show that the uniform and sequential rules improve the allocative efficiency under normal conditions compared to the status quo and the sequential rule even in the case of droughts. In the presence of side payments this rule is likely to be accepted and has only an insignificant impact on distributional inequality. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Integrated assessment of the impacts associated with uranium mining and milling

    Energy Technology Data Exchange (ETDEWEB)

    Parzyck, D.C.; Baes, C.F. III; Berry, L.G.

    1979-07-01

    The occupational health and safety impacts are assessed for domestic underground mining, open pit mining, and milling. Public health impacts are calculated for a population of 53,000 located within 88 km (55 miles) of a typical southwestern uranium mill. The collective annual dose would be 6.5 man-lung rem/year, 89% of which is from ²²²Rn emitted from mill tailings. The dose to the United States population is estimated to be 6 × 10⁴ man-lung rem from combined mining and milling operations. This may be compared with 5.7 × 10⁵ man-lung rem from domestic use of natural gas and 4.4 × 10⁷ man-lung rem from building interiors. Unavoidable adverse environmental impacts appear to be severe in a 250 ha area surrounding a mill site but negligible in the entire potentially impacted area (500,000 ha). The contemporary uranium resource and supply industry and its institutional settings are described in relation to the socio-economic impacts likely to emerge from high levels of uranium mining and milling. Radon and radon daughter monitoring techniques associated with uranium mining and milling are discussed.

  20. Impact of gold mining associated with mercury contamination in soil, biota sediments and tailings in Kenya.

    Science.gov (United States)

    Odumo, Benjamin Okang'; Carbonell, Gregoria; Angeyo, Hudson Kalambuka; Patel, Jayanti Purshottam; Torrijos, Manuel; Rodríguez Martín, José Antonio

    2014-11-01

    This work considered the environmental impact of artisanal gold mining activity in the Migori-Transmara area (Kenya). From artisanal gold mining, mercury is released to the environment, thus contributing to degradation of soil and water bodies. High mercury contents have been quantified in soil (140 μg kg⁻¹), sediment (430 μg kg⁻¹) and tailings (8,900 μg kg⁻¹), as expected. The results reveal that the mechanism for transporting mercury to the terrestrial ecosystem is associated with wet and dry depositions. Lichens and mosses, used as bioindicators of pollution, are related to the proximity to mining areas. The further the distance from mining areas, the lower the mercury levels. This study also provides risk maps to evaluate potential negative repercussions. We conclude that the Migori-Transmara region can be considered a strongly polluted area with high mercury contents. The technology used to extract gold through amalgamation processes causes a high degree of mercury pollution around this gold mining area. Thus, alternative gold extraction methods should be considered to reduce mercury levels that can be released to the environment.

  1. Patterns Exploration on Patterns of Empirical Herbal Formula of Chinese Medicine by Association Rules.

    Science.gov (United States)

    Huang, Li; Yuan, Jiamin; Yang, Zhimin; Xu, Fuping; Huang, Chunhua

    2015-01-01

    In this study, we use association rules to explore the latent rules and patterns of prescribing and adjusting the ingredients of herbal decoctions based on empirical herbal formula of Chinese Medicine (CM). The consideration and development of CM prescriptions based on the knowledge of CM doctors are analyzed. The study contained three stages. The first stage is to identify the chief symptoms to a specific empirical herbal formula, which can serve as the key indication for herb addition and cancellation. The second stage is to conduct a case study on the empirical CM herbal formula for insomnia. Doctors will add extra ingredients or cancel some of them by CM syndrome diagnosis. The last stage of the study is to divide the observed cases into the effective group and ineffective group based on the assessed clinical effect by doctors. The patterns during the diagnosis and treatment are selected by the applied algorithm and the relations between clinical symptoms or indications and herb choosing principles will be selected by the association rules algorithm. A total of 40 patients were observed in this study: 28 patients were considered effective after treatment and the remaining 12 were ineffective. 206 patterns related to clinical indications of Chinese Medicine were checked and screened with each observed case. In the analysis of the effective group, we used the algorithm of association rules to select combinations between 28 herbal adjustment strategies of the empirical herbal formula and the 190 patterns of individual clinical manifestations. During this stage, 11 common patterns were eliminated and 5 major symptoms for insomnia remained. 12 association rules were identified which included 5 herbal adjustment strategies. The association rules method is an effective algorithm to explore the latent relations between clinical indications and herbal adjustment strategies for the study on empirical herbal formulas.

  2. Cycle mining in active database environments

    Science.gov (United States)

    Seitzer, Jennifer; Buckley, James P.

    2000-04-01

    Traditional data mining algorithms identify patterns in data that are not explicit. These patterns are denoted in the form of IF-THEN rules (IF antecedent THEN consequent), where the antecedent and consequent are logical conjunctions of propositions or first-order predicates. Generally, the mined rules apply to all time periods and specify no temporal interval between antecedent detection and consequent firing. Cycle mining algorithms identify meta-patterns of these associations depicting inferences forming cyclic chains of rule dependencies. Because traditional rules comprise these cycles, the mined cycles also apply to all time periods and do not currently possess the temporal interval of applicability. An active database is one that responds to stimuli in real time, operating in the event-condition-action (ECA) paradigm where a specific event is monitored, a condition is evaluated, and one or more actions are taken. The actions often involve real-time modification of the database. In this paper, we introduce the concepts and present algorithms for mining rules with firing intervals, and intervals of applicability. Using an active database environment, we describe a real time framework that incorporates the active database concept in order to ascertain previously undefined cycles in data over a specific time interval and thereby introduce the concept of interval of discovery. Comprised of discovered rules with firing intervals and intervals of applicability, the encompassing discovered cycles also possess a variation of these attributes. We illustrate this framework with an example from an E-commerce endeavor where data is mined for rules with firing intervals and intervals of applicability, which amalgamate to form a cycle in its interval of discovery. We describe the computer system INDED, the author's implementation of cycle mining, which we are currently interfacing to an active Oracle database using triggers and PL/SQL stored procedures.
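    The idea of rules chaining into a cycle, each with its own firing interval, can be illustrated with a minimal sketch. This is not the INDED implementation; the rule names, the interval values, and the `find_cycles` helper are all hypothetical, and the cycle's total duration is simply assumed to be the sum of the firing intervals of its rules.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: (antecedent, consequent, firing interval in seconds).
Rule = Tuple[str, str, float]

def find_cycles(rules: List[Rule]) -> List[Tuple[List[str], float]]:
    """Return each simple cycle once, with the sum of its firing intervals."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for antecedent, consequent, interval in rules:
        graph.setdefault(antecedent, []).append((consequent, interval))

    cycles: List[Tuple[List[str], float]] = []

    def walk(start: str, node: str, path: List[str], total: float) -> None:
        for nxt, interval in graph.get(node, []):
            if nxt == start:
                # The chain of rules has returned to its starting proposition.
                cycles.append((path[:], total + interval))
            elif nxt > start and nxt not in path:
                # Only visit nodes that sort after `start`, so each cycle is
                # reported once, rooted at its smallest node.
                path.append(nxt)
                walk(start, nxt, path, total + interval)
                path.pop()

    for start in sorted(graph):
        walk(start, start, [start], 0.0)
    return cycles

# Made-up e-commerce rules, for illustration only.
rules = [
    ("view_item", "add_to_cart", 30.0),
    ("add_to_cart", "checkout", 120.0),
    ("checkout", "view_item", 600.0),
    ("checkout", "leave_review", 86400.0),
]
cycles = find_cycles(rules)
for path, duration in cycles:
    print(" -> ".join(path + [path[0]]), f"({duration:.0f} s)")
```

    In this toy data the first three rules form one cycle, while the `leave_review` rule leads out of it and is ignored.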

  3. DrugQuest - a text mining workflow for drug association discovery.

    Science.gov (United States)

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

    2016-06-06

    Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways, and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views, such as clustered chemicals based on their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .

  4. MIDClass: microarray data classification by association rules and gene expression intervals.

    Science.gov (United States)

    Giugno, Rosalba; Pulvirenti, Alfredo; Cascione, Luciano; Pigola, Giuseppe; Ferro, Alfredo

    2013-01-01

    We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles exploiting the idea that the transcript expression intervals better discriminate subtypes in the same class. A wide experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.

  5. Improved Personalized Recommendation Based on Causal Association Rule and Collaborative Filtering

    Science.gov (United States)

    Lei, Wu; Qing, Fang; Zhou, Jin

    2016-01-01

    User evaluations of resources in a recommender system are usually limited, which produces an extremely sparse user rating matrix and greatly reduces the accuracy of personalized recommendation, especially for new users or new items. This paper presents a recommendation method based on rating prediction using causal association rules.…

  6. MIDClass: microarray data classification by association rules and gene expression intervals.

    Directory of Open Access Journals (Sweden)

    Rosalba Giugno

    We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles exploiting the idea that the transcript expression intervals better discriminate subtypes in the same class. A wide experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.

  7. The technology of searching the associative rules while developing the software

    Science.gov (United States)

    Savchuk, Tamara O.; Pryimak, Natalia V.; Assembay, Azat; Zyska, Tomasz; Junisbekov, Mukhtar; Annabaev, Azamat

    2017-08-01

    It is shown that there are not enough productive methods to help project managers create and choose effective strategies for organizing the software development process. Using the designed algorithm and mathematical model, it is possible to find associative rules that are informative and can help project managers form an effective software development process.

  8. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems, and to the diagnosis of brain glioma.

  9. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia

    NARCIS (Netherlands)

    Chen, X.; Lee, G.; Maher, B. S.; Fanous, A. H.; Chen, J.; Zhao, Z.; Guo, A.; van den Oord, E.; Sullivan, P. F.; Shi, J.; Levinson, D. F.; Gejman, P. V.; Sanders, A.; Duan, J.; Owen, M. J.; Craddock, N. J.; O'Donovan, M. C.; Blackman, J.; Lewis, D.; Kirov, G. K.; Qin, W.; Schwab, S.; Wildenauer, D.; Chowdari, K.; Nimgaonkar, V.; Straub, R. E.; Weinberger, D. R.; O'Neill, F. A.; Walsh, D.; Bronstein, M.; Darvasi, A.; Lencz, T.; Malhotra, A. K.; Rujescu, D.; Giegling, I.; Werge, T.; Hansen, T.; Ingason, A.; Nöethen, M. M.; Rietschel, M.; Cichon, S.; Djurovic, S.; Andreassen, O. A.; Cantor, R. M.; Ophoff, R.; Corvin, A.; Morris, D. W.; Gill, M.; Pato, C. N.; Pato, M. T.; Macedo, A.; Gurling, H. M. D.; McQuillin, A.; Pimm, J.; Hultman, C.; Lichtenstein, P.; Sklar, P.; Purcell, S. M.; Scolnick, E.; St Clair, D.; Blackwood, D. H. R.; Kendler, K. S.; Kahn, René S.; Linszen, Don H.; van Os, Jim; Wiersma, Durk; Bruggeman, Richard; Cahn, Wiepke; de Haan, Lieuwe; Krabbendam, Lydia; Myin-Germeys, Inez; O'Donovan, Michael C.; Kirov, George K.; Craddock, Nick J.; Holmans, Peter A.; Williams, Nigel M.; Georgieva, Lyudmila; Nikolov, Ivan; Norton, N.; Williams, H.; Toncheva, Draga; Milanova, Vihra; Owen, Michael J.; Hultman, Christina M.; Lichtenstein, Paul; Thelander, Emma F.; Sullivan, Patrick; Morris, Derek W.; O'Dushlaine, Colm T.; Kenny, Elaine; Quinn, Emma M.; Gill, Michael; Corvin, Aiden; McQuillin, Andrew; Choudhury, Khalid; Datta, Susmita; Pimm, Jonathan; Thirumalai, Srinivasa; Puri, Vinay; Krasucki, Robert; Lawrence, Jacob; Quested, Digby; Bass, Nicholas; Gurling, Hugh; Crombie, Caroline; Fraser, Gillian; Kuan, Soh Leh; Walker, Nicholas; St Clair, David; Blackwood, Douglas H. R.; Muir, Walter J.; McGhee, Kevin A.; Pickard, Ben; Malloy, Pat; Maclean, Alan W.; van Beck, Margaret; Wray, Naomi R.; Macgregor, Stuart; Visscher, Peter M.; Pato, Michele T.; Medeiros, Helena; Middleton, Frank; Carvalho, Celia; Morley, Christopher; Fanous, Ayman; Conti, David; Knowles, James A.; Ferreira, Carlos Paz; Macedo, Antonio; Azevedo, M. Helena; Pato, Carlos N.; Stone, Jennifer L.; Ruderfer, Douglas M.; Kirby, Andrew N.; Ferreira, Manuel A. R.; Daly, Mark J.; Purcell, Shaun M.; Sklar, Pamela; Chambert, Kimberly; Kuruvilla, Finny; Gabriel, Stacey B.; Ardlie, Kristin; Moran, Jennifer L.; Scolnick, Edward M.

    2011-01-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed

  10. A cost-effective, case-control study on the association between breast cancer and pregnancy through web mining.

    Science.gov (United States)

    Yoon, Hong-Jun; Xu, Songhua; Tourassi, Georgia

    2013-05-01

    We report a case-control, breast cancer epidemiological study through mining people stories from the Internet. The aim of the study is to test whether mining openly available, personal stories from the Internet can be a cost-effective way for reliable epidemiological discoveries. As a case study, we focus on the association between breast cancer risk and pregnancy, which is clearly established through controlled clinical survey studies. Specifically, we mined 30,000 online obituary articles. Replicating a case-control study design, our web mining based approach confirmed the general trends reported by traditional epidemiological studies. Our web mining study demonstrates promising preliminary evidence that online content mining can be a cost-effective way for epidemiological knowledge discovery.

  11. Adverse health effects in Canada geese (Branta canadensis) associated with waste from zinc and lead mines in the Tri-State Mining District (Kansas, Oklahoma, and Missouri, USA).

    Science.gov (United States)

    van der Merwe, Deon; Carpenter, James W; Nietfeld, Jerome C; Miesner, John F

    2011-07-01

    Lead and zinc poisoning have been recorded in a variety of bird species, including migrating waterfowl such as Canada Geese (Branta canadensis), at sites contaminated with mine waste from lead and zinc mines in the Tri-State Mining District, Kansas, Oklahoma, and Missouri, USA. The adverse health impacts from mine waste on these birds may, however, be more extensive than is apparent from incidental reports of clinical disease. To characterize health impacts from mine waste on Canada Geese that do not have observable signs of poisoning, four to eight apparently healthy birds per site were collected from four contaminated sites and an uncontaminated reference site, and examined for physical and physiologic evidence of metals poisoning. Tissue concentrations of silver, aluminum, arsenic, barium, cadmium, cobalt, chromium, copper, iron, magnesium, manganese, molybdenum, nickel, lead, selenium, thallium, vanadium, and zinc were determined by inductively coupled plasma mass spectroscopy. Adverse health effects due to lead were characterized by assessing blood δ-aminolevulinic acid dehydratase (ALAD) enzyme activity. Adverse effects associated with zinc poisoning were determined from histologic examination of pancreas tissues. Elevated tissue lead concentrations and inhibited blood ALAD enzyme activities were consistently found in birds at all contaminated sites. Histopathologic signs of zinc poisoning, including fibrosis and vacuolization, were associated with elevated pancreatic zinc concentrations at one of the study sites. Adverse health effects associated with other analyzed elements, or tissue concentrations indicating potentially toxic exposure levels to these elements, were not observed.

  12. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    Science.gov (United States)

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search over standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using the proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.

  13. [Study on professor Yan Zhenghua's medication regularity in treating heart diseases based on association rules and entropy cluster].

    Science.gov (United States)

    Wu, Jia-rui; Guo, Wei-xian; Zhang, Xiao-meng; Zhang, Bing; Zhang, Yue

    2015-04-01

    In this study, Professor Yan Zhenghua's recipes for treating heart diseases were collected to determine the frequency of, and association rules among, drugs by data mining methods such as the apriori algorithm and complex system entropy clustering, and to summarize Professor Yan Zhenghua's medication experience in treating heart diseases. The results indicated that frequently used drugs included Salviae Miltiorrhizae Radix et Rhizoma, Parched Ziziphi Spinosae Semen, Polygoni Multiflori Caulis, Ostreae Concha, and Poria; frequently used drug combinations included "Ostreae Concha, Draconis Os", "Polygoni Multiflori Caulis, Parched Ziziphi Spinosae Semen", and "Salviae Miltiorrhizae Radix et Rhizoma, Parched Ziziphi Spinosae Semen". The drug combinations with a confidence of 1 included "Dalbergiae Odoriferae Lignum-->Salviae Miltiorrhizae Radix et Rhizoma", "Allii Macrostemonis Bulbus-->Parched Ziziphi Spinosae Semen", "Draconis Os-->Ostreae Concha", and "Salviae Miltiorrhizae Radix et Rhizoma, Draconis Os-->Ostreae Concha". The core drug combinations included "Chrysanthemi Flos-Gastrodiae Rhizoma-Tribuli Fructus", "Dipsaci Radix-Taxillus sutchuenensis-Achyranthis Bidentatae Radix", and "Margaritifera Concha-Polygoni Multiflori Caulis-Platycladi Semen-Draconis Os".
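    A rule with a confidence of 1 means that every prescription containing the antecedent also contains the consequent. The sketch below shows how support and confidence are commonly computed for such a rule. The four prescriptions are invented for illustration and are not the study's data, and the helper names `support` and `confidence` are assumptions rather than the authors' code.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base if base else 0.0

# Invented prescriptions, for illustration only.
prescriptions = [frozenset(p) for p in [
    {"Draconis Os", "Ostreae Concha", "Poria"},
    {"Draconis Os", "Ostreae Concha"},
    {"Ostreae Concha", "Poria"},
    {"Poria"},
]]

conf_forward = confidence({"Draconis Os"}, {"Ostreae Concha"}, prescriptions)
conf_reverse = confidence({"Ostreae Concha"}, {"Draconis Os"}, prescriptions)
print(conf_forward, round(conf_reverse, 3))
```

    In this toy data the rule holds with confidence 1 in one direction only, which is why association rules are written with an arrow rather than as unordered pairs.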

  14. Chronic respiratory disease among the elderly in South Africa: any association with proximity to mine dumps?

    Science.gov (United States)

    Nkosi, Vusumuzi; Wichmann, Janine; Voyi, Kuku

    2015-04-03

    There is increasing evidence that environmental factors, such as air pollution from mine dumps, increase the risk of chronic respiratory symptoms and diseases. The aim of this study was to investigate the association between proximity to mine dumps and prevalence of chronic respiratory disease in people aged 55 years and older. Elderly persons in communities 1-2 km (exposed) and 5 km (unexposed) from five pre-selected mine dumps in Gauteng and North West Province in South Africa were included in a cross-sectional study. Structured interviews were conducted with 2397 elderly people, using a previously validated ATS-DLD-78 questionnaire from the British Medical Research Council. Exposed elderly persons had a significantly higher prevalence of chronic respiratory symptoms and diseases than those who were unexposed. Results from the multiple logistic regression analysis indicated that living close to mine dumps was significantly associated with asthma (OR = 1.57; 95% CI: 1.20 - 2.05), chronic bronchitis (OR = 1.74; 95% CI: 1.25 - 2.39), chronic cough (OR = 2.02; 95% CI: 1.58 - 2.57), emphysema (OR = 1.75; 95% CI: 1.11 - 2.77), pneumonia (OR = 1.38; 95% CI: 1.07 - 1.77) and wheeze (OR = 2.01; 95% CI: 1.73 - 2.54). Residing in exposed communities, current smoking, ex-smoking, use of paraffin as the main residential cooking/heating fuel and low level of education emerged as independent significant risk factors for chronic respiratory symptoms and diseases. This study suggests that there is a high level of chronic respiratory symptoms and diseases among elderly people in communities located near mine dumps in South Africa.
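    For readers unfamiliar with the "OR; 95% CI" notation, the sketch below computes a crude (unadjusted) odds ratio and a Wald confidence interval from a 2x2 table. This is a simplification: the study's estimates come from multiple logistic regression and are adjusted for other factors. The counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, not taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(a=60, b=140, c=40, d=160)
print(f"OR = {odds_ratio:.2f}; 95% CI: {lower:.2f} - {upper:.2f}")
```

    An interval that excludes 1, as in this toy table, is what the abstract means by a significant association.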

  15. Association Rules Analysis of Comorbidity and Multimorbidity: The Concord Health and Aging in Men Project.

    Science.gov (United States)

    Held, Fabian P; Blyth, Fiona; Gnjidic, Danijela; Hirani, Vasant; Naganathan, Vasikaran; Waite, Louise M; Seibel, Markus J; Rollo, Jennifer; Handelsman, David J; Cumming, Robert G; Le Couteur, David G

    2016-05-01

    Comorbidity and multimorbidity are common in older people. Here we used a novel analytic approach called Association Rules together with network analysis to evaluate multimorbidity (two or more disorders) and comorbidity in old age. A population-based cross-sectional study was undertaken where 17 morbidities were analyzed using network analysis, cluster analysis, and Association Rules methodology. A comorbidity interestingness score was developed to quantify the richness and variability of comorbidities associated with an index condition. The participants were community-dwelling men aged 70 years or older from the Concord Health and Ageing in Men Project, Sydney, Australia, with complete data (n = 1,464). The vast majority (75%) of participants had multimorbidity. Several morbidity clusters were apparent (vascular cluster, metabolic cluster, neurodegenerative cluster, mental health and other cluster, and a musculoskeletal and other cluster). Association Rules revealed unexpected comorbidities with high lift and confidence linked to index diseases. Anxiety and heart failure had the highest comorbidity interestingness scores while obesity, hearing impairment, and arthritis had the lowest (zero) scores. We also performed Association Rules analysis for the geriatric syndromes of frailty and falls to determine their association with multimorbidity. Frailty had a very complex and rich set of frequent and interesting comorbidities, while there were no frequent and interesting sets associated with falls. Old age is characterized by a complex pattern of multimorbidity and comorbidity. Single disease definitions do not account for the prevalence and complexity of multimorbidity in older people and a new lexicon may be needed to underpin research and health care interventions for older people. © The Author 2015. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. 

  16. Soil heavy metal contamination and health risks associated with artisanal gold mining in Tongguan, Shaanxi, China.

    Science.gov (United States)

    Xiao, Ran; Wang, Shuang; Li, Ronghua; Wang, Jim J; Zhang, Zengqiang

    2017-07-01

    Soil contamination with heavy metals due to mining activities poses risks to ecological safety and human well-being. Limited studies have investigated heavy metal pollution due to artisanal mining. The present study focused on soil contamination and the health risk in villages in China with historical artisanal mining activities. Heavy metal levels in soils, tailings, cereal and vegetable crops were analyzed and health risk assessed. Additionally, a botany investigation was conducted to identify potential plants for further phytoremediation. The results showed that soils were highly contaminated by residual tailings and previous mining activities. Hg and Cd were the main pollutants in soils. The Hg and Pb concentrations in grains and some vegetables exceeded tolerance limits. Moreover, heavy metal contents in wheat grains were higher than those in maize grains, and leafy vegetables had high concentrations of metals. Ingestion of local grain-based food was the main sources of Hg, Cd, and Pb intake. Local residents had high chronic risks due to the intake of Hg and Pb, while their carcinogenic risk associated with Cd through inhalation was low. Three plants (Erigeron canadensis L., Digitaria ciliaris (Retz.) Koel., and Solanum nigrum L.) were identified as suitable species for phytoremediation. Copyright © 2017. Published by Elsevier Inc.

  17. Integrating unified medical language system and association mining techniques into relevance feedback for biomedical literature search.

    Science.gov (United States)

    Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael

    2016-07-19

    Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.

  18. Association Rule Discovery Has the Ability to Model Complex Genetic Effects

    Science.gov (United States)

    Bush, William S.; Thornton-Wells, Tricia A.; Ritchie, Marylyn D.

    2010-01-01

    Dramatic advances in genotyping technology have established a need for fast, flexible analysis methods for genetic association studies. Common complex diseases, such as Parkinson’s disease or multiple sclerosis, are thought to involve an interplay of multiple genes working either independently or together to influence disease risk. Also, multiple underlying traits, each with its own genetic basis, may be defined together as a single disease. These effects – trait heterogeneity, locus heterogeneity, and gene-gene interactions (epistasis) – contribute to the complex architecture of common genetic diseases. Association Rule Discovery (ARD) searches for frequent itemsets to identify rule-based patterns in large scale data. In this study, we apply Apriori (an ARD algorithm) to simulated genetic data with varying degrees of complexity. Apriori using information difference to prior as a rule measure shows good power to detect functional effects in simulated cases of simple trait heterogeneity, trait heterogeneity and epistasis, and moderate power in cases of trait heterogeneity and locus heterogeneity. Also, we illustrate that bootstrapping the rule induction process does not considerably improve the power to detect these effects. These results show that ARD is a framework with sufficient flexibility to characterize complex genetic effects. PMID:20953276

  19. Cost efficiency of the non-associative flow rule simulation of an industrial component

    Science.gov (United States)

    Galdos, Lander; de Argandoña, Eneko Saenz; Mendiguren, Joseba

    2017-10-01

    In the last decade, the metal forming industry has become more and more competitive. In this context, FEM modeling has become a primary source of information for component and process design. Numerous researchers have focused on improving the accuracy of the material models implemented in FEM in order to improve the efficiency of the simulations. Aimed at increasing the efficiency of anisotropic behavior modelling, in recent years the use of non-associative flow rule models (NAFR) has been presented as an alternative to the classic associative flow rule models (AFR). In this work, the cost efficiency of the flow rule model used has been numerically analyzed by simulating an industrial drawing operation with two different models of the same degree of flexibility: one AFR model and one NAFR model. From the present study, it has been concluded that the flow rule has a negligible influence on the final drawing prediction; this is mainly driven by the model parameter identification procedure. Even though the NAFR formulation is complex when compared to the AFR, the present study shows that the total simulation time while using explicit FE solvers has been reduced without loss of accuracy. Furthermore, NAFR formulations have an advantage over AFR formulations in parameter identification because the formulation decouples the yield stress and the Lankford coefficients.

  20. LAND REBORN: TOOLS FOR THE 21ST CENTURY/NATIONAL ASSOCIATION OF ABANDONED MINE LAND PROGRAMS

    Science.gov (United States)

    Mining activities in the US (not counting coal) produce 1-2 billion tons of mine waste annually. Since many of the ore mines involve sulfide minerals, the production of acid mine drainage (AMD) is a common problem from these abandoned mine sites. The combination of acidity, heavy...

  1. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

    Science.gov (United States)

    Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

    2015-06-06

    Advances in next generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse level analysis component of MutD contributed a gain of more than 10% in F-measure when compared against sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. 
The analysis also demonstrates that incorporating

  2. Time-to-signal comparison for drug safety data-mining algorithms vs. traditional signaling criteria.

    Science.gov (United States)

    Hochberg, A M; Hauben, M

    2009-06-01

    Data mining may improve identification of signals, but its incremental utility is in question. The objective of this study was to compare associations highlighted by data mining vs. those highlighted through the use of traditional decision rules. In the case of 29 drugs, we used US Food and Drug Administration (FDA) Adverse Event Reporting System (AERS) data to compare three data-mining algorithms (DMAs) with two traditional decision rules: (i) N ≥ 3 reports for a designated medical event (DME) and (ii) any event comprising >2% of reports in relation to a drug. Data-mining methods produced 101-324 signals vs. 1,051 for the N ≥ 3 rule but yielded a higher proportion of signals having publication support. For the 2% rule, the fraction of signals having publication support was similar to that associated with data mining. Data-mining signals lagged N ≥ 3 signaling by 1.5-11.0 months. It may therefore be concluded that data mining identifies fewer signals than the "N ≥ 3 DME" rule. The signals appear later with data mining but are more often supported by publications. In the case of the 2% rule, no such difference in publication support was observed.
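    The two traditional decision rules described above (at least three reports of a designated medical event, and any event making up more than 2% of a drug's reports) are simple enough to sketch directly. The report counts, the event names, and the function `traditional_signals` below are hypothetical and are not drawn from AERS or from the authors' code.

```python
def traditional_signals(reports, dme_events, min_dme_reports=3, min_fraction=0.02):
    """Apply the two decision rules to one drug's event counts.

    reports: mapping of event name to number of reports for this drug
    dme_events: set of event names treated as designated medical events
    """
    total = sum(reports.values())
    signals = {}
    for event, count in reports.items():
        reasons = []
        # Rule (i): at least `min_dme_reports` reports of a designated medical event.
        if event in dme_events and count >= min_dme_reports:
            reasons.append("N>=3 DME")
        # Rule (ii): the event makes up more than `min_fraction` of all reports.
        if total and count / total > min_fraction:
            reasons.append(">2% of reports")
        if reasons:
            signals[event] = reasons
    return signals

# Invented counts for a single drug, for illustration only.
reports = {
    "nausea": 150,
    "headache": 40,
    "agranulocytosis": 3,
    "aplastic anaemia": 2,
    "rash": 5,
}
dme_events = {"agranulocytosis", "aplastic anaemia"}
signals = traditional_signals(reports, dme_events)
for event, reasons in signals.items():
    print(event, reasons)
```

    Both thresholds are left as parameters so that other cut-offs can be tried on the same counts.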

  3. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks.

    Science.gov (United States)

    Cao, Renzhi; Cheng, Jianlin

    2016-01-15

    Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein-protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information such as spatial gene-gene interaction networks generated from chromosomal conformation data together to improve protein function prediction. In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein-protein interaction and spatial gene-gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein-protein interaction and spatial gene-gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three base-line methods and an advanced method based on protein profile-sequence comparison, profile-profile comparison, and domain co-occurrence networks according to the maximum F-measure. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. A Case Investigation of Product Structure Complexity in Mass Customization Using a Data Mining Approach

    DEFF Research Database (Denmark)

    Nielsen, Peter; Brunø, Thomas Ditlev; Nielsen, Kjeld

    2014-01-01

    This paper presents a data mining method for analyzing historical configuration data providing a number of opportunities for improving mass customization capabilities. The overall objective of this paper is to investigate how specific quantitative analyses, more specifically the association rule...

  5. 26 CFR 1.1382-7 - Special rules applicable to cooperative associations exempt from tax before January 1, 1952.

    Science.gov (United States)

    2010-04-01

    ... and Their Patrons § 1.1382-7 Special rules applicable to cooperative associations exempt from tax... 26 Internal Revenue 11 2010-04-01 2010-04-01 true Special rules applicable to cooperative associations exempt from tax before January 1, 1952. 1.1382-7 Section 1.1382-7 Internal Revenue INTERNAL...

  6. Characterization of geochemical alteration halo associated with gold mineralization at the Buzwagi mine, northern Tanzania

    Science.gov (United States)

    Manya, Shukrani

    2017-05-01

    An alteration halo geochemical study was carried out along one transect at the Buzwagi mine, which is found in the Neoarchaean Nzega greenstone belt of northern Tanzania. The Buzwagi mine Au mineralization is hosted in quartz veins that cross-cut strongly sheared and hydrothermally altered K-granites. Mineralogical studies within the shear zone reveal that sericite, silica and sulphides are the most important hydrothermal mineral assemblages responsible for Au mineralization at the Buzwagi mine. The geochemical alteration halo is characterized by the addition of Au, Cu, Fe, K, Rb, Sn, W and U to wall rocks and simultaneous removal of Na, Sr, Ba, LREE and MREE from the host rocks. The concentrations of Cu (130-870 ppm), which show a strong positive correlation with Au (R² = 0.99), are high enough in the alteration halo to indicate that Cu is a strong Au pathfinder at the Buzwagi mine. Owing to their immobility during the post-emplacement processes, the HFSE (Zr, Hf, Th, Ta) remained unchanged during the hydrothermal alteration process. The addition of Fe and Cu is attributed to the presence of Fe- and Cu-sulphides (pyrite, chalcopyrite and chalcocite), whereas the addition of K, Rb, Sn, W and U is a function of both primary concentrations of these elements in the host rocks and the subsequent strong hydrothermal alteration evidenced by sericitization and silicification (which involved the destruction of feldspars into sericites). The destruction of albite and its replacement by sericite accounts for the depletion of Na, Sr (and Ba). The Buzwagi mine Au mineralization mineral association does not include the better-known pathfinders such as Ag, As, Sb, Bi, Te and Tl, which seem not to have played a role in the mineralization process. These elements, therefore, should not be considered as pathfinders for Au exploration purposes at a Buzwagi-like deposit.

  7. Association rules in computing the use of books in a university library

    Directory of Open Access Journals (Sweden)

    María Alejandra Malberti Riveros

    2015-06-01

    Full Text Available (Received: 2015/03/18 - Accepted: 2015/05/29) This work recreates a proposal to evaluate the usage of different book categories in a university library. The model employs one mechanism to compute usage statistics and another to discover association rules from the usage data stored in the library system. Usage statistics are computed based on the degree of importance, or relevance, with respect to an area of knowledge, and association rules provide support to determine the final use of the various categories. In the process we take into account that the stored data correspond to books requested on loan, renewal or consultation. The study presents knowledge discovery in data, aiming to enhance the management of a university library.

  8. [Relevant pathogenesis of heat and phlegm in infantile viral pneumonia: an analysis by association rules].

    Science.gov (United States)

    Al, Jun; Wang, Shou-chuan; Dai, Ming; Chen, Sheng; Yi, Zhan-xiang; Dai, Qi-gang; Xu, Shan

    2013-11-01

    To study the application of association rules in Chinese medical pathogeneses and pathologies of heat and phlegm in infantile viral pneumonia. Association rules were applied to analyze dynamic changes of heat and phlegm correlated symptoms and signs in 297 infants with respiratory syncytial virus (RSV) pneumonia, thus understanding its evolution or pathogenesis. Heat and phlegm co-exist in infantile viral pneumonia. In their relationship, heat was more likely to affect phlegm, but phlegm was less likely to affect heat. Under the intervention of drugs, the possibility of heat induced by phlegm was gradually reduced. But the possibility of phlegm induced by heat was not obvious as time went by. Heat and phlegm have a close relationship in the pathogenesis of infantile viral pneumonia. The intervention of drugs could reduce the pathologic evolution of phlegm causing heat. However, it has little effect on the pathologic evolution of heat causing phlegm.

  9. IMPLEMENTASI ALGORITMA FP-GROWTH MENGGUNAKAN ASSOCIATION RULE PADA MARKET BASKET ANALYSIS

    Directory of Open Access Journals (Sweden)

    Fitriyani Fitriyani

    2016-03-01

    Full Text Available Abstract - Large data sets can be processed into useful information or knowledge; one example is data on consumers' purchase transactions. Processing large volumes of data takes a long time, however, so such data require an appropriate method. Apriori is the method most often used for transaction data, but it is poorly suited to large data sets because it scans the database repeatedly (candidate set generation). This study uses the FP-Growth method to determine frequent itemsets with the FP-Tree structure, and association rules to determine support and confidence in the transaction data, so that the relationships between an item and the other items frequently purchased with it by consumers can be identified. Keywords: Apriori, FP-Growth, Association Rule, Transaction, Frequent Itemset.
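
    The support and confidence measures mentioned in this abstract can be illustrated with a small sketch. It does not implement FP-Growth or the FP-Tree; it only counts itemsets directly, and the basket contents and function names are hypothetical rather than taken from the paper.

```python
# Hypothetical market-basket transactions; illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction
    that also contain `consequent` (rule: antecedent -> consequent)."""
    antecedent_count = count(antecedent, transactions)
    if antecedent_count == 0:
        return 0.0  # rule is undefined without any supporting transaction
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / antecedent_count


if __name__ == "__main__":
    print(support({"bread", "butter"}, transactions))
    print(confidence({"bread"}, {"butter"}, transactions))
    print(confidence({"butter"}, {"bread"}, transactions))
```

    Confidence is directional: in this toy data every basket with butter also contains bread, while only three of the four bread baskets contain butter.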

  10. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

    Science.gov (United States)

    Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin

    2017-07-03

    A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Platinum and associated elements at the New Rambler mine and vicinity, Albany and Carbon Counties, Wyoming

    Science.gov (United States)

    Theobald, P.K.; Thompson, Charles Emmet

    1968-01-01

    Platinum-group metals in the Medicine Bow Mountains were first identified by W. C. Knight in 1901. In the Medicine Bow Mountains, these metals are commonly associated with copper, silver, or gold in shear zones that cut a series of mafic igneous and metamorphic rocks. At the New Rambler mine, where the initial discovery was made, about 50,000 tons of mine and mill waste contain an average of 0.3 percent copper, 7 ppm (parts per million) silver, 1 ppm platinum plus palladium, and 0.7 ppm gold. This material is believed to be from a low-grade envelope around the high-grade pod of complex ore that was mined selectively in the old workings. Soil samples in the vicinity of the New Rambler mine exhibit a wide range of content of several elements associated with the ore. Most of the variation can be attributed to contamination, from the mine workings. Even though soil samples identify a low-level copper anomaly that persists to the limit of the area sampled, soils do not offer a promising medium for tracing mineralization owing to the blanket of transported overburden. Stream sediments, if preconcentrated for analysis, do reveal anomalies not only in the contaminated stream below the New Rambler mine, but in adjacent drainage and on Dave Creek. Examination of a spectrum of elements in heavy-mineral concentrates from stream sediment may contribute to knowledge of the nature of the mineralization and of the basic geology of the environment. The sampling of bedrock exposures is not particularly fruitful because outcrops are sparse and the exposed rocks are the least altered and mineralized. Bedrock sampling does, however, provide information on the large size and provincial nature of the platinum-rich area. We feel that a properly integrated program of geological, geophysical, and geochemical exploration in the Medicine Bow Mountains and probably in the Sierra Madre to the west has a reasonable probability of successfully locating a complex ore body.

  12. Associations of Parental Rules and Socioeconomic Position With Preschool Children's Sedentary Behaviour and Screen Time.

    Science.gov (United States)

    Downing, Katherine L; Hinkley, Trina; Hesketh, Kylie D

    2015-04-01

    There is little current understanding of the influences on sedentary behavior and screen time in preschool children. This study investigated socioeconomic position (SEP) and parental rules as potential correlates of preschool children's sedentary behavior and screen time. Data from the Healthy Active Preschool Years (HAPPY) Study were used. Participating parents reported their child's usual weekly screen time and their rules to regulate their child's screen time. Children wore accelerometers for 8 days to objectively measure sedentary time. Children whose parents limited television viewing spent significantly less time in that behavior and in total screen time; however, overall sedentary behavior was unaffected. An association between parents limiting computer/electronic game use and time spent on the computer was found for girls only. SEP was inversely associated with girls', but not boys', total screen time and television viewing. As parental rules were generally associated with lower levels of screen time, intervention strategies could potentially encourage parents to set limits on, and switch off, screen devices. Intervention strategies should target preschool children across all SEP areas, as there was no difference by SEP in overall sedentary behavior or screen time for boys.

  13. Children's bicycle helmet attitudes and use. Association with parental rules. The Pediatric Practice Research Group.

    Science.gov (United States)

    Miller, P A; Binns, H J; Christoffel, K K

    1996-12-01

    Previous studies have assessed the attitudes of parents and children toward bicycle helmet ownership and use in various settings, but they have not addressed the role of parental rules in promoting bicycle helmet use by children. To further explore the attitudes of parents and children at pediatric practices toward bicycle helmet ownership and use by children and to assess the role of parental rules in promoting bicycle helmet use by children. One hundred sixty-nine 5- to 14-year-old children who owned bicycles and their parents were surveyed during well-child visits at 5 general pediatric practices in the Chicago, Ill, area. One hundred twenty-nine families were represented. Of the children, 60% were aged 5 to 9 years, and 50% were girls. Forty-eight children (28%) reported helmet ownership. Of the helmet owners, 21 (45%) reported helmet use; thus, the overall percentage of helmet use was 12%. Helmet ownership by children was significantly (P parental characteristics: educational level, race, perceived effectiveness of bicycle helmets, seat belt use, and parental helmet ownership. The most common reasons parents gave for lack of helmet ownership by children were "never thought about purchasing" a helmet (35%), "never got around to purchasing" a helmet (29%), "child wouldn't wear it anyway" (26%), and the bicycle helmet was "too expensive" (16%). Only 33% of the parents reported hearing about helmets from their children's pediatrician, but 40% of these parents regarded pediatricians as their most important information source. Of the children who did not own helmets, 64% said they would wear a bicycle helmet if they had one, a more frequent comment for 5- to 9-year-old children than 10- to 14-year-old children (76% vs 49%, P parents had a strict rule about wearing helmets were more likely to always wear their helmets than helmet owners whose parents had a partial rule or no rule (88% vs 19%, P Parental rules are associated with bicycle helmet use by children

  14. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    Science.gov (United States)

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

    Gathering information about associations between methylated genes and diseases is important for diseases diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. DDMGD: the database of text-mined associations between genes methylated in diseases from different species

    KAUST Repository

    Raies, A. B.

    2014-11-14

    Gathering information about associations between methylated genes and diseases is important for diseases diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.

  16. Efficient selection of association rules from lymphedema symptoms data using a graph structure.

    Science.gov (United States)

    Xu, Shuyu; Shyu, Chi-Ren

    2010-11-13

    Secondary lymphedema (LE) is a chronic progressive disease often caused by cancer treatment, especially in patients who require surgical removal of or radiation to lymph nodes. While LE is incurable, it can be managed successfully with early detection and appropriate treatment. Detection and prediction of LE is difficult due to the absence of a "gold standard" for diagnosis. Despite this, management of the disease is accomplished through adherence to a set of guidelines developed by experts in the field. Unfortunately, not all the recommendations in such a document are supported by clear research evidence, and most of them are only based on expert judgment with limited evidence. This paper focuses on developing a new algorithm to extract specific association rules from LE survey data and efficiently index the rules for easy knowledge retrieval, with the ultimate goal of discovering evidence-based and relevant knowledge for inclusion into the best practice document (BP) for the LE community.

  17. Research of the Occupational Psychological Impact Factors Based on the Frequent Item Mining of the Transactional Database

    Directory of Open Access Journals (Sweden)

    Cheng Dongmei

    2015-01-01

    Full Text Available Based on an extensive review of the data mining and association rule mining literature, this paper starts from compressing the transactional database and proposes a frequent complementary item storage structure for it. Building on that analysis, the paper then studies an association rule mining algorithm based on this storage structure. Finally, the algorithm is applied in the test results analysis module of a team psychological health assessment system to extract the relationships between psychological impact factors, so as to provide guidance for psychologists in treating mental illness.

  18. Gaseous Oxidized Mercury Flux from Substrates Associated with Industrial Scale Gold Mining in Nevada, USA

    Science.gov (United States)

    Miller, M. B.

    2015-12-01

    Gaseous elemental and oxidized mercury (Hg) fluxes were measured in a laboratory setting from substrate materials derived from industrial-scale open pit gold mining operations in Nevada, USA. Mercury is present in these substrates at a range of concentrations (10 - 40000 ng g-1), predominantly of local geogenic origin in association with the mineralized gold ores, but altered and redistributed to a varying degree by subsequent ore extraction and processing operations, including deposition of Hg recently emitted to the atmosphere from large point sources on the mines. Waste rock, heap leach, and tailings material usually comprise the most extensive and Hg emission relevant substrate surfaces. All three of these material types were collected from active Nevada mine sites in 2010 for previous research, and have since been stored undisturbed at the University of Nevada, Reno. Gaseous elemental Hg (GEM) flux was previously measured from these materials under a variety of conditions, and was re-measured in this study, using Teflon® flux chambers and Tekran® 2537A automated ambient air analyzers. GEM flux from dry undisturbed materials was comparable between the two measurement periods. Gaseous oxidized Hg (GOM) flux from these materials was quantified using an active filter sampling method that consisted of polysulfone cation-exchange membranes deployed in conjunction with the GEM flux apparatus. Initial measurements conducted within greenhouse laboratory space indicate that in dry conditions GOM is deposited to relatively low Hg cap and leach materials, but may be emitted from the much higher Hg concentration tailings material.

  19. Analysis of time losses associated with maintenance and repair of mine hoisting systems

    Energy Technology Data Exchange (ETDEWEB)

    Krichevskii, V.L.; Datsun, Z.V.

    1984-01-01

    This article discusses regulations on maintenance and repair of hoists and other elements of hoisting systems in coal mines of the Ukrainian SSR. According to regulations, maintenance of hoisting equipment should take from 15 to 20 min per shift. In reality maintenance takes not less than 30 min per shift. Factors which influence time losses associated with maintenance and repair of hoisting systems are analyzed: type of hoists, number of hoisting ropes, hoisting depth, use of a cage or a skip, number of hoisting systems in a mine shaft. According to the TO-2 technical regulations on maintenance of hoisting systems in coal mines, not less than 35% of working time should be spent on maintenance and repair. Reducing maintenance and repair causes economic losses. In 1978 the number of working days in the Donbass was increased. This caused deterioration of maintenance and repair of hoisting systems. In 1981 the TsNIEhIugol' research institute analyzed economic effects of increasing working time and reducing maintenance time of hoisting equipment. The increase in working time caused economic losses.

  20. Coseismic and aseismic deformations associated with mining-induced seismic events located in deep level mines in South Africa

    CSIR Research Space (South Africa)

    Milev, A

    2013-10-01

    Full Text Available Two underground sites in a deep level gold mine in South Africa were instrumented by the Council for Scientific and Industrial Research (CSIR) with tilt meters and seismic monitors. One of the sites was also instrumented by Japanese...

  1. [Analysis of pathogens of pneumonia in children based on association rules].

    Science.gov (United States)

    Mao, Xiaojian; Wang, Heyong; An, Dong

    2012-12-01

    This paper aimed to study the relationship between the clinical features of pneumonia and its pathogens in children by applying association rules to the clinical data of 6 300 cases of pneumonia. Software analysis yielded different association relationships between clinical features of pneumonia in children, such as gender, age and region, and the pathogens of pneumonia. For example, children of different sex with the same pathogen showed different association relationships. Because these association relationships differ among children in the Guangzhou area, different methods of prevention and treatment of children's pneumonia should be adopted according to actual conditions in order to achieve the best results.

  2. DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

    Science.gov (United States)

    Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

    2016-01-01

    The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.

  3. 9 CFR 201.4 - Bylaws, rules and regulations, and requirements of exchanges, associations, or other...

    Science.gov (United States)

    2010-01-01

    ... prevent the legitimate application or enforcement of any valid bylaw, rule or regulation, or requirement... 9 Animals and Animal Products 2 2010-01-01 2010-01-01 false Bylaws, rules and regulations, and... Applicability of Industry Rules § 201.4 Bylaws, rules and regulations, and requirements of exchanges...

  4. Radio-Ecological Situation in the Area of the Priargun Production Mining and Chemical Association - 13522

    Energy Technology Data Exchange (ETDEWEB)

    Semenova, M.P.; Seregin, V.A.; Kiselev, S.M.; Titov, A.V. [FSBI SRC A.I. Burnasyan Federal Medical Biophysical Center of FMBA of Russia, Zhivopisnaya Street, 46, Moscow (Russian Federation); Zhuravleva, L.A. [FSHE ' Centre of Hygiene and Epidemiology no. 107' under FMBA of Russia (Russian Federation); Marenny, A.M. [Ltd ' Radiation and Environmental Researches' (Russian Federation)

    2013-07-01

    'The Priargun Production Mining and Chemical Association' (hereinafter referred to as PPMCA) is a diversified mining company which, in addition to underground mining of uranium ore, carries out refining of such ores in hydrometallurgical process to produce natural uranium oxide. The PPMCA facilities are sources of radiation and chemical contamination of the environment in the areas of their location. In order to establish the strategy and develop criteria for the site remediation, independent radiation hygienic monitoring is being carried out over some years. In particular, this monitoring includes determination of concentration of the main dose-forming nuclides in the environmental media. The subjects of research include: soil, grass and local foodstuff (milk and potato), as well as media of open ponds (water, bottom sediments, water vegetation). We also measured the radon activity concentration inside surface workshops and auxiliaries. We determined the specific activity of the following natural radionuclides: U-238, Th-232, K-40, Ra-226. The researches performed showed that in soil, vegetation, groundwater and local foods sampled in the vicinity of the uranium mines, there is a significant excess of Ra-226 and Th-232 content compared to areas outside the zone of influence of uranium mining. The ecological and hygienic situation is as follows: - at health protection zone (HPZ) gamma dose rate outdoors varies within 0.11 to 5.4 μSv/h (the mean value in the reference (background) settlement (Soktui-Molozan village) is 0.14 μSv/h); - gamma dose rate in workshops within HPZ varies over the range 0.14 - 4.3 μSv/h; - the specific activity of natural radionuclides in soil at HPZ reaches 12800 Bq/kg and 510 Bq/kg for Ra-226 and Th-232, respectively; - beyond HPZ the elevated values for Ra-226 have been registered near Lantsovo Lake - 430 Bq/kg; - the radon activity concentration in workshops within HPZ varies over the range 22 - 10800 Bq

  5. The association between indoor smoke-free home rules and the use of cigar and smokeless tobacco: A longitudinal study.

    Science.gov (United States)

    Zhang, Xiao

    2017-11-01

    The existence of an indoor smoke-free home rule is associated with lower use of cigar and smokeless tobacco. This study aims to use a longitudinal sample to examine the association between smoke-free home rules and the cessation and uptake of these two types of tobacco products. The Tobacco Use Supplement of the Current Population Survey surveyed 28,153 adults in May 2010 and then followed them up 12 months later. Data from these two surveys and multiple logistic regressions were used to examine the association between smoke-free home rule status over time and the use of cigar and smokeless tobacco. Among respondents who used cigar in 2010, having an indoor smoke-free home rule consistently (AOR=2.41, 95% CI=1.52-3.83) and adopting one during the 12-month period (AOR=1.92, 95% CI=1.01-3.68) increased the likelihood of not using cigar in 2011, compared to not having or forgoing a home rule over time. Among adults who had never used cigar by 2010, those having a rule consistently (AOR=0.47, 95% CI=0.38-0.71) were less likely to initiate cigar use. Having a smoke-free home rule consistently was also associated with a lower likelihood of starting to use smokeless tobacco (AOR=0.52, 95% CI=0.35-0.78). Nevertheless, there is no evidence indicating that the adoption of a rule is correlated with the cessation of smokeless tobacco. The establishment of indoor smoke-free home rules may help reduce cigar use and prevent the uptake of cigar and smokeless tobacco. Such findings call for research using experimental design to further examine the impact of home rules on the use of cigar and smokeless tobacco. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Associations between parental rules, style of communication and children's screen time.

    Science.gov (United States)

    Bjelland, Mona; Soenens, Bart; Bere, Elling; Kovács, Éva; Lien, Nanna; Maes, Lea; Manios, Yannis; Moschonis, George; te Velde, Saskia J

    2015-10-01

    Research suggests an inverse association between parental rules and screen time in pre-adolescents, and that parents' style of communication with their children is related to the children's time spent watching TV. The aims of this study were to examine associations of parental rules and parental style of communication with children's screen time and perceived excessive screen time in five European countries. UP4FUN was a multi-centre, cluster randomised controlled trial with pre- and post-test measurements in each of five countries: Belgium, Germany, Greece, Hungary and Norway. Questionnaires were completed by the children at school and the parent questionnaire was brought home. Three structural equation models were tested based on measures of screen time and parental style of communication from the pre-test questionnaires. Of the 152 schools invited, 62 (41 %) schools agreed to participate. In total 3325 children (average age 11.2 years and 51 % girls) and 3038 parents (81 % mothers) completed the pre-test questionnaire. The average TV/DVD times across the countries were between 1.5 and 1.8 h/day, while less time was spent on computer/games console use (0.9-1.4 h/day). The children's perceived parental style of communication was quite consistent for TV/DVD and computer/games console. The presence of rules was significantly associated with less time watching TV/DVD and less computer/games console time. Moreover, the use of an autonomy-supportive style was negatively related to both time watching TV/DVD and computer/games console time. The use of a controlling style was related positively to perceived excessive time spent on TV/DVD and on computer/games console. With a few exceptions, results were similar across the five countries. This study suggests that an autonomy-supportive style of communicating rules for TV/DVD or computer/games console use is negatively related to children's time watching TV/DVD and computer/games console time.

  7. Could parental rules play a role in the association between short sleep and obesity in young children?

    Science.gov (United States)

    Jones, Caroline H D; Pollard, Tessa M; Summerbell, Carolyn D; Ball, Helen

    2014-05-01

    Short sleep duration is associated with obesity in young children. This study develops the hypothesis that parental rules play a role in this association. Participants were 3-year-old children and their parents, recruited at nursery schools in socioeconomically deprived and non-deprived areas of a North-East England town. Parents were interviewed to assess their use of sleep, television-viewing and dietary rules, and given diaries to document their child's sleep for 4 days/5 nights. Children were measured for height, weight, waist circumference and triceps and subscapular skinfold thicknesses. One-hundred and eight families participated (84 with complete sleep data and 96 with complete body composition data). Parental rules were significantly associated together, were associated with longer night-time sleep and were more prevalent in the non-deprived-area compared with the deprived-area group. Television-viewing and dietary rules were associated with leaner body composition. Parental rules may in part confound the association between night-time sleep duration and obesity in young children, as rules cluster together across behavioural domains and are associated with both sleep duration and body composition. This hypothesis should be tested rigorously in large representative samples.

  8. Ecological and human health risks associated with abandoned gold mine tailings contaminated soil.

    Directory of Open Access Journals (Sweden)

    Veronica Mpode Ngole-Jeme

    Gold mining is a major source of metal and metalloid emissions into the environment. Studies were carried out in Krugersdorp, South Africa, to evaluate the ecological and human health risks associated with exposure to metals and metalloids in mine tailings contaminated soils. Concentrations of arsenic (As), cadmium (Cd), chromium (Cr), cobalt (Co), copper (Cu), lead (Pb), manganese (Mn), nickel (Ni), and zinc (Zn) in soil samples from the area varied with the highest contamination factors (expressed as ratio of metal or metalloid concentration in the tailings contaminated soil to that of the control site) observed for As (3.5×10^2), Co (2.8×10^2) and Ni (1.1×10^2). Potential ecological risk index values for metals and metalloids determined from soil metal and metalloid concentrations and their respective risk factors were correspondingly highest for As (3.5×10^3) and Co (1.4×10^3), whereas Mn (0.6) presented the lowest ecological risk. Human health risk was assessed using Hazard Quotient (HQ), Chronic Hazard Index (CHI) and carcinogenic risk levels, where values of HQ > 1, CHI > 1 and carcinogenic risk values > 1×10^-4 represent elevated risks. Values for HQ indicated high exposure-related risk for As (53.7), Cr (14.8), Ni (2.2), Zn (2.64) and Mn (1.67). Children were more at risk from heavy metal and metalloid exposure than adults. Cancer-related risks associated with metal and metalloid exposure among children were also higher than in adults with cancer risk values of 3×10^-2 and 4×10^-2 for As and Ni respectively among children, and 5×10^-3 and 4×10^-3 for As and Ni respectively among adults. There is significant potential ecological and human health risk associated with metal and metalloid exposure from contaminated soils around gold mine tailings dumps. This could be a potential contributing factor to a setback in the health of residents in informal settlements dominating this mining area as the immune systems of some of these residents are already compromised by high

  9. Ecological and human health risks associated with abandoned gold mine tailings contaminated soil

    Science.gov (United States)

    Ngole-Jeme, Veronica Mpode; Fantke, Peter

    2017-01-01

    Gold mining is a major source of metal and metalloid emissions into the environment. Studies were carried out in Krugersdorp, South Africa, to evaluate the ecological and human health risks associated with exposure to metals and metalloids in mine tailings contaminated soils. Concentrations of arsenic (As), cadmium (Cd), chromium (Cr), cobalt (Co), copper (Cu), lead (Pb), manganese (Mn), nickel (Ni), and zinc (Zn) in soil samples from the area varied with the highest contamination factors (expressed as ratio of metal or metalloid concentration in the tailings contaminated soil to that of the control site) observed for As (3.5×10^2), Co (2.8×10^2) and Ni (1.1×10^2). Potential ecological risk index values for metals and metalloids determined from soil metal and metalloid concentrations and their respective risk factors were correspondingly highest for As (3.5×10^3) and Co (1.4×10^3), whereas Mn (0.6) presented the lowest ecological risk. Human health risk was assessed using Hazard Quotient (HQ), Chronic Hazard Index (CHI) and carcinogenic risk levels, where values of HQ > 1, CHI > 1 and carcinogenic risk values > 1×10^-4 represent elevated risks. Values for HQ indicated high exposure-related risk for As (53.7), Cr (14.8), Ni (2.2), Zn (2.64) and Mn (1.67). Children were more at risk from heavy metal and metalloid exposure than adults. Cancer-related risks associated with metal and metalloid exposure among children were also higher than in adults with cancer risk values of 3×10^-2 and 4×10^-2 for As and Ni respectively among children, and 5×10^-3 and 4×10^-3 for As and Ni respectively among adults. There is significant potential ecological and human health risk associated with metal and metalloid exposure from contaminated soils around gold mine tailings dumps. This could be a potential contributing factor to a setback in the health of residents in informal settlements dominating this mining area as the immune systems of some of these residents are already
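
    The two screening quantities described in this abstract can be illustrated with a minimal Python sketch. The contamination factor follows the ratio definition given in the abstract, and the HQ values are the ones it reports; the function names and the example concentrations are hypothetical and are not taken from the study.

```python
def contamination_factor(c_sample, c_control):
    """Ratio of the concentration in contaminated soil to that at the control site."""
    if c_control <= 0:
        raise ValueError("control concentration must be positive")
    return c_sample / c_control


def elevated_risk(hq_by_element, threshold=1.0):
    """Return the elements whose Hazard Quotient exceeds the threshold (HQ > 1)."""
    return sorted(el for el, hq in hq_by_element.items() if hq > threshold)


# HQ values as reported in the abstract.
reported_hq = {"As": 53.7, "Cr": 14.8, "Ni": 2.2, "Zn": 2.64, "Mn": 1.67}

# Hypothetical concentrations (mg/kg), chosen only to illustrate the ratio.
example_cf = contamination_factor(c_sample=700.0, c_control=2.0)

print(elevated_risk(reported_hq))
print(example_cf)
```

    With the reported values, every listed element exceeds the HQ > 1 threshold, which matches the abstract's statement that all five indicate high exposure-related risk.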

  10. Ecological and human health risks associated with abandoned gold mine tailings contaminated soil.

    Science.gov (United States)

    Ngole-Jeme, Veronica Mpode; Fantke, Peter

    2017-01-01

    Gold mining is a major source of metal and metalloid emissions into the environment. Studies were carried out in Krugersdorp, South Africa, to evaluate the ecological and human health risks associated with exposure to metals and metalloids in mine tailings contaminated soils. Concentrations of arsenic (As), cadmium (Cd), chromium (Cr), cobalt (Co), copper (Cu), lead (Pb), manganese (Mn), nickel (Ni), and zinc (Zn) in soil samples from the area varied with the highest contamination factors (expressed as ratio of metal or metalloid concentration in the tailings contaminated soil to that of the control site) observed for As (3.5×10^2), Co (2.8×10^2) and Ni (1.1×10^2). Potential ecological risk index values for metals and metalloids determined from soil metal and metalloid concentrations and their respective risk factors were correspondingly highest for As (3.5×10^3) and Co (1.4×10^3), whereas Mn (0.6) presented the lowest ecological risk. Human health risk was assessed using Hazard Quotient (HQ), Chronic Hazard Index (CHI) and carcinogenic risk levels, where values of HQ > 1, CHI > 1 and carcinogenic risk values > 1×10^-4 represent elevated risks. Values for HQ indicated high exposure-related risk for As (53.7), Cr (14.8), Ni (2.2), Zn (2.64) and Mn (1.67). Children were more at risk from heavy metal and metalloid exposure than adults. Cancer-related risks associated with metal and metalloid exposure among children were also higher than in adults with cancer risk values of 3×10^-2 and 4×10^-2 for As and Ni respectively among children, and 5×10^-3 and 4×10^-3 for As and Ni respectively among adults. There is significant potential ecological and human health risk associated with metal and metalloid exposure from contaminated soils around gold mine tailings dumps. This could be a potential contributing factor to a setback in the health of residents in informal settlements dominating this mining area as the immune systems of some of these residents are already compromised by

  11. HIV preventive behavior and associated factors among mining workers in Sali traditional gold mining site Bench Maji zone, Southwest Ethiopia: a cross sectional study.

    Science.gov (United States)

    Abdissa, Hordofa Gutema; Lemu, Yohannes Kebede; Nigussie, Dejene Tilahun

    2014-09-26

    The prevalence of HIV and other STIs is high among migrant mining workers due to factors such as dangerous working conditions, the presence of only masculine identities, living away from families, and desolate, inhospitable workplaces. This makes them a known HIV- and STI-vulnerable group in different parts of the world. In Ethiopia, however, they have not yet been regarded as an at-risk group. The aim of this study was therefore to assess the magnitude of HIV preventive behaviours and associated factors among gold miners at the Sali traditional gold mining site. A cross-sectional study was conducted to assess the HIV preventive behavior of the mining workers. The data were collected using an interviewer-administered structured questionnaire adapted from other related behavioural studies. The data were entered using EpiData version 3.1 and analyzed using SPSS version 17. Multiple logistic regression was used to assess the relationship of HIV preventive behavior with constructs of the health belief model. A total of 393 respondents participated, a response rate of 93.12%. All of the study participants were male, 393 (100%), and the mean age of the participants was 24.0 (± 5.13 SD). Less than half of the respondents, 187 (47.6%), were engaged in HIV preventive behavior. Less than half (45.3%) of them had high perceived susceptibility to HIV/AIDS; the majority (62.8%) had high perceived severity of HIV/AIDS. HIV preventive behavior was negatively associated with being in the middle, higher and highest income groups [OR = 0.54, 95% CI: 0.21, 0.74], [OR = 0.40, 95% CI: 0.30, 0.98] and [OR = 0.39, 95% CI: 0.20, 0.77] respectively, and positively associated with completing secondary school, completing tertiary school and self-efficacy [OR = 2.66, 95% CI: 1.11, 6.41], [OR = 5.40, 95% CI: 1.54, 19] and [OR = 1.88, 95% CI: 1.18, 2.94] respectively. The HIV preventive behavior of the mining workers was low. Engagement in sexual intercourse with only one sexual partner was very low, and consistent condom use among these mining workers was low. Income, educational status

  12. Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level

    DEFF Research Database (Denmark)

    Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene

    2014-01-01

    , lipids and nutrients. In this work, we applied text mining and Naïve Bayes classification to assemble the knowledge space of food-phytochemical and food-disease associations, where we distinguish between disease prevention/amelioration and disease progression. We subsequently searched for frequently occurring phytochemical-disease pairs and we identified 20,654 phytochemicals from 16,102 plants associated to 1,592 human disease phenotypes. We selected colon cancer as a case study and analyzed our results in three directions: i) one stop legacy knowledge-shop for the effect of food on disease, ii) discovery of novel bioactive compounds with drug-like properties, and iii) discovery of novel health benefits from foods. This work represents a systematized approach to the association of food with health effect, and provides the phytochemical layer of information for nutritional systems biology research.

  13. Establishing Reliable miRNA-Cancer Association Network Based on Text-Mining Method

    Directory of Open Access Journals (Sweden)

    Lun Li

    2014-01-01

    Associating microRNAs (miRNAs) with cancers is an important step in understanding the mechanisms of cancer pathogenesis and finding novel biomarkers for cancer therapies. In this study, we constructed a miRNA-cancer association network (miCancerna) based on more than 1,000 miRNA-cancer associations detected from millions of abstracts with the text-mining method, including 226 miRNA families and 20 common cancers. We further prioritized cancer-related miRNAs at the network level with the random-walk algorithm, achieving a relatively higher performance than previous miRNA disease networks. Finally, we examined the top 5 candidate miRNAs for each kind of cancer and found that 71% of them are confirmed experimentally. miCancerna would be an alternative resource for the cancer-related miRNA identification.

  14. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. For more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this large network, a sub-network consisting of 5 E. coli vaccine genes (carA, carB, fimH, fepA, and vat), 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of

  15. A Chaotic Home Environment Accounts for the Association between Respect for Rules Disposition and Reading Comprehension: A Twin Study.

    Science.gov (United States)

    Taylor, Jeanette; Hart, Sara A

    2014-10-01

    This study examined the association between socioemotional dispositions from the developmental propensity model and reading comprehension and whether those associations could be accounted for by level of chaos in the home. Data from 342 monozygotic and 333 same-sex dizygotic twin pairs age 7-13 years were used. A parent rated the twins on sympathy, respect for rules, negative emotionality, and daring and level of chaos in the twins' home. Reading comprehension was measured using a state-wide school assessment. Only respect for rules significantly and uniquely predicted reading comprehension. Biometric models indicated that respect for rules was positively associated with reading comprehension via the shared environment and home chaos accounted for a significant amount of that shared environmental variance even after controlling for family income. Children with higher respect for rules have better reading comprehension scores in school and this relationship owes partly to the level of chaos in the family home.

  16. Pollution of the stream waters and sediments associated with the Crucea uranium mine (East Carpathians, Romania)

    Science.gov (United States)

    Petrescu, L.; Bilal, E.; Iatan, E. L.

    2009-04-01

    standards limits. The uranium concentration ranged from 0.016 mg·L^-1 to 1.43 mg·L^-1, with a mean of 0.365 mg·L^-1. A remarkably good correlation exists between dissolved U and the total anion concentrations, indicating that uranium in these stream waters derived mainly from oxidation of uraniferous bitumen and/or dissolution of carbonates. Based on the correlation (r = 0.69) between U and the sum of the major cations Ca + Mg + K + Na and the linear correlation (r = 0.70) between U and silica, we find silicate weathering to be an additional source of soluble uranium. The concentrations of dissolved Th are quite low, with median values of 0.015 mg·L^-1. The linear variation of dissolved thorium concentration with carbonate alkalinity (r = 0.86) strongly suggests that these concentrations are due to the increased alkalinity. The release of metals (U, Th and Pb) is amplified by mining activities. The pollution degree of the sediments was classified using the index of geo-accumulation (Igeo). The Igeo of U, Th and Pb shows medium values and locally high values corresponding to sediments classified as strongly to extremely polluted (Igeo > 6), while the remaining elements show concentrations close to or lower than background values. 71% of uranium from bottom sediments is present as primary fractions and 21% is associated with carbonates. Thorium proved even more insoluble (94% in primary fractions). In view of the substantial mobility and bioavailability of the fractions, this is not an alarming feature. Although neither U nor Th has an appreciable "exchangeable" fraction, the isolation of specific U- and Th-rich sediment fractions helped to identify connections between bioavailability and genesis of sediments, which control ecosystem cycling of U and Th. The measurements carried out in the surroundings of a local uranium mine show that the impact of the Crucea mine on water quality downstream of the mining area is insignificant.
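
    The abstract names the index of geo-accumulation without stating how it is computed. A minimal sketch, assuming the commonly used definition Igeo = log2(Cn / (1.5 × Bn)), where Cn is the measured concentration and Bn the geochemical background, could look as follows. The function name and the example values are hypothetical, and the abstract does not confirm that this exact form or background factor was used in the study.

```python
import math


def geoaccumulation_index(concentration, background, factor=1.5):
    """Igeo = log2(Cn / (factor * Bn)); the 1.5 factor is the conventional
    allowance for natural fluctuation in background values (assumed here)."""
    if concentration <= 0 or background <= 0:
        raise ValueError("concentration and background must be positive")
    return math.log2(concentration / (factor * background))


# Hypothetical sediment concentration and background, in the same units.
igeo = geoaccumulation_index(concentration=3.0, background=1.0)
print(igeo)
```

    Because the index is logarithmic, each unit increase in Igeo corresponds to a doubling of the concentration relative to background.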

  17. The speed of learning instructed stimulus-response association rules in human: experimental data and model.

    Science.gov (United States)

    Bugmann, Guido; Goslin, Jeremy; Duchamp-Viret, Patricia

    2013-11-06

    Humans can learn associations between visual stimuli and motor responses from just a single instruction. This is known to be a fast process, but how fast is it? To answer this question, we asked participants to learn a briefly presented (200 ms) stimulus-response rule, which they then had to rapidly apply after a variable delay of between 50 and 1300 ms. Participants showed a longer response time with increased variability for short delays. The error rate was low and did not vary with the delay, showing that participants were able to encode the rule correctly in less than 250 ms. This time is close to the fastest synaptic learning speed deemed possible by diffusive influx of AMPA receptors. Learning continued at a slower pace in the delay period and was fully completed on average 900 ms after rule presentation onset, when response latencies dropped to levels consistent with basic reaction times. A neural model was proposed that explains the reduction of response times and of their variability with the delay by (i) a random synaptic learning process that generates weights of average values increasing with the learning time, followed by (ii) random crossing of the firing threshold by a leaky integrate-and-fire neuron model, and (iii) assuming that the behavioural response is initiated when all neurons in a pool of m neurons have fired their first spike after input onset. Values of m=2 or 3 were consistent with the experimental data. The proposed model is the simplest solution consistent with neurophysiological knowledge. Additional experiments are suggested to test the hypothesis underlying the model and also to explore forgetting effects for which there were indications for the longer delay conditions. This article is part of a Special Issue entitled Neural Coding 2012. © 2013 Elsevier B.V. All rights reserved.

  18. Data mining approaches for genome-wide association of mood disorders.

    Science.gov (United States)

    Pirooznia, Mehdi; Seifuddin, Fayaz; Judy, Jennifer; Mahon, Pamela B; Potash, James B; Zandi, Peter P

    2012-04-01

    Mood disorders are highly heritable forms of major mental illness. A major breakthrough in elucidating the genetic architecture of mood disorders was anticipated with the advent of genome-wide association studies (GWAS). However, to date few susceptibility loci have been conclusively identified. The genetic etiology of mood disorders appears to be quite complex, and as a result, alternative approaches for analyzing GWAS data are needed. Recently, a polygenic scoring approach that captures the effects of alleles across multiple loci was successfully applied to the analysis of GWAS data in schizophrenia and bipolar disorder (BP). However, this method may be overly simplistic in its approach to the complexity of genetic effects. Data mining methods are available that may be applied to analyze the high dimensional data generated by GWAS of complex psychiatric disorders. We sought to compare the performance of five data mining methods, namely, Bayesian networks, support vector machine, random forest, radial basis function network, and logistic regression, against the polygenic scoring approach in the analysis of GWAS data on BP. The different classification methods were trained on GWAS datasets from the Bipolar Genome Study (2191 cases with BP and 1434 controls) and their ability to accurately classify case/control status was tested on a GWAS dataset from the Wellcome Trust Case Control Consortium. The performance of the classifiers in the test dataset was evaluated by comparing area under the receiver operating characteristic curves. Bayesian networks performed the best of all the data mining classifiers, but none of these did significantly better than the polygenic score approach. We further examined a subset of single-nucleotide polymorphisms (SNPs) in genes that are expressed in the brain, under the hypothesis that these might be most relevant to BP susceptibility, but all the classifiers performed worse with this reduced set of SNPs. 
    The discriminative accuracy of
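
    The abstract compares classifiers by the area under the receiver operating characteristic curve (AUC). As an illustration only, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the toy labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls; scores: classifier outputs.
    Ties between a case and a control contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: two cases and two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls, which is why the metric suits a comparison across classifiers that output scores on different scales.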

  19. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    Directory of Open Access Journals (Sweden)

    Joanna F Dipnall

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling

  20. Associations between general parenting, restrictive snacking rules, and adolescent's snack intake. The roles of fathers and mothers and interparental congruence.

    Science.gov (United States)

    Gevers, Dorus W M; van Assema, Patricia; Sleddens, Ester F C; de Vries, Nanne K; Kremers, Stef P J

    2015-04-01

    Little research has been done on the role of fathers and parenting congruence between mothers and fathers. This study aimed to clarify the roles of general parenting and restrictive snacking rules set by fathers and mothers, and to explore parenting congruence in explaining adolescents' snack intake. Adolescents aged 11 to 15 completed a questionnaire assessing their perception of general parenting constructs (i.e. nurturance, structure, behavioral control, coercive control, and overprotection), restrictive snacking rules set by their fathers and mothers, and their own energy-dense snack intakes between meals. Scores for mothers were significantly higher on all constructs than for fathers, except for coercive control. Generally, higher scores on general parenting constructs were associated with higher scores on restrictive snacking rules (most of the associations being significant). Most general parenting constructs were unrelated to the respondents' number of snacks consumed. The use of restrictive snacking rules by both fathers and mothers was significantly and negatively related to respondents' snack intake. Moderation analyses indicated that high levels of incongruence between parents attenuated the favorable impact of fathers' rules and nurturance on their children's snacking, but interactions of congruence with three other paternal scales and all maternal scales were absent. Our findings indicate that both paternal and maternal general parenting and restrictive snacking rules play important roles in adolescents' snacking, and that high parental incongruence regarding restrictive snacking rules and nurturance could be undesirable. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Prediction of diffuse sulfate emissions from a former mining district and associated groundwater discharges to surface waters

    Science.gov (United States)

    Graupner, Bastian J.; Koch, Christian; Prommer, Henning

    2014-05-01

    Rivers draining mining districts are often affected by the diffuse input of polluted groundwaters. The severity and longevity of the impact depend on a wide range of factors such as the source terms, the hydraulic regime, the distance between pollutant sources and discharge points and the dilution by discharge from upstream river reaches. In this study a deterministic multi-mine life-cycle model was developed. It is used to characterize pollutant sources and to quantify the resulting current and future effects on both groundwater and river water quality, with sulfate serving as a proxy for mining-related impacts. The model application to the Lausitz mining district (Germany) shows that the most important factors controlling concentrations and discharge of sulfate are mixing/dilution with ambient groundwater and the rates of biological sulfate reduction during subsurface transport. In contrast, future impacts originating from the unsaturated zones of the mining dumps proved to be of little importance due to the high age of the mining dumps and the associated depletion in reactive iron-sulfides. The simulations indicate that currently the groundwater-borne diffuse input of sulfate into the rivers Kleine Spree and Spree is ∼2200 t/year. Our predictions suggest a future increase to ∼11,000 t/year within the next 40 years. Depending on river discharge rates this represents an increase in sulfate concentration of 40-300 mg/L. A trend reversal for the surface water discharge is not expected before 2050.

  2. Restoring forests and associated ecosystem services on Appalachian coal surface mines.

    Science.gov (United States)

    Zipper, Carl E; Burger, James A; Skousen, Jeffrey G; Angel, Patrick N; Barton, Christopher D; Davis, Victor; Franklin, Jennifer A

    2011-05-01

    Surface coal mining in Appalachia has caused extensive replacement of forest with non-forested land cover, much of which is unmanaged and unproductive. Although forested ecosystems are valued by society for both marketable products and ecosystem services, forests have not been restored on most Appalachian mined lands because traditional reclamation practices, encouraged by regulatory policies, created conditions poorly suited for reforestation. Reclamation scientists have studied productive forests growing on older mine sites, established forest vegetation experimentally on recent mines, and identified mine reclamation practices that encourage forest vegetation re-establishment. Based on these findings, they developed a Forestry Reclamation Approach (FRA) that can be employed by coal mining firms to restore forest vegetation. Scientists and mine regulators, working collaboratively, have communicated the FRA to the coal industry and to regulatory enforcement personnel. Today, the FRA is used routinely by many coal mining firms, and thousands of mined hectares have been reclaimed to restore productive mine soils and planted with native forest trees. Reclamation of coal mines using the FRA is expected to restore these lands' capabilities to provide forest-based ecosystem services, such as wood production, atmospheric carbon sequestration, wildlife habitat, watershed protection, and water quality protection to a greater extent than conventional reclamation practices.

  3. Regulatory processes associated with metal-mine development in Alaska: A case study of the Red Dog Mine. Open File Report

    Energy Technology Data Exchange (ETDEWEB)

    Hemming, J.E.; Cocklan-Vendl, M.

    1992-09-01

    Regulatory processes associated with development of a world class lead-zinc mine, Red Dog Mine, in northwestern Alaska were reviewed and evaluated. Informal interviews with key project personnel, consultants, and agency field and permitting specialists provided perspective on the regulatory successes and failures of the project. Due to potential impacts to air quality, water quality, wetlands, and National Park lands, an Environmental Impact Statement was required. By developing a comprehensive baseline of information on the existing environment to aid in minimizing impacts during project siting/design and through regular coordination of evolving project plans with regulatory agencies, the mine developers were able to acquire necessary permits in a timely and cost effective manner. The only major exceptions occurred when inadequate information was collected on dispersal of airborne particulates, rates of surface water run-off, and groundwater quality. These deficiencies resulted in the need for emergency design changes, unscheduled construction, additional environmental monitoring costs, and delays in issuance of the NPDES permit.

  4. Inductive Querying with Virtual Mining Views

    Science.gov (United States)

    Blockeel, Hendrik; Calders, Toon; Fromont, Élisa; Prado, Adriana; Goethals, Bart; Robardet, Céline

    In an inductive database, one can not only query the data stored in the database, but also the patterns that are implicitly present in these data. In this chapter, we present an inductive database system in which the query language is traditional SQL. More specifically, we present a system in which the user can query the collection of all possible patterns as if they were stored in traditional relational tables. We show how such tables, or mining views, can be developed for three popular data mining tasks, namely itemset mining, association rule discovery and decision tree learning. To illustrate the interactive and iterative capabilities of our system, we describe a complete data mining scenario that consists in extracting knowledge from real gene expression data, after a pre-processing phase.

  5. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the items purchased by a customer may differ in factors such as profit or quantity. High-utility mining was designed to solve the limitations of association-rule mining by considering both the quantity and profit measures. Most algorithms of high-utility mining are designed to handle static databases. Few studies address dynamic high-utility mining with transaction insertion, so existing approaches require database rescans and incur the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns.

  6. An incremental high-utility mining algorithm with transaction insertion.

    Science.gov (United States)

    Lin, Jerry Chun-Wei; Gan, Wensheng; Hong, Tzung-Pei; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the items purchased by a customer may differ in factors such as profit or quantity. High-utility mining was designed to solve the limitations of association-rule mining by considering both the quantity and profit measures. Most algorithms of high-utility mining are designed to handle static databases. Few studies address dynamic high-utility mining with transaction insertion, so existing approaches require database rescans and incur the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns.
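
    The difference between frequency and utility that motivates the abstract above can be illustrated with a minimal Python sketch. The transactions, quantities, and unit profits are hypothetical toy values, and the helper names are assumptions made for illustration; this is not the utility-list algorithm of the paper.

```python
# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"A": 2, "B": 1},
    {"A": 1, "C": 3},
    {"A": 1, "B": 2, "C": 1},
]
# Hypothetical profit per unit of each item.
unit_profit = {"A": 5, "B": 3, "C": 1}

def support_count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))

def itemset_utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total

# {A, B} is less frequent than {A} but carries more utility.
freq_a = support_count({"A"}, transactions)
freq_ab = support_count({"A", "B"}, transactions)
util_a = itemset_utility({"A"}, transactions, unit_profit)
util_ab = itemset_utility({"A", "B"}, transactions, unit_profit)
```

    In this toy case a superset has higher utility than its subset, so utility is not anti-monotone the way support is, which is one reason high-utility mining needs pruning strategies of its own.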

  7. The Apriori Stochastic Dependency Detection (ASDD) algorithm for learning Stochastic logic rules

    OpenAIRE

    Child, C. H. T.; Stathis, K.

    2005-01-01

    Apriori Stochastic Dependency Detection (ASDD) is an algorithm for fast induction of stochastic logic rules from a database of observations made by an agent situated in an environment. ASDD is based on features of the Apriori algorithm for mining association rules in large databases of sales transactions [1] and the MSDD algorithm for discovering stochastic dependencies in multiple streams of data [15]. Once these rules have been acquired the Precedence algorithm assigns operator precedence w...

  8. Identifying gene-disease associations using centrality on a literature mined gene-interaction network.

    Science.gov (United States)

    Ozgür, Arzucan; Vu, Thuy; Erkan, Günes; Radev, Dragomir R

    2008-07-01

    Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network. The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study. A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org.
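
    As a minimal sketch of the ranking step described above, the following Python snippet computes degree centrality on a toy undirected network. The gene names (G1 to G5) and the edges are hypothetical placeholders rather than data from the study, and the other three centrality metrics are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy interaction network; names and edges are illustrative only.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]

def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

centrality = degree_centrality(edges)
# Rank genes from most to least central (ties broken by name for determinism).
ranked = sorted(centrality, key=lambda g: (-centrality[g], g))
```

    Under the hypothesis stated in the abstract, the genes at the top of such a ranking would be the first candidates to check against known disease associations.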

  9. Mining Long, Sharable Patterns in Trajectories of Moving Objects

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2006-01-01

    The efficient analysis of spatio–temporal data, generated by moving objects, is an essential requirement for intelligent location-based services. Spatio-temporal rules can be found by constructing spatio–temporal baskets, from which traditional association rule mining methods can discover spatio...... the generation of the exponential number of subroutes of long routes. A SQL–based implementation is described, and experiments on real life data show the effectiveness of the method....

  10. Beating Obesity: Factors Associated with Interest in Workplace Weight Management Assistance in the Mining Industry.

    Science.gov (United States)

    Street, Tamara D; Thomas, Drew L

    2017-03-01

    Rates of overweight and obese Australians are high and continue to rise, putting a large proportion of the population at risk of chronic illness. Examining characteristics associated with preference for a work-based weight-loss program will enable employers to better target programs to increase enrolment and benefit employees' health and fitness for work. A cross-sectional survey was undertaken at two Australian mining sites. The survey collected information on employee demographics, health characteristics, work characteristics, stages of behavior change, and preference for workplace assistance with reaching a healthy weight. A total of 897 employees participated; 73.7% were male, and 68% had a body mass index in the overweight or obese range. Employees at risk of developing obesity-related chronic illnesses (based on high body mass index) were more likely to report preference for weight management assistance than lower risk employees. This indicates that, even in the absence of workplace promotion for weight management, some at-risk employees want workplace assistance. Employees who were not aware of a need to change their current nutrition or physical activity behaviors were less likely to seek assistance. This indicates that practitioners need to communicate the negative effects of excess weight and promote the benefits of a healthy lifestyle to increase the likelihood of weight management. Weight management programs should provide information, motivation, and trouble-shooting assistance to meet the needs of at-risk mining employees, including those who are attempting to change and maintain behaviors to achieve a healthy weight and be suitably fit for work.

  11. Application of text mining for customer evaluations in commercial banking

    Science.gov (United States)

    Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

    2015-07-01

    Customer attrition is an increasingly serious problem for commercial banks. To address this problem comprehensively, mining customer evaluation texts is as important as mining structured customer data. In order to extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper presents all three techniques by using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments was run on a collection of real textual data that includes 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions, and some advice for the commercial bank are given in this paper.

  12. Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level

    Science.gov (United States)

    Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene

    2014-01-01

    Awareness that disease susceptibility is not only dependent on genetic makeup, but can be affected by lifestyle decisions, has brought more attention to the role of diet. However, food is often treated as a black box, or the focus is limited to a few well-studied compounds, such as polyphenols, lipids and nutrients. In this work, we applied text mining and Naïve Bayes classification to assemble the knowledge space of food-phytochemical and food-disease associations, where we distinguish between disease prevention/amelioration and disease progression. We subsequently searched for frequently occurring phytochemical-disease pairs and we identified 20,654 phytochemicals from 16,102 plants associated to 1,592 human disease phenotypes. We selected colon cancer as a case study and analyzed our results in three directions: i) one stop legacy knowledge-shop for the effect of food on disease, ii) discovery of novel bioactive compounds with drug-like properties, and iii) discovery of novel health benefits from foods. This work represents a systematized approach to the association of food with health effect, and provides the phytochemical layer of information for nutritional systems biology research. PMID:24453957

  13. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care.

    Science.gov (United States)

    Ismail, Walaa N; Hassan, Mohammad Mehedi

    2017-04-26

    The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants' health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  14. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care

    Directory of Open Access Journals (Sweden)

    Walaa N. Ismail

    2017-04-01

    The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants’ health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  15. Excess screen time in US children: association with family rules and alternative activities.

    Science.gov (United States)

    Gingold, Janet A; Simon, Alan E; Schoendorf, Kenneth C

    2014-01-01

    We describe the association of screen time in excess of American Academy of Pediatrics recommendations (≤2 h/d) with family television-use policies and regular nonscreen activities among US school-aged children. Data from the 2007 National Survey of Children's Health were used. The sum of minutes spent on television, videos, video games, and recreational computer use was calculated for children 6 to 17 years old. Bivariate and multivariate logistic regression models were used to calculate relative odds of exceeding American Academy of Pediatrics guidelines and of heavy screen use (>4 h/d) for varying family media-use policies and frequency of alternative activities (physical activity and family meals). In all, 49% of school-aged children had screen time >2 h/d and 16% had screen time >4 h/d. Lower frequency of family meals, presence of TV in the bedroom, absence of rules about TV viewing, and less physical activity were associated with both >2 and >4 hours per day of screen time.

  16. Risk of hepatotoxicity associated with the use of telithromycin: a signal detection using data mining algorithms.

    Science.gov (United States)

    Chen, Yan; Guo, Jeff J; Healy, Daniel P; Lin, Xiaodong; Patel, Nick C

    2008-12-01

    With the exception of case reports, limited data are available regarding the risk of hepatotoxicity associated with the use of telithromycin. To detect the safety signal regarding the reporting of hepatotoxicity associated with the use of telithromycin using 4 commonly employed data mining algorithms (DMAs). Based on the Adverse Events Reporting System (AERS) database of the Food and Drug Administration, 4 DMAs, including the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the information component (IC), and the Gamma Poisson Shrinker (GPS), were applied to examine the association between the reporting of hepatotoxicity and the use of telithromycin. The study period was from the first quarter of 2004 to the second quarter of 2006. The reporting of hepatotoxicity was identified using the preferred terms indexed in the Medical Dictionary for Regulatory Activities. The drug name was used to identify reports regarding the use of telithromycin. A total of 226 reports describing hepatotoxicity associated with the use of telithromycin were recorded in the AERS. A safety problem of telithromycin associated with increased reporting of hepatotoxicity was clearly detected by 4 algorithms as early as 2005, signaling the problem in the first quarter by the ROR and the IC, in the second quarter by the PRR, and in the fourth quarter by the GPS. A safety signal was indicated by the 4 DMAs suggesting an association between the reporting of hepatotoxicity and the use of telithromycin. Given the wide use of telithromycin and serious consequences of hepatotoxicity, clinicians should be cautious when selecting telithromycin for treatment of an infection. In addition, further observational studies are required to evaluate the utility of signal detection systems for early recognition of serious, life-threatening, low-frequency drug-induced adverse events.
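
    Two of the four disproportionality measures named above, the ROR and the PRR, follow from a 2×2 table of spontaneous reports. The Python sketch below uses the standard textbook definitions; the counts are hypothetical and are not taken from the AERS analysis, and the IC and GPS measures, which need Bayesian shrinkage, are left out.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR = (a / b) / (c / d) for a 2x2 report table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a * d) / (b * c)

def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)] for the same table."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts chosen only to make the arithmetic easy to follow.
a, b, c, d = 20, 80, 100, 9800
ror = reporting_odds_ratio(a, b, c, d)
prr = proportional_reporting_ratio(a, b, c, d)
```

    Values well above 1 indicate that the event is reported disproportionately often for the drug. In practice a signal is usually declared only when a confidence bound or a minimum case count is also met, and those thresholds are not specified in the abstract.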

  17. Association between borderline dysnatremia and mortality insight into a new data mining approach.

    Science.gov (United States)

    Girardeau, Yannick; Jannot, Anne-Sophie; Chatellier, Gilles; Saint-Jean, Olivier

    2017-11-22

    Even small variations of serum sodium concentration may be associated with mortality. Our objective was to confirm the impact of borderline dysnatremia for patients admitted to hospital on in-hospital mortality using real life care data from our electronic health record (EHR) and a phenome-wide association analysis (PheWAS). Retrospective observational study based on data from patients admitted to Hôpital Européen George Pompidou, between 01/01/2008 and 31/06/2014; including 45,834 patients with serum sodium determinations on admission. We analyzed the association between dysnatremia and in-hospital mortality, using a multivariate logistic regression model to adjust for classical potential confounders. We performed a PheWAS to identify new potential confounders. Hyponatremia and hypernatremia were recorded for 12.0% and 1.0% of hospital stays, respectively. Adjusted odds ratios (ORa) for severe, moderate and borderline hyponatremia were 3.44 (95% CI, 2.41-4.86), 2.48 (95% CI, 1.96-3.13) and 1.98 (95% CI, 1.73-2.28), respectively. ORa for severe, moderate and borderline hypernatremia were 4.07 (95% CI, 2.92-5.62), 4.42 (95% CI, 2.04-9.20) and 3.72 (95% CI, 1.53-8.45), respectively. Borderline hyponatremia (ORa = 1.57 95% CI, 1.35-1.81) and borderline hypernatremia (ORa = 3.47 95% CI, 2.43-4.90) were still associated with in-hospital mortality after adjustment for classical and new confounding factors identified through the PheWAS analysis. Borderline dysnatremia on admission is independently associated with a higher risk of in-hospital mortality. By using medical data automatically collected in EHR and a new data mining approach, we identified new potential confounding factors that were highly associated with both mortality and dysnatremia.

  18. [Effect of Chinese Herbs Used in Treating Multiple Sclerosis on T Subsets Using Association Rules].

    Science.gov (United States)

    Zhang, Qi; Li, Tao; Xu, Yong-gang; Yang, Xiao-hong

    2016-04-01

    To analyze the effect of Chinese herbs used by Prof. LI Tao on peripheral blood T subsets in treating multiple sclerosis (MS) by using association rules and statistical methods, thereby providing evidence for optimizing prescriptions. Data of MS inpatients and outpatients recorded by data collecting system, Xiyuan Hospital, China Academy of Chinese Medical Sciences were resorted. The relationship between Chinese herbs and T cell subsets was analyzed using SPSS17.0 and the Apriori module in SPSS Clementine 12.0. Radix Bupleuri, Radix Paeoniae alba, Fructus Aurantii, Atractylodes, and Radix Glycyrrhizae were the most commonly used herbal combinations. Radix Aconiti lateralis preparata and Rhizoma Smilacis glabrae were often added. Radix Aconiti lateralis preparata was associated with decreased Th1 cells (confidence level 83.78%, supportive level 36.26%). Decreased Th1 cells were associated with Radix Aconiti lateralis preparata (confidence level 71.26%, supportive level 36.26%). Radix Aconiti lateralis preparata was obviously associated with decreased Th1 cells. Radix Bupleuri, Radix Paeoniae alba, bitter orange, Atractylodes, Radix Glycyrrhizae, and Radix Aconiti lateralis preparata could reduce peripheral blood Th1 subsets of MS patients and elevate Th2 subsets (all P < 0.01). The herbal combination of Radix Bupleuri, Radix Paeoniae alba, Fructus Aurantii, Atractylodes, Radix Glycyrrhizae, Rhizoma Smilacis glabrae, and Radix Aconiti lateralis preparata could lower peripheral blood Th1 cells and elevate Th2 cells, and prevent the relapse of MS possibly by reducing Th1 cells and elevating Th2 cells. Radix Aconiti lateralis preparata in particular played an important role in the aforesaid changes of Th1 and Th2.

  19. Costs of abandoned coal mine reclamation and associated recreation benefits in Ohio.

    Science.gov (United States)

    Mishra, Shruti K; Hitzhusen, Frederick J; Sohngen, Brent L; Guldmann, Jean-Michel

    2012-06-15

    Two hundred years of coal mining in Ohio have degraded land and water resources, imposing social costs on its citizens. An interdisciplinary approach employing hydrology, geographic information systems, and a recreation visitation function model is used to estimate the damages from upstream coal mining to lakes in Ohio. The estimated recreational damages to five of the coal-mining-impacted lakes, using dissolved sulfate as the coal-mining-impact indicator, amount to $21 million per year. Post-reclamation recreational benefits from reducing sulfate concentrations by 6.5% and 15% in the five impacted lakes were estimated to range from $1.89 to $4.92 million per year, with a net present value ranging from $14.56 million to $37.79 million. A benefit-cost analysis (BCA) of recreational benefits and coal mine reclamation costs provides some evidence for potential Pareto improvement by investing limited resources in reclamation projects. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. The conceptual basis of function learning and extrapolation: comparison of rule-based and associative-based models.

    Science.gov (United States)

    McDaniel, Mark A; Busemeyer, Jerome R

    2005-02-01

    The purpose of this article is to provide a foundation for a more formal, systematic, and integrative approach to function learning that parallels the existing progress in category learning. First, we note limitations of existing formal theories. Next, we develop several potential formal models of function learning, which include expansion of classic rule-based approaches and associative-based models. We specify for the first time psychologically based learning mechanisms for the rule models. We then present new, rigorous tests of these competing models that take into account order of difficulty for learning different function forms and extrapolation performance. Critically, detailed learning performance was also used to conduct the model evaluations. The results favor a hybrid model that combines associative learning of trained input-prediction pairs with a rule-based output response for extrapolation (EXAM).

  1. Goal directed worry rules are associated with distinct patterns of amygdala functional connectivity and vagal modulation during perseverative cognition

    Directory of Open Access Journals (Sweden)

    Frances Meeten

    2016-11-01

    Excessive and uncontrollable worry is a defining feature of Generalized Anxiety Disorder. An important endeavor in the treatment of pathological worry is to understand why some people are unable to stop worrying once they have started. Worry perseveration is associated with a tendency to deploy goal-directed worry rules (known as ‘as many as can’ worry rules; AMA). These require attention to the goal of the worry task and continuation of worry until the aims of the ‘worry bout’ are achieved. This study examined the association between the tendency to use AMA worry rules and neural and autonomic responses to a perseverative cognition induction. To differentiate processes underlying AMA worry rule use from trait worry, we also examined the relationship between scores on the Penn State Worry Questionnaire and neural and autonomic responses following the same induction. We used resting-state functional magnetic resonance brain imaging while measuring emotional bodily arousal from heart rate variability (where decreased HRV indicates stress-related parasympathetic withdrawal) in 19 patients with GAD and 21 control participants. Seed-based analyses were conducted to quantify brain changes in functional connectivity with the amygdala. The tendency to adopt an AMA worry rule was associated with validated measures of worry, anxiety, depression, and rumination. AMA worry rule endorsement predicted a stronger decrease in HRV and was positively associated with increased connectivity between right amygdala and locus coeruleus, a brainstem noradrenergic projection nucleus. Higher AMA scores were also associated with increased connectivity between amygdala and rostral superior frontal gyrus. Higher PSWQ scores amplified decreases in functional connectivity between right amygdala and subcallosal cortex, bilateral inferior frontal gyrus, middle frontal gyrus, and areas of parietal cortex. Our results identify neural mechanisms underlying the deployment of

  2. Goal Directed Worry Rules Are Associated with Distinct Patterns of Amygdala Functional Connectivity and Vagal Modulation during Perseverative Cognition.

    Science.gov (United States)

    Meeten, Frances; Davey, Graham C L; Makovac, Elena; Watson, David R; Garfinkel, Sarah N; Critchley, Hugo D; Ottaviani, Cristina

    2016-01-01

    Excessive and uncontrollable worry is a defining feature of Generalized Anxiety Disorder (GAD). An important endeavor in the treatment of pathological worry is to understand why some people are unable to stop worrying once they have started. Worry perseveration is associated with a tendency to deploy goal-directed worry rules (known as "as many as can" worry rules; AMA). These require attention to the goal of the worry task and continuation of worry until the aims of the "worry bout" are achieved. This study examined the association between the tendency to use AMA worry rules and neural and autonomic responses to a perseverative cognition induction. To differentiate processes underlying the AMA worry rule use from trait worry, we also examined the relationship between scores on the Penn State Worry Questionnaire (PSWQ) and neural and autonomic responses following the same induction. We used resting-state functional magnetic resonance brain imaging (fMRI) while measuring emotional bodily arousal from heart rate variability (where decreased HRV indicates stress-related parasympathetic withdrawal) in 19 patients with GAD and 21 control participants. Seed-based analyses were conducted to quantify brain changes in functional connectivity (FC) with the amygdala. The tendency to adopt an AMA worry rule was associated with validated measures of worry, anxiety, depression and rumination. AMA worry rule endorsement predicted a stronger decrease in HRV and was positively associated with increased connectivity between right amygdala and locus coeruleus (LC), a brainstem noradrenergic projection nucleus. Higher AMA scores were also associated with increased connectivity between amygdala and rostral superior frontal gyrus. Higher PSWQ scores amplified decreases in FC between right amygdala and subcallosal cortex, bilateral inferior frontal gyrus, middle frontal gyrus, and areas of parietal cortex. Our results identify neural mechanisms underlying the deployment of AMA worry

  3. Identifying the association rules between clinicopathologic factors and higher survival performance in operation-centric oral cancer patients using the Apriori algorithm.

    Science.gov (United States)

    Tang, Jen-Yang; Chuang, Li-Yeh; Hsi, Edward; Lin, Yu-Da; Yang, Cheng-Hong; Chang, Hsueh-Wei

    2013-01-01

    This study computationally determines the contribution of clinicopathologic factors correlated with 5-year survival in oral squamous cell carcinoma (OSCC) patients primarily treated by surgical operation (OP) followed by other treatments. From 2004 to 2010, the program enrolled 493 OSCC patients at the Kaohsiung Medical Hospital University. The clinicopathologic records were retrospectively reviewed and compared for survival analysis. The Apriori algorithm was applied to mine the association rules between these factors and improved survival. Univariate analysis of demographic data showed that grade/differentiation, clinical tumor size, pathology tumor size, and OP grouping were associated with survival longer than 36 months. Using the Apriori algorithm, multivariate correlation analysis identified the factors that coexistently provide good survival rates with higher lift values, such as grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP. Without the OP, the lift values are lower. In conclusion, this hospital-based analysis suggests that early OP and other treatments starting from OP are the key to improving the survival of OSCC patients, especially for early stage tongue cancer with moderate differentiation, having a better survival (>36 months) with varied OP approaches.
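
    The lift values mentioned above derive from support and confidence. The following Python sketch computes all three for one rule on a toy set of patient records; the eight records and the attribute labels are hypothetical stand-ins, not data from the study, and no Apriori candidate generation is performed here.

```python
# Hypothetical patient records, each a set of attribute=value labels.
records = [
    {"stage=early", "group=OP", "survival>36"},
    {"stage=early", "group=OP", "survival>36"},
    {"stage=early", "group=OP", "survival>36"},
    {"stage=early", "group=OP"},
    {"stage=late", "group=OP", "survival>36"},
    {"stage=late", "group=OP"},
    {"stage=late", "group=other"},
    {"stage=early", "group=other"},
]

def support(itemset, records):
    """Fraction of records that contain every label in the itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)

def rule_metrics(antecedent, consequent, records):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    joint = support(antecedent | consequent, records)
    confidence = joint / support(antecedent, records)
    lift = confidence / support(consequent, records)
    return joint, confidence, lift

supp, conf, lift = rule_metrics(
    {"stage=early", "group=OP"}, {"survival>36"}, records
)
```

    A lift above 1 means the antecedent and the consequent co-occur more often than they would if they were independent, which is the sense in which the abstract compares rules with and without OP.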

  4. Identifying the Association Rules between Clinicopathologic Factors and Higher Survival Performance in Operation-Centric Oral Cancer Patients Using the Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Jen-Yang Tang

    2013-01-01

    This study computationally determines the contribution of clinicopathologic factors correlated with 5-year survival in oral squamous cell carcinoma (OSCC) patients primarily treated by surgical operation (OP) followed by other treatments. From 2004 to 2010, the program enrolled 493 OSCC patients at the Kaohsiung Medical Hospital University. The clinicopathologic records were retrospectively reviewed and compared for survival analysis. The Apriori algorithm was applied to mine the association rules between these factors and improved survival. Univariate analysis of demographic data showed that grade/differentiation, clinical tumor size, pathology tumor size, and OP grouping were associated with survival longer than 36 months. Using the Apriori algorithm, multivariate correlation analysis identified the factors that coexistently provide good survival rates with higher lift values, such as grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP. Without the OP, the lift values are lower. In conclusion, this hospital-based analysis suggests that early OP and other treatments starting from OP are the key to improving the survival of OSCC patients, especially for early stage tongue cancer with moderate differentiation, having a better survival (>36 months) with varied OP approaches.

  5. Application of a New Probabilistic Model for Mining Implicit Associated Cancer Genes from OMIM and Medline

    Directory of Open Access Journals (Sweden)

    Shanfeng Zhu

    2006-01-01

    An important issue in current medical science research is to find the genes that are strongly related to an inherited disease. A particular focus is placed on cancer-gene relations, since some types of cancers are inherited. As biomedical databases have grown rapidly in recent years, an informatics approach to predict such relations from currently available databases should be developed. Our objective is to find implicitly associated cancer genes from biomedical databases including the literature database. Co-occurrence of biological entities has been shown to be a popular and efficient technique in biomedical text mining. We have applied a new probabilistic model, called the mixture aspect model (MAM) [48], to combine different types of co-occurrences of genes and cancer derived from Medline and OMIM (Online Mendelian Inheritance in Man). We trained the probability parameters of MAM using a learning method based on an EM (Expectation and Maximization) algorithm. We examined the performance of MAM by predicting associated cancer gene pairs. Through cross-validation, prediction accuracy was shown to be improved by adding gene-gene co-occurrences from Medline to cancer-gene co-occurrences in OMIM. Further experiments showed that MAM found new cancer-gene relations which are unknown in the literature. Supplementary information can be found at http://www.bic.kyotou.ac.jp/pathway/zhusf/CancerInformatics/Supplemental2006.html

  6. X3-Miner: Mining Patterns from XML Database

    NARCIS (Netherlands)

    Tan, H.; Dillon, T.; Feng, L.; Chang, E.; Hadzic, F.

    2005-01-01

    An XML-enabled framework for the representation of association rules in databases was first presented in [4]. In Frequent Structure Mining (FSM), one of the popular approaches is to use graph matching that relies on data structures such as the adjacency matrix [7] or adjacency list [8]. Another approach

  7. The utility of web mining for epidemiological research: studying the association between parity and cancer risk.

    Science.gov (United States)

    Tourassi, Georgia; Yoon, Hong-Jun; Xu, Songhua; Han, Xuesong

    2016-05-01

    The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored. Using advanced web crawling and tailored information extraction procedures, the authors automatically collected and analyzed the text content of 79 394 online obituary articles published between 1998 and 2014. The collected data included 51 911 cancer (27 330 breast; 9470 lung; 6496 pancreatic; 6342 ovarian; 2273 colon) and 27 483 non-cancer cases. With the derived information, the authors replicated a case-control study design to investigate the association between parity (i.e., childbearing) and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies. Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI, 0.75-0.82), pancreatic cancer (OR = 0.78, 95% CI, 0.72-0.83), colon cancer (OR = 0.67, 95% CI, 0.60-0.74), and ovarian cancer (OR = 0.58, 95% CI, 0.54-0.62). Marginal association was found for lung cancer risk (OR = 0.87, 95% CI, 0.81-0.92). The linear trend between increased parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than the other cancers included in the analysis. This large web-mining study on parity and cancer risk produced findings very similar to those reported with traditional observational studies. It may be used as a promising strategy to generate study hypotheses for guiding and prioritizing future epidemiological studies. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Unstable Malaria Transmission in the Southern Peruvian Amazon and Its Association with Gold Mining, Madre de Dios, 2001-2012.

    Science.gov (United States)

    Sanchez, Juan F; Carnero, Andres M; Rivera, Esteban; Rosales, Luis A; Baldeviano, G Christian; Asencios, Jorge L; Edgel, Kimberly A; Vinetz, Joseph M; Lescano, Andres G

    2017-02-08

    The reemergence of malaria in the last decade in Madre de Dios, southern Peruvian Amazon basin, was accompanied by ecological, political, and socioeconomic changes related to the proliferation of illegal gold mining. We conducted a secondary analysis of passive malaria surveillance data reported by the health networks in Madre de Dios between 2001 and 2012. We calculated the number of cases of malaria by year, geographic location, intensity of illegal mining activities, and proximity of health facilities to the Peru-Brazil Interoceanic Highway. During 2001-2012, 203,773 febrile cases were identified in Madre de Dios, of which 30,811 (15.1%) were confirmed cases of malaria; all but 10 cases were due to Plasmodium vivax. Cases of malaria rose rapidly between 2004 and 2007, reached 4,469 cases in 2005, and then declined after 2010 to pre-2004 levels. Health facilities located in areas of intense illegal gold mining reported 30-fold more cases than those in non-mining areas (ratio = 31.54, 95% confidence interval [CI] = 19.28, 51.60). Finally, health facilities located > 1 km from the Interoceanic Highway reported significantly more cases than health facilities within this distance (ratio = 16.20, 95% CI = 8.25, 31.80). Transmission of malaria in Madre de Dios is unstable, geographically heterogeneous, and strongly associated with illegal gold mining. These findings highlight the importance of spatially oriented interventions to control malaria in Madre de Dios, as well as the need for research on malaria transmission in illegal gold mining camps. © The American Society of Tropical Medicine and Hygiene.

  9. The effect of the depth and groundwater on the formation of sinkholes or ground subsidence associated with abandoned room and pillar lignite mines under static and dynamic conditions

    Directory of Open Access Journals (Sweden)

    Ö. Aydan

    2015-11-01

    It is well known that sinkholes or subsidence occur from time to time in areas where abandoned room and pillar type mines exist. The author has been involved with the stability of abandoned mines beneath urbanized residential areas in the Tokai region, and there is great concern about the stability of these abandoned mines during large earthquakes as well as in the long term. The 2003 Miyagi Hokubu and 2011 Great East Japan earthquakes caused great damage to abandoned mines and resulted in many collapses. In this paper, the author presents the effect of the depth and groundwater on the formation of sinkholes or ground subsidence associated with abandoned room and pillar lignite mines under static and dynamic conditions and discusses the implications for the areas above abandoned lignite mines.

  10. An Improved Pearson’s Correlation Proximity-Based Hierarchical Clustering for Mining Biological Association between Genes

    Directory of Open Access Journals (Sweden)

    P. M. Booma

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for a standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process is not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model is proposed, in which the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, the GL-PCPHC model applies a pattern-growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.
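
    The general idea behind this record, a correlation-based proximity combined with average linkage, can be sketched as follows. This is only an assumption-laden illustration: it uses the ordinary Pearson coefficient rather than the authors' improved variant, the distance 1 - r is a common convention rather than something stated in the abstract, and the four gene profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Ordinary Pearson correlation (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage."""

    def dist(g, h):
        return 1.0 - pearson(profiles[g], profiles[h])

    clusters = [[g] for g in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pairs = [(g, h) for g in clusters[i] for h in clusters[j]]
                d = sum(dist(g, h) for g, h in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical expression profiles over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
gene_clusters = average_linkage(profiles, n_clusters=2)
print(gene_clusters)
```

    Genes whose profiles rise and fall together end up in the same cluster even when their absolute expression levels differ, which is the reason for using a correlation-based proximity instead of a Euclidean one.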

  11. Association between Benzodiazepine Use and Dementia: Data Mining of Different Medical Databases.

    Science.gov (United States)

    Takada, Mitsutaka; Fujimoto, Mai; Hosomi, Kouichi

    2016-01-01

    Purpose: Some studies have suggested that the use of benzodiazepines in the elderly is associated with an increased risk of dementia. However, this association might be due to confounding by indication and reverse causation. To examine the association between benzodiazepine anxiolytic drug use and the risk of dementia, we conducted data mining of a spontaneous reporting database and a large organized database of prescriptions. Methods: Data from the US Food and Drug Administration Adverse Event Reporting System (FAERS) from the first quarter of 2004 through the end of 2013 and data from the Canada Vigilance Adverse Reaction Online Database from the first quarter of 1965 through the end of 2013 were used for the analyses. The reporting odds ratio (ROR) and information component (IC) were calculated. In addition, prescription sequence symmetry analysis (PSSA) was performed to identify the risk of dementia after using benzodiazepine anxiolytic drugs over the period of January 2006 to May 2014. Results: Benzodiazepine use was found to be associated with dementia in analyses using the FAERS database (ROR: 1.63, 95% CI: 1.61-1.64; IC: 0.66, 95% CI: 0.65-0.67) and the Canada Vigilance Adverse Reaction Online Database (ROR: 1.88, 95% CI: 1.83-1.94; IC: 0.85, 95% CI: 0.80-0.89). ROR and IC values increased with the duration of action of benzodiazepines. In the PSSA, a significant association was found, with adjusted sequence ratios of 1.24 (1.05-1.45), 1.20 (1.06-1.37), 1.23 (1.11-1.37), 1.34 (1.23-1.47), 1.41 (1.29-1.53), and 1.44 (1.33-1.56) at intervals of 3, 6, 12, 24, 36, and 48 months, respectively. Furthermore, the additional PSSA, in which patients who initiated a new treatment with benzodiazepines and anti-dementia drugs within 12- and 24-month periods were excluded from the analysis, demonstrated significant associations of benzodiazepine use with dementia risk. 
    Conclusion: Multi-methodological approaches using different methods, algorithms, and databases suggest
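
    The reporting odds ratio mentioned in this record is a disproportionality measure computed from a 2x2 table of spontaneous reports. A minimal sketch follows, assuming the usual Wald-type 95% interval on the log scale; the counts are hypothetical and are not drawn from FAERS or the Canada Vigilance database.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR with a Wald confidence interval.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    All four counts must be positive.
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


# Hypothetical counts for illustration only.
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=10, d=90)
print(round(ror, 2), round(lower, 2), round(upper, 2))
```

    A signal is conventionally considered present when the lower bound of the interval exceeds 1, as is the case for both databases in the results quoted above.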

  12. Event metadata records as a testbed for scalable data mining

    Science.gov (United States)

    van Gemmeren, P.; Malon, D.

    2010-04-01

    At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise "data mining," but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.

  13. Event metadata records as a testbed for scalable data mining

    Energy Technology Data Exchange (ETDEWEB)

    Gemmeren, P van; Malon, D [Argonne National Laboratory, Argonne, Illinois 60439 (United States)]

    2010-04-01

    At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.

  14. Public exposure to hazards associated with natural radioactivity in open-pit mining in Ghana.

    Science.gov (United States)

    Darko, E O; Faanu, A; Awudu, A R; Emi-Reynolds, G; Yeboah, J; Oppon, O C; Akaho, E H K

    2010-01-01

    The results of studies carried out on the public exposure contribution from naturally occurring radioactive materials (NORMS) in two open-pit mines in the Western and Ashanti regions of Ghana are reported. The studies were carried out under International Atomic Energy Agency-supported Technical Co-operation Project GHA/9/005. Measurements were made on samples of water, soil, ore, mine tailings and air using gamma spectrometry. Solid-state nuclear track detectors were used for radon concentration measurements. A survey was also carried out to determine the ambient gamma dose rate in the vicinity of the mines and surrounding areas. The effective doses due to external gamma irradiation, ingestion of water and inhalation of radon and ore dusts were calculated for the two mines. The average annual effective dose was found to be 0.30 +/- 0.06 mSv. The result was found to be within the levels published by other countries. The study provides useful information and data for establishing a comprehensive framework to investigate other mines and develop guidelines for monitoring and control of NORMS in the mining industry and the environment as a whole in Ghana.

  15. Some properties of evaluated implications used in knowledge-based systems and data-mining

    Directory of Open Access Journals (Sweden)

    Jiri Ivanek

    2012-07-01

    The core of expert knowledge is typically represented by a set of rules (implications) assigned with weights specifying their (un)certainties. The task of the inference mechanism in such rule-based expert systems can be analyzed from the many-valued (fuzzy) logic perspective. On the other hand, implicational relations between two Boolean attributes derived from data (association rules) are quantified in data-mining procedures by [0,1]-valued functions defined on four-fold tables corresponding to pairs of the attributes. In the paper, some theoretical properties connecting these two types of many-valued implications are presented. The results obtained can serve as a basis for an integration of data-mining procedures discovering association rules and rule-based knowledge systems.
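
    To make the notion of a [0,1]-valued function on a four-fold table concrete, the sketch below builds the table (a, b, c, d) for two Boolean attributes and evaluates two widely used quantifiers, confidence a/(a+b) and support a/(a+b+c+d). The abstract does not say which functions the paper studies, so these two are an assumption chosen for familiarity, and the attribute names and rows are hypothetical.

```python
def fourfold_table(rows, antecedent, succedent):
    """Count the four combinations of two Boolean attributes.

    a: antecedent true,  succedent true
    b: antecedent true,  succedent false
    c: antecedent false, succedent true
    d: antecedent false, succedent false
    """
    a = b = c = d = 0
    for row in rows:
        x, y = bool(row[antecedent]), bool(row[succedent])
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def confidence(a, b, c, d):
    """Share of antecedent rows that also satisfy the succedent."""
    return a / (a + b)


def support(a, b, c, d):
    """Share of all rows satisfying both attributes."""
    return a / (a + b + c + d)


# Hypothetical data: 3 rows of type a, 1 of type b, 1 of type c, 3 of type d.
rows = (
    [{"fever": True, "infection": True}] * 3
    + [{"fever": True, "infection": False}]
    + [{"fever": False, "infection": True}]
    + [{"fever": False, "infection": False}] * 3
)
table = fourfold_table(rows, "fever", "infection")
print(table, confidence(*table), support(*table))
```

    Both functions map every four-fold table with a + b > 0 into [0, 1], so their values can be read as the weight of the rule "fever implies infection" in the sense described in the abstract.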

  16. Sediment quality in the Guadalquivir estuary: sublethal effects associated with the Aznalcóllar mining spill.

    Science.gov (United States)

    Riba, I; González de Canales, M; Forja, J M; DelValls, T A

    2004-01-01

    As a complementary assessment of the impact of the Aznalcóllar mining spill on the Guadalquivir estuary, two different sediment toxicity tests using fish (Solea senegalensis) and clams (Scrobicularia plana) were performed. Histopathological alterations, recorded as lesions at 15 and 30 days in the gills, liver, gut and kidney of fish and at 14 days in the gills and gut of clams, were used to determine the adverse effects associated with the contaminants bound to sediments. The lesions measured in different tissues in both organisms show that the enrichment of heavy metals from the mining spill stressed some areas in the ecosystem of the estuary. Comparing these effects with the lethal effects detected in the same samples, using a multivariate analysis approach, makes it possible to identify the adverse effects associated with the accidental spill on the estuary. The incidence of the impact, located in specific areas of the estuary, shows an acute effect related to the spill.

  17. Regional scale selenium loading associated with surface coal mining, Elk Valley, British Columbia, Canada.

    Science.gov (United States)

    Wellen, Christopher C; Shatilla, Nadine J; Carey, Sean K

    2015-11-01

    Selenium (Se) concentrations in surface water downstream of surface mining operations have been reported at levels in excess of water quality guidelines for the protection of wildlife. Previous research in surface mining environments has focused on downstream water quality impacts, yet little is known about the fundamental controls on Se loading. This study investigated the relationship between mining practices, stream flows and Se concentrations using a SPAtially Referenced Regression On Watershed attributes (SPARROW) model. This work is part of an R&D program examining the influence of surface coal mining on hydrological and water quality responses in the Elk Valley, British Columbia, Canada, aimed at informing effective management responses. Results indicate that waste rock volume, a product of mining activity, accounted for roughly 80% of the Se load from the Elk Valley, while background sources accounted for roughly 13%. Wet years were characterized by more than twice the Se load of dry years. A number of variables regarding placement of waste rock within the catchments, length of buried streams, and the construction of rock drains did not significantly influence the Se load. The age of the waste rock, the proportion of waste rock surface reclaimed, and the ratio of waste rock pile side area to top area all varied inversely with the Se load from watersheds containing waste rock. These results suggest operational practices that are likely to reduce the release of Se to surface waters. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Phylogenetic Analysis of Microbial Populations Associated with Iron Cycling in the Piquette Mine in Tennyson, Wisconsin

    Science.gov (United States)

    Chan, C. S.; Skatvold, A. M.; Labrenz, M.; Welch, S. A.; Banfield, J. F.

    2001-12-01

    Neutrophilic iron-oxidizing microorganisms have attracted attention recently due to the geological significance of biological iron cycling and the potential for formation of iron oxide biosignatures. We have been studying an iron oxide-rich biofilm and associated groundwater located in a Mississippi Valley Type lead-zinc mine that was closed and allowed to flood about 30 years ago. The local environment appears to be at a boundary between oxic and anoxic waters. SEM and TEM analysis reveals that the bulk of the biofilm is composed of iron oxide-coated stalks and sheaths characteristic of Leptothrix and Gallionella. In addition, novel iron oxide filaments, some of which are associated with microorganisms, are found in parts of the biofilm and throughout the cloudy layer of water above. The volume of the biofilm (~20 cm thick over sections of the tunnel floor) suggests that microorganisms are able to form large deposits of iron oxides, such as those in the geologic record, over short time scales. The conditions at this site also suggest that iron oxidation is an important biological metabolism in the subsurface. In order to understand the diversity of iron oxidizers and their role in the microbial community, we performed a 16S rRNA phylogenetic analysis of DNA extracted from the iron oxide-rich biofilm and the surrounding water. Analysis shows that the diversity in the biofilm is high. 186 clones were screened: 90 from DNA amplified using universal primers and 96 using bacterial primers. In both cases, ~40% of the clones possessed distinct sequences (<98% similarity). These sequences are affiliated with a variety of taxonomic groups, notably Nitrospira and β-Proteobacteria (including close relatives of Gallionella ferruginea), but also Acidobacteria, Actinobacteria, Bacteroidetes, Planctomyces, and α-, δ- and γ-Proteobacteria. The dominant physiologies apparent in this clone library are iron oxidation, nitrification, and denitrification. In addition

  19. Compass: A hybrid method for clinical and biobank data mining

    DEFF Research Database (Denmark)

    Krysiak-Baltyn, Konrad; Petersen, Thomas Nordahl; Audouze, Karine Marie Laure

    2014-01-01

    We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods; Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply...... Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining; it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as “hotspots” for statistically...... significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical type questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we...

  20. Morphological variation in Djalmabatista (Diptera: chironomidae) associated with coal-strip-mine ponds

    Energy Technology Data Exchange (ETDEWEB)

    Tennessen, K.J.; Gottfried, P.K.

    1983-03-01

    Larvae of Djalmabatista pulcher (Johannsen) possessing abnormally shaped ligulas were found in abandoned coal strip mine ponds near Brilliant, Alabama. Of the total 1472 larvae examined, 50 possessed an abnormal ligula (3.4%), a frequency greater than previously reported for chironomids. Abnormalities were found in larval instars II, III, and IV. Based on monthly samples from June to October 1978, the combined frequency of abnormalities in three strip mine ponds (3.04%) was not significantly different from the frequency at the unmined reference site, Marion County Lake (1.85%) (P = 0.36). The frequencies of abnormalities were not significantly correlated with any measured water quality or biological parameter related to strip mine activity.

  1. Analysis of roof and pillar failure associated with weak floor at a limestone mine.

    Science.gov (United States)

    Murphy, Michael M; Ellenberger, John L; Esterhuizen, Gabriel S; Miller, Tim

    2016-05-01

    A limestone mine in Ohio has had instability problems that have led to massive roof falls extending to the surface. This study focuses on the role that weak, moisture-sensitive floor has in the instability issues. Previous NIOSH research related to this subject did not include analysis for weak floor or weak bands and recommended that when such issues arise they should be investigated further using a more advanced analysis. Therefore, to further investigate the observed instability occurring on a large scale at the Ohio mine, FLAC3D numerical models were employed to demonstrate the effect that a weak floor has on roof and pillar stability. This case study will provide important information to limestone mine operators regarding the impact of weak floor causing the potential for roof collapse, pillar failure, and subsequent subsidence of the ground surface.

  2. The HIPAA privacy rule and HR/benefits outsourcing: does the business associate label belong on your recordkeeper?

    Science.gov (United States)

    Hilger, Denise D

    2004-01-01

    Employers that sponsor group health plans and serve as the plan administrator of those plans are required by the HIPAA Privacy Rule to execute business associate contracts with vendors that provide services on behalf of the plans. The business associate contracts must contain many specific provisions regarding the protection, use and disclosure of health information. This article looks at the implications of imposing business associate contract obligations on an integrated HR and benefits-outsourcing recordkeeper and cautions employers against an overly broad application of the requirements.

  3. Molecular diversity of the methanotrophic bacteria communities associated with disused tin-mining ponds in Kampar, Perak, Malaysia.

    Science.gov (United States)

    Sow, S L S; Khoo, G; Chong, L K; Smith, T J; Harrison, P L; Ong, H K A

    2014-10-01

    In a previous study, notable differences of several physicochemical properties, as well as the community structure of ammonia oxidizing bacteria as judged by 16S rRNA gene analysis, were observed among several disused tin-mining ponds located in the town of Kampar, Malaysia. These variations were associated with the presence of aquatic vegetation as well as past secondary activities that occurred at the ponds. Here, methane oxidizing bacteria (MOB), which are direct participants in the nutrient cycles of aquatic environments and biological indicators of environmental variations, have been characterised via analysis of pmoA functional genes in the same environments. The MOB communities associated with disused tin-mining ponds that were exposed to varying secondary activities were examined in comparison to those in ponds that were left to nature. Comparing the sequence and phylogenetic analysis of the pmoA clone libraries at the different ponds (idle, lotus-cultivated and post-aquaculture), we found pmoA genes indicating the presence of type I and type II MOB at all study sites, but type Ib sequences affiliated with the Methylococcus/Methylocaldum lineage were most ubiquitous (46.7 % of clones). Based on rarefaction analysis and diversity indices, the disused mining pond with lotus culture was observed to harbor the highest richness of MOB. However, varying secondary activity or sample type did not show a strong variation in community patterns as compared to the ammonia oxidizers in our previous study.

  4. Soil Metals and Ectomycorrhizal Fungi Associated with American Chestnut Hybrids as Reclamation Trees on Formerly Coal Mined Land

    Directory of Open Access Journals (Sweden)

    J. M. Bauman

    2017-01-01

    Hybrid chestnut (Castanea dentata × C. mollissima) has the potential to provide a valuable agroforestry crop on formerly coal mined landscapes. However, the soil interactions of mycorrhizal fungi and buried metals associated with mining are not known. This study examined soil, plant tissue, and ectomycorrhizal (ECM) root colonization on eight-year-old hybrid (BC1F3 and BC2F3) and American chestnuts on a reclaimed coal mine in Ohio, USA. Chestnut trees were measured and ECM colonization on roots was quantified. Leaves, flowers, and soil were analyzed for heavy metals. Differences were not detected among tree types regarding metal accumulation in plant tissue or ECM colonization. BC2F3 hybrids had greater survival and fewer cankers than American chestnuts (P = 0.006 and <0.0001). Taller trees were associated with greater ECM root colonization and correlated with an increase in Al uptake (P = 0.02 and 0.01). When comparing tissue, manganese and aluminum were in higher concentrations in leaves than flowers, whereas copper and selenium were significantly higher in floral tissue (P < 0.05). All trees were flowering at this time, meriting further examination in nut tissue. Block effects for selenium and zinc indicate the variability in reclaimed soils, requiring further monitoring for possible elemental transfer to nut and wood tissue.

  5. Rule based classifier for the analysis of gene-gene and gene-environment interactions in genetic association studies

    Directory of Open Access Journals (Sweden)

    Lehr Thorsten

    2011-03-01

    Background: Several methods have been presented for the analysis of complex interactions between genetic polymorphisms and/or environmental factors. Despite the available methods, there is still a need for alternative methods, because no single method will perform well in all scenarios. The aim of this work was to evaluate the performance of three selected rule based classifier algorithms, RIPPER, RIDOR and PART, for the analysis of genetic association studies. Methods: Overall, 42 datasets were simulated with three different case-control models, a varying number of subjects (300, 600), SNPs (500, 1500, 3000) and noise (5%, 10%, 20%). The algorithms were applied to each of the datasets with a set of algorithm-specific settings. Results were further investigated with respect to (a) the Model, (b) the Rules, and (c) the Attribute level. Data analysis was performed using WEKA, SAS and PERL. Results: The RIPPER algorithm discovered the true case-control model at least once in >33% of the datasets. The RIDOR and PART algorithms performed poorly for model detection. The RIPPER, RIDOR and PART algorithms discovered the true case-control rules in more than 83%, 83% and 44% of the datasets, respectively. All three algorithms were able to detect the attributes utilized in the respective case-control models in most datasets. Conclusions: The current analyses substantiate the utility of rule based classifiers such as RIPPER, RIDOR and PART for the detection of gene-gene/gene-environment interactions in genetic association studies. These classifiers could provide a valuable new method, complementing existing approaches, in the analysis of genetic association studies. The methods provide an advantage in being able to handle both categorical and continuous variable types. Further, because the outputs of the analyses are easy to interpret, the rule based classifier approach could quickly generate testable hypotheses for additional evaluation. Since the algorithms are

  6. Identification of the plastic deformation characteristics of AL5052-O sheet based on the non-associated flow rule

    Science.gov (United States)

    Pham, Quoc Tuan; Kim, Young Suk

    2017-03-01

    This study aims to determine the plastic deformation characteristics of aluminum 5052-O based on the non-associated flow rule. To achieve this goal, a new strain hardening model, named the Kim-Tuan hardening model, is proposed to describe the stress-strain relation of the studied material in the uniaxial tensile test and to predict the material's post-necking behavior. Additionally, the plastic behaviors of the AL5052-O sheet are described by two approaches: the associated flow rule with the YLD2000-2d yield function and the non-associated flow rule with Hill's quadratic function (NAFR-Hill48). The parameters of these functions were derived from the material properties obtained from uniaxial tensile tests and a bulge test. The flow curve based on the Kim-Tuan model and the plastic behaviors obtained from the two above-mentioned approaches were imported into a finite element analysis code to simulate the hydraulic bulge test for this material and to confirm the precision of the material characteristics obtained before. The simulation results based on the NAFR-Hill48 match well with the experimental results of the bulge test, while the YLD2000-2d provides highly accurate predictions for the anisotropy of this material.

  7. Plastic anisotropic constitutive equation based on stress-rate dependency related with non-associated flow rule for bifurcation analysis

    Science.gov (United States)

    Oya, T.; Yanagimoto, J.; Ito, K.; Uemura, G.; Mori, N.

    2017-09-01

    In metal forming, progress in material models is required to construct a general and reliable fracture prediction framework because of the increased use of advanced materials and growing demand for higher prediction accuracy. In this study, a fracture prediction framework based on bifurcation theory is constructed. A novel material model based on the stress-rate dependence related to a non-associated flow rule is presented. This model is based on a non-associated flow rule with an arbitrary higher-order yield function and a plastic potential function for any anisotropic material. This formulation is combined with the stress-rate-dependent plastic constitutive equation, which is known as the Ito-Goya rule, to construct a generalized plastic constitutive model in which non-normality and non-associativity are reasonably included. Then, by adopting three-dimensional bifurcation theory, which is referred to as the 3D theory, a new theoretical framework for fracture prediction based on the initiation of a shear band is constructed. Using virtual material data, a numerical simulation is carried out to produce a fracture limit diagram, which is used to investigate the characteristics of the proposed methodology.

  8. Spatial-temporal analysis and projection of extreme particulate matter (PM10 and PM2.5) levels using association rules: A case study of the Jing-Jin-Ji region, China

    Science.gov (United States)

    Qin, Shanshan; Liu, Feng; Wang, Chen; Song, Yiliao; Qu, Jiansheng

    2015-11-01

    The Jing-Jin-Ji region of Northern China has experienced serious extreme PM concentrations, which could exert considerable negative impacts on human health. However, only a few studies have focused on extreme PM concentrations. Therefore, joint regional PM research and air pollution control has become an urgent issue in this region. To characterize PM pollution, PM10 and PM2.5 hourly samples were collected from 13 cities in the Jing-Jin-Ji region for one year. This study initially analyzed extreme PM data using the Apriori algorithm to mine quantitative association rules in PM spatial and temporal variations and intercity influences. The results indicate that 1) the association rules of intercity PM are distinctive and do not completely rely on their spatial distributions; 2) extreme PM concentrations frequently occur in southern cities, presenting stronger spatial and temporal associations than in northern cities; 3) the spatial and temporal associations of intercity PM2.5 are stronger than those of intercity PM10.
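
    As a rough illustration of the Apriori step named in this abstract, the sketch below mines frequent itemsets and derives rules with their support and confidence. It is a minimal, generic version and not the authors' implementation. The function names (`apriori`, `derive_rules`), the thresholds, and the placeholder transactions (each set lists hypothetical cities that recorded an extreme PM value in the same hour) are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose relative frequency meets the threshold.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def derive_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical data: cities with an extreme PM reading in the same hour.
hours = [
    {"CityA", "CityB"},
    {"CityA", "CityB", "CityC"},
    {"CityA"},
    {"CityB", "CityC"},
    {"CityA", "CityB"},
]
for antecedent, consequent, support, confidence in derive_rules(
    apriori(hours, min_support=0.4), min_confidence=0.7
):
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

    Quantitative rules of the kind the abstract mentions would first require discretizing the hourly concentrations into categories; that step is omitted here.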

  9. Beating Obesity: Factors Associated with Interest in Workplace Weight Management Assistance in the Mining Industry

    Directory of Open Access Journals (Sweden)

    Tamara D. Street

    2017-03-01

    Conclusion: Weight management programs should provide information, motivation, and troubleshooting assistance to meet the needs of at-risk mining employees, including those who are attempting to change and maintain behaviors to achieve a healthy weight and be suitably fit for work.

  10. Decision rules and associated sample size planning for regional approval utilizing multiregional clinical trials.

    Science.gov (United States)

    Chen, Xiaoyuan; Lu, Nelson; Nair, Rajesh; Xu, Yunling; Kang, Cailian; Huang, Qin; Li, Ning; Chen, Hongzhuan

    2012-09-01

    Multiregional clinical trials provide the potential to make safe and effective medical products simultaneously available to patients globally. As regulatory decisions are always made in a local context, this poses huge regulatory challenges. In this article we propose two conditional decision rules that can be used for medical product approval by local regulatory agencies based on the results of a multiregional clinical trial. We also illustrate sample size planning for such trials.

  11. Displaced rocks, strong motion, and the mechanics of shallow faulting associated with the 1999 Hector Mine, California, earthquake

    Science.gov (United States)

    Michael, Andrew J.; Ross, Stephanie L.; Stenner, Heidi D.

    2002-01-01

    The paucity of strong-motion stations near the 1999 Hector Mine earthquake makes it impossible to make instrumental studies of key questions about near-fault strong-motion patterns associated with this event. However, observations of displaced rocks allow a qualitative investigation of these problems. By observing the slope of the desert surface and the frictional coefficient between these rocks and the desert surface, we estimate the minimum horizontal acceleration needed to displace the rocks. Combining this information with observations of how many rocks were displaced in different areas near the fault, we infer the level of shaking. Given current empirical shaking attenuation relationships, the number of rocks that moved is slightly lower than expected; this implies that slightly lower than expected shaking occurred during the Hector Mine earthquake. Perhaps more importantly, stretches of the fault with 4 m of total displacement at the surface displaced few nearby rocks on 15° slopes, suggesting that the horizontal accelerations were below 0.2g within meters of the fault scarp. This low level of shaking suggests that the shallow parts of this rupture did not produce strong accelerations. Finally, we did not observe an increased incidence of displaced rocks along the fault zone itself. This suggests that, despite observations of fault-zone-trapped waves generated by aftershocks of the Hector Mine earthquake, such waves were not an important factor in controlling peak ground acceleration during the mainshock.
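
    The abstract does not give the formula used to turn slope and friction into a minimum acceleration. One common simplification, assumed here and not necessarily the authors' method, treats each rock as a rigid block on a slope of angle theta with Coulomb friction coefficient mu. Balancing the downslope driving force against friction capacity gives a/g = (mu cos(theta) - sin(theta)) / (cos(theta) + mu sin(theta)). The function name and the friction value below are hypothetical.

```python
import math


def min_horizontal_acceleration(mu, slope_deg):
    """Horizontal acceleration (in units of g) at which a rigid block
    starts to slide downslope under a static Coulomb-friction model."""
    theta = math.radians(slope_deg)
    # Driving force along the slope:  g*sin(theta) + a*cos(theta)
    # Available friction:             mu * (g*cos(theta) - a*sin(theta))
    # Equating the two and solving for a/g:
    return (mu * math.cos(theta) - math.sin(theta)) / (
        math.cos(theta) + mu * math.sin(theta)
    )


# mu = 0.5 is a made-up illustrative value, not taken from the paper;
# the 15 degree slope is the one quoted in the abstract.
print(round(min_horizontal_acceleration(0.5, 15.0), 3))
```

    On flat ground the expression reduces to a/g = mu, and steeper slopes lower the threshold, which is why rocks that stayed put on sloping surfaces place an upper bound on the shaking.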

  12. Spatial distribution of environmental risk associated to a uranium abandoned mine (Central Portugal)

    Science.gov (United States)

    Antunes, I. M.; Ribeiro, A. F.

    2012-04-01

    The abandoned uranium mine of Canto do Lagar is located at Arcozelo da Serra, central Portugal. The mine was exploited in an open pit and produced about 12,430 kg of uranium oxide (U3O8) between 1987 and 1988. The dominant geological unit is the porphyritic coarse-grained two-mica granite, with biotite>muscovite. The uranium deposit consists of two gaps crushing, parallel to the coarse-grained porphyritic granite, with average direction N30°E, silicified, sericitized and reddish jasperized, with a width of approximately 10 meters. These gaps are accompanied by two thin veins of white quartz, 70°-80° WNW, ferruginous and jasperized with chalcedony, red jasper and opal. These veins are about 6 meters away from each other. They contain secondary U-phosphate phases such as autunite and torbernite. Rejected materials (1,000,000 t) were deposited on two dumps and a lake was formed in the open pit. To assess the environmental risk of the abandoned uranium mine of Canto do Lagar, 70 samples of stream sediments, soils and mine tailings materials were collected and analysed. The relationships between sample compositions were tested using Principal Components Analysis (PCA) (multivariate analysis), and their spatial distribution was assessed using indicator kriging. The spatial distribution of stream sediments shows that the probability of expression for principal component 1 (explaining Y, Zr, Nb, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Hf, Th and U contents) decreases along the SE-NW direction. This component is explained by the samples located inside mine influence. The probability of expression for principal component 2 (explaining Be, Na, Al, Si, P, K, Ca, Ti, Mn, Fe, Co, Ni, Cu, As, Rb, Sr, Mo, Cs, Ba, Tl and Bi contents) increases to middle stream line. This component is explained by the samples located outside mine influence. The spatial distribution of soils shows that the probability of expression for principal component 1 (explaining Mg, P, Ca, Ge, Sr, Y, Zr, La, Ce, Pr

  13. The spatial decision-supporting system combination of RBR & CBR based on artificial neural network and association rules

    Science.gov (United States)

    Tian, Yangge; Bian, Fuling

    2007-06-01

    Artificial intelligence technology should be introduced on top of the geographic information system to build a spatial decision-supporting system (SDSS). The paper discusses the structure of an SDSS and, after comparing the characteristics of RBR and CBR, proposes a framework for a spatial decision system that combines the two and the advantages of both. It also discusses CBR in agricultural spatial decisions, the application of an ANN (Artificial Neural Network) in CBR, and the enrichment of the inference rule base using association rules. The design of the system is tested and verified using examples of crop adaptability evaluation.

  14. Perceived rules and accessibility: measurement and mediating role in the association between parental education and vegetable and soft drink intake.

    Science.gov (United States)

    Gebremariam, Mekdes K; Lien, Nanna; Torheim, Liv Elin; Andersen, Lene F; Melbye, Elisabeth L; Glavin, Kari; Hausken, Solveig E S; Sleddens, Ester F C; Bjelland, Mona

    2016-08-17

    The existence of socioeconomic differences in dietary behaviors is well documented. However, studies exploring the mechanisms behind these differences among adolescents using comprehensive and reliable measures of mediators are lacking. The aims of this study were (a) to assess the psychometric properties of new scales assessing the perceived rules and accessibility related to the consumption of vegetables and soft drinks and (b) to explore their mediating role in the association between parental education and the corresponding dietary behaviors. A cross-sectional survey including 440 adolescents from three counties in Norway (mean age 14.3 years (SD = 0.6)) was conducted using a web-based questionnaire. Principal component analysis, test-retest and internal reliability analysis were conducted. The mediating role of perceived accessibility and perceived rules in the association between parental education and the dietary behaviors was explored using linear regression analyses. Factor analyses confirmed two separate subscales, named "accessibility" and "rules", both for vegetables and soft drinks (factor loadings >0.60). The scales had good internal consistency reliability (0.70-0.87). The test-retest reliability of the scales was moderate to good (0.44-0.62). Parental education was inversely related to the consumption of soft drinks and positively related to the consumption of vegetables. Perceived accessibility and perceived rules related to soft drink consumption were found to mediate the association between parental education and soft drink consumption (47.5 and 8.5 % of total effect mediated). Accessibility of vegetables was found to mediate the association between parental education and the consumption of vegetables (51 % of total effect mediated). The new scales developed in this study are comprehensive and have adequate validity and reliability; they are therefore considered appropriate for use among 13-15 year-olds. Parents, in particular those with a

  15. Socioeconomic inequality of cancer mortality in the United States: a spatial data mining approach

    Directory of Open Access Journals (Sweden)

    Lam Nina SN

    2006-02-01

    Background: The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988–1992 cancer mortality rates for colorectal, lung, breast, and prostate cancers defined at the Health Service Area level and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data, which were defined at different spatial resolutions, and to visualize and analyze the results from the association rule mining process. Results: Health Service Areas with high rates of low education, high unemployment, and low paying jobs were found to associate with higher rates of cancer mortality. Conclusion: Association rule mining with geographic information technology helps reveal the spatial patterns of socioeconomic inequality in cancer mortality in the United States and identify regions that need further attention.

  16. Data mining in radiology.

    Science.gov (United States)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-04-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining.

  17. TsPU control apparatus and experience associated with its operation in coal mines. [USSR

    Energy Technology Data Exchange (ETDEWEB)

    Magilat, G.I.; Serebrov, L.M.; Sirotkin, A.A.

    1982-05-01

    This paper presents the TsPU remote control system developed by Donavtomatgormash and produced on a commercial scale by the Makeevka Mine Automation Plant. The TsPU system is used for remote control of cutter loaders and face conveyors at longwall faces in level or inclined coal seams characterized by methane hazard or coal dust explosion hazard. An electrical scheme of the control system is shown. The position of the TsPU control system in relation to the control system of the cutter loader starter is shown in a further scheme. In comparison to the AUS remote control system, the TsPU has a simplified design. Its operation is also more reliable. Performance testing of the TsPU control system carried out in some Donbass mines is evaluated. Tests show that the TsPU guarantees high reliability under conditions of methane or coal dust hazard and reduces the number of machine failures and the idle time of face equipment. Weak points of the system discovered during performance testing have been taken into account in the design of the SPK-A, K10Z-A, and KD-80 remote control systems being developed for thin and steep coal mines.

  18. Uncovering Hospitalists’ Information Needs from Outside Healthcare Facilities in the Context of Health Information Exchange Using Association Rule Learning

    Science.gov (United States)

    Mora, E.; Gemmani, M.; Zayas-Castro, J.

    2015-01-01

    Summary. Background: Important barriers to health information exchange (HIE) adoption are clinical workflow disruptions and troubles with the system interface. Prior research suggests that HIE interfaces providing faster access to useful information may stimulate use and reduce barriers for adoption; however, little is known about informational needs of hospitalists. Objective: To study the association between patient health problems and the type of information requested from outside healthcare providers by hospitalists of a tertiary care hospital. Methods: We searched operational data associated with fax-based exchange of patient information (previous HIE implementation) between hospitalists of an internal medicine department in a large urban tertiary care hospital in Florida, and any other affiliated and unaffiliated healthcare provider. All hospitalizations from October 2011 to March 2014 were included in the search. Strong association rules between health problems and types of information requested during each hospitalization were discovered using the Apriori algorithm and were then validated by a team of hospitalists of the same department. Results: Only 13.7% (2 089 out of 15 230) of the hospitalizations generated at least one request of patient information to other providers. The transactional data showed that 20 strong association rules exist between specific health problems and types of information. Among the 20 rules, for example, abdominal pain, chest pain, and anaemia patients are highly likely to have medical records and outside imaging results requested. Other health conditions prone to have records requested were lower urinary tract infection and back pain.
Conclusions: The presented list of strong co-occurrence of health problems and types of information requested by hospitalists from outside healthcare providers not only informs the implementation and design of HIE, but also helps to target future research on the impact of having access to outside

  19. Uncovering Hospitalists' Information Needs from Outside Healthcare Facilities in the Context of Health Information Exchange Using Association Rule Learning.

    Science.gov (United States)

    Martinez, D A; Mora, E; Gemmani, M; Zayas-Castro, J

    2015-01-01

    Important barriers to health information exchange (HIE) adoption are clinical workflow disruptions and troubles with the system interface. Prior research suggests that HIE interfaces providing faster access to useful information may stimulate use and reduce barriers for adoption; however, little is known about informational needs of hospitalists. The objective was to study the association between patient health problems and the type of information requested from outside healthcare providers by hospitalists of a tertiary care hospital. We searched operational data associated with fax-based exchange of patient information (previous HIE implementation) between hospitalists of an internal medicine department in a large urban tertiary care hospital in Florida, and any other affiliated and unaffiliated healthcare provider. All hospitalizations from October 2011 to March 2014 were included in the search. Strong association rules between health problems and types of information requested during each hospitalization were discovered using the Apriori algorithm and were then validated by a team of hospitalists of the same department. Only 13.7% (2 089 out of 15 230) of the hospitalizations generated at least one request of patient information to other providers. The transactional data showed that 20 strong association rules exist between specific health problems and types of information. Among the 20 rules, for example, abdominal pain, chest pain, and anaemia patients are highly likely to have medical records and outside imaging results requested. Other health conditions prone to have records requested were lower urinary tract infection and back pain. The presented list of strong co-occurrence of health problems and types of information requested by hospitalists from outside healthcare providers not only informs the implementation and design of HIE, but also helps to target future research on the impact of having access to outside information for specific patient cohorts. Our data

  20. Variation in diel activity of ground beetles (Coleoptera: Carabidae) associated with a soybean field and coal mine remnant

    Science.gov (United States)

    Willand, J.E.; McCravy, K.W.

    2006-01-01

    Diel activities of carabids (Coleoptera: Carabidae) associated with a coal mine remnant and surrounding soybean field were studied in west-central Illinois from June through October 2002. A total of 1,402 carabids, representing 29 species and 17 genera, were collected using pitfall traps. Poecilus chalcites (Say) demonstrated roughly equal diurnal and nocturnal activity in June, but greater diurnal activity thereafter. Pterostichus permundus (Say), Cyclotrachelus seximpressus (LeConte), Amara obesa (Say), and Scarites quadriceps Chaudoir showed significant nocturnal activity. Associations between habitat and diel activity were found for three species: P. chalcites associated with the remnant and edge habitats showed greater diurnal activity than those associated with the soybean field; C. seximpressus was most active diurnally in the remnant, and Harpalus pensylvanicus (DeGeer) showed the greatest nocturnal activity in the remnant and edge habitats. We found significant temporal and habitat-related variation in diel activity among carabid species inhabiting agricultural areas in west-central Illinois.

  1. A new algorithm to extract hidden rules of gastric cancer data based on ontology.

    Science.gov (United States)

    Mahmoodi, Seyed Abbas; Mirzaie, Kamal; Mahmoudi, Seyed Mostafa

    2016-01-01

    Cancer is the leading cause of death in economically developed countries and the second leading cause of death in developing countries. Gastric cancers are among the most devastating and incurable forms of cancer and their treatment may be excessively complex and costly. Data mining, a technology that is used to produce analytically useful information, has been employed successfully with medical data. Although traditional data mining techniques such as association rules help to extract knowledge from large data sets, the number of rules obtained from a data set is sometimes so large that it becomes a major problem. In fact, one of the disadvantages of this technique is that it produces many meaningless and redundant rules because it pays no attention to the concept and meaning of the items or the samples. This paper presents a new method to discover association rules using ontology to solve these problems. It reports ontology-based data mining on a medical database containing clinical data on patients referred to the Imam Reza Hospital at Tabriz. The data set used in this paper was gathered from 490 random visitors to the Imam Reza Hospital at Tabriz who had been suspected of having gastric cancer. The proposed ontology-based data mining algorithm makes rules more intuitive, appealing and understandable, eliminates wasteful and useless rules, and, as a minor result, significantly reduces the running time of the Apriori algorithm. The experimental results confirm the efficiency and advantages of this algorithm.

  2. Mercury Contamination and Biogeochemical Cycling Associated with the Historic Idrija Mining Area of Slovenia

    Science.gov (United States)

    Hines, M. E.; Bonzongo, J. J.; Barkay, T.; Horvat, M.; Faganeli, J.

    2001-12-01

    The Idrija Mine is the second largest Hg mine in the world, which operated for 500 years before recently closing. More than five million tons of ore were mined with only 73% recovered. Hg-laden tailings still line the banks. Exhausts from stacks and mineshafts caused elevated levels of airborne Hg, most of which was deposited in the Idrija basin leading to elevated Hg levels in surficial soils. Hg is continually being transported downstream with approximately 1,500 kg per year entering the northern Adriatic Sea 100 km away. Multidisciplinary studies were conducted on samples collected throughout the Idrija and Soca River systems and waters and sediments in the Gulf of Trieste including Hg speciation, Hg transformation activities in sediments and soils, and the presence and expression of bacterial Hg resistance (mer) genes. Total Hg in the Idrija River increased from 300 ng/L with MeHg accounting for about 0.5%. Concentrations decreased downstream, but increased again in the Soca River and in the estuary with MeHg accounting for nearly 1.5% of the total. However, while bacteria upstream of the mine did not contain mer genes, such genes were detected in bacteria collected downstream for nearly 40 km, and these genes were transcribed. Total Hg levels decreased offshore, but values over 30 ng/L were noted in bottom waters. MeHg concentrations in the Gulf were highest in bottom waters. Sediments near the river mouth contained 40 micro-g/g total Hg with MeHg concentrations of about 3 ng/g. Sediments several km into the Gulf contained 50-fold less total Hg but only 10-fold less MeHg that decreased with depth in the sediment. Hg in sediment pore waters varied between 1 and 8 ng/L, with MeHg accounting for about 30%. Hg methylation and MeHg demethylation were active in Gulf sediments with highest activities near the surface. MeHg was degraded by an oxidative pathway with >97% of the C released from MeHg as carbon dioxide. 
Hg methylation depth profiles resembled profiles of

  3. An application of data mining in district heating substations for improving energy performance

    Science.gov (United States)

    Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing

    2017-11-01

    Automatic meter reading system is capable of collecting and storing a huge number of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology to discover potential interesting knowledge from vast data. This paper applies data mining methods to analyse the massive data for improving energy performance of DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Two-heating-season data of a substation are used for case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in remote flow meter installed at secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extrapolate potential useful knowledge to better understand substation operation strategies and improve substation energy performance.

  4. An application of data mining in district heating substations for improving energy performance

    Directory of Open Access Journals (Sweden)

    Xue Puning

    2017-01-01

    Automatic meter reading system is capable of collecting and storing a huge number of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology to discover potential interesting knowledge from vast data. This paper applies data mining methods to analyse the massive data for improving energy performance of DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Two-heating-season data of a substation are used for case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in remote flow meter installed at secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extrapolate potential useful knowledge to better understand substation operation strategies and improve substation energy performance.

  5. Improving Prediction Accuracy of “Central Line-Associated Blood Stream Infections” Using Data Mining Models

    Directory of Open Access Journals (Sweden)

    Amin Y. Noaman

    2017-01-01

    Prediction of nosocomial infections among patients is an important part of clinical surveillance programs to enable the related personnel to take preventive actions in advance. Designing a clinical surveillance program with the capability of predicting nosocomial infections is a challenging task for several reasons, including high dimensionality of medical data, heterogeneous data representation, and the special knowledge required to extract patterns for prediction. In this paper, we present details of six data mining methods implemented using the cross industry standard process for data mining to predict central line-associated blood stream infections. For our study, we selected datasets of healthcare-associated infections from the US National Healthcare Safety Network and consumer survey data from the Hospital Consumer Assessment of Healthcare Providers and Systems. Our experiments show that central line-associated blood stream infections (CLABSIs) can be successfully predicted using the AdaBoost method with an accuracy of up to 89.7%. This will help in implementing effective clinical surveillance programs for infection control, as well as in improving the detection accuracy of CLABSIs. It also reduces the cost of patients' hospital stays and maintains patients' safety.

  6. 78 FR 48591 - Refuge Alternatives for Underground Coal Mines

    Science.gov (United States)

    2013-08-08

    ... Refuge Alternatives for Underground Coal Mines; Proposed Rules Federal Register / Vol. 78, No. 153... 30 CFR Part 75 RIN 1219-AB84 Refuge Alternatives for Underground Coal Mines AGENCY: Mine Safety and... alternatives in underground coal mines. The U.S. Court of Appeals for the District of Columbia Circuit remanded...

  7. VICKEY: Mining Conditional Keys on Knowledge Bases

    DEFF Research Database (Denmark)

    Symeonidou, Danai; Prado, Luis Antonio Galarraga Del; Pernelle, Nathalie

    2017-01-01

    A conditional key is a key constraint that is valid in only a part of the data. In this paper, we show how such keys can be mined automatically on large knowledge bases (KBs). For this, we combine techniques from key mining with techniques from rule mining. We show that our method can scale to KBs of millions of facts. We also show that the conditional keys we mine can improve the quality of entity linking by up to 47 percentage points.
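
    To make the definition in the first sentence concrete, the sketch below checks whether a set of attributes identifies records uniquely, either over all of the data (an ordinary key) or only over the records that satisfy a condition (a conditional key). It is a simplified tabular illustration and not the VICKEY algorithm: knowledge bases have multi-valued and missing properties that this ignores, and the function names, attribute names and records are hypothetical.

```python
def is_key(rows, attrs):
    """True if no two rows share the same values on all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_conditional_key(rows, condition, attrs):
    """True if `attrs` is a key among the rows matching `condition`,
    where `condition` maps attribute names to required values."""
    matching = [
        row for row in rows
        if all(row.get(k) == v for k, v in condition.items())
    ]
    return is_key(matching, attrs)


# Hypothetical records: surnames repeat overall, but not within country "X".
people = [
    {"country": "X", "surname": "Alpha"},
    {"country": "X", "surname": "Beta"},
    {"country": "Y", "surname": "Alpha"},
    {"country": "Y", "surname": "Alpha"},
]
print(is_key(people, ["surname"]))                               # not a key overall
print(is_conditional_key(people, {"country": "X"}, ["surname"]))  # key when country = X
```

    Mining such keys automatically means searching over candidate conditions and attribute sets, which is where the abstract says techniques from rule mining come in.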

  8. In Situ Generated Colloid Transport of Cu and Zn in Reclaimed Mine Soil Profiles Associated with Biosolids Application

    Directory of Open Access Journals (Sweden)

    Jarrod O. Miller

    2011-01-01

    Full Text Available Areas reclaimed for agricultural uses following coal mining often receive biosolids applications to increase organic matter and fertility. Transport of heavy metals within these soils may be enhanced by the additional presence of biosolids colloids. Intact monoliths from reclaimed and undisturbed soils in Virginia and Kentucky were leached to observe Cu and Zn mobility with and without biosolids application. Transport of Cu and Zn was observed in both solution and colloid associated phases in reclaimed and undisturbed forest soils, where the presence of unweathered spoil material and biosolids amendments contributed to higher metal release in solution fractions. Up to 81% of mobile Cu was associated with the colloid fraction, particularly when gibbsite was present, while only up to 18% of mobile Zn was associated with the colloid fraction. The colloid bound Cu was exchangeable by ammonium acetate, suggesting that it will release into groundwater resources.

  9. Associate editors' foreword: entrepreneurship in health education and health promotion: five cardinal rules.

    Science.gov (United States)

    Cottrell, Randall R; Cooper, Hanna

    2009-07-01

    A career in health education or health promotion (HE/HP) can be developed in many ways. In past editions of this department, career development has been discussed in relation to distance (Balonna, 2001), consulting (Bookbinder, 2001), certifications (Hayden, 2005), graduate school (Cottrell & Hayden, 2007), and many other topics. This article looks at a less traditional means of career development-entrepreneurship. Health education is a field ripe with opportunities for consulting and for selling health-related products and services. Entrepreneurship can not only create financial rewards but can also provide high visibility and networking contacts that can advance one's career. This article combines both theory and practical applications to assist readers in developing entrepreneurial activities. The authors are experienced in entrepreneurial development and use that expertise to provide relevant examples and develop a framework using "five cardinal rules" for establishing an entrepreneurial enterprise in HE/HP.

  10. Hemodialysis Key Features Mining and Patients Clustering Technologies

    Directory of Open Access Journals (Sweden)

    Tzu-Chuen Lu

    2012-01-01

    Full Text Available The kidneys are vital organs. Failing kidneys lose their ability to filter out waste products, resulting in kidney disease. To extend or save the lives of patients with impaired kidney function, kidney replacement therapy, such as hemodialysis, is typically used. This work uses an entropy function to identify key features related to hemodialysis. By identifying these key features, one can determine whether a patient requires hemodialysis. This work uses these key features as dimensions in cluster analysis. The proposed data mining scheme finds association rules for each cluster. Hidden rules for the causes of kidney disease can therefore be identified. The contributions and key points of this paper are as follows. (1) This paper finds key features that can be used to predict which patients have a high probability of requiring hemodialysis. (2) The proposed scheme applies the k-means clustering algorithm with the key features to categorize the patients. (3) A data mining technique is used to find the association rules from each cluster. (4) The mined rules can be used to determine whether a patient requires hemodialysis.

  11. Evaluating Learning Algorithms to Support Human Rule Evaluation Based on Objective Rule Evaluation Indices

    Directory of Open Access Journals (Sweden)

    H Abe

    2007-05-01

    Full Text Available In this paper, we present an evaluation of learning algorithms of a novel rule evaluation support method for post-processing of mined results with rule evaluation models based on objective indices. Post-processing of mined results is one of the key processes in a data mining process. However, it is difficult for human experts to completely evaluate several thousands of rules from a large dataset with noise. To reduce the costs in such rule evaluation task, we have developed a rule evaluation support method with rule evaluation models that learn from a dataset. This dataset comprises objective indices for mined classification rules and evaluation by a human expert for each rule. To evaluate performances of learning algorithms for constructing the rule evaluation models, we have done a case study on the meningitis data mining as an actual problem. Furthermore, we have also evaluated our method with ten rule sets obtained from ten UCI datasets. With regard to these results, we show the availability of our rule evaluation support method for human experts.

  12. Modeling of Erosion on Jelateng Watershed Using USLE Method, Associated with an Illegal Mining Activities (PETI)

    Science.gov (United States)

    Ananda, I. N.; Aswari, F. V.; Narmaningrum, D. A.; Nugraha, A. S. A.; Asidiqi, M. A. A.; Setiawan, Y.

    2016-11-01

    The Indonesian archipelago has abundant mineral resources, which has led to extensive mining activity. Mineral resources are natural resources that cannot be renewed. An abandoned mining pit leaves a hole in the land surface and increases the erosion severity level in the rainy season. This erosion carries sediment to the sea and damages the coastal ecosystem. Erosion modeling in the Jelateng watershed was performed temporally using remote sensing image data, consisting of Landsat 5 (1995) and Landsat 8 (2015), supported by field data. The erosion parameters were rasterized and supplied as inputs for the USLE erosion model in the IDRISI software. The results show that in 1995 the majority of the area had a low level of erosion. The low erosion rate was less than 183.67 tons/hectare/year and the high erosion rate was 408.34 up to 633 tons/hectare/year. By comparison, the 2015 erosion model shows that erosion is most prevalent in the upstream area of the Jelateng watershed, with a low erosion rate of less than 432.2 tons/hectare/year and a high erosion rate of 615.64 up to 1448.31 tons/hectare/year.
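
    For reference, the USLE (Universal Soil Loss Equation) named in this abstract is conventionally written as a product of factors. The symbols below follow the standard textbook formulation and are not taken from the paper itself; applying the product cell by cell to rasterized factor layers is an assumption about the workflow, not something the abstract states.

```latex
% Standard USLE form (textbook notation, not quoted from the paper)
A = R \cdot K \cdot LS \cdot C \cdot P
```

    Here A is the estimated average annual soil loss per unit area, R the rainfall erosivity factor, K the soil erodibility factor, LS the combined slope length and steepness factor, C the cover management factor, and P the support practice factor.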

  13. Heavy metals mobility associated with the molybdenum mining-concentration complex in the Buryatia Republic, Russia.

    Science.gov (United States)

    Sarapulova, Angelina; Dampilova, Bayarma V; Bardamova, Irina; Doroshkevich, Svetlana G; Smirnova, Olga

    2017-04-01

    Mining of the Dzhida ore deposits in Russia has caused the formation of a large tailings dam with technogenic sands and contamination of nearby district soils. Geochemical fractions of technogenic sands were divided by a sequential extraction procedure. The sampling points with maximum concentrations of Pb, Cu, and Zn were selected for investigation of heavy metal mobility. Two previously described methods of heavy metal fractionation using selective extraction were applied: a procedure developed by the Community Bureau of Reference of the Commission of the European Communities (BCR procedure) and Tessier's fractionation scheme. Despite some differences in Pb extractions, the two procedures describe equally well the distribution of heavy metals among geochemical fractions. The BCR procedure was chosen as a fast method for estimating mobile forms of heavy metals. For the mining site considered, the heavy metal mobility sequence differs between the soils (Zn > Cu > Pb) and the technogenic sands (Pb > Zn > Cu).

  14. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia

    DEFF Research Database (Denmark)

    Chen, X; Lee, G; Maher, B S

    2011-01-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed...... bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE...... in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11 380 cases and 15 021 controls), we...

  15. Identifying Engineering Students' English Sentence Reading Comprehension Errors: Applying a Data Mining Technique

    Science.gov (United States)

    Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon

    2016-01-01

    The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…

  16. Mine soils associated with open-cast coal mining in Spain: a review; Suelos mineros asociados a la mineria de carbon a cielo abierto en Espana: una revision

    Energy Technology Data Exchange (ETDEWEB)

    Arranz-Gonzalez, J. C.

    2011-07-01

    The different situations that may be found after the closure of coal mines range from the simple abandonment of pits and spoil tips to areas where reclamation work has led to the creation of artificial soils on a reconstituted surface composed of layers of rock and soil or both types of material. Soils of this type are known as mine soils, amongst which those generated by coal mining have been studied most extensively, both to assess their potential for reclamation and to learn more about their pedogenetic evolution. We present here a review of some of the more important works devoted to this subject. We have found evidence to show that in Spain, just as in other countries, the physical and chemical properties of these anthropogenic soils are changing rapidly and so the mine-soil profiles described can be considered as belonging to very young soils still undergoing incipient but rapid development. We have also found that an analysis of information obtained from the soil parameters of surface samples and its interpretation is of great practical use in restoration processes. Nevertheless, the sampling and description of soil profiles has proved to be of much greater interest, allowing us to reach a clearer understanding of the internal processes and properties that are unique to these types of anthropogenic soil. (Author) 64 refs.

  17. Longwall mining

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1995-03-14

    As part of EIA's program to provide information on coal, this report, Longwall Mining, describes longwall mining and compares it with other underground mining methods. Using data from EIA and private sector surveys, the report describes major changes in the geologic, technological, and operating characteristics of longwall mining over the past decade. Most important, the report shows how these changes led to dramatic improvements in longwall mining productivity. For readers interested in the history of longwall mining and greater detail on recent developments affecting longwall mining, the report includes a bibliography.

  18. Association between Pain in Adolescence and Low Back Pain in Adulthood: Studying a Cohort of Mine Workers

    Directory of Open Access Journals (Sweden)

    David Jonsson

    2017-01-01

    Full Text Available Purpose. To study the association of self-reported pain in adolescence with low back pain (LBP) in adulthood among mine workers and also to study associations between the presence of LBP over 12 months, or one-month LBP intensity, reported during a health examination and daily ratings of LBP three and nine months later. Methods. Mixed design with data collected retrospectively, cross-sectionally, and prospectively. Data were collected using a questionnaire during a health examination and by using self-reported daily ratings of LBP three and nine months after the examination. Results. Pain prevalence during teenage years was 55% and it was 59% at age 20. Pain during teenage years had a relative risk of 1.33 (95% confidence interval 1.03–1.73) of LBP 12 months prior to the health examination, but with no associations with LBP intensity or LBP assessed by text messaging. Pain at age 20 years was not associated with any measure of LBP in adulthood. Daily ratings of LBP were associated with LBP during the health examination three and nine months earlier. Conclusions. There were no clear associations between self-reported pain in adolescence and LBP in adulthood. Self-reported daily ratings of LBP were associated with LBP from the health examination. Possible limitations of this study were the retrospective design and the small number of participants.

  19. Analysis of occupational health hazards and associated risks in fuzzy environment: a case research in an Indian underground coal mine.

    Science.gov (United States)

    Samantra, Chitrasen; Datta, Saurav; Mahapatra, Siba Sankar

    2017-09-01

    This paper presents a unique hierarchical structure on various occupational health hazards including physical, chemical, biological, ergonomic and psychosocial hazards, and associated adverse consequences in relation to an underground coal mine. The study proposes a systematic health hazard risk assessment methodology for estimating extent of hazard risk using three important measuring parameters: consequence of exposure, period of exposure and probability of exposure. An improved decision making method using fuzzy set theory has been attempted herein for converting linguistic data into numeric risk ratings. The concept of 'centre of area' method for generalized triangular fuzzy numbers has been explored to quantify the 'degree of hazard risk' in terms of crisp ratings. Finally, a logical framework for categorizing health hazards into different risk levels has been constructed on the basis of distinguished ranges of evaluated risk ratings (crisp). Subsequently, an action requirement plan has been suggested, which could provide guideline to the managers for successfully managing health hazard risks in the context of underground coal mining exercise.

  20. Optimal Bangla Keyboard Layout using Data Mining Technique

    OpenAIRE

    Kamruzzaman, S. M.; Alam, Md. Hijbul; Masum, Abdul Kadar Muhammad; Hassan, Md Mahadi

    2010-01-01

    This paper presents an optimal Bangla Keyboard Layout, which distributes the load equally on both hands so as to maximize ease and minimize effort. The Bangla alphabet has a large number of letters, which makes it difficult to type quickly using a Bangla keyboard. Our proposed keyboard will maximize the speed of operators, as they can type with both hands in parallel. Here we use the association rule of data mining to distribute the Bangla characters on the keyboard. First, we analyze the freq...

  1. Use of Six Sigma Worksheets for assessment of internal and external failure costs associated with candidate quality control rules for an ADVIA 120 hematology analyzer.

    Science.gov (United States)

    Cian, Francesco; Villiers, Elisabeth; Archer, Joy; Pitorri, Francesca; Freeman, Kathleen

    2014-06-01

    Quality control (QC) validation is an essential tool in total quality management of a veterinary clinical pathology laboratory. Cost-analysis can be a valuable technique to help identify an appropriate QC procedure for the laboratory, although this has never been reported in veterinary medicine. The aim of this study was to determine the applicability of the Six Sigma Quality Cost Worksheets in the evaluation of possible candidate QC rules identified by QC validation. Three months of internal QC records were analyzed. EZ Rules 3 software was used to evaluate candidate QC procedures, and the costs associated with the application of different QC rules were calculated using the Six Sigma Quality Cost Worksheets. The costs associated with the current and the candidate QC rules were compared, and the amount of cost savings was calculated. There was a significant saving when the candidate 1-2.5s, n = 3 rule was applied instead of the currently utilized 1-2s, n = 3 rule. The savings were 75% per year (£ 8232.5) based on re-evaluating all of the patient samples in addition to the controls, and 72% per year (£ 822.4) based on re-analyzing only the control materials. The savings were also shown to change accordingly with the number of samples analyzed and with the number of daily QC procedures performed. These calculations demonstrated the importance of the selection of an appropriate QC procedure, and the usefulness of the Six Sigma Costs Worksheet in determining the most cost-effective rule(s) when several candidate rules are identified by QC validation. © 2014 American Society for Veterinary Clinical Pathology and European Society for Veterinary Clinical Pathology.

  2. A comprehensive review on privacy preserving data mining.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Razzaque, Mohammad Abdur

    2015-01-01

    Preservation of privacy in data mining has emerged as an absolute prerequisite for exchanging confidential information in terms of data analysis, validation, and publishing. Ever-escalating internet phishing poses a severe threat to the widespread propagation of sensitive information over the web. Conversely, doubts and disputes about whether data are reliably protected from disclosure make many information providers unwilling to share, which often results in outright refusal to share data or in the sharing of incorrect information. This article provides a panoramic overview of new perspectives and a systematic interpretation of the published literature, organized meticulously into subcategories. The fundamental notions of the existing privacy preserving data mining methods, their merits, and shortcomings are presented. The current privacy preserving data mining techniques are classified based on distortion, association rule, hide association rule, taxonomy, clustering, associative classification, outsourced data mining, distributed, and k-anonymity, where their notable advantages and disadvantages are emphasized. This careful scrutiny reveals the past development, present research challenges, future trends, and the gaps and weaknesses. Further significant enhancements for more robust privacy protection and preservation are affirmed to be mandatory.

  3. Comparative Evaluation of the Different Data Mining Techniques Used for the Medical Database

    Directory of Open Access Journals (Sweden)

    Kasperczuk Anna

    2016-09-01

    Full Text Available Data mining is an emerging research area for solving various problems. Classification and finding associations are two main steps in the field of data mining. In this paper, we use three classification algorithms from the Weka interface: J48 (an open source Java implementation of the C4.5 algorithm), Multilayer Perceptron - MLP (a modification of the standard linear perceptron), and Naïve Bayes (based on Bayes' rule and a set of conditional independence assumptions). These classifiers have been used to choose the best algorithm based on the conditions of the voice disorders database. To find association rules over a transactional medical database, we first use the Apriori algorithm for frequent item set mining. These two initial steps of analysis will help to create the medical knowledgebase. The ultimate goal is to build a model that can improve the way existing data in the medical database, as well as future data, are read and interpreted.

  4. AN EFFICIENT DATA MINING METHOD TO FIND FREQUENT ITEM SETS IN LARGE DATABASE USING TR- FCTM

    Directory of Open Access Journals (Sweden)

    Saravanan Suba

    2016-01-01

    Full Text Available Mining association rules in large databases is one of the most popular data mining techniques for business decision makers. Discovering frequent item sets is the core process in association rule mining. Numerous algorithms are available in the literature to find frequent patterns. Apriori and FP-tree are the most common methods for finding frequent items. Apriori finds significant frequent items using candidate generation, which requires a larger number of database scans. FP-tree uses two database scans to find significant frequent items without using candidate generation. The proposed TR-FCTM (Transaction Reduction - Frequency Count Table Method) discovers significant frequent items by generating the full set of candidates once to form a frequency count table with one database scan. Experimental results show that TR-FCTM outperforms Apriori and FP-tree.
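
    To make the contrast in this abstract concrete, the following is a minimal sketch of the classic Apriori baseline it describes: candidates are generated level by level and the database is scanned once per level. It is not the TR-FCTM method, whose details the abstract does not give; the function name `apriori` and the toy baskets in the usage note are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets seen in at least
    min_support transactions (min_support is an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan of the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # size-k unions whose (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One additional database scan per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent
```

    Calling `apriori([{"bread", "milk"}, {"bread", "butter", "milk"}, {"butter", "milk"}, {"bread", "butter"}], 2)` on these made-up baskets keeps the three single items and the three pairs but drops the triple, which occurs only once. The repeated scan inside the loop is the cost that approaches such as FP-tree and TR-FCTM aim to reduce.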

  5. miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

    Science.gov (United States)

    Gupta, Samir; Ross, Karen E; Tudor, Catalina O; Wu, Cathy H; Schmidt, Carl J; Vijay-Shanker, K

    2016-04-29

    MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD. We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to

  6. Mining usage patterns for the Android API

    Directory of Open Access Journals (Sweden)

    Hudson S. Borges

    2015-07-01

    Full Text Available API methods are not used alone, but in groups and following patterns. However, despite being a key information for API users, most usage patterns are not described in official API documents. In this article, we report a study that evaluates the feasibility of automatically enriching API documents with information on usage patterns. For this purpose, we mine and analyze 1,952 usage patterns, from a set of 396 Android applications. As part of our findings, we report that the Android API has many undocumented and non-trivial usage patterns, which can be inferred using association rule mining algorithms. We also describe a field study where a version of the original Android documentation is instrumented with the extracted usage patterns. During 17 months, this documentation received 77,863 visits from professional Android developers.

  7. ARNetMiT R Package: association rules based gene co-expression networks of miRNA targets.

    Science.gov (United States)

    Özgür Cingiz, M; Biricik, G; Diri, B

    2017-03-31

    miRNAs are key regulators that bind to target genes to suppress their gene expression level. The relations between miRNAs and target genes enable users to derive co-expressed genes that may be involved in similar biological processes and functions in cells. We hypothesize that target genes of miRNAs are co-expressed when they are regulated by multiple miRNAs. Using these co-expressed genes, we can theoretically construct gene co-expression networks (GCNs) related to 152 diseases. In this study, we introduce ARNetMiT, which utilizes a hash-based association rule algorithm in a novel way to infer GCNs from miRNA-target gene data. We also present the R package of ARNetMiT, which infers and visualizes GCNs of diseases that are selected by users. Our approach treats miRNAs as transactions and target genes as their items. Support and confidence values are used to prune association rules on miRNA-target gene data to construct support based GCNs (sGCNs) along with support and confidence based GCNs (scGCNs). We use overlap analysis and topological features for the performance analysis of GCNs. We also infer GCNs with popular GNI algorithms for comparison with the GCNs of ARNetMiT. Overlap analysis results show that ARNetMiT outperforms the compared GNI algorithms. We see that using high confidence values in scGCNs increases the ratio of overlapping gene-gene interactions between the compared methods. According to the evaluation of the topological features of ARNetMiT based GCNs, the node degrees follow a power-law distribution. The hub genes discovered by ARNetMiT based GCNs are consistent with the literature.
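
    The idea of treating miRNAs as transactions and target genes as items can be sketched as follows. This is a plain counting illustration in Python, not the hash based algorithm or the R API of ARNetMiT; the function name `coexpression_edges`, the miRNA and gene identifiers in the usage note, and the choice to accept an edge when confidence passes in either direction are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def coexpression_edges(mirna_targets, min_support, min_confidence=None):
    """Build gene-gene edges from {miRNA: [target genes]}.

    support(g1, g2)  = fraction of miRNAs that target both genes
    confidence       = max over both directions of
                       count(g1, g2) / count(antecedent gene)
    With min_confidence=None only support is used (an sGCN-style network);
    otherwise both thresholds apply (an scGCN-style network).
    """
    n = len(mirna_targets)
    gene_count = Counter()
    pair_count = Counter()
    for targets in mirna_targets.values():
        genes = sorted(set(targets))           # one "transaction" per miRNA
        gene_count.update(genes)
        pair_count.update(combinations(genes, 2))

    edges = {}
    for (g1, g2), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue                            # pruned by support
        confidence = max(c / gene_count[g1], c / gene_count[g2])
        if min_confidence is not None and confidence < min_confidence:
            continue                            # pruned by confidence
        edges[(g1, g2)] = (support, confidence)
    return edges
```

    With hypothetical input such as `{"miR-a": ["G1", "G2", "G3"], "miR-b": ["G1", "G2"], "miR-c": ["G1", "G3"], "miR-d": ["G2"]}` and `min_support=0.5`, the support-only network keeps the G1-G2 and G1-G3 edges, while adding `min_confidence=0.9` keeps only G1-G3, because every miRNA that targets G3 also targets G1. This mirrors the abstract's observation that a higher confidence threshold yields a smaller, stricter network.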

  8. Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    Directory of Open Access Journals (Sweden)

    Knaus William A

    2006-03-01

    Full Text Available Abstract. Background: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. Methods: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. Results: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. Conclusion: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of

  9. Dissolved metals and associated constituents in abandoned coal-mine discharges, Pennsylvania, USA. Part 1: Constituent quantities and correlations

    Science.gov (United States)

    Cravotta, C.A.

    2008-01-01

    Complete hydrochemical data are rarely reported for coal-mine discharges (CMD). This report summarizes major and trace-element concentrations and loadings for CMD at 140 abandoned mines in the Anthracite and Bituminous Coalfields of Pennsylvania. Clean-sampling and low-level analytical methods were used in 1999 to collect data that could be useful to determine potential environmental effects, remediation strategies, and quantities of valuable constituents. A subset of 10 sites was resampled in 2003 to analyze both the CMD and associated ochreous precipitates; the hydrochemical data were similar in 2003 and 1999. In 1999, the flow at the 140 CMD sites ranged from 0.028 to 2210 L s-1, with a median of 18.4 L s-1. The pH ranged from 2.7 to 7.3; concentrations (range in mg/L) of dissolved (0.45-µm pore-size filter) SO4 (34-2000), Fe (0.046-512), Mn (0.019-74), and Al (0.007-108) varied widely. Predominant metalloid elements were Si (2.7-31.3 mg L-1), B ( C > P = N = Se) were not elevated in the CMD samples compared to average river water or seawater. Compared to seawater, the CMD samples also were poor in halogens (Cl > Br > I > F), alkalies (Na > K > Li > Rb > Cs), most alkaline earths (Ca > Mg > Sr), and most metalloids but were enriched by two to four orders of magnitude with Fe, Al, Mn, Co, Be, Sc, Y and the lanthanide rare-earth elements, and one order of magnitude with Ni and Zn. The ochre samples collected at a subset of 10 sites in 2003 were dominantly goethite with minor ferrihydrite or lepidocrocite. None of the samples for this subset contained schwertmannite or was Al rich, but most contained minor aluminosilicate detritus. Compared to concentrations in global average shale, the ochres were rich in Fe, Ag, As and Au, but were poor in most other metals and rare earths. The ochres were not enriched compared to commercial ore deposits mined for Au or other valuable metals. Although similar to commercial Fe ores in composition, the ochres are dispersed and

  10. Yeasts associated with an abandoned mining area in Pernek and their tolerance to different chemical elements.

    Science.gov (United States)

    Vadkertiová, Renáta; Molnárová, Jana; Lux, Alexander; Vaculík, Marek; Lišková, Desana

    2016-05-01

    Four plants, Cirsium arvense (creeping thistle), Equisetum arvense (field horsetail), Oxalis acetosella (wood sorrel) and Phragmites australis (common reed), which grew in an abandoned Sb-mining area in Pernek (Malé Karpaty Mts., Slovakia), were investigated for yeast species. Yeasts were isolated from both the leaves of the plants and the soil adjacent to the plants. In total, 65 yeast cultures, belonging to 11 ascomycetous and 5 basidiomycetous yeast species, were isolated. The species most frequently isolated from both the soil and leaf samples were Trichosporon porosum, Galactomyces candidus and Candida solani, whereas Aureobasidium pullulans, Candida tsuchiyae and Sporidiobolus metaroseus were isolated exclusively from the plant leaves. All the yeast species isolated were tested for their tolerance to two heavy metals (Cd, Zn) and three metalloids (As, Sb and Si). The yeasts isolated from both the leaves and soils exhibited a high tolerance level to both As and Sb, which are present in elevated concentrations at the locality. Among the yeast species tested, Cryptococcus musci, a close relative of Cryptococcus humicola, was the species most tolerant to all the chemical elements tested, with the exception of Si. It grew in the presence of 200 mmol/L Zn, 200 mmol/L Cd, 60 mmol/L As and 50 mmol/L Sb, and can therefore be considered a multi-tolerant species. Some of the yeast species were tolerant to individual chemical elements. The yeast-like species Trichosporon laibachii exhibited the highest tolerance to Si of all yeasts tested, and Cryptococcus flavescens and Lindnera saturnus showed the same tolerance as Cryptococcus musci to Zn and As, respectively. The majority of the yeasts showed a notably low tolerance to Cd (not exceeding 0.5 mmol/L), which was present in small amounts in the soil. However, Candida solani, isolated from the soil, exhibited a higher tolerance to Cd (20 mmol/L) than to As (2 mmol/L).

  11. Identifying crash contributory factors at urban roundabouts and using association rules to explore their relationships to different crash types.

    Science.gov (United States)

    Montella, Alfonso

    2011-07-01

    The use of roundabouts improves intersection safety by eliminating or altering conflict types, reducing crash severity, and causing drivers to reduce speeds. However, roundabout performances can degrade if precautions are not taken during either the design or the operation phase. Therefore, additional information on the safety of the roundabouts is extremely helpful for planners and designers in identifying existing deficiencies and in refining the design criteria currently being used. The aim of the paper was to investigate the crash contributory factors in 15 urban roundabouts located in Italy and to study the interdependences between these factors. The crash data refer to the period 2003-2008. The identification of the crash contributory factors was based on site inspections and rigorous analyses performed by a team of specialists with a relevant road safety engineering background. Each roundabout was inspected once every year from 2004 to 2009, both in daytime and in nighttime. Overall, 62 different contributory factors and 2156 total contributory factors were identified. In 51 crashes, a single contributory factor was found, whereas in the other 223 crashes, a combination of contributory factors was identified. Given the large amount of data, the interdependences between the contributory factors and between the contributory factors and the different crash types were explored by an association discovery. Association discovery is the identification of sets of items (i.e., crash contributory factors and crash types in our study) that occur together in a given event (i.e., a crash in our study). The rules were filtered by support, confidence, and lift. As a result, 112 association rules were discovered. Overall, numerous contributory factors related to the road and environment deficiencies but not related to the road user or to the vehicle were identified. The most important factors related to geometric design were the radius of deflection and the deviation angle
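
    The three filters named in the abstract (support, confidence, and lift) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the study's actual procedure; the crash records, factor names, and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    confidence = s_ac / s_a if s_a else 0.0   # P(consequent | antecedent)
    lift = confidence / s_c if s_c else 0.0   # > 1 means positive association
    return s_ac, confidence, lift


# Hypothetical crash records: each crash is a set of factors plus a crash type.
crashes = [
    frozenset({"poor_deflection", "angle_crash"}),
    frozenset({"poor_deflection", "angle_crash", "worn_markings"}),
    frozenset({"worn_markings", "rear_end_crash"}),
    frozenset({"poor_deflection", "rear_end_crash"}),
]

sup, conf, lift = rule_metrics(crashes, {"poor_deflection"}, {"angle_crash"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

    A rule would then be kept only if all three values exceed thresholds chosen by the analyst; the thresholds used in the study are not given in the abstract.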

  12. Identifying Drug-Drug Interactions by Data Mining: A Pilot Study of Warfarin-Associated Drug Interactions.

    Science.gov (United States)

    Hansen, Peter Wæde; Clemmensen, Line; Sehested, Thomas S G; Fosbøl, Emil Loldrup; Torp-Pedersen, Christian; Køber, Lars; Gislason, Gunnar H; Andersson, Charlotte

    2016-11-01

    Knowledge about drug-drug interactions commonly arises from preclinical trials, from adverse drug reports, or based on knowledge of mechanisms of action. Our aim was to investigate whether drug-drug interactions were discoverable without prior hypotheses using data mining. We focused on warfarin-drug interactions as the prototype. We analyzed altered prothrombin time (measured as international normalized ratio [INR]) after initiation of a novel prescription in previously INR-stable warfarin-treated patients with nonvalvular atrial fibrillation. Data sets were retrieved from clinical work. Random forest (a machine-learning method) was set up to predict altered INR levels after novel prescriptions. The most important drug groups from the analysis were further investigated using logistic regression in a new data set. Two hundred and twenty drug groups were analyzed in 61 190 novel prescriptions. We rediscovered 2 drug groups having known interactions (β-lactamase-resistant penicillins [dicloxacillin] and carboxamide derivatives) and 3 antithrombotic/anticoagulant agents (platelet aggregation inhibitors excluding heparin, direct thrombin inhibitors [dabigatran etexilate], and heparins) causing decreasing INR. Six drug groups with known interactions were rediscovered causing increasing INR (antiarrhythmics class III [amiodarone], other opioids [tramadol], glucocorticoids, triazole derivatives, and combinations of penicillins, including β-lactamase inhibitors) and two had a known interaction in a closely related drug group (oripavine derivatives [buprenorphine] and natural opium alkaloids). Antipropulsives had an unknown signal of increasing INR. We were able to identify known warfarin-drug interactions without a prior hypothesis using clinical registries. Additionally, we discovered a few potentially novel interactions. This opens up for the use of data mining to discover unknown drug-drug interactions in cardiovascular medicine. © 2016 American Heart Association

  13. An Evolutionary Algorithm to Mine High-Utility Itemsets

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    High-utility itemset mining (HUIM) has become a critical issue in recent years since it can reveal profitable products by considering both the quantity and profit factors, instead of the frequent itemset mining (FIM) of association rules (ARs). In this paper, an evolutionary algorithm based on binary particle swarm optimization is presented to efficiently mine high-utility itemsets (HUIs). A maximal pattern (MP)-tree structure is further designed to solve the combinational problem in the evolution process. Substantial experiments on real-life datasets show that the proposed binary PSO-based algorithm has better results compared to the state-of-the-art GA-based algorithm.
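
    To make the contrast between frequency and utility concrete, the sketch below computes the utility of an itemset as the sum of quantity times unit profit over the transactions that contain it. It illustrates only the usual utility definition, not the paper's binary PSO search or MP-tree; the products, profits, quantities, and function name are hypothetical.

```python
from typing import Dict, Iterable, List


def itemset_utility(
    database: List[Dict[str, int]],
    unit_profit: Dict[str, float],
    itemset: Iterable[str],
) -> float:
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    items = set(itemset)
    total = 0.0
    for transaction in database:          # transaction maps item -> purchased quantity
        if items.issubset(transaction):   # only transactions holding every item count
            total += sum(transaction[i] * unit_profit[i] for i in items)
    return total


# Hypothetical data: bread sells often but earns little; wine sells rarely but earns a lot.
unit_profit = {"bread": 1.0, "milk": 2.0, "wine": 10.0}
database = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "wine": 1},
    {"bread": 3, "milk": 2, "wine": 1},
]

for candidate in ({"bread"}, {"wine"}, {"bread", "milk"}):
    print(sorted(candidate), itemset_utility(database, unit_profit, candidate))
```

    An itemset counts as high-utility when this value meets a user-chosen minimum utility threshold, which is why a frequent item such as bread can still fall below an infrequent one such as wine.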

  14. Groundwater-quality data associated with abandoned underground coal mine aquifers in West Virginia, 1973-2016: Compilation of existing data from multiple sources

    Science.gov (United States)

    McAdoo, Mitchell A.; Kozar, Mark D.

    2017-11-14

    This report describes a compilation of existing water-quality data associated with groundwater resources originating from abandoned underground coal mines in West Virginia. Data were compiled from multiple sources for the purpose of understanding the suitability of groundwater from abandoned underground coal mines for public supply, industrial, agricultural, and other uses. This compilation includes data collected for multiple individual studies conducted from July 13, 1973 through September 7, 2016. Analytical methods varied by the time period of data collection and requirements of the independent studies. This project identified 770 water-quality samples from 294 sites that could be attributed to abandoned underground coal mine aquifers originating from multiple coal seams in West Virginia.

  15. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  16. Establishment of SATREPS experimental sites in South African gold mines to monitor phenomena associated with earthquake nucleation and rupture

    CSIR Research Space (South Africa)

    Durrheim, RJ

    2012-03-01

    Mining-induced earthquakes pose a risk to workers in deep mines, while natural earthquakes pose a risk everywhere, but especially near plate boundaries. A five-year Japanese–South African collaborative project entitled ‘Observational studies...

  17. Lack of parental rule-setting on eating is associated with a wide range of adolescent unhealthy eating behaviour both for boys and girls.

    Science.gov (United States)

    Holubcikova, Jana; Kolarcik, Peter; Madarasova Geckova, Andrea; van Dijk, Jitse P; Reijneveld, Sijmen A

    2016-04-27

    Unhealthy eating habits in adolescence lead to a wide variety of health problems and disorders. The aim of this study was to assess the prevalence of absence of parental rules on eating and unhealthy eating behaviour and to explore the relationships between parental rules on eating and a wide range of unhealthy eating habits of boys and girls. We also explored the association of sociodemographic characteristics such as gender, family affluence or parental education with eating related parental rules and eating habits of adolescents. The data on 2765 adolescents aged 13-15 years (mean age: 14.4; 50.7 % boys) from the Slovak part of the Health Behaviour in School-Aged Children (HBSC) study 2014 were assessed. The associations between eating-related parental rules and unhealthy eating patterns were assessed using logistic regression. Unhealthy eating habits occurred frequently among adolescents (range: 18.0 % reported skipping breakfast during weekends vs. 75.8 % for low vegetables intake). Of all adolescents, 20.5 % reported a lack of any parental rules on eating (breakfast not mandatory, meal in front of TV allowed, no rules about sweets and soft drinks). These adolescents were more likely to eat unhealthily, i.e. to skip breakfast on weekdays (odds ratio/95 % confidence interval: 5.33/4.15-6.84) and on weekends (2.66/2.12-3.34), to report low consumption of fruits (1.63/1.30-2.04) and vegetables (1.32/1.04-1.68), and the frequent consumption of sweets (1.59/1.30-1.94), soft drinks (1.93/1.56-2.38) and energy drinks (2.15/1.72-2.70). Parental rule-setting on eating is associated with eating behaviours of adolescents. Further research is needed to disentangle causality in this relationship. If causal, parents may be targeted to modify the eating habits of adolescents.

  18. Mercury Contamination and Bioaccumulation Associated with Historical Gold Mining in the Bear and Yuba River Watersheds, Sierra Nevada, California

    Science.gov (United States)

    Alpers, C. N.; Hunerlach, M. P.; Hothem, R. L.; May, J. T.; Taylor, H. E.; DeWild, J. F.; Olson, M. L.; Krabbenhoft, D. P.; Marvin-DiPasquale, M.

    2001-12-01

    Extensive use of mercury in the mining and recovery of gold during the late 19th and early 20th centuries has led to widespread mercury contamination of water, sediment, and biota in the Sierra Nevada foothills of California. The watersheds of the Bear and Yuba Rivers were selected for study by the U.S. Geological Survey and other federal, state, and local agencies on the basis of (1) results of previous studies of bioaccumulation, (2) observations of visible elemental mercury at numerous mine sites and in river sediments, and (3) extensive historical mining on federal lands and adjacent private lands. Of 53 unfiltered water samples analyzed for total recoverable mercury (Hg-T), 17 samples (32 percent) had concentrations in excess of the U.S. Environmental Protection Agency (EPA) aquatic-life criterion of 50 nanograms per liter (ng/L). Water flowing from two separate tunnels in one mining district had Hg-T concentrations greater than 100,000 ng/L, exceeding the EPA drinking-water standard of 2,000 ng/L. Monthly sampling of the Bear River near its mouth revealed monomethylmercury (MeHg) concentrations in unfiltered water samples greater than 0.4 ng/L during July-August 1999 and January 2000. Game fish were collected from 5 reservoirs and 14 stream sites during 1999 to assess the distribution of mercury in the food chain and to examine the potential risk for humans and wildlife. Of 141 fish fillet samples of black basses (Micropterus spp.), sunfish (Lepomis macrochirus and Lepomis cyanellus), black crappie (Pomoxis nigromaculatus), channel catfish (Ictalurus punctatus), brown trout (Salmo trutta), and rainbow trout (Oncorhynchus mykiss) analyzed for Hg-T, 52 percent exceeded the EPA criterion of 0.3 parts per million (ppm), wet basis. Eighty-nine percent of the bass had Hg-T greater than 0.3 ppm total mercury. Based on these data, three counties issued a public health notification recommending limited consumption of game fish from the Bear and Yuba watersheds

  19. High resolution microgravity investigations for the detection and characterisation of subsidence associated with abandoned, coal, chalk and salt mines

    Energy Technology Data Exchange (ETDEWEB)

    Styles, P.; Toon, S.; Branston, M.; England, R. [Keele Univ., Applied And Environmental Geophysics Group, School of Physical and Geographical Sciences (United Kingdom); Thomas, E.; Mcgrath, R. [Geotechnology, Neath (United Kingdom)

    2005-07-01

    The closure and decay of industrial activity involving mining has scarred the landscape of urban areas and geo-hazards posed by subsurface cavities are ubiquitous throughout Europe. Features of concern consist of natural solution cavities (e.g. swallow holes and sinkholes in limestone, gypsum and chalk) and man-made cavities (mine workings, shafts) in a great variety of post mining environments, including coal, salt, gypsum, anhydrite, tin and chalk. These problems restrict land utilisation, hinder regeneration, pose a threat to life, seriously damage property and services and blight property values. This paper outlines the application of microgravity techniques to characterise abandoned mining hazards in case studies from Coal, Chalk and Salt Mining environments in the UK. (authors)

  20. Human exposure to lead and other potentially harmful elements associated with galena mining at New Zurak, central Nigeria

    Science.gov (United States)

    Lar, U. A.; Ngozi-Chika, C. S.; Ashano, E. C.

    2013-08-01

    Galena mining in New Zurak, central Nigeria is currently increasing in intensity, with widespread artisanal mining taking place alongside mechanised mining. These activities are causing immeasurable damage to the environment. The prolonged human exposure and ingestion of Pb and other potentially harmful elements (PHEs) such as U, Cd, Se, Zn and As that are released from ores during these (mining) activities is a cause of great concern to populations that live in the vicinity of these mine fields. Many of the communities make their living from subsistence farming, growing food from the surroundings, and obtaining drinking water from nearby surface and sub-surface water resources. An overall assessment of the degree of contamination or toxicity of Pb and other PHEs was carried out using the indices of geoaccumulation (Igeo) and contamination factors (CFs), in the different media sampled - farmland soils, uncultivated lands, mine tailings/dumps, natural waters and vegetables. Results reveal that the mine tailings and dumps are highly contaminated with Pb and other PHEs followed in decreasing degree of contamination by the uncultivated lands, farmlands and natural waters. These findings suggest that release of Pb and other PHEs from the galena mining activity has contributed significantly to the enrichment of these elements in the surrounding environment, including the natural water bodies, and are disposed to subsequent entry into the human body through the food chain. As such these PHE accumulations pose significant risks to the environment and human health, especially of children and pregnant women who are the most vulnerable groups in the area. In order to forestall a reoccurrence of the Zamfara Pb poisoning episode in northwestern Nigeria in 2010, where more than 400 children died, the authorities concerned should ensure that mining in New Zurak is done in a more environmentally friendly manner, ensuring the maintenance of an environmental quality adequate for

  1. Discovering Sentinel Rules for Business Intelligence

    DEFF Research Database (Denmark)

    Middelfart, Morten; Pedersen, Torben Bach

    2009-01-01

    to absolute data values, we are able to discover strong and useful sentinel rules that would otherwise be hidden when using sequential pattern mining or correlation techniques. We present a method for sentinel rule discovery and an implementation of this method that scales linearly on large data volumes....

  2. Human exposure and risk assessment associated with mercury contamination in artisanal gold mining areas in the Brazilian Amazon.

    Science.gov (United States)

    Castilhos, Zuleica; Rodrigues-Filho, Saulo; Cesar, Ricardo; Rodrigues, Ana Paula; Villas-Bôas, Roberto; de Jesus, Iracina; Lima, Marcelo; Faial, Kleber; Miranda, Antônio; Brabo, Edilson; Beinhoff, Christian; Santos, Elisabeth

    2015-08-01

    Mercury (Hg) contamination is an issue of concern in the Amazon region due to potential health effects associated with Hg exposure in artisanal gold mining areas. The study presents a human health risk assessment associated with Hg vapor inhalation and MeHg-contaminated fish ingestion, as well as Hg determination in urine, blood, and hair, of human populations (about 325 miners and 321 non-miners) from two gold mining areas in the Brazilian Amazon (São Chico and Creporizinho, Pará State). In São Chico and Creporizinho, 73 fish specimens of 13 freshwater species, and 161 specimens of 11 species, were collected for total Hg determination, respectively. The hazard quotient (HQ) is a risk indicator which defines the ratio of the exposure level and the toxicological reference dose and was applied to determine the threat of MeHg exposure. The mean Hg concentrations in fish from São Chico and Creporizinho were 0.83 ± 0.43 and 0.36 ± 0.33 μg/g, respectively. More than 60 and 22 % of fish collected in São Chico and Creporizinho, respectively, were above the Hg limit (0.5 μg/g) recommended by WHO for human consumption. For all sampling sites, HQ ranged from 1.5 to 28.5, except for the reference area. In Creporizinho, the values of HQ are close to 2 for most sites, whereas in São Chico, there is a hot spot of MeHg contamination in fish (A2-São Chico Reservoir) with the highest risk level (HQ = 28) associated with its human consumption. Mean Hg concentrations in urine, blood, and hair samples indicated that the miners group (in São Chico: urine = 17.37 μg/L; blood = 27.74 μg/L; hair = 4.50 μg/g and in Creporizinho: urine = 13.75 μg/L; blood = 25.23 μg/L; hair: 4.58 μg/g) was more exposed to mercury compared to non-miners (in São Chico: urine = 5.73 μg/L; blood = 16.50 μg/L; hair = 3.16 μg/g and in Creporizinho: urine = 3.91 μg/L; blood = 21.04 μg/L, hair = 1.88 μg/g). These high Hg levels (found
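
    The hazard quotient defined in the abstract (exposure level divided by the toxicological reference dose) can be sketched as below. Every number here is a hypothetical placeholder: the abstract does not state the fish intake rate, body weight, or reference dose used in the study, and the intake formula is a common simplification rather than the authors' stated method.

```python
def daily_dose(conc_ug_per_g: float, intake_g_per_day: float, body_weight_kg: float) -> float:
    """Estimated daily intake in micrograms per kg of body weight per day."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return conc_ug_per_g * intake_g_per_day / body_weight_kg


def hazard_quotient(exposure: float, reference_dose: float) -> float:
    """HQ = exposure / reference dose; values above 1 flag potential risk."""
    if reference_dose <= 0:
        raise ValueError("reference dose must be positive")
    return exposure / reference_dose


# Hypothetical inputs (assumptions for illustration only):
CONC_UG_PER_G = 0.5        # Hg concentration in fish, set at the limit quoted in the abstract
INTAKE_G_PER_DAY = 100.0   # assumed daily fish consumption
BODY_WEIGHT_KG = 70.0      # assumed adult body weight
REFERENCE_DOSE = 0.1       # assumed reference dose, ug per kg per day

dose = daily_dose(CONC_UG_PER_G, INTAKE_G_PER_DAY, BODY_WEIGHT_KG)
hq = hazard_quotient(dose, REFERENCE_DOSE)
print(f"dose={dose:.3f} ug/kg/day, HQ={hq:.2f}")
```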

  3. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects.

    Directory of Open Access Journals (Sweden)

    Qingrun Zhang

    2014-06-01

    Identifying gene-gene interaction is a hot topic in genome wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in disease pathology of AMD. By applying AprioriGWAS on Bipolar disorder in WTCCC data, we found variants without marginal effect show significant interactions. For example, multiple-SNP genotype patterns inside gene GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. The GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. GRIA1 and GABRB2 are relevant to mental disorders supported by multiple

  4. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects.

    Science.gov (United States)

    Zhang, Qingrun; Long, Quan; Ott, Jurg

    2014-06-01

    Identifying gene-gene interaction is a hot topic in genome wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in disease pathology of AMD. By applying AprioriGWAS on Bipolar disorder in WTCCC data, we found variants without marginal effect show significant interactions. For example, multiple-SNP genotype patterns inside gene GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. The GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. GRIA1 and GABRB2 are relevant to mental disorders supported by multiple
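
    The search-space reduction that Apriori provides can be illustrated with a generic level-wise sketch: a pattern of size k is counted only if all of its subsets of size k-1 were already frequent. This is the textbook Apriori scheme over made-up data, not the AprioriGWAS implementation, whose selection criteria and permutation test are not described in enough detail here; the genotype labels and function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate of size k.
        level = {}
        for cand in candidates:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical samples, each described by the genotype patterns it carries.
samples = [
    {"snpA=1", "snpB=2", "snpC=0"},
    {"snpA=1", "snpB=2"},
    {"snpA=1", "snpC=0"},
    {"snpB=2", "snpC=0"},
    {"snpA=1", "snpB=2", "snpC=0"},
]

for pattern, sup in sorted(apriori(samples, 0.6).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), round(sup, 2))
```

    With the threshold set to 0.6, all single patterns and all pairs survive, while the triple is rejected because it appears in only two of the five samples.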

  5. Dissolved metals and associated constituents in abandoned coal-mine discharges, Pennsylvania, USA. Part 2: Geochemical controls on constituent concentrations

    Science.gov (United States)

    Cravotta, C.A.

    2008-01-01

    , and most other trace cations in CMD samples were orders of magnitude less than equilibrium with sulfate, carbonate, and/or hydroxide minerals. Surface complexation (adsorption) by hydrous ferric oxides (HFO) could account for the decreased concentrations of these divalent cations with increased pH. In contrast, increased concentrations of As and, to a lesser extent, Se with increased pH could result from the adsorption of these oxyanions by HFO at low pH and desorption at near-neutral pH. Hence, the solute concentrations in CMD and the purity of associated "ochres" formed in CMD settings are expected to vary with pH and aqueous SO4 concentration, with potential for elevated SO4, As and Se in ochres formed at low pH and elevated Cu, Cd, Pb and Zn in ochres formed at near-neutral pH. Elevated SO4 content of ochres could enhance the adsorption of cations at low pH, but decrease the adsorption of anions such as As. Such information on environmental processes that control element concentrations in aqueous samples and associated precipitates could be useful in the design of systems to reduce dissolved contaminant concentrations and/or to recover potentially valuable constituents in mine effluents.

  6. Towards a database for genotype-phenotype association research: mining data from encyclopaedia

    NARCIS (Netherlands)

    Pajić, V.S.; Pavlović-Lažetić, G.M.; Beljanski, M.V.; Brandt, B.W.; Pajić, M.B.

    2013-01-01

    To associate phenotypic characteristics of an organism to molecules encoded by its genome, there is a need for well-structured genotype and phenotype data. We use a novel method for extracting data on phenotype and genotype characteristics of microorganisms from text. As a resource, we use an

  7. Power System Transient Stability Based on Data Mining Theory

    Science.gov (United States)

    Cui, Zhen; Shi, Jia; Wu, Runsheng; Lu, Dan; Cui, Mingde

    2018-01-01

    To study power system stability, a transient stability assessment method based on data mining theory is designed. By introducing association rule analysis from data mining theory, an association classification method for transient stability assessment is presented, and a mathematical model of transient stability assessment based on data mining technology is established. The method combines rule reasoning with classification prediction. The transient stability index is used to identify the samples that cannot be correctly classified by association classification. Then, according to the critical stability of each sample, time domain simulation is used to determine its state, so as to ensure the accuracy of the final results. The results show that this stability assessment system can improve the speed of operation provided that the analysis result remains completely correct, and that the improved algorithm can find the inherent relation between changes in the power system operation mode and changes in the degree of transient stability.

  8. [Apply association rules to analysis adverse drug reactions of shuxuening injection based on spontaneous reporting system data].

    Science.gov (United States)

    Yang, Wei; Xie, Yan-Ming; Xiang, Yong-Yang

    2014-09-01

    This research analyzed spontaneous reporting system (SRS) data comprising 9 601 case reports of Shuxuening injection adverse drug reactions (ADRs) recorded by the national adverse drug reaction monitoring center during 2005-2012. Association rules were applied to analyze the relationship between the ADRs of Shuxuening injection and the characteristics of the ADR reports. We found that the common ADR combinations were "nausea + breath + chills + vomiting" and "nausea + chills + vomiting + palpitations", each with a confidence level of 100%. The common combinations of ADRs and case report information were "itching, glucose and sodium chloride injection, general ADR report, normal dosage", "palpitation, glucose and sodium chloride injection, normal dosage, new report" and "chills, general ADR report, normal dosage, 0.9% sodium chloride injection", also with a confidence level of 100%. The results showed that most ADRs in patients using Shuxuening injection were systemic damage, damage to the skin and its appendages, digestive system damage, etc. Most cases were general and new reports, in patients receiving a normal dosage. The occurrence of ADRs had little relation to the solvent, which suggests that the ADRs of Shuxuening injection are mainly related to the drug composition. Clinical use of Shuxuening injection therefore requires close observation, attention to ADRs, and good drug risk management.

  9. Two non-synonymous markers in PTPN21, identified by genome-wide association study data-mining and replication, are associated with schizophrenia.

    LENUS (Irish Health Repository)

    Chen, Jingchun

    2011-09-01

    We conducted data-mining analyses of genome wide association (GWA) studies of the CATIE and MGS-GAIN datasets, and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r² > 0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P = 1.10 × 10⁻³ and rs2274736, P = 1.21 × 10⁻³). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR = 0.92, 95% CI: 0.86-0.97, P = 5.45 × 10⁻³ and rs2401751, OR = 0.92, 95% CI: 0.86-0.97, P = 5.29 × 10⁻³). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR = 1.08, 95% CI: 1.02-1.14, P = 6.43 × 10⁻³). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype conditioned analyses indicated that the association signals observed between rs2274736/rs2401751 and rs7147796 are statistically independent. Given the results that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.

  10. DESTAF: a database of text-mined associations for reproductive toxins potentially affecting human fertility.

    Science.gov (United States)

    Dawe, Adam S; Radovanovic, Aleksandar; Kaur, Mandeep; Sagar, Sunil; Seshadri, Sundararajan V; Schaefer, Ulf; Kamau, Allan A; Christoffels, Alan; Bajic, Vladimir B

    2012-01-01

    The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. DESTAF: A database of text-mined associations for reproductive toxins potentially affecting human fertility

    KAUST Repository

    Dawe, Adam Sean

    2012-01-01

    The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly. © 2011 Elsevier Inc.

  12. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Science.gov (United States)

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
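
    For readers unfamiliar with the three performance measures quoted in this abstract, the following minimal sketch shows how sensitivity, specificity and accuracy are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of true cases detected
    specificity = tn / (tn + fp)                 # share of non-cases correctly excluded
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that are correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (NOT the study's data).
sens, spec, acc = classification_metrics(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

    Reporting all three together matters because accuracy alone can look high when one class dominates the sample.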

  13. Sediment quality in the Guadalquivir estuary: lethal effects associated with the Aznalcóllar mining spill.

    Science.gov (United States)

    Riba, I; Conradi, M; Forja, J M; DelValls, T A

    2004-01-01

    Monitoring from 1998 to 2001 has assessed the impact of the Aznalcóllar mining spill on the sediment quality in the Guadalquivir estuary. Chemical analysis has been complemented by biological effects measured in different organisms. The toxicity of sediments obtained from dilutions of toxic mud and from environmental stations affected by the accidental spill was tested using the amphipod Ampelisca brevicornis and the clam Scrobicularia plana. The results obtained show that amphipods are more sensitive to the accidental spill than the clams. A dilution of clean sediment by more than 1.8% of toxic mud produced 100% mortality of amphipods. Toxicity to amphipods, but not to clams, was detected at station GR2. The rest of the environmental stations showed no toxicity. Toxicity to amphipods at station GR2 decreased over time (from 50-60% mortality in 1998 to 10-15% in 2001), which can be associated with a recovery of the areas impacted by the accidental spill.

  14. Evaluating the role of vegetation on the transport of contaminants associated with a mine tailing using the Phyto-DSS

    Energy Technology Data Exchange (ETDEWEB)

    Cano-Resendiz, Omar [Departamento de Ingenieria Quimica, Universidad de Guanajuato, Noria Alta s/n, CP 36050 Guanajuato (Mexico); Rosa, Guadalupe de la, E-mail: delarosa@quijote.ugto.mx [Departamento de Ingenieria Quimica, Universidad de Guanajuato, Noria Alta s/n, CP 36050 Guanajuato (Mexico); Cruz-Jimenez, Gustavo [Departamento de Farmacia, Universidad de Guanajuato, Noria Alta s/n, CP 36050 Guanajuato (Mexico); Gardea-Torresdey, Jorge L. [Chemistry Department and Environmental Science and Engineering, Ph.D. Program, The University of Texas at El Paso, 500 W. University Ave., 79968 El Paso, TX (United States); Robinson, Brett H. [Agriculture and Life Sciences, Lincoln University, P.O. Box 84 Lincoln, Canterbury 7646 (New Zealand)

    2011-05-15

    We identified contaminants associated with the Cata mine tailing depot located in the outskirts of the city of Guanajuato, Mexico. We also investigated strategies for their phytomanagement. Silver and antimony were present at 39 and 31 mg kg⁻¹, respectively, some twofold higher than the Dutch Intervention Values. Total and extractable boron (B) occurred at concentrations of 301 and 6.3 mg L⁻¹, respectively. Concentrations of B in soil solution above 1.9 mg L⁻¹ have been shown to be toxic to plants. Plant growth may also be inhibited by the low concentrations of extractable plant nutrients. Analysis of the aerial portions of Aloe vera (L. Burm.f.) revealed that this plant accumulates negligible concentrations of the identified contaminants. Calculations using a whole system model (Phyto-DSS) showed that establishing a crop of A. vera would have little effect on the drainage or leaching from the site. However, this plant would reduce wind and water erosion and potentially produce valuable cosmetic products. In contrast, crops of poplar, a species that is tolerant to high soil B concentrations, would mitigate leaching from this site. Alternate rows of trees could be periodically harvested and be used for timber or bioenergy.

  15. The Association between Noise, Cortisol and Heart Rate in a Small-Scale Gold Mining Community—A Pilot Study

    Directory of Open Access Journals (Sweden)

    Allyson Green

    2015-08-01

    We performed a cross-sectional pilot study on salivary cortisol, heart rate, and personal noise exposures in a small-scale gold mining village in northeastern Ghana in 2013. Cortisol level changes between morning and evening among participants showed a relatively low decline in cortisol through the day (−1.44 ± 4.27 nmol/L, n = 18), a pattern consistent with chronic stress. A multiple linear regression, adjusting for age, sex, smoking status, and time between samples, indicated a significant increase of 0.25 nmol/L cortisol from afternoon to evening per 1 dBA increase in equivalent continuous noise exposure (Leq) over that period (95% CI: 0.08–0.42, Adj. R2 = 0.502, n = 17). A mixed effect linear regression model adjusting for age and sex indicated a significant increase of 0.29 heart beats per minute (BPM) for every 1 dB increase in Leq. Using standard deviations (SDs) as measures of variation, and adjusting for age and sex over the sampling period, we found that a 1 dBA increase in noise variation over time (Leq SD) was associated with a 0.5 BPM increase in heart rate SD (95% CI: 0.04–0.9, Adj. R2 = 0.229, n = 16). Noise levels were consistently high, with 24-hour average Leq exposures ranging from 56.9 to 92.0 dBA, with a mean daily Leq of 82.2 ± 7.3 dBA (mean monitoring duration 22.1 ± 1.9 hours, n = 22). Ninety-five percent of participants had 24-hour average Leq noise levels over the 70 dBA World Health Organization (WHO) guideline level for prevention of hearing loss. These findings suggest that small-scale mining communities may face multiple, potentially additive health risks that are not yet well documented, including hearing loss and cardiovascular effects of stress and noise.

  16. Optimization of cultural conditions for growth associated chromate reduction by Arthrobacter sp. SUK 1201 isolated from chromite mine overburden

    Energy Technology Data Exchange (ETDEWEB)

    Dey, Satarupa, E-mail: dey1919@gmail.com [Microbiology Laboratory, Department of Botany, University of Calcutta, Kolkata 700019 (India); Paul, A.K., E-mail: amalk_paul@yahoo.co.in [Microbiology Laboratory, Department of Botany, University of Calcutta, Kolkata 700019 (India)

    2012-04-30

    Highlights: ▶ Isolation of a potent Cr(VI) resistant and reducing Arthrobacter SUK 1201 from chromite mine overburdens of Orissa, India. ▶ Phylogenetically (16S rDNA analysis), Arthrobacter SUK 1201 showed 99% nucleotide base pair similarity with Arthrobacter GZK-1. ▶ Production of insoluble chromium precipitates during chromate reduction under batch culture by the isolate SUK 1201. ▶ Confirmation of formation of insoluble chromium precipitate during reduction studies by EDX analysis. ▶ Optimization of cultural conditions for Cr(VI) reduction under batch culture leading to complete reduction of 2 mM of Cr(VI). Abstract: Arthrobacter sp. SUK 1201, a chromium resistant and reducing bacterium having 99% sequence homology of 16S rDNA with Arthrobacter sp. GZK-1, was isolated from chromite mine overburden dumps of Orissa, India. The objective of the present study was to optimize the cultural conditions for chromate reduction by Arthrobacter sp. SUK 1201. The strain showed 67% reduction of 2 mM chromate in 7 days and was associated with the formation of green insoluble precipitate, which showed a characteristic peak of chromium in energy dispersive X-ray analysis. However, Fourier transform infrared spectra failed to detect any complexation of end products of Cr(VI) reduction with the cell mass. Reduction of chromate increased with increased cell density and was maximum at 10¹⁰ cells/ml, but the reduction potential decreased with increase in Cr(VI) concentration. Chromate reducing efficiency was promoted when glycerol and glucose were used as electron donors. Optimum pH and temperature of Cr(VI) reduction were 7.0 and 35 °C, respectively. The reduction process was inhibited by several metal ions and metabolic inhibitors but not by Cu(II) and DNP. These findings suggest that Arthrobacter sp. SUK 1201 has great promise

  17. Investigation of migratory bird mortality associated with exposure to Soda Ash Mine tailings water in southwestern Wyoming

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Soda ash is a pulverized mineral, commonly referred to as “trona”, and harvested from underground deposits in southwestern Wyoming. Four companies own 5 mining...

  18. Priority pollutants and associated constituents in untreated and treated discharges from coal mining or processing facilities in Pennsylvania, USA

    Science.gov (United States)

    Cravotta, III, Charles A.; Brady, Keith B.C.

    2015-01-01

    Clean sampling and analysis procedures were used to quantify more than 70 inorganic constituents, including 35 potentially toxic or hazardous constituents, organic carbon, and other characteristics of untreated (influent) and treated (effluent) coal-mine discharges (CMD) at 38 permitted coal-mining or coal-processing facilities in the bituminous coalfield and 4 facilities in the anthracite coalfield of Pennsylvania. Of the 42 facilities sampled during 2011, 26 were surface mines, 11 were underground mines, and 5 were coal refuse disposal operations. Treatment of CMD with caustic soda (NaOH), lime (CaO or Ca(OH)2), flocculent, or limestone was ongoing at 21%, 40%, 6%, and 4% of the facilities, respectively; no chemicals were added at the remaining facilities. All facilities with CMD treatment incorporated structures for active or passive aeration and settling of metal-rich precipitate.

  19. Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes

    Science.gov (United States)

    Esmaeily, Habibollah; Tayefi, Maryam; Ghayour-Mobarhan, Majid; Amirabadizadeh, Alireza

    2018-01-27

    The increasing prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed at developing and comparing some statistical models to identify the risk factors associated with type 2 diabetes. In this light, artificial neural network (ANN), support vector machine (SVM), and multiple logistic regression (MLR) models were applied, using demographic, anthropometric, and biochemical characteristics, on a sample of 9528 individuals from Mashhad City in Iran. The study randomly selected 6654 (70%) cases for training and reserved the remaining 2874 (30%) cases for testing. The three methods were compared using ROC curves. The prevalence rate of type 2 diabetes was 14% in our population. The ANN model had 78.7% accuracy, 63.1% sensitivity, and 81.2% specificity. The values of these three parameters were 76.8%, 64.5%, and 78.9% for SVM and 77.7%, 60.1%, and 80.5% for MLR. The area under the ROC curve was 0.71 for ANN, 0.73 for SVM, and 0.70 for MLR. Our findings showed that ANN performs better than the other two models (SVM and MLR) and can be used effectively to identify the associated risk factors of type 2 diabetes.

  20. Granitoid-associated gold mineralization in Egypt: a case study from the Atalla mine

    Science.gov (United States)

    Zoheir, Basem; Deshesh, Fatma; Broman, Curt; Pitcairn, Iain; El-Metwally, Ahmed; Mashaal, Shabaan

    2017-11-01

    Gold-bearing sulfide-quartz veins cutting mainly through the Atalla monzogranite intrusion in the Eastern Desert of Egypt are controlled by subparallel NE-trending brittle shear zones. These veins are associated with pervasive sericite-altered, silicified, and ferruginated rocks. The hosting shear zones are presumed as high-order structures of the Najd-style faults in the Central Eastern Desert (615-585 Ma). Ore minerals include an early pyrite-arsenopyrite (±pyrrhotite) mineralization, partly replaced by a late pyrite-galena-sphalerite-chalcopyrite (±gold/electrum ± tetrahedrite ± hessite) assemblage. Gold occurs as small inclusions in pyrite and arsenopyrite, or more commonly as intergrowths with galena and sphalerite/tetrahedrite in microfractures. Arsenopyrite geothermometry suggests formation of the early Fe-As-sulfide mineralization at 380-340 °C, while conditions of deposition of the late base metal-gold assemblage are assumed to be below 300 °C. Rare hessite, electrum, and Bi-galena are associated with sphalerite and gold in the late assemblage. The early and late sulfide minerals show consistently a narrow range of δ34S ‰ (3.4-6.5) that overlaps with sulfur isotopic values in ophiolitic rocks. The Au-quartz veins are characterized by abundant CO2 and H2O ± CO2 ± NaCl inclusions, where three-dimensional clusters of inclusions show variable aqueous/carbonic proportions and broad range of total (bimodal) homogenization temperatures. Heterogeneous entrapment of immiscible fluids is interpreted to be caused by unmixing of an originally homogenous, low salinity (2 eq. mass % NaCl) aqueous-carbonic fluid, during transition from lithostatic to hydrostatic conditions. Gold deposition occurred generally under mesothermal conditions, i.e., 1.3 kbar and 280 °C, and continued during system cooling to < 200 °C and pressure decrease to 0.1 kbar. Based on the vein textures, sulfur isotope values, composition of ore fluids, and conditions of ore formation

  1. Sentinel Mining

    DEFF Research Database (Denmark)

    Middelfart, Morten

    into geography dimension) combined with a decrease in the money invested in customer support for laptop computers (drilldown into product dimension) is observed. The work leading to this thesis progressed from algorithms for regular sentinel mining with only one source and one target measure, into algorithms...... progression in the efficiency of sentinel mining, where the latest bitmap-based algorithms, that also take advantage of modern CPUs, are 3–4 orders of magnitude faster than the first SQL-based sentinel mining algorithm. This work also led to the industrial implementation of sentinel mining in the commercial...

  2. Novel LanT associated lantibiotic clusters identified by genome database mining.

    Directory of Open Access Journals (Sweden)

    Mangal Singh

    BACKGROUND: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to produce resistance, hence making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic in food preservation, and no significant resistance has developed against it. Given their antimicrobial potential and limited number, there is a need to identify novel lantibiotics. METHODOLOGY/FINDINGS: Identification of novel lantibiotic biosynthetic clusters from an ever increasing database of bacterial genomes can provide a major lead in this direction. In order to achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for LanT homolog, which is a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in identification of 54 bacterial strains containing the LanT homologs, which are not known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding for precursor peptides, modification enzyme, immunity and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters having a single LanM determinant as in the mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified which might be associated with novel bacteriocins, encoded somewhere else in the genome. Three identified gene clusters had a C39 domain containing LanT transporter, associated with the LanBC proteins and double glycine type precursor peptides; the only known example of such a cluster is that of salivaricin.
CONCLUSION: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and

  3. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia.

    LENUS (Irish Health Repository)

    Chen, X

    2011-11-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE and MGS-GAIN samples, rs4704591 was identified as the most significant marker in the gene. Linkage disequilibrium analyses indicated that these markers were in low LD (rs3828611-rs10043986, r²=0.008; rs10043986-rs4704591, r²=0.204). In addition, CMYA5 was reported to be physically interacting with the DTNBP1 gene, a promising candidate for schizophrenia, suggesting that CMYA5 may be involved in the same biological pathway and process. On the basis of this information, we performed replication studies for these three single-nucleotide polymorphisms. The rs3828611 was found to have conflicting results in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11 380 cases and 15 021 controls), we found that both markers are significantly associated with schizophrenia (rs10043986, odds ratio (OR)=1.11, 95% confidence interval (CI)=1.04-1.18, P=8.2 × 10⁻⁴ and rs4704591, OR=1.07, 95% CI=1.03-1.11, P=3.0 × 10⁻⁴). The results were also significant for the 22 Caucasian replication samples (rs10043986, OR=1.11, 95% CI=1.03-1.17, P=0.0026 and rs4704591, OR=1.07, 95% CI=1.02-1.11, P=0.0015). Furthermore, haplotype conditioned analyses indicated that the association

  4. Mining the Volatilomes of Plant-Associated Microbiota for New Biocontrol Solutions.

    Science.gov (United States)

    Bailly, Aurélien; Weisskopf, Laure

    2017-01-01

    Microbial lifeforms associated with land plants represent a rich source for crop growth- and health-promoting microorganisms and biocontrol agents. Volatile organic compounds (VOCs) produced by the plant microbiota have been demonstrated to elicit plant defenses and inhibit the growth and development of numerous plant pathogens. Therefore, these molecules are prospective alternatives to synthetic pesticides and the determination of their bioactivities against plant threats could contribute to the development of control strategies for sustainable agriculture. In our previous study we investigated the inhibitory impact of volatiles emitted by Pseudomonas species isolated from a potato field against the late blight-causing agent Phytophthora infestans. Besides the well-documented emission of hydrogen cyanide, other Pseudomonas VOCs impeded P. infestans mycelial growth and sporangia germination. Current advances in the field support the emerging concept that the microbial volatilome contains unexploited, eco-friendly chemical resources that could help select for efficient biocontrol strategies and lead to a greener chemical disease management in the field.

  5. QTL detection and elite alleles mining for stigma traits in Oryza sativa by association mapping

    Directory of Open Access Journals (Sweden)

    Xiaojing Dang

    2016-08-01

    Stigma traits are very important for hybrid seed production in Oryza sativa, which is a self-pollinated crop; however, the genetic mechanism controlling the traits is poorly understood. In this study, we investigated the phenotypic data of 227 accessions across two years and assessed their genotypic variation with 249 simple sequence repeat (SSR) markers. By combining phenotypic and genotypic data, a genome-wide association (GWA) map was generated. Large phenotypic variations in stigma length (STL), stigma brush-shaped part length (SBPL) and stigma non-brush-shaped part length (SNBPL) were found. Significant positive correlations were identified among stigma traits. In total, 2,072 alleles were detected among 227 accessions, with an average of 8.3 alleles per SSR locus. GWA mapping detected 6 quantitative trait loci (QTLs) for the STL, 2 QTLs for the SBPL and 7 QTLs for the SNBPL. Eleven, 5, and 12 elite alleles were found for the STL, SBPL and SNBPL, respectively. Optimal cross designs were predicted for improving the target traits. The detected genetic variation in stigma traits and QTLs provides helpful information for cloning candidate STL genes and breeding rice cultivars with longer STLs in the future.

  6. An improved association-mining research for exploring Chinese herbal property theory: based on data of the Shennong's Classic of Materia Medica.

    Science.gov (United States)

    Jin, Rui; Lin, Zhi-jian; Xue, Chun-miao; Zhang, Bing

    2013-09-01

    Knowledge Discovery in Databases is gaining attention and raising new hopes for traditional Chinese medicine (TCM) researchers. It is a useful tool in understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper applied an improved association rule learning approach to analyze semistructured text in the book entitled Shennong's Classic of Materia Medica. The text was first annotated and transformed into well-structured multidimensional data. Subsequently, an Apriori algorithm was employed to produce association rules after a sensitivity analysis of parameters. From the 120 confirmed rules that described the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were acquired and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provides new knowledge about CHPT, which would be helpful for its modern research.
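
    The Apriori step mentioned in this abstract can be sketched as follows. This is a minimal textbook-style illustration, not the authors' improved method; the function names, the thresholds and the toy herb records (tags such as "qi:warm") are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for confident rules."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, support, confidence))
    return found


# Hypothetical annotated herb records, for illustration only.
herbs = [
    {"qi:warm", "flavor:pungent", "efficacy:dispel-cold"},
    {"qi:warm", "flavor:pungent", "efficacy:dispel-cold"},
    {"qi:cold", "flavor:bitter", "efficacy:clear-heat"},
    {"qi:cold", "flavor:bitter", "efficacy:clear-heat"},
    {"qi:warm", "flavor:sweet", "efficacy:tonify"},
]
frequent = apriori(herbs, min_support=0.4)
rules = association_rules(frequent, min_confidence=1.0)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

    Support and confidence thresholds strongly affect how many rules survive, which is presumably why a sensitivity analysis of parameters precedes rule generation in the study.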

  7. Estimation by PLFA of Microbial Community Structure Associated with the Rhizosphere of Lygeum spartum and Piptatherum miliaceum Growing in Semiarid Mine Tailings

    OpenAIRE

    Carrasco, Lucía; Gattinger, Andreas; Fließbach, Andreas; Roldán, Antonio; Schloter, Michael; Caravaca, Fuensanta

    2009-01-01

    The objective of this study was to compare the microbial community composition and biomass associated with the rhizosphere of a perennial gramineous species (Lygeum spartum L.) with that of an annual (Piptatherum miliaceum L.), both growing in semiarid mine tailings. We also established their relationship with the contents of potentially toxic metals as well as with indicators of soil quality. The total phospholipid fatty acid (PLFA) amount was significantly higher in the rhizosphere soil of ...

  8. Potential ecological and human health risks of heavy metals in surface soils associated with iron ore mining in Pahang, Malaysia.

    Science.gov (United States)

    Diami, Siti Merryan; Kusin, Faradiella Mohd; Madzin, Zafira

    2016-10-01

    The composition of heavy metals (and metalloid) in surface soils of iron ore mine-impacted areas has been evaluated for potential ecological and human health risks. The mining areas included seven selected locations in the vicinity of active and abandoned iron ore-mining sites in Pahang, Malaysia. Heavy metals such as Fe, Mn, Cu, Zn, Co, Pb, Cr, Ni, and Cd and metalloid As were present in the mining soils of the studied area, while Cu was found exceeding the soil guideline value at all sampling locations. However, the assessment of the potential ecological risk index (RI) indicated low ecological risk (RI between 44 and 128) with respect to Cd, Pb, Cu, As, Zn, Co, and Ni in the surface soils. Contributions of potential ecological risk by metal elements to the total potential ecological RI were evident for Cd, As, Pb, and Cu. Contribution of Cu appears to be consistently greater in the abandoned mining area compared to the active iron ore-mining site. For non-carcinogenic risk, no significant potential health risk was found for either children or adults, as the hazard indices (HIs) were all below 1. The lifetime cancer risk (LCR) indicated that As has greater potential carcinogenic risk compared to other metals that may induce carcinogenic effects such as Pb, Cr, and Cd, while the LCR of As for children fell within the tolerable range for regulatory purposes. Irrespective of carcinogenic or non-carcinogenic risk, greater potential health risk was found among children (by an order of magnitude higher for most metals) compared to adults. The hazard quotient (HQ) and cancer risk indicated that the pathways for the risk to occur were in the order of ingestion > dermal > inhalation. Overall, findings showed that some metals and metalloid were still present at comparable concentrations even long after cessation of the iron ore-mining activities.
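
    The abstract does not give its formulas. As a hedged illustration, the sketch below assumes the widely used convention in which a hazard quotient (HQ) is the ratio of an estimated daily dose to a reference dose and the hazard index (HI) is the sum of HQs over exposure pathways. All dose values and names are hypothetical.

```python
def hazard_quotient(average_daily_dose, reference_dose):
    """HQ for one pathway: estimated dose divided by the reference dose."""
    return average_daily_dose / reference_dose


def hazard_index(quotients):
    """HI: sum of the pathway-specific hazard quotients."""
    return sum(quotients.values())


# Hypothetical doses and reference doses in mg/kg/day (NOT the study's data).
doses = {"ingestion": 3.0e-4, "dermal": 6.0e-5, "inhalation": 2.0e-6}
reference_doses = {"ingestion": 1.0e-3, "dermal": 5.0e-4, "inhalation": 1.0e-4}

hq = {path: hazard_quotient(doses[path], reference_doses[path]) for path in doses}
hi = hazard_index(hq)
ranking = sorted(hq, key=hq.get, reverse=True)
print(hq, hi, ranking)
```

    Under this convention an HI below 1 is read as no significant non-carcinogenic risk, which matches how the abstract interprets its results.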

  9. Associations between Resting, Activity, and Daily Metabolic Rate in Free-Living Endotherms: No Universal Rule in Birds and Mammals.

    Science.gov (United States)

    Portugal, Steven J; Green, Jonathan A; Halsey, Lewis G; Arnold, Walter; Careau, Vincent; Dann, Peter; Frappell, Peter B; Grémillet, David; Handrich, Yves; Martin, Graham R; Ruf, Thomas; Guillemette, Magella M; Butler, Patrick J

    2016-01-01

    Energy management models provide theories and predictions for how animals manage their energy budgets within their energetic constraints, in terms of their resting metabolic rate (RMR) and daily energy expenditure (DEE). Thus, uncovering what associations exist between DEE and RMR is key to testing these models. Accordingly, there is considerable interest in the relationship between DEE and RMR at both inter- and intraspecific levels. Interpretation of the evidence for particular energy management models is enhanced by also considering the energy spent specifically on costly activities (activity energy expenditure [AEE] = DEE - RMR). However, to date there have been few intraspecific studies investigating such patterns. Our aim was to determine whether there is a generality of intraspecific relationships among RMR, DEE, and AEE using long-term data sets for bird and mammal species. For mammals, we use minimum heart rate (fH), mean fH, and activity fH as qualitative proxies for RMR, DEE, and AEE, respectively. For the birds, we take advantage of calibration equations to convert fH into rate of oxygen consumption in order to provide quantitative proxies for RMR, DEE, and AEE. For all 11 species, the DEE proxy was significantly positively correlated with the RMR proxy. There was also evidence of a significant positive correlation between AEE and RMR in all four mammal species but only in some of the bird species. Our results indicate there is no universal rule for birds and mammals governing the relationships among RMR, AEE, and DEE. Furthermore, they suggest that birds tend to have a different strategy for managing their energy budgets from those of mammals and that there are also differences in strategy between bird species. Future work in laboratory settings or highly controlled field settings can tease out the environmental and physiological processes contributing to variation in energy management strategies exhibited by different species.
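
    The relation AEE = DEE - RMR stated in this abstract, and the kind of correlation test it describes, can be illustrated with a short sketch. The numbers below are invented for illustration and the Pearson helper is a generic implementation, not the authors' statistical procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-individual proxies in arbitrary units (NOT the study's data).
rmr = [40.0, 45.0, 50.0, 55.0]      # resting metabolic rate proxy
dee = [100.0, 115.0, 118.0, 140.0]  # daily energy expenditure proxy

# Activity energy expenditure as defined in the abstract: AEE = DEE - RMR.
aee = [d - r for d, r in zip(dee, rmr)]

print("AEE:", aee)
print("r(RMR, DEE):", round(pearson(rmr, dee), 3))
print("r(RMR, AEE):", round(pearson(rmr, aee), 3))
```

    Because RMR is a component of DEE, a positive RMR-DEE correlation is partly built in, which is one reason the RMR-AEE relationship is examined separately.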

  10. Neutralization and attenuation of metal species in acid mine drainage and mine leachates using magnesite: a batch experimental approach

    CSIR Research Space (South Africa)

    Masindi, Vhahangwele

    2014-08-01

    International Mine Water Association Conference – An Interdisciplinary Response to Mine Water Challenges, China University of Mining and Technology, China, 18-22 August 2014. Neutralization and Attenuation of Metal Species in Acid Mine Drainage and Mine...

  11. A cross-sectional survey on knowledge and perceptions of health risks associated with arsenic and mercury contamination from artisanal gold mining in Tanzania

    Directory of Open Access Journals (Sweden)

    Charles Elias

    2013-01-01

    Background: An estimated 0.5 to 1.5 million informal miners, of whom 30-50% are women, rely on artisanal mining for their livelihood in Tanzania. Mercury, used in processing gold ore, and arsenic, which is a constituent of some ores, are common occupational exposures that frequently result in widespread environmental contamination. Frequently, the mining activities are conducted haphazardly without regard for environmental, occupational, or community exposure. The primary objective of this study was to assess community risk knowledge and perception of potential mercury and arsenic toxicity and/or exposure from artisanal gold mining in Rwamagasa in northwestern Tanzania. Methods: A cross-sectional survey of respondents in five sub-villages in the Rwamagasa Village located in Geita District in northwestern Tanzania near Lake Victoria was conducted. This area has a history of artisanal gold mining and many of the population continue to work as miners. Using a clustered random selection approach for recruitment, a total of 160 individuals over 18 years of age completed a structured interview. Results: The interviews revealed wide variations in knowledge and risk perceptions concerning mercury and arsenic exposure, with 40.6% (n=65) and 89.4% (n=143) not aware of the health effects of mercury and arsenic exposure respectively. Males were significantly more knowledgeable (n=59, 36.9%) than females (n=36, 22.5%) with regard to mercury (x2=3.99, px2=22.82, p= Conclusions: The knowledge of individuals living in Rwamagasa, Tanzania, an area with a history of artisanal gold mining, varied widely with regard to the health hazards of mercury and arsenic. In these communities there was limited awareness of the threats to health associated with exposure to mercury and arsenic.
This lack of knowledge, combined with minimal environmental monitoring and controlled waste management practices, highlights the need for health education, surveillance, and policy

  12. Protecting mine hoisting systems

    Energy Technology Data Exchange (ETDEWEB)

    Sidorenko, V.A.; Shatilo, A.N.

    1982-10-01

    The paper discusses problems associated with coal and rock hoisting in underground coal mines in the USSR. Design of standardized safety systems used in Soviet coal mines is described. Failures of control systems which determine hoisting speed are analyzed. When a cage approaches a loading level or ground level at excessive speed the bumping beams accept cage energy. Cage deformation, damage and hoisting rope damage are the result. Correcting cage position in relation to loading levels is a relatively complicated process. The electronic system for automatic control of cage speed and automatic braking when cage speed exceeds the maximum permissible speed in a mine shaft section is evaluated. System design is shown in a scheme. Its specifications are given. It consists of speed sensors, a system activating safety brakes and a system for cage positioning after safety braking. Use of the safety system in some coal mines is discussed.

  13. Exploring the challenges associated with the greening of supply chains in the South African manganese and phosphate mining industry

    Directory of Open Access Journals (Sweden)

    R.I. David Pooe

    2014-03-01

    Full Text Available As with most mining activities, the mining of manganese and phosphate has serious consequences for the environment. Despite a largely adequate and progressive framework for environmental governance developed since 1994, few mines have integrated systems into their supply chain processes to minimise environmental risks and ensure the achievement of acceptable standards. Indeed, few mines have been able to implement green supply chain management (GrSCM). The purpose of this article was to explore challenges related to the implementation of GrSCM and to provide insight into how GrSCM can be implemented in the South African manganese and phosphate industry. This article reported findings of a qualitative study involving interviews with 12 participants from the manganese and phosphate industry in South Africa. Purposive sampling techniques were used. Emerging from the study were six themes, all of which were identified as key challenges in the implementation of GrSCM in the manganese and phosphate mining industry. From the findings, these challenges include the operationalisation of environmental issues, lack of collaboration and knowledge sharing, proper application of monitoring and control systems, lack of clear policy and legislative direction, the cost of implementing GrSCM practices, and the need for strong leadership and management of change. On the basis of the literature reviewed and empirical findings, conclusions were drawn and policy and management recommendations were accordingly made.

  14. The modernisation of mining

    CSIR Research Space (South Africa)

    Ritchken, E

    2017-10-01

    Full Text Available This presentation discusses the modernisation of mining. The presentation focuses on the mining clusters, Mining Challenges, Compliance versus Collaboration, The Phakisa, The Mining Precinct & the Mining Hub also Win-Win Beneficiation: Iron...

  15. CONFLICTOS ASOCIADOS A LA GRAN MINERÍA EN ANTIOQUIA. CONFLICTS ASSOCIATED WITH LARGE-SCALE MINING IN ANTIOQUIA.

    Directory of Open Access Journals (Sweden)

    Alfonso Insuasty Rodriguez.

    2013-12-01

    Full Text Available This article is the first output of the research project “Conflicts over the territory associated with large-scale mining in Antioquia, Colombia.” It presents the conclusions of the first phase, which gives an account of the extractive economic dynamics that Colombia has adopted over the last 10 years as a strategic route that responds to the demand for available, low-cost natural resources created by the current crisis of international capital. These decisions favor foreign interests but involve and jeopardize the cultural logics, autonomy, sovereignty, life, dignity and natural environment of the inhabitants of the territories of interest for the development of these large natural resource extraction projects.

  16. Spatio-Temporal Pattern Mining on Trajectory Data Using ARM

    Science.gov (United States)

    Khoshahval, S.; Farnaghi, M.; Taleai, M.

    2017-09-01

    Initially, the mobile phone was considered a device for making human connections easier. Today, its use has evolved into a platform for gaming, web surfing and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant facilities for gathering trajectory data. Raw GPS trajectory data is a series of points that contains hidden information, and trajectory data analysis is needed to reveal it. Among the most useful information concealed in trajectory data are user activity patterns. Each pattern contains multiple stops and moves, which identify the places a user visited and the tasks performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the user's visited places from the stops and moves of GPS trajectories. To locate stops and moves, we implemented a place recognition algorithm. After the visited points were extracted, an advanced association rule mining algorithm called Apriori was used to extract user activity patterns. The study showed that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques in order to learn about the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.
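
    The Apriori step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each day has already been reduced to a set of visited places; the place names, the `min_support` value and the function name `apriori` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical input: one set of visited places per day for a single user.
days = [
    {"home", "work", "gym"},
    {"home", "work", "cafe"},
    {"home", "work", "gym"},
    {"home", "cafe"},
]
frequent_places = apriori(days, min_support=0.5)
print(frequent_places[frozenset({"home", "work"})])  # 0.75
```

    The prune step is what distinguishes Apriori from brute-force enumeration: a candidate such as {home, work, cafe} is discarded without counting because its subset {work, cafe} is already known to be infrequent.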

  17. Lack of parental rule-setting on eating is associated with a wide range of adolescent unhealthy eating behaviour both for boys and girls

    Directory of Open Access Journals (Sweden)

    Jana Holubcikova

    2016-04-01

    Full Text Available Abstract Background Unhealthy eating habits in adolescence lead to a wide variety of health problems and disorders. The aim of this study was to assess the prevalence of absence of parental rules on eating and unhealthy eating behaviour and to explore the relationships between parental rules on eating and a wide range of unhealthy eating habits of boys and girls. We also explored the association of sociodemographic characteristics such as gender, family affluence or parental education with eating related parental rules and eating habits of adolescents. Methods The data on 2765 adolescents aged 13–15 years (mean age: 14.4; 50.7 % boys) from the Slovak part of the Health Behaviour in School-Aged Children (HBSC) study 2014 were assessed. The associations between eating-related parental rules and unhealthy eating patterns were assessed using logistic regression. Results Unhealthy eating habits occurred frequently among adolescents (range: 18.0 % reported skipping breakfast during weekends vs. 75.8 % for low vegetables intake). Of all adolescents, 20.5 % reported a lack of any parental rules on eating (breakfast not mandatory, meal in front of TV allowed, no rules about sweets and soft drinks). These adolescents were more likely to eat unhealthily, i.e. to skip breakfast on weekdays (odds ratio/95 % confidence interval: 5.33/4.15–6.84) and on weekends (2.66/2.12–3.34), to report low consumption of fruits (1.63/1.30–2.04) and vegetables (1.32/1.04–1.68), and the frequent consumption of sweets (1.59/1.30–1.94), soft drinks (1.93/1.56–2.38) and energy drinks (2.15/1.72–2.70). Conclusions Parental rule-setting on eating is associated with eating behaviours of adolescents. Further research is needed to disentangle causality in this relationship. If causal, parents may be targeted to modify the eating habits of adolescents.
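
    To show how an "odds ratio/95 % confidence interval" pair of the kind reported above is read, the following sketch computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The counts are hypothetical, and the study itself used logistic regression, which may adjust for covariates, so this is not a reconstruction of its results.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: normal quantile (1.96 gives an approximate 95 % interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: "exposed" = no parental rules, "outcome" = skips breakfast.
or_value, ci_lower, ci_upper = odds_ratio_ci(a=60, b=40, c=30, d=70)
print(round(or_value, 2), round(ci_lower, 2), round(ci_upper, 2))
```

    An interval whose lower bound lies above 1, as in every pair quoted in the abstract, indicates that the odds of the behaviour are higher in the group without parental rules.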

  18. Data mining for the identification of metabolic syndrome status.

    Science.gov (United States)

    Worachartcheewan, Apilak; Schaduangrat, Nalini; Prachayasittikul, Virapong; Nantasenamat, Chanin

    2018-01-01

    Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of diabetes mellitus (DM) type 2 and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of quantitative population-health relationship (QPHR) is also presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. The QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) for modeling and construction of predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits on the applications of data mining as a rapid identification tool for classifying MS.

  19. A REVIEW ON THE DETECTION OF HEART ATTACK USING DATA MINING BY ACO TECHNIQUE

    OpenAIRE

    Pise Satish Prakashrao, Anoop Singh & Ritesh Kumar Yadav

    2018-01-01

    The goal of data mining is to extract knowledge from huge amounts of data. Nowadays, data mining techniques are used in the medical diagnosis of critical diseases and in the analysis of clinical data. This research proposes a model to predict heart disease. The paper proposes a novel approach that applies the Ant Colony Optimization (ACO) technique to extract Association Rules (AR) from the database in order to detect heart attack. This algorithm is broadly are many types of heart disease ...

  20. Heavy metal pollution in soil associated with a large-scale cyanidation gold mining region in southeast of Jilin, China.

    Science.gov (United States)

    Chen, Mo; Lu, Wenxi; Hou, Zeyu; Zhang, Yu; Jiang, Xue; Wu, Jichun

    2017-01-01

    Different gold mining and smelting processes can lead to distinctive heavy metal contamination patterns and results. This work examined heavy metal pollution from a large-scale cyanidation gold mining operation, which is distinguished from artisanal and small-scale amalgamation gold mining, in Jilin Province, China. A total of 20 samples including one background sample were collected from the surface of the mining area and the tailings pond in June 2013. These samples were analyzed for heavy metal concentrations and degree of pollution as well as sources of Cr, Cu, Zn, Pb, Ni, Cd, As, and Hg. The mean concentrations of Pb, Hg, and Cu (819.67, 0.12, and 46.92 mg kg-1, respectively) in soil samples from the gold mine area exceeded local background values. The mean Hg content was less than the first-class standard of the Environmental Quality for Soils, which suggested that the cyanidation method is helpful for reducing Hg pollution. The geochemical accumulation index and enrichment factor results indicated clear enrichment of Pb, Cu, and Hg, with serious Pb pollution and moderate to no Hg and Cu pollution. Multivariate statistical analysis showed that there were three metal sources: (1) Pb, Cd, Cu, and As came from anthropogenic sources; (2) Cr and Zn were naturally occurring; whereas (3) Hg and Ni had a mix of anthropogenic and natural sources. Moreover, the tailings dam plays an important role in intercepting the tailings. The potential ecological risk assessment results showed that the study area poses a potentially strong risk to ecological health. Pb and Hg (due to high concentration and high toxicity, respectively) are the major pollutants on the risk index, and both Pb and Hg pollution should be of great concern at the Haigou gold mines in Jilin, China.
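
    The two indices named in this abstract can be sketched as follows, assuming the commonly used definitions: the geoaccumulation index Igeo = log2(C / (1.5 B)) and an enrichment factor normalised to a reference element. The abstract does not state which formulas or reference element the authors used, and the background values below are hypothetical; only the mean Pb concentration (819.67 mg kg-1) comes from the text.

```python
import math


def geoaccumulation_index(concentration, background):
    """Igeo = log2(C / (1.5 * B)); the factor 1.5 allows for natural variation."""
    return math.log2(concentration / (1.5 * background))


def enrichment_factor(c_sample, ref_sample, c_background, ref_background):
    """EF = (C / C_ref) in the sample divided by (C / C_ref) in the background."""
    return (c_sample / ref_sample) / (c_background / ref_background)


# Mean Pb concentration from the abstract; the background value is hypothetical.
pb_mean = 819.67        # mg/kg, reported mean in the gold mine area
pb_background = 25.0    # mg/kg, assumed for illustration only
pb_igeo = geoaccumulation_index(pb_mean, pb_background)

# Hypothetical reference-element concentrations (same units) for the EF example.
pb_ef = enrichment_factor(pb_mean, ref_sample=30000.0,
                          c_background=pb_background, ref_background=28000.0)
print(round(pb_igeo, 2), round(pb_ef, 1))
```

    Both indices compare a measured concentration with a background level, so their values depend strongly on the background chosen; the numbers printed here are meaningful only under the assumed background.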

  1. Improving diagnostic accuracy using agent-based distributed data mining system.

    Science.gov (United States)

    Sridhar, S

    2013-09-01

    The use of data mining techniques to improve the diagnostic system accuracy is investigated in this paper. The data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Generally, the expert systems are constructed for automating diagnostic procedures. The learning component uses the data mining algorithms to extract the expert system rules from the database automatically. Learning algorithms can assist the clinicians in extracting knowledge automatically. As the number and variety of data sources is dramatically increasing, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from data. As data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge and data mining knowledge improves the performance of the diagnostic system. This work suggests a framework of combining the human knowledge and knowledge gained by better data mining algorithms on a renal and gallstone data set.

  2. [Association rules analysis for exploring combined medication characteristics of Fufang Kushen injection: real-world study based on 49 597 cases].

    Science.gov (United States)

    Zhang, Yin; Xie, Yan-Ming; Li, Yan-Nan; Zhang, Chang; Chen, Cen; Zhuang, Yan

    2017-08-01

    The present study aimed to analyze the association rules of Fufang Kushen injection in combined medications in the real world based on electronic medical records in hospital information systems, and to provide a reference for its reasonable clinical application. The electronic medical records of hospitalized patients using Fufang Kushen injection were extracted to analyze the frequency distribution characteristics of its combined application with Western medicine, and the specific association rules between these combinations were analyzed using the Apriori algorithm. A total of 49 597 patients were included in the study, and the common combined medications included 5-HT receptor blockers, hepatic protectors, antibiotics, chemotherapeutic drugs, immunomodulatory drugs, glucocorticoids, analgesics and proton pump inhibitors. The results revealed that the distribution characteristics in combined application and the association combinations of Fufang Kushen injection had specific rules, consistent with the clinical orientation of this drug in the treatment of malignant tumors. Such results may provide a reference for the reasonable application of Fufang Kushen injection in clinical treatment.
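
    Once frequent combinations are known, an association rule is usually scored by support, confidence and lift. The sketch below shows those three measures on a handful of hypothetical prescription records; the drug categories echo the abstract, but the records and the function name `rule_metrics` are invented for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n                      # P(A and C)
    confidence = n_both / n_antecedent        # P(C | A)
    lift = confidence / (n_consequent / n)    # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical records: drug categories co-prescribed during one hospital stay.
records = [
    {"chemotherapeutic drug", "5-HT receptor blocker"},
    {"chemotherapeutic drug", "5-HT receptor blocker", "hepatic protector"},
    {"chemotherapeutic drug"},
    {"5-HT receptor blocker"},
    {"hepatic protector"},
]
support, confidence, lift = rule_metrics(
    records, {"chemotherapeutic drug"}, {"5-HT receptor blocker"}
)
print(support, round(confidence, 3), round(lift, 3))  # 0.4 0.667 1.111
```

    A lift above 1 means the two categories appear together more often than they would if prescribed independently, which is the kind of signal a combined-medication study looks for.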

  3. TSCA Chemical Data Reporting Fact Sheet: Reporting Manufactured Chemical Substances from Metal Mining and Related Activities

    Science.gov (United States)

    This fact sheet provides guidance on the Chemical Data Reporting (CDR) rule requirements related to the reporting of mined metals, intermediates, and byproducts manufactured during metal mining and related activities.

  4. Leaf-mining by Phyllonorycter blancardella reprograms the host-leaf transcriptome to modulate phytohormones associated with nutrient mobilization and plant defense.

    Science.gov (United States)

    Zhang, Hui; Dugé de Bernonville, Thomas; Body, Mélanie; Glevarec, Gaëlle; Reichelt, Michael; Unsicker, Sybille; Bruneau, Maryline; Renou, Jean-Pierre; Huguet, Elisabeth; Dubreuil, Géraldine; Giron, David

    2016-01-01

    Phytohormones have long been hypothesized to play a key role in the interactions between plant-manipulating organisms and their host-plants such as insect-plant interactions that lead to gall or 'green-islands' induction. However, mechanistic understanding of how phytohormones operate in these plant reconfigurations is lacking due to limited information on the molecular and biochemical phytohormonal modulation following attack by plant-manipulating insects. In an attempt to fill this gap, the present study provides an extensive characterization of how the leaf-miner Phyllonorycter blancardella modulates the major phytohormones and the transcriptional activity of plant cells in leaves of Malus domestica. We show here, that cytokinins strongly accumulate in mined tissues despite a weak expression of plant cytokinin-related genes. Leaf-mining is also associated with enhanced biosynthesis of jasmonic acid precursors but not the active form, a weak alteration of the salicylic acid pathway and a clear inhibition of the abscisic acid pathway. Our study consolidates previous results suggesting that insects may produce and deliver cytokinins to the plant as a strategy to manipulate the physiology of the leaf to create a favorable nutritional environment. We also demonstrate that leaf-mining by P. blancardella leads to a strong reprogramming of the plant phytohormonal balance associated with increased nutrient mobilization, inhibition of leaf senescence and mitigation of plant direct and indirect defense.

  5. Phonological Rules

    Directory of Open Access Journals (Sweden)

    Iman Mingher Obied

    2017-03-01

    Full Text Available The study sheds light on phonological rules as part of communication through language. It examines the reasons behind them and their types, characteristics and functions. Finally, it presents the conclusions reached.

  6. [Clinical study on aconite prescriptions with incompatible herbs in different areas based on association rules and analysis on compatibility features].

    Science.gov (United States)

    Zuo, Ting; Fan, Xin-sheng; Tian, Shuo; Jiang, Chen-xue; Chen, Fei

    2015-03-01

    To explore the current application and features of Aconite prescriptions with incompatible herbs in grade A class three hospitals in east China and central China through a clinical study and comparative analysis. Clinical prescriptions containing Aconite with incompatible herbs were collected. Association rules were utilized to analyze the compatibility features of these herbs. This analysis found that the most frequently used incompatible herbal pair is Aconiti Lateralis Radix Praeparata-Pinelliae Rhizoma, with a support rate of 44.45%, occupying nearly half of the surveyed prescriptions; Pinelliae Rhizoma is the most frequently used herb in the two areas, with a support rate of up to 76.24%. Among the top 10 herbs by support rate, except for Aconiti Lateralis Radix Praeparata and Pinelliae Rhizoma, the top 10 herbs in central China were mostly for warming the middle jiao and tonifying qi, such as Zingiberis Rhizoma, Atractylodis Macrocephalae Rhizoma and Codonopsis Radix, whereas those in east China were mostly for activating and nourishing blood, such as Angelicae Sinensis Radix, Chuanxiong Rhizoma, and Salviae Miltiorrhizae Radix et Rhizoma. Among the top 10 herbal pairs by support rate, except for Aconiti Lateralis Radix Praeparata-Pinelliae Rhizoma, the core herbal pairs applied in central China were mainly for resolving phlegm and warming the middle jiao, such as Pinelliae Rhizoma-Glycyrrhizae Radix et Rhizoma and Pinelliae Rhizoma-Zingiberis Rhizoma, whereas those in east China were principally for activating blood and tonifying qi, like Atractylodis Macrocephalae Rhizoma-Pinelliae Rhizoma and Angelicae Sinensis Radix-Pinelliae Rhizoma. Among the core herbal groups in the two areas, the most frequently used group is Aconiti Lateralis Radix Praeparata, Glycyrrhizae Radix et Rhizoma and Pinelliae Rhizoma, with a support rate of 59.73%, accounting for the highest proportion among all herbal groups. 
There are the combined

  7. Process mining

    DEFF Research Database (Denmark)

    van der Aalst, W.M.P.; Rubin, V.; Verbeek, H.M.W.

    2010-01-01

    Process mining includes the automated discovery of processes from event logs. Based on observed events (e.g., activities being executed or messages being exchanged) a process model is constructed. One of the essential problems in process mining is that one cannot assume to have seen all possible...... behavior. At best, one has seen a representative subset. Therefore, classical synthesis techniques are not suitable as they aim at finding a model that is able to exactly reproduce the log. Existing process mining techniques try to avoid such “overfitting” by generalizing the model to allow for more...
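
    The idea of constructing a model from observed events can be illustrated with a directly-follows count, a common first step in process discovery. This is a generic sketch, not the technique of the cited work; the event log and activity names are hypothetical.

```python
from collections import Counter


def directly_follows(event_log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in event_log:
        # Pair each event with its successor within the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log: one list of activities per process instance.
log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]
dfg = directly_follows(log)
print(dfg[("register", "check")])  # 3
print(dfg[("check", "check")])     # 0
```

    The zero count for ("check", "check") shows the problem the abstract raises: a repeated check may be perfectly valid behavior that simply never occurred in this log, so a model that reproduces the log exactly would wrongly forbid it.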

  8. Transfer of sediment-associated metals downstream of abandoned and active mining sites in the Quesnel River catchment, British Columbia

    NARCIS (Netherlands)

    Perk, M. van der; Lipzig, M.L.H.M. van; Karimlou, G.; Owens, P.N.; Petticrew, E.L.

    2011-01-01

    Metal mining may have considerable impact on downstream water and sediment composition. The rate and extent that metals move downstream determine the magnitude and time scale of downstream sediment contamination. Conversely, the downstream metal content of sediments provide important clues of

  9. Mercury and trace element contents of Donbas coals and associated mine water in the vicinity of Donetsk, Ukraine

    Science.gov (United States)

    Kolker, A.; Panov, B.S.; Panov, Y.B.; Landa, E.R.; Conko, K.M.; Korchemagin, V.A.; Shendrik, T.; McCord, J.D.

    2009-01-01

    Mercury-rich coals in the Donets Basin (Donbas region) of Ukraine were sampled in active underground mines to assess the levels of potentially harmful elements and the potential for dispersion of metals through use of this coal. For 29 samples representing c11 to m3 Carboniferous coals, mercury contents range from 0.02 to 3.5 ppm (whole-coal dry basis). Mercury is well correlated with pyritic sulfur (0.01 to 3.2 wt.%), with an r2 of 0.614 (one outlier excluded). Sulfides in these samples show enrichment of minor constituents in late-stage pyrite formed as a result of interaction of coal with hydrothermal fluids. Mine water sampled at depth and at surface collection points does not show enrichment of trace metals at harmful levels, indicating pyrite stability at subsurface conditions. Four samples of coal exposed in the defunct open-cast Nikitovka mercury mines in Gorlovka have extreme mercury contents of 12.8 to 25.5 ppm. This coal was formerly produced as a byproduct of extracting sandstone-hosted cinnabar ore. Access to these workings is unrestricted and small amounts of extreme mercury-rich coal are collected for domestic use, posing a limited human health hazard. More widespread hazards are posed by the abandoned Nikitovka mercury processing plant, the extensive mercury mine tailings, and mercury enrichment of soils extending into residential areas of Gorlovka.

  10. FROM DATA MINING TO BEHAVIOR MINING

    OpenAIRE

    ZHENGXIN CHEN

    2006-01-01

    Knowledge economy requires data mining be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge because recent developments in data mining have shown an increasing interest on mining of complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships of the data along with the data itself (rathe...

  11. Effective application of improved profit-mining algorithm for the interday trading model.

    Science.gov (United States)

    Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin

    2014-01-01

    Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  12. Education Roadmap for Mining Professionals

    Energy Technology Data Exchange (ETDEWEB)

    none,

    2002-12-01

    This document represents the roadmap for education in the U.S. mining industry. It was developed based on the results of an Education Roadmap Workshop sponsored by the National Mining Association in conjunction with the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, Office of Industrial Technologies. The Workshop was held February 23, 2002 in Phoenix, Arizona.

  13. Characterization of bacterial diversity associated with calcareous deposits and drip-waters, and isolation of calcifying bacteria from two Colombian mines.

    Science.gov (United States)

    García G, Mariandrea; Márquez G, Marco Antonio; Moreno H, Claudia Ximena

    2016-01-01

    Bacterial carbonate precipitation has implications in geological processes and important biotechnological applications. Bacteria capable of precipitating carbonates have been isolated from different calcium carbonate deposits (speleothems) in caves, soil, freshwater and seawater around the world. However, the diversity of bacteria from calcareous deposits in Colombia, and their ability to precipitate carbonates, remains unknown. In this study, conventional microbiological methods and molecular tools, such as temporal temperature gradient electrophoresis (TTGE), were used to assess the composition of bacterial communities associated with carbonate deposits and drip-waters from two Colombian mines. A genetic analysis of these bacterial communities revealed a similar level of diversity, based on the number of bands detected using TTGE. The dominant phylogenetic affiliations of the bacteria, determined using 16S rRNA gene sequencing, were grouped into two phyla: Proteobacteria and Firmicutes. Within these phyla, seven genera were capable of precipitating calcium carbonates: Lysinibacillus, Bacillus, Stenotrophomonas, Brevibacillus, Methylobacterium, Aeromicrobium and Acinetobacter. FTIR and SEM/EDX were used to analyze calcium carbonate crystals produced by isolated Acinetobacter gyllenbergii. The results showed that rhombohedral and angular calcite crystals with sizes of 90 μm were precipitated. This research provides information regarding the presence of complex bacterial communities in secondary carbonate deposits from mines and their ability to precipitate calcium carbonate from calcareous deposits of Colombian mines.

  14. The Effect of Degree of Saturation of Sand on Detonation Phenomena Associated with Shallow-Buried and Ground-Laid Mines

    Directory of Open Access Journals (Sweden)

    M. Grujicic

    2006-01-01

    Full Text Available A new materials model for sand has been developed in order to include the effects of the degree of saturation and the deformation rate on the constitutive response of this material. The model is an extension of the original compaction materials model for sand in which these effects were neglected. The new materials model for sand is next used, within a non-linear-dynamics transient computational analysis, to study various phenomena associated with the explosion of shallow-buried and ground-laid mines. The computational results are compared with the corresponding experimental results obtained through the use of an instrumented horizontal mine-impulse pendulum, pressure transducers buried in sand and a post-detonation metrological study of the sand craters. The results obtained suggest that the modified compaction model for sand captures the essential features of the dynamic behavior of sand and accounts reasonably well for a variety of the experimental findings related to the detonation of shallow-buried or ground-laid mines.

  15. Alleviation of environmental risks associated with severely contaminated mine tailings using amendments: Modeling of trace element speciation, solubility, and plant accumulation.

    Science.gov (United States)

    Pardo, Tania; Bes, Cleménce; Bernal, Maria Pilar; Clemente, Rafael

    2016-11-01

    Tailings are considered one of the most relevant sources of contamination associated with mining activities. Phytostabilization of mine spoils may need the application of the adequate combination of amendments to facilitate plant establishment and reduce their environmental impact. Two pot experiments were set up to assess the capability of 2 inorganic materials (calcium carbonate and a red mud derivate, ViroBind™), alone or in combination with organic amendments, for the stabilization of highly acidic trace element-contaminated mine tailings using Atriplex halimus. The effects of the treatments on tailings and porewater physico-chemical properties and trace-element accumulation by the plants, as well as the processes governing trace elements speciation and solubility in soil solution and their bioavailability were modeled. The application of the amendments increased tailings pH and decreased (>99%) trace elements solubility in porewater, but also changed the speciation of soluble Cd, Cu, and Pb. All the treatments made A. halimus growth in the tailings possible; organic amendments increased plant biomass and nutritional status, and reduced trace-element accumulation in the plants. Tailings amendments modified trace-element speciation in porewater (favoring the formation of chlorides and/or organo-metallic forms) and their solubility and plant uptake, which were found to be mainly governed by tailing/porewater pH, electrical conductivity, and organic carbon content, as well as soluble/available trace-element concentrations. Environ Toxicol Chem 2016;35:2874-2884. © 2016 SETAC.

  16. [A study of association rules in three-dimensional property-taste-effect data of Chinese herbal medicines based on Apriori algorithm].

    Science.gov (United States)

    Jin, Rui; Lin, Qian; Zhang, Bing; Liu, Xin; Liu, Sen-Mao; Zhao, Qian; Liu, Xiu-Lan

    2011-07-01

    The theory of four properties (Qi) and five tastes (Wei) is the core of the property theory of Chinese materia medica. It is known that Qi and Wei are associated with the pharmacological effects (Xiao) of herbs. This study took the records of all 365 Chinese herbs in Shennong's Classic of Materia Medica (Shennong Ben Cao Jing) as the data resource and established a three-dimensional data cube, with the aim of finding and analyzing the frequent patterns and valuable association rules among Qi, Wei and Xiao using the Apriori algorithm. The results of this study may give rise to innovative ideas and methods in research on traditional Chinese materia medica.

  17. Multi stage attack Detection system for Network Administrators using Data Mining

    Energy Technology Data Exchange (ETDEWEB)

    Rajeshwar, Katipally [University of Tennessee; Gasior, Wade C [ORNL; Yang, Dr. Li [University of Tennessee

    2010-04-01

    In this paper, we present a method to discover, visualize, and predict behavior patterns of attackers in a network-based system. We propose a system that is able to discover temporal patterns of intrusion that reveal the behaviors of attackers, using alerts generated by an Intrusion Detection System (IDS). We use data mining techniques to find patterns in the generated alerts by generating association rules. Our system is able to stream real-time Snort alerts and predict intrusions based on the learned rules. Therefore, we are able to automatically discover patterns in multistage attacks, visualize patterns, and predict intrusions.

  18. Comparison of Heuristics for Inhibitory Rule Optimization

    KAUST Repository

    Alsolami, Fawaz

    2014-09-13

    Knowledge representation and extraction are very important tasks in data mining. In this work, we propose a variety of rule-based greedy algorithms that are able to obtain the knowledge contained in a given dataset as a series of inhibitory rules containing an expression “attribute ≠ value” on the right-hand side. The main goal of this paper is to determine, based on two rule characteristics (rule length and coverage), whether the proposed rule heuristics are statistically significantly different or not; if so, we aim to identify the best-performing rule heuristics for minimization of rule length and maximization of rule coverage. The Friedman test with the Nemenyi post-hoc test is used to compare the greedy algorithms statistically against each other for length and coverage. The experiments are carried out on real datasets from the UCI Machine Learning Repository. For the leading heuristics, the constructed rules are compared with optimal ones obtained with a dynamic programming approach. The results seem promising for the best heuristics: the average relative difference between the length (coverage) of constructed and optimal rules is at most 2.27% (7%, respectively). Furthermore, the quality of classifiers based on sets of inhibitory rules constructed by the considered heuristics is compared, and the results show that the three best heuristics from the point of view of classification accuracy coincide with the three well-performing heuristics from the point of view of rule length minimization.

  19. On Intensive Late Holocene Iron Mining and Production in the Northern Congo Basin and the Environmental Consequences Associated with Metallurgy in Central Africa.

    Science.gov (United States)

    Lupo, Karen D; Schmitt, Dave N; Kiahtipes, Christopher A; Ndanga, Jean-Paul; Young, D Craig; Simiti, Bernard

    2015-01-01

    An ongoing question in paleoenvironmental reconstructions of the central African rainforest concerns the role that prehistoric metallurgy played in shaping forest vegetation. Here we report evidence of intensive iron-ore mining and smelting in forested regions of the northern Congo Basin dating to the late Holocene. Volumetric estimates on extracted iron-ore and associated slag mounds from prehistoric sites in the southern Central African Republic suggest large-scale iron production on par with other archaeological and historically-known iron fabrication areas. These data document the first evidence of intensive iron mining and production spanning approximately 90 years prior to colonial occupation (circa AD 1889) and during an interval of time that is poorly represented in the archaeological record. Additional site areas pre-dating these remains by 3-4 centuries reflect an earlier period of iron production on a smaller scale. Microbotanical evidence from a sediment core collected from an adjacent riparian trap shows a reduction in shade-demanding trees in concert with an increase in light-demanding species spanning the time interval associated with iron intensification. This shift occurs during the same time interval when many portions of Central Africa witnessed forest transgressions associated with a return to moister and more humid conditions beginning 500-100 years ago. Although data presented here do not demonstrate that iron smelting activities caused widespread vegetation change in Central Africa, we argue that intense mining and smelting can have localized and potentially regional impacts on vegetation communities. These data further demonstrate the high value of pairing archeological and paleoenvironmental analyses to reconstruct regional-scale forest histories.

  20. DATA MINING IN EDUCATION: CURRENT STATE AND PERSPECTIVES OF DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    Yurii O. Kovalchuk

    2016-01-01

    The main tasks (classification and regression, association rules, clustering) and the basic principles of Data Mining algorithms are considered in the context of their use for a variety of research in the field of education, which is the subject of a relatively new independent direction, Educational Data Mining. Findings about the most popular research topics within this area, as well as the perspectives of its development, are presented. The presentation of the material is illustrated by simple examples. This article is intended for readers who are engaged in research in the field of education at various levels, especially those involved in the use of e-learning systems, but who are not very familiar with this area of data analysis.

  1. Mathematical tools for data mining set theory, partial orders, combinatorics

    CERN Document Server

    Simovici, Dan A

    2014-01-01

    Data mining essentially relies on several mathematical disciplines, many of which are presented in this second edition of this book. Topics include partially ordered sets, combinatorics, general topology, metric spaces, linear spaces, graph theory. To motivate the reader a significant number of applications of these mathematical tools are included ranging from association rules, clustering algorithms, classification, data constraints, logical data analysis, etc. The book is intended as a reference for researchers and graduate students. The current edition is a significant expansion of the firs

  2. 75 FR 78169 - Amateur Service Rules

    Science.gov (United States)

    2010-12-15

    ... rules with respect to amateur service vanity call signs. The rules are necessary to amend the amateur... amends the vanity call sign system rules to clarify the date on which the call sign associated with a... the exceptions to the general rule that a call sign is unavailable to the vanity call sign system for...

  3. A cross-sectional survey on knowledge and perceptions of health risks associated with arsenic and mercury contamination from artisanal gold mining in Tanzania.

    Science.gov (United States)

    Charles, Elias; Thomas, Deborah S K; Dewey, Deborah; Davey, Mark; Ngallaba, Sospatro E; Konje, Eveline

    2013-01-25

    An estimated 0.5 to 1.5 million informal miners, of whom 30-50% are women, rely on artisanal mining for their livelihood in Tanzania. Mercury, used in processing gold ore, and arsenic, which is a constituent of some ores, are common occupational exposures that frequently result in widespread environmental contamination. Frequently, the mining activities are conducted haphazardly without regard for environmental, occupational, or community exposure. The primary objective of this study was to assess community risk knowledge and perception of potential mercury and arsenic toxicity and/or exposure from artisanal gold mining in Rwamagasa in northwestern Tanzania. A cross-sectional survey of respondents in five sub-villages in the Rwamagasa Village located in Geita District in northwestern Tanzania near Lake Victoria was conducted. This area has a history of artisanal gold mining and much of the population continues to work as miners. Using a clustered random selection approach for recruitment, a total of 160 individuals over 18 years of age completed a structured interview. The interviews revealed wide variations in knowledge and risk perceptions concerning mercury and arsenic exposure, with 40.6% (n=65) and 89.4% (n=143) not aware of the health effects of mercury and arsenic exposure respectively. Males were significantly more knowledgeable (n=59, 36.9%) than females (n=36, 22.5%) with regard to mercury (χ²=3.99, p […]). […] mining (n=63, 73.2%) were more knowledgeable about the negative health effects of mercury than individuals in other occupations. Of the few individuals (n=17, 10.6%) who knew about arsenic toxicity, the majority (n=10, 58.8%) were miners. The knowledge of individuals living in Rwamagasa, Tanzania, an area with a history of artisanal gold mining, varied widely with regard to the health hazards of mercury and arsenic. In these communities there was limited awareness of the threats to health associated with exposure to mercury and arsenic. This lack of

  4. Contextual Text Mining

    Science.gov (United States)

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  5. Surface Mines, Other - Longwall Mining Panels

    Data.gov (United States)

    NSGIC Education | GIS Inventory — Coal mining has occurred in Pennsylvania for over a century. A method of coal mining known as Longwall Mining has become more prevalent in recent decades. Longwall...

  6. Mine-induced seismicity at East-Rand proprietary mines

    CSIR Research Space (South Africa)

    Milev, AM

    1995-09-01

    Mining results in seismic activity of varying intensity, from small microseismic events to larger seismic events, often associated with significant seismically induced damage. This work deals with the understanding of the present seismicity...

  7. Employee motivation and work performance: A comparative study of mining companies in Ghana

    Energy Technology Data Exchange (ETDEWEB)

    Kuranchie-Mensah, E.; Amponsah-Tawiah, K.

    2016-07-01

    The paper empirically compares employee motivation and its impact on performance in Ghanaian mining companies, using the job satisfaction model to measure performance. The study employed an exploratory research design to gather data from four large-scale gold mining companies in Ghana on their policies and structures and on the effectiveness of the motivational tools and strategies they use. The study observed that, due to the risk factors associated with the mining industry, management has to ensure that employees are well motivated in order to curb the rate at which employees embark on industrial unrest, which affects performance, and that employees must comply with health and safety rules because the industry contributes hugely to the Gross Domestic Product (GDP) of the country. Limitations of the present study include the researcher’s inability to contact other mining companies. However, the study suggests possibilities for future research, including contacting other mining companies, expanding the sample size, and managers ensuring that the safety and health needs of staff are addressed, particularly those exposed to toxic and harmful chemicals. A lot of studies have been done on mining companies in the past. This paper fills a perceived gap in that employees in this sector are highly motivated in spite of the challenges they face, and knowing more about what keeps employees moving is still of national interest. (Author)

  8. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    Science.gov (United States)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by an educational institution. EDM uses computational approaches to explicate educational information in order to examine educational questions. To make a country stand out among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. Concealed patterns and data from various information repositories can be extracted by adopting the techniques of data mining. In order to summarize the performance of students with their credentials, we scrutinize the use of data mining in the field of academics. The Apriori algorithm is extensively applied to the database of students for a wider classification based on various categories. The K-means procedure is applied to the same set of databases in order to group the records into specific categories. The Apriori algorithm mines rules in order to extract similar patterns along with their associations across various sets of records. The records can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly observed if we make use of data mining frameworks. Thus, the algorithms prove efficient at profiling students in any educational environment. The ultimate objective of the study is to determine whether a student is prone to violence or not.
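
    The abstract above does not give the study's data or parameters, so the following is only a minimal sketch of how Apriori-style frequent itemset and rule mining can work on categorical student records. The records, attribute names, and the support and confidence thresholds are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules

# Hypothetical student records (not taken from the study).
records = [
    {"attendance=low", "grade=fail"},
    {"attendance=low", "grade=fail"},
    {"attendance=high", "grade=pass"},
    {"attendance=high", "grade=pass"},
    {"attendance=low", "grade=pass"},
]
frequent_itemsets = apriori(records, min_support=0.4)
found_rules = association_rules(frequent_itemsets, min_confidence=0.8)
for lhs, rhs, support, confidence in found_rules:
    print(sorted(lhs), "=>", sorted(rhs), support, confidence)
```

    The prune step relies on the downward-closure property (every subset of a frequent itemset is frequent), which is what lets Apriori discard most candidates without counting them.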

  9. Design of data warehouse in teaching state based on OLAP and data mining

    Science.gov (United States)

    Zhou, Lijuan; Wu, Minhua; Li, Shuang

    2009-04-01

    Data warehouse and data mining technologies are among the hot topics of information technology research. At present they are widely applied in commerce, the financial industry, enterprise production, and marketing, but relatively little in education. Over the years, colleges and universities have accumulated large amounts of teaching and management data that cannot be used effectively. In light of the social needs of university development and the current status of data management, it is particularly important to establish a data warehouse of university teaching state, make better use of existing data, and on that basis carry out higher-level processing, namely data mining. In this paper, starting from decision-making needs, we design the data warehouse structure for university teaching state, then create a data warehouse model through structure design and data extraction, conversion, and loading, and finally use an association rule mining algorithm for data mining to obtain effective results that are applied in practice. The data analysis and mining yield a lot of valuable information that can be used to guide teaching management, thereby improving the quality of teaching, promoting teaching devotion in universities, and enhancing teaching infrastructure. At the same time, it can provide detailed, multi-dimensional information for university assessment and higher education research.

  10. Data mining

    CERN Document Server

    Gorunescu, Florin

    2011-01-01

    The knowledge discovery process is as old as Homo sapiens. Until some time ago, this process was solely based on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be solved based on the development of the Data mining technology, aided by the huge computational power of the 'artificial' computers. Digging intelligently in different large databases, data mining aims to extract implicit, previously unknown and potentially useful information from data, since 'knowledge is power'. The goal of this book is to provide, in a friendly way

  11. Mining Review

    Science.gov (United States)

    2013-01-01

    In 2012, the estimated value of mineral production increased in the United States for the third consecutive year. Production and prices increased for most industrial mineral commodities mined in the United States. While production for most metals remained relatively unchanged, with the notable exception of gold, the prices for most metals declined. Minerals remained fundamental to the U.S. economy, contributing to the real gross domestic product (GDP) at several levels, including mining, processing and manufacturing finished products. Minerals’ contribution to the GDP increased for the second consecutive year.

  12. Source parameters of seismic events potentially associated with damage in block 33/34 of the Kiirunavaara mine (Sweden)

    Science.gov (United States)

    Nordström, Emilia; Dineva, Savka; Nordlund, Erling

    2017-12-01

    Forty-six mining-induced seismic events with moment magnitude between -1.2 and 2.1 that possibly caused damage were studied. The events occurred between 2008 and 2013 at mining level 850-1350 m in the Kiirunavaara Mine (Sweden). Hypocenter locations were refined using from 6 to 130 sensors at distances of up to 1400 m. The source parameters of the events were re-estimated using spectral analysis with a standard Brune model (slope -2). The radiated energy for the studied events varied from 4.7 × 10⁻¹ to 3.8 × 10⁷ J, the source radii from 4 to 110 m, the apparent stress from 6.2 × 10² to 1.1 × 10⁶ Pa, the energy ratio (Es/Ep) from 1.2 to 126, and the apparent volume from 1.8 × 10³ to 1.1 × 10⁷ m³. 90% of the events were located in the footwall, close to the ore contact. The events were classified as shear/fault slip (FS) or non-shear (NS) based on the Es/Ep ratio (>10 or <10). Out of the 46 events, 15 were classified as NS and were located across almost the whole range between 840 and 1360 m, including many events below the production levels. The remaining 31 FS events were concentrated mostly around the production levels and slightly below them. The relationships between some source parameters and seismic moment/moment magnitude showed dependence on the type of source mechanism. The energy and the apparent stress were found to be three times larger for FS events than for NS events.
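
    The classification criterion stated above (Es/Ep above or below 10) can be illustrated with a short sketch. The function name and the sample energies are hypothetical, and because the abstract does not say how a ratio of exactly 10 is handled, the sketch assigns that case to NS as an assumption.

```python
def classify_event(es_joules, ep_joules, threshold=10.0):
    """Label an event from its S-wave to P-wave radiated energy ratio.

    Returns "FS" (shear/fault slip) when Es/Ep > threshold, else "NS".
    A ratio exactly equal to the threshold is treated as NS (assumption).
    """
    if ep_joules <= 0:
        raise ValueError("P-wave energy must be positive")
    ratio = es_joules / ep_joules
    return "FS" if ratio > threshold else "NS"

# Hypothetical (Es, Ep) pairs in joules, chosen to span the reported
# Es/Ep range of 1.2 to 126.
sample_events = [(126.0, 1.0), (1.2, 1.0), (50.0, 4.0)]
labels = [classify_event(es, ep) for es, ep in sample_events]
print(labels)
```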

  13. Uranium and Associated Heavy Metals in Ovis aries in a Mining Impacted Area in Northwestern New Mexico

    Directory of Open Access Journals (Sweden)

    Christine Samuel-Nakamura

    2017-07-01

    The objective of this study was to determine uranium (U) and other heavy metal (HM) concentrations (As, Cd, Pb, Mo, and Se) in tissue samples collected from sheep (Ovis aries), the primary meat staple on the Navajo reservation in northwestern New Mexico. The study setting was a prime target of U mining, where more than 1100 unreclaimed abandoned U mines and structures remain. The forage and water sources for the sheep in this study were located within 3.2 km of abandoned U mines and structures. Tissue samples from sheep (n = 3), their local forage grasses (n = 24), soil (n = 24), and drinking water (n = 14) sources were collected. The samples were analyzed using Inductively Coupled Plasma-Mass Spectrometry. Results: In general, HMs concentrated more in the roots of forage compared to the above ground parts. The sheep forage samples fell below the National Research Council maximum tolerable concentration (5 mg/kg). The bioaccumulation factor ratio was >1 in several forage samples, ranging from 1.12 to 16.86 for Mo, Cd, and Se. The study findings showed that the concentrations of HMs were greatest in the liver and kidneys. Of the calculated human intake, Se Reference Dietary Intake and Mo Recommended Dietary Allowance were exceeded, but the tolerable upper limits for both were not exceeded. Food intake recommendations informed by research are needed for individuals especially those that may be more sensitive to HMs. Further study with larger sample sizes is needed to explore other impacted communities across the reservation.
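
    The abstract does not state how the bioaccumulation factor was computed. A common definition, assumed here, is the ratio of the metal concentration in plant tissue to the concentration in the surrounding soil, so a value above 1 means the plant concentrates the metal. The concentrations below are hypothetical.

```python
def bioaccumulation_factor(c_plant_mg_kg, c_soil_mg_kg):
    """Ratio of plant-tissue concentration to soil concentration.

    Both inputs are in mg/kg, so the result is dimensionless.
    This plant/soil definition is an assumption, not taken from the study.
    """
    if c_soil_mg_kg <= 0:
        raise ValueError("soil concentration must be positive")
    return c_plant_mg_kg / c_soil_mg_kg

# Hypothetical forage-root and soil concentrations for one metal.
baf = bioaccumulation_factor(c_plant_mg_kg=3.0, c_soil_mg_kg=2.0)
accumulates = baf > 1  # True when the plant concentrates the metal
print(baf, accumulates)
```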

  14. Implementasi Data Warehouse dan Data Mining: Studi Kasus Analisis Peminatan Studi Siswa

    Directory of Open Access Journals (Sweden)

    Eka Miranda

    2011-06-01

    This paper discusses the implementation of data mining and its role in supporting decision-making related to students’ specialization program selection. Currently, the university uses a database to store transaction records, which cannot be used directly to assist analysis and decision-making. To address these issues, a data warehouse was designed to store large amounts of data; it also has the potential to provide new perspectives on data distribution and makes it possible to answer ad hoc questions and perform data analysis. The method consists of analysis of records related to students’ academic achievement, data warehouse design, and data mining. The paper’s results take the form of a data warehouse and data mining design and its implementation with classification techniques and association rules. These results reveal students’ tendencies and background patterns in choosing a specialization, which helps them make decisions.

  15. Competition, Work Rules and Productivity

    OpenAIRE

    Benjamin Bridgman

    2011-01-01

    More competitive markets are associated with higher productivity. However, changes in competition complicate productivity measurement since changing mark-ups may shift factor shares. This paper examines productivity measurement in markets with market power and restrictive work rules: rules that induce wages to be paid for non-productive labor hours. It develops a theoretical model to explain why workers would want restrictive work rules and how competition leads to their reduction. I model a ...

  16. Mining Industry of the Future Vision: The Future Begins with Mining

    Energy Technology Data Exchange (ETDEWEB)

    none,

    1998-09-01

    The Mining Industry of the Future was started in June 1998 when the Chairman of the National Mining Association and the Secretary of Energy entered into a Compact to pursue a collaborative technology research partnership. After the Compact signing, the mining industry developed its vision document, The Future Begins with Mining, A Vision of the Mining Industry of the Future, in September 1998. This vision document lists long-term goals for the mining industry. Stemming from this vision document, targeted technology roadmaps were developed that describe pathways of research to achieve the vision goals.

  17. Mining Distance-Based Outliers in Near Linear Time

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to...

  18. artery disease guidelines with extracted knowledge from data mining

    Directory of Open Access Journals (Sweden)

    Peyman Rezaei-Hachesu

    2017-06-01

    Conclusion: Guidelines confirm the results achieved by data mining (DM) techniques and help to rank important risk factors based on national and local information. Evaluation of the extracted rules identified new patterns for CAD patients.

  19. Data Mining Techniques in Fraud Detection

    Directory of Open Access Journals (Sweden)

    Rekha Bhowmik

    2008-06-01

    The paper presents the application of data mining techniques to fraud analysis. We present some classification and prediction data mining techniques that we consider important for fraud detection. A number of data mining algorithms exist, and we present a statistics-based algorithm, a decision tree-based algorithm and a rule-based algorithm. We present a Bayesian classification model to detect fraud in automobile insurance. Naïve Bayesian visualization is selected to analyze and interpret the classifier predictions. We illustrate how ROC curves can be deployed for model assessment in order to provide a more intuitive analysis of the models.
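
    As a rough illustration of ROC-based model assessment, the sketch below builds ROC points from classifier scores and computes the area under the curve with the trapezoidal rule. The labels and scores are hypothetical and do not come from the paper's automobile insurance data.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 for the positive class (e.g. fraud), 0 otherwise.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Handle tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical fraud labels and model scores.
fraud_labels = [1, 1, 0, 1, 0, 0]
fraud_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(fraud_labels, fraud_scores)
area = auc(curve)
print(curve, area)
```

    An area near 1 indicates that the model ranks fraudulent cases above legitimate ones almost always, while 0.5 corresponds to random ranking.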

  20. mining activities.

    African Journals Online (AJOL)

    Eichhornia crassipes) is patchy. L2: Nyikonga, 02°48' 45.0"S,. 007.6"E. (M). Nyikonga area receives discharge from. Nyikonga River that drains Nyarugusu and other mining areas in Geita District. Shoreline vegetation includes Typha capensis ...