WorldWideScience

Sample records for mining algorithms ian

  1. Ian Ingram: Next Animals

    DEFF Research Database (Denmark)

    2015-01-01

    Ian Ingram: Next Animals is an exhibition catalogue presenting research on the work of Ian Ingram in relation to his exhibition Next Animals at Nikolaj Kunsthal in 2015.

  2. Frequent Pattern Mining Algorithms for Data Clustering

    DEFF Research Database (Denmark)

    Zimek, Arthur; Assent, Ira; Vreeken, Jilles

    2014-01-01

    Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering; yet, it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining.
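
The bottom-up link between frequent pattern mining and subspace clustering can be illustrated with a CLIQUE-style sketch (our choice of example; the chapter discusses several algorithms). It assumes points scaled to [0, 1)^d, a grid of `xi` equal intervals per dimension, and a density threshold `tau`; these names are illustrative. Dense grid units play the role of frequent itemsets, and the Apriori-style pruning is valid because a unit that is dense in k dimensions must also be dense in each of its (k-1)-dimensional projections.

```python
from itertools import combinations


def cell(point, dim, xi):
    """Index of the interval (0..xi-1) that point falls into along dimension dim."""
    return min(int(point[dim] * xi), xi - 1)


def count(points, unit, xi):
    """Number of points inside a unit, i.e. a set of (dimension, interval) pairs."""
    return sum(1 for p in points if all(cell(p, dim, xi) == iv for dim, iv in unit))


def dense_units(points, xi, tau):
    """Bottom-up search for dense grid units in all subspaces.

    points are assumed to lie in [0, 1)^d; xi is the number of intervals per
    dimension and tau the minimum number of points for a unit to be dense.
    """
    d = len(points[0])
    level = {frozenset([(dim, iv)]) for dim in range(d) for iv in range(xi)}
    level = {u for u in level if count(points, u, xi) >= tau}
    result = set(level)
    k = 1
    while level:
        k += 1
        # Join: combine units whose union spans k distinct dimensions.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k and len({dim for dim, _ in a | b}) == k}
        # Prune (monotonicity, as in Apriori): every (k-1)-dimensional
        # projection of a dense unit must itself be dense.
        candidates = {u for u in candidates
                      if all(frozenset(sub) in level for sub in combinations(u, k - 1))}
        level = {u for u in candidates if count(points, u, xi) >= tau}
        result |= level
    return result
```

Connecting adjacent dense units into clusters, which CLIQUE also does, is omitted from this sketch.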

  3. The Top Ten Algorithms in Data Mining

    CERN Document Server

    Wu, Xindong

    2009-01-01

    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementations, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researchers…

  4. Obituary: Ian R. Bartky, 1934-2007

    Science.gov (United States)

    Dick, Steven J.

    2009-01-01

    Ian Robertson Bartky, a physical chemist who turned to history for his second career, died 18 December 2007 of complications from lung cancer. He was 73. In addition to his scientific career, he will be remembered for his meticulous research on the evolution of time systems, especially for his two books Selling the True Time: Nineteenth Century Timekeeping in America (Stanford University Press, 2000), and One Time Fits All: The Campaigns for Global Uniformity (Stanford University Press, 2007). Ian was born on 15 March 1934 in Chicago, Illinois. He was the son of Walter Bartky, a Professor of Astronomy at the University of Chicago, and eventually its Dean of the Division of Physical Sciences. The elder Bartky's astronomy textbook, Highlights of Astronomy, published in 1935 and reprinted as late as 1964, includes a considerable discussion of time and standard meridians, which may have influenced Ian, even though his father died in 1958 at the age of 57, when Ian was only in his early 20s. Imbued with a love of science by his father, Ian graduated from Illinois Institute of Technology and went on to obtain his doctorate in physical chemistry from the University of California, Berkeley. His mentor was Nobelist William F. Giauque, and Ian always spoke fondly of Giauque's influence in setting the rigorous standards that Ian followed when he joined the National Bureau of Standards [NBS] in 1961. Ian spent most of his career there, and it was there that he acquired his professional interest in time, notably when the House Commerce Committee asked him in the mid-1970s to determine whether the dates of Daylight Saving Time should be extended. This resulted in an NBS report in 1976, which concluded that any energy savings would be minuscule. With his usual attention to detail, Ian researched the entire history of the problem, and thus acquired his second great love after science: history. With Elizabeth Harrison he published a well-known article on the issues

  5. Research on parallel algorithm for sequential pattern mining

    Science.gov (United States)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences related to time or other orders from a sequence database. Its initial motivation was to discover customers' purchasing patterns over a period of time by finding frequent sequences. In recent years, sequential pattern mining has become an important direction in data mining, and its applications are no longer confined to business databases but have extended to new data sources such as the Web and to advanced scientific fields such as DNA analysis. The data of sequential pattern mining have the following characteristics: large volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. Based on these traits and on parallel theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm follows the principle of pattern reduction and uses a divide-and-conquer strategy for parallelization. The first parallel task constructs frequent itemsets by applying the frequent concept and search-space partition theory, and the second task builds frequent sequences using depth-first search at each processor. The algorithm needs to access the database only twice and does not generate candidate sequences, which reduces access time and improves mining efficiency. Using a random data generation procedure and purpose-designed information structures, this paper simulated the SPP algorithm in a concrete parallel environment and also implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm achieved excellent speedup and efficiency.
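
The divide-and-conquer idea (partition the search space by frequent first item, then grow patterns depth-first inside each partition without generating candidates) can be sketched as follows. This is a simplified illustration, not the SPP algorithm: it assumes each sequence is a list of single items rather than itemsets, defines support as the number of sequences containing a pattern as a subsequence, and simulates the processors with a round-robin loop. `mine`, `mine_partition` and `n_workers` are hypothetical names.

```python
from collections import Counter


def frequent_items(db, min_sup):
    """Items occurring in at least min_sup sequences (counted once per sequence)."""
    counts = Counter(item for seq in db for item in set(seq))
    return sorted(item for item, c in counts.items() if c >= min_sup)


def project(db, item):
    """Suffixes after the first occurrence of item, one per sequence containing it."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def grow(prefix, projected, min_sup, results):
    """Depth-first prefix growth on a projected database (no candidate generation)."""
    for item in frequent_items(projected, min_sup):
        pattern = prefix + (item,)
        suffixes = project(projected, item)
        results[pattern] = len(suffixes)  # support = number of supporting sequences
        grow(pattern, suffixes, min_sup, results)


def mine_partition(db, first_item, min_sup):
    """Work unit for one processor: every frequent pattern starting with first_item."""
    suffixes = project(db, first_item)
    results = {(first_item,): len(suffixes)}
    grow((first_item,), suffixes, min_sup, results)
    return results


def mine(db, min_sup, n_workers=2):
    """Partition the search space by frequent first item and merge per-worker results."""
    items = frequent_items(db, min_sup)
    merged = {}
    for worker in range(n_workers):
        for first_item in items[worker::n_workers]:  # round-robin assignment
            merged.update(mine_partition(db, first_item, min_sup))
    return merged
```

Each call to `mine_partition` needs only the database and its own first item, so the calls are independent and could be dispatched to separate processes or machines.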

  6. Randomized algorithms in automatic control and data mining

    CERN Document Server

    Granichin, Oleg; Toledano-Kitai, Dvora

    2015-01-01

    In the fields of data mining and control, the huge amount of unstructured data and the presence of uncertainty in system descriptions have always been critical issues. The book Randomized Algorithms in Automatic Control and Data Mining introduces readers to the fundamentals of applying randomized algorithms in data mining (especially clustering) and in automatic control synthesis. The methods proposed in this book guarantee that the computational complexity of classical algorithms and the conservativeness of standard robust control techniques will be reduced. It is shown that when a problem requires "brute force" in selecting among options, algorithms based on random selection of alternatives offer good results with a certain probability within a restricted time and significantly reduce the volume of operations.

  7. An Evolutionary Algorithm to Mine High-Utility Itemsets

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    High-utility itemset mining (HUIM) has become a critical issue in recent years, since it can reveal profitable products by considering both quantity and profit factors, unlike frequent itemset mining (FIM) for association rules (ARs). In this paper, an evolutionary algorithm based on binary particle swarm optimization (PSO) is presented to efficiently mine high-utility itemsets (HUIs). A maximal pattern (MP) tree structure is further designed to solve the combinatorial problem in the evolution process. Substantial experiments on real-life datasets show that the proposed binary PSO-based algorithm achieves better results than the state-of-the-art GA-based algorithm.
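
The utility measure that distinguishes HUIM from frequent itemset mining, together with a fitness function that a binary PSO could use, can be sketched as below. It assumes the common definition (utility of an item in a transaction = purchased quantity × unit profit, summed over the transactions that contain the whole itemset) and encodes a particle as a bit vector over a fixed item list. The data and function names are hypothetical, and the MP-tree is not modelled.

```python
def itemset_utility(itemset, transactions, profit):
    """Sum of quantity * unit profit over transactions that contain every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def decode(bits, items):
    """Map a binary particle position to the itemset it selects."""
    return [item for bit, item in zip(bits, items) if bit]


def fitness(bits, items, transactions, profit):
    """Fitness of a particle = utility of its itemset (0 for the empty itemset)."""
    itemset = decode(bits, items)
    return itemset_utility(itemset, transactions, profit) if itemset else 0


def is_high_utility(itemset, transactions, profit, min_util):
    """An itemset is a HUI when its utility reaches the minimum utility threshold."""
    return itemset_utility(itemset, transactions, profit) >= min_util
```

A particle whose bits select `a` and `b` decodes to the itemset {a, b}; the swarm then moves towards bit vectors with higher utility.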

  8. Mining algorithm for association rules in big data based on Hadoop

    Science.gov (United States)

    Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

    2018-04-01

    Traditional association rule mining algorithms can no longer meet the efficiency and scalability needs of mining large amounts of data. To address this, taking FP-Growth as an example, the algorithm is parallelized on the Hadoop framework using the MapReduce model. On this basis, it is further improved with a transaction reduction method to enhance mining efficiency. The experiments, carried out on a Hadoop cluster, verify the parallel mining results, compare the efficiency of the serial and parallel versions, and examine how mining time varies with the number of nodes and with the data volume. The experiments show that the parallel FP-Growth algorithm mines frequent itemsets accurately, with better performance and scalability. It better meets the requirements of big data mining and efficiently mines frequent itemsets and association rules from large datasets.
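
The first MapReduce pass of a parallel FP-Growth (counting item supports across data splits, then pruning and reordering each transaction before tree construction) can be simulated in plain Python. This is an illustrative sketch that mimics the map, shuffle and reduce phases in a single process. It does not use the Hadoop API, and treating "transaction reduction" as dropping infrequent items and empty transactions is our assumption, not the paper's exact method.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Mapper: emit (item, 1) once for each transaction that contains the item."""
    for transaction in split:
        for item in set(transaction):
            yield item, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, min_sup):
    """Reducer: sum the counts and keep only frequent items."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {key: n for key, n in totals.items() if n >= min_sup}


def parallel_count(splits, min_sup):
    """Run the mapper on every split, then shuffle and reduce the combined output."""
    pairs = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(pairs), min_sup)


def prepare_transactions(transactions, freq):
    """Drop infrequent items, order the rest by descending support (ties by name),
    and discard transactions left empty, ready for FP-tree insertion."""
    prepared = []
    for t in transactions:
        kept = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        if kept:
            prepared.append(kept)
    return prepared
```

Ordering items by descending support makes transactions share prefixes, which keeps the FP-tree built in the next pass compact.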

  9. Quantum algorithm for association rules mining

    Science.gov (United States)

    Yu, Chao-Hua; Gao, Fei; Wang, Qing-Le; Wen, Qiao-Yan

    2016-10-01

    Association rules mining (ARM) is one of the most important problems in knowledge discovery and data mining. Given a transaction database that has a large number of transactions and items, the task of ARM is to acquire consumption habits of customers by discovering the relationships between itemsets (sets of items). In this paper, we address ARM in the quantum setting and propose a quantum algorithm for the key part of ARM: finding frequent itemsets from the candidate itemsets and acquiring their supports. Specifically, for the case in which there are $M_f^{(k)}$ frequent $k$-itemsets among the $M_c^{(k)}$ candidate $k$-itemsets ($M_f^{(k)} \le M_c^{(k)}$), our algorithm can efficiently mine these frequent $k$-itemsets and estimate their supports by using parallel amplitude estimation and amplitude amplification with complexity $O\big(k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon\big)$, where $\epsilon$ is the error for estimating the supports. Compared with the classical counterpart, i.e., the classical sampling-based algorithm, whose complexity is $O\big(k M_c^{(k)}/\epsilon^{2}\big)$, our quantum algorithm quadratically improves the dependence on both $\epsilon$ and $M_c^{(k)}$ in the best case when $M_f^{(k)} \ll M_c^{(k)}$ and on $\epsilon$ alone in the worst case when $M_f^{(k)} \approx M_c^{(k)}$.
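
Dividing the classical complexity by the quantum one shows where the speedup comes from. Write $M_c^{(k)}$ and $M_f^{(k)}$ for the numbers of candidate and frequent $k$-itemsets and $\epsilon$ for the support-estimation error, and ignore the constants hidden by the $O$ notation:

$$
\frac{k\,M_c^{(k)}/\epsilon^{2}}{k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon}
= \frac{1}{\epsilon}\sqrt{\frac{M_c^{(k)}}{M_f^{(k)}}}
$$

When $M_f^{(k)} \ll M_c^{(k)}$ (for instance, a constant number of frequent itemsets), the ratio grows like $\sqrt{M_c^{(k)}}/\epsilon$, a quadratic improvement in both quantities. When $M_f^{(k)} \approx M_c^{(k)}$, it reduces to $1/\epsilon$, so only the dependence on $\epsilon$ improves.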

  10. Contrast data mining concepts, algorithms, and applications

    CERN Document Server

    Dong, Guozhu

    2012-01-01

    A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems. Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains. Learn from Real Case Studies

  11. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Science.gov (United States)

    Gan, Wensheng; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns in very large databases. It considers only the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable for many real-world applications, since the items a customer purchases may carry other factors, such as profit or quantity. High-utility mining was designed to overcome these limitations of association-rule mining by considering both quantity and profit measures. Most high-utility mining algorithms are designed to handle static databases. Fewer studies address dynamic high-utility mining with transaction insertion, which otherwise requires rescanning the database and suffers from the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computation without candidate generation, based on utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up computation. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038
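
The utility-list structure that the algorithm builds on can be sketched for a single level of the enumeration tree. This follows the utility-list idea popularized by HUI-Miner rather than the paper's incremental algorithm. Each entry stores a transaction id, the itemset's utility in that transaction (`iutil`), and the utility of the items that come after it in a fixed processing order (`rutil`). Alphabetical order, the sample data and all function names are assumptions. The join shown handles only two single items; joins deeper in the tree also subtract the common prefix's utility.

```python
def build_single_lists(transactions, profit, order):
    """Utility list (tid, iutil, rutil) for every single item in `order`."""
    lists = {item: [] for item in order}
    for tid, t in enumerate(transactions):
        present = [i for i in order if i in t]
        utils = [t[i] * profit[i] for i in present]
        remaining = sum(utils)
        for item, u in zip(present, utils):
            remaining -= u  # utility of the items that come after `item`
            lists[item].append((tid, u, remaining))
    return lists


def join(ux, uy):
    """Join the lists of {x} and {y} (x precedes y) into the list of {x, y}."""
    by_tid = {tid: (iu, ru) for tid, iu, ru in uy}
    out = []
    for tid, iu_x, _ in ux:
        if tid in by_tid:
            iu_y, ru_y = by_tid[tid]
            out.append((tid, iu_x + iu_y, ru_y))
    return out


def total_utility(ul):
    """Utility of the itemset: sum of iutil over its transactions."""
    return sum(iu for _, iu, _ in ul)


def may_extend(ul, min_util):
    """Pruning bound: iutil + rutil caps the utility of any extension."""
    return sum(iu + ru for _, iu, ru in ul) >= min_util
```

Because `iutil + rutil` bounds the utility of every extension, `may_extend` lets a miner skip whole subtrees without rescanning the database.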

  12. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

  13. A Mining Algorithm for Extracting Decision Process Data Models

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2011-01-01

    The paper introduces an algorithm that mines logs of user interaction with simulation software. It outputs a model that explicitly shows the data perspective of the decision process, namely the Decision Data Model (DDM). In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and, then, provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to prove the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.

  14. A partition enhanced mining algorithm for distributed association rule mining systems

    Directory of Open Access Journals (Sweden)

    A.O. Ogunde

    2015-11-01

    The extraction of patterns and rules from large distributed databases through existing Distributed Association Rule Mining (DARM) systems still faces enormous challenges, such as high response times, high communication costs, and an inability to adapt to constantly changing databases. In this work, a Partition Enhanced Mining Algorithm (PEMA) is presented to address these problems. In PEMA, the Association Rule Mining Coordinating Agent receives a request and decides the appropriate data sites, partitioning strategy, and mining agents to use. The mining process is divided into two stages. In the first stage, the data agents horizontally segment databases with a small average transaction length into relatively smaller partitions based on the number of available sites and the available memory; databases with a relatively large average transaction length are instead partitioned vertically. The mining agents (Mobile Agent-Based Association Rule Mining-Agents) then discover the local frequent itemsets. In the second stage, the local frequent itemsets are incrementally integrated from one data site to another to obtain the global frequent itemsets. This reduced the response time and communication cost of the system. Experiments on real datasets showed that PEMA improved the average response time over existing algorithms. Similarly, PEMA incurred lower communication costs, exchanging smaller messages on average than benchmark DARM systems. These results indicate that PEMA can be deployed for efficient discovery of valuable knowledge in distributed databases.
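
The two-stage scheme rests on the classic partition principle: with a relative support threshold, an itemset that is frequent in the whole database must be frequent in at least one partition, so merging the local results yields a complete candidate set that a single global count can confirm. The sketch below illustrates that principle and the horizontal-or-vertical decision by average transaction length. It is not PEMA itself (no agents, no incremental site-to-site integration); the brute-force local miner is only suitable for tiny examples, and `length_threshold` is a hypothetical parameter.

```python
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_frac, max_size=3):
    """Stage 1 at one site: brute-force itemsets up to max_size items."""
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, k))
    threshold = min_frac * len(partition)
    return {s for s, c in counts.items() if c >= threshold}


def global_frequent(partitions, min_frac, max_size=3):
    """Union of local results, then one global counting pass over all sites."""
    candidates = set()
    for p in partitions:
        candidates |= local_frequent(p, min_frac, max_size)
    n = sum(len(p) for p in partitions)
    result = {}
    for s in candidates:
        support = sum(1 for p in partitions for t in p if set(s) <= set(t))
        if support >= min_frac * n:
            result[s] = support
    return result


def choose_partitioning(db, length_threshold):
    """Short transactions -> horizontal split; long transactions -> vertical split."""
    avg = sum(len(t) for t in db) / len(db)
    return "horizontal" if avg <= length_threshold else "vertical"
```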

  15. URL Mining Using Agglomerative Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Chinmay R. Deshmukh

    2015-02-01

    The tremendous growth of the Web has led to the application of data mining techniques to web logs. Data mining and the World Wide Web together form an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified into web content mining, web usage mining, and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL that the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
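
A minimal version of the clustering step, restricted to the query side of the bipartite graph, is sketched below. Assumptions: queries are compared by the Jaccard similarity of the URL sets clicked for them, the two most similar clusters are merged greedily until no pair reaches `threshold`, and a merged cluster inherits the union of its members' URLs. The paper's actual similarity measure and its treatment of the URL vertices may differ.

```python
def jaccard(a, b):
    """Overlap of two URL sets: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.5):
    """Greedy agglomerative clustering of queries from (query, url) click pairs."""
    neighbours = {}
    for query, url in clicks:
        neighbours.setdefault(query, set()).add(url)
    # Each cluster is (set of queries, set of URLs clicked for any of them).
    clusters = [({q}, set(urls)) for q, urls in neighbours.items()]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(queries) for queries, _ in clusters]
```

Running the same procedure with the roles of queries and URLs swapped would cluster the URL vertices.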

  16. Recommending Learning Activities in Social Network Using Data Mining Algorithms

    Science.gov (United States)

    Mahnane, Lamia

    2017-01-01

    In this paper, we show how data mining algorithms (e.g. the Apriori algorithm (AP) and collaborative filtering (CF)) are useful in a New Social Network (NSN-AP-CF). NSN-AP-CF processes clusters based on different learning styles. Next, it analyzes the habits and interests of users by mining the frequent episodes by the…
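
For reference, the level-wise Apriori procedure named in the abstract can be sketched as follows: frequent (k-1)-itemsets are joined into k-item candidates, candidates with an infrequent subset are pruned, and the survivors are counted against the transactions. Support here is an absolute transaction count, and the sample data in the test are hypothetical; this is a textbook version, not the NSN-AP-CF implementation.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_sup."""
    transactions = [frozenset(t) for t in transactions]

    def support(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_sup:
            level[frozenset([item])] = s
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_sup:
                level[c] = s
    return frequent
```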

  17. pubmed.mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2015-09-29

    ... using text-mining algorithms for biomedical research purposes. ... studies are described to illustrate some potential uses of ... This is the most applied task. ... other alphabets (for example, Greek alphabets) and hyphens.

  18. pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

    Science.gov (United States)

    Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

    2015-10-01

    The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature, with more than 24 million citations. Data mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they have limitations: they are slow, rigid, and not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand users' capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations, even on large corpus sizes and with compute-intensive functions. pubmed.mineR is available at http://cran.r-project.org/web/packages/pubmed.mineR.

  19. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Science.gov (United States)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile learning (M-learning) gives learners the advantages of both traditional learning and e-learning. Currently, Web-based mobile-learning systems have created many new ways of learning and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a serious concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based mobile-learning system collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners, and so on. Therefore, this paper focuses on a new data mining algorithm, called ARGSA (Association Rules based on an improved Genetic Simulated Annealing Algorithm), which combines the advantages of the genetic algorithm and the simulated annealing algorithm to mine association rules. The paper first takes advantage of a Parallel Genetic Algorithm and a Simulated Annealing Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments show that the proposed method is superior to the Apriori algorithm in this mobile-learning system.

  20. Towards an evaluation framework for process mining algorithms

    NARCIS (Netherlands)

    Rozinat, A.; Alves De Medeiros, A.K.; Günther, C.W.; Weijters, A.J.M.M.; Aalst, van der W.M.P.

    2007-01-01

    Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put into developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended

  1. Effective application of improved profit-mining algorithm for the interday trading model.

    Science.gov (United States)

    Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin

    2014-01-01

    Many real-world applications of association rule mining from large databases help users make better decisions. However, they do not yet work well in financial markets. In addition to high profit, an investor also looks for low-risk trades with a better winning rate. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday trading model, we proposed effective profit-mining algorithms that provide investors with profit rules including information about profit, risk, and winning rate. Since profit mining in the financial market is still in its infancy, it is important to detail the inner workings of mining algorithms and illustrate the best way to apply them. In this paper we go into the details of our improved profit-mining algorithm and showcase effective applications with experiments using real-world trading data. The results show that our approach is practical and effective, with good performance on various datasets.

  2. Effective Application of Improved Profit-Mining Algorithm for the Interday Trading Model

    Directory of Open Access Journals (Sweden)

    Yu-Lung Hsieh

    2014-01-01

  3. Research on Health State Perception Algorithm of Mining Equipment Based on Frequency Closeness

    Directory of Open Access Journals (Sweden)

    Gang Wang

    2014-06-01

    The health state perception of mining equipment is intended to provide online, real-time knowledge and analysis of the running conditions of large mining equipment. Because the failure modes of such equipment are unknown, it poses a challenge to traditional fault diagnosis. A health state perception algorithm for mining equipment is introduced in this paper. A time-series data set is built by continuously sampling the machine's vibration data; a mode set based on frequency closeness is then constructed by the d-neighborhood method combined with the TSDM algorithm, and a forecasting method based on the dual mode set is eventually formed. In the calculation of the frequency closeness, the Goertzel algorithm is introduced to effectively reduce the amount of computation. Simulation tests on vibration data from the drum shaft base indicated that the health state of the device could be effectively distinguished. The algorithm has been successfully applied to equipment monitoring in the Huoer Xinhe Coal Mine of Shanxi Coal Imp&Exp. Group Co., Ltd.
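
The Goertzel algorithm mentioned in the abstract evaluates a single DFT bin with a second-order recursion in O(N) operations, which is cheaper than a full FFT when only a few frequencies are of interest. The sketch below returns the squared magnitude at the bin nearest `target_freq`. How the paper combines these values into a frequency-closeness measure is not described here, and the function name is hypothetical.

```python
import math


def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of the DFT bin nearest target_freq, via Goertzel."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin index
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order resonator tuned to bin k.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # |X[k]|^2 from the last two states of the recursion.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
```

For a unit-amplitude 50 Hz sine sampled at 1 kHz over 1000 samples, the power at 50 Hz is close to (N/2)^2 = 250000, while the power at 120 Hz is essentially zero.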

  4. Non-Fickian law for the neutron density current

    International Nuclear Information System (INIS)

    Espinosa P, G.; Vazquez R, R.; Morales S, J.

    2008-01-01

    In this paper, a fractional wave equation for the average neutron motion in a nuclear reactor is considered. This representation covers the full spectrum of average neutron transport behavior, i.e., Fickian and non-Fickian effects. The fractional diffusion model retains the main dynamic characteristics of the neutron motion. The relaxation time associated with a rapid variation in the neutron flux contains an adjustable parameter, which can be manipulated to obtain the best representation of neutron transport phenomena. (Author)

  5. Study on the Method of Association Rules Mining Based on Genetic Algorithm and Application in Analysis of Seawater Samples

    Directory of Open Access Journals (Sweden)

    Qiuhong Sun

    2014-04-01

    Building on data mining research, this paper studies data mining based on genetic algorithms. It briefly introduces the genetic algorithm and discusses the two important theories on which it rests: the schema (template) theorem and implicit parallelism. Focusing on the application of genetic algorithms to association rule mining, the paper proposes improvements to the structure of the fitness function, the data encoding, and related issues; in particular, building on earlier studies, it applies an improved adaptive Pc, Pm (crossover and mutation probability) scheme to the genetic algorithm, thereby improving the algorithm's efficiency. Finally, a genetic-algorithm-based association rule mining algorithm is applied to data mining on a database of seawater samples, demonstrating its effectiveness.
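
One well-known adaptive scheme, due to Srinivas and Patnaik, lowers the crossover probability Pc and the mutation probability Pm for above-average individuals (protecting good solutions) and keeps them high for below-average ones. The sketch below implements that scheme with its commonly cited constants (k1 = k3 = 1.0, k2 = k4 = 0.5). The paper's improved variant is not specified in the abstract, and returning the below-average value when the population has converged (f_max = f_avg) is our own guard against division by zero.

```python
def adaptive_pc(f_parent, f_max, f_avg, k1=1.0, k3=1.0):
    """Crossover probability; f_parent is the larger fitness of the two parents."""
    if f_parent < f_avg or f_max == f_avg:
        return k3  # below average (or converged population): always recombine
    return k1 * (f_max - f_parent) / (f_max - f_avg)


def adaptive_pm(f, f_max, f_avg, k2=0.5, k4=0.5):
    """Mutation probability for an individual with fitness f."""
    if f < f_avg or f_max == f_avg:
        return k4
    return k2 * (f_max - f) / (f_max - f_avg)
```

With f_max = 10 and f_avg = 6, a parent pair whose fitter member scores 8 gets Pc = 0.5, while the best individual gets Pc = 0 and is therefore carried over unchanged by crossover.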

  6. Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

    Science.gov (United States)

    Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

    2017-11-01

    The ultimate goal of data mining is to uncover hidden information that is useful for making decisions from the large databases collected by an organization. Data mining involves many tasks that must be performed during the process. Mining frequent itemsets is one of the most important tasks for transactional databases. These transactional databases hold data at a very large scale, and mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is efficient only if it consumes little memory and time to mine the frequent itemsets from a given large database. With these points in mind, in this thesis we proposed a system that mines frequent itemsets in a way optimized for memory and time, using cloud computing as an important factor to parallelize the process, with the application provided as a service. The complete framework uses a proven, efficient algorithm called FIN, which works on Nodesets and a POC (pre-order coding) tree. To evaluate the performance of the system, we conducted experiments comparing the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment on a real-time data set of traffic accidents. The results show that the memory consumption and execution time of the proposed system are much lower than those of the standalone system.

  7. Developing and Implementing the Data Mining Algorithms in RAVEN

    International Nuclear Information System (INIS)

    Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian

    2015-01-01

    The RAVEN code is becoming a comprehensive tool for performing probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. Post-processing and analyzing such data might, in some cases, take longer than the initial software runtime. Data mining algorithms and methods help in recognizing and understanding patterns in the data, and thus in discovering knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics code models the dynamics deterministically. A typical analysis is performed by sampling a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is analyzing the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e., to recognize patterns in the data. This report focuses on the development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and on the application of these algorithms to different databases.

  8. Developing and Implementing the Data Mining Algorithms in RAVEN

    Energy Technology Data Exchange (ETDEWEB)

    Sen, Ramazan Sonat [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The RAVEN code is becoming a comprehensive tool for probabilistic risk assessment, uncertainty quantification, and verification and validation. RAVEN is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data, and post-processing and analyzing such data can, in some cases, take longer than the initial software run. Data mining algorithms and methods help recognize and understand patterns in the data, and thus discover knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics code models the dynamics deterministically. A typical analysis is performed by sampling values for a set of parameters. A major challenge in applying dynamic probabilistic risk assessment or uncertainty and error quantification analysis to a complex system is analyzing the large number of scenarios generated. Data mining techniques are typically used to better organize and understand such data, i.e. to recognize patterns in it. This report focuses on the development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and on the application of these algorithms to different databases.

  9. On the Suitability of Genetic-Based Algorithms for Data Mining

    NARCIS (Netherlands)

    Choenni, R.S.

    1998-01-01

    The goal of data mining is to extract knowledge from large databases. A database may be considered as a search space consisting of an enormous number of elements, and a mining algorithm as a search strategy. In general, an exhaustive search of the space is infeasible. Therefore, efficient search
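
    The abstract frames a mining algorithm as a search strategy over a huge space. A genetic algorithm is one such strategy; below is a generic, minimal GA over fixed-length bit strings with binary tournament selection, one-point crossover and bit-flip mutation. The fitness function, the parameter defaults and the OneMax objective (`sum`) used in the usage note are illustrative assumptions, not the setup evaluated by Choenni.

```python
import random


def tournament(pop, fitness, rng):
    # Binary tournament: the fitter of two randomly drawn individuals wins
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_search(fitness, n_bits, pop_size=30, generations=60,
                   p_mut=0.02, seed=0):
    """Minimal GA over bit strings; returns the best string seen so far."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

    For example, `genetic_search(sum, 20)` searches for a 20-bit string with as many ones as possible. Only a small fraction of the 2^20 candidates is ever evaluated, which is the trade-off against exhaustive search that the abstract raises.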

  10. Book Review Psychotherapy and Phenomenology By Ian Rory ...

    African Journals Online (AJOL)

    Book Review Psychotherapy and Phenomenology By Ian Rory Owen (2006) ... Psychotherapy and Phenomenology: On Freud, Husserl and Heidegger. New York: iUniverse. Soft Cover (352 ...

  11. An Optimization Routing Algorithm for Green Communication in Underground Mines

    Directory of Open Access Journals (Sweden)

    Heng Xu

    2018-06-01

    Because of humanity's long-term dependence on ore-based energy, underground mines are operated around the world, and underground mining is often dangerous. Therefore, many underground mines have established networks that manage and acquire information from sensor nodes deployed on miners and elsewhere. Since many mobile sensor nodes are battery-powered, green communication is an effective approach to reducing the energy consumption of a network and extending its lifetime. To reduce the energy consumption of networks, all factors that negatively influence the lifetime should be considered. The degree-constrained minimum spanning tree (DCMST) is introduced in this study to account for all the heterogeneous factors and assign weights for the next step of the evaluation. Then, a genetic algorithm (GA) is introduced to cluster the sensor nodes in the network and balance energy consumption according to several heterogeneous factors and the routing paths from the DCMST. A comparison of simulation results shows that the optimization routing algorithm proposed in this study for green communication in underground mines can effectively reduce network energy consumption and extend network lifetime.
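
    Finding an exact degree-constrained minimum spanning tree is NP-hard in general, so a common heuristic is to run Kruskal's algorithm while skipping any edge that would push a node past the degree bound. The sketch below does exactly that; it is a generic greedy heuristic, not the paper's weighting scheme, and it returns `None` when the greedy pass fails to span the graph. Edges are assumed to be `(weight, u, v)` tuples over nodes `0..n-1`.

```python
def degree_constrained_tree(n, edges, max_degree):
    """Greedy Kruskal variant for DCMST; heuristic, may return None."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = [0] * n
    tree = []
    for w, u, v in sorted(edges):
        # Skip edges that would violate the degree bound
        if degree[u] >= max_degree or degree[v] >= max_degree:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # would create a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
        tree.append((w, u, v))
    return tree if len(tree) == n - 1 else None
```

    On a star-shaped network where node 0 links cheaply (weight 1) to nodes 1, 2 and 3, a bound of 2 forces the third link through a costlier edge (2, 3) of weight 5, giving total weight 7 instead of 3. Bounding the degree spreads relaying load, which is why a DCMST suits battery-powered sensor networks.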

  12. Practical mine ventilation optimization based on genetic algorithms for free splitting networks

    Energy Technology Data Exchange (ETDEWEB)

    Acuna, E.; Maynard, R.; Hall, S. [Laurentian Univ., Sudbury, ON (Canada). Mirarco Mining Innovation; Hardcastle, S.G.; Li, G. [Natural Resources Canada, Sudbury, ON (Canada). CANMET Mining and Mineral Sciences Laboratories; Lowndes, I.S. [Nottingham Univ., Nottingham (United Kingdom). Process and Environmental Research Division; Tonnos, A. [Bestech, Sudbury, ON (Canada)

    2010-07-01

    The design and operation of mine ventilation systems have generally been optimized on the basis of case studies and expert knowledge, and have yet to benefit from optimization techniques used and proven in other fields of engineering. Currently, optimizing a mine ventilation system is a manual decision process performed by an experienced mine ventilation specialist assisted by commercial ventilation distribution solvers. These analysis tools are widely used in the mining industry to evaluate the practical and economic viability of alternative ventilation system configurations. The scenario usually selected is the one with the lowest energy consumption that still delivers the required airflow distribution. Since most commercial solvers do not integrate an optimization algorithm, generating a series of potential ventilation solutions with the conventional iterative design strategy can be time-consuming. For that reason, a genetic algorithm (GA) optimization routine was developed in combination with a ventilation solver to determine potentially optimal solutions for a primary mine ventilation system based on a free splitting network. The optimization method was applied to a small mine ventilation network. The technique was shown to be capable of generating good feasible solutions and of improving upon the manual results obtained by mine ventilation specialists. 9 refs., 7 tabs., 3 figs.

  13. The Spread of Economic Ideas among Romanian People. Case Study: Dionisie Pop Marţian

    Directory of Open Access Journals (Sweden)

    Angela ROGOJANU

    2010-12-01

    In the nineteenth century, accelerating globalization began to make demands that most Romanians could not understand. The delay in economic development, the political and state establishment, the scarcity of instruction and education, and a historical and geographical context marked by hostility together formed the gap between the "West" and the "East". Renewing economic ideas penetrated with difficulty and were often deformed ... The relentless intelligence of some young people educated outside the Romanian lands, such as Dionisie Pop Marţian (1829-1865), started the struggle for "the economic emancipation of the nation" by promoting the ideas, principles and institutions on which the prosperity of the West was built. Seen either as a "reactionary" or as a "man of progress", Marţian put forward a heterogeneous economic outlook, a mixture of liberal and protectionist principles. The most significant "protection" Marţian supported was protection against ignorance. Marţian's compilation of the works of various authors advocating the "social economy" shows the extent of economic backwardness: current economic terms were absent from the lexicon. Marţian coined some understandable economic terms, such as „comerciu” (trade), „manufaptură” (manufacture), „product”, „const”, „fair price”, „banc-rupt”, etc. Marţian's mission was clear: "the spreading of economics through speaking and writing".

  14. Rare itemsets mining algorithm based on RP-Tree and spark framework

    Science.gov (United States)

    Liu, Sainan; Pan, Haoan

    2018-05-01

    To address rare itemset mining in big data, this paper proposes a rare itemset mining algorithm based on RP-Tree and the Spark framework. First, the data is arranged vertically by transaction identifier; to avoid scanning the entire data set, the vertical data set is divided into a frequent vertical data set and a rare vertical data set. Then, the RP-Tree algorithm is used to construct a frequent pattern tree that contains rare items and to generate rare 1-itemsets. After that, the support of itemsets is calculated by scanning the two vertical data sets, and finally an iterative process generates the rare itemsets. Experimental results show that the algorithm can effectively mine rare itemsets and has a clear advantage in execution time.
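
    The vertical layout described above maps each item to the set of transaction identifiers (its tid-list), so the support of an itemset is the size of an intersection rather than the result of a full scan. The sketch below assumes two thresholds in the style of RP-Tree, `min_rare` and `min_freq`, with an item counted as rare when its support lies in [min_rare, min_freq). It only extends rare items to pairs and runs in a single process, so it illustrates the data layout, not the paper's Spark implementation.

```python
from collections import defaultdict
from itertools import combinations


def vertical_layout(transactions):
    # item -> set of transaction ids containing it (tid-list)
    tids = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tids[item].add(tid)
    return dict(tids)


def rare_pairs(transactions, min_rare, min_freq):
    """Return (rare items, rare 2-itemsets with their supports)."""
    tids = vertical_layout(transactions)
    rare_items = {i for i, s in tids.items() if min_rare <= len(s) < min_freq}
    result = {}
    for a, b in combinations(sorted(tids), 2):
        # A rare itemset must contain at least one rare item
        if a not in rare_items and b not in rare_items:
            continue
        support = len(tids[a] & tids[b])  # tid-list intersection, no rescan
        if min_rare <= support < min_freq:
            result[frozenset((a, b))] = support
    return rare_items, result
```

    With transactions {a, b}, {a, b}, {a, c}, {a, b, c}, {b, d}, `min_rare=2` and `min_freq=3`, item c (support 2) is rare, d (support 1) falls below the rare threshold, and the only rare pair is {a, c} with support 2.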

  15. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    Science.gov (United States)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today, one that prompts authorities to discover and establish innovative strategies for educational improvement. This study applied data mining, using a clustering technique, to extract knowledge from the National Career Assessment Examination (NCAE) results in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in different domains. Clustering the students helps identify their learning considerations. Using the RapidMiner tool, the clustering algorithms Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoids, expectation maximization clustering, and support vector clustering were analyzed. Their silhouette indexes were compared, and the results showed that k-means with k = 3 and a silhouette index of 0.196 is the most appropriate algorithm for grouping the students. Three groups were formed: 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential for extracting useful information from the NCAE results to better understand students' abilities, which in turn provides a good basis for adopting teaching strategies.
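
    The study ran its clustering in RapidMiner and chose k-means with k = 3 by comparing silhouette indexes. The pure-Python sketch below shows both pieces on toy 2-D points (an assumption; the NCAE features are not reproduced): Lloyd's k-means iterations and the mean silhouette coefficient, where each point scores (b - a) / max(a, b), with a the mean distance to its own cluster and b the mean distance to the nearest other cluster. It assumes at least two clusters.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: mean of members (keep the old center if a cluster empties)
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels, centers


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton-cluster convention
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

    A silhouette near 1 indicates tight, well-separated groups; the reported value of 0.196 suggests the three student groups overlap considerably, even though k = 3 scored best among the options compared.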

  16. Feature Reduction Based on Genetic Algorithm and Hybrid Model for Opinion Mining

    Directory of Open Access Journals (Sweden)

    P. Kalaivani

    2015-01-01

    With the rapid growth of websites and web forms, a large number of product reviews are available on these sites. An opinion mining system is needed to help people evaluate the emotions, opinions, attitudes and behavior of others, which are used to make decisions based on user preferences. In this paper, we propose an optimized feature reduction approach that incorporates an ensemble of machine learning methods and uses information gain and a genetic algorithm as feature reduction techniques. We conducted comparative experiments on a multidomain review dataset and a movie review dataset in opinion mining. The effectiveness of the single classifiers Naïve Bayes, logistic regression and support vector machine, and of an ensemble technique, for opinion mining is compared on five datasets. Experimental results show that the proposed hybrid method, which combines information gain and a genetic algorithm with an ensemble technique, performs better in terms of various measures on the multidomain and movie reviews. The classification algorithms are evaluated using McNemar's test to compare the significance levels of the classifiers.
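
    Information gain, one of the two feature reduction criteria named above, scores a feature by how much it lowers the entropy of the class labels. Below is a minimal sketch for discrete features such as word presence. The ranking helper `top_k_features` and the toy data are hypothetical, and the genetic algorithm and ensemble stages are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy (bits) of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    # Entropy of the labels minus the weighted entropy after splitting
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(features, labels, k):
    # features: dict mapping feature name -> values aligned with labels
    ranked = sorted(features,
                    key=lambda f: information_gain(features[f], labels),
                    reverse=True)
    return ranked[:k]
```

    For labels pos, pos, neg, neg, a word present only in the two positive reviews has gain 1.0 bit, while a word present in one review of each class has gain 0 and would be dropped first.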

  17. A hybrid heuristic algorithm for the open-pit-mining operational planning problem.

    OpenAIRE

    Souza, Marcone Jamilson Freitas; Coelho, Igor Machado; Ribas, Sabir; Santos, Haroldo Gambini; Merschmann, Luiz Henrique de Campos

    2010-01-01

    This paper deals with the Open-Pit-Mining Operational Planning problem with dynamic truck allocation. The objective is to optimize mineral extraction in the mines by minimizing the number of mining trucks needed to meet production goals and quality requirements. According to the literature, this problem is NP-hard, so a heuristic strategy is justified. We present a hybrid algorithm that combines characteristics of two metaheuristics: Greedy Randomized Adaptive Search Procedures and General Varia...

  18. Ian Hacking, Learner Categories and Human Taxonomies

    Science.gov (United States)

    Davis, Andrew

    2008-01-01

    I use Ian Hacking's views to explore ways of classifying people, exploiting his distinction between indifferent kinds and interactive kinds, and his accounts of how we "make up" people. The natural kind/essentialist approach to indifferent kinds is explored in some depth. I relate this to debates in psychiatry about the existence of mental…

  19. MINING ON CAR DATABASE EMPLOYING LEARNING AND CLUSTERING ALGORITHMS

    OpenAIRE

    Muhammad Rukunuddin Ghalib; Shivam Vohra; Sunish Vohra; Akash Juneja

    2013-01-01

    In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two well-known learning algorithms are Naïve Bayes (NB) and SMO (Sequential Minimal Optimization). These two algorithms are applied to a car review database to train a model that predicts the characteristics of a review comment. It was found that the model successfully predicted the review comm...
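
    Naïve Bayes, the first of the two learners named above, can be sketched as a multinomial model over review words with Laplace (add-one) smoothing. The whitespace tokenizer, the class labels and the toy car reviews in the usage note are assumptions for illustration; SMO is not shown.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Log prior of each class
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        # Word counts per class from a lowercase whitespace tokenizer
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        words = doc.lower().split()
        v = len(self.vocab)

        def score(c):
            # log P(c) + sum of log P(w | c), smoothed so unseen words are not zero
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in words)

        return max(self.classes, key=score)
```

    Trained on "great engine smooth ride" and "smooth comfortable great" (pos) and "poor brakes noisy engine" and "noisy poor mileage" (neg), the model labels "great smooth ride" as pos and "noisy poor brakes" as neg.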

  20. New digital control system for the operation of the Colombian research reactor IAN-R1

    Energy Technology Data Exchange (ETDEWEB)

    Celis del A, L.; Rivero, T.; Bucio, F.; Ramirez, R.; Segovia, A.; Palacios, J., E-mail: lina.celis@inin.gob.mx [ININ, Carretera Mexico-Toluca s/n, 52750 Ocoyoacac, Estado de Mexico (Mexico)

    2015-09-15

    In 2011, Mexico won the Colombian international tender for the renewal of the instrumentation and control of the IAN-R1 reactor, ahead of Argentina and the United States. This paper presents the design criteria and the development work for the new digital control system installed in the Colombian nuclear reactor IAN-R1, which is based on a redundant and diverse architecture that provides increased availability, reliability and safety in reactor operation. This control system and its associated instrumentation met all national export requirements, the safety requirements established by the IAEA, and the requirements of the Colombian regulatory body for nuclear matters. On August 20, 2012, the Colombian IAN-R1 reactor reached its first criticality under the new system developed at the Instituto Nacional de Investigaciones Nucleares (ININ). On September 14, 2012, the new control system of the IAN-R1 reactor was officially handed over to the Colombian authorities, the first time that Mexico exported nuclear technology through the ININ. The reactor is currently operating successfully with the new control system and holds an operating license for 5 years. (Author)

  1. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    Science.gov (United States)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular and efficient cyclic patterns is a demanding task for data analysts because of the unstructured, dynamic and enormous raw information produced on the web. Many existing approaches generate large numbers of candidate patterns on huge and complex databases. In this work, two novel algorithms are proposed and compared with respect to scalability and performance. The first, EFPMA (Extended Regular Model Detection Algorithm), finds frequent sequential patterns in a spatiotemporal dataset; the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic patterns using a symbolic database representation. EFPMA grows patterns from both ends (prefixes and suffixes) of detected patterns, which speeds up pattern growth because it needs fewer levels of database projection than existing approaches such as PrefixSpan and SPADE. ETMA stores and manages transaction data horizontally using distinct notions such as segment, sequence and individual symbols, and exploits a partition-and-conquer method to find maximal patterns using symbolic notation. With this algorithm, cyclic patterns can be mined in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA only records time-series instances dynamically, in terms of character, series and section approaches respectively. Determining the extent of patterns and demonstrating the efficiency of reduction and retrieval techniques on synthetic and real datasets remains an open and challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining and earthquake prediction applications. Extensive experimental results show that the algorithms outperform the ECLAT, STNR and MAFIA approaches in efficiency and scalability.

  2. Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity

    Science.gov (United States)

    Louis, S.J.; Raines, G.L.

    2003-01-01

    We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automaton that models mining activity on public land in Idaho and western Montana. The genetic algorithm searches a space of transition-rule parameters of a two-dimensional cellular automaton model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks; the genetic algorithm takes a day and produces rules that fit the observed data about as well or better. These preliminary results indicate that genetic algorithms are a viable tool for calibrating cellular automata in this application. Experience gained during the calibration of this cellular automaton suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements in how mineral-resource information is provided to the cellular automaton will probably improve our model.

  3. An Efficient Association Rule Hiding Algorithm for Privacy Preserving Data Mining

    OpenAIRE

    Yogendra Kumar Jain; Vinod Kumar Yadav; Geetika S. Panday

    2011-01-01

    Securing a large database that contains crucial information against unauthorized access becomes a serious issue when the data is shared over a network. Privacy-preserving data mining is a new research trend in data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and cru...

  4. Pattern recognition algorithms for data mining scalability, knowledge discovery and soft granular computing

    CERN Document Server

    Pal, Sankar K

    2004-01-01

    Pattern Recognition Algorithms for Data Mining addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. This volume presents various theories, methodologies, and algorithms, using both classical approaches and hybrid paradigms. The authors emphasize large datasets with overlapping, intractable, or nonlinear boundary classes, and datasets that demonstrate granular computing in soft frameworks. Organized into eight chapters, the book begins with an introduction to PR, data mining, and knowledge discovery concepts. The authors analyze the tasks of multi-scale data condensation and dimensionality reduction, then explore the problem of learning with support vector machine (SVM). They conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.

  5. An imperialist competitive algorithm for solving the production scheduling problem in open pit mine

    Directory of Open Access Journals (Sweden)

    Mojtaba Mokhtarian Asl

    2016-06-01

    Production scheduling (planning) of an open-pit mine is the procedure in which rock blocks are assigned to production periods so that the project achieves the highest net present value subject to operational constraints. The paper introduces a new and computationally less expensive metaheuristic, the imperialist competitive algorithm (ICA), for long-term production planning of open-pit mines. The proposed algorithm modifies the original rules of the assimilation process. ICA performance for different levels of the control factors was studied and the results are presented. The results showed that ICA can be applied efficiently to the mine production planning problem.
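
    The objective described above, assigning blocks to periods so as to maximize net present value, NPV = sum over blocks b of v_b / (1 + r)^(t_b), can be made concrete with a brute-force search on a toy instance. The sketch assumes each block's value is realized at the end of its assigned period, a fixed discount rate `rate` and a per-period block `capacity`, and it ignores slope and precedence constraints. ICA itself is not implemented; the point is only the objective that such a metaheuristic searches.

```python
from itertools import product


def schedule_npv(block_values, schedule, rate):
    """schedule maps block id -> period (1-based); values are discounted."""
    return sum(block_values[b] / (1 + rate) ** t for b, t in schedule.items())


def best_schedule(block_values, periods, capacity, rate):
    # Exhaustive search: only feasible for a handful of blocks
    blocks = list(block_values)
    best, best_val = None, float("-inf")
    for assignment in product(range(1, periods + 1), repeat=len(blocks)):
        # Respect the assumed per-period mining capacity
        if any(assignment.count(t) > capacity for t in range(1, periods + 1)):
            continue
        sched = dict(zip(blocks, assignment))
        val = schedule_npv(block_values, sched, rate)
        if val > best_val:
            best, best_val = sched, val
    return best, best_val
```

    With two blocks worth 100 and 50, one block per period and a 10% rate, mining the richer block first gives 100/1.1 + 50/1.1^2 ≈ 132.2 versus about 128.1 the other way round. The search space grows as periods^blocks, which is why real schedules need metaheuristics such as ICA.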

  6. Classification of Internet banking customers using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Reza Radfar

    2014-03-01

    Classifying customers with data mining algorithms enables banks to retain the loyalty of existing customers while attracting new ones. Using decision trees as a data mining technique, customer classification can be optimized, provided that an appropriate decision tree is selected. In this article we present a model for classifying customers who use Internet banking services. The model is developed according to the CRISP-DM standard, using real data from Sina Bank's Internet banking service. Compared with other decision trees, ours is based on both optimization and accuracy factors and identifies potential new Internet banking customers using a three-level classification: low, medium and high. This is a practical, documentary-based research study. Mining customer rules enables managers to set policies based on the discovered patterns and so gain a better understanding of what customers really want.

  7. Circuits design of action logics of the protection system of nuclear reactor IAN-R1 of Colombia

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez M, J. L.; Rivero G, T.; Sainz M, E., E-mail: joseluis.gonzalez@inin.gob.mx [ININ, Carretera Mexico-Toluca s/n, 52750 Ocoyoacac, Estado de Mexico (Mexico)

    2014-10-15

    Because the instrumentation and control system of the nuclear research reactor IAN-R1 had become obsolete, the Institute of Geology and Mining of Colombia, IngeoMinas, launched an international call for tenders to renew it, which was won by the Instituto Nacional de Investigaciones Nucleares (ININ). Among the systems to be designed, the reactor protection system is classified as important to safety because it performs, among others, two primary functions: 1) ensuring that the reactor shuts down safely, and 2) controlling the interlocks that protect against operational errors if defined conditions have not been met. To fulfill these functions, the various safety-related subsystems report their state using binary signals, which are connected to the inputs of two redundant wired logic circuits called action logics (AL) that are part of the reactor protection system. The ALs also serve as a logic interface that indicates the status of the subsystems at all times, both to the operator and to other systems. If any subsystem indicates an unsafe state in the reactor, the ALs generate a reactor shutdown (scram) signal and maintain the interlock until the operator sends a reset signal. This paper describes the design, implementation, verification and testing of the circuits that make up AL 1 and AL 2 of the IAN-R1 reactor, taking into account the requirements that international standards impose on this type of design. (Author)
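
    The behavior described above (trip when any safety-related subsystem reports an unsafe state, then hold the interlock until the operator resets) is a latched OR. The sketch below models one action logic channel in software. The 1-out-of-2 combination of the two redundant channels and the rule that a reset only clears the latch when all inputs are safe are assumptions, not details taken from the paper, and the real action logics are hard-wired circuits rather than code.

```python
class ActionLogic:
    """Hypothetical software model of one action logic channel."""

    def __init__(self):
        self.tripped = False

    def update(self, subsystem_ok):
        # subsystem_ok: sequence of booleans, True = subsystem in a safe state
        if not all(subsystem_ok):
            self.tripped = True  # latch: stays set until reset
        return self.tripped

    def reset(self, subsystem_ok):
        # Assumption: operator reset only clears the latch if every input is safe
        if all(subsystem_ok):
            self.tripped = False
        return self.tripped


def scram_demand(channels, subsystem_ok):
    # Assumed 1-out-of-2 voting: either redundant channel can demand a scram
    ok = list(subsystem_ok)
    return any([ch.update(ok) for ch in channels])
```

    Once any input goes unsafe, `scram_demand` keeps returning `True` even after the inputs recover, until `reset` is called on each channel; this models the interlock holding until the operator's reset signal.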

  8. Ian Bird, head of Grid development at CERN

    CERN Multimedia

    Patrice Loïez

    2003-01-01

    "The Grid enables us to harness the power of scientific computing centres wherever they may be to provide the most powerful computing resource the world has to offer," said Ian Bird, head of Grid development at CERN. The Grid is a new method of sharing processing power between computers in centres around the world.

  9. An Application of Data Mining Algorithms for Shipbuilding Cost Estimation

    NARCIS (Netherlands)

    Kaluzny, B.L.; Barbici, S.; Berg, G.; Chiomento, R.; Derpanis,D.; Jonsson, U.; Shaw, R.H.A.D.; Smit, M.C.; Ramaroson, F.

    2011-01-01

    This article presents a novel application of known data mining algorithms to the problem of estimating the cost of ship development and construction. The work is a product of North Atlantic Treaty Organization Research and Technology Organization Systems Analysis and Studies 076 Task Group “NATO

  10. Modification of the IAN-R1 reactor

    International Nuclear Information System (INIS)

    Jaime, J.; Ahumada, S.; Spin, R.A.

    1990-01-01

    The IAN-R1 reactor is the only nuclear reactor operating in Colombia; it is installed at the Institute of Nuclear Affairs (AIN) in Bogota, an official body under the Ministry of Mining and Energy. The reactor started operation in January 1965 with a rated power of 10 kW and was modified a year later to operate at 20 kW, which has remained its rated power up to the present. Given its importance for applying nuclear technology in Colombia for various purposes, principally neutron activation analysis, determination of uranium content in minerals by the delayed neutron counting method, production of certain radioisotopes such as 198Au and 82Br for engineering applications, and production of radioactive material for teaching and research, research has been under way for some years into ways of increasing its power. A study of experimental requirements and of the demand for locally produced radioisotopes concluded that its power should be increased to 1000 kW, which would allow the facility to remain on the same site. The modification includes conversion of the core to low-enriched fuel, operation at up to 1 MW, modification of the shielding, renovation of the instrumentation and installation of a radioisotope processing plant. Once the reactor is modified, we will be able to produce other radioisotopes for applications in nuclear medicine, industry and engineering; at the same time, the safety of the facility will be optimized and the experimental facilities improved.

  11. Circuits design of action logics of the protection system of nuclear reactor IAN-R1 of Colombia

    International Nuclear Information System (INIS)

    Gonzalez M, J. L.; Rivero G, T.; Sainz M, E.

    2014-10-01

    Because the instrumentation and control system of the nuclear research reactor IAN-R1 had become obsolete, the Institute of Geology and Mining of Colombia, IngeoMinas, launched an international call for tenders to renew it, which was won by the Instituto Nacional de Investigaciones Nucleares (ININ). Among the systems to be designed, the reactor protection system is classified as important to safety because it performs, among others, two primary functions: 1) ensuring that the reactor shuts down safely, and 2) controlling the interlocks that protect against operational errors if defined conditions have not been met. To fulfill these functions, the various safety-related subsystems report their state using binary signals, which are connected to the inputs of two redundant wired logic circuits called action logics (AL) that are part of the reactor protection system. The ALs also serve as a logic interface that indicates the status of the subsystems at all times, both to the operator and to other systems. If any subsystem indicates an unsafe state in the reactor, the ALs generate a reactor shutdown (scram) signal and maintain the interlock until the operator sends a reset signal. This paper describes the design, implementation, verification and testing of the circuits that make up AL 1 and AL 2 of the IAN-R1 reactor, taking into account the requirements that international standards impose on this type of design. (Author)

  12. New digital control system for the operation of the Colombian research reactor IAN-R1

    International Nuclear Information System (INIS)

    Celis del A, L.; Rivero, T.; Bucio, F.; Ramirez, R.; Segovia, A.; Palacios, J.

    2015-09-01

    In 2011, Mexico won the Colombian international tender for the renewal of the instrumentation and control of the IAN-R1 reactor, ahead of Argentina and the United States. This paper presents the design criteria and the development work for the new digital control system installed in the Colombian nuclear reactor IAN-R1, which is based on a redundant and diverse architecture that provides increased availability, reliability and safety in reactor operation. This control system and its associated instrumentation met all national export requirements, the safety requirements established by the IAEA, and the requirements of the Colombian regulatory body for nuclear matters. On August 20, 2012, the Colombian IAN-R1 reactor reached its first criticality under the new system developed at the Instituto Nacional de Investigaciones Nucleares (ININ). On September 14, 2012, the new control system of the IAN-R1 reactor was officially handed over to the Colombian authorities, the first time that Mexico exported nuclear technology through the ININ. The reactor is currently operating successfully with the new control system and holds an operating license for 5 years. (Author)

  13. Comparison analysis for classification algorithm in data mining and the study of model use

    Science.gov (United States)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, classification algorithms have received extensive attention. Through experiments with classification algorithms on a UCI data set, we present a method for comparing the different algorithms using a statistical test. In addition, an adaptive diagnosis model for preventing electricity theft and leakage is presented as a case study.

  14. Attribute Index and Uniform Design Based Multiobjective Association Rule Mining with Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    Jie Zhang

    2013-01-01

    In association rule mining, evaluating an association rule requires repeatedly scanning the database to match each record against the rule's antecedent, its consequent and the whole rule. To reduce the number of comparisons and the time consumed, we present an attribute index strategy. It scans the database only once to create an index for each attribute; after that, all metric values used to evaluate an association rule are obtained from the attribute indices alone, without further database scans. The paper treats association rule mining as a multiobjective problem rather than a single-objective one. To spread the solutions uniformly along the Pareto front in objective space, an elitism policy and uniform design are introduced. The paper presents an attribute index and uniform design based multiobjective association rule mining algorithm with an evolutionary algorithm, abbreviated as IUARMMEA. It no longer requires user-specified minimum support and minimum confidence thresholds, and instead uses a simple attribute index. It uses a well-designed real-valued encoding to broaden its scope of application. Experiments on several databases demonstrate that the proposed algorithm performs excellently and can significantly reduce the number of comparisons and the time consumed.
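
    The two ideas in the abstract, a one-pass attribute index and multiobjective rule evaluation, can be sketched together. The index maps each (attribute, value) pair to the ids of matching records, so support and confidence come from set intersections, and a Pareto filter keeps the rules that no other rule beats on both objectives. The record format, the choice of support and confidence as the two objectives, and all function names are assumptions; the evolutionary search, elitism and uniform design of IUARMMEA are not shown.

```python
from collections import defaultdict


def build_attribute_index(records):
    # Single scan: (attribute, value) -> set of record ids
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for attr, val in rec.items():
            index[(attr, val)].add(rid)
    return index


def matching(index, conditions, n):
    # Records satisfying every (attribute, value) condition
    ids = set(range(n))
    for cond in conditions:
        ids &= index.get(cond, set())
    return ids


def rule_metrics(index, n, antecedent, consequent):
    # Support and confidence from the index alone, no database rescan
    a = matching(index, antecedent, n)
    both = a & matching(index, consequent, n)
    support = len(both) / n
    confidence = len(both) / len(a) if a else 0.0
    return support, confidence


def pareto_front(scored):
    # scored: list of (rule, (support, confidence)); keep non-dominated rules
    def dominates(p, q):
        return all(x >= y for x, y in zip(p, q)) and p != q
    return [r for r, s in scored if not any(dominates(t, s) for _, t in scored)]
```

    On five toy weather records, "weather=sunny → play=no" (support 0.4, confidence 2/3) and "weather=overcast → play=yes" (support 0.2, confidence 1.0) are both kept because neither beats the other on both objectives, while "weather=sunny → play=yes" (0.2, 1/3) is dominated and dropped.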

  15. Attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm.

    Science.gov (United States)

    Zhang, Jie; Wang, Yuping; Feng, Junhong

    2013-01-01

    In association rule mining, evaluating a rule requires repeatedly scanning the database to compare every record with the rule's antecedent, its consequent, and the whole rule. To reduce the number of comparisons and the running time, we present an attribute index strategy. The database is scanned only once to create an index for each attribute; thereafter, all metric values used to evaluate an association rule are obtained from the attribute indices alone, without further database scans. The paper treats association rule mining as a multiobjective rather than a single-objective problem. To make the acquired solutions spread uniformly along the Pareto front in the objective space, an elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It no longer requires user-specified minimum support and minimum confidence; instead, it uses a simple attribute index. It uses a well-designed real encoding to extend its application scope. Experiments on several databases demonstrate that the proposed algorithm performs well and can significantly reduce both the number of comparisons and the running time.

  16. Modernization of control instrumentation and security of reactor IAN - R1

    International Nuclear Information System (INIS)

    Gonzalez, J. M.

    1993-01-01

    The program to modernize the control and safety instrumentation of the IAN-R1 research reactor was carried out with two main aims: updating the safety-philosophy requirements, and acquiring the newest computer-controlled reactor instrumentation, in line with current internationally recognized criteria for safe and reliable reactor operation and with the latest developments in nuclear electronics. The new IAN-R1 instrumentation consists of two wide-range, microprocessor-controlled neutron monitoring channels and a computer-controlled data acquisition and reactor control system. The reactor control desk presents all safety and control signals to the reactor operators on two displays; in addition, some signals, such as reactor power, safety, and period signals, are also shown on digital bar graphs hard-wired directly from the neutron monitoring channels.

  17. Book Review: Revolutionary Keywords for A New Left by Ian Parker

    Directory of Open Access Journals (Sweden)

    Eyal Z Clyne

    2018-01-01

    Eyal Clyne reviews Ian Parker's "Revolutionary Keywords for A New Left" (Winchester and Washington: Zero Books, ISBN: 978-1-78535-642-1), a book that unlocks complex Left-struggle issues in short and accessible essays.

  18. Gas Emission Prediction Model of Coal Mine Based on CSBP Algorithm

    Directory of Open Access Journals (Sweden)

    Xiong Yan

    2016-01-01

    In view of the nonlinear characteristics of gas emission in a coal working face, a prediction method is proposed based on a BP neural network optimized by the cuckoo search algorithm (CSBP). In the CSBP algorithm, cuckoo search is used to optimize the weight and threshold parameters of the BP network and to obtain globally optimal solutions. The twelve main factors affecting gas emission in the coal working face are taken as the input vector of the CSBP algorithm and the gas emission as the output vector, and the prediction model of the BP neural network with optimal parameters is then established. The results show that the CSBP algorithm has better generalization ability and higher prediction accuracy, and can be used effectively to predict coal mine gas emission.

  19. Non-Fickian law for the neutron density current; Atomos para el desarrollo de Mexico

    Energy Technology Data Exchange (ETDEWEB)

    Espinosa P, G.; Vazquez R, R. [UAM-Iztapalapa, Av. San Rafael Atlixco 186, Col. Vicentina, Mexico D.F. 09340 (Mexico)]; Morales S, J. [UNAM, Laboratorio de Analisis en Ingenieria de Reactores Nucleares, Paseo Cuauhnahuac 8532, Jiutepec, Morelos 62550 (Mexico)]. e-mail: gepe@xanum.uam.mx

    2008-07-01

    In this paper, a fractional wave equation for the average neutron motion in a nuclear reactor is considered. This representation covers the full spectrum of the average neutron transport behavior, i.e., Fickian and non-Fickian effects. The fractional diffusion model retains the main dynamic characteristics of the neutron motion. The relaxation time associated with a rapid variation in the neutron flux contains an adjustable parameter, which can be manipulated to obtain the best representation of the neutron transport phenomena. (Author)

  20. A Survey on Accessing Data over Cloud Environment using Data mining Algorithms

    OpenAIRE

    B.Prasanalakshmi; A.Selvaraj

    2015-01-01

    In today's world, accessing large data sets is increasingly complex because the data may be structured or unstructured (text, images, videos, etc.) and cannot be controlled by internet users; this is known as big data. Useful data can be extracted from big data with the help of data mining algorithms. Data mining is a technique for discovering patterns, classifying data, and clustering large data sets. In this paper we will discuss how large s...

  1. The algebra of observables in Gaußian normal spacetime coordinates

    Energy Technology Data Exchange (ETDEWEB)

    Bodendorfer, Norbert [Faculty of Physics, University of Warsaw, Pasteura 5, 02-093, Warsaw (Poland)]; Duch, Paweł [Institute of Physics, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków (Poland)]; Lewandowski, Jerzy; Świeżewski, Jędrzej [Faculty of Physics, University of Warsaw, Pasteura 5, 02-093, Warsaw (Poland)]

    2016-01-11

    We discuss the canonical structure of a spacetime version of the radial gauge, i.e. Gaußian normal spacetime coordinates. While it was found for the spatial version of the radial gauge that a “local” algebra of observables can be constructed, it turns out that this is not possible for the spacetime version. The technical reason for this observation is that the new gauge condition needed to upgrade the spatial to a spacetime radial gauge does not Poisson-commute with the previous gauge conditions. It follows that the involved Dirac bracket is inherently non-local in the sense that no complete set of observables can be found which is constructed locally and at the same time has local Dirac brackets. A locally constructed observable here is defined as a finite polynomial of the canonical variables at a given physical point specified by the Gaußian normal spacetime coordinates.

  2. Formulations and algorithms for problems on rock mass and support deformation during mining

    Science.gov (United States)

    Seryakov, VM

    2018-03-01

    The analysis of problem formulations for calculating the stress-strain state of mine support and the surrounding rock mass in rock mechanics shows that such formulations describe the mechanical features of joint deformation in the rock mass–support system only incompletely. The present paper proposes an algorithm that takes into account the actual conditions of rock mass–support interaction, together with an implementation method that ensures efficient calculation of stresses in the rocks and the support.

  3. Comparison of predictive performance of data mining algorithms in predicting body weight in Mengali rams of Pakistan

    Directory of Open Access Journals (Sweden)

    Senol Celik

    The present study aimed to compare the predictive performance of several data mining algorithms (CART, CHAID, Exhaustive CHAID, MARS, MLP, and RBF) on biometrical data of Mengali rams. To compare the predictive capability of the algorithms in predicting live body weight, body measurements (body length, withers height, and heart girth) and testicular measurements (testicular length, scrotal length, and scrotal circumference) of Mengali rams were evaluated using several goodness-of-fit criteria. In addition, age was considered as a continuous independent variable. In this context, the MARS data mining algorithm was used for the first time to predict body weight in two forms, without (MARS_1) and with (MARS_2) interaction terms. The order of predictive accuracy of the algorithms was CART > CHAID ≈ Exhaustive CHAID > MARS_2 > MARS_1 > RBF > MLP. Moreover, all tested algorithms provided strong predictive accuracy for estimating body weight. However, MARS is the only algorithm that generated a prediction equation for body weight. It is therefore hoped that these results will contribute to predicting body weight and to describing the relationship between body weight and body and testicular measurements, both for establishing breed standards and for conserving indigenous gene sources in Mengali sheep breeding, making more profitable and productive sheep production possible. Data mining algorithms are useful for revealing the relationship between body weight and testicular traits when describing the breed standards of Mengali sheep.
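
    The abstract does not name its goodness-of-fit criteria; as an assumption, the sketch below uses two common ones, root mean squared error (RMSE) and the coefficient of determination (R²), to rank models that predict a continuous target such as body weight. The data are invented toy values, not the Mengali ram measurements, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error (lower is better)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (higher is better)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rank_by_rmse(y: Sequence[float], predictions: Dict[str, Sequence[float]]):
    """Model names ordered from best (smallest RMSE) to worst."""
    return sorted(predictions, key=lambda name: rmse(y, predictions[name]))


# Hypothetical observed body weights (kg) and predictions from two fitted models.
observed = [50.0, 60.0, 70.0]
preds = {"model_a": [52.0, 58.0, 71.0], "model_b": [45.0, 66.0, 70.0]}
print(rank_by_rmse(observed, preds))  # ['model_a', 'model_b']
```

    In practice several criteria are usually reported side by side, since a model can rank differently under error-based and variance-explained measures.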

  4. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal...

  5. Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics.

    Science.gov (United States)

    Hauben, Manfred

    2004-09-01

    To compare the results of one frequently cited data mining algorithm with those of a published, peer-reviewed study that examined the association of pancreatitis with selected atypical antipsychotics using traditional rule-based methods of signal detection. Retrospective pharmacovigilance study. The widely studied data mining algorithm known as the Multi-item Gamma Poisson Shrinker (MGPS) was applied to adverse-event reports for clozapine, olanzapine, and risperidone in the United States Food and Drug Administration's Adverse Event Reporting System database, through the first quarter of 2003, to determine whether this method would have generated a significant signal of pancreatitis before the previous study's review or before these events were added to the respective product labels. Data mining was performed using nine preferred terms relevant to drug-induced pancreatitis from the Medical Dictionary for Regulatory Activities (MedDRA). Results from the previous study on the antipsychotics were reviewed and analyzed. Physicians' Desk References (PDRs) from 1994 onward were manually reviewed to determine the first year in which pancreatitis was listed as an adverse event in the product label of each antipsychotic; this information was used as a surrogate marker for the timing of initial signal detection by traditional criteria. Pancreatitis was listed as an adverse event in a PDR for all three atypical antipsychotics. Despite the presence of up to 88 reports per drug-event combination in the Food and Drug Administration's Adverse Event Reporting System database, MGPS failed to generate a signal of disproportionate reporting of pancreatitis for the three antipsychotics, even though these drug-event combinations had been signaled by traditional rule-based methods, as reflected in product labeling and/or the literature. These discordant findings illustrate key principles in the application of data mining algorithms to drug safety.
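
    MGPS starts from the ratio of the observed count of a drug-event pair to the count expected if drug and event were reported independently, and then shrinks that ratio with an empirical Bayes gamma-mixture prior that damps ratios based on few reports. The sketch below computes only the raw, unshrunk ratio (the shrinkage step is deliberately omitted), under the simplifying assumption that each report is a single (drug, event) pair; the drugs "A" and "B" and all counts are hypothetical.

```python
from typing import Iterable, Tuple


def observed_to_expected(reports: Iterable[Tuple[str, str]], drug: str, event: str) -> float:
    """Raw (unshrunk) observed/expected count for one drug-event pair.

    Simplification: every report is a single (drug, event) pair, and the expected
    count assumes drug and event are reported independently of each other.
    """
    reports = list(reports)
    if not reports:
        return float("nan")
    n_total = len(reports)
    n_drug = sum(1 for d, _ in reports if d == drug)
    n_event = sum(1 for _, e in reports if e == event)
    n_pair = sum(1 for d, e in reports if d == drug and e == event)
    expected = n_drug * n_event / n_total
    return n_pair / expected if expected else float("nan")


# Hypothetical reports for two placeholder drugs "A" and "B".
reports = [("A", "pancreatitis")] * 2 + [("A", "rash")] * 2 + [("B", "rash")] * 6
print(observed_to_expected(reports, "A", "pancreatitis"))  # 2.5
```

    A raw ratio like this is unstable when counts are small, which is the motivation for the Bayesian shrinkage that MGPS adds on top.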

  6. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Ou

    2014-01-01

    This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM), which was announced by the IMO in 2010. The goal of this paper is to mine association rules between the items of BRM and vessel accidents. However, because the data that can be collected are indirect and seem useless for analyzing the relationship between BRM items and accidents, cross-level association rules, which relate the indirect data to the items of BRM, need to be studied. First, a cross-level coding scheme for mining multilevel association rules is proposed. Second, an immune genetic algorithm is executed with this coding scheme to analyze BRM. Third, based on basic maritime investigation reports, some important association rules among the items of BRM are mined and studied. Finally, based on the results of the analysis, suggestions are provided for seafarer training, assessment, and management.

  7. Infection by rhinovirus: similarity of clinical signs included in the case definition of influenza IAn/H1N1.

    Science.gov (United States)

    de Oña Navarro, Maria; Melón García, Santiago; Alvarez-Argüelles, Marta; Fernández-Verdugo, Ana; Boga Riveiro, Jose Antonio

    2012-08-01

    Although new influenza virus (IAn/H1N1) infections are mild and indistinguishable from other seasonal influenza virus infections, there are few data comparing the clinical features of infection with IAn/H1N1 and with other respiratory viruses. The incidence, clinical aspects, and temporal distribution of the respiratory viruses circulating during the flu pandemic period were studied. Respiratory samples from patients with acute influenza-like symptoms were collected from May 2009 to December 2009. Respiratory viruses were detected by conventional culture methods and genome amplification techniques. Although IAn/H1N1 was the most frequently detected virus, several other respiratory viruses, especially rhinovirus, co-circulated with IAn/H1N1 during the pandemic period. The similarity between the clinical signs included in the clinical case definition of influenza and those caused by other respiratory viruses, particularly rhinovirus, suggests that a high percentage of viral infections were clinically diagnosed as cases of influenza. Our study offers useful information for facing future pandemics caused by influenza virus, indicating that differential diagnoses are required in order not to overestimate the importance of the pandemic. Copyright © 2011 Elsevier España, S.L. All rights reserved.

  8. Human (In)Consistencies in Ian McEwan’s Amsterdam

    Directory of Open Access Journals (Sweden)

    Anghel Florentina

    2016-12-01

    Ian McEwan’s Amsterdam offers its readers topical psychological, moral, and social issues presented in an easy-flowing and exhilarating style. Starting from the assumption that life consists of a series of inherent inconsistencies that contribute to the individual’s formation, the paper aims to demonstrate that the protagonists’ judgmental and moral inconsistencies, which serve as a plot generator and are environmentally determined, reveal features of their personalities.

  9. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper

    Science.gov (United States)

    Luo, Gang

    2017-01-01

    For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022
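
    The article argues for non-trivial indicators; for contrast, the trivial baseline they would improve on fits in a few lines. This sketch assumes the work is measured in known units and proceeds at a constant rate, so that remaining time = elapsed × (1 − f) / f for completed fraction f. It is a hypothetical illustration, not the framework the article describes.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NaiveProgressIndicator:
    """Constant-rate progress estimate: remaining = elapsed * (1 - f) / f, with f the fraction done."""
    total_units: float
    start: float = field(default_factory=time.monotonic)
    done_units: float = 0.0

    def update(self, units: float) -> None:
        """Record newly completed work, never exceeding the total."""
        self.done_units = min(self.total_units, self.done_units + units)

    def fraction_done(self) -> float:
        return self.done_units / self.total_units

    def remaining_seconds(self, now: Optional[float] = None) -> Optional[float]:
        """Estimated seconds left, or None before any work is recorded."""
        now = time.monotonic() if now is None else now
        f = self.fraction_done()
        if f == 0.0:
            return None
        return (now - self.start) * (1.0 - f) / f


# Example with explicit timestamps: 25 of 100 units done after 10 s.
p = NaiveProgressIndicator(total_units=100, start=0.0)
p.update(25)
print(p.fraction_done(), p.remaining_seconds(now=10.0))  # 0.25 30.0
```

    The constant-rate assumption is exactly what breaks for model building and mining runs, whose phases can differ greatly in cost.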

  10. Mining the multigroup-discrete ordinates algorithm for high quality solutions

    International Nuclear Information System (INIS)

    Ganapol, B.D.; Kornreich, D.E.

    2005-01-01

    A novel approach to the numerical solution of the neutron transport equation via the discrete ordinates (SN) method is presented. The new technique is referred to as 'mining' low-order SN numerical solutions to obtain high-order accuracy. The new numerical method, called the Multigroup Converged SN (MGCSN) algorithm, combines two sequence accelerators: Romberg and Wynn-epsilon. The extreme accuracy obtained by the method is demonstrated through self-consistency and comparison with the independent semi-analytical benchmark BLUE. (authors)
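
    As an illustration of one of the two accelerators named above, the Python sketch below implements Wynn's epsilon algorithm on a generic sequence. The slowly converging partial sums of 1 − 1/2 + 1/3 − … (limit ln 2) stand in, as an assumption, for a sequence of SN solutions of increasing order; the Romberg step and the MGCSN algorithm itself are not shown.

```python
import math
from typing import List, Sequence


def wynn_epsilon(seq: Sequence[float]) -> float:
    """Accelerate a convergent sequence with Wynn's epsilon algorithm.

    Recurrence: eps_{-1} = 0, eps_0 = S_n,
    eps_{k+1}^{(n)} = eps_{k-1}^{(n+1)} + 1 / (eps_k^{(n+1)} - eps_k^{(n)}).
    Only even columns estimate the limit; the last entry of the highest
    even column (the one using the most recent terms) is returned.
    """
    prev: List[float] = [0.0] * (len(seq) + 1)  # column eps_{-1}
    curr: List[float] = list(seq)                # column eps_0
    best = curr[-1]
    k = 0
    while len(curr) > 1:
        diffs = [curr[m + 1] - curr[m] for m in range(len(curr) - 1)]
        if any(d == 0.0 for d in diffs):
            break  # column converged exactly; the next column would divide by zero
        nxt = [prev[m + 1] + 1.0 / d for m, d in enumerate(diffs)]
        prev, curr = curr, nxt
        k += 1
        if k % 2 == 0:
            best = curr[-1]
    return best


# Stand-in sequence: partial sums of 1 - 1/2 + 1/3 - ..., which converge slowly to ln 2.
partial_sums: List[float] = []
total = 0.0
for i in range(1, 13):
    total += (-1) ** (i + 1) / i
    partial_sums.append(total)

print(abs(partial_sums[-1] - math.log(2)))            # error of the raw 12-term sum
print(abs(wynn_epsilon(partial_sums) - math.log(2)))  # error after acceleration
```

    The raw 12-term sum is still off in the second decimal place, while the accelerated estimate from the same twelve numbers is accurate to many more digits, which is the effect "mining" low-order solutions relies on.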

  11. A genetic algorithm approach to recognition and data mining

    Energy Technology Data Exchange (ETDEWEB)

    Punch, W.F.; Goodman, E.D.; Min, Pei [Michigan State Univ., East Lansing, MI (United States)] [and others]

    1996-12-31

    We review here our use of genetic algorithm (GA) and genetic programming (GP) techniques to perform “data mining,” the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. Our first experiments concentrated on the use of a k-nearest neighbor (knn) algorithm in combination with a GA. The GA selected a weight for each feature so as to optimize knn classification based on a linear combination of features. This combined GA-knn approach was successfully applied to both generated and real-world data. We later extended this work by substituting GP for the GA. The GP-knn approach could not only optimize data classification via linear combinations of features but also determine functional relationships among the features. This improved performance and yielded new information on important relationships among features. We review the effectiveness of the overall approach on examples from biology and compare the effectiveness of the GA and GP.
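
    The GA-knn idea can be sketched as follows, under assumptions that are not from the paper: each chromosome is a vector of feature weights in [0, 1], the distance is a weighted squared Euclidean distance, fitness is leave-one-out k-NN accuracy, and the GA uses truncation selection, one-point crossover, and clipped Gaussian mutation. All names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


def weighted_knn_predict(train: Sequence[Example], x: Sequence[float],
                         weights: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k nearest neighbours under a feature-weighted squared distance."""
    dists = sorted(
        (sum(w * (a - b) ** 2 for w, a, b in zip(weights, feats, x)), label)
        for feats, label in train
    )
    votes: Dict[int, int] = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def loo_accuracy(data: Sequence[Example], weights: Sequence[float], k: int = 3) -> float:
    """Leave-one-out accuracy of weighted k-NN, used as the GA's fitness function."""
    hits = 0
    for i, (feats, label) in enumerate(data):
        rest = list(data[:i]) + list(data[i + 1:])
        hits += weighted_knn_predict(rest, feats, weights, k) == label
    return hits / len(data)


def evolve_weights(data, n_features, pop_size=12, generations=10,
                   mutation_rate=0.2, k=3, seed=0):
    """Simple elitist GA over weight vectors in [0, 1]^n_features."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: loo_accuracy(data, w, k), reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))  # clipped Gaussian mutation
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda w: loo_accuracy(data, w, k))
    return best, loo_accuracy(data, best, k)


# Toy data (hypothetical): feature 0 decides the class, feature 1 is a deterministic scramble.
data = [([i / 40, ((7 * i) % 40) / 40], int(i >= 20)) for i in range(40)]
weights, acc = evolve_weights(data, n_features=2)
print(weights, acc)
```

    Because the elite half survives each generation unchanged, the best fitness in the population never decreases; the GP variant described above would replace the weight vector with an evolved expression over the features.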

  12. An algorithm of opinion leaders mining based on signed network

    Science.gov (United States)

    Cao, Linlin; Zheng, Mingchun; Zhang, Yuanyuan; Zhang, Fuming

    2018-04-01

    With the rapid development of the mobile Internet, users have gradually become the leaders of social media, and the abrupt rise of new media has changed the traditional patterns and regularities of information dissemination. Opinion leaders, the gatekeepers of classical mass communication theory, have acquired new significance in this era, and the concept has been further expanded and extended to a certain extent. Existing approaches to mining opinion leaders mainly study network structure and user behavior without considering an important attribute: whether the user has a real impact. In this paper, we use the signed network as the research tool: each link between users is assigned a sign representing support for or opposition to a point of view, and traditional mining algorithms are combined with these signs, which can describe changes of viewpoint between users. This identifies the opinion leaders who have a real impact on users, making the results more accurate and effective.

  13. Blood on the Stone: Ian Smillie tells his story

    International Development Research Centre (IDRC) Digital Library (Canada)

    Below, Ian Smillie explains how he came to take part in the fight to end the illicit diamond trade, which fueled wars that cost millions of lives in Africa. He describes this fight in his new book, Blood on the Stone: Greed, Corruption and War in the Global Diamond ...

  14. Calibration of Mine Ventilation Network Models Using the Non-Linear Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Guang Xu

    2017-12-01

    Effective ventilation planning is vital to underground mining. To ensure stable operation of the ventilation system and to avoid airflow disorder, mine ventilation network (MVN) models have been widely used to simulate and optimize mine ventilation systems. However, one challenge in MVN model simulation is that the simulated airflow distribution does not match the measured data. To solve this problem, a simple and effective calibration method based on a non-linear optimization algorithm is proposed. The calibrated model not only brings the simulated airflow distribution into agreement with on-site measurements but also keeps the errors in other parameters within a minimum range. The proposed method was then applied to calibrate an MVN model of a real case, built from ventilation survey results with the Ventsim software. Finally, airflow simulation experiments were carried out using the data before and after calibration, and the results were compared and analyzed. The simulated airflows in the calibrated model agreed much better with the ventilation survey data, which verifies the effectiveness of the calibration method.

  15. Urinary metabolic profiling of asymptomatic acute intermittent porphyria using a rule-mining-based algorithm.

    Science.gov (United States)

    Luck, Margaux; Schmitt, Caroline; Talbi, Neila; Gouya, Laurent; Caradeuc, Cédric; Puy, Hervé; Bertho, Gildas; Pallet, Nicolas

    2018-01-01

    Metabolomic profiling combines nuclear magnetic resonance (NMR) spectroscopy with supervised statistical analysis and may allow a better understanding of disease mechanisms. In this study, urinary metabolic profiling of individuals with porphyrias was performed to predict different types of disease and to propose new pathophysiological hypotheses. Urine ¹H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum bins (buckets), corresponding to rules, were extracted, and a logistic regression was trained. The results generated by our rule-mining algorithm were consistent with those obtained using partial least squares discriminant analysis (PLS-DA), and the predictive performance of the model was significant. The buckets identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate, and pyruvate, which were found at higher concentrations in the urine of aAIP patients than in that of PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients. These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and support a relationship between heme synthesis and mitochondrial energy metabolism.

  16. A study of the Bienstock-Zuckerberg algorithm, Applications in Mining and Resource Constrained Project Scheduling

    OpenAIRE

    Muñoz, Gonzalo; Espinoza, Daniel; Goycoolea, Marcos; Moreno, Eduardo; Queyranne, Maurice; Rivera, Orlando

    2016-01-01

    We study a Lagrangian decomposition algorithm recently proposed by Dan Bienstock and Mark Zuckerberg for solving the LP relaxation of a class of open pit mine project scheduling problems. In this study we show that the Bienstock-Zuckerberg (BZ) algorithm can be used to solve LP relaxations corresponding to a much broader class of scheduling problems, including the well-known Resource Constrained Project Scheduling Problem (RCPSP), and multi-modal variants of the RCPSP that consider batch proc...

  17. Application of Data Mining Algorithm to Recipient of Motorcycle Installment

    Directory of Open Access Journals (Sweden)

    Harry Dhika

    2015-12-01

    The study was conducted at subsidiaries that provide financing services for purchasing motorcycles on credit. When applying, consumers enter their personal data, from which it is determined whether the credit application is approved or rejected. Of the 224 consumer records obtained, the applications of 87% (about 217 consumers) were approved and those of 16% (6 consumers) were rejected. The data mining was carried out following CRISP-DM, the industry-standard process for data mining, and the C4.5 algorithm was used for the approval decision. The accuracy of the results was then measured with a confusion matrix and a receiver operating characteristic (ROC) curve. The confusion matrix was evaluated to obtain accuracy, precision, and recall, while the ROC curve was used to obtain data tables and to compare the area under the curve (AUC).
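
    The evaluation described above can be sketched in Python. The sketch assumes binary labels with "approved" as the positive class, computes accuracy, precision, and recall from a confusion matrix, and computes the AUC directly as the probability that a randomly chosen positive is scored above a randomly chosen negative. It does not implement C4.5, and the labels and scores are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1) -> Dict[str, float]:
    """Accuracy, precision, and recall derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true, scores, positive=1) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = approved, 0 = rejected; scores are a model's approval probabilities.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

    With a class split as unbalanced as the one reported, accuracy alone can look high even for a weak classifier, which is why precision, recall, and AUC are reported alongside it.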

  18. "A'ole" Drugs! Cultural Practices and Drug Resistance of Rural Hawai'ian Youths

    Science.gov (United States)

    Po'A-Kekuawela, Ka'Ohinani; Okamoto, Scott K.; Nebre, La Risa H.; Helm, Susana; Chin, Coralee I. H.

    2009-01-01

    This qualitative study examined how Native Hawai'ian youths from rural communities utilized cultural practices to promote drug resistance and/or abstinence. Forty-seven students from five different middle schools participated in gender-specific focus groups that focused on the cultural and environmental contexts of drug use for Native Hawai'ian…

  19. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition, discusses both the theoretical foundations and the practical applications of data mining in a wide range of fields, including banking, e-commerce, medicine, engineering, and management. The book starts by introducing data and information, basic data types, data categories, and applications of data mining. The second chapter briefly reviews data visualization technology and its importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithms for sample covariance are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained, and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares them in summary table form. An interesting application of genetic algorithms is introduced in the next chapter. Foundations of neural networks are built from scratch, and the back propagation algorithm is derived...

  20. Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

    Science.gov (United States)

    Ortíz Díaz, Agustín; Ramos-Jiménez, Gonzalo; Frías Blanco, Isvani; Caballero Mota, Yailé; Morales-Bueno, Rafael

    2015-01-01

    The treatment of large data streams in the presence of concept drift is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drift and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a batch to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drift and stores a set of inactive classifiers that represent old concepts, which are reactivated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms on common benchmark datasets. The experiments show promising results for the proposed algorithm, in terms of both accuracy and runtime, when handling different types of concept drift. PMID:25879051

  1. An overview of data mining algorithms in drug induced toxicity prediction.

    Science.gov (United States)

    Omer, Ankur; Singh, Poonam; Yadav, N K; Singh, R K

    2014-04-01

    The growth in chemical diversity has increased the need to assess the toxicity of different chemical compounds, raising the demand for animal testing. Toxicity evaluation is a time-consuming and expensive undertaking, which limits the methods available for screening chemicals and points to the need for more efficient toxicity assessment systems. Computational approaches have reduced both the time and the cost of evaluating the toxicity and kinetic behavior of chemicals. The availability of large amounts of data and the pressing need to turn these data into useful information have drawn attention to data mining. Machine learning, one of the most powerful data mining techniques, has evolved into the most effective and potent tool for exploring new insights into combinatorial relationships among the various experimental data generated. The article describes some sophisticated machine learning algorithms, such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), k-means clustering, and Self-Organizing Maps (SOM), together with some of the available tools used for classification, sorting, and toxicological evaluation of data, clarifying how data mining and machine learning cooperate to facilitate knowledge discovery. Addressing some commonly used expert systems, we briefly outline real-world applications and consider the crucial role of data set partitioning.

  2. Personal continuous route pattern mining

    Institute of Scientific and Technical Information of China (English)

    Qian YE; Ling CHEN; Gen-cai CHEN

    2009-01-01

    In daily life, people often repeat regular routes over certain periods. In this paper, a mining system is developed to find the continuous route patterns of a person's past trips. To account for the diversity of personal movement states, the mining system employs adaptive GPS data recording and five data filters to ensure clean trip data. The mining system uses a client/server architecture to protect personal privacy and to reduce the computational load. The server conducts the main mining procedure but has insufficient information to recover real personal routes. To improve the scalability of sequential pattern mining, a novel pattern mining algorithm, continuous route pattern mining (CRPM), is proposed. This algorithm tolerates the various disturbances in real routes and extracts the frequent patterns. Experimental results based on nine persons' trips show that CRPM can extract route patterns more than twice as long as those found by traditional route pattern mining algorithms.

  3. Who's there? - First morphological and DNA barcoding catalogue of the shallow Hawai'ian sponge fauna.

    Science.gov (United States)

    Núñez Pons, Laura; Calcinai, Barbara; Gates, Ruth D

    2017-01-01

    The sponge fauna has been largely overlooked in the Archipelago of Hawai'i, notwithstanding the paramount role of this taxon in marine ecosystems. The lack of knowledge about Porifera populations inhabiting Hawai'ian reefs limits the development of ecological studies aimed at understanding the functioning of these marine systems. This project addresses this gap by describing the most representative sponge species in the shallow waters of the enigmatic Kane'ohe Bay, on O'ahu Island. A total of 30 species (28 demosponges and two calcareous sponges) living in association with the reef structures are reported here. Six of these species are new records for the Hawai'ian Porifera catalogue and are suspected to be recent introductions to these islands. Morphological descriptions of the voucher specimens are provided, along with sequencing data for two partitions: the mitochondrial cytochrome oxidase subunit 1 (COI) marker, and a fragment covering partial (18S and 28S) and full (ITS-1, 5.8S and ITS-2) nuclear ribosomal genes. Species delimitations based on genetic distances were calculated to validate how taxonomic assignments from DNA barcoding aligned with morphological identifications. Of the 60 sequences submitted to GenBank, ~88% are the first sequencing records for the corresponding species and genetic marker. This work compiles the first catalogue combining morphological characters with DNA barcoding of Hawai'ian sponges, and contributes to public database repositories through the Sponge Barcoding Project initiative.

  4. Who's there? - First morphological and DNA barcoding catalogue of the shallow Hawai'ian sponge fauna.

    Directory of Open Access Journals (Sweden)

    Laura Núñez Pons

    Full Text Available The sponge fauna has been largely overlooked in the Archipelago of Hawai'i, notwithstanding the paramount role of this taxon in marine ecosystems. The lack of knowledge about Porifera populations inhabiting Hawai'ian reefs limits the development of ecological studies aimed at understanding the functioning of these marine systems. This project addresses this gap by describing the most representative sponge species in the shallow waters of the enigmatic Kane'ohe Bay, on O'ahu Island. A total of 30 species (28 demosponges and two calcareous sponges) living in association with the reef structures are reported here. Six of these species are new records for the Hawai'ian Porifera catalogue and are suspected to be recent introductions to these islands. Morphological descriptions of the voucher specimens are provided, along with sequencing data for two partitions: the mitochondrial cytochrome oxidase subunit 1 (COI) marker, and a fragment covering partial (18S and 28S) and full (ITS-1, 5.8S and ITS-2) nuclear ribosomal genes. Species delimitations based on genetic distances were calculated to validate how taxonomic assignments from DNA barcoding aligned with morphological identifications. Of the 60 sequences submitted to GenBank, ~88% are the first sequencing records for the corresponding species and genetic marker. This work compiles the first catalogue combining morphological characters with DNA barcoding of Hawai'ian sponges, and contributes to public database repositories through the Sponge Barcoding Project initiative.

  5. A novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks

    Science.gov (United States)

    2014-01-01

    Background: Motif mining has long been an active research topic in bioinformatics. Most current research on biological networks focuses on exact motif mining. However, because of inevitable experimental error and noisy data, representing biological network data with a probability model can better reflect their authenticity and biological significance; it is therefore more biologically meaningful to discover probability motifs in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery, which is usually based on the possible world model and has relatively high computational complexity.
    Methods: In this paper, we present a novel method for detecting frequent probability patterns in uncertain biological networks based on circuit simulation. First, partition-based efficient search is applied to mining non-tree-like subgraphs, whose probability of occurrence in random networks is small. Then, a probability isomorphism algorithm based on circuit simulation is proposed. It combines analysis of the circuit topology with the related physical properties of voltage to evaluate the probability isomorphism between probability subgraphs, which avoids the traditional possible world model. Finally, based on this probability subgraph isomorphism algorithm, a two-step hierarchical clustering method is used to cluster subgraphs and discover frequent probability patterns from the clusters.
    Results: Experimental results on Protein-Protein Interaction (PPI) networks and transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover frequent probability subgraphs. The discovered subgraphs contain all probability motifs reported in experiments published in other related papers.
    Conclusions: The algorithm of probability graph isomorphism...

  6. Ian D. Copestake, The Ethics of William Carlos Williams’s Poetry.

    Directory of Open Access Journals (Sweden)

    Aristotle University of Greece

    2011-09-01

    Full Text Available Ian D. Copestake’s monograph on William Carlos Williams’s poetry offers a well-informed and well-documented insight into the connection of Williams’s writing with Unitarianism and Emersonian thinking. In this well-written and accessible book, the reader is introduced to a number of poems as well as excerpts from Williams’s essays, letters and autobiography, which facilitate the understanding and appreciation of the poet’s attempt to promote “independent thought and action” (5....

  7. Algorithms for Regular Tree Grammar Network Search and Their Application to Mining Human-viral Infection Patterns.

    Science.gov (United States)

    Smoly, Ilan; Carmel, Amir; Shemer-Avni, Yonat; Yeger-Lotem, Esti; Ziv-Ukelson, Michal

    2016-03-01

    Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.

  8. An uncommon clinical feature of IAN injury after third molar removal: a delayed paresthesia case series and literature review.

    Science.gov (United States)

    Borgonovo, Andrea; Bianchi, Albino; Marchetti, Andrea; Censi, Rachele; Maiorana, Carlo

    2012-05-01

    After an inferior alveolar nerve (IAN) injury, altered sensation usually begins immediately after surgery. However, it sometimes begins only after several days, which is referred to as delayed paresthesia. The authors considered three different etiologies that likely produce inflammation along the nerve trunk and cause delayed paresthesia: clot compression, fibrous reorganization of the clot, and nerve trauma caused by bone fragments during clot organization. The aim of this article was to evaluate the etiology of delayed IAN paresthesia, analyze the literature, present a case series related to three different causes of this pathology, and compare delayed paresthesia with the classic, immediately symptomatic paresthesia.

  9. Calculation of radiation heat generation on a graphite reflector side of IAN-R1 Reactor

    International Nuclear Information System (INIS)

    Duque O, J.; Velez A, L.H.

    1987-01-01

    Calculation methods for radiation heat generation in nuclear reactors, based on the point kernel approach, are revisited and applied to the graphite reflector of the IAN-R1 reactor. A Fortran computer program was written to determine the total heat generation in the reflector, using 1155 points within it.
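For reference, the textbook form of the point-kernel approach for isotropic point sources in a homogeneous medium is sketched below. The abstract does not state how the original program discretized the sources or chose buildup factors and energy groups, so the symbols and the final summation over the 1155 calculation points are assumptions for illustration: $S_{k,g}$ is the emission rate of source point $k$ in energy group $g$, $\mu_g$ the linear attenuation coefficient, $B_g$ the buildup factor, $E_g$ the photon energy, $\mu_{\mathrm{en},g}$ the linear energy-absorption coefficient, and $\Delta V_i$ the reflector volume assigned to calculation point $i$.

```latex
\phi_g(\mathbf r) = \sum_{k} \frac{S_{k,g}\, B_g(\mu_g R_k)\, e^{-\mu_g R_k}}{4\pi R_k^{2}},
\qquad R_k = \lvert \mathbf r - \mathbf r_k \rvert

q'''(\mathbf r) = \sum_{g} E_g\, \mu_{\mathrm{en},g}\, \phi_g(\mathbf r),
\qquad Q \approx \sum_{i=1}^{1155} q'''(\mathbf r_i)\, \Delta V_i
```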

  10. Who’s there? – First morphological and DNA barcoding catalogue of the shallow Hawai’ian sponge fauna

    Science.gov (United States)

    Gates, Ruth D.

    2017-01-01

    The sponge fauna has been largely overlooked in the Archipelago of Hawai’i, notwithstanding the paramount role of this taxon in marine ecosystems. The lack of knowledge about Porifera populations inhabiting Hawai’ian reefs limits the development of ecological studies aimed at understanding the functioning of these marine systems. This project addresses this gap by describing the most representative sponge species in the shallow waters of the enigmatic Kane’ohe Bay, on O’ahu Island. A total of 30 species (28 demosponges and two calcareous sponges) living in association with the reef structures are reported here. Six of these species are new records for the Hawai’ian Porifera catalogue and are suspected to be recent introductions to these islands. Morphological descriptions of the voucher specimens are provided, along with sequencing data for two partitions: the mitochondrial cytochrome oxidase subunit 1 (COI) marker, and a fragment covering partial (18S and 28S) and full (ITS-1, 5.8S and ITS-2) nuclear ribosomal genes. Species delimitations based on genetic distances were calculated to validate how taxonomic assignments from DNA barcoding aligned with morphological identifications. Of the 60 sequences submitted to GenBank, ~88% are the first sequencing records for the corresponding species and genetic marker. This work compiles the first catalogue combining morphological characters with DNA barcoding of Hawai’ian sponges, and contributes to public database repositories through the Sponge Barcoding Project initiative. PMID:29267311

  11. Neutron flux measurement and thermal power calibration of the IAN-R1 TRIGA reactor

    Energy Technology Data Exchange (ETDEWEB)

    Sarta Fuentes, Jose A.; Castiblanco Bohorquez, Luis A

    2008-10-29

    The IAN-R1 TRIGA reactor in Colombia was initially fueled with MTR-type HEU fuel enriched to 93% U-235; it operated at 10 kW from 1965 and was upgraded to 30 kW in 1980. In 1997, General Atomics converted the reactor from HEU fuel to TRIGA-type LEU fuel and upgraded the reactor power to 100 kW. Because the IAN-R1 TRIGA reactor had been in extended shutdown for seven years, it was necessary to repeat some of the commissioning tests conducted in 1997. The thermal power calibration was carried out using the calorimetric method. The reactor was operated at approximately 20 kW for 3.5 hours, with manual power corrections because the automatic control system failed, and with forced cooling off. During the calorimetric experiment, the pool temperature was measured with an RTD installed near the core, and the data were collected at 30-minute intervals. To establish the reactor thermal power, the water temperature was recorded against operating time. For a calculated tank volume of 16 m³, the tank constant calculated for the IAN-R1 TRIGA reactor is 0.0539 °C/kW·h. The reactor power determined was 19 kW. The core configuration is a rectangular grid plate that holds a combination of 4-rod and 3-rod clusters. The core contains 50 fuel rods of TRIGA-type LEU fuel (UZrH1.6) enriched to 19.7%. The radial reflector consists of twenty graphite elements, six of which are used for isotope production. The top and bottom reflectors are cylindrical graphite end reflectors installed above and below the active fuel section in each fuel rod. The spatial dependence of the thermal neutron flux was measured axially in the 3-rod clusters 4C, 3D and 5E and in the 4F graphite element. The spatial distribution of the thermal neutron flux was determined using a self-powered detector, and the absolute value of the thermal neutron flux was determined by a gold activation detector. The (n, β⁻) reaction is applied to determine the relative spatial distribution of thermal
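The calorimetric calibration rests on simple arithmetic: with forced cooling off, the pool heats at a rate proportional to the power, so P = (dT/dt) / K, where K is the tank constant. The sketch below assumes that the whole 16 m³ of pool water absorbs the heat with no losses, with a water density of 997 kg/m³ and a specific heat of 4.186 kJ/(kg·°C) (assumed values, not given in the abstract). Under those assumptions it reproduces the reported K ≈ 0.0539 °C/kW·h, and it shows that the reported 19 kW corresponds to a heating rate of about 1.02 °C/h (back-calculated here, not a measured value).

```python
KJ_PER_KWH = 3600.0  # 1 kW·h = 3600 kJ


def tank_constant(volume_m3: float, density_kg_m3: float = 997.0, cp_kj_kg_c: float = 4.186) -> float:
    """Temperature rise (°C) per kW·h deposited in the pool, assuming no heat losses."""
    heat_capacity_kj_per_c = volume_m3 * density_kg_m3 * cp_kj_kg_c
    return KJ_PER_KWH / heat_capacity_kj_per_c


def calorimetric_power_kw(heating_rate_c_per_h: float, k_c_per_kwh: float) -> float:
    """Reactor power from the pool heating rate: P = (dT/dt) / K."""
    return heating_rate_c_per_h / k_c_per_kwh


print(round(tank_constant(16.0), 4))  # 0.0539
print(round(calorimetric_power_kw(1.0241, 0.0539), 1))  # 19.0
```

In practice the heating rate would be the slope of a straight-line fit to the temperature readings taken every 30 minutes, and real heat losses from the pool make this an approximation.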

  12. A hybrid GA-TS algorithm for open vehicle routing optimization of coal mines material

    Energy Technology Data Exchange (ETDEWEB)

    Yu, S.W.; Ding, C.; Zhu, K.J. [China University of Geoscience, Wuhan (China)]

    2011-08-15

    In the open vehicle routing problem (OVRP), the objective is to minimize the number of vehicles and the total distance (or time) traveled; unlike the classical vehicle routing problem, vehicles do not return to the depot. This study focuses on solving the OVRP with a novel hybrid of a genetic algorithm and tabu search (GA-TS), which combines the GA's parallel, global optimization with the fast local search of TS. First, the proposed algorithm uses natural-number coding based on customer demands and vehicle capacity for global optimization. Second, individuals in the population undergo TS local search with a certain probability, that is, local optimization of the route covering all customer sites served by one vehicle. This mechanism not only improves the ability of global optimization but also ensures the speed of operation. The algorithm was applied to transport vehicle routing optimization at Zhengzhou Coal Mine and Power Supply Co., Ltd.
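To make the OVRP objective concrete, here is a minimal sketch of how a natural-number chromosome (a permutation of customer IDs) might be decoded and scored. The greedy capacity split, the open-route distance with no return leg to the depot, and the weighted cost with a large per-vehicle penalty are illustrative assumptions, not the GA-TS paper's exact encoding or fitness function.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def split_routes(order: Sequence[int], demands: Dict[int, float], capacity: float) -> List[List[int]]:
    """Greedily cut a customer permutation into vehicle routes that respect capacity."""
    routes: List[List[int]] = []
    current: List[int] = []
    load = 0.0
    for c in order:
        if current and load + demands[c] > capacity:
            routes.append(current)  # this vehicle is full: start a new one
            current, load = [], 0.0
        current.append(c)
        load += demands[c]
    if current:
        routes.append(current)
    return routes


def open_route_length(route: List[int], depot: Coord, coords: Dict[int, Coord]) -> float:
    """Depot -> first customer -> ... -> last customer, with no return leg (the 'open' part)."""
    stops = [depot] + [coords[c] for c in route]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def ovrp_cost(order, demands, capacity, depot, coords, vehicle_penalty=1000.0):
    """Hypothetical fitness: a large per-vehicle penalty plus total open-route distance."""
    routes = split_routes(order, demands, capacity)
    distance = sum(open_route_length(r, depot, coords) for r in routes)
    return vehicle_penalty * len(routes) + distance, routes


coords = {1: (1.0, 0.0), 2: (2.0, 0.0), 3: (0.0, 1.0)}
demands = {1: 1.0, 2: 1.0, 3: 1.0}
cost, routes = ovrp_cost([1, 2, 3], demands, capacity=2.0, depot=(0.0, 0.0), coords=coords)
print(routes, cost)  # [[1, 2], [3]] 2003.0
```

In a GA-TS hybrid of the kind described, a tabu search would then reorder the customers inside a single route to shorten `open_route_length` for that vehicle.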

  13. The Exploration of Political Conflicts and Personal Relationships in Ian McEwan’s The Innocent

    Directory of Open Access Journals (Sweden)

    Mina Abbasiyannejad

    2014-03-01

    Full Text Available Political conflicts have historically affected the relationships of nations. Ian McEwan’s The Innocent is an excellent example of a story set within the web of such a conflict, the Cold War, which was brought about by U.S. and Soviet confrontation over spheres of influence after the Second World War. This article aims to show how Ian McEwan pictures Americanization as a form of cultural politics aimed at spreading American influence throughout occupied countries such as Germany for political domination. Max Weber’s theory of political power, along with semiotics as a tool, forms the framework of the article. Signs that refer to the Americanization process, including inferences in the dialogues, gestures, choice of food, and even clothing, are scrutinized and interpreted within the socio-political context of the Cold War. The analysis of The Innocent provides an example of the ways in which fiction represents political conflicts permeating personal and intimate relationships, and how such conflicts may result in a sense of mistrust and intrigue among both people and nations.

  14. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    Science.gov (United States)

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. However, in real-world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is suitable for stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.
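The abstract does not give DWFPM's exact weighting formula, so the sketch below uses one plausible, hypothetical definition to illustrate what a dynamic weight means: each transaction belongs to a time batch with its own item weights (for example, prices at that time), and a pattern's weighted support sums, over the transactions that contain it, the pattern's average item weight in that batch. This is a brute-force count, not DWFPM's single-scan pattern-growth algorithm.

```python
from typing import Dict, FrozenSet, List, Tuple

# (batch index, items bought); the batch index selects which weight table applies.
Transaction = Tuple[int, FrozenSet[str]]


def dynamic_weighted_support(
    pattern: FrozenSet[str],
    transactions: List[Transaction],
    weights: Dict[int, Dict[str, float]],
) -> float:
    """Sum of the pattern's mean item weight, using each transaction's own batch weights."""
    total = 0.0
    for batch, items in transactions:
        if pattern <= items:  # the transaction contains every item of the pattern
            w = weights[batch]
            total += sum(w[i] for i in pattern) / len(pattern)
    return total


transactions = [
    (0, frozenset({"a", "b", "c"})),
    (1, frozenset({"a", "b"})),
    (1, frozenset({"a"})),
]
# Hypothetical weights: items "a" and "b" become more significant in batch 1.
weights = {0: {"a": 0.4, "b": 0.6, "c": 1.0}, 1: {"a": 0.8, "b": 1.0}}
print(dynamic_weighted_support(frozenset({"a", "b"}), transactions, weights))
```

With fixed weights, both occurrences of {a, b} would contribute equally; here the later occurrence counts for more because its batch weights are higher.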

  15. Data mining in agriculture

    CERN Document Server

    Mucherino, Antonio; Pardalos, Panos M

    2009-01-01

    Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environment-related fields. This book presents both theoretical and practical insights, with a focus on presenting the context of each data mining technique intuitively, with ample concrete examples represented graphically and algorithms written in MATLAB®. Examples and exercises with solutions are provided at the end of each chapter to facilitate comprehension of the material. For each data mining technique described in the book, variants and improvements of the basic algorithm are also given. Also by P.J. Papajorgji and P.M. Pardalos: Advances in Modeling Agricultural Systems, 'Springer Optimization and its Applications' vol. 25, ©2009.

  16. A genetic algorithm approach for open-pit mine production scheduling

    Directory of Open Access Journals (Sweden)

    Aref Alipour

    2017-06-01

    Full Text Available In an Open-Pit Production Scheduling (OPPS) problem, the goal is to determine the mining sequence of an orebody represented as a block model. In this article, a linear programming formulation is used to express this goal. The OPPS problem is known to be NP-hard, so an exact mathematical model cannot be applied to solve it at real-world scale. The Genetic Algorithm (GA) is a well-known member of the evolutionary algorithms, which are widely used to solve NP-hard problems. Here, GA is implemented on a hypothetical two-dimensional (2D) copper orebody model. The orebody is represented as a 2D array of blocks, and a counterpart 2D GA array is used to represent the solution space of the OPPS problem. The fitness function is defined according to the OPPS problem's objective function to assess the solution domain, and a new normalization method is used to handle the block sequencing constraint. A numerical study is performed to compare the solutions of the exact and GA-based methods. It is shown that the gap between GA and the optimal solution obtained by the exact method is less than 5%; hence GA is found to be efficient in solving the OPPS problem.
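A minimal sketch of a GA fitness function for a 2D block model is shown below. It assumes a 45° slope rule (a block can be mined only once the up-to-three blocks directly above it have been mined), discounts block values by mining period, and subtracts a large penalty per precedence violation. The penalty is a stand-in: the paper handles the sequencing constraint with its own normalization method, which the abstract does not describe. The block values and the discount rate are hypothetical.

```python
from typing import List, Optional

Schedule = List[List[Optional[int]]]  # schedule[row][col] = mining period (1, 2, ...) or None


def precedence_violations(schedule: Schedule) -> int:
    """Count overlying blocks that are unmined or mined later than the block beneath them."""
    violations = 0
    for i in range(1, len(schedule)):
        for j, period in enumerate(schedule[i]):
            if period is None:
                continue
            for jj in (j - 1, j, j + 1):  # assumed 45-degree slope: three overlying blocks
                if 0 <= jj < len(schedule[i - 1]):
                    above = schedule[i - 1][jj]
                    if above is None or above > period:
                        violations += 1
    return violations


def fitness(schedule: Schedule, values: List[List[float]], rate: float = 0.1, penalty: float = 1e6) -> float:
    """Discounted value of the mined blocks minus a large penalty per precedence violation."""
    npv = 0.0
    for i, row in enumerate(schedule):
        for j, period in enumerate(row):
            if period is not None:
                npv += values[i][j] / (1 + rate) ** period
    return npv - penalty * precedence_violations(schedule)


values = [[-1.0, -1.0, -1.0], [0.0, 10.0, 0.0]]  # waste blocks on top, one ore block below
good = [[1, 1, 1], [None, 2, None]]
bad = [[1, None, 1], [None, 2, None]]  # ore mined without removing the block directly above
print(precedence_violations(good), precedence_violations(bad))  # 0 1
print(round(fitness(good, values), 4))
```

A GA would evolve `schedule` arrays of this shape (the counterpart 2D array mentioned in the abstract) and keep the individuals with the highest fitness.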

  17. The "Goldberg Variations" and Ian McEwan's Saturday - a study of interdisciplinary analogies

    OpenAIRE

    Lykka, Inga Hild

    2014-01-01

    This thesis examines the musical influence on Ian McEwan’s fiction, in particular the influence of the Goldberg Variations on his novel Saturday. This involves an interdisciplinary analysis that compares the two arts and sheds light on both the possibilities and the difficulties regarding which musical features are likely or unlikely to occur in literature. The analysis is founded on previous interdisciplinary studies of music and literature in general, studies of representations of the Goldberg ...

  18. Neutrons characterization of the nuclear reactor Ian-R1 of Colombia; Caracterizacion de los neutrones del reactor nuclear IAN-R1 de Colombia

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez P, L. X.; Martinez O, S. A. [Universidad Pedagogica y Tecnologica de Colombia, Grupo de Fisica Nuclear Aplicada y Simulacion, Carretera Central del Norte Km. 1, Via Paipa, 150003 Tunja, Boyaca (Colombia); Vega C, H. R., E-mail: s.agustin.martinez@uptc.edu.co [Universidad Autonoma de Zacatecas, Unidad Academica de Estudios Nucleares, Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2014-08-15

    Using Monte Carlo methods with the MCNPX code, the neutron characteristics of Colombia's IAN-R1 research nuclear reactor were evaluated with the reactor shut down but with the neutron source in its startup position. The neutron spectra, the total flux and their average power were calculated in the irradiation spaces inside the graphite reflector, as well as in the air-filled cells. The spectra, total flux and absorbed dose were also calculated at several locations along the radial axis inside the water moderator. The total neutron flux was also evaluated along the axial direction. The characteristics of the neutron spectra vary depending on their position relative to the source and on the material surrounding the cell where the calculation was made. (Author)

  19. Improving diagnostic accuracy using agent-based distributed data mining system.

    Science.gov (United States)

    Sridhar, S

    2013-09-01

    This paper investigates the use of data mining techniques to improve the accuracy of diagnostic systems. Data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Expert systems are generally constructed to automate diagnostic procedures, and their learning component uses data mining algorithms to extract the expert system rules from the database automatically. Learning algorithms can thus assist clinicians in extracting knowledge automatically. As the number and variety of data sources increase dramatically, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from the data. Because data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta-learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge with data mining knowledge improves the performance of the diagnostic system. This work proposes a framework for combining human knowledge with knowledge gained by better data mining algorithms, applied to a renal and gallstone data set.

  20. An Effective Recommendations by Diffusion Algorithm for Web Graph Mining

    Directory of Open Access Journals (Sweden)

    S. Vasukipriya

    2013-04-01

    Full Text Available The information on the World Wide Web grows at an explosive rate, and societies rely more and more on the Web for their miscellaneous information needs. Recommendation systems are active information filtering systems that attempt to present information items such as movies, music, images, books, tags and query suggestions to users. Various kinds of databases are used for recommendations, and these databases can fundamentally be modeled as many types of graphs. The paper aims to provide a general framework for effective DR (Recommendations by Diffusion) algorithms for web graph mining. It first introduces a novel graph diffusion model based on heat diffusion, which can be applied to both undirected and directed graphs. It then shows how to convert different Web data sources into correct graphs in these models.
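The paper's exact diffusion formulation is not given in the abstract. The sketch below illustrates the general heat-diffusion idea on a directed graph under stated assumptions: at each discrete step, every node with out-links passes a fraction `alpha` of its heat, split equally, to its out-neighbors, while nodes without out-links keep their heat. Placing all initial heat on a query node and ranking the other nodes by accumulated heat yields recommendations; an undirected graph is handled by listing each edge in both directions. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def heat_diffusion(n: int, edges: Sequence[Tuple[int, int]], initial: Sequence[float],
                   alpha: float = 0.5, steps: int = 10) -> List[float]:
    """Discrete heat diffusion; alpha (0 < alpha <= 1) is the fraction of heat a node emits per step."""
    out: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    heat = list(initial)
    for _ in range(steps):
        delta = [0.0] * n
        for u in range(n):
            if out[u]:  # nodes without out-links keep their heat
                emitted = alpha * heat[u]
                delta[u] -= emitted
                for v in out[u]:
                    delta[v] += emitted / len(out[u])
        heat = [h + d for h, d in zip(heat, delta)]  # total heat is conserved
    return heat


def recommend(n, edges, query, k=2, **kwargs):
    """Rank non-query nodes by the heat they receive from the query node."""
    start = [1.0 if i == query else 0.0 for i in range(n)]
    heat = heat_diffusion(n, edges, start, **kwargs)
    candidates = [i for i in range(n) if i != query]
    return sorted(candidates, key=lambda i: heat[i], reverse=True)[:k]


# Undirected path 0 - 1 - 2 - 3, each edge listed in both directions.
path = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
print(recommend(4, path, query=0, k=3, steps=3))  # [1, 2, 3]
```

Few steps keep the ranking local to the query; with many steps the heat approaches a graph-wide equilibrium and loses its dependence on where it started, so the number of steps acts as a locality parameter.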

  1. A Transdisciplinary Mind: An Interview with Ian Mitroff

    Directory of Open Access Journals (Sweden)

    Russ Volckmann

    2006-06-01

    Full Text Available Known more widely as the “Father of Crisis Management,” University of Southern California professor Ian Mitroff came to the work of Ken Wilber and integral theory over two decades ago. No one else has brought an integral perspective to the fields of management and organization theory for as long as Mitroff. In this interview he talks about the development of his theories, the people he has worked closely with, his spiritual development and the streams of his work, including his research on spirituality in organizations. While his involvement with Wilber’s Integral Institute is not what he would like it to be, he sees there the potential to develop an institution that addresses the politicization and failures of our institutions of higher education. In the face of the crisis in leadership, integral and transdisciplinary approaches have the potential for making a positive difference as we are faced with the dissolution of distinctions that underlie how we make meaning in the world.

  2. Mining Product Data Models: A Case Study

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2014-01-01

    Full Text Available This paper presents two case studies used to validate some data-flow mining algorithms. We proposed these data-flow mining algorithms because most mining algorithms focus on the control-flow perspective. The first case study uses event logs generated by an ERP system (Navision) after we set several trackers on the data elements needed in the analyzed process, while the second case study uses event logs generated by the YAWL system. We offer a general solution for extracting data-flow models from different data sources. To apply the data-flow mining algorithms, the event logs must comply with a certain format (using the InputOutput extension), and a set of conversion tools is needed to respect this format. We describe the conversion tools used and how we obtained the data-flow models. Moreover, the data-flow model is compared to the control-flow model.

  3. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

    Science.gov (United States)

    Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

    2012-04-04

    Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples over a series of time points. Such data have three dimensions, gene-sample-time (GST), and are therefore called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and whole-seed development in Brassica napus. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
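The order-preserving (OP) idea on the time dimension can be illustrated briefly: two expression profiles share an OP pattern when they rank the time points in the same order, regardless of their absolute levels. The sketch below groups genes whose OP pattern is identical across a chosen subset of samples. It is a simplified illustration under that assumption, not the OPTricluster algorithm itself, which also enumerates sample combinations and handles noise; the gene and sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def time_order(profile: Sequence[float]) -> Pattern:
    """OP pattern: time-point indices sorted by expression level (ties keep time order)."""
    return tuple(sorted(range(len(profile)), key=lambda t: profile[t]))


def op_triclusters(
    data: Dict[str, Dict[str, Sequence[float]]], samples: Sequence[str], min_genes: int = 2
) -> Dict[Pattern, List[str]]:
    """Group genes whose OP pattern is the same in every sample of the chosen subset."""
    groups: Dict[Pattern, List[str]] = defaultdict(list)
    for gene, by_sample in data.items():
        patterns = {time_order(by_sample[s]) for s in samples}
        if len(patterns) == 1:  # coherent temporal evolution across these samples
            groups[patterns.pop()].append(gene)
    return {p: genes for p, genes in groups.items() if len(genes) >= min_genes}


data = {
    "g1": {"s1": [1.0, 3.0, 2.0], "s2": [0.5, 2.0, 1.0]},
    "g2": {"s1": [2.0, 9.0, 5.0], "s2": [1.0, 4.0, 3.0]},
    "g3": {"s1": [3.0, 2.0, 1.0], "s2": [1.0, 2.0, 3.0]},  # different order in each sample
}
print(op_triclusters(data, ["s1", "s2"]))  # {(0, 2, 1): ['g1', 'g2']}
```

Genes g1 and g2 peak at the second time point and rise at the third in both samples, so they form one gene × sample × time cluster despite very different absolute expression levels.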

  4. Evolving temporal association rules with genetic algorithms

    OpenAIRE

    Matthews, Stephen G.; Gongora, Mario A.; Hopgood, Adrian A.

    2010-01-01

    A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining; we show the efficacy of extending this to another variant, temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of...

  5. Former Can singer in Tallinn. Is Led Zeppelin reuniting? Ian Brown is asking too much for his house

    Index Scriptorium Estoniae

    2002-01-01

    On the performance by Damo Suzuki and his new band Damo Suzuki Network at the Von Krahl Theatre bar on 21 October; Led Zeppelin's plan to stage a concert tour of the USA next year; and the sale of the summer house of Ian Brown, former frontman of the Stone Roses, in Llithfaen, North Wales.

  6. Ian Scott, From Pinewood to Hollywood: British Filmmakers in American Cinema, 1910-1969.

    Directory of Open Access Journals (Sweden)

    Hilaria Loyo

    2011-05-01

    Full Text Available Ian Scott’s From Pinewood to Hollywood is a book about the emigration, film careers and socio-cultural influence of British filmmakers who moved to Hollywood during a time period that precedes and follows the studio era, as clearly indicated in its subtitle, British Filmmakers in American Cinema, 1910-1969. Although it is not presented as such, this book can be seen as a timely contribution to the recent academic interest within film studies in the transnational practices that have historical...

  7. A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

    Science.gov (United States)

    Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

    The increasing use of fast and efficient data mining algorithms on huge collections of personal data, facilitated by the exponential growth of technology, in particular electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy-preserving methodologies have been proposed, classified into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the degree of anonymity achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts so that background information cannot be exploited to attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self-Organizing Feature Maps (SOMs), we first organize data sets into subspaces according to their information-theoretic distance to each other, then create the most relevant classes, paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept negligible where possible. Furthermore, we propose information-theoretic measures for assessing the degree of anonymity achieved, and empirical tests to demonstrate it.
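The two privacy criteria named in the abstract can be checked directly on a released table. The sketch below computes k, the size of the smallest group of records sharing the same quasi-identifier values, and distinct ℓ-diversity, the fewest distinct sensitive values within any such group. The attribute names and records are hypothetical, and the code only measures a table; it does not perform the SOM-based generalization the authors propose.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def equivalence_classes(records: Sequence[Record], quasi_ids: Sequence[str]) -> Dict[Tuple[str, ...], List[Record]]:
    """Group records that are indistinguishable on the quasi-identifiers."""
    classes: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_ids)].append(r)
    return classes


def k_anonymity(records: Sequence[Record], quasi_ids: Sequence[str]) -> int:
    """Size of the smallest equivalence class."""
    return min(len(c) for c in equivalence_classes(records, quasi_ids).values())


def distinct_l_diversity(records: Sequence[Record], quasi_ids: Sequence[str], sensitive: str) -> int:
    """Fewest distinct sensitive values found in any equivalence class."""
    return min(len({r[sensitive] for r in c}) for c in equivalence_classes(records, quasi_ids).values())


table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
qi = ["age", "zip"]
print(k_anonymity(table, qi), distinct_l_diversity(table, qi, "disease"))  # 2 1
```

The example shows why ℓ-diversity is combined with k-anonymity: the table is 2-anonymous, yet everyone in the second group has the same disease, so knowing that someone falls in that group reveals their diagnosis.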

  8. Review of Super Crunchers by Ian Ayres

    Directory of Open Access Journals (Sweden)

    Eric Gaze

    2009-07-01

    Full Text Available Ayres, I. Super Crunchers: Why Thinking-by-Numbers Is the New Way to be Smart. (Bantam Dell Publishing Group, 2007.) 272 pp. Hard cover $25 ISBN 978-0-553-80540-6. Super Crunchers tells the story of how analyzing data is changing the ways in which decisions are made. We in the National Numeracy Network make a case for the importance of quantitative literacy by referring to how much quantitative information is now available to each of us: “a world awash in numbers.” Ian Ayres zeroes in on the people who are making a living crunching all of these data. From the seemingly innocuous (how wines are rated, and the scouting of baseball players) to the life-impacting (diagnosis of disease, and parole of inmates), this book paints a vivid portrait of how data analysis is affecting decision making at every level of our society. The use of simple regression models and randomized trials is calling into question who the “experts” of the twenty-first century will be, and why thinking-by-numbers really is the new way to be smart.

  9. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpora and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  10. Data Mining and Machine Learning in Astronomy

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advances. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  11. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems, are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal ... sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.

  12. A node linkage approach for sequential pattern mining.

    Directory of Open Access Journals (Sweden)

    Osvaldo Navarro

    Sequential pattern mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern-growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only keeps in memory a pattern node linkage and the pseudo-projections required for the branch currently being explored. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability than state-of-the-art algorithms.
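
To make the idea of a pseudo-projection concrete, here is a minimal PrefixSpan-style sketch in Python, assuming each sequence event holds a single item. It is not NLDFT itself and omits the node linkage structure; it only shows how a projection can be kept as (sequence index, start position) pointers into the original data, explored depth-first so that only the current branch's projections are held in memory. The function name and example data are hypothetical.

```python
from collections import defaultdict


def mine_sequential_patterns(db, min_support):
    """Depth-first pattern growth over pseudo-projections (PrefixSpan-style).

    db: list of sequences, each a list of items (one item per event).
    A pseudo-projection is a list of (sequence_index, start_position)
    pointers into db, so no projected copies of the sequences are built.
    Returns a dict mapping pattern tuples to their support counts.
    """
    patterns = {}

    def grow(prefix, projection):
        # Count each item at most once per sequence, scanning only the
        # suffix that starts at the pointer.
        support = defaultdict(int)
        for seq_idx, start in projection:
            seq = db[seq_idx]
            seen = set()
            for pos in range(start, len(seq)):
                if seq[pos] not in seen:
                    seen.add(seq[pos])
                    support[seq[pos]] += 1

        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Child pseudo-projection: move each pointer just past the
            # first occurrence of `item` in that sequence's suffix.
            child = []
            for seq_idx, start in projection:
                seq = db[seq_idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        child.append((seq_idx, pos + 1))
                        break
            # Only the projections along the current branch are alive here.
            grow(new_prefix, child)

    grow((), [(i, 0) for i in range(len(db))])
    return patterns


db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(mine_sequential_patterns(db, min_support=2))
# {('a',): 2, ('a', 'c'): 2, ('b',): 2, ('b', 'c'): 2, ('c',): 3}
```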

  13. Distributed genetic process mining

    NARCIS (Netherlands)

    Bratosin, C.C.; Sidorova, N.; Aalst, van der W.M.P.

    2010-01-01

    Process mining aims at discovering process models from data logs in order to offer insight into the real use of information systems. Most of the existing process mining algorithms fail to discover complex constructs or have problems dealing with noise and infrequent behavior. The genetic process

  14. Calculations and selection of a TRIGA core for the Nuclear Reactor IAN-R1

    International Nuclear Information System (INIS)

    Castiblanco, L.A.; Sarta, J.A.

    1997-01-01

    The Reactor Group used the code WIMS, reduced to five energy groups, together with the code CITATION, and evaluated four core configurations compatible with the grid currently installed. The four configurations were taken from the two proposals presented to the Instituto de Ciencias Nucleares y Energias Alternativas by General Atomics Company. In this paper, the authors selected the best configuration, according to flux distribution and excess reactivity performance, for a TRIGA core to be installed in the Nuclear Reactor IAN-R1.

  15. His Excellency Mr Ian de Jong, Ambassador, Permanent Representative of the Kingdom of the Netherlands to the United Nations Office in Geneva

    CERN Multimedia

    Maximilien Brice

    2003-01-01

    Visit of His Excellency Mr Ian de Jong, Ambassador, Permanent Representative of the Kingdom of the Netherlands to the United Nations Office in Geneva, June 2003. From left to right: Dr Albert Ijspeert, Deputy Leader, Magnet and electrical systems Group, Accelerator Technology Division; Mr Maarten Wilbers, Legal Service; Prof. Cecilia Jarlskog, Adviser to the Director-General for Member State Relations; Mr Jan van der Boon, Director of Administration; His Excellency Mr Ian de Jong, Ambassador, Permanent Representative of the Kingdom of the Netherlands to the United Nations Office in Geneva; Prof. Frank Linde, NIKHEF; Dr Lucie Linssen, Experimental Physics Division, Technical Assistance Group; and Mr C. J. van Riel, Ministry of Education, Culture and Science, Netherlands, Dutch Delegate to Council and Finance Committee.

  16. Milk progesterone to monitor reproductive performance in Holstein Friesian cows

    International Nuclear Information System (INIS)

    Lubbadeh, W.F.

    1995-01-01

    A study was conducted to monitor the reproductive performance of lactating Holstein Friesian cows by measuring milk progesterone levels. Sequential post-partum milk samples were collected weekly for 20 weeks after parturition. Progesterone concentrations were determined by solid-phase RIA. Lactating cows required an average of 5.2 weeks to resume luteal activity; 48% of the cows conceived after first insemination and had significantly higher progesterone concentrations during the first 5 weeks after insemination than cows that returned to heat 5-8 weeks after insemination. Results also revealed that an adequate level of progesterone, which varied between 4.2 and 9.1 nmol/l, is required to maintain early pregnancy, and that the progesterone level remains high in pregnant cows. (Author) 17 refs., 3 tabs.

  17. An optimization framework for process discovery algorithms

    NARCIS (Netherlands)

    Weijters, A.J.M.M.; Stahlbock, R.

    2011-01-01

    Today there are many process mining techniques that, based on an event log, allow for the automatic induction of a process model. The process mining algorithms that are able to deal with incomplete event logs, exceptions, and noise typically have many parameters to tune the algorithm. Therefore, the

  18. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine) and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine functional modules: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine and visualization. The principles and knowledge representation of some of these data mining modules are described. The tool is implemented in Visual C++ under Windows 2000. Non-monotonicity in data mining is dealt with by concept hierarchy and layered mining. The tool has been applied satisfactorily to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems, and to the diagnosis of brain glioma.

  19. Graph Mining Meets the Semantic Web

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sangkeun (Matt) [ORNL]; Sukumar, Sreenivas R [ORNL]; Lim, Seung-Hwan [ORNL]

    2015-01-01

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through the implementation of three popular iterative graph mining algorithms (triangle count, connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on six real-world data sets and show that graph mining algorithms (those that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
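
The abstract relies on these algorithms having a linear-algebra formulation. As a point of reference only, the Python sketch below shows PageRank as plain power iteration over a directed edge list (it does not reproduce the authors' SPARQL queries); in an RDF setting the edges could be, for example, the subject/object pairs of the triples. The damping factor of 0.85 and the convergence tolerance are assumed conventional defaults, and the function name is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed edge list.

    edges: iterable of (source, target) pairs. Nodes without out-links
    (dangling nodes) spread their rank evenly over all nodes, so the
    ranks always sum to 1.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank mass held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("c", "b")])
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Each iteration is one sparse matrix-vector product in disguise, which is the property that lets such algorithms be expressed as repeated aggregate queries.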

  20. Big data mining analysis method based on cloud computing

    Science.gov (United States)

    Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

    2017-08-01

    In the era of information explosion, the super-large scale, discreteness and unstructured or semi-structured nature of big data have gone far beyond what traditional data management approaches can handle. With the arrival of the cloud computing era, cloud computing provides a new technical approach to mining massive data, which can effectively solve the problem that traditional data mining methods cannot cope with massive data. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs an association rule mining algorithm based on the MapReduce parallel processing architecture, and carries out experimental verification. The parallel association rule mining algorithm on the cloud computing platform can greatly improve the execution speed of data mining.
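
The MapReduce pattern the paper builds on can be illustrated with a single-process Python simulation of one counting pass of frequent-itemset mining: mappers emit (itemset, 1) pairs per transaction, a shuffle groups them by key, and reducers sum the counts and keep itemsets that meet the minimum support. This is a hypothetical sketch, not the authors' algorithm; it counts every k-item subset rather than Apriori-pruned candidates, and a real deployment would run the phases on a cluster framework such as Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Mapper: emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1


def shuffle(mapped_pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values, min_support):
    """Reducer: sum the counts and keep the itemset only if it is frequent."""
    total = sum(values)
    return (key, total) if total >= min_support else None


def frequent_itemsets_mapreduce(transactions, k, min_support):
    """Run one simulated map -> shuffle -> reduce pass for k-itemsets."""
    pairs = (pair for tx in transactions for pair in map_phase(tx, k))
    grouped = shuffle(pairs)
    reduced = (reduce_phase(key, vals, min_support) for key, vals in grouped.items())
    return dict(item for item in reduced if item is not None)


transactions = [["milk", "bread"], ["milk", "bread", "eggs"],
                ["bread", "eggs"], ["milk", "eggs"]]
print(frequent_itemsets_mapreduce(transactions, k=2, min_support=2))
```

Because each mapper only sees its own transactions and each reducer only sees one key, both phases can be spread across machines without coordination, which is where the speed-up claimed for the cloud setting comes from.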

  1. Neutron characterization of the nuclear reactor Ian-R1 of Colombia

    International Nuclear Information System (INIS)

    Gonzalez P, L. X.; Martinez O, S. A.; Vega C, H. R.

    2014-08-01

    By means of Monte Carlo methods, using the code MCNPX, the neutron characteristics of the Colombian research nuclear reactor Ian-R1 were evaluated with the reactor shut down but with the neutron source in its start-up position. The neutron spectra, the total flux and their average power were calculated in the irradiation spaces inside the graphite reflector, as well as in the air-filled cells. The spectra, the total flux and the absorbed dose were also calculated at several locations along the radial axis inside the water moderator. The total neutron flux was also calculated along the axial axis. The characteristics of the neutron spectra vary depending on their position relative to the source and on the material surrounding the cell where the calculation was made. (Author)

  2. Web Mining and Social Networking

    CERN Document Server

    Xu, Guandong; Li, Lin

    2011-01-01

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal s

  3. High utility-itemset mining and privacy-preserving utility mining

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2016-03-01

    In recent decades, high-utility itemset mining (HUIM) has emerged as a critical research topic, since both the quantity and profit factors are considered when mining high-utility itemsets (HUIs). Data mining is commonly used to discover interesting and useful knowledge from massive data. It may, however, lead to privacy threats if private or secure information (e.g., HUIs) is published in public places or misused. In this paper, we focus on the issues of HUIM and privacy-preserving utility mining (PPUM), and present two evolutionary algorithms to mine HUIs and to hide the sensitive high-utility itemsets in PPUM, respectively. Extensive experiments showed that the two proposed models for the applications of HUIM and PPUM can not only generate high-quality profitable itemsets according to the user-specified minimum utility threshold, but also preserve the privacy of private or secure information (e.g., HUIs) in real-world applications.
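
For readers new to HUIM, the usual utility definition can be sketched in a few lines of Python: an item's utility in a transaction is its purchased quantity times its unit profit, and an itemset's utility is summed over every transaction that contains the whole itemset. The exhaustive search below stands in for the paper's evolutionary algorithms, which are not reproduced here, and is only practical for a handful of items; the profit table and transactions are hypothetical.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit of the itemset's items over every
    transaction that contains all of them."""
    total = 0
    for tx in transactions:  # tx maps item -> purchased quantity
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Exhaustive enumeration; only feasible for a handful of items."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


unit_profit = {"a": 5, "b": 2, "c": 1}  # hypothetical profit per unit
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 6}, {"b": 4, "c": 3}]
print(high_utility_itemsets(transactions, unit_profit, min_utility=12))
# {('a',): 15, ('b',): 12, ('a', 'c'): 16}
```

Note that ('a', 'c') has a higher utility than either of its subsets. Utility is not anti-monotone the way support is, which is why HUIM algorithms cannot apply Apriori-style pruning on utility directly.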

  4. Algorithms for Academic Search and Recommendation Systems

    DEFF Research Database (Denmark)

    Amolochitis, Emmanouil

    2014-01-01

    In this work we present novel algorithms for academic search, recommendation and association rule mining. In the first part of the work we introduce a novel hierarchical heuristic scheme for re-ranking academic publications. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper’s index terms with each other. In the second part we describe the design of a hybrid recommender ensemble (user, item and content based). The newly introduced algorithms ... are part of a developed Movie Recommendation system, the first such system to be commercially deployed in Greece by a major Triple Play services provider. In the third part of the work we present the design of a quantitative association rule mining algorithm. The introduced mining algorithm processes...

  5. Mining of high utility-probability sequential patterns from uncertain databases.

    Directory of Open Access Journals (Sweden)

    Binbin Zhang

    High-utility sequential pattern mining (HUSPM) has become an important issue in the field of data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUSPs). They have been applied in several real-life situations, such as consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUSPs in precise data. In real life, however, uncertainty is an important factor, as data are collected using various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existence probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM) for mining high utility-probability sequential patterns (HUPSPs) in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moreover, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence, which is smaller than the original database. Thus, the number of unpromising candidates can be greatly reduced, as well as the execution time for mining HUPSPs. Substantial experiments on both real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds.

  6. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Science.gov (United States)

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

    2016-05-13

    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large-scale comparative and evolutionary studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The topics output by the LDA algorithm could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in the NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.

  7. Text Clustering Algorithm Based on Random Cluster Core

    Directory of Open Access Journals (Sweden)

    Huang Long-Jun

    2016-01-01

    Clustering has become a popular text mining technique, but huge data volumes place higher demands on the accuracy and performance of text mining. In view of the performance bottleneck of traditional text clustering algorithms, this paper proposes a text clustering algorithm with random features. It is a density-based clustering algorithm that also uses neighborhood heuristic rules and introduces the concept of a random cluster core, which effectively reduces the complexity of the distance calculation.

  8. Setup of a testing environment for mission planning in mining

    NARCIS (Netherlands)

    Groenen, J.P.J.; Steinbuch, M.

    2013-01-01

    Mission planning algorithms for surface mining applications are difficult to test because of the large scale of the tasks. To validate these algorithms, a scaled setup is created in which the mining excavator is mimicked by an industrial robot. This report discusses the development of a software

  9. Applied data mining

    CERN Document Server

    Xu, Guandong

    2013-01-01

    Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.

  10. pubmed.mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2016-08-26

    Three case studies are presented, namely 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations, even on large corpus ...

  11. Physics Mining of Multi-Source Data Sets

    Science.gov (United States)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures of environmental parameters than ever before by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar data, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as artificial neural networks, which yield a black-box solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms and to handle multi-type data sets, and the tool is parallelized.

  12. Declarative Process Mining for DCR Graphs

    DEFF Research Database (Denmark)

    Debois, Søren; Hildebrandt, Thomas T.; Laursen, Paw Høvsgaard

    2017-01-01

    We investigate process mining for the declarative Dynamic Condition Response (DCR) graphs process modelling language. We contribute (a) a process mining algorithm for DCR graphs, (b) a proposal for a set of metrics quantifying output model quality, and (c) a preliminary example-based comparison...

  13. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives.  The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  14. High Performance Data mining by Genetic Neural Network

    Directory of Open Access Journals (Sweden)

    Dadmehr Rahbari

    2013-10-01

    Data mining in computer science is the process of discovering interesting and useful patterns and relationships in large volumes of data. Most methods for mining problems are based on artificial intelligence algorithms. Optimizing a neural network with respect to three basic parameters (topology, weights and learning rate) is a powerful method, and we introduce an optimization method for this problem. In this paper, a genetic algorithm with mutation and crossover operators modifies and optimizes the network structure. The dataset used in our work is a stroke disease dataset with twenty features, whose optimal number is obtained by the new hybrid algorithm. The results compare very well with other similar methods, and the low error percentage shows that the proposed approach is efficient for high-performance data mining problems.

  15. Research of Improved Apriori Algorithm Based on Itemset Array

    Directory of Open Access Journals (Sweden)

    Naili Liu

    2013-06-01

    Mining frequent itemsets is a key process in data mining research. Apriori and many of its improved variants have low efficiency because they need to scan the database many times and store transaction IDs in memory, so their time and space overheads are very high, especially when processing large-scale databases. The main task of the improved algorithm is to reduce the time and space overhead of mining frequent itemsets. To this end, it scans the database only once to generate a binary itemset array, stores transaction flags as binary values instead of transaction IDs, and uses the logical AND operation to judge whether an itemset is frequent. Moreover, the improved algorithm is more suitable for large-scale databases. Experimental results show that the improved algorithm has better efficiency than the classic Apriori algorithm.
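
The three ideas in the abstract (a single database scan, binary flags instead of transaction IDs, and a logical AND to test frequency) can be sketched in Python by using an integer as a bit array per item: bit i is set when transaction i contains the item, and the support of an itemset is the number of 1 bits left after ANDing its items' bitmaps. The join and prune steps are the standard Apriori ones; the data layout and function names are assumptions for illustration, not the paper's exact itemset array.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """One database scan: bit i of an item's bitmap is 1 when
    transaction i contains that item."""
    bitmaps = {}
    for tid, tx in enumerate(transactions):
        for item in set(tx):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """AND the item bitmaps together and count the remaining 1 bits."""
    items = list(itemset)
    combined = bitmaps.get(items[0], 0)
    for item in items[1:]:
        combined &= bitmaps.get(item, 0)
    return bin(combined).count("1")


def apriori_bitmap(transactions, min_support):
    bitmaps = build_bitmaps(transactions)
    frequent = {}
    level = {}
    for item in sorted(bitmaps):
        s = support((item,), bitmaps)
        if s >= min_support:
            level[(item,)] = s
    size = 1
    while level:
        frequent.update(level)
        # Join step: merge frequent itemsets that share all but the last item.
        candidates = set()
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                candidates.add(tuple(sorted(set(a) | set(b))))
        size += 1
        # Prune step: every (size - 1)-subset must already be frequent.
        level = {}
        for cand in sorted(candidates):
            if all(sub in frequent for sub in combinations(cand, size - 1)):
                s = support(cand, bitmaps)
                if s >= min_support:
                    level[cand] = s
    return frequent


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(apriori_bitmap(transactions, min_support=3))
# {('a',): 4, ('b',): 4, ('c',): 4, ('a', 'b'): 3, ('a', 'c'): 3, ('b', 'c'): 3}
```

The database is read only inside `build_bitmaps`; every later support count is a handful of bitwise ANDs, which is the source of the time savings the abstract describes.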

  16. New plasmid-mediated aminoglycoside 6'-N-acetyltransferase, AAC(6')-Ian, and ESBL, TLA-3, from a Serratia marcescens clinical isolate.

    Science.gov (United States)

    Jin, Wanchun; Wachino, Jun-Ichi; Kimura, Kouji; Yamada, Keiko; Arakawa, Yoshichika

    2015-05-01

    Enterobacteriaceae clinical isolates showing amikacin resistance (MIC 64 to >256 mg/L) in the absence of 16S rRNA methyltransferase (MTase) genes were found. The aim of this study was to clarify the molecular mechanisms underlying amikacin resistance in Enterobacteriaceae clinical isolates that do not produce 16S rRNA MTases. PCR was performed to detect already-known amikacin resistance determinants. Cloning experiments and sequence analyses were performed to characterize unknown amikacin resistance determinants. Transfer of amikacin resistance determinants was performed by conjugation and transformation. The complete nucleotide sequence of the plasmids was determined by next-generation sequencing technology. Amikacin resistance enzymes were purified with a column chromatography system. The enzymatic function of the purified protein was investigated by thin-layer chromatography (TLC) and HPLC. Among the 14 isolates, 9 were found to carry already-known amikacin resistance determinants such as aac(6')-Ia and aac(6')-Ib. Genetic analyses revealed the presence of a new amikacin acetyltransferase gene, named aac(6')-Ian, located on a 169 829 bp transferable plasmid (p11663) of the Serratia marcescens strain NUBL-11663, one of the five strains negative for known aac(6') genes by PCR. Plasmid p11663 also carried a novel ESBL gene, named blaTLA-3. HPLC and TLC analyses demonstrated that AAC(6')-Ian catalysed the transfer of an acetyl group from acetyl coenzyme A onto an amine at the 6'-position of various aminoglycosides. We identified aac(6')-Ian as a novel amikacin resistance determinant together with a new ESBL gene, blaTLA-3, on a transferable plasmid of a S. marcescens clinical isolate. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.

  17. EMiT: a process mining tool

    NARCIS (Netherlands)

    Dongen, van B.F.; Aalst, van der W.M.P.; Cortadella, J.; Reisig, W.

    2004-01-01

    Process mining offers a way to distill process models from event logs originating from transactional systems in logistics, banking, e-business, health-care, etc. The algorithms used for process mining are complex and in practice large logs are needed to derive a high-quality process model. To

  18. A direct mining approach to efficient constrained graph pattern discovery

    DEFF Research Database (Denmark)

    Zhu, Feida; Zhang, Zequn; Qu, Qiang

    2013-01-01

    Despite the wealth of research on frequent graph pattern mining, how to efficiently mine the complete set of those with constraints still poses a huge challenge to the existing algorithms mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly-spe...

  19. Grade Distribution Modeling within the Bauxite Seams of the Wachangping Mine, China, Using a Multi-Step Interpolation Algorithm

    Directory of Open Access Journals (Sweden)

    Shaofeng Wang

    2017-05-01

    Mineral reserve estimation and mining design depend on a precise modeling of the mineralized deposit. A multi-step interpolation algorithm was proposed to build a 3D grade distribution model of the mineralized seam in a longwall mining panel with a U-shaped layout having two roadways at both sides. It comprises a 1D biharmonic spline estimator for interpolating floor altitudes; 2D nearest neighbor, linear, natural neighbor, cubic, biharmonic spline, inverse distance weighted, simple kriging, and ordinary kriging interpolations for grade distribution on the two vertical sections at the roadways; and 3D linear interpolation for grade distribution between sections. Compared to field data from exploratory boreholes, this multi-step interpolation using the natural neighbor method shows optimal stability and a minimal difference between interpolated and field data. Using this method, 97,576 m³ of bauxite, in which the mass fraction of Al₂O₃ (Wa) and the mass ratio of Al₂O₃ to SiO₂ (Wa/s) are 61.68% and 27.72, respectively, was delineated from the 189,260 m³ mineralized deposit in the 1102 longwall mining panel in the Wachangping mine, Southwest China. The mean absolute errors, the root mean squared errors and the relative standard deviations of errors between interpolated data and exploratory grade data at six boreholes are 2.544, 2.674, and 32.37% for Wa, and 1.761, 1.974, and 67.37% for Wa/s, respectively. The proposed method can be used for characterizing the grade distribution in a mineralized seam between two roadways at both sides of a longwall mining panel.
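
Of the 2D methods compared, inverse distance weighting is the simplest to state: the estimate at a point is a weighted mean of the sampled values, with weights 1/d^p for distance d and power p. The Python sketch below assumes p = 2, a common default, and uses made-up sample coordinates and grades; it illustrates only that one interpolator, not the natural neighbor method the study found best or the full multi-step pipeline.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples, e.g. grades at boreholes.
    Each sample is weighted by 1 / distance**power; if (x, y) coincides
    with a sample location, that sample's value is returned unchanged.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical sample points (metres) with Al2O3 mass fractions in percent.
samples = [(0.0, 0.0, 60.0), (10.0, 0.0, 64.0), (5.0, 8.0, 58.0)]
print(round(idw(samples, 4.0, 2.0), 2))
```

Unlike kriging, IDW needs no variogram model, but because it is a weighted average it can never return a value outside the range of the samples.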

  20. Data Mining and Privacy of Social Network Sites' Users: Implications of the Data Mining Problem.

    Science.gov (United States)

    Al-Saggaf, Yeslam; Islam, Md Zahidul

    2015-08-01

    This paper explores the potential of data mining as a technique that malicious data miners could use to threaten the privacy of social network site (SNS) users. It applies a data mining algorithm to a real dataset to provide empirical evidence of how easily characteristics of SNS users can be discovered and used in ways that could invade their privacy. One major contribution of this article is the application of the decision forest data mining algorithm SysFor to the SNS context; rather than building a single decision tree, SysFor builds a forest, which allows more logic rules to be explored in a dataset. For example, one logic rule that SysFor built in this study revealed that anyone whose profile picture shows just the face or shows a family is less likely to be lonely. Another contribution of this article is a discussion of the implications of the data mining problem for governments, businesses, developers, and SNS users themselves.

  1. Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

    Directory of Open Access Journals (Sweden)

    Lixiong Xu

    2017-01-01

    As one of the most effective function mining algorithms, the Gene Expression Programming (GEP) algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Through self-evolution, GEP is able to mine an optimal function for dealing with more complicated tasks. However, in big data research GEP suffers from low efficiency because of its long mining processes. To improve the efficiency of GEP in big data research, especially for large-scale classification tasks, this paper presents a parallelized GEP algorithm using the MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.
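
    The abstract does not say how the MapReduce version divides the work; a common pattern, assumed here, is to parallelize fitness evaluation: each mapper scores a candidate expression on its own slice of the training data and a reducer sums the counts. The sketch simulates this in one process with `functools.reduce`; the candidate expression and data are hypothetical, and no GEP chromosome encoding or evolution step is shown.

```python
from functools import reduce

def map_partition(candidate, partition):
    """Map step: (correct, total) for one candidate on one data partition."""
    correct = sum(1 for features, label in partition if candidate(features) == label)
    return correct, len(partition)

def combine(a, b):
    """Reduce step: add the (correct, total) pairs from two partitions."""
    return a[0] + b[0], a[1] + b[1]

def fitness(candidate, partitions):
    """Classification accuracy of `candidate` over all partitions."""
    correct, total = reduce(combine, (map_partition(candidate, p) for p in partitions))
    return correct / total

def candidate(features):
    """Hypothetical evolved expression: predict class 1 when x0 + x1 > 1."""
    return 1 if features[0] + features[1] > 1 else 0

partitions = [
    [((0.2, 0.3), 0), ((0.9, 0.8), 1)],  # partition held by worker 1
    [((1.0, 0.5), 1), ((0.1, 0.1), 1)],  # partition held by worker 2
]
print(fitness(candidate, partitions))  # 0.75
```

    In a real deployment the `map_partition` calls would run on separate workers, and only the small (correct, total) pairs would travel over the network.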

  2. Prediction of pork quality parameters by applying fractals and data mining on MRI

    DEFF Research Database (Denmark)

    Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés

    2017-01-01

    This work first investigates the use of MRI, fractal algorithms, and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal Algorithm, CFA; Fractal Texture Algorithm, FTA; and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effects of the MRI acquisition sequence (Gradient Echo, GE; Spin Echo, SE; and Turbo 3D, T3D) and of the data mining prediction technique (Isotonic Regression, IR, and Multiple Linear Regression, MLR) were analysed. Both fractal algorithms, FTA and OPFTA, are appropriate for analysing MRI of loins. The acquisition sequence, the fractal algorithm, and the data mining technique seem to influence the prediction results. For most physico-chemical parameters, prediction equations with moderate...

  3. Computer-aided system for fire fighting in an underground mine

    Energy Technology Data Exchange (ETDEWEB)

    Rosiek, F; Sikora, M; Urbanski, J [Politechnika Wroclawska (Poland). Instytut Gornictwa]

    1989-01-01

    Discusses structure of an algorithm for computer-aided planning of fire fighting and rescue in an underground coal mine. The algorithm developed by the Mining Institute of the Wroclaw Technical University consists of ten options: regulations on fire fighting, fire alarm for miners working underground (rescue ways, fire zones etc.), information system for mine management, movements of fire fighting teams, distribution of fire fighting equipment, assessment of explosion hazards of fire gases, fire gas temperature control of blower operation, detection of endogenous fires, ventilation control. 2 refs.

  4. Set-Oriented Mining for Association Rules in Relational Databases

    NARCIS (Netherlands)

    Houtsma, M.A.W.; Swami, A.

    1995-01-01

    We describe set-oriented algorithms for mining association rules. Such algorithms involve performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these
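
    To illustrate the set-oriented idea, the sketch below counts the support of every 2-itemset with a single SQL self-join, run through Python's built-in `sqlite3` module. The `sales(tid, item)` table, its rows, and the `MIN_SUPPORT` value are hypothetical, and the query is a generic example rather than the authors' exact formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"),
     (2, "eggs"), (3, "milk"), (3, "eggs")],
)

MIN_SUPPORT = 2  # absolute transaction count; hypothetical threshold

# Candidate 2-itemsets come from a self-join on the transaction id;
# the a.item < b.item condition keeps each unordered pair exactly once.
rows = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM sales AS a
    JOIN sales AS b ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY a.item, b.item
    """,
    (MIN_SUPPORT,),
).fetchall()
print(rows)  # [('bread', 'milk', 2), ('eggs', 'milk', 2)]
```

    Larger itemsets follow the same pattern with further joins against the previous level's result, which is where the join cost that the abstract mentions comes from.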

  5. Climatic zonation and land suitability determination for saffron in Khorasan-Razavi province using data mining algorithms

    Directory of Open Access Journals (Sweden)

    mehdi Bashiri

    2017-12-01

    Yield prediction for agricultural crops plays an important role in export-import planning, purchase guarantees, pricing, securing profits, and increasing agricultural productivity. Crop yield is affected by several parameters, especially climate. In this study, saffron yield in Khorasan-Razavi province was evaluated with different classification algorithms, including artificial neural networks, regression models, local linear trees, decision trees, discriminant analysis, random forest, support vector machine, and nearest neighbor analysis. These algorithms analyzed 20 years of data (1989-2009) comprising 11 climatological parameters. The results showed that only a few climatological parameters affect saffron yield. The minimum, mean, and maximum temperatures had the highest positive correlations with saffron cultivation areas, while relative humidity at 6.5 h, sunny hours, relative humidity at 18.5 h, evaporation, relative humidity at 12.5 h, and absolute humidity had the highest negative correlations, in that order. In addition, in the classification of saffron cultivation areas, discriminant analysis and support vector machine achieved higher accuracies. The correlation between saffron cultivation area and saffron yield was relatively high (r = 0.38). Nearest neighbor analysis had the best prediction accuracy for classifying cultivation areas, with coefficients of determination of 1 and 0.944 for the training and testing stages, respectively. However, the accuracy of the algorithms in predicting crop yield from climatological parameters was low (average coefficients of determination of 0.48 and 0.05 for the training and testing stages). The best algorithm, nearest neighbor analysis, had coefficients of determination of 1 and 0.177 for saffron yield prediction. The results showed that data mining algorithms using climatological parameters can classify cultivation areas. In this way it is possible
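
    A training R² of exactly 1 for nearest neighbor analysis is what one would expect if a single neighbor was used (k = 1 is an assumption; the abstract does not state k): each distinct training point is its own nearest neighbor, so training predictions reproduce the training targets. The sketch below, with hypothetical climate features and yields, shows this effect, which is why the testing R² is the more informative figure.

```python
import math

def knn_predict(train_x, train_y, x, k=1):
    """Mean target of the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return sum(train_y[i] for i in order[:k]) / k

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical (mean temperature, relative humidity) features and yields.
train_x = [(10.0, 40.0), (12.0, 35.0), (15.0, 30.0)]
train_y = [4.1, 4.6, 5.2]

fitted = [knn_predict(train_x, train_y, x, k=1) for x in train_x]
print(r_squared(train_y, fitted))  # 1.0: every training point is its own nearest neighbor
```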

  6. Mining Research on Vibration Signal Association Rules of Quayside Container Crane Hoisting Motor Based on Apriori Algorithm

    Science.gov (United States)

    Yang, Chencheng; Tang, Gang; Hu, Xiong

    2017-07-01

    The shore hoisting motor produces a large amount of vibration signal data in daily operation. To analyze the correlations among these data and discover faults and potential safety hazards of the motor, the data are first discretized, and the Apriori algorithm is then used to mine strong association rules among them. The results show that day 1 and day 16 are the most closely related, which can guide staff in analyzing the motor's operation on these two days to find and solve fault and safety problems.
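
    A minimal Apriori sketch over discretized readings, assuming each monitoring window is turned into a transaction of symbolic items such as `amp_high` (hypothetical labels; the abstract does not say how the signals were discretized). Frequent itemsets are grown level by level, candidates with an infrequent subset are pruned, and strong rules are those whose confidence reaches a chosen threshold.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent itemsets (frozenset -> support count) found level by level."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_confidence):
    """Rules lhs -> rhs from frequent itemsets whose confidence meets the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = count / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules

# Hypothetical discretized readings: one transaction per monitoring window.
windows = [
    {"amp_high", "freq_low"},
    {"amp_high", "freq_low", "temp_high"},
    {"amp_low", "freq_low"},
    {"amp_high", "temp_high"},
]
frequent = apriori(windows, min_support=2)
print(strong_rules(frequent, min_confidence=0.9))
```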

  7. Data mining, mining data : energy consumption modelling

    Energy Technology Data Exchange (ETDEWEB)

    Dessureault, S. [Arizona Univ., Tucson, AZ (United States)]

    2007-09-15

    Most modern mining operations are accumulating large amounts of data on production and business processes. Data, however, provides value only if it can be translated into information that appropriate users can utilize. This paper emphasizes that a new technological focus should emerge, notably on how to concentrate data into information, analyze information sufficiently for it to become knowledge, and act on that knowledge. Researchers at the Mining Information Systems and Operations Management (MISOM) laboratory at the University of Arizona have created a method to transform data into action. The data-to-action approach was exercised in the development of an energy consumption model (ECM), in partnership with a major US-based copper mining company, two software companies, and the MISOM laboratory. The approach begins by integrating several key data sources using data warehousing techniques and by increasing the existing level of data integration and cleaning. An online analytical processing (OLAP) cube was also created to investigate the data and identify a subset of several million records. Data mining algorithms were applied using the information that was isolated by the OLAP cube. The data mining results showed that traditional cost drivers of energy consumption are poor predictors. A comparison was made between traditional methods of predicting energy consumption and the prediction formed using data mining. Traditionally, in the mines for which data were available, monthly averages of tons and distance are used to predict diesel fuel consumption. However, this article shows that new information technology can be used to incorporate many more variables into the budgeting process, resulting in more accurate predictions. The ECM helped mine planners improve the prediction of energy use through more data integration, measure development, and workflow analysis. 5 refs., 11 figs.

  8. Data Mining Web Services for Science Data Repositories

    Science.gov (United States)

    Graves, S.; Ramachandran, R.; Keiser, K.; Maskey, M.; Lynnes, C.; Pham, L.

    2006-12-01

    The maturation of web services standards and technologies sets the stage for a distributed "Service-Oriented Architecture" (SOA) for NASA's next-generation science data processing. This architecture will allow members of the scientific community to create and combine persistent distributed data processing services and make them available to other users over the Internet. NASA has initiated a project to create a suite of specialized data mining web services designed specifically for science data. The project leverages the Algorithm Development and Mining (ADaM) toolkit as its basis. The ADaM toolkit is a robust, mature and freely available science data mining toolkit that is being used by several research organizations and educational institutions worldwide. These mining services will give the scientific community a powerful and versatile data mining capability that can be used to create higher-order products such as thematic maps from current and future NASA satellite data records with methods that are not currently available. The package of mining and related services is being developed using Web Services standards so that community-based measurement processing systems can access and interoperate with them. These standards-based services allow users different options for utilizing them, from direct remote invocation by a client application to deployment of a Business Process Execution Language (BPEL) solutions package where a complex data mining workflow is exposed to others as a single service. The ability to deploy and operate these services at a data archive allows the data mining algorithms to be run where the data are stored, a more efficient scenario than moving large amounts of data over the network. This will be demonstrated in a scenario in which a user applies a remote Web-Service-enabled clustering algorithm to create cloud masks from satellite imagery at the Goddard Earth Sciences Data and Information Services Center (GES DISC).

  9. Support-Less Association Rule Mining Using Tuple Count Cube

    OpenAIRE

    Qin Ding; William Perrizo

    2007-01-01

    Association rule mining is one of the important tasks in data mining and knowledge discovery (KDD). The traditional task of association rule mining is to find all the rules with high support and high confidence. In some applications, we are interested in finding high confidence rules even though the support may be low. This type of problem differs from the traditional association rule mining problem; hence, it is called support-less association rule mining. Existing algorithms for association...
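
    The tuple count cube itself is not reproduced here. The brute-force sketch below only illustrates what "support-less" means: rules between pairs of items are kept on confidence alone, so a rule seen in a single transaction can survive if it always holds. The baskets and threshold are hypothetical.

```python
from itertools import permutations

def high_confidence_pairs(transactions, min_confidence):
    """Single-item rules a -> b kept on confidence alone; no minimum support is applied."""
    transactions = [set(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    count = {i: sum(1 for t in transactions if i in t) for i in items}
    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        confidence = both / count[a]
        if both and confidence >= min_confidence:
            support = both / len(transactions)  # reported, but never used as a filter
            rules.append((a, b, confidence, support))
    return rules

# Hypothetical baskets: "x" is rare but always appears together with "y".
baskets = [{"x", "y"}] + [{"y"}] * 4 + [{"z"}] * 5
print(high_confidence_pairs(baskets, min_confidence=0.9))  # [('x', 'y', 1.0, 0.1)]
```

    A conventional miner with, say, a 20% support cutoff would discard the rule x -> y before its confidence was ever examined.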

  10. An Unsupervised Opinion Mining Approach for Japanese Weblog Reputation Information Using an Improved SO-PMI Algorithm

    Science.gov (United States)

    Wang, Guangwei; Araki, Kenji

    In this paper, we propose an improved SO-PMI (Semantic Orientation Using Pointwise Mutual Information) algorithm for use in Japanese weblog opinion mining. SO-PMI is an unsupervised approach proposed by Turney that has been shown to work well for English. When this algorithm was naively applied to Japanese, most phrases, whether positive or negative in meaning, received a negative SO. To deal with this skew, we propose three improvements: expanding the reference words to sets of words, introducing a balancing factor, and detecting neutral expressions. In our experiments, the proposed improvements obtained a well-balanced result: both positive and negative accuracy exceeded 62% when evaluated on 1,200 opinion sentences sampled from three different domains (reviews of Electronic Products, Cars, and Travels from Kakaku.com). In a comparative experiment on the same corpus, a supervised approach (SA-Demo) achieved very similar accuracy to our method. This shows that our proposed approach effectively adapted SO-PMI to Japanese, and it also shows the generality of SO-PMI.
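
    For reference, Turney's original SO-PMI compares how often a phrase co-occurs (via a NEAR search) with a positive and a negative reference word (Turney used "excellent" and "poor" for English), normalized by the hit counts of the reference words themselves: SO = log2[(hits(phrase NEAR pos) · hits(neg)) / (hits(phrase NEAR neg) · hits(pos))]. The sketch below computes that from counts supplied by the caller. The additive constant that avoids division by zero (0.01 here) is an assumption of this sketch, and the authors' three Japanese-specific improvements are not reproduced.

```python
import math

def so_pmi(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Turney-style semantic orientation of a phrase.

    near_pos / near_neg: hits for the phrase NEAR the positive / negative reference word.
    hits_pos / hits_neg: hits for the reference words on their own (must be > 0).
    `smoothing` is added to the NEAR counts to avoid division by zero (assumed value).
    """
    return math.log2(((near_pos + smoothing) * hits_neg)
                     / ((near_neg + smoothing) * hits_pos))

# Hypothetical counts; a positive SO suggests a positive phrase.
print(so_pmi(near_pos=100, near_neg=25, hits_pos=1000, hits_neg=1000) > 0)  # True
```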

  11. Improve Data Mining and Knowledge Discovery through the use of MatLab

    Science.gov (United States)

    Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern-based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques, and methods used to mine data, including neural networks, genetic algorithms, decision trees, the nearest neighbor method, rule induction, association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques, and methods for detecting patterns in a dataset have been used in the development of numerous open source and commercially available data mining products and technologies. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate for strengthening data/text mining and trending within given datasets. In recent years, throughout industry, academia, and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; they lack exploratory data mining and the discovery of latent information. This presentation introduces MatLab™ (MATrix LABoratory), an engineering and scientific data analysis tool, for performing data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes. 
MatLab's ease of data processing, visualization and

  12. A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

    OpenAIRE

    Hayden Wimmer; Loreen Powell

    2014-01-01

    While research has been conducted on machine learning algorithms and on privacy-preserving data mining (PPDM), there is a gap in the literature combining these areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Baye...
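
    K-Anonymity requires that every combination of quasi-identifier values appears in at least k records, so that no individual can be singled out by those attributes alone. The abstract does not describe how the data were generalized; the sketch below only checks the property on a hypothetical, already generalized table.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if each combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())

# Hypothetical records whose age and ZIP code have already been generalized.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "asthma"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

    The coarser the generalization needed to reach a given k, the less detail remains for the downstream learners, which is the trade-off the study measures.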

  13. Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling

    Science.gov (United States)

    Zhang, Wen-Ran

    2002-03-01

    An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mines agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as an a priori threshold, the new algorithm uses agent similarity as a priori knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first-order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.

  14. A Framework To Support Management Of HIV/AIDS Using K-Means And Random Forest Algorithm

    Directory of Open Access Journals (Sweden)

    Gladys Iseu

    2017-06-01

    The healthcare industry generates large amounts of complex data about patients, hospital resources, disease management, electronic patient records, and medical devices, among others. The availability of these huge amounts of medical data creates a need for powerful mining tools to support health care professionals in the diagnosis, treatment, and management of HIV/AIDS. Several data mining techniques have been used to manage different data sets. Data mining techniques have been categorized into regression, segmentation, association, sequence analysis, and classification algorithms. In the medical field, no specific study has incorporated two or more data mining algorithms, which limits the decision-making of medical practitioners. This study identified the extent to which the K-means algorithm clusters patient characteristics, evaluated the extent to which the random forest algorithm can classify the data for informed decision making, and designed a framework to support medical decision making in the treatment of HIV/AIDS-related diseases in Kenya. The paper further used the random forest classification algorithm to compute proximities between pairs of cases, which can be used for clustering, locating outliers, or scaling to give interesting views of the data.
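
    Random forest proximity between two cases is commonly defined as the fraction of trees in which both cases land in the same terminal node. The sketch below computes it from a matrix of leaf indices (`leaf_ids[i][t]`); in scikit-learn, `RandomForestClassifier.apply(X)` returns a matrix of this shape, but the leaf values here are hypothetical and the abstract does not say which implementation the authors used. One minus the proximity can then serve as a distance for clustering or outlier detection.

```python
def proximity_matrix(leaf_ids):
    """Share of trees in which each pair of cases reaches the same leaf.

    leaf_ids[i][t] is the leaf reached by case i in tree t.
    """
    n_cases = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n_cases for _ in range(n_cases)]
    for i in range(n_cases):
        for j in range(n_cases):
            same = sum(1 for t in range(n_trees) if leaf_ids[i][t] == leaf_ids[j][t])
            prox[i][j] = same / n_trees
    return prox

# Hypothetical leaf indices for three patients in a four-tree forest.
leaves = [[3, 7, 2, 5], [3, 7, 4, 5], [1, 6, 2, 8]]
prox = proximity_matrix(leaves)
print(prox[0][1], prox[0][2], prox[1][2])  # 0.75 0.25 0.0
```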

  15. Application of Three Existing Stope Boundary Optimisation Methods in an Operating Underground Mine

    Science.gov (United States)

    Erdogan, Gamze; Yavuz, Mahmut

    2017-12-01

    The underground mine planning and design optimisation process has received little attention because of the complexity and variability of problems in underground mines. Although a number of optimisation studies and software tools are available, and some of them in particular have been implemented effectively to determine the ultimate pit limits of open pit mines, there is still a lack of studies on optimising ultimate stope boundaries in underground mines. The approaches proposed for this purpose aim to maximize economic profit by selecting the best possible layout under operational, technical, and physical constraints. In this paper, three existing heuristic techniques, the Floating Stope Algorithm, the Maximum Value Algorithm, and the Mineable Shape Optimiser (MSO), are examined for optimising the stope layout in a case study. Each technique is assessed in terms of applicability, algorithm capabilities, and limitations with respect to underground mine planning challenges. Finally, the results are evaluated and compared.
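
    As a rough illustration of the idea behind the Floating Stope Algorithm, the sketch below slides a fixed-size stope shape over a 2D grid of block grades and adds to the envelope every block covered by a window whose mean grade meets the cutoff. Real implementations work in 3D with economic values and mining constraints; the grid, stope size, and cutoff here are hypothetical. In this simplified form the envelope can contain overlapping stope positions, so it describes a boundary rather than a mineable layout.

```python
def floating_stope_envelope(grades, stope_h, stope_w, cutoff):
    """Mark every block covered by at least one stope-sized window whose mean grade
    meets the cutoff (a simplified 2D floating-stope envelope)."""
    rows, cols = len(grades), len(grades[0])
    inside = [[False] * cols for _ in range(rows)]
    for r in range(rows - stope_h + 1):
        for c in range(cols - stope_w + 1):
            window = [grades[r + i][c + j] for i in range(stope_h) for j in range(stope_w)]
            if sum(window) / len(window) >= cutoff:
                for i in range(stope_h):
                    for j in range(stope_w):
                        inside[r + i][c + j] = True
    return inside

# Hypothetical block grades on a vertical section.
grades = [
    [1.0, 4.0, 5.0, 1.0],
    [1.0, 3.0, 4.0, 0.5],
    [0.5, 1.0, 1.0, 0.5],
]
envelope = floating_stope_envelope(grades, stope_h=2, stope_w=2, cutoff=3.0)
for row in envelope:
    print("".join("#" if block else "." for block in row))
# .##.
# .##.
# ....
```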

  16. DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

    Science.gov (United States)

    Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan

    2018-04-01

    Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods basically work on a single biological data set and, in most cases, apply a single minimum support cutoff globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper we propose a dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation, and protein-protein interaction profiles based on weighted shortest distance to find novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL), for each rule by integrating the co-expression, co-methylation, and protein-protein interactions present in the multi-omics data set. We develop the proposed algorithm using these three novel multiple-threshold measures. In the proposed algorithm, the values of DVS, DVC, and DVL are computed separately for each rule, and it is then checked whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual DVS, DVC, and DVL values, respectively. If all three conditions hold for a rule, it is treated as a resultant rule. One of the major advantages of the proposed method over related state-of-the-art methods is that it considers both the quantitative and the interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules than previous methods.
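
    The per-rule acceptance test described above can be sketched as follows. How the three thresholds are derived from weighted shortest distances is not given in the abstract, so the per-rule values are passed in as hypothetical numbers; the sketch only computes support, confidence, and lift over binary transactions and keeps a rule when all three reach that rule's own thresholds.

```python
def rule_metrics(transactions, lhs, rhs):
    """Support, confidence and lift of the rule lhs -> rhs over a list of item sets."""
    n = len(transactions)
    sup_lhs = sum(1 for t in transactions if lhs <= t) / n
    sup_rhs = sum(1 for t in transactions if rhs <= t) / n
    sup_both = sum(1 for t in transactions if (lhs | rhs) <= t) / n
    confidence = sup_both / sup_lhs
    lift = confidence / sup_rhs
    return sup_both, confidence, lift

def accept_rule(metrics, thresholds):
    """Keep the rule only if support, confidence and lift all reach its own thresholds."""
    return all(value >= limit for value, limit in zip(metrics, thresholds))

# Hypothetical samples, each listing the genes found in a given state.
samples = [{"g1", "g2"}, {"g1", "g2", "g3"}, {"g1"}, {"g2", "g3"}]
metrics = rule_metrics(samples, {"g1"}, {"g2"})  # (0.5, ~0.667, ~0.889)
# Hypothetical per-rule thresholds standing in for the distance-based values.
print(accept_rule(metrics, (0.4, 0.6, 0.8)))  # True
print(accept_rule(metrics, (0.4, 0.6, 1.0)))  # False
```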

  17. A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

    Directory of Open Access Journals (Sweden)

    Fengcai Qiao

    2018-02-01

    Frequent subgraph mining (FSM) plays an important role in graph mining and attracts a great deal of attention in many areas, such as bioinformatics, web data mining, and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel algorithm for frequent subgraph mining in a single large graph. To address the two computational challenges of FSM, we parallelize subgraph extension and support evaluation across all worker nodes of the distributed cluster. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process (load balancing, pre-search pruning, and top-down pruning), which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude on all datasets and can work with a lower support threshold.
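
    In a single large graph, support cannot simply count transactions. GraMi, the baseline named above, uses minimum image-based (MNI) support: for each pattern vertex, count the distinct graph vertices it is mapped to across all embeddings, then take the minimum. Whether SSiGraM uses exactly this metric is an assumption here. The sketch takes a hypothetical list of embeddings; in a distributed setting, per-worker image sets could be merged by set union before the minimum is taken.

```python
def mni_support(embeddings):
    """Minimum image-based support.

    embeddings: list of dicts, each mapping every pattern vertex to a graph vertex.
    """
    if not embeddings:
        return 0
    images = {}
    for embedding in embeddings:
        for pattern_vertex, graph_vertex in embedding.items():
            images.setdefault(pattern_vertex, set()).add(graph_vertex)
    # The least "diverse" pattern vertex bounds how often the pattern really occurs.
    return min(len(image) for image in images.values())

# Hypothetical embeddings of a two-vertex pattern (a, b) into a large graph.
embeddings = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 4, "b": 2}]
print(mni_support(embeddings))  # 2
```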

  18. Expressive power of an algebra for data mining

    NARCIS (Netherlands)

    Calders, T.; Lakshmanan, L.V.S.; Ng, R.T.; Paredaens, J.

    2006-01-01

    The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can

  19. Advanced land mine detection using a synthesis of conventional technologies

    International Nuclear Information System (INIS)

    Rappaport, C.M.

    1998-01-01

    A team at Northeastern University develops and optimizes land mine detection based on ground-penetrating radar, infrared thermography, electromagnetic induction (EM), and high-frequency acoustic sensors. It implements sophisticated, physics-based mathematical models to describe the interaction of EM or acoustic waves with mines buried in realistic (electromagnetically lossy, inhomogeneous) soil and, as a result, develops signal processing algorithms to identify and classify mines. These mathematical models are derived from actual soil and land mine measurements and include detection statistics of the sensors. The novel aspects of Northeastern University's approach are: (1) to combine multiple sensors synergistically, yielding more information than would be available to any single sensor technology operating alone, and (2) to use signal-processing algorithms derived from physics-based models which take into account the actual sensor parameters as well as material and electrical characteristics of the soil and land mines

  20. Cooperative organic mine avoidance path planning

    Science.gov (United States)

    McCubbin, Christopher B.; Piatko, Christine D.; Peterson, Adam V.; Donnald, Creighton R.; Cohen, David

    2005-06-01

    The JHU/APL Path Planning team has developed path planning techniques to look for paths that balance the utility and risk associated with different routes through a minefield. Extending previous years' efforts, we investigated real-world Naval mine avoidance requirements and developed a tactical decision aid (TDA) that satisfies those requirements. APL has developed new mine path planning techniques using graph-based and genetic algorithms, which quickly produce near-minimum-risk paths for complicated fitness functions incorporating risk, path length, ship kinematics, and naval doctrine. The TDA user interface, a Java Swing application that obtains data via CORBA interfaces to path planning databases, allows the operator to explore a fusion of historic and in situ minefield data, control the path planner, and display the planning results. To provide a context for the minefield data, the user interface also renders data from the Digital Nautical Chart database, a database created by the National Geospatial-Intelligence Agency containing charts of the world's ports and coastal regions. This TDA has been developed in conjunction with the COMID (Cooperative Organic Mine Defense) system. This paper presents a description of the algorithms, architecture, and application produced.
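
    The fitness functions described combine risk, path length, ship kinematics, and doctrine; the sketch below keeps only the first two and uses Dijkstra's algorithm (a swapped-in technique, not necessarily APL's method) to find the path minimizing length plus a weighted risk term. The graph, edge values, and `risk_weight` are hypothetical, and Dijkstra's algorithm requires all edge costs to be non-negative.

```python
import heapq

def min_risk_path(graph, start, goal, risk_weight=1.0):
    """Dijkstra search where each edge costs its length plus risk_weight * its risk.

    graph maps a node to a list of (neighbor, length, risk) tuples.
    Returns (total_cost, path), or (inf, []) if the goal is unreachable.
    """
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route to node was already found
        for neighbor, length, risk in graph.get(node, []):
            new_cost = cost + length + risk_weight * risk
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical channel graph: the short leg A-B crosses a high-risk area.
graph = {
    "A": [("B", 1.0, 5.0), ("C", 2.0, 0.5)],
    "B": [("D", 1.0, 0.0)],
    "C": [("D", 2.0, 0.5)],
}
print(min_risk_path(graph, "A", "D", risk_weight=1.0))  # (5.0, ['A', 'C', 'D'])
print(min_risk_path(graph, "A", "D", risk_weight=0.0))  # (2.0, ['A', 'B', 'D'])
```

    Varying `risk_weight` traces out the utility/risk trade-off: at zero the planner ignores the minefield and takes the shortest route.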

  1. Transparent data mining for big and small data

    CERN Document Server

    Quercia, Daniele; Pasquale, Frank

    2017-01-01

    This book focuses on new and emerging data mining solutions that offer a greater level of transparency than existing solutions. Transparent data mining solutions with desirable properties (e.g. effective, fully automatic, scalable) are covered in the book. Experimental findings of transparent solutions are tailored to different domain experts, and experimental metrics for evaluating algorithmic transparency are presented. The book also discusses societal effects of black box vs. transparent approaches to data mining, as well as real-world use cases for these approaches. As algorithms increasingly support different aspects of modern life, a greater level of transparency is sorely needed, not least because discrimination and biases have to be avoided. With contributions from domain experts, this book provides an overview of an emerging area of data mining that has profound societal consequences, and provides the technical background for readers to contribute to the field or to put existing approaches to prac...

  2. WEKA-G: Parallel data mining on computational grids

    Directory of Open Access Journals (Sweden)

    PIMENTA, A.

    2009-12-01

    Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires high computational power. To address this problem, this paper presents a tool, Weka-G, that runs the algorithms used in the data mining process in parallel. As the execution environment, we use a computational grid, adding several features within a WAN.

  3. Uncertainty modeling for data mining a label semantics approach

    CERN Document Server

    Qin, Zengchang

    2014-01-01

    Outlining a new research direction in fuzzy set theory applied to data mining, this volume proposes a number of new data mining algorithms and includes dozens of figures and illustrations that help the reader grasp the complexities of the concepts.

  4. Improving clinical decision support using data mining techniques

    Science.gov (United States)

    Burn-Thornton, Kath E.; Thorpe, Simon I.

    1999-02-01

    Physicians, in their ever-demanding jobs, are looking to decision support systems for aid in clinical diagnosis. However, clinical decision support systems need to be sufficiently accurate that they help, rather than hinder, the physician's diagnosis. Decision support systems whose accuracy in determining patient state exceeds 80 percent are generally perceived to be accurate enough to help the physician. We have previously shown that data mining techniques have the potential to provide the underpinning technology for clinical decision support systems. In this paper, an extension of the work in reference 2, we describe how changes in data mining methodologies for the analysis of 12-lead ECG data improve the accuracy with which data mining algorithms determine which patients are suffering from heart disease. We show that the accuracy of patient state prediction, for all the algorithms we investigated, can be increased by up to 6 percent by combining appropriate test/training ratios with 5-fold cross-validation. Cross-validation with more than 5 folds appears to reduce the improvement in classification accuracy gained by using this validation method. The 84 percent accuracy in patient state prediction obtained with the algorithm OCI suggests that this algorithm will be capable of providing the accuracy required for clinical decision support systems.
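
    The OCI classifier is not reproduced here. The sketch below only shows how k-fold cross-validation partitions a dataset: indices are shuffled once and split into k folds, each serving once as the test set while the rest form the training set. The sample count and seed are arbitrary, and stratification by class, often used for clinical data, is omitted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random but reproducible
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(12, k=5):
    print(len(train), len(test))
```

    Each record is tested exactly once, so the k accuracy figures can be averaged into a single estimate; with more folds, each training set grows while each test set shrinks.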

  5. Web multimedia information retrieval using improved Bayesian algorithm.

    Science.gov (United States)

    Yu, Yi-Jun; Chen, Chun; Yu, Yi-Min; Lin, Huai-Zhong

    2003-01-01

    The main thrust of this paper is the application of a novel data mining approach to the log of users' feedback to improve web multimedia information retrieval performance. A user space model was constructed based on data mining and then integrated into the original information space model to improve the accuracy of the new information space model. It can remove clutter and irrelevant text information and helps to eliminate mismatch between the page author's expression and the user's understanding and expectation. The user space model was also used to discover the relationship between high-level and low-level features for assigning weights. The authors proposed an improved Bayesian algorithm for data mining. Experiments showed that the proposed algorithm was efficient.

  6. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  7. VLSI implementation of flexible architecture for decision tree classification in data mining

    Science.gov (United States)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search, and security domains. In recent years, there has been a tremendous increase in the size of the data being collected and analyzed. Classification is the main problem faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for Decision Tree classification in data mining using the C4.5 algorithm.
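
    C4.5 chooses each split by its gain ratio, which is information gain divided by split information. The sketch below shows that criterion for a single categorical attribute, assuming plain Python lists. The function names are illustrative, and the sketch says nothing about the VLSI architecture itself.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style split criterion for one categorical attribute:
    information gain divided by split information."""
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Weighted entropy left after splitting on this attribute.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information grows with the number (and balance) of attribute values.
    split_info = -sum((len(p) / total) * math.log2(len(p) / total) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

    The split-information denominator penalizes attributes with many distinct values. For example, a four-valued attribute that separates two balanced classes as well as a two-valued one does scores 0.5 instead of 1.0.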

  8. INFORMATION ON RISK MANAGEMENT IN THE IANs OF COMPANIES LISTED ON THE NOVO MERCADO OF BOVESPA

    Directory of Open Access Journals (Sweden)

    Francisco Carlos Fernandes

    2008-01-01

    Best corporate governance practices stipulate that company management be aware of the risks assumed and that investors be informed about them. This study aims to survey the risk disclosure practices adopted in companies' Annual Information reports (IANs). It presents a content analysis of these communication instruments for the companies classified in the Novo Mercado governance level of Bovespa. The research methodology is descriptive and qualitative, using documentary analysis as its method. The results show that 73 of the 99 Novo Mercado companies had disclosed their IANs by 31 May 2008. Of these, 12 report nothing about their risk management policies, and of the remaining 61 only 6 present a specific section on risk management in their reports. The content analysis shows that the companies adopting best practices offer diverse information on risk management, including organizational structure, techniques used, and protection policies. The study concludes that the risk reporting of most companies still shows low levels of disclosure.

  9. Depth data research of GIS based on clustering analysis algorithm

    Science.gov (United States)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    GIS data are spatially distributed. Geographic data have both spatial characteristics and attribute characteristics and also change over time, so the amount of data is very large. Many industries and departments in society now use GIS. However, without a proper data analysis and mining scheme, GIS cannot deliver its full effectiveness, and much of the data goes to waste. In this paper, we take the geographic information needs of a national security department as the experimental case. Combining the characteristics of GIS data and taking into account time, space, attributes, and so on, we apply a cluster analysis algorithm. We then study a mining scheme for depth data and derive an algorithm model. This algorithm can automatically classify sample data and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information in the surface data of GIS, thus improving the efficiency of the security department. The algorithm can also be extended to other fields.

  10. Collaborative mining and transfer learning for relational data

    Science.gov (United States)

    Levchuk, Georgiy; Eslami, Mohammed

    2015-06-01

    Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations that encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot easily be adapted to graph data. This gave rise to the research field of relational data mining, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative, distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph and variability across graph instances. We present three algorithms that trade off the benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss the individual strengths and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and they learn graph patterns that are both closer to ground truth and provide higher detection rates than a centralized mining algorithm.

  11. Extracting software static defect models using data mining

    Directory of Open Access Journals (Sweden)

    Ahmed H. Yousef

    2015-03-01

    Large software projects are subject to the quality risk of having defective modules that cause failures during software execution. Several software repositories contain the source code of large projects composed of many modules. These repositories include software metrics for the modules and the defective state of each module. In this paper, a data mining approach is used to identify the attributes that predict the defective state of software modules. A software solution architecture is proposed to convert the extracted knowledge into data mining models that can be integrated with current software project metrics and bug data in order to enhance prediction. The results show better prediction capabilities when all the algorithms are combined using weighted votes. When a single algorithm is used, Naïve Bayes gives the best results, followed by the Neural Network and Decision Tree algorithms.
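
    Combining classifiers "using weighted votes" can be illustrated as follows. This is a sketch under the assumption that each model's weight is a score such as its validation accuracy. The labels and weights are hypothetical, because the abstract does not give the paper's actual weighting scheme.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-model class predictions; each model's vote counts with its weight.
    Weights are assumed (hypothetically) to reflect each model's validation accuracy."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    # Ties go to the label seen first, since max() returns the first maximal key.
    return max(scores, key=scores.get)
```

    For example, `weighted_vote(["defective", "clean", "clean"], [0.9, 0.3, 0.4])` returns `"defective"`, even though a plain majority vote would pick `"clean"`.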

  12. An AK-LDMeans algorithm based on image clustering

    Science.gov (United States)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for mining value from unlabeled data. Its ultimate goal is to label unclassified data quickly and correctly. We use road map image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm that automatically determines the value of K by constructing a Kcost line chart. It then uses a long-distance high-density method to select the clustering centers, replacing the traditional initial center selection method. This further improves the efficiency and accuracy of the traditional K-Means algorithm. The experimental results are compared with those of current clustering algorithms. The algorithm can provide useful reference value in the fields of image processing, machine vision, and data mining.
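
    The idea of choosing K from how the clustering cost changes with K resembles the familiar elbow heuristic. The sketch below is plain Lloyd's K-means that also returns the within-cluster sum of squared distances. It is not AK-LDMeans: as a simplifying assumption, it uses the first k points as initial centres instead of the paper's long-distance high-density selection.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=100):
    """Plain Lloyd's K-means. Initial centres are simply the first k points
    (an assumption, not the paper's initialisation). Returns (centres, cost)."""
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster (keep it if empty).
        new_centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    # Cost: total squared distance from each point to its nearest centre.
    cost = sum(min(squared_distance(p, c) for c in centres) for p in points)
    return centres, cost
```

    On the four hypothetical points `[(0, 0), (0, 1), (10, 10), (10, 11)]`, the cost drops from 201.0 at K = 1 to 1.0 at K = 2. That sharp bend is what a cost-versus-K line chart is meant to expose.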

  13. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Science.gov (United States)

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. A decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. We used a dataset of 2346 individuals, including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 with negative angiography and 782 with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP, and DBP). Our model identified the associated risk factors of CHD with a sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender, and age. Our model appears to be an accurate, specific, and sensitive model for identifying the presence of CHD, but it will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
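
    The reported figures follow the standard confusion-matrix definitions of sensitivity, specificity, and accuracy. The sketch below computes all three. The counts in the usage line are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a binary diagnostic model."""
    sensitivity = tp / (tp + fn)  # true positive rate: diseased cases correctly flagged
    specificity = tn / (tn + fp)  # true negative rate: healthy cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy
```

    With hypothetical counts, `diagnostic_metrics(tp=48, fn=2, tn=40, fp=10)` gives a sensitivity of 0.96, a specificity of 0.80, and an accuracy of 0.88.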

  14. Concepts, Diagnosis and the History of Medicine: Historicising Ian Hacking and Munchausen Syndrome.

    Science.gov (United States)

    Millard, Chris

    2017-08-01

    Concepts used by historians are as historical as the diagnoses or categories that are studied. The example of Munchausen syndrome (deceptive presentation of illness in order to adopt the 'sick role') is used to explore this. Like most psychiatric diagnoses, Munchausen syndrome is not thought applicable across time by social historians of medicine. It is historically specific, drawing upon twentieth-century anthropology and sociology to explain motivation through desire for the 'sick role'. Ian Hacking's concepts of 'making up people' and 'looping effects' are regularly utilised outside of the context in which they are formed. However, this context is precisely the same anthropological and sociological insight used to explain Munchausen syndrome. It remains correct to resist the projection of Munchausen syndrome into the past. However, it seems inconsistent to use Hacking's concepts to describe identity formation before the twentieth century as they are given meaning by an identical context.

  15. HSM: Heterogeneous Subspace Mining in High Dimensional Data

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Seidl, Thomas

    2009-01-01

    Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines...

  16. Information mining in remote sensing imagery

    Science.gov (United States)

    Li, Jiang

    The volume of remotely sensed imagery continues to grow at an enormous rate due to advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from them. This motivates us to develop image information mining techniques, a highly interdisciplinary endeavor drawing on expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive prototype remote sensing image information mining (ReSIM) system for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: an image processing subsystem, a database subsystem, and a visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to obtain a search-efficient space. The clusters are stored in an object-oriented database (OODB), with the associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of objects of interest within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle fuzziness and uncertainty. Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and
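
    The query-by-example step can be illustrated with a brute-force k-nearest-neighbour search. This sketch assumes each image has already been reduced to a numeric feature vector, for example texture coefficients. The image ids and vectors are hypothetical. ReSIM's cluster-based organisation, which avoids comparing the query against every image, is not modelled. `math.dist` requires Python 3.8 or later.

```python
import math


def knn_query_by_example(query, database, k):
    """Return the ids of the k stored images whose feature vectors are closest
    (Euclidean distance) to the query image's feature vector.
    `database` maps a (hypothetical) image id to its feature tuple."""
    ranked = sorted(database.items(), key=lambda item: math.dist(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]
```

    For example, with `{"lake_01": (0.1, 0.9), "forest_07": (0.8, 0.2), "lake_02": (0.2, 0.8)}` and the query `(0.1, 0.85)`, the two nearest images are `lake_01` and then `lake_02`.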

  17. Mining top-k frequent closed itemsets in data streams using sliding window

    International Nuclear Information System (INIS)

    Rehman, Z.; Shahbaz, M.

    2013-01-01

    Frequent itemset mining has become a popular research area in the data mining community over the last few years. There are two main technical hitches in finding frequent itemsets. First, the user has to provide an appropriate minimum support value to start with and then tune it by running the algorithm again and again. Second, the generated frequent itemsets are usually numerous, and as a result the number of generated association rules is also very large. Applications in a streaming environment need to process data received at a high rate; therefore, finding frequent itemsets in data streams becomes complex. In this paper, we present an algorithm to mine top-k frequent closed itemsets from streaming data using a sliding window approach. We developed a single-pass algorithm to find frequent closed itemsets with lengths between user-defined minimum and maximum lengths. To improve the performance of the algorithm and to avoid rescanning the data, we have transformed the data into a bitmap-based tree data structure. (author)
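
    The bitmap idea can be sketched with Python integers used as bitsets. Each item keeps a bitmap of the window positions where it occurs. The support of an itemset is the number of set bits in the AND of its items' bitmaps. When the oldest transaction leaves the window, every bitmap shifts down by one position, so the window is never rescanned. The class name is hypothetical, and this is not the authors' bitmap-based tree. A top-k closed-itemset search would be built on support counts like these.

```python
class SlidingWindowBitmap:
    """Hypothetical helper: support counting over the last `size` transactions.

    Bit i of an item's bitmap is set when the item occurs in the i-th oldest
    transaction currently in the window."""

    def __init__(self, size):
        self.size = size
        self.count = 0   # number of transactions currently in the window
        self.maps = {}   # item -> int used as a bitmap

    def add(self, transaction):
        if self.count == self.size:
            # Oldest transaction (bit 0) slides out: shift every bitmap down one
            # position and forget items that no longer occur in the window.
            self.maps = {item: bits >> 1 for item, bits in self.maps.items() if bits >> 1}
            self.count -= 1
        position = self.count
        for item in set(transaction):
            self.maps[item] = self.maps.get(item, 0) | (1 << position)
        self.count += 1

    def support(self, itemset):
        """Number of window transactions that contain every item in `itemset`."""
        bits = (1 << self.count) - 1  # start from "all transactions in the window"
        for item in itemset:
            bits &= self.maps.get(item, 0)  # AND the bitmaps of every item
        return bin(bits).count("1")
```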

  18. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study focuses on a thorough analysis of the cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and a selected quality attribute of the final product. Pellet quality was expressed by shape, quantified as the aspect ratio. The data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and various excipients under different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain a deeper understanding of the formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. Spheronization speed, spheronization time, number of holes, and water content of the extrudate were identified as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed, and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates the search for the optimal process conditions needed to achieve ideally spherical pellets with good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellet technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Advances in Machine Learning and Data Mining for Astronomy

    Science.gov (United States)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  20. Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

    Science.gov (United States)

    Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

    2018-05-01

    The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by the two features A and B together; that is, where A and B appear together, C often appears. We note that this co-location pattern differs from the traditional co-location pattern. This paper therefore presents a new concept called clustering items and calls this kind of pattern a co-location pattern with clustering items. Because traditional algorithms cannot mine this co-location pattern, we introduce the related concepts in detail and propose a novel algorithm, which extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
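
    For context, traditional co-location mining, including join-based approaches, commonly ranks patterns by the participation index. For each feature, the participation ratio is the fraction of its instances that appear in some row instance (neighbourhood clique) of the pattern. The index is the minimum of these ratios over the pattern's features. The sketch below assumes the row instances have already been found, so the join step is omitted, and it uses hypothetical instance ids. It does not implement the paper's clustering-items extension.

```python
def participation_index(features, row_instances, instance_counts):
    """Participation index of a co-location pattern.

    features: the pattern's features in order, e.g. ("A", "B").
    row_instances: tuples of instance ids, one per feature, e.g. ("A1", "B2"),
                   each forming a neighbourhood clique (assumed precomputed).
    instance_counts: total number of instances of each feature, e.g. {"A": 4, "B": 2}.
    """
    if not row_instances:
        return 0.0
    ratios = []
    for position, feature in enumerate(features):
        # Distinct instances of this feature that take part in some row instance.
        distinct = {row[position] for row in row_instances}
        ratios.append(len(distinct) / instance_counts[feature])
    return min(ratios)  # the pattern is only as prevalent as its weakest feature
```

    With row instances `[("A1", "B1"), ("A2", "B1"), ("A3", "B2")]` and counts `{"A": 4, "B": 2}`, the ratios are 0.75 for A and 1.0 for B, so the index is 0.75.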

  1. Score Mining Rents in Terms of Investment Attractiveness of Peat Mining

    Science.gov (United States)

    Alexandrov, Gennady; Yablonev, Alexander

    2017-11-01

    This article considers the rent factor as a determinant within the system of factors underlying the investment attractiveness of the peat industry. This factor predetermines the significant differences and peculiarities of the investment climate in the mining business and, in particular, in peat mining. In contrast to modern studies that treat the essence and role of rents in the economic mechanism, a new approach to the problems of rent formation is proposed. Our approach differs in three ways. First, it is adequate to the rental relations objectively present in extractive industries. Second, it provides consensus between the interests of the owner of peat deposits and those of the entrepreneurs operating these deposits. Third, it thereby contributes to a favourable investment climate in the peat extraction industry. In practical terms, following the proposed approach, we propose a specific algorithm for allocating mining rents from the profits of peat extraction enterprises.

  2. Analysis Of Data Mining For Car Sales Sparepart Using Apriori Algorithm (Case Study: PT. IDK 1 FIELD)

    Directory of Open Access Journals (Sweden)

    Khairul Ummi

    2016-10-01

    PT. IDK 1 is a branch office of a Honda car dealership that sells parts for various Honda automatic and manual car variants and for motorcycles. Every sale is entered into a database connected directly to the central office. However, PT. IDK 1 does not know which parts are frequently purchased together. When the stock of a spare part runs low, the office simply asks the central office to send more, without knowing which other parts tend to be purchased along with it. Restocking is considered difficult because of the many types of auto parts. Data mining techniques have been widely used to solve such problems, and here the Apriori algorithm is implemented to obtain information about associations between products in a transaction database. The sales transaction data for Honda car parts at PT. IDK 1 can be reprocessed with data mining applications to produce association rules, that is, strong links between itemsets of spare-part sales. These rules can provide recommendations and facilitate restocking and the arrangement or placement of strongly interdependent goods.
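
    The sketch below is a compact version of Apriori's level-wise search, applied to spare-part transactions. The part names are hypothetical, and support is an absolute transaction count. The code favours clarity over the candidate-counting data structures that efficient implementations use.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset contained in
    at least `min_support` transactions (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from frequent-itemset counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

    For the transactions `[{"oil filter", "air filter"}, {"oil filter", "air filter", "spark plug"}, {"oil filter", "spark plug"}, {"air filter"}]` and `min_support=2`, the rule spark plug → oil filter has confidence 1.0, while oil filter → air filter has confidence 2/3.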

  3. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation

    Directory of Open Access Journals (Sweden)

    Gang Wang

    2018-05-01

    With the application of the coal mine Internet of Things (IoT), mobile measurement devices such as intelligent mine lamps are producing increasing amounts of moving measurement data. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data based on a multi-hop network and total variation. Taking gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built based on a multi-hop network. To exploit the sparse characteristics of gas data, the concept of total variation is introduced, and a high-efficiency gas data compression and reconstruction method, Total Variation Sparsity based on Multi-Hop (TVS-MH), is proposed. According to the simulation results, with the proposed method the moving measurement data flow from an underground distributed mobile network can be acquired and transmitted efficiently.

  4. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation.

    Science.gov (United States)

    Wang, Gang; Zhao, Zhikai; Ning, Yongjie

    2018-05-28

    With the application of the coal mine Internet of Things (IoT), mobile measurement devices such as intelligent mine lamps are producing increasing amounts of moving measurement data. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data based on a multi-hop network and total variation. Taking gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built based on a multi-hop network. To exploit the sparse characteristics of gas data, the concept of total variation is introduced, and a high-efficiency gas data compression and reconstruction method, Total Variation Sparsity based on Multi-Hop (TVS-MH), is proposed. According to the simulation results, with the proposed method the moving measurement data flow from an underground distributed mobile network can be acquired and transmitted efficiently.

  5. Data Mining and Optimization Tools for Developing Engine Parameters Tools

    Science.gov (United States)

    Dhawan, Atam P.

    1998-01-01

    This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. Tricia Erhardt and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be made available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to the dataset, which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions led to the training of Tricia Erhardt to develop Genetic Algorithm (GA) based search programs, which were written in C++ and used to demonstrate the capability of the GA in searching for an optimal solution in noisy datasets. Based on the study and discussions with NASA LeRC personnel, we then prepared a proposal, which is being submitted to NASA, for future work on developing data mining algorithms for engine condition monitoring. The proposed set of algorithms uses wavelet processing to create a multi-resolution pyramid of the data for GA-based multi-resolution optimal search.
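
    The report mentions GA-based search programs written in C++. The sketch below is a minimal Python illustration of a generic genetic algorithm, not a reconstruction of that code. It assumes bit-string individuals, tournament selection, one-point crossover, bit-flip mutation, and elitism. All parameter defaults are arbitrary choices.

```python
import random


def genetic_search(fitness, length, population_size=30, generations=50,
                   mutation_rate=0.02, seed=0):
    """Minimal generational GA over bit strings (a generic sketch).
    Returns the best individual and the best fitness recorded per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        next_population = [best[:]]  # elitism: the best individual survives unchanged
        while len(next_population) < population_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

    Calling `genetic_search(sum, 20)` maximises the number of 1 bits, the "OneMax" toy problem. Because the best individual is always carried over, the recorded best fitness never decreases. A noisy fitness, as with the datasets described above, would need re-evaluation or averaging, which this sketch omits.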

  6. EVALUATION OF WEB SEARCHING METHOD USING A NOVEL WPRR ALGORITHM FOR TWO DIFFERENT CASE STUDIES

    Directory of Open Access Journals (Sweden)

    V. Lakshmi Praba

    2012-04-01

    The World-Wide Web provides every internet citizen with access to an abundance of information, but it is becoming increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to web data and documents. Web content mining and web structure mining have important roles in identifying relevant web pages. The relevancy of a web page denotes how well a retrieved web page or set of web pages meets the information need of the user. Page Rank, Weighted Page Rank, and Hypertext Induced Topic Selection (HITS) are existing algorithms that consider only web structure mining. Vector Space Model (VSM), Cover Density Ranking (CDR), Okapi similarity measurement (Okapi), and the Three-Level Scoring method (TLS) are some existing relevancy score methods that consider only web content mining. In this paper, we propose a new algorithm, Weighted Page with Relevant Rank (WPRR), which blends web content mining and web structure mining and demonstrates the relevancy of the page with respect to a given query for two different case scenarios. It is shown that WPRR's performance is better than that of the existing algorithms.
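
    Of the structure-only baselines named here, PageRank is the simplest to sketch. The version below uses power iteration and assumes a damping factor of 0.85, with pages that have no out-links spreading their rank uniformly over all pages. This is standard PageRank, not WPRR, whose weighting the abstract does not specify.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank. `links` maps each page to the pages it links to.
    Dangling pages (no out-links) spread their rank uniformly over all pages."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is redistributed to everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank
```

    For the hypothetical graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C receives links from both other pages and ends up ranked highest, followed by A and then B. The ranks always sum to 1.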

  7. Protective and control relays as coal-mine power-supply ACS subsystem

    Science.gov (United States)

    Kostin, V. N.; Minakova, T. E.

    2017-10-01

    The paper presents instantaneous selective short-circuit protection for the cabling in the underground part of a coal mine, together with central control algorithms, as a subsystem of the coal-mine power-supply ACS. To improve the reliability of the electricity supply and reduce mining equipment downtime, a dual-channel relay protection and central control system is proposed as a subsystem of the coal-mine power-supply automated control system (PS ACS).

  8. A framework for query optimization to support data mining

    NARCIS (Netherlands)

    S.R. Choenni (Sunil); A.P.J.M. Siebes (Arno)

    1996-01-01

    In order to extract knowledge from databases, data mining algorithms query the databases heavily. Inefficient processing of these queries will inevitably affect the performance of these algorithms, making them less valuable. In this paper, we describe an optimization

  9. Contemporary computational mathematics a celebration of the 80th birthday of Ian Sloan

    CERN Document Server

    Kuo, Frances; Woźniakowski, Henryk

    2018-01-01

    This book is a tribute to Professor Ian Hugh Sloan on the occasion of his 80th birthday. It consists of nearly 60 articles written by international leaders in a diverse range of areas in contemporary computational mathematics. These papers highlight the impact and many achievements of Professor Sloan in his distinguished academic career. The book also presents state-of-the-art knowledge in many computational fields such as quasi-Monte Carlo and Monte Carlo methods for multivariate integration, multi-level methods, finite element methods, uncertainty quantification, spherical designs and integration on the sphere, approximation and interpolation of multivariate functions, oscillatory integrals, and, more generally, information-based complexity and tractability, as well as a range of other topics. The book also tells the life story of the renowned mathematician, family man, colleague and friend, who has been an inspiration to many of us. The reader may especially enjoy the story from the perspective of his fami...

  10. The Relationship between History and Ethics in Ian McEwan’s Black Dogs

    Directory of Open Access Journals (Sweden)

    Maryam Sedaghat

    2014-06-01

    The relationship between history and ethics may seem irrelevant at first; however, the two have been linked throughout the long history of war, violence, and mass killing. History needs ethics in order to save itself from violence and terror. The philosopher Emmanuel Levinas tried to define ethics in a way that suits the terrible historical condition of humanity in the twentieth century. In his view, ethics is infinite responsibility towards other human beings. He defines 'being' in relation to the 'other', who may be a complete stranger. In this definition, a person bears complete responsibility toward the other and should answer the other's call for help. In Ian McEwan's novel Black Dogs, the protagonist is exposed to historical legacies of violence and develops an ethical consciousness by the end of the novel. Responsibility seems to be a fitting answer to the historical mass killing and violence that dominate the world.

  11. Anchor-Free Localization Method for Mobile Targets in Coal Mine Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Xiao Xu

    2009-04-01

    Severe natural conditions and complex terrain make precise localization difficult in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed, based on non-metric multidimensional scaling (MDS) and rank sequences. First, a coal mine wireless sensor network is built in underground mines using ZigBee technology. Then a non-metric MDS algorithm is applied to estimate the locations of the reference nodes. Finally, an improved sequence-based localization algorithm is presented to localize mobile targets precisely. The proposed method is tested through simulations with 100 nodes, outdoor experiments with 15 physical ZigBee nodes, and experiments in a mine gas explosion laboratory with 12 ZigBee nodes. Experimental results show that the method achieves better localization accuracy and is more robust in underground mines.

  12. Anchor-free localization method for mobile targets in coal mine wireless sensor networks.

    Science.gov (United States)

    Pei, Zhongmin; Deng, Zhidong; Xu, Shuo; Xu, Xiao

    2009-01-01

    Severe natural conditions and complex terrain make precise localization difficult in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed, based on non-metric multidimensional scaling (MDS) and rank sequences. First, a coal mine wireless sensor network is built in underground mines using ZigBee technology. Then a non-metric MDS algorithm is applied to estimate the locations of the reference nodes. Finally, an improved sequence-based localization algorithm is presented to localize mobile targets precisely. The proposed method is tested through simulations with 100 nodes, outdoor experiments with 15 physical ZigBee nodes, and experiments in a mine gas explosion laboratory with 12 ZigBee nodes. Experimental results show that the method achieves better localization accuracy and is more robust in underground mines.
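    The abstract does not detail its improved sequence-based algorithm, so the following is only a minimal Python sketch of generic rank-sequence localization. It assumes the reference-node coordinates are already estimated (for example by MDS) and that the target's distance ranking to those nodes has been inferred, for instance from signal strength. The function names, the grid search and the Spearman footrule cost are illustrative choices, not the authors' method.

```python
import math
from itertools import product

def rank_vector(values):
    """Return the rank (0 = smallest) of each entry in values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def footrule(a, b):
    """Spearman footrule distance between two rank vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def locate(references, observed_ranks, width, height, step=1.0):
    """Grid point whose distance ranking to the reference nodes best matches observed_ranks."""
    best, best_cost = None, math.inf
    nx, ny = int(width / step) + 1, int(height / step) + 1
    for ix, iy in product(range(nx), range(ny)):
        p = (ix * step, iy * step)
        dists = [math.dist(p, r) for r in references]
        cost = footrule(rank_vector(dists), observed_ranks)
        if cost < best_cost:  # keep the first grid point with the lowest cost
            best, best_cost = p, cost
    return best, best_cost
```

    A finer `step` trades run time for resolution; many grid points can share the minimum cost, and this sketch simply keeps the first one found.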

  13. Applying Supervised Opinion Mining Techniques on Online User Reviews

    Directory of Open Access Journals (Sweden)

    Ion SMEUREANU

    2012-01-01

    In recent years, the spectacular development of web technologies has led to an enormous quantity of user-generated information in online systems. This large amount of information makes web platforms viable data sources for applications based on opinion mining and sentiment analysis. The paper proposes an algorithm, based on a naive Bayes classifier, for detecting sentiment in movie user reviews. We analyse the opinion mining domain, the techniques used in sentiment analysis and their applicability. We implemented the proposed algorithm, tested its performance, and suggest directions for further development.
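    As a rough illustration of the kind of classifier the abstract names, here is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing. The whitespace tokenisation, the "pos"/"neg" labels and the toy reviews are assumptions, not the paper's data or implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (word, class) pairs do not zero the product.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```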

  14. Analysing Customer Opinions with Text Mining Algorithms

    Science.gov (United States)

    Consoli, Domenico

    2009-08-01

    Knowing what customers think of a particular product or service helps top management introduce improvements in processes and products, thus differentiating the company from its competitors and gaining competitive advantage. Customers, through their preferences, determine the success or failure of a company. To learn customers' opinions, we can use Web 2.0 technologies (blogs, wikis, forums, chat, social networking, social commerce). Useful information must be extracted from these websites for strategic purposes, using sentiment analysis or opinion mining techniques.

  15. RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

    Science.gov (United States)

    Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2015-01-01

    Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (or genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique, RANWAR (rank-based weighted association rule mining), that ranks the rules using two novel rule-interestingness measures, rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc), to bypass this problem. These measures depend on the rank of the items (genes): using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than state-of-the-art association rule mining algorithms and thus reduces execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant for the related diseases. Finally, the top rules produced by RANWAR that are not found by Apriori are reported.
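    The abstract does not give the formulas for wcs and wcc, so the sketch below only illustrates the general idea of rank-based item weights feeding a weighted support. The linear weight `(n - r) / n` and the rule "average item weight times plain support" are hypothetical choices, not RANWAR's definitions.

```python
def rank_weights(ranked_items):
    """Hypothetical weighting: item at 0-based rank r out of n gets (n - r) / n."""
    n = len(ranked_items)
    return {item: (n - r) / n for r, item in enumerate(ranked_items)}

def weighted_support(itemset, transactions, weights):
    """Average item weight of the itemset multiplied by its plain support."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    support = hits / len(transactions)
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return avg_weight * support
```

    With such weights, an itemset of highly ranked genes can outrank an equally frequent itemset of poorly ranked ones, which is the intuition behind rank-based interestingness measures.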

  16. Improvements in seismic event locations in a deep western U.S. coal mine using tomographic velocity models and an evolutionary search algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Adam Lurka; Peter Swanson [Central Mining Institute, Katowice (Poland)]

    2009-09-15

    Methods of improving seismic event locations were investigated as part of a research study aimed at reducing ground control safety hazards. Seismic event waveforms collected with a 23-station three-dimensional sensor array during longwall coal mining provide the data set used in the analyses. A spatially variable seismic velocity model is constructed using seismic event sources in a passive tomographic method. The resulting three-dimensional velocity model is used to relocate seismic event positions. An evolutionary optimization algorithm is implemented and used in both the velocity model development and in seeking improved event location solutions. Results obtained using the different velocity models are compared. The combination of the tomographic velocity model development and evolutionary search algorithm provides improvement to the event locations. 13 refs., 5 figs., 4 tabs.
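    To make the idea of an evolutionary location search concrete, here is a minimal (mu + lambda) evolution strategy that minimises the RMS travel-time residual. It assumes a single homogeneous velocity rather than the tomographic three-dimensional model used in the study, and treats the origin time as the mean offset between observed arrivals and predicted travel times; all names and parameter values are illustrative.

```python
import math
import random

def rms_residual(loc, stations, arrivals, velocity):
    """RMS travel-time residual of a trial location; origin time is solved as the mean offset."""
    travel = [math.dist(loc, s) / velocity for s in stations]
    t0 = sum(a - t for a, t in zip(arrivals, travel)) / len(arrivals)
    return math.sqrt(sum((a - t0 - t) ** 2 for a, t in zip(arrivals, travel)) / len(arrivals))

def evolve_location(stations, arrivals, velocity, bounds,
                    pop=30, gens=300, sigma=100.0, decay=0.98, seed=1):
    """(mu + lambda) evolution strategy over (x, y, z) inside the search box `bounds`."""
    rng = random.Random(seed)

    def cost(p):
        return rms_residual(p, stations, arrivals, velocity)

    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop)]
    for _ in range(gens):
        # Each parent produces one Gaussian-mutated child, clipped to the search box.
        children = [tuple(min(max(x + rng.gauss(0.0, sigma), lo), hi)
                          for x, (lo, hi) in zip(p, bounds))
                    for p in population]
        # Elitist selection: keep the best `pop` of parents and children.
        population = sorted(population + children, key=cost)[:pop]
        sigma *= decay  # gradually shrink the mutation step
    best = population[0]
    return best, cost(best)
```

    Swapping `rms_residual` for a travel-time function that traces rays through a tomographic velocity model would be the natural extension toward what the study describes.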

  17. Efficient constraint-based Sequential Pattern Mining (SPM) algorithm to understand customers’ buying behaviour from time stamp-based sequence dataset

    Directory of Open Access Journals (Sweden)

    Niti Ashish Kumar Desai

    2015-12-01

    Business strategies are formulated based on an understanding of customer needs. This requires a strategy for understanding customer behaviour and buying patterns, both current and future: first, how an organization currently understands customer needs, and second, predicting future trends to drive growth. This article focuses on customers' purchase trends, where the timing of a purchase matters more than the association between purchased items; such trends can be found with Sequential Pattern Mining (SPM) methods. Conventional SPM algorithms worked purely on frequency, identifying the most frequent patterns, but suffered from problems such as the generation of a huge number of uninteresting patterns, the absence of patterns of interest to the user, and the rare item problem. The article attempts a solution by developing an SPM algorithm based on several constraints (Gap, Compactness, Item, Recency, Profitability and Length) in addition to the Frequency constraint. The six additional constraints ensure that all patterns are recently active (Recency), active over a certain time span (Compactness), profitable, and indicative of the next purchase timeline (Length, Item, Gap). The article also sheds light on how the proposed Constraint-based Prefix Span algorithm helps to understand customer buying behaviour that is still in its formative stage.
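    A minimal sketch of checking two of the listed constraints, Gap and Recency, for a candidate pattern. It assumes each customer's history is a time-sorted list of `(timestamp, item)` pairs and interprets Recency as "last purchase at or after a cut-off"; both representations and the helper names are hypothetical, and the sketch is not the proposed Constraint-based Prefix Span algorithm.

```python
def contains_with_gap(sequence, pattern, max_gap):
    """True if pattern items occur in order with each consecutive time gap <= max_gap.

    sequence: list of (timestamp, item) pairs sorted by timestamp.
    """
    def search(start, k, prev_time):
        if k == len(pattern):
            return True
        for i in range(start, len(sequence)):
            t, item = sequence[i]
            if prev_time is not None and t - prev_time > max_gap:
                break  # events are time-sorted, so later ones only widen the gap
            if item == pattern[k] and search(i + 1, k + 1, t):
                return True
        return False
    return search(0, 0, None)

def constrained_support(db, pattern, max_gap, min_last_time):
    """Fraction of customers that satisfy both the gap and the recency constraint."""
    hits = sum(1 for seq in db
               if seq and seq[-1][0] >= min_last_time
               and contains_with_gap(seq, pattern, max_gap))
    return hits / len(db)
```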

  18. Unsupervised learning algorithms

    CERN Document Server

    Aydin, Kemal

    2016-01-01

    This book summarizes the state of the art in unsupervised learning. The contributors discuss how, with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications, including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They describe how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation has resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...

  19. Biomedical text mining and its applications in cancer research.

    Science.gov (United States)

    Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

    2013-04-01

    Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.

  20. Mathematics of an automatic control system for ventilation of gassy coal mines

    Energy Technology Data Exchange (ETDEWEB)

    Puchkov, L.A.; Bakhvalov, L.A.; Kravchenko, A.G.

    1987-09-01

    Describes and presents a circuit diagram of an automatic control system introduced to control ventilation in the Kommunist mine belonging to the Oktyabr'ugol' coal mining association. The system comprises: sensors that register the parameters of the mine atmosphere (e.g. methane and air flow rate); communication channels and remote control devices that convert and transmit the data; a CM-4 computer with a high-speed processor, 128 to 256 KB of main memory and external memory devices; polydiaphragm air flow controllers; devices for controlling the electric drive of the main ventilation system; and devices for collecting, processing and displaying the data. The system uses two groups of algorithms: those for a data subsystem responsible for centralized control of the mine atmosphere parameters, and those for a control subsystem that forms and implements the necessary control commands. The main software is the DISMAIN program. Introducing this system increased the productivity of the mine by 2%, reduced energy consumption by 5-7% and increased safety levels. 2 refs

  1. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Directory of Open Access Journals (Sweden)

    Schomburg Dietmar

    2010-07-01

    Background: The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge be easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description: Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions: The presented algorithm delivers a considerable amount of information and therefore may help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de.
The source code of the algorithm is provided under the GNU General Public
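    KID's actual rules and dictionaries are not described here, so the following is only a hypothetical, heavily simplified regular-expression rule showing how a rule- and dictionary-based extractor might pull (parameter, value, unit) triples from abstract text. The parameter list, unit list and pattern are assumptions.

```python
import re

# Hypothetical rule: a parameter name from a small dictionary, an optional
# linking phrase ("of", "was", "is", "=" or ":"), a number and a unit.
PARAM_RE = re.compile(
    r"\b(?P<param>KM|Km|Ki|kcat|Vmax|IC50|Kd)\b"
    r"(?:\s+(?:value\s+)?(?:of|was|is))?\s*(?:=|:)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mM|µM|uM|nM|s-1|/s|min-1)"
)

def extract_parameters(text):
    """Return (parameter, value, unit) triples found in text."""
    return [(m.group("param"), float(m.group("value")), m.group("unit"))
            for m in PARAM_RE.finditer(text)]
```

    A real system would also need dictionaries for enzyme names, organisms and ligands, and rules linking each value to the right entity, which is where most of the precision and recall differences between categories would arise.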

  2. Data Mining at NASA: From Theory to Applications

    Science.gov (United States)

    Srivastava, Ashok N.

    2009-01-01

    This slide presentation demonstrates the data mining/machine learning capabilities of NASA Ames and the Intelligent Data Understanding (IDU) group, covering work done recently by various group members. The IDU group develops novel algorithms to detect, classify, and predict events in large data streams for scientific and engineering systems. The presentation was prepared for Knowledge Discovery and Data Mining 2009.

  3. Data mining for the social sciences an introduction

    CERN Document Server

    Attewell, Paul

    2015-01-01

    We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining

  4. Data mining in soft computing framework: a survey.

    Science.gov (United States)

    Mitra, S; Pal, S K; Mitra, P

    2002-01-01

    The present article provides a survey of the available literature on data mining using soft computing. A categorization is provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

  5. Prediction and Analysis of Students' Behavior using BARC Algorithm

    OpenAIRE

    M. Sindhuja; Dr. S. Rajalakshmi; S. M. Nandagopal

    2013-01-01

    Educational data mining is a recent trend in which data mining methods are applied to improve students' academic performance. The work describes the mining of higher education students' attributes such as behavior, attitude and relationships. The data were collected from a higher education institution in terms of the mentioned attributes. The proposed work explored the Behavior Attitude Relationship Clustering (BARC) Algorithm, which showed the improvement in students’ per...

  6. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    Science.gov (United States)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multidisciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by educational institutions. EDM uses computational methods to interpret educational data in order to answer educational research questions. To make a country stand out among the nations of the world, its education system has to undergo a major transition by redesigning its framework. Hidden patterns and information can be extracted from various information repositories by adopting data mining techniques. To summarize students' performance together with their credentials, we examine the use of data mining in academics. The Apriori algorithm is applied extensively to the student database for a broader classification based on various categories, and the K-means procedure is applied to the same databases to group students into specific categories. The Apriori algorithm mines association rules to extract similar patterns and their associations across sets of records, which can be drawn from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly identified using data mining frameworks. Thus, the algorithms prove effective at profiling students in any educational environment. The ultimate objective of the study is to determine whether or not a student is prone to violence.
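    For readers unfamiliar with Apriori, its level-wise search can be sketched as below: frequent (k-1)-itemsets are joined into k-item candidates, candidates with an infrequent subset are pruned, and the rest are counted. This is a textbook-style sketch, not the study's code, and the student attribute names in the usage data are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support (a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {i for t in transactions for i in t}
    current = {c for c in (frozenset([i]) for i in items) if support(c) >= min_support}
    frequent, k = {}, 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent
```

    Rule confidence follows from the stored supports, e.g. confidence(high_attendance → pass) = support({high_attendance, pass}) / support({high_attendance}). Rescanning every transaction to count support keeps the sketch short but is slow on large databases.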

  7. Data preprocessing in data mining

    CERN Document Server

    García, Salvador; Herrera, Francisco

    2015-01-01

    Data Preprocessing for Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely contain inconsistencies and errors and, most importantly, is not ready to be used in a data mining process. Furthermore, the increasing amount of data in recent science, industry and business applications calls for more complex tools to analyze it. Data preprocessing makes it possible to turn the impossible into the possible, adapting the data to fulfill the input demands of each data mining algorithm. Data preprocessing includes data reduction techniques, which aim at reducing the complexity of the data by detecting or removing irrelevant and noisy elements. This book reviews the tasks that fill the gap between data acquisition from the source and the data mining process. A comprehensive look from a practical point of view, including basic concepts and surveying t...
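    Two of the most common preprocessing steps, filling missing values and rescaling features, are sketched below in plain Python. Mean imputation and min-max normalisation are illustrative choices here, not techniques singled out by the book, and `None` is assumed to mark a missing value.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max(column, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [new_min for _ in column]  # a constant column carries no information
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in column]
```

    Imputation has to run first, since `min_max` expects every value to be numeric.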

  8. A Tutorial on Nonlinear Time-Series Data Mining in Engineering Asset Health and Reliability Prediction: Concepts, Models, and Algorithms

    Directory of Open Access Journals (Sweden)

    Ming Dong

    2010-01-01

    The primary objective of engineering asset management is to optimize assets' service delivery potential and to minimize the related risks and costs over their entire life through the development and application of asset health and usage management, in which health and reliability prediction plays an important role. In real-life situations, where an engineering asset operates under dynamic operational and environmental conditions, its lifetime is generally described by monitored nonlinear time-series data and is subject to high levels of uncertainty and unpredictability. Data mining techniques have been shown to be very useful for extracting relevant features that can be used as parameters for asset diagnosis and prognosis. This paper gives a tutorial on nonlinear time-series data mining in engineering asset health and reliability prediction. Besides an overview of health and reliability prediction techniques for engineering assets, the tutorial focuses on the concepts, models, algorithms, and applications of hidden Markov models (HMMs) and hidden semi-Markov models (HSMMs) in engineering asset health prognosis, which are representative of recent engineering asset health prediction techniques.
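    A minimal sketch of the HMM forward algorithm, which computes the likelihood of an observation sequence and the filtered probability of each hidden health state. The two states, the vibration observations and all probabilities are invented for illustration; the HSMM extension with explicit state durations is not shown.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) and the filtered state distribution after the last observation."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
                 for s in states}
    total = sum(alpha.values())
    return total, {s: a / total for s, a in alpha.items()}
```

    Here `trans_p["degraded"]["healthy"] = 0` in the test encodes the common prognostic assumption that, without maintenance, degradation does not reverse.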

  9. Mining Hesitation Information by Vague Association Rules

    Science.gov (United States)

    Lu, An; Ng, Wilfred

    In many online shopping applications, such as Amazon and eBay, traditional Association Rule (AR) mining has limitations as it only deals with the items that are sold but ignores the items that are almost sold (for example, those items that are put into the basket but not checked out). We say that those almost sold items carry hesitation information, since customers are hesitating to buy them. The hesitation information of items is valuable knowledge for the design of good selling strategies. However, there is no conceptual model that is able to capture different statuses of hesitation information. Herein, we apply and extend vague set theory in the context of AR mining. We define the concepts of attractiveness and hesitation of an item, which represent the overall information of a customer's intent on an item. Based on the two concepts, we propose the notion of Vague Association Rules (VARs). We devise an efficient algorithm to mine the VARs. Our experiments show that our algorithm is efficient and the VARs capture more specific and richer information than do the traditional ARs.

  10. Towards the generic framework for utility considerations in data mining research

    NARCIS (Netherlands)

    Puuronen, S.; Pechenizkiy, M.; Soares, C.; Ghani, R.

    2010-01-01

    Rigorous data mining (DM) research has successfully developed advanced data mining techniques and algorithms, and many organizations have great expectations of benefiting more from their vast data warehouses in decision making. Even though there are some success stories, the current status in practice is

  11. Mapping Changes in a Recovering Mine Site with Hyperspectral Airborne HyMap Imagery (Sotiel, SW Spain)

    Directory of Open Access Journals (Sweden)

    Jorge Buzzi

    2014-04-01

    Hyperspectral high spatial resolution HyMap data are used to map mine waste from massive sulfide ore deposits, mostly abandoned, on the Iberian Pyrite Belt (southwest Spain). Mine dams, mill tailings and mine dumps in variable states of pyrite oxidation are recognizable. Interpreting hyperspectral remote sensing data requires specific algorithms able to handle data of much higher dimensionality than multispectral data. The routine of image processing methods used to extract information from hyperspectral data to map geological features is explained, as well as the sequence of algorithms used to produce maps of the mine sites. The capability of the algorithms to identify minerals and produce maps based on archive spectral libraries is discussed. Trends of mineral growth differ spectrally over time according to the geological setting and the recovery state of the mine site. Subtle mineralogical changes are enhanced by using the spectral response as an indicator of pyrite oxidation intensity in the mine waste piles and pyrite mud tailings. The changes in the surface of the mill tailings deserve a detailed description, as these surfaces are inaccessible to direct observation. Such mineralogical changes respond faithfully to industrial activities or, when undisturbed by human influence, to the influence of climate.
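    The abstract does not name its matching algorithms; the spectral angle mapper (SAM) is one widely used way to compare image spectra with library spectra, so it is sketched here as an assumption. The four-band spectra, the generic mineral names and the `max_angle` threshold are toy values, not real HyMap or library data.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra; smaller means more similar spectral shape."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding just outside [-1, 1]
    return math.acos(cos)

def classify_pixel(pixel, library, max_angle=0.1):
    """Best-matching library entry, or None if no angle is within max_angle."""
    name, angle = min(((n, spectral_angle(pixel, ref)) for n, ref in library.items()),
                      key=lambda pair: pair[1])
    return name if angle <= max_angle else None
```

    Because SAM compares only spectral shape, a uniformly darker pixel (for example one in shadow) still matches its library spectrum, which is one reason it is popular for mineral mapping.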

  12. A Dirty Hero’s Fight for Clean Energy: Satire, Allegory, and Risk Narrative in Ian McEwan’s Solar

    OpenAIRE

    Evi Zemanek

    2012-01-01

    Unlike most fictional texts that narrate ecological crises, Ian McEwan's novel "Solar" (2010), celebrated as "the climate change book", does not sketch an apocalyptic scenario culminating in a collective catastrophe. Instead, at the level of discourse it mocks the current rhetoric of risk, while using the genre of satire to stage its protagonist's disastrous risk management. While the descr...

  13. Classification algorithm of Web document in ionization radiation

    International Nuclear Information System (INIS)

    Geng Zengmin; Liu Wanchun

    2005-01-01

    Internet resources are numerous, and how to mine the resources of a particular profession or trade more efficiently is one of the research directions of Web mining (WM). The paper studies the classification of Web documents on ionizing radiation (IR) using the Bayes, Rocchio and Widrow-Hoff algorithms, and analyses the results of the trials. (authors)
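    A minimal sketch of a centroid-based Rocchio classifier using cosine similarity on length-normalised term-frequency vectors. This simplified variant uses only each class's positive centroid (the classic Rocchio formulation can also subtract a weighted centroid of negative examples); the toy documents and the "ir"/"other" labels are assumptions.

```python
import math
from collections import Counter

def centroid(docs):
    """Average of the length-normalised term-frequency vectors of tokenised documents."""
    total = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        norm = math.sqrt(sum(v * v for v in tf.values()))
        for term, v in tf.items():
            total[term] += v / norm / len(docs)
    return total

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rocchio_classify(tokens, centroids):
    """Assign the class whose centroid is most similar to the document."""
    query = Counter(tokens)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))
```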

  14. Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

    Science.gov (United States)

    Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

    2018-03-01

    Data mining is a technique for digging out valuable information, in the form of interesting, previously unknown patterns, buried or hidden in very large data collections. Data mining has been applied in the healthcare industry. One data mining technique is classification; decision trees are a classification method, and C4.5 is an algorithm for building decision trees. A classifier is designed by applying pessimistic pruning to the C4.5 algorithm for diagnosing chronic kidney disease. Pessimistic pruning is used to identify and remove unneeded branches in order to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the results obtained with these classifiers are presented and discussed. Pessimistic pruning increases the accuracy of the C4.5 algorithm in diagnosing chronic kidney disease by 1.5 percentage points, from 95% to 96.5%.
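    The abstract does not specify which pessimistic estimate was used. The sketch below shows Quinlan's original pessimistic error pruning test, which adds a continuity correction of 1/2 error per leaf and prunes a subtree when a single leaf is no worse than the subtree's corrected error plus one standard error; C4.5 itself commonly uses a related confidence-bound estimate instead. The function name and inputs are illustrative.

```python
import math

def should_prune(leaf_errors, subtree_leaf_errors, n_cases):
    """Pessimistic error pruning test for one internal node.

    leaf_errors: training errors if the node were replaced by a majority-class leaf.
    subtree_leaf_errors: training errors at each leaf of the current subtree.
    n_cases: number of training cases reaching the node.
    """
    # Continuity correction: add 1/2 error per leaf.
    e_leaf = leaf_errors + 0.5
    e_subtree = sum(subtree_leaf_errors) + 0.5 * len(subtree_leaf_errors)
    std_err = math.sqrt(e_subtree * (n_cases - e_subtree) / n_cases)
    # Prune when the leaf is no worse than the subtree plus one standard error.
    return e_leaf <= e_subtree + std_err
```

    Applied bottom-up over a tree, this test removes branches whose apparent gain on the training data is within noise, which is the overfitting the abstract aims to avoid.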

  15. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    Science.gov (United States)

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited than hierarchical clustering algorithms for applications with large datasets. K-means is among the most popular partitional clustering algorithms, but has a major…
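    For reference, the standard fuzzy c-means iteration alternates a membership update, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), with a center update weighted by u_ij^m. The sketch below is plain fuzzy c-means with hand-picked initial centers and a fixed iteration count (both assumptions); the genetic operator that exchanges neighbouring centers, which is this record's contribution, is not implemented.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """u[i][j]: membership of point i in cluster j (standard fuzzy c-means update)."""
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:  # the point sits exactly on a center: full membership there
            j = d.index(0.0)
            u.append([1.0 if k == j else 0.0 for k in range(len(centers))])
            continue
        u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(len(centers)))
                  for j in range(len(centers))])
    return u

def fcm_centers(points, u, m=2.0):
    """New centers as means of the points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        centers.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                             for d in range(dims)))
    return centers

def fcm(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    u = fcm_memberships(points, centers, m)
    for _ in range(iters):
        centers = fcm_centers(points, u, m)
        u = fcm_memberships(points, centers, m)
    return centers, u
```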

  16. Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection

    Energy Technology Data Exchange (ETDEWEB)

    Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.

    2017-12-11

    Graph mining is an important data analysis methodology, but it struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure that the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that, for heterogeneous graphs, Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results showing that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases, and we present empirical results showing that the performance degradation follows a logistic function.
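    One simple way to use a triangle metric for sampling is sketched below: count the triangles each node belongs to, keep the top-k nodes, and return the induced subgraph. The deterministic top-k rule and the adjacency-set representation are assumptions; the paper's actual sampling procedures may differ.

```python
from itertools import combinations

def triangle_counts(adj):
    """adj: dict node -> set of neighbours (undirected). Triangles each node belongs to."""
    counts = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        # Each adjacent pair of neighbours closes one triangle through v.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                counts[v] += 1
    return counts

def sample_by_triangles(adj, k):
    """Keep the k nodes in the most triangles and return the induced subgraph."""
    counts = triangle_counts(adj)
    keep = set(sorted(adj, key=lambda v: (-counts[v], str(v)))[:k])
    return {v: adj[v] & keep for v in keep}
```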

  17. Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion

    Directory of Open Access Journals (Sweden)

    Jin Qi

    2015-01-01

    With further research in recent years on the physical meaning and digital features of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms has become an important subject in this area. This paper puts forward the concept of a microcommunity and obtains the final communities by fusing different microcommunities. It starts from the basic definition of a network community and applies Expansion to microcommunity clustering, which provides the prerequisites for microcommunity fusion. Analysis of test results on a network data set shows that the proposed algorithm is more efficient and achieves higher solution quality than other similar algorithms.

  18. Data mining algorithms for land cover change detection: a review

    Indian Academy of Sciences (India)

    Sangram Panigrahi

    2017-11-24

    ... values, poor quality measurement, high resolution and high dimensional data. The land cover ... These data sets also include quality assurance information, ... 2012 A new data mining framework for forest fire mapping.

  19. Statistically significant relational data mining :

    Energy Technology Data Exchange (ETDEWEB)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

    2014-02-01

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparing community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor such models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
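    The abstract does not name the popular comparison measure it studies. Normalised mutual information (NMI) is one widely used measure for comparing community assignments, so a minimal version (square-root normalisation) is sketched here as an example only; it does not reproduce the report's finding about over-resolved communities.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalised mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(nij / n * math.log(n * nij / (ca[a] * cb[b]))
             for (a, b), nij in joint.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0  # degenerate single-community partitions
    return mi / math.sqrt(ha * hb)
```

    Community labels are arbitrary, so relabelling a partition (for example swapping 0 and 1) leaves the score unchanged.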

  20. Efficiently Hiding Sensitive Itemsets with Transaction Deletion Based on Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Chun-Wei Lin

    2014-01-01

    Full Text Available Data mining is used to extract meaningful and useful information or knowledge from very large databases. Secure or private information can also be discovered by data mining techniques, which creates an inherent risk of threats to privacy. Privacy-preserving data mining (PPDM) has therefore arisen in recent years to sanitize the original database so that sensitive information is hidden, and this sanitization process can be regarded as an NP-hard problem. In this paper, a compact prelarge GA-based (cpGA2DT) algorithm that deletes transactions to hide sensitive itemsets is proposed. It addresses the limitations of the evolutionary process by adopting both the compact GA-based (cGA) mechanism and the prelarge concept. A flexible fitness function with three adjustable weights is designed to find the appropriate transactions to delete so that sensitive itemsets are hidden with minimal side effects of hiding failure, missing cost, and artificial cost. Experiments compare the proposed cpGA2DT algorithm with the simple GA-based (sGA2DT) algorithm and a greedy approach in terms of execution time and the three side effects.
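    To make the three side effects concrete, here is a Python sketch that scores one candidate set of deleted transactions. It is not the paper's cpGA2DT fitness function. The brute-force miner, the relative-support threshold and the default weights `w` are assumptions made for illustration. A GA would evaluate a function like `fitness` for each chromosome, and lower scores are better here.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, Iterable, Sequence, Set, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(db: Sequence[AbstractSet[str]], min_sup: float) -> Set[Itemset]:
    """Brute-force frequent itemsets (relative support >= min_sup); toy-sized data only."""
    if not db:
        return set()
    items = sorted(set().union(*db))
    result: Set[Itemset] = set()
    for k in range(1, len(items) + 1):
        level = {
            frozenset(c) for c in combinations(items, k)
            if sum(1 for t in db if set(c) <= t) / len(db) >= min_sup
        }
        if not level:
            break  # anti-monotonicity: no frequent k-itemset means no frequent (k+1)-itemset
        result |= level
    return result


def fitness(db: Sequence[AbstractSet[str]], deleted: AbstractSet[int],
            sensitive: Iterable[AbstractSet[str]], min_sup: float,
            w: Tuple[float, float, float] = (0.5, 0.25, 0.25)) -> Tuple[float, Tuple[int, int, int]]:
    """Weighted side effects of deleting the transactions whose indices are in `deleted`."""
    before = frequent_itemsets(db, min_sup)
    after = frequent_itemsets([t for i, t in enumerate(db) if i not in deleted], min_sup)
    sens = {frozenset(s) for s in sensitive}
    hf = len(sens & after)             # hiding failure: sensitive itemsets still frequent
    mc = len((before - sens) - after)  # missing cost: non-sensitive frequent itemsets lost
    ac = len(after - before)           # artificial cost: itemsets frequent only after deletion
    return w[0] * hf + w[1] * mc + w[2] * ac, (hf, mc, ac)
```

    Consider `db = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]` with sensitive itemset {a, b} and `min_sup=0.5`. Deleting transaction 0 hides {a, b} with no side effects. Deleting transactions 0 and 1 also hides it, but it makes {a, c} and {b, c} newly frequent, so the artificial cost is 2.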

  1. Air Pollution Monitoring and Mining Based on Sensor Grid in London.

    Science.gov (United States)

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-06-01

    In this paper, we present a distributed infrastructure based on wireless sensor network and Grid computing technology for air pollution monitoring and mining. It aims to develop low-cost, ubiquitous sensor networks that collect real-time, large-scale and comprehensive environmental data on road traffic emissions for air pollution monitoring in urban environments. The main informatics challenges in constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to these challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining results to examine the effectiveness of the algorithm.

  2. A survey on Big Data Stream Mining

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... huge amount of stream like telecommunication systems. So, there ... streams have many challenges for data mining algorithm design like using of ... A. Bifet and R. Gavaldà, "Learning from Time-Changing Data with Adaptive ...

  3. Research of the Occupational Psychological Impact Factors Based on the Frequent Item Mining of the Transactional Database

    Directory of Open Access Journals (Sweden)

    Cheng Dongmei

    2015-01-01

    Full Text Available Based on an extensive review of the literature on data mining and association rule mining, this paper starts by compressing the transactional database and proposes a frequent complementary item storage structure for it. Building on this analysis, the paper also studies an association rule mining algorithm based on the frequent complementary item storage structure of the transactional database. Finally, the paper applies this mining algorithm in the test-results analysis module of a team psychological health assessment system and extracts the relationships among psychological impact factors, so as to provide guidance for psychologists in treating mental illness.

  4. Data-Throughput Enhancement Using Data Mining-Informed Cognitive Radio

    Directory of Open Access Journals (Sweden)

    Khashayar Kotobi

    2015-03-01

    Full Text Available We propose the data mining-informed cognitive radio, which uses non-traditional data sources and data-mining techniques to make decisions and improve the performance of a wireless network. To date, the use of information other than wireless channel data in cognitive radios has not been studied significantly. We use a novel dataset (Twitter traffic) as an indicator of network load in a wireless channel. Using this dataset, we present and test a series of predictive algorithms that show an improvement in wireless channel utilization over traditional collision-detection algorithms. Our results demonstrate the viability of using these novel datasets to inform and create more efficient cognitive radio networks.

  5. A New Fast Vertical Method for Mining Frequent Patterns

    Directory of Open Access Journals (Sweden)

    Zhihong Deng

    2010-12-01

    Full Text Available Vertical mining methods are very effective for mining frequent patterns and usually outperform horizontal mining methods. However, vertical methods become inefficient once intersection becomes costly, which happens when the cardinality of the tidset (tid-list or diffset) is very large or when there are a very large number of transactions. In this paper, we propose a novel vertical algorithm called PPV for fast frequent pattern discovery. PPV works on a data structure called the Node-list, which is obtained from a coding prefix-tree called the PPC-tree. PPV achieves its efficiency through three techniques. First, the Node-list is much more compact than previously proposed vertical structures (such as tid-lists or diffsets), since transactions with common prefixes share the same nodes of the PPC-tree. Second, support counting is transformed into the intersection of Node-lists, and an efficient strategy reduces the complexity of intersecting two Node-lists to O(m+n), where m and n are the cardinalities of the two Node-lists. Third, the ancestor-descendant relationship between two nodes, which is the basic step of intersecting Node-lists, can be verified very efficiently using the Pre-Post codes of the nodes. We experimentally compare our algorithm with FP-growth and two prominent vertical algorithms (Eclat and dEclat) on a number of databases. The experimental results show that PPV is an efficient algorithm that outperforms FP-growth, Eclat, and dEclat.
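    The Python sketch below illustrates the second and third techniques: an ancestor test on pre/post codes, and a linear-time merge of two node lists sorted by pre-order. It follows the general pre-post coding idea but does not claim to match PPV's exact Node-list definition. The `PPCode` type and the choice to keep ancestor nodes, annotated with summed descendant counts, are assumptions. The merge relies on the fact that nodes carrying the same item are never ancestors of one another in a prefix tree.

```python
from typing import List, NamedTuple


class PPCode(NamedTuple):
    pre: int    # pre-order rank of the PPC-tree node
    post: int   # post-order rank of the PPC-tree node
    count: int  # number of transactions passing through the node


def is_ancestor(x: PPCode, y: PPCode) -> bool:
    """x is an ancestor of y iff x comes first in pre-order and last in post-order."""
    return x.pre < y.pre and x.post > y.post


def intersect(anc: List[PPCode], desc: List[PPCode]) -> List[PPCode]:
    """O(m + n) merge of two pre-order-sorted node lists.

    anc: nodes of the item ranked higher in the tree; desc: nodes of the other item.
    Returns the ancestor nodes, each carrying the summed counts of its matching descendants.
    """
    out: List[PPCode] = []
    i = j = 0
    while i < len(anc) and j < len(desc):
        a, d = anc[i], desc[j]
        if a.pre < d.pre:
            if a.post > d.post:  # a is an ancestor of d
                if out and out[-1].pre == a.pre:
                    last = out[-1]
                    out[-1] = PPCode(last.pre, last.post, last.count + d.count)
                else:
                    out.append(PPCode(a.pre, a.post, d.count))
                j += 1
            else:  # d lies to the right of a's subtree, so move on to the next ancestor
                i += 1
        else:  # d precedes a in pre-order, so it cannot be a's descendant
            j += 1
    return out


def support(node_list: List[PPCode]) -> int:
    """Support of the itemset represented by a node list."""
    return sum(node.count for node in node_list)
```

    For a hypothetical PPC-tree built from the transactions {a, b}, {a, b}, {a, c}, {b} with item order a, b, c, the nodes are a = (1, 2, 3), b under a = (2, 0, 2), c under a = (3, 1, 1) and b under the root = (4, 3, 1). Intersecting the lists for a and b gives support 2, and intersecting a and c gives support 1.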

  6. Kernel Methods for Mining Instance Data in Ontologies

    Science.gov (United States)

    Bloehdorn, Stephan; Sure, York

    The number of ontologies and the amount of metadata available on the Web are constantly growing. The successful application of machine learning techniques to learning ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principled approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made to take direct advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by decomposing the kernel computation into specialized kernels for selected characteristics of an ontology, which can be flexibly assembled and tuned. Initial experiments on real-world Semantic Web data show promising results and demonstrate the usefulness of our approach.
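    As a small illustration of assembling specialized kernels, the Python sketch below represents each instance by its set of classes and its set of (property, value) pairs. It scores each set with an intersection kernel and combines the scores with non-negative weights. The representation, the weights `w_class` and `w_prop`, and the example instances are hypothetical. The paper's kernels are defined over richer ontology structure.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy representation of an ontology instance:
# "classes" = the classes it belongs to, "props" = asserted (property, value) pairs.
Instance = Dict[str, FrozenSet]


def set_kernel(a: FrozenSet, b: FrozenSet) -> float:
    """|A & B| equals the dot product of the sets' indicator vectors, hence a valid kernel."""
    return float(len(a & b))


def instance_kernel(x: Instance, y: Instance, w_class: float = 1.0, w_prop: float = 0.5) -> float:
    """Non-negative weighted sum of specialized kernels; such a sum is again a valid kernel."""
    return (w_class * set_kernel(x["classes"], y["classes"])
            + w_prop * set_kernel(x["props"], y["props"]))


def gram_matrix(instances: List[Instance], **weights: float) -> List[List[float]]:
    """Pairwise kernel values, e.g. for a learner that accepts a precomputed kernel."""
    return [[instance_kernel(x, y, **weights) for y in instances] for x in instances]
```

    Tuning the combination then amounts to changing `w_class` and `w_prop` or adding further kernels to the sum, which preserves validity as long as every weight stays non-negative.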

  7. Physics Mining of Multi-source Data Sets, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to implement novel physics mining algorithms with analytical capabilities to derive diagnostic and prognostic numerical models from multi-source...

  8. EAGLE: EAGLE 'Is an' Algorithmic Graph Library for Exploration

    Energy Technology Data Exchange (ETDEWEB)

    2015-01-16

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible, schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there are no tools to conduct "graph mining" on RDF standard data sets. We address that need by implementing popular iterative graph mining algorithms (triangle count, connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries wrapped within Python scripts and call our software tool EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration." EAGLE is like 'MATLAB' for 'Linked Data.'
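    EAGLE expresses these algorithms as SPARQL queries. To make the iteration itself visible, the Python sketch below instead runs one of them, PageRank, directly over a list of (subject, predicate, object) triples. Several details are assumptions rather than EAGLE's settings: every triple is treated as a directed edge regardless of its predicate, the damping factor is 0.85, and the iteration count is fixed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def pagerank(triples: Iterable[Triple], damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over the directed graph subject -> object."""
    out_links: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for subj, _pred, obj in triples:
        out_links[subj].add(obj)  # predicate ignored; parallel edges collapse into one
        nodes.update((subj, obj))
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no out-links is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v)
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

    On a three-node cycle every resource ends with rank 1/3. In general the ranks always sum to 1, because rank held by dangling nodes is redistributed rather than lost.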

  9. The Smallest Valid Extension-Based Efficient, Rare Graph Pattern Mining, Considering Length-Decreasing Support Constraints and Symmetry Characteristics of Graphs

    Directory of Open Access Journals (Sweden)

    Unil Yun

    2016-05-01

    Full Text Available Frequent graph mining has been proposed to find interesting patterns (i.e., frequent sub-graphs) in databases composed of graph transaction data, which can effectively express complex and large real-world data. In addition, various applications for graph mining have been suggested. Traditional graph pattern mining methods use a single minimum support threshold to check whether or not mined patterns are interesting. However, a single threshold cannot account for valuable characteristics of graphs such as graph sizes and the features of graph elements. That is, previous methods cannot consider such important characteristics in their mining operations, since they use only a fixed minimum support threshold in the mining process. For this reason, in this paper we propose a novel graph mining algorithm that can apply multiple minimum support constraints according to the types of graph elements, as well as changeable minimum support conditions depending on the lengths of graph patterns. In addition, the proposed algorithm performs mining operations more efficiently because it minimizes duplicated operations and computational overheads by considering the symmetry features of graphs. Experimental results provided in this paper demonstrate that the proposed algorithm outperforms previous mining approaches in terms of pattern generation, runtime and memory usage.
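    A length-decreasing support constraint can be pictured as a threshold function f(l) that falls as pattern length l grows. The Python sketch below computes the smallest valid extension (SVE), the idea named in the title, which comes from earlier work on length-decreasing support constraints. The SVE is the shortest length at which a pattern with a given support could satisfy f. The linear threshold, its parameters and the use of integer support counts are assumptions, and the paper's per-element-type thresholds are not modelled.

```python
from typing import Callable, Optional


def linear_threshold(hi: int, step: int, lo: int) -> Callable[[int], int]:
    """Hypothetical length-decreasing constraint: the minimum support count is `hi`
    for 1-edge patterns and drops by `step` per extra edge, never below `lo`."""
    def f(length: int) -> int:
        return max(lo, hi - step * (length - 1))
    return f


def smallest_valid_extension(support: int, f: Callable[[int], int], max_len: int) -> Optional[int]:
    """Shortest pattern length l in 1..max_len with f(l) <= support, or None.

    Because f never increases with length and growing a pattern never raises its
    support, anything grown from a pattern with this support can only be valid
    at lengths >= the returned value.
    """
    for length in range(1, max_len + 1):
        if f(length) <= support:
            return length
    return None
```

    With `linear_threshold(hi=50, step=10, lo=10)`, a pattern supported by 25 graphs cannot yield a valid pattern with fewer than four edges. A miner can therefore prune it when no extension can reach that length.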

  10. A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

    Science.gov (United States)

    Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

    2013-01-01

    Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…
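    Because the abstract is truncated, the authors' exact method is not shown. As a hedged illustration of the general idea of differential sequence mining, the Python sketch below computes, for each candidate action pattern, the fraction of each group's traces that contain it as a subsequence, gaps allowed. It keeps patterns whose support differs between groups by at least `min_diff`. The action names, the support definition and the threshold are assumptions.

```python
from typing import Hashable, List, Sequence


def contains_subsequence(trace: Sequence[Hashable], pattern: Sequence[Hashable]) -> bool:
    """True if the pattern's actions occur in the trace in order, gaps allowed."""
    it = iter(trace)
    # Each any() consumes the shared iterator up to and including the next match
    return all(any(action == step for action in it) for step in pattern)


def sequence_support(traces: List[Sequence[Hashable]], pattern: Sequence[Hashable]) -> float:
    """Fraction of traces that contain the pattern."""
    return sum(contains_subsequence(t, pattern) for t in traces) / len(traces)


def differential_patterns(group_a, group_b, patterns, min_diff=0.2):
    """(pattern, support in A, support in B) for patterns whose supports differ by >= min_diff."""
    result = []
    for p in patterns:
        sa, sb = sequence_support(group_a, p), sequence_support(group_b, p)
        if abs(sa - sb) >= min_diff:
            result.append((tuple(p), sa, sb))
    return result
```

    For example, if every trace in one group shows "read" followed later by "edit" and no trace in the other group does, the pattern ("read", "edit") is reported with supports 1.0 and 0.0.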

  11. A Recommendation Algorithm for Automating Corollary Order Generation

    Science.gov (United States)

    Klann, Jeffrey; Schadow, Gunther; McCoy, JM

    2009-01-01

    Manual development and maintenance of decision support content is time-consuming and expensive. To assist with this, we explore recommendation algorithms: e-commerce data-mining tools that use collective order history to suggest purchases. In particular, previous work shows that corollary order suggestions are amenable to automated data-mining techniques. Here, an item-based collaborative filtering algorithm augmented with association rule interestingness measures mined suggestions from 866,445 orders made in an inpatient hospital in 2007, generating 584 potential corollary orders. Our expert physician panel evaluated the top 92 and agreed that 75.3% were clinically meaningful. Also, at least one panelist felt that 47.9% would be directly relevant in guideline development. This automated generation of a rough cut of corollary orders confirms prior indications about automated tools in building decision support content. It is an important step toward computerized augmentation of decision support development, which could increase development efficiency and content quality while automatically capturing local standards. PMID:20351875
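    The association rule interestingness measures mentioned here can be sketched with pairwise rules over order sets. The Python below computes confidence and lift for every ordered pair of co-occurring orders. It does not reproduce the paper's item-based collaborative filtering. The order names in the usage example and the `min_count` cutoff are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pair_rules(baskets: List[Set[str]], min_count: int = 2) -> List[Tuple[str, str, float, float]]:
    """Rules x -> y with (confidence, lift), ranked by lift, for pairs seen >= min_count times."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        item_counts.update(basket)  # each order counted once per basket
        pair_counts.update(combinations(sorted(basket), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        if together < min_count:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]    # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, confidence, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)
```

    With the hypothetical order sets {warfarin, INR test} (twice), {aspirin} and {aspirin, INR test}, the rule warfarin -> INR test has confidence 1.0 and lift 4/3. That is the kind of suggestion a corollary-order recommender would surface.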

  12. Predicting mining activity with parallel genetic algorithms

    Science.gov (United States)

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.

    2005-01-01

    We explore several different techniques in our quest to improve the overall performance of a probabilistic cellular automata model calibrated by a genetic algorithm. We use the Kappa statistic to measure agreement between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness, and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that the spatially sensitive evaluation function does improve the performance of the model, and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
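    The Kappa statistic mentioned here is usually Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below assumes per-cell land-use labels for the observed and predicted maps. The paper may use a spatially weighted variant instead.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(truth: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two equal-length label sequences."""
    n = len(truth)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    p_o = sum(t == p for t, p in zip(truth, predicted)) / n     # observed agreement
    t_freq, p_freq = Counter(truth), Counter(predicted)
    p_e = sum(t_freq[c] * p_freq[c] for c in t_freq) / (n * n)  # agreement expected by chance
    if p_e == 1.0:
        return 1.0  # both maps use the same single class everywhere
    return (p_o - p_e) / (1 - p_e)
```

    For truth [1, 1, 0, 0] and prediction [1, 0, 0, 0], raw agreement is 0.75 but chance agreement is 0.5, so kappa is 0.5.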

  13. Improving the Organization of the Shovel-Truck Systems in Open-Pit Coal Mines

    Directory of Open Access Journals (Sweden)

    Mark KORYAGIN

    2017-06-01

    Full Text Available The aim of the study is to reduce idle times of mining trucks and shovels in an open-pit coal mine. A heuristic algorithm is developed for making dispatching decisions under dynamic truck allocation. Priority parameters are introduced for choosing a shovel after a truck finishes unloading. An algorithm is also developed to search for the priority parameters that best satisfy the required efficiency criterion. This algorithm is based on a simulation model of a shovel-truck system. The proposed approach is applicable to a group of shovels with a common dump point in various open-pit coal mines. The importance of this work lies in the fact that the proposed model takes into account random factors related to the duration of loading and dumping, truck movement, repair of shovels and haul trucks, and the duration of periods between repairs.

  14. Application and Exploration of Big Data Mining in Clinical Medicine.

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-03-20

    To review the theories and technologies of big data mining and their application in clinical medicine. Literature published in English or Chinese on the theories and technologies of big data mining and on concrete applications of data mining technology in clinical medicine was obtained from PubMed and the Chinese Hospital Knowledge Database, covering 1975 to 2015. Original articles on big data mining theory/technology and on the application of big data mining in the medical field were selected. This review characterized the basic theories and technologies of big data mining, including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural networks, genetic algorithms, inductive learning theory, Bayesian networks, decision trees, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance on the rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.

  15. Data Mining Learning Models and Algorithms on a Scada System Data Repository

    Directory of Open Access Journals (Sweden)

    Mircea Rîşteiu

    2010-06-01

    Full Text Available This paper presents three data mining techniques applied to a SCADA system data repository: Naïve Bayes, k-Nearest Neighbor and Decision Trees. Based on the mining results and their interpretation, the paper concludes that k-Nearest Neighbor is a suitable method for classifying the large amount of data considered. The models are built on the training data set and evaluated on a new test set using the machine learning tool WEKA.
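    For readers unfamiliar with the k-Nearest Neighbor method the paper favours, here is a minimal Python sketch. A new record takes the majority class among its k closest training records under Euclidean distance. The two-feature readings and the class names in the usage example are hypothetical, and the paper itself used WEKA rather than hand-written code.

```python
from collections import Counter
from math import dist
from typing import Hashable, List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], Hashable]], x: Sequence[float], k: int = 3) -> Hashable:
    """Majority label among the k training records closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

    With training records clustered around (0, 0) labelled "normal" and around (10, 10) labelled "alarm", a new reading at (0.5, 0.5) is classified as "normal". The method needs no training phase, but every prediction scans the full training set, which matters for large SCADA repositories.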

  16. Standard values of quality and ore mining costs in management of multi-plant mining company

    Energy Technology Data Exchange (ETDEWEB)

    Kudelko, Jan [KGHM CUPRUM Research and Development Center, Wroclaw (Poland)]; Wirth, Herbert [KGHM Polska Miedz S.A., Lubin (Poland)]

    2010-03-15

    The profitability of copper deposit mining depends on three basic variables: the electrolytic copper price, the manufacturing and selling costs of copper, and the company property involved in the production process. If the company property is adjusted to its tasks, then mining profitability depends on the costs of copper mining and selling, because the price is an external variable defined by the market. Costs can be shaped in two (complementary) ways: traditionally, by reducing labor, material and power consumption, and by adjusting the quality of the mined ore (copper content) to the level required by current copper prices. We determine the required quality of copper ore for the whole company according to the accepted profitability criteria, and then determine the quality standard for individual mines. The paper presents algorithms that determine the ore quality standard from the current market price of copper. Calculation models are given for the mined ore quality standards, unit mining costs per ton of copper, electrolytic copper production and ore output. Standards were established for one variable, assuming that the other variables are given in the calculation. An innovative solution presented in the paper is a method for decomposing the company's controllable variables into tasks for individual mines so that the targets for the whole technological circuit are reached. Using the models with relatively little data, managers can quickly calculate values of interest, for example the forecast rate of return (economic or operational), the required copper content in the mined ore for the whole company and for individual mines at a given rate of return, or the boundary level of copper content relative to cost and production level. Examples of calculation are provided. (orig.)
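    The abstract does not reproduce the paper's models. As a hedged illustration of how a copper price translates into a required ore quality, the Python sketch below computes a simple break-even copper content. This is the grade at which revenue from the recovered copper in one ton of ore just covers the cost of mining and processing that ton. The cost, price and recovery figures in the usage example are hypothetical, and real standards would include the other variables the paper models.

```python
def breakeven_grade(cost_per_t_ore: float, copper_price_per_t: float, recovery: float) -> float:
    """Minimum copper content (mass fraction) at which one ton of ore pays for itself.

    Revenue per ton of ore = grade * recovery * copper price, so the break-even
    grade is cost / (recovery * price).
    """
    if copper_price_per_t <= 0 or not 0 < recovery <= 1:
        raise ValueError("price must be positive and recovery in (0, 1]")
    return cost_per_t_ore / (recovery * copper_price_per_t)
```

    With a hypothetical cost of 36 per ton of ore, a price of 8000 per ton of copper and 90% recovery, the break-even content is about 0.005, i.e. 0.5% Cu. Lowering the price raises the required content, which mirrors the price-driven quality standards the paper describes.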

  17. Step-by-Step Model for the Study of the Apriori Algorithm for Predictive Analysis

    Directory of Open Access Journals (Sweden)

    Daniel Grigore ROŞCA

    2015-06-01

    Full Text Available The goal of this paper was to develop an education-oriented application based on the Apriori data mining algorithm that facilitates both research on data mining and its study by graduate students. The application can be used to discover interesting patterns in a corpus of data and to measure how execution speed depends on problem constraints (the values of the support and confidence variables, or the size of the transactional database). The paper presents a brief overview of the Apriori algorithm, aspects of implementing the algorithm as a step-by-step process, a discussion of the education-oriented user interface, and the process of mining a test transactional database. The impact of some constraints on the speed of the algorithm is also measured experimentally, without a systematic review of different approaches to increasing execution speed. Possible applications of the implementation, as well as its limits, are briefly reviewed.
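    The Apriori steps can be sketched in a few lines of Python. The sketch counts candidate itemsets and keeps those that meet the minimum support. It then builds the next level's candidates by joining frequent itemsets and pruning any candidate that has an infrequent subset. The basket data in the usage example are hypothetical, and this is a compact teaching sketch rather than the paper's application.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: List[Iterable[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose relative support is >= min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    frequent: Dict[FrozenSet[str], float] = {}
    candidates = {frozenset([item]) for t in tsets for item in t}
    k = 1
    while candidates:
        # Count step: support of each size-k candidate
        level: Dict[FrozenSet[str], float] = {}
        for c in candidates:
            support = sum(1 for t in tsets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent
```

    For the baskets {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk} with minimum support 0.5, the sketch finds five frequent itemsets. The candidate {bread, milk, butter} is pruned because {milk, butter} is infrequent. Rule confidence then follows from the returned supports, e.g. conf(bread -> milk) = 0.5 / 0.75 = 2/3.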

  18. Predicting Students’ Performance using Modified ID3 Algorithm

    OpenAIRE

    Ramanathan L; Saksham Dhanda; Suresh Kumar D

    2013-01-01

    The ability to predict student performance is crucial in our present education system, and data mining concepts can be used for this purpose. ID3 is one of the best-known algorithms for generating decision trees. However, it has the shortcoming of being biased toward attributes with many values. This research therefore aims to overcome this shortcoming by using the gain ratio (instead of information gain) as well as by giving weights to each attribute at every...
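    The bias the authors describe is easy to see numerically. The Python sketch below compares information gain with gain ratio, which is information gain divided by the entropy of the attribute's own value distribution (the split information). The attribute names and data in the usage example are hypothetical, and the paper's additional attribute weighting is not shown.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the value distribution."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(attr: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy after splitting on the attribute."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attr: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalized by split information (the attribute's own entropy)."""
    split_info = entropy(attr)
    return info_gain(attr, labels) / split_info if split_info > 0 else 0.0
```

    For four students labelled pass, pass, fail, fail, a unique student-ID attribute and a genuinely predictive grade attribute both have an information gain of 1 bit. The ID's gain ratio is only 0.5 against 1.0 for the grade, so gain ratio stops ID3 from preferring the many-valued attribute.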

  19. Quick fuzzy backpropagation algorithm.

    Science.gov (United States)

    Nikov, A; Stoeva, S

    2001-03-01

    A modification of the fuzzy backpropagation (FBP) algorithm, called the QuickFBP algorithm, is proposed in which the net function is computed significantly faster. It is proved that the FBP algorithm has exponential time complexity, while the QuickFBP algorithm has polynomial time complexity. Convergence conditions of the QuickFBP and FBP algorithms are defined and proved for: (1) single-output neural networks in the case of training patterns with different targets; and (2) multiple-output neural networks in the case of training patterns with an equivalued target vector. These conditions support automation of the weight training process (quasi-unsupervised learning) by establishing the target value(s) depending on the network's input values. In these cases the simulation results confirm the convergence of both algorithms. An example with a large neural network illustrates the significantly greater training speed of QuickFBP compared with FBP. The adaptation of an interactive web system to users on the basis of the QuickFBP algorithm is presented. Since the QuickFBP algorithm ensures quasi-unsupervised learning, it is broadly applicable to adaptive and adaptable interactive systems, data mining, and similar applications.

  20. Simple, Scalable, Script-based, Science Processor for Measurements - Data Mining Edition (S4PM-DME)

    Science.gov (United States)

    Pham, L. B.; Eng, E. K.; Lynnes, C. S.; Berrick, S. W.; Vollmer, B. E.

    2005-12-01

    The S4PM-DME is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web-based data mining environment. The S4PM-DME replaces the Near-line Archive Data Mining (NADM) system with a better web environment and a richer set of production rules. S4PM-DME enables registered users to submit and execute custom data mining algorithms. The S4PM-DME system uses the GES DAAC-developed Simple Scalable Script-based Science Processor for Measurements (S4PM) to automate tasks and perform the actual data processing. A web interface allows the user to access the S4PM-DME system. The user first develops personalized data mining algorithms on his/her home platform and then uploads them to the S4PM-DME system. Algorithms in the C and FORTRAN languages are currently supported. Each user-developed algorithm is automatically audited for potential security problems before it is installed within the S4PM-DME system and made available to the user. Once the algorithm has been installed, the user can promote it to the "operational" environment. From there the user can search for and order the data available in the GES DAAC archive for his/her science algorithm. The user can also set up a processing subscription, which automatically processes new data as they become available in the GES DAAC archive. The generated mined data products are then made available for FTP pickup. The benefits of using S4PM-DME are 1) to decrease the time a user typically spends downloading GES DAAC data to his/her system, thus offloading heavy network traffic, 2) to free up the load on the user's system, and 3) to utilize the rich and abundant ocean and atmosphere data from the MODIS and AIRS instruments available from the GES DAAC.

  1. Optimal sampling strategy for data mining

    International Nuclear Information System (INIS)

    Ghaffar, A.; Shahbaz, M.; Mahmood, W.

    2013-01-01

    Technologies such as the Internet, corporate intranets, data warehouses, ERP systems, satellites, digital sensors, embedded systems and mobile networks are all generating such massive amounts of data that it is becoming very difficult to analyze and understand these data, even with data mining tools. Huge datasets are becoming a difficult challenge for classification algorithms. With increasing amounts of data, data mining algorithms are getting slower and analysis is becoming less interactive. Sampling can be a solution: using a fraction of the computing resources, sampling can often provide the same level of accuracy. Sampling requires much care, because many factors are involved in determining the correct sample size. The approach proposed in this paper tries to solve this problem. Based on a statistical formula, and after some parameters are set, it returns a sample size called the sufficient sample size, which is then selected through probability sampling. Results indicate the usefulness of this technique in coping with the problem of huge datasets. (author)
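    The abstract does not give the paper's statistical formula. One standard choice is Cochran's sample-size formula for a proportion, n0 = z^2 p (1 - p) / e^2, with the finite-population correction n = n0 / (1 + (n0 - 1) / N). The Python sketch below assumes that formula and then draws a simple random sample, which is one form of probability sampling. The confidence level (z), margin of error (e) and dataset size are illustrative parameters, not the paper's.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def cochran_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05,
                        population: Optional[int] = None) -> int:
    """Sample size to estimate a proportion p within margin e at z standard errors."""
    n0 = z * z * p * (1 - p) / (e * e)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite-population correction
    return math.ceil(n0)


def draw_sample(records: Sequence[T], size: int, seed: int = 0) -> List[T]:
    """Simple random sample without replacement; every record is equally likely."""
    return random.Random(seed).sample(list(records), min(size, len(records)))
```

    At 95% confidence (z = 1.96), p = 0.5 and a 5% margin, the sketch returns 385 records for an unbounded population and 370 for a 10,000-record dataset. The sample size grows only slowly with dataset size, which is why sampling can keep mining interactive.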

  2. Air Pollution Monitoring and Mining Based on Sensor Grid in London

    Science.gov (United States)

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-01-01

    In this paper, we present a distributed infrastructure based on wireless sensor network and Grid computing technology for air pollution monitoring and mining. It aims to develop low-cost, ubiquitous sensor networks that collect real-time, large-scale and comprehensive environmental data on road traffic emissions for air pollution monitoring in urban environments. The main informatics challenges in constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to these challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining results to examine the effectiveness of the algorithm. PMID:27879895

  3. Air Pollution Monitoring and Mining Based on Sensor Grid in London

    Directory of Open Access Journals (Sweden)

    John Hassard

    2008-06-01

    Full Text Available In this paper, we present a distributed infrastructure based on wireless sensor network and Grid computing technology for air pollution monitoring and mining. It aims to develop low-cost, ubiquitous sensor networks that collect real-time, large-scale and comprehensive environmental data on road traffic emissions for air pollution monitoring in urban environments. The main informatics challenges in constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to these challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining results to examine the effectiveness of the algorithm.

  4. Near-line Archive Data Mining at the Goddard Distributed Active Archive Center

    Science.gov (United States)

    Pham, L.; Mack, R.; Eng, E.; Lynnes, C.

    2002-12-01

    NASA's Earth Observing System (EOS) is generating immense volumes of data, in some cases too much to provide to users with data-intensive needs. As an alternative to moving the data to the user and his/her research algorithms, we are providing a means to move the algorithms to the data. The Near-line Archive Data Mining (NADM) system is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web data mining portal to the EOS Data and Information System (EOSDIS) data pool, a 50-TB online disk cache. The NADM web portal enables registered users to submit and execute data mining algorithm codes on the data in the EOSDIS data pool. A web interface allows the user to access the NADM system. Users first develop personalized data mining code on their home platforms and then upload it to the NADM system. The C, FORTRAN and IDL languages are currently supported. User-developed code is automatically audited for potential security problems before it is installed within the NADM system and made available to the user. Once the code has been installed, the user is provided with a test environment in which he/she can test the software against data sets of his/her choosing. When satisfied with the results, the user can promote the code to the "operational" environment. From there the user can interactively run his/her code on the data available in the EOSDIS data pool. The user can also set up a processing subscription, which automatically processes new data as they become available in the EOSDIS data pool. The generated mined data products are then made available for FTP pickup. The NADM system uses the GES DAAC-developed Simple Scalable Script-based Science Processor (S4P) to automate tasks and perform the actual data processing. Users will also have the option of selecting a DAAC-provided data mining algorithm and using it to process the data of their choice.

  5. On the Social Construction of Madness: Ian Hacking and the Cultural History of Psychiatry

    OpenAIRE

    Huertas, Rafael

    2011-01-01

    The aim of this paper is to analyze the contribution of the philosopher of science Ian Hacking to the cultural history of psychiatry. Starting from concepts proposed by the author, such as "transient mental illness" or "inventing/making up people", it reflects on the socio-cultural construction of mental illness. The two case studies proposed by Hacking, dissociative fugue and multiple personality, are examined and discussed, identifying the strengths and weaknesses ...

  6. Landuse change detection in a surface coal mine area using multi-temporal high resolution satellite images

    Energy Technology Data Exchange (ETDEWEB)

    Demirel, N.; Duzgun, S.; Kemal Emil, M. [Middle East Technical Univ., Ankara (Turkey). Dept. of Mining Engineering]

    2010-07-01

    Changes in the land cover and land use of a mine area can be caused by surface mining activities, exploitation of ore, and the stripping and dumping of overburden. To identify the long-term impacts of mining on the environment and land cover, these changes must be monitored continuously. A facility to regularly observe the progress of surface mining and reclamation is important for effective enforcement of mining and environmental regulations. Remote sensing provides a powerful tool to obtain rigorous data and reduce the need for time-consuming and expensive field measurements. The purpose of this study was to conduct post-classification change detection to identify, quantify, and analyze the spatial response of the landscape to surface lignite coal mining activities in Goynuk, Bolu, Turkey, from 2004 to 2008. The paper presented the research algorithm, which involved acquiring multi-temporal high resolution satellite data; preprocessing the data; performing image classification with the maximum likelihood classification algorithm and assessing the accuracy of the classification results; applying a post-classification change detection algorithm; and analyzing the results. Specifically, the paper discussed the study area, data and methodology, and image preprocessing using radiometric correction. Image classification and change detection were also discussed. It was concluded that the mine and dump area decreased by 192.5 ha from 2004 to 2008, a decrease caused by the diminishing reserves in the area and a decline in required production. 5 refs., 2 tabs., 4 figs.
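    Post-classification change detection compares two independently classified maps pixel by pixel. The Python sketch below builds the from-to change matrix and converts a class's net pixel-count change into hectares. The class names, the flattened toy maps and the 100 m pixel size in the usage example are hypothetical. The sketch also assumes that the maximum likelihood classification of both dates has already been done.

```python
from collections import Counter
from typing import Hashable, Sequence


def change_matrix(before: Sequence[Hashable], after: Sequence[Hashable]) -> Counter:
    """Counts of (class at date 1, class at date 2) over co-registered pixels."""
    if len(before) != len(after):
        raise ValueError("maps must have the same number of pixels")
    return Counter(zip(before, after))


def area_change_ha(before: Sequence[Hashable], after: Sequence[Hashable],
                   cls: Hashable, pixel_m: float) -> float:
    """Net area change of one class in hectares (1 ha = 10,000 m^2)."""
    pixel_ha = pixel_m * pixel_m / 10_000
    return (list(after).count(cls) - list(before).count(cls)) * pixel_ha
```

    The off-diagonal entries of the change matrix, such as ("mine", "forest"), show which transitions drive a net figure like the 192.5 ha decrease reported above.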

  7. Moment Tensor Inversion with 3D sensor configuration of Mining Induced Seismicity (Kiruna mine, Sweden)

    Science.gov (United States)

    Ma, Ju; Dineva, Savka; Cesca, Simone; Heimann, Sebastian

    2018-03-01

    Mining-induced seismicity is an undesired consequence of mining operations, which poses a significant hazard to miners and infrastructure and requires an accurate analysis of the rupture process. Seismic moment tensors of mining-induced events help to explain the nature of mining-induced seismicity by providing information about the relationship between mining, stress redistribution and instabilities in the rock mass. In this work, we adapt and test a waveform-based inversion method on high-frequency data recorded by a dense underground seismic system in one of the largest underground mines in the world (Kiruna mine, Sweden). Developing a stable algorithm for moment tensor inversion of comparatively small mining-induced earthquakes, resolving both the double-couple and the full moment tensor with high-frequency data, is very challenging. Moreover, application to an underground mining system requires accounting for the 3D geometry of the monitoring system. We construct a Green's function database using a homogeneous velocity model, but assuming a 3D distribution of potential sources and receivers. We first perform a set of moment tensor inversions using synthetic data to test the effects of different factors on moment tensor inversion stability and source parameter accuracy, including the network spatial coverage, the number of sensors and the signal-to-noise ratio. The influence of the accuracy of the input source parameters on the inversion results is also tested. These tests show that an accurate selection of the inversion parameters allows the moment tensor to be resolved even under realistic seismic noise conditions. Finally, the moment tensor inversion methodology is applied to 8 events chosen from mining block #33/34 at Kiruna mine. Source parameters including scalar moment, magnitude, double-couple, compensated linear vector dipole and isotropic contributions, as well as the strike, dip and rake of the double-couple term, were obtained. The orientations...

  8. Moment tensor inversion with three-dimensional sensor configuration of mining induced seismicity (Kiruna mine, Sweden)

    Science.gov (United States)

    Ma, Ju; Dineva, Savka; Cesca, Simone; Heimann, Sebastian

    2018-06-01

    Mining-induced seismicity is an undesired consequence of mining operations, which poses a significant hazard to miners and infrastructure and requires an accurate analysis of the rupture process. Seismic moment tensors of mining-induced events help to explain the nature of mining-induced seismicity by providing information about the relationship between mining, stress redistribution and instabilities in the rock mass. In this work, we adapt and test a waveform-based inversion method on high-frequency data recorded by a dense underground seismic system in one of the largest underground mines in the world (Kiruna mine, Sweden). Developing a stable algorithm for moment tensor inversion of comparatively small mining-induced earthquakes, resolving both the double-couple and the full moment tensor with high-frequency data, is very challenging. Moreover, application to an underground mining system requires accounting for the 3-D geometry of the monitoring system. We construct a Green's function database using a homogeneous velocity model, but assuming a 3-D distribution of potential sources and receivers. We first perform a set of moment tensor inversions using synthetic data to test the effects of different factors on moment tensor inversion stability and source parameter accuracy, including the network spatial coverage, the number of sensors and the signal-to-noise ratio. The influence of the accuracy of the input source parameters on the inversion results is also tested. These tests show that an accurate selection of the inversion parameters allows the moment tensor to be resolved even in the presence of realistic seismic noise conditions. Finally, the moment tensor inversion methodology is applied to eight events chosen from mining block #33/34 at Kiruna mine. Source parameters including scalar moment, magnitude, double-couple, compensated linear vector dipole and isotropic contributions, as well as the strike, dip and rake of the double-couple term, were obtained.
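    The double-couple (DC), compensated linear vector dipole (CLVD) and isotropic (ISO) contributions reported above come from splitting the moment tensor into parts. Several conventions exist and the abstract does not say which one was used; the sketch below assumes one common choice: the ε parameter (ratio of the smallest to the largest deviatoric principal moment) splits the deviatoric part, and ISO is scaled against the largest deviatoric moment. It takes the three principal moments (eigenvalues) as input, so diagonalising the full tensor is assumed to have been done already.

```python
def decompose_moment_tensor(principal_moments):
    """Return ISO, DC and CLVD percentages from the three principal moments
    (eigenvalues) of a moment tensor, under the convention assumed above."""
    m_iso = sum(principal_moments) / 3.0
    # deviatoric eigenvalues, ordered by increasing absolute value
    dev = sorted((m - m_iso for m in principal_moments), key=abs)
    dev_max = abs(dev[-1])
    if dev_max == 0.0:  # purely volumetric source (or zero tensor)
        return {"ISO": 100.0 if m_iso != 0.0 else 0.0, "DC": 0.0, "CLVD": 0.0}
    eps = -dev[0] / dev_max  # |eps| = 0 for a pure DC, 0.5 for a pure CLVD
    iso = 100.0 * abs(m_iso) / (abs(m_iso) + dev_max)
    deviatoric_share = 100.0 - iso
    return {
        "ISO": iso,
        "DC": deviatoric_share * (1.0 - 2.0 * abs(eps)),
        "CLVD": deviatoric_share * 2.0 * abs(eps),
    }


print(decompose_moment_tensor((1.0, 0.0, -1.0)))   # pure double couple
print(decompose_moment_tensor((2.0, -1.0, -1.0)))  # pure CLVD
```

    Because other conventions weight ISO and CLVD differently, percentages from this sketch are only comparable with results computed under the same convention.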

  9. Applying data mining techniques to improve diagnosis in neonatal jaundice

    Directory of Open Access Journals (Sweden)

    Ferreira Duarte

    2012-12-01

    Background: Hyperbilirubinemia is an increasingly common problem in newborns owing to shorter hospital stays after birth. Jaundice is the most common disease of the newborn and, although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In various areas of medicine, data mining has helped improve on the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice by applying data mining techniques. Methods: This study followed the phases of the Cross Industry Standard Process for Data Mining (CRISP-DM) model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa – EPE) from February to March 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. Transcutaneous bilirubin levels were also measured from birth to hospital discharge, with maximum intervals of 8 hours between measurements, using a noninvasive bilirubinometer. Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with those of traditional methods for predicting hyperbilirubinemia. Results: Applying different classification algorithms to the collected data allowed subsequent hyperbilirubinemia to be predicted with high accuracy. In particular, at 24 hours of life, the accuracy of hyperbilirubinemia prediction was 89%. The best results were obtained with the naive Bayes, multilayer perceptron and simple logistic algorithms. Conclusions: Our findings indicate that new approaches, such as data mining, may support...
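    Naive Bayes, one of the best-performing algorithms in this study, scores each class by its prior probability times the per-attribute likelihoods, treating attributes as independent given the class. The sketch below is a minimal categorical version with Laplace smoothing; it is not Weka's implementation, and the two attributes (a bilirubin band at 24 h and a gestation group) and the six training rows are hypothetical illustrations, not study data.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[cls][j][value] = training rows of class cls whose attribute j == value
        self.counts = {cls: [Counter() for _ in range(n_features)] for cls in self.class_counts}
        self.values = [set() for _ in range(n_features)]
        for row, cls in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[cls][j][value] += 1
                self.values[j].add(value)
        return self

    def predict(self, row):
        best_cls, best_score = None, -math.inf
        for cls, n_cls in self.class_counts.items():
            score = math.log(n_cls / self.n)  # log prior
            for j, value in enumerate(row):
                num = self.counts[cls][j][value] + self.alpha
                den = n_cls + self.alpha * len(self.values[j])
                score += math.log(num / den)  # attributes assumed independent
            if score > best_score:
                best_cls, best_score = cls, score
        return best_cls


# Hypothetical rows: (bilirubin band at 24 h, gestation group) -> later hyperbilirubinemia?
rows = [("high", "preterm"), ("high", "term"), ("high", "preterm"),
        ("low", "term"), ("low", "term"), ("low", "preterm")]
labels = ["yes", "yes", "yes", "no", "no", "no"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("high", "term")))
```

    Working in log space avoids numerical underflow when many attributes (the study collected over 70) are multiplied together.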

  10. Imitating manual curation of text-mined facts in biomedicine.

    Directory of Open Access Journals (Sweden)

    Raul Rodriguez-Esteban

    2006-09-01

    Text-mining algorithms make mistakes when extracting facts from natural-language texts. In biomedical applications that rely on text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts in order to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, if we used a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
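    Assuming the "ROC score" above means the area under the ROC curve (AUC), it equals the probability that a randomly chosen correct fact receives a higher classifier score than a randomly chosen incorrect one, with ties counting one half. This small pairwise sketch computes it directly; the scores and labels in the usage line are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: truthy for positives (correctly extracted facts), falsy otherwise.
    O(P * N) pairs, which is fine for a sketch but not for very large sets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0]))  # 0.75
```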

  11. Literature and Disability: Examining the Image of People with Disabilities in the Novel Biola Tak Berdawai from the Perspective of Ian Watt's Sociology of Literature

    Directory of Open Access Journals (Sweden)

    Mukhanif Yasin Yusuf

    2015-06-01

    Read through Ian Watt's sociology-of-literature approach, Biola Tak Berdawai is not very different from Seno's other works, which put forward critical ideas about social reality. Drawing on his social background as a journalist, Seno presents a critical view of the condition of people with disabilities (difabel), who still face negative stigma from society. The social reality mirrored in the novel differs little from the reality in Indonesia, where the ideology of normalcy contributes to various forms of injustice against people with disabilities. They are still regarded as defective individuals, as a curse from God, and as a source of shame for the family. The author challenges this reality through the main character "Aku" ("I"), who has a disability, by presenting the fact that people with disabilities have different abilities, which society still fails to understand because it is already trapped in negative stigma toward them.

  12. An Improved Algorithm Research on the PrefixSpan Based on the Server Session Constraint

    Directory of Open Access Journals (Sweden)

    Cai Hong-Guo

    2017-01-01

    When long sequential patterns are mined with the PrefixSpan algorithm in Web Usage Mining (WUM), the large number of elements and suffix sequences can cause computational problems such as space explosion. To address this problem more effectively, a server-session-based log file format is first proposed. An improved PrefixSpan algorithm based on a server-session constraint is then discussed for mining frequent sequential patterns on a website. Finally, experiments demonstrate the validity and superiority of the method.
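    For reference, the basic PrefixSpan idea is: find frequent items, and for each one recursively mine the "projected database" of suffixes that follow its first occurrence. The sketch below assumes each server session is already one sequence of single page IDs (modelling the session constraint only as this pre-segmentation of the log); it is the textbook algorithm, not the paper's improved variant, and the sessions are hypothetical.

```python
def prefixspan(sessions, min_support):
    """Frequent sequential patterns in a list of sessions (each a list of page IDs).

    Returns {pattern (tuple): support}, where support = number of sessions
    containing the pattern as a (not necessarily contiguous) subsequence.
    """
    patterns = {}

    def mine(prefix, projected_db):
        # count each item at most once per projected sequence
        support = {}
        for seq in projected_db:
            for item in set(seq):
                support[item] = support.get(item, 0) + 1
        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # projection: keep what follows the first occurrence of item
            next_db = []
            for seq in projected_db:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        next_db.append(suffix)
            mine(new_prefix, next_db)

    mine((), [list(s) for s in sessions])
    return patterns


sessions = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "c"]]
print(prefixspan(sessions, min_support=2))
```

    Because projected databases shrink at every level, splitting the log into short sessions keeps them small, which is consistent with the motivation given in the abstract.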

  13. Development of energy-saving technologies providing comfortable microclimate conditions for mining

    Directory of Open Access Journals (Sweden)

    Б. П. Казаков

    2017-03-01

    The paper analyzes the natural and technogenic factors that influence the properties of the mine atmosphere and determine the level of mining safety and the probability of emergencies. Main trends in the development of energy-saving technologies providing comfortable microclimate conditions are highlighted. A set of methods and mathematical models has been developed to carry out aerological and thermophysical calculations. The main ways to improve existing methods for calculating stationary and non-stationary air distribution have been defined: using ejection draught sources to organize recirculation ventilation; accounting for depression (pressure) losses at working intersections; and accounting for the inertia of air streams and mined-out spaces when modeling transient emergency scenarios. Based on the algorithm for calculating airflow distribution in the mine network, a method has been developed for processing the results of air-depression surveys under conditions of data shortage. Dust transfer processes have been modeled, taking into account coagulation and settling as well as interaction with water drops when wet dust suppression is used. A method for calculating the intensity of water evaporation and condensation has been proposed, which makes it possible to forecast the time, duration and quantity of precipitation and its migration inside the mine during the winter season. Solving the conjugate heat exchange problem between the mine airflow and the lining of the ventilation shaft makes it possible to estimate the natural draught depression and the conditions of convective balance between air streams. Normalization of the microclimatic parameters of the mine atmosphere is forecast for heat-exchange units that either heat, or cool and dehumidify, the ventilation air. Algorithms are presented that minimize ventilation energy demand at the mine design and operation stages.
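    Airflow distribution in a mine network is classically computed with the square law p = R·Q·|Q| and Hardy Cross loop corrections. The abstract does not say which algorithm the authors use, so this is only an illustration of the classic approach on the smallest possible network: two parallel airways forming one loop. Resistances and total flow in the usage line are hypothetical.

```python
def hardy_cross_two_branches(r0, r1, q_total, tol=1e-9, max_iter=50):
    """Split q_total between two parallel airways with resistances r0 and r1.

    Each airway obeys the square law p = R * Q * |Q|. The two airways form one
    loop, so the pressure drops around it must cancel.
    """
    if q_total == 0:
        return 0.0, 0.0
    q0 = q1 = q_total / 2.0  # initial guess: equal split
    for _ in range(max_iter):
        # loop pressure imbalance (branch 0 traversed forwards, branch 1 backwards)
        imbalance = r0 * q0 * abs(q0) - r1 * q1 * abs(q1)
        slope = 2.0 * r0 * abs(q0) + 2.0 * r1 * abs(q1)
        dq = -imbalance / slope  # Hardy Cross (Newton) loop correction
        q0 += dq
        q1 -= dq
        if abs(dq) < tol:
            return q0, q1
    raise RuntimeError("Hardy Cross iteration did not converge")


print(hardy_cross_two_branches(1.0, 4.0, 30.0))  # approx. (20.0, 10.0)
```

    For two parallel branches the exact answer is Q0/Q1 = sqrt(R1/R0), which makes the result easy to check; real networks apply the same correction to every independent loop in turn.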

  14. PROGRAMS WITH DATA MINING CAPABILITIES

    Directory of Open Access Journals (Sweden)

    Ciobanu Dumitru

    2012-03-01

    The fact that the Internet has become a commodity worldwide has created a framework for a new economy. Traditional businesses are migrating to this new environment, which offers many features and options at relatively low prices. However, competition is fierce, and a successful Internet business depends on rigorous use of all available information. This information is often hidden in data, and retrieving it requires software capable of applying data mining algorithms and techniques. In this paper we review some of the programs with data mining capabilities currently available in this area. We also propose some classifications of this software to assist those who wish to use it.

  15. Data Mining Methods for Recommender Systems

    Science.gov (United States)

    Amatriain, Xavier; Jaimes, Alejandro; Oliver, Nuria; Pujol, Josep M.

    In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
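    The k-means algorithm mentioned above alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its members. A minimal sketch (Lloyd's algorithm), assuming users are represented as numeric feature vectors; the points and the naive "first k points" initialisation are illustrative choices, not taken from the chapter.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. points: list of equal-length tuples of floats."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points (assumption)
    assignment = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by Euclidean distance
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment


# Hypothetical 2-D user profiles
users = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, assignment = kmeans(users, k=2)
print(assignment)  # [0, 1, 0, 0, 1, 1]
```

    In a recommender, the cluster of a user can then stand in for that user's neighbourhood, so predictions only consult users in the same cluster.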

  16. Text mining in cancer gene and pathway prioritization.

    Science.gov (United States)

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer-implicated genes has received growing attention as an effective way to reduce wet-lab costs through computational analysis that ranks candidate genes according to the likelihood that experimental verification will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expression, function annotations, gene regulation, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view of cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing pathways may be more desirable than prioritizing genes alone.

  17. Privacy Preservation in Distributed Subgradient Optimization Algorithms

    OpenAIRE

    Lou, Youcheng; Yu, Lean; Wang, Shouyang

    2015-01-01

    Privacy preservation is becoming an increasingly important issue in data mining and machine learning. In this paper, we consider the privacy-preserving features of distributed subgradient optimization algorithms. We first show that a well-known distributed synchronous subgradient optimization algorithm, in which all agents make their optimization updates simultaneously at all times, is not privacy preserving, in the sense that a malicious agent can learn other agents' subgradients asymptotic...
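    A distributed subgradient method of the kind analysed here has each agent mix its estimate with its neighbours' estimates and then take a step along a subgradient of its own private objective. The sketch assumes a ring of agents with equal weights of 1/3, private objectives f_i(x) = |x - c_i| (so the global optimum is the median of the c_i) and a constant step size; none of these choices come from the paper. The estimates broadcast every round are exactly the information a malicious neighbour observes.

```python
def sign(v):
    return (v > 0) - (v < 0)


def distributed_subgradient(targets, step=0.01, n_iter=5000):
    """Agents on a ring jointly minimise sum_i |x - c_i|; agent i only knows c_i."""
    n = len(targets)
    x = [float(c) for c in targets]  # each agent starts from its own target
    for _ in range(n_iter):
        # consensus: average own estimate with the two ring neighbours (weights 1/3)
        mixed = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
        # local step along a subgradient of |x - c_i|, taken at the agent's own estimate
        x = [mixed[i] - step * sign(x[i] - targets[i]) for i in range(n)]
    return x


estimates = distributed_subgradient([1, 2, 3, 4, 10])
print([round(e, 2) for e in estimates])  # all close to 3, the median
```

    With a constant step the estimates only reach a neighbourhood of the optimum whose size shrinks with the step; diminishing step sizes are the usual choice when exact convergence is required.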

  18. Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

    Directory of Open Access Journals (Sweden)

    Dinesh J. Prajapati

    2017-06-01

    Nowadays, there is an increasing demand for mining interesting patterns from big data. Analyzing such a huge amount of data is a computationally complex task when traditional methods are used. The purpose of this paper is twofold. First, it presents a novel approach to identifying consistent and inconsistent association rules from sales data located in a distributed environment. Second, it overcomes the main memory bottleneck and computing time overhead of a single computing system by distributing the computation across a multi-node cluster. The proposed method first extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a MapReduce-based frequent pattern mining algorithm with the Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) algorithms. The association rules generated from frequent itemsets are so numerous that they are difficult to analyze. Thus, a MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm is proposed to detect the consistent and inconsistent rules in big data and provide useful and actionable knowledge to domain experts. These pruned interesting rules also provide useful knowledge for better marketing strategies. The extracted consistent and inconsistent rules are evaluated and compared using different interestingness measures, presented together with experimental results that lead to the final conclusions.
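    In the Count Distribution approach mentioned above, every node counts the same candidate itemsets on its own data partition, and only the counts are exchanged and summed; candidates for the next level are then generated Apriori-style from the globally frequent itemsets. The sketch below simulates the nodes sequentially in one process; the two "zones" of transactions are hypothetical, and the MR-CIRD rule detection itself is not shown.

```python
from collections import Counter


def local_counts(partition, candidates):
    """Support counts of the candidate itemsets in one node's transactions."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def count_distribution(partitions, min_support):
    """Level-wise (Apriori-style) frequent itemset mining over data split across nodes."""
    frequent = {}
    candidates = {frozenset([i]) for part in partitions for t in part for i in t}
    size = 1
    while candidates:
        total = Counter()
        for part in partitions:  # on a cluster, every node counts in parallel
            total.update(local_counts(part, candidates))  # then counts are summed
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # join frequent itemsets into candidates one item larger
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # prune candidates that have an infrequent subset
        candidates = {c for c in joined if all(c - {i} in level for i in c)}
    return frequent


zone_a = [["bread", "milk"], ["bread", "butter"]]
zone_b = [["bread", "milk", "butter"], ["milk"]]
result = count_distribution([zone_a, zone_b], min_support=2)
print(result)
```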

  19. Data warehousing and data mining: A case study

    Directory of Open Access Journals (Sweden)

    Suknović Milija

    2005-01-01

    This paper shows the design and implementation of a data warehouse, as well as the use of data mining algorithms for knowledge discovery as the basic resource for an adequate business decision-making process. The project was carried out for the Student's Service Department of the Faculty of Organizational Sciences (FOS), University of Belgrade, Serbia and Montenegro. This system is a good basis for analysis and prediction in the following period, supporting quality business decision-making by top management. The first part of the paper shows the steps in designing and developing the data warehouse of this business system. The second part shows the implementation of data mining algorithms for deducing rules, patterns and knowledge as a resource supporting the decision-making process.

  20. Application and Exploration of Big Data Mining in Clinical Medicine

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-01-01

    Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literature published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine was obtained from PubMed and the Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and the application of big data mining in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining, including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural networks, genetic algorithms, inductive learning theory, Bayesian networks, decision trees, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance on the rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378

  1. Assessing semantic similarity of texts - Methods and algorithms

    Science.gov (United States)

    Rozeva, Anna; Zerkova, Silvia

    2017-12-01

    Assessing the semantic similarity of texts is an important part of various text-related applications such as educational systems, information retrieval and text summarization. This task is performed by sophisticated analysis that implements text-mining techniques. Text mining involves several pre-processing steps that produce a structured representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, are presented.
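    LSA projects documents into a k-dimensional latent space obtained from the singular value decomposition of the term-document matrix A, then compares them with cosine similarity. To stay dependency-free, this sketch assumes a small corpus and gets the right singular vectors and singular values from an eigen-decomposition of AᵀA (cyclic Jacobi rotations), which is mathematically equivalent on the document side but less numerically robust than a library SVD. The toy corpus is hypothetical.

```python
import math


def jacobi_eigen(sym, max_sweeps=50, tol=1e-12):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) where column i of V is the eigenvector for eigenvalue i."""
    n = len(sym)
    a = [list(map(float, row)) for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: v <- v J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsa_document_vectors(term_doc, k):
    """Map each document (column of the term-document matrix) to k latent dimensions."""
    n_docs = len(term_doc[0])
    # A^T A: eigenvectors = right singular vectors of A, eigenvalues = sigma^2
    gram = [[sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)] for i in range(n_docs)]
    eigvals, v = jacobi_eigen(gram)
    top = sorted(range(n_docs), key=lambda i: eigvals[i], reverse=True)[:k]
    # document coordinates = V_k * Sigma_k
    return [[v[d][i] * math.sqrt(max(eigvals[i], 0.0)) for i in top] for d in range(n_docs)]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical corpus; rows = terms (mine, coal, seam, ventilation,
# text, semantic, similarity, mining), columns = documents d0..d3
term_doc = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
docs = lsa_document_vectors(term_doc, k=2)
print(round(cosine(docs[0], docs[1]), 3), round(cosine(docs[0], docs[2]), 3))
```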

  2. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving performance in many problem domains, but it is not clear which accelerators are suitable for which domains. Since there is no room in general-purpose processor design to increase processor frequency significantly, developers are instead resorting to multi-core chips that duplicate conventional computing capabilities on a single die. Accelerators, however, offer more radical designs with a much higher level of parallelism and novel programming environments. The present work assesses the viability of text mining on CUDA. Text mining has become prominent as an effective means of indexing the Internet, but its applications extend beyond this scope, for example to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelizing text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on the atomic instructions that have recently become available in NVIDIA devices.

  3. Mining Social and Affective Data for Recommendation of Student Tutors

    Directory of Open Access Journals (Sweden)

    Elisa Boff

    2013-03-01

    This paper presents a learning environment in which a mining algorithm is used to learn patterns of user interaction and to represent these patterns in a scheme called item descriptors. The learning environment stores theoretical information about subjects, as well as tools and exercises with which students can put the knowledge gained into practice. One of the main purposes of the project is to stimulate collaborative learning through interaction between students with different levels of knowledge. The students' actions and interactions are monitored by the system and used to find patterns that can guide the search for students who may play the role of a tutor. Such patterns are found with a particular learning algorithm and represented in item descriptors. The paper presents the educational environment, the representation mechanism and the learning algorithm used to mine social-affective data in order to create a recommendation model of tutors.

  4. Community Mining Method of Label Propagation Based on Dense Pairs

    Directory of Open Access Journals (Sweden)

    WENG Wei

    2014-03-01

    In recent years, with the popularity of handheld Internet devices such as mobile phones, increasing numbers of people are becoming involved in virtual social networks. Because of their large amounts of data and complex structure, such networks pose new challenges for community mining. A label propagation algorithm (LPA), which has low time complexity and needs no prior parameters, easily handles large networks. This study explored a new two-stage community mining method based on label propagation. The first stage identified closely linked nodes according to their local adjacency relations, giving rise to micro-communities. The second stage expanded and adjusted these communities through an LPA to finally obtain the community structure of the entire social network. This algorithm reduced the number of initial labels and avoided the merging of small communities seen in general LPAs. Thus, the quality of community discovery was improved, and the linear time complexity of the LPA was maintained.
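    The second stage above relies on standard label propagation: every node repeatedly adopts the label that is most frequent among its neighbours until no node wants to change. This sketch is the plain single-stage LPA, not the dense-pairs variant; it starts from one label per node (the paper instead seeds labels from micro-communities), keeps a node's label on ties to avoid oscillation, and uses a seeded random generator for node order and tie-breaking. The two-clique graph is hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_rounds=100):
    """Asynchronous label propagation. adj: dict node -> list of neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            winners = [lab for lab, c in counts.items() if c == best]
            if labels[v] in winners:  # keep current label on ties: avoids oscillation
                continue
            labels[v] = rng.choice(winners)
            changed = True
        if not changed:  # every node already holds a most frequent neighbour label
            break
    return labels


# Two 4-cliques joined by the single edge 3-4
graph = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
         4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
print(label_propagation(graph, seed=42))
```

    Each round touches every edge once, which is where the near-linear running time of LPA comes from.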

  5. A Dynamic Fuzzy Cluster Algorithm for Time Series

    Directory of Open Access Journals (Sweden)

    Min Ji

    2013-01-01

    ...clustering time series by introducing the definition of a key point and improving the FCM algorithm. The proposed algorithm works by identifying time series whose class labels are vague and further partitioning them into different clusters over time. The main advantage of this approach over other existing algorithms is that it can partially reveal the property of some time series belonging to different clusters over time. Results from simulation-based experiments on geographical data demonstrate the excellent performance of the algorithm, and the desired results were obtained. The proposed algorithm can also be applied to other clustering problems in data mining.
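    The algorithm above extends fuzzy c-means (FCM), in which each object has a membership degree in every cluster rather than a single hard label. For reference, this is a sketch of standard FCM on one-dimensional values with fuzzifier m = 2; the paper's key-point definition and its dynamic extension are not reproduced, and the data are hypothetical.

```python
def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means on 1-D values (c >= 2). Returns (centres, memberships)."""
    ordered = sorted(points)
    # spread the initial centres over the sorted data (simple deterministic init)
    centres = [ordered[i * (len(ordered) - 1) // (c - 1)] for i in range(c)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for x in points:
            d = [abs(x - v) for v in centres]
            if 0.0 in d:  # point coincides with a centre: crisp membership there
                j0 = d.index(0.0)
                memberships.append([1.0 if j == j0 else 0.0 for j in range(c)])
            else:
                memberships.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                                    for j in range(c)])
        # centre update: mean of the data weighted by membership ** m
        new_centres = []
        for j in range(c):
            w = [u[j] ** m for u in memberships]
            new_centres.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centres, u = fuzzy_c_means(values, c=2)
print([round(v, 2) for v in centres])
```

    Each row of memberships sums to one, so a series halfway between two centres shows up with split membership instead of being forced into one cluster.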

  6. On-Board Mining in the Sensor Web

    Science.gov (United States)

    Tanner, S.; Conover, H.; Graves, S.; Ramachandran, R.; Rushing, J.

    2004-12-01

    On-board data mining can contribute to many research and engineering applications, including natural hazard detection and prediction, intelligent sensor control, and the generation of customized data products for direct distribution to users. The ability to mine sensor data in real time can also be a critical component of autonomous operations, supporting deep space missions, unmanned aerial and ground-based vehicles (UAVs, UGVs), and a wide range of sensor meshes, webs and grids. On-board processing is expected to play a significant role in the next generation of NASA, Homeland Security, Department of Defense and civilian programs, providing for greater flexibility and versatility in measurements of physical systems. In addition, the use of UAV and UGV systems is increasing in military, emergency response and industrial applications. As research into the autonomy of these vehicles progresses, especially in fleet or web configurations, the applicability of on-board data mining is expected to increase significantly. Data mining in real time on board sensor platforms presents unique challenges. Most notably, the data to be mined is a continuous stream, rather than a fixed store such as a database. This means that the data mining algorithms must be modified to make only a single pass through the data. In addition, the on-board environment requires real time processing with limited computing resources, thus the algorithms must use fixed and relatively small amounts of processing time and memory. The University of Alabama in Huntsville is developing an innovative processing framework for the on-board data and information environment. The Environment for On-Board Processing (EVE) and the Adaptive On-board Data Processing (AODP) projects serve as proofs-of-concept of advanced information systems for remote sensing platforms. The EVE real-time processing infrastructure will upload, schedule and control the execution of processing plans on board remote sensors. These plans...
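    The single-pass, fixed-memory constraint described above is typical of streaming statistics. As a small illustration (not code from EVE or AODP), Welford's algorithm maintains the running mean and variance of a sensor stream in O(1) memory, touching each reading exactly once; the readings in the usage lines are hypothetical.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's algorithm) in constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two readings have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical sensor stream
    stats.update(reading)
print(stats.mean, stats.variance)
```

    The same pattern (update a small fixed state per reading, never revisit old data) underlies most on-board detectors, for example flagging a reading that lies several standard deviations from the running mean.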

  7. First and second derivatives of two electron integrals over Cartesian Gaussians using Rys polynomials

    International Nuclear Information System (INIS)

    Schlegel, H.B.; Binkley, J.S.; Pople, J.A.

    1984-01-01

    Formulas are developed for the first and second derivatives of two-electron integrals over Cartesian Gaussians. Integrals and integral derivatives are evaluated by the Rys polynomial method. Higher angular momentum functions are not used to calculate the integral derivatives; instead, the integral formulas are differentiated directly to produce compact and efficient expressions for the integral derivatives. The use of this algorithm in the ab initio molecular orbital programs Gaussian 80 and Gaussian 82 is discussed. Representative timings for some small molecules with several basis sets are presented. The method is compared with previously published algorithms, and its computational merits are discussed.

  8. Data Mining Study of Customer Relationship Management in Microfinance Institutions

    Directory of Open Access Journals (Sweden)

    Tikaridha Hardiani

    2016-01-01

    Companies, including microfinance institutions, must be ready to face increasingly intense competition from other companies. This more intense competition has led many microfinance institutions to look for profitable strategies that distinguish them from others. One strategy that can be applied is implementing Customer Relationship Management (CRM) together with data mining. Data mining can be used by microfinance institutions that hold sufficiently large amounts of data. Identifying potential customers through customer segmentation can support decisions about which marketing strategy to implement. This paper discusses several data mining techniques that can be used for customer segmentation. The proposed data mining technique is fuzzy clustering with the fuzzy C-Means algorithm and fuzzy RFM. Keywords: Customer relationship management; Data mining; Fuzzy clustering; Micro-finance institutions; Fuzzy C-Means; Fuzzy RFM
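    Fuzzy RFM segmentation starts from the crisp RFM attributes of each customer: Recency (days since the last transaction), Frequency (number of transactions) and Monetary value (total amount). This sketch only computes those inputs from a transaction log; the fuzzification or fuzzy C-Means clustering that the paper proposes would then operate on these values. The customer IDs, dates and amounts are hypothetical.

```python
from datetime import date


def rfm(transactions, today):
    """transactions: iterable of (customer_id, date, amount).

    Returns {customer_id: (recency_days, frequency, monetary)}.
    """
    last_seen, frequency, monetary = {}, {}, {}
    for customer, day, amount in transactions:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {c: ((today - last_seen[c]).days, frequency[c], monetary[c]) for c in last_seen}


log = [("A", date(2016, 1, 5), 100.0),
       ("A", date(2016, 1, 20), 50.0),
       ("B", date(2015, 12, 1), 300.0)]
print(rfm(log, today=date(2016, 1, 31)))
```

    Because the three attributes have very different scales, they are usually normalised before being clustered.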

  9. The need for a process mining evaluation framework in research and practice

    NARCIS (Netherlands)

    Rozinat, A.; Alves De Medeiros, A.K.; Günther, C.W.; Weijters, A.J.M.M.; Aalst, van der W.M.P.; Hofstede, ter A.H.M.; Benatallah, B.; Paik, H.Y.

    2008-01-01

    Although there has been much progress in developing process mining algorithms in recent years, no effort has been put into developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we motivate the need for such an evaluation mechanism, and outline...

  10. Tensor Completion Algorithms in Big Data Analytics

    OpenAIRE

    Song, Qingquan; Ge, Hancheng; Caverlee, James; Hu, Xia

    2017-01-01

    Tensor completion is the problem of filling in the missing or unobserved entries of partially observed tensors. Owing to the multidimensional nature of tensors in describing complex datasets, tensor completion algorithms and their applications have received wide attention and achieved success in areas such as data mining, computer vision, signal processing, and neuroscience. In this survey, we provide a modern overview of recent advances in tensor completion algorithms from the perspective of big data an...

  11. Algorithms and data structures for automated change detection and classification of sidescan sonar imagery

    Science.gov (United States)

    Gendron, Marlin Lee

    During Mine Warfare (MIW) operations, MIW analysts perform change detection by visually comparing historical sidescan sonar imagery (SSI) with recently collected SSI in an attempt to identify objects (which might be explosive mines) placed at sea since the area was last surveyed. This dissertation presents a data structure and three algorithms, developed by the author, that are part of an automated change detection and classification (ACDC) system. MIW analysts at the Naval Oceanographic Office currently use ACDC to reduce the time needed to perform change detection. The introductory chapter gives background information on change detection and ACDC, and describes how SSI is produced from raw sonar data. Chapter 2 presents the author's Geospatial Bitmap (GB) data structure, which stores information geographically and is used by all three algorithms. This chapter shows that a polygon-smoothing algorithm using the GB data structure ran 1.3 to 48.4 times faster than one using a sparse matrix data structure. Chapter 3 describes the GB clustering algorithm, the author's repeatable, order-independent clustering method. Tests in this chapter show that the time to cluster a set of points is not affected by the distribution or order of the points. In Chapter 4, the author presents his real-time computer-aided detection (CAD) algorithm, which automatically detects mine-like objects on the seafloor in SSI. The author ran his GB-based CAD algorithm on real SSI data, and the results indicate that it performs comparably to or better than other, non-real-time CAD algorithms. Chapter 5 presents the author's computer-aided search (CAS) algorithm, which helps MIW analysts locate mine-like features that are geospatially close to previously detected features. A comparison between the CAS and a great circle distance algorithm shows that the...
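    The CAS algorithm is compared against a great circle distance approach. A brute-force version of such a baseline can be sketched with the haversine formula: compute the distance from a new detection to every previously detected feature and keep those within a radius. This is an illustrative baseline under a spherical-Earth assumption (mean radius 6,371 km), not the author's GB-based CAS; the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical model (assumption)


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(previous, new_point, radius_m):
    """Previously detected features (lat, lon) within radius_m of new_point."""
    return [p for p in previous if great_circle_distance(*p, *new_point) <= radius_m]


previous_detections = [(30.0, -88.0), (30.1, -88.0)]
print(nearby(previous_detections, (30.0005, -88.0), radius_m=100.0))
```

    The scan is linear in the number of stored features per query, which is the cost a geospatially indexed structure such as the GB aims to avoid.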

  12. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Amir Hossein Azadnia

    2013-01-01

    Full Text Available One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with retrieving items from their storage locations to meet customer requests. Many solution approaches have been proposed to minimize travel distance during order picking. In practice, however, customer orders have to be completed by certain due dates to avoid tardiness, a factor neglected in most related scientific papers. We therefore propose a novel four-phase solution approach that minimizes tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. In the subsequent order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, a Genetic Algorithm is applied to sequence the constructed batches so as to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the effectiveness and solution quality of the proposed approach.
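    To make the tardiness objective concrete, the following sketch computes the total tardiness of a batch sequence, the sum over batches of max(0, completion time minus due date), and finds the best sequence by exhaustive search. The exhaustive search is a substitute for the paper's Genetic Algorithm and only works for a handful of batches; the processing times, due dates, and function names are illustrative assumptions.

```python
from itertools import permutations

def total_tardiness(sequence, proc_time, due_date):
    """Sum of max(0, completion time - due date) over batches in order."""
    t = 0.0
    tardiness = 0.0
    for batch in sequence:
        t += proc_time[batch]  # completion time of this batch
        tardiness += max(0.0, t - due_date[batch])
    return tardiness

def best_sequence(batches, proc_time, due_date):
    """Exhaustive search over all orders; stands in for a Genetic Algorithm
    and is only practical for very small numbers of batches."""
    return min(permutations(batches),
               key=lambda s: total_tardiness(s, proc_time, due_date))

# Hypothetical example: three batches with processing times and due dates.
proc = {"A": 4, "B": 2, "C": 3}
due = {"A": 4, "B": 9, "C": 5}
```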

  13. Induction and pruning of classification rules for prediction of microseismic hazards in coal mines

    Energy Technology Data Exchange (ETDEWEB)

    Sikora, M. [Silesian Technical University, Gliwice (Poland)]

    2011-06-15

    The paper presents the results of applying a rule induction and pruning algorithm to classify the microseismic hazard state in coal mines. Because the examples describing the 'hazardous' and 'safe' states are imbalanced, a special algorithm was used for rule induction and pruning. The algorithm selects optimal values for the parameters that influence rule induction and pruning, based on training and tuning sets. The basic parameter of the algorithm is a rule quality measure, which determines the form and classification abilities of the induced rules. The specificity and sensitivity of a classifier were used to evaluate its quality. The tests conducted show that the adopted method of rule induction and classifier quality evaluation classifies microseismic hazards better than the methods currently used in mining practice. The results obtained by the rule-based classifier were also compared with those of a decision tree induction algorithm and a neuro-fuzzy system.
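    With imbalanced 'hazardous'/'safe' data, accuracy alone can be misleading, which is why sensitivity and specificity are reported. The minimal sketch below computes both from true and predicted labels, treating 'hazardous' as the positive class; the function name and the example labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive="hazardous"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

    A classifier that always predicts 'safe' can score well on accuracy for imbalanced data, yet its sensitivity is 0, which these two measures expose.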

  14. Supporting Solar Physics Research via Data Mining

    Science.gov (United States)

    Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.

    2012-05-01

    In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.

  15. Utility Independent Privacy Preserving Data Mining - Horizontally Partitioned Data

    Directory of Open Access Journals (Sweden)

    E Poovammal

    2010-06-01

    Full Text Available Microdata is a valuable source of information for research. However, publishing data about individuals for research purposes without revealing sensitive information is an important problem. The main objective of privacy preserving data mining algorithms is to obtain accurate results/rules by analyzing the maximum possible amount of data without unintended information disclosure. Data sets for analysis may reside on a centralized server or in a distributed environment. In a distributed environment, the data may be horizontally or vertically partitioned. We have developed a simple technique by which horizontally partitioned data can be used for any type of mining task without information loss. The partitioned sensitive data at 'm' different sites are transformed using a mapping table or a graded grouping technique, depending on the data type. The transformed data set is given to a third party for analysis. This party need not be trusted, but it is still allowed to perform mining operations on the data set and to release the results to all 'm' parties. The results are interpreted among the 'm' parties involved in the data sharing. Experiments conducted on real data sets show that the proposed transformation procedure preserves one hundred percent of the performance of any data mining algorithm, compared with the original data set, while preserving privacy.
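    The abstract does not specify how the mapping table is built. As an assumption, the sketch below has all 'm' sites share a secret random one-to-one mapping from sensitive categorical values to opaque codes (the seed stands in for a shared secret); the graded grouping technique for other data types is not shown, and all function names are hypothetical. Because the mapping is one-to-one, frequency-based mining results computed by the third party on codes translate back exactly to the original values.

```python
import random
from collections import Counter

def build_mapping_table(domain, seed):
    """Hypothetical shared table: a secret one-to-one map from sensitive
    values to opaque codes. Every site builds it from the same seed, which
    stands in for a secret the sites agree on out of band."""
    codes = [f"C{i}" for i in range(len(domain))]
    random.Random(seed).shuffle(codes)
    return dict(zip(sorted(domain), codes))

def transform(records, column, table):
    """Run at each site before release: replace the sensitive value by its code."""
    return [{**r, column: table[r[column]]} for r in records]

def third_party_counts(pooled, column):
    """What the untrusted third party computes: frequencies of codes only."""
    return Counter(r[column] for r in pooled)
```

    A one-to-one mapping preserves value frequencies, which is what keeps mining results intact; it also means the mapping by itself does not hide frequency information.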

  16. Littoral Assessment of Mine Burial Signatures (LAMBS) buried land mine/background spectral signature analyses

    Science.gov (United States)

    Kenton, A.C.; Geci, D.M.; Ray, K.J.; Thomas, C.M.; Salisbury, J.W.; Mars, J.C.; Crowley, J.K.; Witherspoon, N.H.; Holloway, J.H.; Harmon, R.S.; Broach, J.T.; Holloway, J.H., Jr.

    2004-01-01

    The objective of the Office of Naval Research (ONR) Rapid Overt Reconnaissance (ROR) program and the Airborne Littoral Reconnaissance Technologies (ALRT) project's LAMBS effort is to determine whether electro-optical spectral discriminants exist that are useful for the detection of land mines in littoral regions. Statistically significant buried mine overburden and background signature data were collected over a wide spectral range (0.35 to 14 μm) to identify robust spectral features that might serve as discriminants for new airborne sensor concepts. LAMBS has expanded previously collected databases to littoral areas, primarily dry and wet sandy soils, where tidal, surf, and wind conditions can severely modify spectral signatures. At AeroSense 2003, we reported completion of three buried mine collections at inland bay, Atlantic beach, and Gulf of Mexico beach sites. We now report LAMBS spectral database analysis results using metrics that characterize the detection performance of general types of spectral detection algorithms. These metrics include mean contrast, spectral signal-to-clutter, covariance, information content, and spectral matched filter analyses. Detection performance of the buried land mines was analyzed with regard to burial age, background type, and environmental conditions. These analyses considered features observed due to particle size differences, surface roughness, surface moisture, and compositional differences.

  17. On the classification techniques in data mining for microarray data classification

    Science.gov (United States)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadliest diseases: according to WHO data, cancer caused 8.8 million deaths in 2015, and this number will keep increasing every year if the disease is not addressed early. Microarray data have become one of the most popular resources for cancer identification in the health field, since they show gene expression levels in cell samples and allow thousands of genes to be analyzed simultaneously. Data mining techniques can classify microarray samples to determine whether they indicate cancer. In this paper we discuss research applying several data mining techniques to microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulate the Random Forest algorithm with dimension reduction using Relief. The paper reports a performance measure (accuracy) for each classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forest); the accuracy of Random Forest is higher than that of the other classification algorithms. We hope this paper provides some information about the speed, accuracy, performance and computational cost of each data mining classification technique on microarray data.
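    Relief, mentioned above for dimension reduction, scores each feature by how much it separates an instance from its nearest neighbour of the other class (nearest miss) compared with its nearest neighbour of the same class (nearest hit). The sketch below is a basic two-class Relief that visits every instance once instead of sampling, with range-normalized feature differences; it is an illustrative assumption, not the configuration used in the paper, and the function name is hypothetical.

```python
def relief(X, y):
    """Basic two-class Relief over all instances.
    X: list of numeric feature rows; y: class labels. Returns one weight per feature."""
    n, d = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # avoid dividing by zero

    def diff(f, a, b):
        # Feature difference scaled to [0, 1] by the feature's range.
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for f in range(d):
            # Reward features that differ on the miss, penalize those that differ on the hit.
            w[f] += (diff(f, X[i], X[m]) - diff(f, X[i], X[h])) / n
    return w
```

    Features with the lowest weights would be the first candidates to drop before training a classifier such as Random Forest.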

  18. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    Energy Technology Data Exchange (ETDEWEB)

    2016-08-22

    NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for $W$ and $H$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). Unlike previous implementations, our algorithm is also flexible: it performs well for both dense and sparse matrices, and allows the user to choose any one of multiple algorithms for solving the updates to the low-rank factors $W$ and $H$ within the alternating iterations.
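    The sketch below is a serial, single-process illustration of the alternating scheme. It uses hierarchical alternating least squares (HALS), one common way of solving the NLS updates, which updates one row of $H$ or one column of $W$ at a time in closed form and clips at zero. It has none of the MPI distribution or communication optimizations described above; the function name, random initialization, and iteration count are assumptions.

```python
import numpy as np

def nmf_hals(A, k, iters=200, seed=0, eps=1e-12):
    """Serial NMF, A ~= W @ H with W, H >= 0, by alternating HALS updates.
    Each inner step solves a one-row (or one-column) NLS subproblem exactly."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Update the rows of H with W fixed.
        WtA, WtW = W.T @ A, W.T @ W
        for i in range(k):
            H[i] = np.maximum(0.0, H[i] + (WtA[i] - WtW[i] @ H) / (WtW[i, i] + eps))
        # Update the columns of W with H fixed.
        AHt, HHt = A @ H.T, H @ H.T
        for i in range(k):
            W[:, i] = np.maximum(0.0, W[:, i] + (AHt[:, i] - W @ HHt[:, i]) / (HHt[i, i] + eps))
    return W, H
```

    Each block update exactly minimizes the squared error in one row or column, so the reconstruction error does not increase from one iteration to the next.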

  19. Anchor-Free Localization Method for Mobile Targets in Coal Mine Wireless Sensor Networks

    OpenAIRE

    Pei, Zhongmin; Deng, Zhidong; Xu, Shuo; Xu, Xiao

    2009-01-01

    Severe natural conditions and complex terrain make it difficult to apply precise localization in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed based on non-metric multidimensional scaling (MDS) and rank sequence. First, a coal mine wireless sensor network is constructed in underground mines based on ZigBee technology. Then a non-metric MDS algorithm is introduced to estimate the reference nodes’ locations. Fi...
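    The paper uses non-metric MDS, which works only from the rank order of dissimilarities. As a simpler and clearly different stand-in, the sketch below shows classical (metric) MDS, which recovers relative node coordinates, up to rotation, reflection and translation, from a matrix of pairwise distances. How such distances would be estimated in a ZigBee network is not shown, and the function name is hypothetical.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical (metric) MDS: coordinates, up to rotation, reflection and
    translation, from a symmetric matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]       # keep the `dim` largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

    Because only relative positions are recovered, an anchor-free method still needs some convention (or later reference information) to fix the orientation of the map.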

  20. An IPSO-SVM algorithm for security state prediction of mine production logistics system

    Science.gov (United States)

    Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang

    2017-06-01

    A theoretical basis for the regulation of corporate security warnings and resources is provided in order to reveal the laws behind the security state of mine production logistics. Because the mine production logistics system is complex and its variables are difficult to acquire, this paper proposes a security status prediction model for the mine production logistics system based on improved particle swarm optimization and support vector machine (IPSO-SVM). First, linear adjustment of the inertia weight and learning weights enhances convergence speed and search accuracy, to cope with the changeable complexity of the system and the difficulty of data acquisition. The improved particle swarm optimization (IPSO) is then used to resolve the problem of parameter setting in traditional support vector machines (SVM). At the same time, a security status index system is built to determine the classification standards for safety status. Finally, experimental results verify the feasibility and effectiveness of the method.
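    The "linear adjustments of inertia weight and learning weights" can be sketched as a particle swarm optimizer whose inertia weight decreases linearly while the cognitive learning factor decreases and the social one increases over the iterations. The schedule ranges (0.9 to 0.4 for inertia; 2.5 to 0.5 and 0.5 to 2.5 for the learning factors) are commonly used choices assumed here, not values from the paper, and the SVM is left out: in IPSO-SVM the objective would be a validation error over SVM parameters, whereas the test below simply minimizes a sphere function. The function name is hypothetical.

```python
import random

def ipso_minimize(f, dim, bounds, n_particles=20, iters=100, seed=0,
                  w_range=(0.9, 0.4), c1_range=(2.5, 0.5), c2_range=(0.5, 2.5)):
    """PSO with linearly varying inertia weight and learning factors."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        frac = t / max(1, iters - 1)  # 0 at the first iteration, 1 at the last
        w = w_range[0] + (w_range[1] - w_range[0]) * frac     # inertia: decreasing
        c1 = c1_range[0] + (c1_range[1] - c1_range[0]) * frac  # cognitive: decreasing
        c2 = c2_range[0] + (c2_range[1] - c2_range[0]) * frac  # social: increasing
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in bounds
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

    Early on, the large inertia and cognitive factor favour exploration; later, the larger social factor pulls particles toward the global best, which is the usual motivation for such schedules.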

  1. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    Science.gov (United States)

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  2. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining is the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, generalizes a known structure so that it can be applied to a new dataset to predict its class. Various classification algorithms are used to classify data sets; they are based on different methods such as probability, decision trees, neural networks, nearest neighbors, Boolean and fuzzy logic, and kernels. In this paper, we apply three diverse classification algorithms to ten datasets, selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistic, mean absolute error, relative absolute error, and ROC area. The comparative analysis uses accuracy, precision, and F-measure. We identify features and limitations of the classification algorithms for datasets of diverse nature.
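    Two of the measures listed above can be sketched directly from true and predicted labels: Cohen's kappa, which corrects observed agreement $p_o$ for the agreement $p_e$ expected by chance as $(p_o - p_e)/(1 - p_e)$, and precision, recall, and F-measure for one class. The function names and example labels are illustrative.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Agreement between predicted and true labels, corrected for chance."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```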

  3. Fingerprinting Localization Method Based on TOA and Particle Filtering for Mines

    Directory of Open Access Journals (Sweden)

    Boming Song

    2017-01-01

    Full Text Available Accurate target localization technology plays a very important role in ensuring safe mine production and higher production efficiency. The localization accuracy of a mine localization system is influenced by many factors, the most significant of which is the non-line-of-sight (NLOS) propagation error of the localization signal between the access point (AP) and the target node (Tag). To improve positioning accuracy, the NLOS error must be suppressed by an optimization algorithm. However, traditional optimization algorithms are complex and exhibit poor optimization performance. To solve this problem, this paper proposes a new method for mine time of arrival (TOA) localization based on the idea of comprehensive optimization. The proposed method uses particle filtering to reduce the TOA data error, and the positioning results are further optimized with fingerprinting based on the Manhattan distance. The method combines the advantages of particle filtering and fingerprinting localization, reduces algorithm complexity, and has better error suppression performance. The experimental results demonstrate that, compared with the symmetric double-sided two-way ranging (SDS-TWR) method and the received signal strength indication (RSSI) based fingerprinting method, the proposed method significantly improves localization performance and enhances environmental adaptability.
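    The fingerprinting stage can be illustrated with a weighted k-nearest-neighbour match under the Manhattan (L1) distance. In this sketch each database entry pairs a known position with a vector of TOA values from several access points; the database layout, the inverse-distance weighting, `k`, and the function names are assumptions, and the particle filtering step that would first smooth the TOA data is not shown.

```python
def manhattan(a, b):
    """L1 distance between two equally long TOA vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def fingerprint_locate(measurement, fingerprint_db, k=2):
    """Weighted k-nearest-neighbour position estimate.
    fingerprint_db: list of ((x, y), toa_vector) reference entries."""
    ranked = sorted(fingerprint_db, key=lambda e: manhattan(measurement, e[1]))[:k]
    # Inverse-distance weights; the small constant avoids division by zero
    # when the measurement matches a reference fingerprint exactly.
    weights = [1.0 / (manhattan(measurement, toa) + 1e-9) for _, toa in ranked]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, ranked)) / total
    return x, y
```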

  4. Gas Concentration Prediction Based on the Measured Data of a Coal Mine Rescue Robot

    Directory of Open Access Journals (Sweden)

    Xiliang Ma

    2016-01-01

    Full Text Available After a gas accident, the coal mine environment is complex and dangerous, so timely and effective rescue and relief work is necessary. Predicting the gas concentration ahead of a coal mine rescue robot is therefore important for ensuring that the robot can carry out its exploration and search-and-rescue mission. In this paper, a gray neural network is proposed to predict the gas concentration 10 meters ahead of the coal mine rescue robot based on the gas concentration, temperature, and wind speed at the current position and 1 meter ahead. A prediction method in which a quantum genetic algorithm optimizes the gray neural network parameters is then proposed to obtain more accurate predictions of the gas concentration in the roadway. Experimental results show that the gray neural network optimized by the quantum genetic algorithm predicts the gas concentration more accurately. The overall prediction error is 9.12%, and the largest forecasting error is 11.36%; compared with the gray neural network, the gas concentration prediction error increases by 55.23%. This means that the proposed method allows the coal mine rescue robot to predict the gas concentration in the coal mine roadway more accurately.

  5. An Improved Biclustering Algorithm and Its Application to Gene Expression Spectrum Analysis

    OpenAIRE

    Qu, Hua; Wang, Liu-Pu; Liang, Yan-Chun; Wu, Chun-Guo

    2016-01-01

    The Cheng and Church algorithm is an important biclustering approach. In this paper, the space extension process in the second stage of the Cheng and Church algorithm is improved, and the selection of two important parameters is discussed. Applying the improved algorithm to gene expression spectrum analysis shows that, compared with the Cheng and Church algorithm, the quality of the clustering results is noticeably enhanced, the mined expression models are better, and the d...
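    The quantity the Cheng and Church algorithm tries to keep small is the mean squared residue $H(I,J)$ of a submatrix with row set $I$ and column set $J$: the average of $(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2$, where the subtracted terms are the row, column, and overall means of the submatrix. The sketch below computes it; the paper's improvement to the second stage is not shown, and the function name is illustrative.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix
    selected by row indices `rows` and column indices `cols`."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    nr, nc = len(rows), len(cols)
    row_mean = [sum(r) / nc for r in sub]
    col_mean = [sum(sub[a][b] for a in range(nr)) / nr for b in range(nc)]
    all_mean = sum(row_mean) / nr
    return sum((sub[a][b] - row_mean[a] - col_mean[b] + all_mean) ** 2
               for a in range(nr) for b in range(nc)) / (nr * nc)
```

    A perfectly additive pattern (each row a shifted copy of the others) has a residue of 0, which is why a low residue marks a coherent bicluster.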

  6. Action Rules Mining

    CERN Document Server

    Dardzinska, Agnieszka

    2013-01-01

    We are surrounded by data, numerical, categorical and otherwise, which must be analyzed and processed to convert it into information that instructs, answers or aids understanding and decision making. Data analysts in many disciplines, such as business, education or medicine, are frequently asked to analyze new data sets that are often composed of numerous tables with different properties. They try to find completely new correlations between attributes and show new possibilities for users. Action Rules Mining discusses some data mining and knowledge discovery principles and then describes representative concepts, methods and algorithms connected with action. The author introduces the formal definition of an action rule, the notions of a simple association action rule and a representative action rule, and the cost of an association action rule, and gives a strategy for constructing simple association action rules of the lowest cost. A new approach for generating action rules from datasets with numerical attributes...

  7. Unsupervised classification of multivariate geostatistical data: Two algorithms

    Science.gov (United States)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, involve a growing number of variables and cover ever wider areas. It is therefore often necessary to split the study domain to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. Defining these areas can be seen as a problem of unsupervised classification, or clustering, in which we try to divide the domain into subdomains that are homogeneous with respect to the values taken by the variables at hand. Classical clustering methods, designed for independent observations, do not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to moderate sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first proceeds by agglomerative hierarchical clustering. Spatial coherence is ensured by a proximity condition that two clusters must satisfy in order to merge. This proximity condition relies on a graph organizing the data in the coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
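    The proximity condition can be sketched as agglomerative clustering in which two clusters may merge only if some edge of a neighbourhood graph on the sample locations connects them. The sketch below assumes average linkage on Euclidean distances between variable values and takes the graph as a given list of edges; the linkage choice, the graph construction, and the function name are assumptions rather than the authors' exact algorithm.

```python
import math
from itertools import product

def spatial_agglomerative(values, edges, n_clusters):
    """Agglomerative clustering where two clusters may merge only if an edge
    of a neighbourhood graph (built beforehand on sample coordinates) links them.
    values: feature tuple per sample; edges: (i, j) index pairs."""
    label = list(range(len(values)))  # current cluster id of each sample
    members = {i: [i] for i in range(len(values))}

    def linkage(a, b):
        # Average linkage: mean Euclidean distance in feature space.
        pairs = list(product(members[a], members[b]))
        return sum(math.dist(values[i], values[j]) for i, j in pairs) / len(pairs)

    while len(members) > n_clusters:
        # Only cluster pairs joined by at least one graph edge are admissible.
        candidates = {tuple(sorted((label[i], label[j])))
                      for i, j in edges if label[i] != label[j]}
        if not candidates:
            break  # no admissible merge left (e.g. a disconnected graph)
        a, b = min(sorted(candidates), key=lambda ab: linkage(*ab))
        for i in members[b]:
            label[i] = a
        members[a].extend(members.pop(b))
    return label
```

    In the test, samples 0 and 2 have similar values but are not merged directly, because no edge links them; the first admissible merge joins the spatial neighbours 1 and 2 instead.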

  8. Data mining application in industrial energy audit for lighting

    Energy Technology Data Exchange (ETDEWEB)

    Maricar, N.M.; Kim, G.C.; Jamal, N. [Kolej Univ., Melaka (Malaysia). Faculty of Electrical Engineering]

    2005-07-01

    A data mining application for lighting energy audits at industrial sites was presented. Data collection was based on the parameters needed for the analysis part of the audit, including the activity for which the room was used; its dimensions; light level readings in lux; the number of luminaires; the number of lamps per luminaire; lamp fixtures; and lamp wattage. The lumen method was used to calculate the recommended number of luminaires in the room, which was then compared with the number of luminaires in the existing system. The installed load efficacy ratio (ILER) was then used to determine the proper retrofit action to optimize energy usage. The difference between the calculated lux and the standard lux was used to create data subsets. A data mining algorithm was used to determine that the ILER plays an important role in calculating the efficiency of lighting systems. It was also concluded that the method can be used to minimize the time needed to analyze large amounts of lighting data. The results of case studies were also used to show that the combined data mining algorithm provided accurate assessments using existing calculated data. 7 refs., 8 tabs., 5 figs.
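    The lumen method referred to above is usually written as $N = E A / (\Phi \cdot UF \cdot MF)$, where $E$ is the required illuminance in lux, $A$ the room area, $\Phi$ the lumen output per luminaire, and $UF$ and $MF$ the utilisation and maintenance factors. The abstract does not give the exact variant or the factor values used, so the sketch below assumes this standard form and rounds up to whole luminaires; the function name, parameter names, and example values are hypothetical.

```python
import math

def luminaires_required(lux, length_m, width_m, lamp_lumens,
                        lamps_per_luminaire, utilisation_factor,
                        maintenance_factor):
    """Lumen method: N = E * A / (lumens per luminaire * UF * MF), rounded up."""
    area_m2 = length_m * width_m
    lumens_per_luminaire = lamp_lumens * lamps_per_luminaire
    n = (lux * area_m2) / (lumens_per_luminaire * utilisation_factor
                           * maintenance_factor)
    return math.ceil(n)  # a fractional luminaire is rounded up to a whole one
```

    For a hypothetical 10 m × 8 m room needing 500 lux, with two 3000 lm lamps per luminaire, UF = 0.5 and MF = 0.8, this gives 17 luminaires, which the audit would then compare with the number actually installed.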

  9. Mining dynamic noteworthy functions in software execution sequences.

    Science.gov (United States)

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities directly affects the quality of software, identifying and protecting them is an important premise for effective software development, management, maintenance and testing, and thus contributes to improving software quality and its attack-defending ability. Most analyses and evaluations of important entities, such as code-based static structure analysis, are detached from the actual running of the software. In this paper, from the perspective of the software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, by decompiling the software and tracking stack changes, execution traces composed of a series of function addresses were acquired. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), from which patterns were extracted by a pattern extraction (PE) algorithm. After that, the evaluation indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, the functions were sorted by their noteworthiness. The experimental results were compared with those of two traditional complex-network-based node mining methods, PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.
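    PageRank, one of the two baselines, can be computed by power iteration over a function call graph. The sketch below assumes the graph is given as a dictionary from each function to the functions it calls (a hypothetical representation of what the execution traces would yield), spreads the rank of functions with no outgoing calls uniformly, and uses the conventional damping factor 0.85. It illustrates the baseline only, not the DNFM inner-/inter-importance indicators.

```python
def pagerank(graph, damping=0.85, iters=100):
    """Power-iteration PageRank. graph: dict mapping each node to the list of
    nodes it links to (here, a function to the functions it calls)."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}  # teleport share
        for v in nodes:
            out = graph.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank
```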

  10. Earth Science Mining Web Services

    Science.gov (United States)

    Pham, Long; Lynnes, Christopher; Hegde, Mahabaleshwa; Graves, Sara; Ramachandran, Rahul; Maskey, Manil; Keiser, Ken

    2008-01-01

    To give scientists further capabilities in the area of data mining and web services, the Goddard Earth Sciences Data and Information Services Center (GES DISC) and researchers at the University of Alabama in Huntsville (UAH) have developed a system to mine data at the source without the need for network transfers. The system has been constructed by linking together several pre-existing technologies: the Simple Scalable Script-based Science Processor for Measurements (S4PM), a processing engine at the GES DISC; the Algorithm Development and Mining (ADaM) system, a data mining toolkit from UAH that can be configured in a variety of ways to create customized mining processes; ActiveBPEL, a workflow execution engine based on BPEL (Business Process Execution Language); XBaya, a graphical workflow composer; and the EOS Clearinghouse (ECHO). XBaya is used to construct an analysis workflow at UAH using ADaM components, which are also installed remotely at the GES DISC, wrapped as Web Services. The S4PM processing engine searches ECHO for data using space-time criteria and stages them to cache, allowing the ActiveBPEL engine to remotely orchestrate the processing workflow within S4PM. When mining is complete, the output is placed in an FTP holding area for the end user. The goals are to give users control over the data they want to process, while mining data at the data source using the server's resources rather than transferring the full volume over the internet. These diverse technologies have been infused into a functioning, distributed system with only minor changes to the underlying technologies. The key to the infusion is the loosely coupled, Web Services-based architecture: all of the participating components are accessible (one way or another) through SOAP (Simple Object Access Protocol)-based Web Services.

  11. Automatic detection of referral patients due to retinal pathologies through data mining.

    Science.gov (United States)

    Quellec, Gwenolé; Lamard, Mathieu; Erginay, Ali; Chabouis, Agnès; Massin, Pascale; Cochener, Béatrice; Cazuguel, Guy

    2016-04-01

    With the increased prevalence of retinal pathologies, automating their detection is becoming more and more relevant. In the past few years, many algorithms have been developed for the automated detection of a specific pathology, typically diabetic retinopathy, using eye fundus photography. No matter how good these algorithms are, we believe many clinicians would not use automatic detection tools that focus on a single pathology and ignore any other pathology present in the patient's retinas. To solve this issue, an algorithm for characterizing the appearance of abnormal retinas, as well as the appearance of normal ones, is presented. This algorithm does not focus on individual images: it considers examination records consisting of multiple photographs of each retina, together with contextual information about the patient. Specifically, it relies on data mining to learn diagnosis rules from characterizations of fundus examination records. The main novelty is that the content of examination records (images and context) is characterized at multiple levels of spatial and lexical granularity: 1) spatial flexibility is ensured by an adaptive decomposition of composite retinal images into a cascade of regions, 2) lexical granularity is ensured by an adaptive decomposition of the feature space into a cascade of visual words. This multigranular representation allows great flexibility in automatically characterizing normality and abnormality: it is possible to generate diagnosis rules whose precision and generalization ability can be traded off depending on data availability. A variation on usual data mining algorithms, originally designed to mine static data, is proposed so that contextual and visual data at adaptive granularity levels can be mined. This framework was evaluated on e-ophtha, a dataset of 25,702 examination records from the OPHDIAT screening network, as well as on the publicly available Messidor dataset. It was successfully

  12. Time delay and profit accumulation effect on a mine-based uranium market clearing model

    International Nuclear Information System (INIS)

    Auzans, Aris; Teder, Allan; Tkaczyk, Alan H.

    2016-01-01

    Highlights: • An improved version of a mine-based uranium market clearing model for the front-end uranium market and enrichment industries is proposed. • A profit accumulation algorithm and time delay function provide a more realistic uranium mine decision-making process. • Operational decision delay increased uranium market price volatility. - Abstract: The mining industry faces a number of challenges, such as market volatility, investment safety, and issues surrounding employment and productivity. Computer simulations are therefore highly relevant for reducing the financial risks associated with these challenges. In the mining industry, each firm must compete with other mines, and the basic target is profit maximization. The aim of this paper is to evaluate the world uranium (U) supply by simulating the financial management challenges that an individual U mine faces as a result of a variety of regulatory issues. In this paper, a front-end nuclear fuel cycle tool is used to simulate market conditions and their effects on the stability of U supply. An individual U mine’s exit from or entry into the market might change the U supply side, which can increase or decrease the market price. In this paper we offer a more advanced version of a mine-based U market clearing model. The existing U market model incorporates the market of primary U from uranium mines with secondary uranium (depleted uranium, DU), highly enriched uranium (HEU) and enrichment services. In the model, each uranium mine acts as an independent agent that is able to make operational decisions based on the market price. This paper introduces a more realistic decision-making algorithm for an individual U mine that adds constraints to production decisions. The authors added an accumulated profit model, which allows accumulated profits to cover possible future economic losses, and a time-delay algorithm to simulate the delayed process of reopening a U mine. The U market simulation covers the time period 2010
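    The abstract does not spell out the decision rules, so the following is a hypothetical single-mine agent that only illustrates the two ideas named above. It assumes a mine stays open through losses as long as its accumulated profit can absorb them, and that once closed it restarts only after the price has covered its unit cost for `reopen_delay` consecutive periods. Every name, rule, and number here is an assumption for illustration, not the authors' model.

```python
def simulate_mine(prices, unit_cost, output, reopen_delay, initial_profit=0.0):
    """Hypothetical mine agent with profit accumulation and a reopening delay.
    Returns the open/closed state after each period and the final accumulated profit."""
    accumulated = initial_profit
    is_open = True
    wait = 0  # consecutive profitable-price periods observed while closed
    history = []
    for price in prices:
        if is_open:
            margin = (price - unit_cost) * output
            if margin < 0 and accumulated + margin < 0:
                is_open = False  # losses can no longer be covered: shut down
                wait = 0
            else:
                accumulated += margin  # profits build a buffer; losses draw it down
        else:
            wait = wait + 1 if price >= unit_cost else 0
            if wait >= reopen_delay:
                is_open = True  # reopens after the delay; produces from next period
        history.append(is_open)
    return history, accumulated
```

    Without the accumulated-profit buffer the mine in the test would close at the first loss-making period, and without the delay it would reopen as soon as the price recovered; both effects change how quickly supply responds to price.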

  13. Time delay and profit accumulation effect on a mine-based uranium market clearing model

    Energy Technology Data Exchange (ETDEWEB)

    Auzans, Aris [Institute of Physics, University of Tartu, Ostwaldi 1, EE-50411 Tartu (Estonia); Teder, Allan [School of Economics and Business Administration, University of Tartu, Narva mnt 4, EE-51009 Tartu (Estonia); Tkaczyk, Alan H., E-mail: alan@ut.ee [Institute of Physics, University of Tartu, Ostwaldi 1, EE-50411 Tartu (Estonia)

    2016-12-15

    Highlights: • An improved version of a mine-based uranium market clearing model for the front-end uranium market and enrichment industries is proposed. • A profit accumulation algorithm and time delay function provide a more realistic uranium mine decision-making process. • Operational decision delay increased uranium market price volatility. - Abstract: The mining industry faces a number of challenges, such as market volatility, investment safety, and issues surrounding employment and productivity. Computer simulations are therefore highly relevant for reducing the financial risks associated with these challenges. In the mining industry, each firm must compete with other mines, and the basic target is profit maximization. The aim of this paper is to evaluate the world uranium (U) supply by simulating the financial management challenges that an individual U mine faces as a result of a variety of regulatory issues. In this paper, a front-end nuclear fuel cycle tool is used to simulate market conditions and their effects on the stability of U supply. An individual U mine’s exit from or entry into the market might change the U supply side, which can increase or decrease the market price. In this paper we offer a more advanced version of a mine-based U market clearing model. The existing U market model incorporates the market of primary U from uranium mines with secondary uranium (depleted uranium, DU), highly enriched uranium (HEU) and enrichment services. In the model, each uranium mine acts as an independent agent that is able to make operational decisions based on the market price. This paper introduces a more realistic decision-making algorithm for an individual U mine that adds constraints to production decisions. The authors added an accumulated profit model, which allows accumulated profits to cover possible future economic losses, and a time-delay algorithm to simulate the delayed process of reopening a U mine. The U market simulation covers the time period 2010

  14. Reading the Anthropocene through science and apocalypse in the selected contemporary fiction of J.G. Ballard, Kurt Vonnegut, Cormac McCarthy and Ian McEwan

    OpenAIRE

    Fevyer, David

    2016-01-01

    This thesis examines how six contemporary novels variously intervene in the current crisis of climate change. Through close readings of J G Ballard’s The Drowned World (1962) and Hello America (1981); Cormac McCarthy’s The Road (2006); Kurt Vonnegut’s Cat’s Cradle (1963) and Galapagos (1985); and Ian McEwan’s Solar (2010), the thesis aims to identify how the narrative and generic resources of contemporary fiction might help readers to think through and beyond the consequences of anthropocentr...

  15. Mine drivage in hydraulic mines

    Energy Technology Data Exchange (ETDEWEB)

    Ehkber, B Ya

    1983-09-01

    From 20 to 25% of labor cost in hydraulic coal mines falls on mine drivage. The volume of mine drivage is high because of the large number of shortwalls mined by hydraulic monitors. Reducing mining cost in hydraulic mines therefore depends on lowering drivage cost, either by introducing new drivage systems or by increasing the efficiency of the systems currently in use. The following drivage methods used in hydraulic mines are compared: heading machines with hydraulic haulage of cut rock and coal, hydraulic monitors with hydraulic haulage, and drilling and blasting with hydraulic haulage of blasted rock. The mining and geologic conditions that influence the selection of the optimum drivage system are analyzed. Standardized cross sections of mine roadways driven by the 3 methods are shown in diagrams. The support systems used in mine roadways are compared: timber supports, roof bolts, roof bolts with steel elements, and roadways driven in rock without support. Heading machines (K-56MG, GPKG, 4PU, PK-3M) and hydraulic monitors (GMDTs-3M, 12GD-2) used for mine drivage are described. Data on mine drivage in hydraulic coal mines in the Kuzbass are discussed. From 40 to 46% of roadways are driven by heading machines with hydraulic haulage and from 12 to 15% by hydraulic monitors with hydraulic haulage.

  16. Web mining in soft computing framework: relevance, state of the art and future directions.

    Science.gov (United States)

    Pal, S K; Talwar, V; Mitra, P

    2002-01-01

    The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reasons for treating Web mining as a field separate from data mining are explained. The limitations of some existing Web mining methods and tools are set out, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft Web mining" is provided, along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. The scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.

  17. Sex Complexity and Politics in Black Dogs by Ian McEwan

    Science.gov (United States)

    Abbasiyannejad, Mina; Talif, Rosli

    Ian McEwan's Black Dogs (BD) is a story of socio-political conflict during the critical era of the Cold War. Black Dogs is riddled with party-political domination and its outcomes in society. Europe is still suffering the consequences of the Second World War, perhaps the biggest war of the twentieth century. In the aftermath of such worldwide upheaval, the conflicts that accompanied the scramble for political domination emerged in diverse ways, affecting nations and their populations. Systematic sexual assault during the war years showed that sex was used both for intimidation and for humiliation. This study attempts to depict the multidimensional aspects of politics that are related in practice to the most intimate human relationship, that is, sex. It shows how the personal is equated with the political and vice versa. The theory of sexual politics is the theoretical framework used to scrutinize power-structured relationships. By reviewing the major conflicts in such a scenario, such as the Cold War and societal restrictions, this study concludes that conflict in the macrocosm (world and society) affects the microcosm (individual) in McEwan's Black Dogs. It provides a broad picture of politics and sexuality and highlights the stresses that wider society places on dysfunctional human relationships. Rape as a tactic of war for a political goal demonstrates another aspect of sex. By reviewing the period in which the story takes place and relating it to the conflicts in society, the study goes beyond simple cause-and-effect problems among individuals and portrays a holistic view of sexuality and society.

  18. Automated system of monitoring and positioning of functional units of mining technological machines for coal-mining enterprises

    Directory of Open Access Journals (Sweden)

    Meshcheryakov Yaroslav

    2018-01-01

    This article is devoted to the development of an automated monitoring and positioning system for the functional units of mining technological machines. It describes the structure, component base, and algorithms for identifying the operating states of a walking excavator; the various types of errors in microelectromechanical (MEMS) gyroscopes and accelerometers; and methods for correcting them based on the Madgwick fusion filter. The results of industrial tests of the automated monitoring and positioning system at one of the opencast coal mines of the Kuzbass are presented. This work is addressed to specialists in the development of embedded and control systems, radio electronics, mechatronics, and robotics.
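
    The Madgwick filter fuses gyroscope and accelerometer data by integrating the gyroscope rate and then taking a normalised gradient-descent step that pulls the estimated gravity direction toward the measured one. A minimal sketch of the 6-axis (no magnetometer) form follows, assuming readings in rad/s and a fixed sample period; the function names and gain values are illustrative, and this is not the authors' implementation.

```python
import math


def madgwick_imu_update(q, gyro, accel, beta, dt):
    """One step of a Madgwick-style gradient-descent IMU filter.

    q: orientation quaternion (q0, q1, q2, q3), unit norm
    gyro: angular rate (gx, gy, gz) in rad/s
    accel: accelerometer reading (ax, ay, az), any scale
    beta: gain weighting the accelerometer correction
    dt: sample period in seconds
    """
    q0, q1, q2, q3 = q
    gx, gy, gz = gyro
    # Quaternion derivative from the gyroscope: 0.5 * q (x) (0, gx, gy, gz).
    dq = [
        0.5 * (-q1 * gx - q2 * gy - q3 * gz),
        0.5 * (q0 * gx + q2 * gz - q3 * gy),
        0.5 * (q0 * gy - q1 * gz + q3 * gx),
        0.5 * (q0 * gz + q1 * gy - q2 * gx),
    ]
    norm_a = math.sqrt(sum(a * a for a in accel))
    if norm_a > 0.0:  # skip the correction when the accelerometer reads zero
        ax, ay, az = (a / norm_a for a in accel)
        # Error between gravity predicted by q and the measured direction.
        f = [
            2.0 * (q1 * q3 - q0 * q2) - ax,
            2.0 * (q0 * q1 + q2 * q3) - ay,
            2.0 * (0.5 - q1 * q1 - q2 * q2) - az,
        ]
        # Jacobian of f with respect to (q0, q1, q2, q3).
        jac = [
            [-2.0 * q2, 2.0 * q3, -2.0 * q0, 2.0 * q1],
            [2.0 * q1, 2.0 * q0, 2.0 * q3, 2.0 * q2],
            [0.0, -4.0 * q1, -4.0 * q2, 0.0],
        ]
        grad = [sum(jac[r][c] * f[r] for r in range(3)) for c in range(4)]
        norm_g = math.sqrt(sum(g * g for g in grad))
        if norm_g > 0.0:
            dq = [d - beta * g / norm_g for d, g in zip(dq, grad)]
    # Integrate over one sample period and renormalise.
    q = [qi + di * dt for qi, di in zip(q, dq)]
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return [qi / norm_q for qi in q]


def predicted_gravity(q):
    """Gravity direction in the sensor frame implied by quaternion q."""
    q0, q1, q2, q3 = q
    return [2.0 * (q1 * q3 - q0 * q2),
            2.0 * (q0 * q1 + q2 * q3),
            q0 * q0 - q1 * q1 - q2 * q2 + q3 * q3]
```

    A larger `beta` trusts the accelerometer more and corrects gyroscope drift faster, at the cost of passing more vibration into the estimate; on a walking excavator that trade-off would need tuning.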

  19. Methodology of simulation of underground working in metal mines. Application to a uranium deposit in Australia

    International Nuclear Information System (INIS)

    Deraisme, J.; de Fouquet, C.; Fraisse, H.

    1983-01-01

    For the Ben Lomond (Northern Queensland, Australia) underground uranium mining project, studies were carried out to compare the feasibility of different mining methods according to their cost per ton and selectivity: cut and fill, sublevel stoping, and a mix of both. First, a geostatistical orebody model was built; the ore grade variability of this model derives from the structural analysis of the drillhole data. Working on two-dimensional vertical cross sections, the usual hand-drawn stope reserve estimate, obtained with computer-assisted design for each of the three mining methods, is compared with the results of automatic algorithms adapted to the characteristics of each mining method. These algorithms use mathematical morphology to reproduce the geometric constraints associated with each mining method, and/or dynamic programming. These techniques lead to fully automatic, economically optimal stope design. The comparison is positive: the automatic stope designs agree with the hand-made drawings, but they can be defined faster through interactive querying of the computer, and the total maximum profit obtained is at least as high as the best profit found in the hand-designed projects [fr]
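
    The abstract does not give the dynamic programming formulation, so the following is only a simplified illustration of the idea on one row of blocks: choose non-overlapping stopes, each at least a minimum number of blocks long, so that the summed block values are maximal. The function name, the block values and the minimum-length constraint are hypothetical stand-ins for the real geometric constraints.

```python
def best_stopes(values, min_length):
    """Choose non-overlapping stopes (runs of >= min_length consecutive
    blocks) along one row of blocks to maximise total value.

    values: economic value of each block (ore value minus cost)
    min_length: hypothetical minimum stope length, in blocks
    Returns (total_value, [(start, end), ...]) with end exclusive.
    """
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    best = [0.0] * (n + 1)      # best[i]: optimum using blocks 0..i-1
    choice = [None] * (n + 1)   # start of the stope ending at i, or None
    for i in range(1, n + 1):
        best[i], choice[i] = best[i - 1], None   # leave block i-1 unmined
        for j in range(0, i - min_length + 1):
            candidate = best[j] + prefix[i] - prefix[j]
            if candidate > best[i]:
                best[i], choice[i] = candidate, j
    # Walk back through the choices to recover the stope limits.
    stopes, i = [], n
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            stopes.append((choice[i], i))
            i = choice[i]
    return best[n], stopes[::-1]
```

    With block values `[-1, 5, 5, -1, -10, 3, 3, 3]` and a two-block minimum, the optimum is 19, from stopes covering blocks 1 to 2 and 5 to 7; raising the minimum to three blocks forces lower-grade blocks into the stopes and lowers the optimum to 18.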

  20. Mathematical model for water quality impact assessment and its computer application in coal mine water

    International Nuclear Information System (INIS)

    Sundararajan, M.; Chakraborty, M.K.; Gupta, J.P.; Saxena, N.C.; Dhar, B.B.

    1994-01-01

    This paper presents a mathematical model for assessing the water quality impact in coal mines or river systems by an accurate and rational method. An algorithm, a flowchart and a computer programme based on this model have been developed to assess the quality of coal mine water. 3 refs., 2 figs., 2 tabs

  1. Chaotically encoded particle swarm optimization algorithm and its applications

    International Nuclear Information System (INIS)

    Alatas, Bilal; Akin, Erhan

    2009-01-01

    This paper proposes a novel particle swarm optimization (PSO) algorithm, the chaotically encoded particle swarm optimization algorithm (CENPSOA), based on the notion of chaos numbers, which were recently proposed to give numbers a new meaning. Various chaos arithmetic operations and evaluation measures that can be used in CENPSOA are described. Furthermore, CENPSOA has been designed to be used effectively in data mining applications.
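
    The chaos-number encoding of CENPSOA is not described in enough detail here to reproduce. As a plainly different but related technique, the sketch below is a standard PSO in which the uniform random coefficients are replaced by a logistic-map sequence, a common chaotic-PSO variant. Parameter values are illustrative assumptions, chosen conservatively because logistic-map values (mean 1/2, variance 1/8) are more spread out than uniform ones.

```python
import random


def logistic_stream(x0=0.1234, r=4.0):
    """Chaotic sequence in (0, 1) from the logistic map x <- r x (1 - x)."""
    x = x0
    while True:
        x = r * x * (1.0 - x)
        if not 1e-12 < x < 1.0 - 1e-12:   # guard against collapse to 0 or 1
            x = x0
        yield x


def chaotic_pso(objective, dim, bounds, n_particles=20, iters=300,
                w=0.6, c1=1.4, c2=1.4, seed=0):
    """Minimise `objective` with PSO whose random coefficients are drawn
    from a logistic map (a generic chaotic-PSO variant, not CENPSOA)."""
    rng = random.Random(seed)
    chaos = logistic_stream()
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = next(chaos), next(chaos)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```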

  2. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Abstract Background Automatic extraction of motifs from biological sequences is an important research problem in molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because a huge number of combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space and so increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues do not always come from a single region of a protein sequence, and restricting motif symbols to clusters corresponds to the observation that short motifs are frequently present within protein families. To discover W-patterns efficiently for large-scale sequence annotation and function prediction, this paper first formally introduces the problem and then proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions), which incorporates several pruning strategies to greatly reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far apart in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and the performance of WildSpan.
The protein-based mining mode
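
    A W-pattern (blocks of motif symbols separated by large, variable gaps) can be matched against a single sequence with a regular expression. The sketch below assumes a hypothetical representation, blocks written over the one-letter amino-acid alphabet with '.' as a single wildcard and one (min, max) range per gap; it only tests for occurrences and does not implement WildSpan's mining or pruning.

```python
import re


def w_pattern_regex(blocks, gaps):
    """Build a regular expression for a structured motif.

    blocks: motif blocks over the amino-acid alphabet, '.' = any residue
    gaps: (min, max) length of the wildcard region between consecutive blocks
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap range between consecutive blocks")
    parts = [blocks[0]]
    for (lo, hi), block in zip(gaps, blocks[1:]):
        parts.append(".{%d,%d}" % (lo, hi))   # variable-length wildcard gap
        parts.append(block)
    return re.compile("".join(parts))


def find_w_pattern(sequence, blocks, gaps):
    """Return (start, end) of the first match in `sequence`, or None."""
    match = w_pattern_regex(blocks, gaps).search(sequence)
    return match.span() if match else None
```

    For example, the blocks `C..C` and `H...H` separated by a gap of 3 to 10 residues match positions 2 to 18 of `MKCAGCLLLLLLLHAAAH`, but not if the gap is limited to 0 to 2 residues.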

  3. Improvement on LEACH Agreement of Mine Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Yun-xiang Liu

    2017-05-01

    Based on the characteristics of wireless sensor network communication in mines, the clustering of the LEACH protocol is optimized, with energy and distance factors fully taken into account. The selection of cluster head nodes is optimized, and a routing algorithm based on K-means++ clustering is proposed. This effectively alleviates the uneven distribution of cluster head nodes, the uneven energy consumption, and the poor network stability of the LEACH algorithm. Simulation results show that the proposed algorithm reduces the energy consumption of the whole network, improves the energy utilization rate, and effectively extends the network lifetime.
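
    The combination described above, K-means++ clustering followed by energy- and distance-aware cluster head selection, can be sketched as follows. K-means++ seeding and Lloyd iterations are standard; the head-selection score (residual energy divided by one plus the distance to the cluster centroid) is a hypothetical choice, not the paper's formula.

```python
import math
import random


def kmeans_pp_cluster_heads(nodes, energy, k, iters=20, seed=0):
    """Cluster sensor nodes with k-means++ seeding and pick one head per
    cluster, favouring high residual energy and closeness to the centroid.

    nodes: list of (x, y) node positions
    energy: residual energy per node (same order as nodes)
    Returns (labels, heads) where heads[c] is the node index heading cluster c.
    """
    rng = random.Random(seed)
    # k-means++ seeding: each new centre is sampled with probability
    # proportional to its squared distance from the nearest existing centre.
    centres = [rng.choice(nodes)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in nodes]
        centres.append(rng.choices(nodes, weights=d2)[0])
    # Lloyd iterations: assign to the nearest centre, then recompute centres.
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in nodes]
        for c in range(k):
            members = [p for p, lab in zip(nodes, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    # Hypothetical head-selection score: energy discounted by distance.
    heads = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        heads.append(max(idx, key=lambda i: energy[i] / (1.0 + math.dist(nodes[i], centres[c])))
                     if idx else None)
    return labels, heads
```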

  4. Gain ratio based fuzzy weighted association rule mining classifier for ...

    Indian Academy of Sciences (India)

    association rule mining algorithm for extracting both association rules and member- .... The disadvantage of this work is in considering the generalization at each ... If the new attribute is entered, the generalization process does not consider the ...
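
    The snippet does not show how the gain ratio is used in the classifier. For reference, the standard C4.5 gain ratio (information gain divided by split information) for a discrete attribute can be computed as below; the fuzzy weighting from the paper is not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """C4.5-style gain ratio of a discrete attribute with respect to labels."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalises attributes with many small partitions.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0
```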

  5. An Adaptive Sensor Mining Framework for Pervasive Computing Applications

    Science.gov (United States)

    Rashidi, Parisa; Cook, Diane J.

    Analyzing sensor data in pervasive computing applications brings unique challenges to the KDD community. The challenge is heightened when the underlying data source is dynamic and the patterns change. We introduce a new adaptive mining framework that detects patterns in sensor data and, more importantly, adapts to changes in the underlying model. In our framework, the frequent and periodic patterns in the data are first discovered by the Frequent and Periodic Pattern Miner (FPPM) algorithm; any changes in the discovered patterns over the lifetime of the system are then detected by the Pattern Adaptation Miner (PAM) algorithm, so that the framework can adapt to the changing environment. This framework also captures vital context information present in pervasive computing applications, such as startup triggers and temporal information. In this paper, we describe our mining framework and validate the approach using data collected in the CASAS smart home testbed.
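
    FPPM and PAM are not specified in the abstract. As a much-simplified illustration of the first idea, finding frequent patterns and then checking whether they recur periodically, the sketch counts contiguous event sequences in a timestamped sensor log and flags a pattern as periodic when its inter-occurrence gaps stay within a tolerance of their mean. All names and thresholds are hypothetical.

```python
from collections import defaultdict


def frequent_periodic_patterns(events, length, min_support, max_jitter):
    """Find frequent contiguous event patterns and flag the periodic ones.

    events: list of (timestamp, sensor_event) sorted by time
    length: pattern length, in consecutive events
    min_support: minimum number of occurrences to count as frequent
    max_jitter: max deviation of inter-occurrence gaps from their mean
    Returns {pattern: mean period, or None if not periodic}.
    """
    occurrences = defaultdict(list)
    for i in range(len(events) - length + 1):
        pattern = tuple(e for _, e in events[i:i + length])
        occurrences[pattern].append(events[i][0])   # start time of occurrence
    result = {}
    for pattern, times in occurrences.items():
        if len(times) < min_support:
            continue
        gaps = [b - a for a, b in zip(times, times[1:])]
        mean_gap = sum(gaps) / len(gaps) if gaps else None
        periodic = gaps and all(abs(g - mean_gap) <= max_jitter for g in gaps)
        result[pattern] = mean_gap if periodic else None
    return result
```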

  6. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    The decision tree is one of the best-known classification methods in data mining. Many studies have focused on improving the performance of decision trees. However, those algorithms were developed for and run on traditional distributed systems, and without the help of new technology their latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes. To improve data processing latency in large-scale data mining, this paper designs and implements a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.
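
    In decision tree induction, much of the computation goes into scoring candidate split points, and each candidate can be scored independently, which is the kind of work that maps well to GPU threads while the CPU drives the tree construction. The sequential sketch below shows that step for one numeric attribute and a binary label using the Gini index; it is illustrative only and is neither CUDT's code nor a CUDA program.

```python
def best_gini_split(values, labels):
    """Best threshold on one numeric attribute for a binary (0/1) label.

    Sorts once, then scores every boundary with prefix counts; each
    boundary's score is independent of the others.
    Returns (threshold, weighted_gini), or None if no split exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best = None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        left_n, right_n = i, n - i
        right_pos = total_pos - left_pos
        gini_left = 1 - (left_pos / left_n) ** 2 - (1 - left_pos / left_n) ** 2
        gini_right = 1 - (right_pos / right_n) ** 2 - (1 - right_pos / right_n) ** 2
        score = (left_n * gini_left + right_n * gini_right) / n
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best
```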

  7. Data mining with SPSS modeler theory, exercises and solutions

    CERN Document Server

    Wendler, Tilo

    2016-01-01

    Introducing the IBM SPSS Modeler, this book guides readers through data mining processes and presents relevant statistical methods. There is a special focus on step-by-step tutorials and well-documented examples that help demystify complex mathematical algorithms and computer programs. The variety of exercises and solutions as well as an accompanying website with data sets and SPSS Modeler streams are particularly valuable. While intended for students, the simplicity of the Modeler makes the book useful for anyone wishing to learn about basic and more advanced data mining, and put this knowledge into practice.

  8. Neutronic calculations in core conversion of the IAN-R1 research reactor from MTR HEU to TRIGA LEU fuel

    International Nuclear Information System (INIS)

    Sarta Fuentes, Jose A.; Castiblanco, L.A.

    2003-01-01

    With the cooperation of the International Atomic Energy Agency (IAEA), neutronic calculations were carried out for the conversion of the IAN-R1 reactor from MTR-HEU fuel to TRIGA-LEU fuel. A programme was established to build a neutronic calculation team at the Instituto de Ciencias Nucleares y Energías Alternativas (INEA). This programme included training, the acquisition of hardware and software, and calculations for the core with MTR-HEU fuel, nominally enriched to 93%, and for several arrangements with TRIGA-LEU fuel, enriched to 19.7%. The results were verified and compared with those of several calculation groups at the Instituto Nacional de Investigaciones Nucleares (ININ) in Mexico and General Atomics (GA) in the United States. As a result of this programme, several technical reports have been written. (author)

  9. AN INNOVATIVE WEB MINING APPLICATION ON BLOGS - A LAYOUT

    Directory of Open Access Journals (Sweden)

    S. Prakash

    2012-01-01

    Blogs and Web services allow users to express their opinions and interests in the form of short text messages that give brief, highly personalized remarks in real time. Recognizing emotion is highly significant for a text-based communication tool such as blogs. Nowadays, user opinions in the form of comments and reviews in blogs are used by researchers for various purposes; among these, the application of sentiment analysis techniques to these opinions is an interesting one. This paper proposes a software architecture for constructing Web mining applications in the blog world. The design includes blog crawling and data mining algorithms, and offers a full-fledged and flexible solution for constructing general-purpose Web mining applications. The architecture allows several significant customizations, such as the construction of adapters for reading text from different blogs and the use of different pre-processing methods and data mining procedures. The paper focuses on explaining the innovative software architecture of the general framework, giving detailed information about the data mining sub-framework.

  10. PRESEE: an MDL/MML algorithm to time-series stream segmenting.

    Science.gov (United States)

    Xu, Kaikuo; Jiang, Yexi; Tang, Mingjie; Yuan, Changan; Tang, Changjie

    2013-01-01

    Time-series streams are one of the most common data types in the data mining field, prevalent in areas such as the stock market, ecology, and medical care. Segmentation is a key step in accelerating time-series stream mining. Previous segmentation algorithms mainly focused on improving precision and paid little attention to efficiency. Moreover, the performance of these algorithms depends heavily on parameters that are hard for users to set. In this paper, we propose PRESEE (parameter-free, real-time, and scalable time-series stream segmenting algorithm), which greatly improves the efficiency of time-series stream segmentation. PRESEE is based on both the MDL (minimum description length) and MML (minimum message length) methods, which allow it to segment the data automatically. To evaluate the performance of PRESEE, we conducted several experiments on time-series streams of different types and compared it with the state-of-the-art algorithm. The empirical results show that PRESEE is very efficient for real-time stream datasets, improving segmentation speed nearly tenfold. The novelty of this algorithm is further demonstrated by applying PRESEE to segment real-time stream datasets from the ChinaFLUX sensor network data stream.
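
    PRESEE's exact MDL/MML coding scheme is not given here. The related idea that each additional segment must pay for itself can be illustrated with an offline, penalised piecewise-constant segmentation solved by dynamic programming, where the per-segment penalty plays the role of the model-description cost. This is a swapped-in technique for illustration: unlike PRESEE, it needs a penalty parameter and is not streaming.

```python
def segment(series, penalty):
    """Optimal piecewise-constant segmentation by dynamic programming.

    Minimises (sum of squared errors) + penalty * (number of segments).
    Returns the start index of every segment.
    """
    n = len(series)
    s1, s2 = [0.0], [0.0]          # prefix sums of x and x^2
    for x in series:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Squared error of series[i:j] around its own mean."""
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    cost = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):         # last segment is series[i:j]
            c = cost[i] + sse(i, j) + penalty
            if c < cost[j]:
                cost[j], prev[j] = c, i
    starts, j = [], n
    while j > 0:                   # trace segment starts back from the end
        starts.append(prev[j])
        j = prev[j]
    return starts[::-1]
```

    A step series such as `[0, 0, 0, 0, 10, 10, 10, 10]` splits at index 4 with a small penalty, while a very large penalty makes the single segment cheaper; choosing that penalty is exactly the parameter-setting burden PRESEE aims to remove.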

  11. An Interactive Personalized Recommendation System Using the Hybrid Algorithm Model

    Directory of Open Access Journals (Sweden)

    Yan Guo

    2017-10-01

    With the rapid development of e-commerce, the conflict between disordered business information and customer demand is increasingly prominent. This study aims to make e-commerce shopping more convenient and avoid information overload by means of an interactive personalized recommendation system using a hybrid algorithm model. The proposed model first uses various recommendation algorithms to obtain lists of original recommendation results. Combining these with the customer's interactive feedback, it then establishes the weights of the corresponding recommendation algorithms. Finally, the combination formula of evidence theory is used to fuse the original results into the final recommended products. The recommendation performance of the proposed method is compared with that of traditional methods. The results of an experimental study with a Taobao online dress shop show that the proposed method improves consumer coverage, consumer discovery accuracy and recommendation recall. The hybrid recommendation algorithm combines the advantages of the existing recommendation algorithms in data mining. The interactive weight-assignment method meets consumer demand better and solves the problem of information overload. Our study also offers important implications for e-commerce platform providers regarding the design of product recommendation systems.
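
    The fusion step can be illustrated with Dempster's rule of combination from evidence theory. In the sketch, each recommender's output is treated as a mass function over candidate products, and the feedback-derived weight of each recommender is applied by Shafer discounting; this weighting scheme, the function names and the product labels are assumptions, not the paper's exact formulas.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozenset hypotheses to masses summing to 1.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


def discount(m, reliability, frame):
    """Shafer discounting: scale masses by reliability, rest goes to the frame."""
    out = {h: reliability * w for h, w in m.items()}
    out[frame] = out.get(frame, 0.0) + (1.0 - reliability)
    return out
```

    Discounting moves part of a weaker recommender's belief onto the whole product set (ignorance), so its opinion counts for less in the combination without being discarded.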

  12. Multi-objective optimization of HVAC system with an evolutionary computation algorithm

    International Nuclear Information System (INIS)

    Kusiak, Andrew; Tang, Fan; Xu, Guanglin

    2011-01-01

    A data-mining approach for the optimization of an HVAC (heating, ventilation, and air conditioning) system is presented. A predictive model of the HVAC system is derived by data-mining algorithms, using a dataset collected from an experiment conducted at a research facility. To minimize energy use while keeping the corresponding IAQ (indoor air quality) within a user-defined range, a multi-objective optimization model is developed. The solutions of this model are set points of the control system derived with an evolutionary computation algorithm. The controllable input variables (the supply air temperature and supply air duct static pressure set points) are generated to reduce energy use. The results produced by the evolutionary computation algorithm show that the control strategy saves energy by optimizing the operation of the HVAC system. -- Highlights: → A data-mining approach for the optimization of a heating, ventilation, and air conditioning (HVAC) system is presented. → The data used in the project were collected from an experiment conducted at an energy research facility. → The approach presented in the paper achieves significant energy savings without compromising indoor air quality. → The energy savings are accomplished by computing set points for the supply air temperature and the supply air duct static pressure.
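
    The set-point search can be sketched with a simple (mu + lambda) evolution strategy. The linear energy and CO2 models below are hypothetical stand-ins for the data-mined predictive models, and the IAQ requirement is handled with a penalty term; this single-objective penalty formulation is a simplification of the paper's multi-objective model, and every coefficient and limit is illustrative.

```python
import random

BOUNDS = [(10.0, 20.0), (0.5, 2.0)]   # supply temp (deg C), static pressure
CO2_LIMIT = 900.0                     # hypothetical IAQ limit (ppm)


def energy(temp_c, pressure):
    """Hypothetical surrogate: warmer supply air and lower pressure save energy."""
    return 50.0 - 2.0 * temp_c + 8.0 * pressure


def co2_ppm(temp_c, pressure):
    """Hypothetical surrogate: less static pressure (less ventilation) raises CO2.
    Temperature does not affect CO2 in this toy model."""
    return 1200.0 - 300.0 * pressure


def fitness(x):
    """Energy plus a large penalty for violating the IAQ constraint."""
    violation = max(0.0, co2_ppm(*x) - CO2_LIMIT)
    return energy(*x) + 1000.0 * violation


def evolve(mu=10, lam=20, generations=200, seed=1):
    """(mu + lambda) evolution strategy with Gaussian mutation."""
    rng = random.Random(seed)
    sigmas = [0.05 * (hi - lo) for lo, hi in BOUNDS]
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(mu)]
    for _ in range(generations):
        children = []
        for _ in range(lam):
            parent = rng.choice(pop)
            # Mutate every set point and clamp it to its allowed range.
            child = [min(max(v + rng.gauss(0.0, s), lo), hi)
                     for v, s, (lo, hi) in zip(parent, sigmas, BOUNDS)]
            children.append(child)
        pop = sorted(pop + children, key=fitness)[:mu]   # keep the best mu
    return pop[0]
```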

  13. An algorithm, implementation and execution ontology design pattern

    NARCIS (Netherlands)

    Lawrynowicz, A.; Esteves, D.; Panov, P.; Soru, T.; Dzeroski, S.; Vanschoren, J.

    2016-01-01

    This paper describes an ontology design pattern for modeling algorithms, their implementations and executions. This pattern is derived from the research results on data mining/machine learning ontologies, but is more generic. We argue that the proposed pattern will foster the development of

  14. Mining and mining authorities in Saarland 2016. Mining economy, mining technology, occupational safety, environmental protection, statistics, mining authority activities. Annual report

    International Nuclear Information System (INIS)

    2016-01-01

    The annual report of the Saarland Upper Mining Authority provides an insight into the activities of the mining authorities. Particular emphasis is placed on the development of hard coal mining, the safety and technology of mining, and the relationship between mining and the environment.

  15. AN EFFICIENT DATA MINING METHOD TO FIND FREQUENT ITEM SETS IN LARGE DATABASE USING TR-FCTM

    Directory of Open Access Journals (Sweden)

    Saravanan Suba

    2016-01-01

    Mining association rules in large databases is one of the most popular data mining techniques for business decision makers. Discovering frequent itemsets is the core process in association rule mining. Numerous algorithms for finding frequent patterns are available in the literature; Apriori and FP-tree are the most common. Apriori finds significant frequent items using candidate generation, at the cost of multiple database scans. FP-tree uses two database scans to find significant frequent items without candidate generation. The proposed TR-FCTM (Transaction Reduction-Frequency Count Table Method) discovers significant frequent items by generating all candidates once to form a frequency count table with a single database scan. Experimental results show that TR-FCTM outperforms Apriori and FP-tree.
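
    The frequency-count-table idea (one scan that counts every candidate itemset of each transaction) can be sketched as follows. The transaction-reduction step and the exact table layout of TR-FCTM are not described in the abstract and are not reproduced; `max_size` is a hypothetical cap that keeps the per-transaction enumeration from growing exponentially.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_one_scan(transactions, min_support, max_size=3):
    """Count every itemset (up to max_size items) of every transaction in a
    single pass, then keep those meeting min_support (absolute count)."""
    table = Counter()
    for t in transactions:               # the only pass over the database
        items = sorted(set(t))
        for k in range(1, min(max_size, len(items)) + 1):
            table.update(combinations(items, k))
    return {itemset: count for itemset, count in table.items()
            if count >= min_support}
```

    The trade-off against Apriori is visible here: no repeated scans are needed, but every subset of every transaction is materialised, which is only practical for short transactions or a small `max_size`.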

  16. Risk of hepatotoxicity associated with the use of telithromycin: a signal detection using data mining algorithms.

    Science.gov (United States)

    Chen, Yan; Guo, Jeff J; Healy, Daniel P; Lin, Xiaodong; Patel, Nick C

    2008-12-01

    With the exception of case reports, limited data are available regarding the risk of hepatotoxicity associated with the use of telithromycin. The objective was to detect a safety signal for the reporting of hepatotoxicity associated with telithromycin using 4 commonly employed data mining algorithms (DMAs). Based on the Adverse Event Reporting System (AERS) database of the Food and Drug Administration, 4 DMAs, namely the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the information component (IC), and the Gamma Poisson Shrinker (GPS), were applied to examine the association between the reporting of hepatotoxicity and the use of telithromycin. The study period was from the first quarter of 2004 to the second quarter of 2006. Reports of hepatotoxicity were identified using the preferred terms indexed in the Medical Dictionary for Regulatory Activities, and the drug name was used to identify reports regarding the use of telithromycin. A total of 226 reports describing hepatotoxicity associated with telithromycin were recorded in the AERS. A safety problem of telithromycin associated with increased reporting of hepatotoxicity was clearly detected by all 4 algorithms as early as 2005: the signal appeared in the first quarter with the ROR and the IC, in the second quarter with the PRR, and in the fourth quarter with the GPS. The 4 DMAs thus indicated a safety signal suggesting an association between the reporting of hepatotoxicity and the use of telithromycin. Given the wide use of telithromycin and the serious consequences of hepatotoxicity, clinicians should be cautious when selecting telithromycin to treat an infection. In addition, further observational studies are required to evaluate the utility of signal detection systems for the early recognition of serious, life-threatening, low-frequency drug-induced adverse events.
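
    Three of the four measures are simple functions of a 2x2 contingency table of report counts. The sketch below computes their point estimates; the IC is shown without the Bayesian shrinkage used in practice, the GPS is omitted because it requires fitting an empirical Bayes prior, and confidence intervals and signal thresholds are not computed.

```python
import math


def disproportionality(a, b, c, d):
    """Point estimates of three signal-detection measures from a 2x2 table.

    a: reports with the drug and the event
    b: reports with the drug and other events
    c: reports with other drugs and the event
    d: reports with other drugs and other events
    """
    n = a + b + c + d
    ror = (a * d) / (b * c)                         # reporting odds ratio
    prr = (a / (a + b)) / (c / (c + d))             # proportional reporting ratio
    ic = math.log2(a * n / ((a + b) * (a + c)))     # unshrunk IC point estimate
    return {"ROR": ror, "PRR": prr, "IC": ic}
```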

  17. Determining a pre-mining radiological baseline from historic airborne gamma surveys: A case study

    International Nuclear Information System (INIS)

    Bollhöfer, Andreas; Beraldo, Annamarie; Pfitzner, Kirrilly; Esparon, Andrew; Doering, Che

    2014-01-01

    Knowing the baseline level of radioactivity in areas naturally enriched in radionuclides is important in the uranium mining context for assessing radiation doses to humans and the environment both during and after mining. This information is particularly useful in rehabilitation planning and in developing closure criteria for uranium mines, as only radiation doses additional to the natural background are usually considered ‘controllable’ for radiation protection purposes. In this case study we tested whether contemporary groundtruthing of a historic airborne gamma survey could be used to determine the pre-mining radiological conditions at the Ranger mine in northern Australia. The airborne gamma survey was flown in 1976, before mining started, and was groundtruthed using ground gamma dose rate measurements made between 2007 and 2009 at an undisturbed area naturally enriched in uranium (Anomaly 2) located near the Ranger mine. Measurements of ²²⁶Ra soil activity concentration and ²²²Rn exhalation flux density at Anomaly 2 were made concurrently with the ground gamma dose rate measurements. Algorithms were developed to upscale the ground gamma data to the same spatial resolution as the historic airborne gamma survey data using a geographic information system, allowing the datasets to be compared. Linear correlation models were developed to estimate the pre-mining gamma dose rates, ²²⁶Ra soil activity concentrations, and ²²²Rn exhalation flux densities at selected areas in the greater Ranger region. The modelled levels agreed with measurements made at Ranger Orebodies 1 and 3 before mining started, and at environmental sites in the region. We conclude that our approach can be used to determine baseline radiation levels and to provide a benchmark for the rehabilitation of uranium mines or industrial sites where historical airborne gamma survey data are available and an undisturbed radiological analogue exists to groundtruth the data.
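
    The two computational steps, upscaling point measurements to the airborne grid and fitting a linear model between the datasets, can be sketched without a GIS. The square grid, the cell size and the function names are assumptions; the study's actual GIS procedure and fitted coefficients are not reproduced.

```python
from collections import defaultdict


def upscale(points, cell_size):
    """Average ground measurements (x, y, value) onto a square grid whose
    cell size matches the airborne survey resolution."""
    cells = defaultdict(list)
    for x, y, v in points:
        cells[(int(x // cell_size), int(y // cell_size))].append(v)
    return {cell: sum(vs) / len(vs) for cell, vs in cells.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

    Pairing each upscaled ground cell with the airborne value of the same cell gives the (x, y) pairs for `fit_line`; the fitted line can then translate airborne values elsewhere into estimated ground-level dose rates.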

  18. Effect of Temporal Relationships in Associative Rule Mining for Web Log Data

    Science.gov (United States)

    Mohd Khairudin, Nazli; Mustapha, Aida

    2014-01-01

    The advent of web-based applications and services has produced diverse and voluminous web log data stored in web servers, proxy servers, client machines, and organizational databases. This paper investigates the effect of a temporal attribute in relational rule mining for web log data. We incorporated time characteristics into the rule mining process and analysed the effect of various temporal parameters. The rules generated by temporal relational rule mining were then compared against those generated by classical rule mining approaches such as the Apriori and FP-Growth algorithms. The results showed that incorporating the temporal attribute yields a smaller number of rules of comparable quality. PMID:24587757
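
    One simple way to bring time into rule mining is to add a time-period item to each session before computing support and confidence. The sketch below is an illustration under assumptions (hypothetical period boundaries and page names), not the paper's temporal relational rule mining method.

```python
def add_time_item(pages, hour):
    """Turn one session into a transaction with a hypothetical time-of-day item."""
    if 6 <= hour < 12:
        period = "morning"
    elif 12 <= hour < 18:
        period = "afternoon"
    else:
        period = "night"
    return set(pages) | {"time=" + period}


def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / len(transactions)
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence
```

    In a four-session example where news pages are read only in the morning, the rule `/home -> /news` has confidence 0.5, while `/home, time=morning -> /news` has confidence 1.0.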

  19. Image-based Proof of Work Algorithm for the Incentivization of Blockchain Archival of Interesting Images

    OpenAIRE

    Billings, Jake

    2017-01-01

    A new variation of the blockchain proof-of-work algorithm is proposed to incentivize the timely execution of image processing algorithms. A sample image processing algorithm is proposed that identifies interesting images by analysing the entropy of pixel subsets within images. The efficacy of the image processing algorithm is examined using two small sets of training and test data. The interesting-image algorithm is then integrated into a simplified blockchain mining proof-of-work algorithm bas...
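
    The abstract is truncated before the integration details, so the two ingredients are shown separately: the Shannon entropy of a block of grey-level pixels (one plausible interestingness measure) and a generic hash-based proof of work that searches for a nonce. The hash function, the hex-zero difficulty rule and the function names are assumptions, not the paper's scheme.

```python
import hashlib
import math
from collections import Counter


def block_entropy(pixels):
    """Shannon entropy (bits) of the grey levels in a pixel block."""
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in Counter(pixels).values())


def mine(block_data: bytes, difficulty: int):
    """Find a nonce whose SHA-256 digest of data+nonce starts with
    `difficulty` hex zeros (a generic hash-based proof of work)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1
```

    A flat block scores 0 bits and a block split evenly between two grey levels scores 1 bit; in a combined scheme, useful work such as this entropy analysis would have to be tied to the nonce search so that it cannot be skipped.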

  20. Data mining in bioinformatics using Weka.

    Science.gov (United States)

    Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

    2004-10-12

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods, complemented by graphical user interfaces for data exploration and for the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.

  1. The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

    Directory of Open Access Journals (Sweden)

    Reza Samizade

    2018-06-01

    Classifying online texts and comments by social media users into positive and negative sentiment categories is highly important in text mining research. In this research, we applied supervised classification methods to classify Persian texts in cyberspace by sentiment. The result is a system that can decide whether a comment published in cyberspace, such as on social networks, is positive or negative. The comments published on Persian movie and movie review websites from 1392 to 1395 (Solar Hijri calendar) form the data set for this research; part of the data was used for training and the rest for testing. Before the algorithms were applied, pre-processing steps such as tokenization, stop-word removal, and n-gram extraction were applied to the texts. Naïve Bayes, neural networks (NN) and support vector machines (SVM) were used for text classification. Out-of-sample tests showed no evidence that the accuracy of the SVM approach is statistically higher than that of Naïve Bayes, or that the accuracy of Naïve Bayes is not statistically higher than that of the NN approach. However, the researchers can conclude that the accuracy of classification using the SVM approach is statistically higher than that of the NN approach at the 5% significance level.
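
    Of the three classifiers, Naive Bayes is the simplest to show. A minimal multinomial Naive Bayes with add-one smoothing, assuming documents are already tokenised and cleaned as described above, might look like this; the function names and data layout are illustrative, and n-grams could be added simply as extra tokens.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing.

    docs: list of token lists (already tokenised, stop words removed)
    """
    vocab = {tok for doc in docs for tok in doc}
    priors, counts, totals = {}, {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts[c] = Counter(tok for d in class_docs for tok in d)
        totals[c] = sum(counts[c].values())
    return vocab, priors, counts, totals


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for `doc`."""
    vocab, priors, counts, totals = model

    def score(c):
        return priors[c] + sum(
            math.log((counts[c][tok] + 1) / (totals[c] + len(vocab)))
            for tok in doc if tok in vocab)  # unseen words are ignored

    return max(priors, key=score)
```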

  2. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    Science.gov (United States)

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
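    A minimal sketch of random walk with restart (RWR) on an undirected gene network follows, assuming a plain adjacency-set representation, a uniform restart vector over the seed genes, and a restart probability of 0.7 (an illustrative value, not taken from the paper). Nodes with a higher steady-state probability are ranked as closer to the known disease genes; the integrated HPRD PPI and GGCRN network itself is not reproduced.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Rank nodes by proximity to seed genes with RWR on an undirected graph.

    adj: dict mapping node -> set of neighbour nodes (hypothetical layout)
    seeds: set of known disease genes used as the restart distribution
    """
    nodes = sorted(adj)
    # p0: restart distribution, uniform over the seed genes
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Column-normalized transition: each node spreads its mass evenly
        # to its neighbours (isolated nodes simply lose their mass)
        spread = {v: 0.0 for v in nodes}
        for u in nodes:
            if adj[u]:
                share = p[u] / len(adj[u])
                for v in adj[u]:
                    spread[v] += share
        new_p = {v: (1 - restart) * spread[v] + restart * p0[v] for v in nodes}
        if sum(abs(new_p[v] - p[v]) for v in nodes) < tol:
            return new_p
        p = new_p
    return p


# Toy path graph A - B - C - D with gene A as the only known disease gene
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(adj, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

    On this chain the scores decrease with distance from the seed, so `ranking` is `["A", "B", "C", "D"]`.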

  3. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.

    Science.gov (United States)

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods.

  4. Optimization of C4.5 algorithm-based particle swarm optimization for breast cancer diagnosis

    Science.gov (United States)

    Muslim, M. A.; Rukmana, S. H.; Sugiharti, E.; Prasetiyo, B.; Alimah, S.

    2018-03-01

    Data mining has become a basic methodology for computational applications in medical domains. It can be applied in health care, for example to the diagnosis of breast cancer, heart disease, and diabetes. Breast cancer occurs most commonly in women, with more than one million cases and nearly 600,000 deaths worldwide each year. The most effective way to reduce breast cancer deaths is early diagnosis. This study aims to determine the accuracy level of breast cancer diagnosis. The research data is the Wisconsin Breast Cancer (WBC) dataset from the UCI Machine Learning Repository. The method combines the C4.5 algorithm with Particle Swarm Optimization (PSO), which is used for feature selection and to optimize C4.5. Ten-fold cross-validation is used for validation, and the results are evaluated with a confusion matrix. Compared with the plain C4.5 algorithm, the PSO-optimized C4.5 algorithm improved by 0.88%.
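    C4.5 chooses splits by gain ratio (information gain divided by split information). The sketch below computes it for a categorical attribute only; it is an illustration under that assumption, not the paper's implementation, and it leaves out C4.5's thresholding of continuous attributes (the Wisconsin data are numeric), pruning, and the PSO-based feature selection.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for a categorical attribute: information gain / split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    # Split information: entropy of the partition itself, which penalizes
    # attributes with many distinct values
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

    An attribute that separates the classes perfectly into two equal halves scores 1.0, while an attribute with a single value scores 0.0.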

  5. Data Mining for Anomaly Detection

    Science.gov (United States)

    Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

    2013-01-01

    The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to the current state-of-the-art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations, where there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, the detection and analysis is split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied with both supervised and unsupervised learning methods. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.
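    The report approximates Kolmogorov complexity with compressors. As a hedged stand-in, the sketch below computes the normalized compression distance (NCD) with Python's `zlib`; the DZIP compressor and CiDM measure named in the abstract are not reproduced, and the byte strings are toy inputs rather than flight time series.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Compressed length serves as a computable upper bound on Kolmogorov complexity
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


a = b"0123456789" * 50
b = b"abcdefghij" * 50
```

    A segment compared with itself compresses almost for free when concatenated, so `ncd(a, a)` is much smaller than `ncd(a, b)`.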

  6. Mine Water Treatment in Hongai Coal Mines

    Science.gov (United States)

    Dang, Phuong Thao; Dang, Vu Chi

    2018-03-01

    Acid mine drainage (AMD) is recognized as one of the most serious environmental problems associated with the mining industry. Acid water, also known as acid mine drainage, forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions during coal mining. Until 2009, mine drainage in the Hongai coal mines was not treated, leading to harmful effects on humans, animals, and aquatic ecosystems. This report examines the acid mine drainage problem and techniques for treating acid mine drainage in the Hongai coal mines. In addition, the selection of treatment systems and the criteria for their design are presented.
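    For reference, the overall reaction usually given for the first stage of pyrite (FeS₂) oxidation shows where the acidity comes from; this is standard textbook stoichiometry and is not taken from the report:

```latex
\mathrm{FeS_2} + \tfrac{7}{2}\,\mathrm{O_2} + \mathrm{H_2O} \longrightarrow \mathrm{Fe^{2+}} + 2\,\mathrm{SO_4^{2-}} + 2\,\mathrm{H^{+}}
```

    Each mole of pyrite oxidized this way releases two moles of H⁺, which lowers the pH of the drainage.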

  7. VRLane: a desktop virtual safety management program for underground coal mine

    Science.gov (United States)

    Li, Mei; Chen, Jingzhu; Xiong, Wei; Zhang, Pengpeng; Wu, Daozheng

    2008-10-01

    VR technologies, which generate immersive, interactive, three-dimensional (3D) environments, are seldom applied to coal mine safety management. In this paper, a new method combining VR technologies with an underground mine safety management system was explored. A desktop virtual safety management program for underground coal mines, called VRLane, was developed. The paper mainly concerns recent research advances in VR, system design, key techniques, and system application. Two important techniques are introduced. First, an algorithm was designed and implemented that automatically builds 3D laneway models and equipment models from the latest 2D mine drawings, whereas common VR programs build the 3D environment with 3DS Max or other 3D modeling packages, in which laneway models must be built manually and laboriously. Second, VRLane integrates with underground industrial automation systems. VRLane not only depicts a realistic 3D laneway environment but also shows the status of coal mining, with functions for displaying the running states and related parameters of equipment, pre-alarming abnormal mining events, and animating mine cars, mine workers, and longwall shearers. The system, which is low-cost, dynamic, and easy to maintain, provides a useful tool for safety production management in coal mines.
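    The abstract does not describe how VRLane derives 3D laneways from 2D drawings. The hypothetical sketch below shows one simple way such a step could work: a laneway centerline taken from a 2D plan is offset sideways by half the laneway width and lifted to a given height, giving one rectangular cross-section per centerline point. The function name, the box-shaped profile, and the parameters are assumptions for illustration only.

```python
import math


def extrude_laneway(centerline, width, height):
    """Hypothetical: turn a 2D laneway centerline [(x, y), ...] into 3D
    box cross-sections, one list of 4 corner vertices per centerline point.
    Assumes at least two points and no repeated consecutive points."""
    sections = []
    n = len(centerline)
    half = width / 2
    for i, (x, y) in enumerate(centerline):
        # Laneway direction: central difference, one-sided at the two ends
        x0, y0 = centerline[max(i - 1, 0)]
        x1, y1 = centerline[min(i + 1, n - 1)]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length  # unit normal in the floor plane
        left = (x + nx * half, y + ny * half)
        right = (x - nx * half, y - ny * half)
        # Floor corners at z = 0, roof corners at z = height
        sections.append([(*left, 0.0), (*right, 0.0), (*right, height), (*left, height)])
    return sections


sections = extrude_laneway([(0.0, 0.0), (10.0, 0.0)], width=4.0, height=3.0)
```

    Consecutive cross-sections could then be joined into wall, floor, and roof faces for rendering.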

  8. Coal and Open-pit surface mining impacts on American Lands (COAL)

    Science.gov (United States)

    Brown, T. A.; McGibbney, L. J.

    2017-12-01

    Mining is known to cause environmental degradation, but software tools to identify its impacts are lacking. However, remote sensing, spectral reflectance, and geographic data are readily available, and high-performance cloud computing resources exist for scientific research. Coal and Open-pit surface mining impacts on American Lands (COAL) provides a suite of algorithms and documentation to leverage these data and resources to identify evidence of mining and correlate it with environmental impacts over time. COAL was originally developed as a 2016 - 2017 senior capstone collaboration between scientists at the NASA Jet Propulsion Laboratory (JPL) and computer science students at Oregon State University (OSU). The COAL team implemented a free and open-source software library called "pycoal" in the Python programming language which facilitated a case study of the effects of coal mining on water resources. Evidence of acid mine drainage associated with an open-pit coal mine in New Mexico was derived by correlating imaging spectrometer data from the JPL Airborne Visible/InfraRed Imaging Spectrometer - Next Generation (AVIRIS-NG), spectral reflectance data published by the USGS Spectroscopy Laboratory in the USGS Digital Spectral Library 06, and GIS hydrography data published by the USGS National Geospatial Program in The National Map. This case study indicated that the spectral and geospatial algorithms developed by COAL can be used successfully to analyze the environmental impacts of mining activities. Continued development of COAL has been promoted by a Startup allocation award of high-performance computing resources from the Extreme Science and Engineering Discovery Environment (XSEDE). These resources allow the team to undertake further benchmarking, evaluation, and experimentation using multiple XSEDE resources. The opportunity to use computational infrastructure of this caliber will further enable the development of a science gateway to continue foundational COAL
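    The abstract does not say how pycoal matches imaging-spectrometer pixels to USGS library spectra. A common technique for that kind of matching is the spectral angle between a pixel spectrum and a reference spectrum; the sketch below implements it with made-up four-band spectra, hypothetical library names, and an assumed angle threshold of 0.1 rad, and makes no claim about pycoal's actual API.

```python
import math


def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and a reference spectrum.
    Smaller angles mean a closer match, independent of overall brightness."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos)


def classify_pixel(pixel, library, max_angle=0.1):
    """Return the best-matching library entry, or None if no match is close enough."""
    name, angle = min(((n, spectral_angle(pixel, s)) for n, s in library.items()),
                      key=lambda t: t[1])
    return name if angle <= max_angle else None


# Hypothetical four-band reference spectra
library = {"mineral_a": [0.1, 0.2, 0.4, 0.8], "mineral_b": [0.8, 0.4, 0.2, 0.1]}
```

    Because the angle ignores scale, a brighter pixel with the same spectral shape as `mineral_a` still matches it.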

  9. Mine Water Treatment in Hongai Coal Mines

    OpenAIRE

    Dang Phuong Thao; Dang Vu Chi

    2018-01-01

    Acid mine drainage (AMD) is recognized as one of the most serious environmental problems associated with the mining industry. Acid water, also known as acid mine drainage, forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions during coal mining. Until 2009, mine drainage in the Hongai coal mines was not treated, leading to harmful effects on humans, animals, and aquatic ecosystems. This report examines the acid mine drainage problem and techniques for acid mine ...

  10. EOQ estimation for imperfect quality items using association rule mining with clustering

    Directory of Open Access Journals (Sweden)

    Mandeep Mittal

    2015-09-01

    Full Text Available Timely identification of newly emerging trends is needed in business processes. Data mining techniques such as clustering, association rule mining, and classification are very important for business support and decision making. This paper presents a method for redesigning the ordering policy by including the cross-selling effect. First, association rules are mined on the transactional database, and the economic order quantity (EOQ) is estimated together with the revenue earned. Then, transactions are clustered to obtain homogeneous clusters, and association rules are mined in each cluster to estimate the EOQ and revenue for each cluster. Further, this paper compares ordering policies for imperfect quality items developed by applying rules derived from the Apriori algorithm (a) without clustering the transactions and (b) after clustering the transactions. A numerical example is given to validate the results.
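    The EOQ estimates in this paper build on the classic economic order quantity. Shown below is a minimal sketch of the textbook formula and its annual cost, assuming annual demand D, fixed cost per order S, and holding cost per unit per year H; the paper's extensions for cross-selling effects, revenue, and imperfect-quality items are not modeled.

```python
import math


def eoq(annual_demand, order_cost, holding_cost):
    """Classic economic order quantity: Q* = sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


def annual_cost(q, annual_demand, order_cost, holding_cost):
    # Ordering cost (D / Q orders at S each) plus holding cost on the
    # average inventory of Q / 2 units
    return annual_demand / q * order_cost + q / 2 * holding_cost


q_star = eoq(1000, 50, 2)  # about 223.6 units per order
```

    At Q* the ordering and holding components are equal, and any other order size, such as 200 or 250 units here, costs more per year.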

  11. Data Mining Tools Make Flights Safer, More Efficient

    Science.gov (United States)

    2014-01-01

    A small data mining team at Ames Research Center developed a set of algorithms ideal for combing through flight data to find anomalies. Dallas-based Southwest Airlines Co. signed a Space Act Agreement with Ames in 2011 to access the tools, helping the company refine its safety practices, improve its safety reviews, and increase flight efficiencies.

  12. A construction scheme of web page comment information extraction system based on frequent subtree mining

    Science.gov (United States)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    Based on the frequent subtree mining algorithm, this paper proposes a construction scheme for a web page comment information extraction system, referred to as the FSM system. The paper briefly introduces the overall system architecture and its modules, then describes the core of the system in detail, and finally presents a system prototype.

  13. Mine Water Treatment in Hongai Coal Mines

    Directory of Open Access Journals (Sweden)

    Dang Phuong Thao

    2018-01-01

    Full Text Available Acid mine drainage (AMD) is recognized as one of the most serious environmental problems associated with the mining industry. Acid water, also known as acid mine drainage, forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions during coal mining. Until 2009, mine drainage in the Hongai coal mines was not treated, leading to harmful effects on humans, animals, and aquatic ecosystems. This report examines the acid mine drainage problem and techniques for treating acid mine drainage in the Hongai coal mines. In addition, the selection of treatment systems and the criteria for their design are presented.

  14. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually come into public view. Clustering is an important method for processing large data sets. This paper introduces reverse learning into the clustering process of the PAM (Partitioning Around Medoids) algorithm to overcome the limitations of one-time clustering in unsupervised learning and to increase the diversity of the clusters, thereby improving clustering quality. Algorithm analysis and experimental results show that the algorithm is feasible.
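    For readers unfamiliar with PAM, a minimal sketch of the plain algorithm is given below: a greedy BUILD phase followed by SWAP steps that accept any medoid exchange lowering the total distance. It works on a precomputed distance matrix and is written for clarity rather than speed; the reverse-learning extension proposed in the paper is not included.

```python
from itertools import product


def total_cost(dist, medoids):
    # Sum of distances from every point to its nearest medoid
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Plain PAM (k-medoids) on a distance matrix; returns (medoids, labels)."""
    n = len(dist)
    # BUILD: greedily add the point that most reduces the total cost
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: replace a medoid with a non-medoid while that lowers the cost
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for mi, o in product(range(k), range(n)):
            if o in medoids:
                continue
            candidate = medoids[:mi] + [o] + medoids[mi + 1:]
            c = total_cost(dist, candidate)
            if c < cost:
                medoids, cost, improved = candidate, c, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, 2)
```

    On these toy points BUILD picks indices 2 and 4, and one SWAP moves the first medoid to index 1, lowering the total cost from 5 to 4.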

  15. An efficient communication strategy for mobile agent based distributed spatial data mining application

    Science.gov (United States)

    Han, Guodong; Wang, Jiazhen

    2005-11-01

    An efficient communication strategy is proposed in this paper, which aims to improve the response time and availability of mobile agent based distributed spatial data mining applications. When dealing with decomposed complex data mining tasks or On-Line Analytical Processing (OLAP), mobile agents authorized by the specified user need to coordinate and cooperate with each other, using a given communication method, to fulfill the subtasks delegated to them. Agent interactions, e.g. message passing, exchange of intermediate results, and merging of final results, can only happen after the path has been determined by a given routing-selection algorithm. Most current algorithms run in time that grows approximately quadratically with the number of nodes between which mobile agents migrate. To improve communication performance by reducing the execution time of this decision algorithm, we propose an approach that reduces the number of nodes involved in the computation. In practice, hosts in the system are reorganized into groups according to the bandwidth between adjacent nodes. Then, for each group we select an optimal node with high bandwidth and powerful computing resources, which is managed by an agent dispatched from the agent home node. In this way, the communication pattern can be implemented at a higher level of abstraction, improving the overall performance of mobile agent based distributed spatial data mining applications.
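    The grouping step can be sketched as follows, under assumptions the abstract leaves open: hosts are joined when the link bandwidth meets a fixed threshold, groups are the resulting connected components (found with union-find), and each group's representative is the host with the most computing power. The function name, data layout, and threshold are hypothetical.

```python
def group_hosts(hosts, links, min_bandwidth):
    """Hypothetical grouping of hosts by high-bandwidth links.

    hosts: dict host -> computing power
    links: dict (host_a, host_b) -> bandwidth
    Returns {representative: sorted member list} for each group.
    """
    parent = {h: h for h in hosts}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving keeps trees shallow
            h = parent[h]
        return h

    # Merge hosts joined by a sufficiently fast link
    for (a, b), bw in links.items():
        if bw >= min_bandwidth:
            parent[find(a)] = find(b)

    groups = {}
    for h in hosts:
        groups.setdefault(find(h), []).append(h)
    # One representative per group; routing then runs over representatives only
    return {max(members, key=hosts.get): sorted(members) for members in groups.values()}


hosts = {"h1": 4, "h2": 8, "h3": 2, "h4": 6}
links = {("h1", "h2"): 100, ("h2", "h3"): 5, ("h3", "h4"): 80}
groups = group_hosts(hosts, links, min_bandwidth=50)
```

    The routing algorithm then only has to consider the two representatives `h2` and `h4` instead of all four hosts.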

  16. Algorithms that Defy the Gravity of Learning Curve

    Science.gov (United States)

    2017-04-28

    yield the best performing 1NN ensembles. There is no magic to the gravity-defiant algorithms such as aNNE and iNNE, which manifest that small data...isolation using nearest neighbour ensemble. Proceedings of the 2014 IEEE international conference on data mining, workshop on incremental

  17. A systematic review of data mining and machine learning for air pollution epidemiology.

    Science.gov (United States)

    Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro

    2017-11-28

    Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geospatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air

  18. Mining Outlier Data in Mobile Internet-Based Large Real-Time Databases

    Directory of Open Access Journals (Sweden)

    Xin Liu

    2018-01-01

    Full Text Available Mining outlier data ensures access security and data scheduling in parallel databases and maintains the high-performance operation of real-time databases. Traditional mining methods generate abundant interference data and suffer from reduced accuracy, efficiency, and stability, causing severe deficiencies. This paper proposes a new outlier data mining method that analyzes real-time data features, obtains magnitude spectrum models of outlier data, establishes a decision-tree information chain transmission model for outlier data in the mobile Internet, obtains the information flow of internal outlier data in the information chain of a large real-time database, and clusters the data. Based on the local characteristic time-scale parameters of the information flow, the phase features of the outlier data before filtering are obtained; a decision-tree outlier-classification feature-filtering algorithm is then used to acquire signals for analysis and their instantaneous amplitude, and to obtain the phase-frequency characteristics of the outlier data. Wavelet-transform threshold denoising is combined with signal denoising to analyze data offset, correct the resulting detection filter model, and realize outlier data mining. Simulation results suggest that the method detects the characteristic response distribution of outlier data, reduces response time, the number of iterations, and the mining error rate, improves mining adaptability and coverage, and yields good mining outcomes.
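    Wavelet threshold denoising, one ingredient of the method, can be illustrated with a single-level Haar transform and soft thresholding of the detail coefficients. This is a generic textbook sketch: the wavelet, decomposition depth, and threshold rule used in the paper are not stated in the abstract, so the Haar choice and the fixed threshold `t` are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(c, t):
    # Shrink toward zero by t; coefficients smaller than t in magnitude vanish
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x, t):
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])
```

    With `t = 0` the signal is reconstructed unchanged; a large `t` removes all pairwise differences and leaves local averages, e.g. `[1.0, 3.0]` becomes approximately `[2.0, 2.0]`.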

  19. Ian Taylor MBE MP Chairman Parliamentary and Scientific Committee, United Kingdom (second from left) with (from left to right) CMS Technical Coordinator A. Ball, CMS Spokesperson Tejinder (Jim) Virdee and Adviser to the Director-General J. Ellis on 2 November 2009.

    CERN Multimedia

    Maximilien Brice; CMS

    2009-01-01

    Ian Taylor MBE MP Chairman Parliamentary and Scientific Committee, United Kingdom (second from left) with (from left to right) CMS Technical Coordinator A. Ball, CMS Spokesperson Tejinder (Jim) Virdee and Adviser to the Director-General J. Ellis on 2 November 2009.

  20. A Clustering Approach Using Cooperative Artificial Bee Colony Algorithm

    Directory of Open Access Journals (Sweden)

    Wenping Zou

    2010-01-01

    Full Text Available Artificial Bee Colony (ABC) is one of the most recently introduced algorithms based on the intelligent foraging behavior of a honey bee swarm. This paper presents an extended ABC algorithm, the Cooperative Artificial Bee Colony (CABC), which significantly improves on the original ABC in solving complex optimization problems. Clustering is a popular data analysis and data mining technique; therefore, CABC could also be used to solve clustering problems. In this work, the CABC algorithm is first used to optimize six widely used benchmark functions, and its results are compared with those of ABC, Particle Swarm Optimization (PSO), and the cooperative version of PSO (CPSO). Second, the CABC algorithm is used for data clustering on several benchmark data sets, and its performance is compared with the PSO, CPSO, and ABC algorithms. The simulation results show that the proposed CABC outperforms the other three algorithms in terms of accuracy, robustness, and convergence speed.
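    For orientation, the sketch below implements the basic ABC loop (employed, onlooker, and scout bee phases) minimizing a function over a box. It is the original ABC scheme in simplified form, not the cooperative CABC variant proposed in the paper, and the colony size, abandonment `limit`, and iteration count are illustrative defaults.

```python
import random


def abc_minimize(f, dim, lo, hi, n_food=10, limit=20, iters=500, seed=0):
    """Basic Artificial Bee Colony minimization over the box [lo, hi]^dim."""
    rng = random.Random(seed)

    def new_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    foods = [new_source() for _ in range(n_food)]
    costs = [f(x) for x in foods]
    trials = [0] * n_food

    def fitness(c):
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    def explore(i):
        # Move one coordinate relative to a randomly chosen partner source
        k = rng.choice([j for j in range(n_food) if j != i])
        d = rng.randrange(dim)
        cand = foods[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (foods[i][d] - foods[k][d])
        cand[d] = min(max(cand[d], lo), hi)
        c = f(cand)
        if c < costs[i]:  # greedy selection keeps the better source
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    best = min(range(n_food), key=costs.__getitem__)
    best_x, best_cost = foods[best][:], costs[best]
    for _ in range(iters):
        for i in range(n_food):  # employed bees: one trial per source
            explore(i)
        weights = [fitness(c) for c in costs]
        for _ in range(n_food):  # onlooker bees: fitness-proportional choice
            explore(rng.choices(range(n_food), weights=weights)[0])
        for i in range(n_food):  # remember the best before any abandonment
            if costs[i] < best_cost:
                best_x, best_cost = foods[i][:], costs[i]
        for i in range(n_food):  # scout bees: abandon exhausted sources
            if trials[i] > limit:
                foods[i] = new_source()
                costs[i] = f(foods[i])
                trials[i] = 0
    return best_x, best_cost


x, c = abc_minimize(lambda v: sum(t * t for t in v), dim=2, lo=-5.0, hi=5.0)
```

    The scout phase is what distinguishes ABC from a pure local search: sources that stop improving are replaced with random ones, restoring diversity.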

  1. International mining forum 2004, new technologies in underground mining, safety in mines proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Jerzy Kicki; Eugeniusz Sobczyk (eds.)

    2004-01-15

    The book comprises technical papers that were presented at the International Mining Forum 2004. This event aims to bring together scientists and engineers in mining, rock mechanics, and computer engineering, with a view to exploring and discussing international developments in the field. Topics discussed in this book are: trends in the mining industry; new solutions and tendencies in underground mines; rock engineering problems in underground mines; utilization and exploitation of methane; prevention measures for the control of rock bursts in Polish mines; and current problems in Ukrainian coal mines.

  2. Personnel Audit Using a Forensic Mining Technique

    OpenAIRE

    Adesesan B. Adeyemo; Oluwafemi Oriola

    2010-01-01

    This paper applies forensic data mining to determine the true status of employees and thereafter provide useful evidence for the proper administration of administrative rules in a typical Nigerian Teaching Service. The conventional personnel audit technique was studied, and a new personnel audit technique was modeled using Artificial Neural Network and Decision Tree algorithms. A two-layer classifier architecture was modeled. The outcome of the experiment proved that Radial Basis Function ...

  3. Application of XGBoost algorithm in hourly PM2.5 concentration prediction

    Science.gov (United States)

    Pan, Bingyue

    2018-02-01

    To address the prediction of hourly PM2.5 concentrations in China, this paper applies the XGBoost (Extreme Gradient Boosting) algorithm. Air quality monitoring data from Tianjin were analyzed with XGBoost. The prediction performance of XGBoost is evaluated by comparing observed and predicted PM2.5 concentrations using three measures of forecast accuracy. XGBoost is also compared with random forest, multiple linear regression, decision tree regression, and support vector machine regression models. The results demonstrate that the XGBoost algorithm outperforms the other data mining methods.
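    The abstract mentions three measures of forecast accuracy without naming them. Root mean squared error, mean absolute error, and the coefficient of determination are common choices and are sketched below as an assumption, not as the paper's confirmed metrics; the sample values are made up.

```python
import math


def rmse(obs, pred):
    # Root mean squared error: penalizes large misses more heavily
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    # Mean absolute error: average miss in the units of the data
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    # Share of the variance in the observations explained by the predictions
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical PM2.5 values
predicted = [12.0, 18.0, 33.0, 39.0]
```

    For these values RMSE is about 2.12, MAE is 2.0, and R² is 0.964.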

  4. Sustainable Mining Environment: Technical Review of Post-mining Plans

    Directory of Open Access Journals (Sweden)

    Restu Juniah

    2017-12-01

    Full Text Available The mining industry exists because humans need mining commodities to meet their daily needs, for products such as motor vehicles, mobile phones, and electronic equipment. Mining commodities, as listed in Government Regulation No. 23 of 2010 on the Implementation of Mineral and Coal Mining Business Activities, are radioactive minerals, metal minerals, nonmetallic minerals, rocks, and coal. Mineral and coal mining is conducted to obtain these commodities through production operations. Mineral and coal mining companies are obliged to ensure that the mining environment remains sustainable, particularly after production operations end (post-mining). This survey research technically examines the post-mining plan for coal mining by PT Samantaka Batubara in Indragiri Hulu Regency, Riau Province, with respect to the sustainability of the mining environment. The results indicate that the post-mining plan of PT Samantaka Batubara meets the technical aspects required in post-mining planning for a sustainable mining environment. The post-mining land of PT Samantaka Batubara is allocated to garden and forest zones. The results of this study are expected to be useful to stakeholders, academics, researchers, practitioners, and associations in mining and the environment.

  5. Classification Identification of Acoustic Emission Signals from Underground Metal Mine Rock by ICIMF Classifier

    Directory of Open Access Journals (Sweden)

    Hongyan Zuo

    2014-01-01

    Full Text Available To overcome the drawback that fuzzy classifiers are sensitive to noise and outliers, a Mamdani fuzzy classifier based on an improved chaos immune algorithm was developed, in which the parameters of bilateral Gaussian membership functions were set as constraint conditions, and the fuzzy classification effectiveness index and the number of correctly classified samples were used as sub-goals of the fitness function. The Iris database was used for simulation experiments, and the classifier was then applied to classify and recognize acoustic emission signals and interference signals from the stope wall rock of underground metal mines. The results showed that the Mamdani fuzzy classifier based on the improved chaos immune algorithm could effectively improve prediction accuracy on data sets with noise and outliers, and the classification accuracy for acoustic emission and interference signals from the stope wall rock of underground metal mines was 90.00%. The improved chaos immune Mamdani fuzzy (ICIMF) classifier is therefore useful for accurately distinguishing acoustic emission signals from interference signals in the stope wall rock of underground metal mines.
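    The abstract does not define its bilateral Gaussian membership function. A common reading is an asymmetric Gaussian with separate widths on each side of the center; the sketch below implements that interpretation, which is an assumption, and the parameter names are illustrative.

```python
import math


def bilateral_gaussian(x, center, sigma_left, sigma_right):
    """Asymmetric Gaussian membership: width sigma_left below the center,
    sigma_right above it; the peak membership of 1.0 is at the center."""
    sigma = sigma_left if x < center else sigma_right
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)
```

    Separate widths let a fuzzy set fall off quickly on one side and slowly on the other, which a symmetric Gaussian cannot do; these widths are the kind of parameters the abstract says are constrained during optimization.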

  6. Contract Mining versus Owner Mining

    African Journals Online (AJOL)

    Owner

    mining companies can concentrate on their core businesses while using specialists for ... 2 Definition of Contract and Owner Mining ... equipment maintenance, scheduling and budgeting ... No. Region. Amount Spent on Contract Mining ($ billion). Percent of Total. 1 ... cost and productivity data based on a large range.

  7. Optimization of mining design of Hongwei uranium mine

    International Nuclear Information System (INIS)

    Wu Sanmao; Yuan Baixiang

    2012-01-01

    Given the mining conditions of the Hongwei uranium mine, optimization schemes for the hoisting cage, mine drainage, ore transport, mine wastewater treatment, power-supply system, etc. are put forward in the mining design of the mine. The optimization effects are analyzed in terms of technique, economy, energy saving, and emission reduction. (authors)

  8. Data Mining and Machine Learning Methods for Dementia Research.

    Science.gov (United States)

    Li, Rui

    2018-01-01

    Patient data in clinical research often includes large amounts of structured information, such as neuroimaging data, neuropsychological test results, and demographic variables. Given the various sources of information, we can develop computerized methods that can be a great help to clinicians to discover hidden patterns in the data. The computerized methods often employ data mining and machine learning algorithms, lending themselves as the computer-aided diagnosis (CAD) tool that assists clinicians in making diagnostic decisions. In this chapter, we review state-of-the-art methods used in dementia research and then briefly introduce some recently proposed algorithms.

  9. The Application of Text Mining in Business Research

    DEFF Research Database (Denmark)

    Preuss, Bjørn

    2017-01-01

    The aim of this paper is to present a methodological concept in business research that has the potential to become one of the most powerful methods in the coming years for researching qualitative phenomena in business and society. It presents a selection of algorithms as well as elaborat... on potential use cases for a text mining based approach to qualitative data analysis....

  10. A High-Order CFS Algorithm for Clustering Big Data

    Directory of Open Access Journals (Sweden)

    Fanyu Bu

    2016-01-01

    Full Text Available With the development of the Internet of Everything, such as the Internet of Things, the Internet of People, and the Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms are not effective at clustering heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learning model, whose functionality rests on three pillars: (i) an adaptive dropout deep learning model to learn features from each type of data, (ii) a feature tensor model to capture the correlations of heterogeneous data, and (iii) a tensor distance-based high-order CFS algorithm to cluster heterogeneous data. Furthermore, we verify the proposed algorithm on different datasets by comparison with two other clustering schemes, HOPCM and CFS. Results confirm the effectiveness of the proposed algorithm in clustering heterogeneous data.
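    Assuming CFS here refers to clustering by fast search and find of density peaks (Rodriguez and Laio), the base algorithm can be sketched on a distance matrix: compute each point's local density ρ and its distance δ to the nearest denser point, take points with large ρ·δ as centers, and assign the rest by following their nearest denser neighbor. The cutoff kernel and tie-breaking are simplifications, and the high-order, tensor-based extension of HOCFS is not shown.

```python
def density_peaks(dist, dc, n_clusters):
    """Density-peaks clustering on a distance matrix; returns (rho, delta, labels)."""
    n = len(dist)
    # rho: local density, here the number of neighbours closer than the cutoff dc
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Visit points from densest to sparsest; sorted() is stable, so ties keep index order
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [None] * n  # nearest point that comes earlier (denser) in the order
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # convention for the densest point
        else:
            j = min(order[:r], key=lambda j: dist[i][j])
            delta[i], parent[i] = dist[i][j], j
    # Centres: largest rho * delta; the densest point is always kept as a centre
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in by_gamma if i != order[0]][: n_clusters - 1]
    labels = [None] * n
    for k, c in enumerate(centers):
        labels[c] = k
    for i in order:  # a parent is always visited before the points that point to it
        if labels[i] is None:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


points = [0, 0.5, 1, 10, 10.5, 11]
dist = [[abs(a - b) for b in points] for a in points]
rho, delta, labels = density_peaks(dist, dc=0.75, n_clusters=2)
```

    In the toy data the middle point of each group is densest and far from the other group, so both become centers and the remaining points inherit their labels.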

  11. The improved Apriori algorithm based on matrix pruning and weight analysis

    Science.gov (United States)

    Lang, Zhenhong

    2018-04-01

    This paper draws on the matrix compression algorithm and the weight analysis algorithm and proposes an improved Apriori algorithm based on matrix pruning and weight analysis. After scanning the transactional database only once, the algorithm constructs a Boolean transaction matrix. By counting the 1s in the rows and columns of the matrix, infrequent itemsets are pruned and new candidate itemsets are formed. Then the item weights, the transaction weights, and the weighted support of items are calculated to obtain the frequent itemsets. The experimental results show that the improved Apriori algorithm not only reduces the number of repeated database scans but also improves the efficiency of data correlation mining.
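    A minimal sketch of the matrix idea follows: the database is scanned once to build a Boolean item-by-transaction matrix, and supports are then counted from the matrix instead of re-reading transactions. It keeps the standard Apriori join and prune steps but omits the paper's item and transaction weighting, so it is a simplification rather than the proposed algorithm.

```python
from itertools import combinations


def apriori_boolean(transactions, min_support):
    """Frequent itemsets via a Boolean item-by-transaction matrix (one database scan)."""
    items = sorted({i for t in transactions for i in t})
    # Single scan: one row per item, True where the transaction contains it
    matrix = {i: [i in t for t in transactions] for i in items}

    def support(itemset):
        # AND the item rows column by column and count the True entries
        return sum(all(col) for col in zip(*(matrix[i] for i in itemset)))

    level = {(i,): s for i in items if (s := support((i,))) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets
        candidates = {tuple(sorted(set(a) | set(b))) for a, b in combinations(sorted(level), 2)}
        level = {}
        for c in candidates:
            # Prune step: every (k-1)-subset must already be frequent
            if len(c) == k and all(sub in frequent for sub in combinations(c, k - 1)):
                s = support(c)
                if s >= min_support:
                    level[c] = s
    return frequent


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
result = apriori_boolean(transactions, min_support=2)
```

    Here `{a, b, c}` is never counted: its subset `{b, c}` appears only once, so the prune step discards it before any support computation.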

  12. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Full Text Available Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
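    To make the notion of maximal contiguous frequent patterns concrete, the brute-force sketch below counts, for every substring of at least `min_len` bases, how many sequences contain it, keeps those meeting `min_support`, and drops any pattern contained in a longer frequent one. It is quadratic in sequence length and only suitable for toy data; it is not the efficient approach proposed in the paper.

```python
def maximal_contiguous_patterns(seqs, min_support, min_len=2):
    """Brute-force maximal contiguous frequent patterns (substrings) over DNA sequences."""
    support = {}
    for s in seqs:
        # Distinct substrings of this sequence, so each sequence counts at most once
        subs = {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}
        for p in subs:
            support[p] = support.get(p, 0) + 1
    frequent = {p for p, c in support.items() if c >= min_support}
    # Maximal: not contained in any other frequent pattern
    return sorted(p for p in frequent if not any(p != q and p in q for q in frequent))


patterns = maximal_contiguous_patterns(["ACGTA", "CGTAC", "TTCGT"], min_support=2)
```

    `CGTA` absorbs the shorter frequent patterns `CG`, `CGT`, `GT`, `GTA`, and `TA`, while `AC` survives because no longer frequent pattern contains it.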

  13. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    Science.gov (United States)

    Alfarizy, A. D.; Indahwati; Sartono, B.

    2017-03-01

    Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia in 2015. Hollywood movies distributed in Indonesia target people of all ages, including children. Low awareness of the need to guide children while they watch movies means that children may watch films of any rating, even ones unsuitable for their age. Even after being translated into Bahasa Indonesia and passing the censorship phase, words that are inappropriate for children still appear. The purpose of this research is to cluster box office Hollywood movies based on their Indonesian subtitles, revenue, IMDb user rating, and genres, as one reference for adults choosing suitable movies for their children. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words, and terror words), while the Partition Around Medoids (PAM) algorithm, with the Gower similarity coefficient as the proximity matrix, is used as the clustering method. We clustered 624 movies from IMDb released between 2006 and the first half of 2016. The highest silhouette coefficient (0.36) was obtained with 5 clusters. Animation, Adventure, and Comedy movies with high revenue, as in cluster 5, are recommended for children, while Comedy movies with high revenue, as in cluster 4, should be avoided.
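
A small, self-contained Python sketch of the clustering step, assuming each movie is a row that mixes numeric columns (such as revenue or word counts) with categorical ones (such as genre). Gower distance (one minus the Gower similarity) averages a range-scaled absolute difference over numeric columns and a 0/1 mismatch over categorical ones; PAM then chooses medoids with a greedy BUILD phase followed by a first-improvement SWAP phase. The column layout, function names, and SWAP variant are assumptions for illustration, not details taken from the study.

```python
from itertools import product


def gower_distances(rows, numeric):
    """Pairwise Gower distance for mixed-type rows.

    `numeric` is the set of column indices treated as numeric; all other
    columns are compared as categories (0 if equal, 1 otherwise).
    """
    n, p = len(rows), len(rows[0])
    ranges = {}
    for j in numeric:
        col = [r[j] for r in rows]
        ranges[j] = (max(col) - min(col)) or 1.0  # guard against a constant column
    dist = [[0.0] * n for _ in range(n)]
    for a, b in product(range(n), repeat=2):
        total = 0.0
        for j in range(p):
            if j in numeric:
                total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            else:
                total += 0.0 if rows[a][j] == rows[b][j] else 1.0
        dist[a][b] = total / p
    return dist


def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """PAM with a greedy BUILD phase and a first-improvement SWAP phase."""
    n = len(dist)
    medoids = []
    while len(medoids) < k:  # BUILD: add the point that lowers total cost most
        medoids.append(min((c for c in range(n) if c not in medoids),
                           key=lambda c: total_cost(dist, medoids + [c])))
    current = total_cost(dist, medoids)
    improved = True
    while improved:  # SWAP: replace a medoid by a non-medoid while cost drops
        improved = False
        for m_idx, o in product(range(k), range(n)):
            if o in medoids:
                continue
            trial = medoids[:m_idx] + [o] + medoids[m_idx + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < current - 1e-12:
                medoids, current, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels
```

With three low-revenue Animation rows and three high-revenue Comedy rows and `k=2`, the sketch separates the two groups. Choosing the number of clusters by silhouette coefficient, as the study does, would be a separate loop over `k`.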

  14. Accurate Prediction of Coronary Artery Disease Using Bioinformatics Algorithms

    Directory of Open Access Journals (Sweden)

    Hajar Shafiee

    2016-06-01

    Full Text Available Background and Objectives: Cardiovascular disease is one of the main causes of death in developed and Third World countries. According to the World Health Organization, deaths due to heart disease are predicted to rise to 23 million by 2030. According to the latest statistics reported by Iran's Minister of Health, 3.39% of all deaths are attributed to cardiovascular diseases and 19.5% are related to myocardial infarction. The aim of this study was to predict coronary artery disease using data mining algorithms. Methods: In this study, various bioinformatics algorithms, such as decision trees, neural networks, support vector machines, and clustering, were used to predict coronary heart disease. The data used in this study were taken from several valid databases (including 14 data. Results: This research shows that data mining techniques can be used effectively to diagnose different diseases, including coronary artery disease. In addition, a prediction system based on a support vector machine with the best possible accuracy was introduced for the first time. Conclusion: The results showed that, among the features, the thallium scan variable is the most important in diagnosing heart disease. Machine prediction models, such as the support vector machine learning algorithm, can differentiate between sick and healthy individuals with 100% accuracy.

  15. Event metadata records as a testbed for scalable data mining

    International Nuclear Information System (INIS)

    Gemmeren, P van; Malon, D

    2010-01-01

    At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
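
As one concrete example of the one-pass stream mining techniques mentioned above, here is a Python sketch of the Misra-Gries frequent-items summary. It is not taken from the paper, and feeding it a stream of per-event categorical values (such as trigger names) is a hypothetical use. With k - 1 counters it is guaranteed to retain every value that occurs more than n/k times in a stream of n records, although the retained counts may underestimate the true frequencies.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k - 1 counters."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all of them and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For a stream of six `muon`, two `electron`, and two `jet` records with `k=3`, the summary is `{'muon': 4}`: the heavy hitter survives, with an underestimated count.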

  16. Prediction of buried mine-like target radar signatures using wideband electromagnetic modeling

    Energy Technology Data Exchange (ETDEWEB)

    Warrick, A.L.; Azevedo, S.G.; Mast, J.E.

    1998-04-06

    Current ground penetrating radars (GPR) have been tested for land mine detection, but they have generally been costly and have performed poorly. Comprehensive modeling and experimentation must be done to predict the electromagnetic (EM) signatures of mines, to assess the effect of clutter on the EM signature of the mine, and to understand the merits and limitations of using radar for various mine detection scenarios. This modeling can provide a basis for advanced radar design and detection techniques leading to superior performance. Lawrence Livermore National Laboratory (LLNL) has developed a radar technology that, when combined with comprehensive modeling and detection methodologies, could be the basis of an advanced mine detection system. Micropower Impulse Radar (MIR) technology exhibits a combination of properties, including wideband operation, extremely low power consumption, extremely small size and low cost, array configurability, and noise-encoded pulse generation. LLNL is in the process of developing an optimal processing algorithm to use with the MIR sensor. In this paper, we use classical numerical models to obtain the signature of mine-like targets and examine the effect of surface roughness on the reconstructed signals. These results are then qualitatively compared to experimental data.

  17. Implementation of Chaid Algorithm: A Hotel Case

    Directory of Open Access Journals (Sweden)

    Celal Hakan Kagnicioglu

    2016-01-01

    Full Text Available Today, companies plan their activities around efficiency and effectiveness. To plan future activities, they need historical data from both outside and inside the company. However, this data comes in amounts too large to understand easily. Because this huge amount of data creates business complexity in many industries, such as the hospitality industry, reliable, accurate, and fast access to it has become one of the greatest problems. Managing this data is another major problem. Data Mining (DM) tools can be used effectively to analyze such large amounts of data. In this study, after a brief definition of the fundamentals of data mining, the Chi-Squared Automatic Interaction Detection (CHAID) algorithm, one of the most widely used DM tools, is introduced. Using the CHAID algorithm, the study attempts to determine, from five-star hotel data, the materials used most in the room-cleaning process and the relations among them. The analysis shows that some variables have a strong relation with the number of rooms cleaned in the hotel, while others have no or only a weak relation.
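
CHAID chooses each split by testing how strongly every candidate predictor is associated with the target using a chi-squared test. The Python sketch below shows only that core step on categorical records: it computes Pearson's chi-squared statistic for each predictor and returns the largest. A full CHAID implementation also merges similar predictor categories and compares Bonferroni-adjusted p-values rather than raw statistics (raw statistics are only comparable when the tables have the same degrees of freedom); the field names used with it below are hypothetical.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-squared statistic for the contingency table of xs against ys."""
    n = len(xs)
    row_totals, col_totals = Counter(xs), Counter(ys)
    cells = Counter(zip(xs, ys))
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Return the predictor with the largest chi-squared statistic, plus all scores."""
    ys = [r[target] for r in records]
    scores = {p: chi_square([r[p] for r in records], ys) for p in predictors}
    return max(scores, key=scores.get), scores
```

For example, if a hypothetical `floor` field perfectly separates a `load` target while a `shift` field is independent of it, `best_split(records, ["floor", "shift"], "load")` selects `floor` with a statistic of 4.0 on four records.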

  18. Implementation of Chaid Algorithm: A Hotel Case

    Directory of Open Access Journals (Sweden)

    Celal Hakan Kağnicioğlu

    2014-11-01

    Full Text Available Today, companies plan their activities around efficiency and effectiveness. To plan future activities, they need historical data from both outside and inside the company. However, this data comes in amounts too large to understand easily. Because this huge amount of data creates business complexity in many industries, such as the hospitality industry, reliable, accurate, and fast access to it has become one of the greatest problems. Managing this data is another major problem. Data Mining (DM) tools can be used effectively to analyze such large amounts of data. In this study, after a brief definition of the fundamentals of data mining, the Chi-Squared Automatic Interaction Detection (CHAID) algorithm, one of the most widely used DM tools, is introduced. Using the CHAID algorithm, the study attempts to determine, from five-star hotel data, the materials used most in the room-cleaning process and the relations among them. The analysis shows that some variables have a strong relation with the number of rooms cleaned in the hotel, while others have no or only a weak relation.

  19. Surface Mines, Other - Longwall Mining Panels

    Data.gov (United States)

    NSGIC Education | GIS Inventory — Coal mining has occurred in Pennsylvania for over a century. A method of coal mining known as Longwall Mining has become more prevalent in recent decades. Longwall...

  20. Application of Data Mining in Library-Based Personalized Learning

    Directory of Open Access Journals (Sweden)

    Lin Luo

    2017-12-01

    Full Text Available This paper applies the DBSCAN algorithm to library data to help teachers and students find the books they are looking for in the library's vast collection. First, a model for applying the DBSCAN algorithm to library data mining is proposed, followed by a version of DBSCAN improved to meet the library's requirements. Finally, an experiment is presented to validate the algorithm. The results show that book price and library inventory level have less impact on the resulting clusters than book classification and borrowing frequency. Library procurers should therefore base purchases and subscriptions on the results of the cluster analysis, thereby improving the hierarchy and structural distribution of library resources and making them more scientific and reasonable, which also helps arouse readers' interest in borrowing.
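
For readers unfamiliar with DBSCAN, a plain-Python sketch of the standard algorithm follows. It assumes each book has already been encoded as a numeric feature vector (for example, a scaled classification code and borrowing frequency) and uses Euclidean distance; the paper's library-specific improvements are not reproduced. `eps` is the neighbourhood radius and `min_pts` the minimum neighbourhood size, counting the point itself, for a core point.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Brute-force range query; the neighbourhood includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: expand through it
                queue.extend(nbrs)
    return labels
```

Two tight groups of three points plus one distant point yield the labels `[0, 0, 0, 1, 1, 1, -1]` with `eps=1.5` and `min_pts=3`.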

  1. Using text-mining techniques in electronic patient records to identify ADRs from medicine use

    DEFF Research Database (Denmark)

    Warrer, Pernille; Hansen, Ebba Holme; Jensen, Lars Juhl

    2012-01-01

    This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We...... included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs......, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text...

  2. Prediction of customer behaviour analysis using classification algorithms

    Science.gov (United States)

    Raju, Siva Subramanian; Dhandayudam, Prabha

    2018-04-01

    Customer relationship management plays a crucial role in analyzing customer behavior patterns and customers' value to an enterprise. Customer data can be analyzed efficiently using various data mining techniques, with the goal of developing business strategies and enhancing the business. In this paper, three classification models (NB, J48, and MLPNN) are studied and evaluated experimentally. The performance of the three classifiers is compared using three measures (accuracy, sensitivity, and specificity), and the experimental results show that the J48 algorithm achieves better accuracy than the NB and MLPNN algorithms.
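
The three comparison measures can be computed from a binary confusion matrix as in the Python sketch below. It assumes the standard definitions (accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and a hypothetical encoding in which 1 marks the positive class; the classifiers themselves are not reproduced.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

For `y_true = [1, 1, 1, 1, 0, 0, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 0, 1, 1]`, it returns an accuracy of 0.625, a sensitivity of 0.75, and a specificity of 0.5.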

  3. Application for trackless mining technique in Benxi uranium mine

    International Nuclear Information System (INIS)

    Chen Bingguo

    1998-01-01

    The author describes how the Benxi Uranium Mine achieved its construction targets by relying on advances in science and technology and adopting small trackless mining equipment, presents the application of trackless mining equipment to mining small mines and complex mineral deposits, and discusses the particular advantages of the trackless mining technique in development work, mining preparation work, and backstoping.

  4. Mining engineer requirements in a German coal mine

    Energy Technology Data Exchange (ETDEWEB)

    Rauhut, F J

    1985-10-01

    Basic developments in German coal mines, new definitions of working areas of mining engineers, and groups of requirements in education are discussed. These groups include: requirements of hard-coal mining at great depth and in extended collieries; application of process technology and information systems in semi-automated mines; thinking in processes and systems; organizational changes; future requirements of mining engineers; responsibility of the mining engineer for employees and society.

  5. Sentiment analysis enhancement with target variable in Kumar’s Algorithm

    Science.gov (United States)

    Arman, A. A.; Kawi, A. B.; Hurriyati, R.

    2016-04-01

    Sentiment analysis (also known as opinion mining) refers to the use of text analysis and computational linguistics to identify and extract subjective information in source materials. Sentiment analysis is widely applied to reviews and discussions on social media for many purposes, ranging from marketing and customer service to public opinion on public policy. One popular algorithm for implementing sentiment analysis is the Kumar algorithm, developed by Kumar and Sebastian. The Kumar algorithm can identify the sentiment score of a statement, sentence, or tweet, but it cannot determine the relationship between the sentiment being analysed and the object or target it refers to. This research addresses that challenge by adding a component that represents the object or target to the existing Kumar algorithm. The result is a modified algorithm that gives a sentiment score with respect to a given object or target.

  6. Data mining scenarios for the discovery of subtypes and the comparison of algorithms

    NARCIS (Netherlands)

    Colas, Fabrice Pierre Robert

    2009-01-01

    A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug

  7. Detecting Plastic PFM-1 Butterfly Mines Using Thermal Infrared Sensing

    Science.gov (United States)

    Baur, J.; de Smet, T.; Nikulin, A.

    2017-12-01

    Remnant plastic-composite landmines, such as the mass-produced PFM-1, represent an ongoing humanitarian threat aggravated by the high costs of traditional demining efforts. These particular unexploded ordnance (UXO) devices pose a challenge to conventional geophysical detection methods due to their plastic-body design and small size. Additionally, the PFM-1s represent a particularly heinous UXO, due to their low mass ( 25 lb) trigger limit and "butterfly" wing design, which have earned them the reputation of a "toy mine" that disproportionately impacts children across post-conflict areas. We developed a detection algorithm based on data acquired by a thermal infrared camera mounted to a commercial UAV to detect the time-variable temperature difference between the PFM-1 and the surrounding environment. We present results of a field study focused on thermal detection and identification of PFM-1 anti-personnel landmines from a remotely operated unmanned aerial vehicle (UAV). We conducted a series of field detection experiments meant to simulate the mountainous terrains where PFM-1 mines were historically deployed and remain in place. In our tests, 18 inert PFM-1 mines along with the aluminum KSF-1 casing were randomly dispersed to mimic an ellipsoidal minefield of 8-10 x 18-20 m in a de-vegetated rubble yard at Chenango Valley State Park (New York State). We collected multiple thermal infrared imagery datasets of these model minefields with the FLIR Vue Pro R attached to the 3DR Solo UAV flying at approximately 2 m. We identified different environmental variables to constrain the optimal time of day and the daily temperature variations that reveal the presence of these plastic UXOs. We show that in the early-morning hours, when thermal inertia is greatest, the PFM-1 mines can be detected based on their differential thermal inertia. Because the mines have statistically different temperatures than the background and a characteristic shape, we were able to train a
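
The detection premise, that a mine appears as a cell whose temperature departs from the background, can be illustrated with a simple z-score threshold over a gridded thermal image. This Python sketch is a deliberate simplification under stated assumptions: the abstract goes on to describe training a model on temperature and shape, whereas here a single hypothetical `z_threshold` and the scene-wide mean and standard deviation stand in for that model.

```python
from statistics import mean, pstdev


def thermal_anomalies(grid, z_threshold=2.5):
    """Return (row, col) cells whose temperature deviates from the scene mean
    by more than z_threshold population standard deviations."""
    values = [v for row in grid for v in row]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a uniform scene has no anomalies
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row)
            if abs(v - mu) / sigma > z_threshold]
```

On a 5 x 5 grid of 10.0 °C readings with a single 20.0 °C cell at row 2, column 3, the sketch returns `[(2, 3)]`.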

  8. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

    Directory of Open Access Journals (Sweden)

    Tewfik Ahmed H

    2006-01-01

    Full Text Available Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values from a set of data in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, while almost all previous biclustering approaches will miss some.
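
The bicluster types listed above have simple algebraic definitions, and this Python sketch checks them for a single candidate submatrix. It does not search for biclusters as the paper's algorithms do, and it covers only the additive form of coherent values, a_ij = mu + r_i + c_j, which is equivalent to a_ij - a_i0 - a_0j + a_00 = 0 for every cell. The function name and tolerance are illustrative.

```python
def bicluster_type(sub, tol=1e-9):
    """Classify a submatrix (list of equal-length rows) by bicluster type."""
    rows, cols = len(sub), len(sub[0])

    def close(a, b):
        return abs(a - b) <= tol

    if all(close(v, sub[0][0]) for row in sub for v in row):
        return "constant"
    if all(close(v, row[0]) for row in sub for v in row):
        return "constant rows"
    if all(close(sub[i][j], sub[0][j]) for i in range(rows) for j in range(cols)):
        return "constant columns"
    # Additive coherent model: a_ij - a_i0 - a_0j + a_00 == 0 for all i, j.
    if all(close(sub[i][j] - sub[i][0] - sub[0][j] + sub[0][0], 0.0)
           for i in range(rows) for j in range(cols)):
        return "coherent (additive)"
    return None
```

For instance, `[[1, 2, 4], [3, 4, 6]]` is reported as coherent (additive), since the second row is the first shifted by 2, while `[[1, 1, 1], [5, 5, 5]]` is reported as constant rows.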

  9. Power to the People! Meta-algorithmic modelling in applied data science

    NARCIS (Netherlands)

    Spruit, M.; Jagesar, R.

    2016-01-01

    This position paper first defines the research field of applied data science at the intersection of domain expertise, data mining, and engineering capabilities, with particular attention to analytical applications. We then propose a meta-algorithmic approach for applied data science with societal

  10. Mining Together : Large-Scale Mining Meets Artisanal Mining, A Guide for Action

    OpenAIRE

    World Bank

    2009-01-01

    The present guide, Mining Together: When Large-Scale Mining Meets Artisanal Mining, is an important step toward better understanding the conflict dynamics and underlying issues between large-scale and small-scale mining. This guide for action not only points to some of the challenges that both parties need to deal with in order to build a more constructive relationship, but most importantly it sh...

  11. Mining social networks and security informatics

    CERN Document Server

    Özyer, Tansel; Rokne, Jon; Khoury, Suheil

    2013-01-01

    Crime, terrorism and security are at the forefront of current societal concerns. This edited volume presents research based on social network techniques showing how data from crime and terror networks can be analyzed and how information can be extracted. The topics covered include crime data mining and visualization; organized crime detection; crime network visualization; computational criminology; aspects of terror network analyses and threat prediction including cyberterrorism and the related area of dark web; privacy issues in social networks; security informatics; graph algorithms for soci

  12. Text Mining Metal-Organic Framework Papers.

    Science.gov (United States)

    Park, Sanghoon; Kim, Baekjun; Choi, Sihoon; Boyd, Peter G; Smit, Berend; Kim, Jihan

    2018-02-26

    We have developed a simple text mining algorithm that allows us to identify the surface areas and pore volumes of metal-organic frameworks (MOFs) using manuscript HTML files as inputs. To facilitate the search, the algorithm looks for common units (e.g., m²/g, cm³/g) associated with these two quantities. On a sample set of over 200 MOFs, the algorithm correctly identified 90% of the surface area values and 88.8% of the pore volume values. Further application to a test set of randomly chosen MOF HTML files yielded accuracies of 73.2% and 85.1% for the two respective quantities. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data, as well as bolded notations of MOFs (e.g., 1a) that made it difficult to identify their real names. Such tools will become useful for discovering structure-property relationships among MOFs and for collecting large reference data sets.
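
A rough Python sketch of the unit-anchored search described above: strip HTML tags, then capture a number immediately followed by an m²/g or cm³/g unit. The regular expressions and function name are assumptions for illustration, not the authors' patterns, and the sketch makes no attempt to link a value to the correct MOF name, which is where the abstract reports most errors.

```python
import re

TAG = re.compile(r"<[^>]+>")
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"
# Hypothetical unit spellings: "m2/g", "m 2 /g", "m²/g", "m^2/g" and the cm3/g analogues.
SURFACE_AREA = re.compile(NUMBER + r"\s*m\s*(?:2|²|\^2)\s*/\s*g", re.IGNORECASE)
PORE_VOLUME = re.compile(NUMBER + r"\s*cm\s*(?:3|³|\^3)\s*/\s*g", re.IGNORECASE)


def extract_mof_values(html):
    """Return candidate surface areas (m2/g) and pore volumes (cm3/g) found in HTML."""
    text = TAG.sub("", html)  # "m<sup>2</sup>/g" becomes "m2/g"

    def values(pattern):
        return [float(m.group(1).replace(",", "")) for m in pattern.finditer(text)]

    return {"surface_area": values(SURFACE_AREA), "pore_volume": values(PORE_VOLUME)}
```

For the fragment `The BET surface area of <b>1a</b> is 1,250 m<sup>2</sup>/g and its pore volume is 0.62 cm<sup>3</sup>/g.`, it returns `[1250.0]` for surface area and `[0.62]` for pore volume.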

  13. Improved Density Based Spatial Clustering of Applications of Noise Clustering Algorithm for Knowledge Discovery in Spatial Data

    Directory of Open Access Journals (Sweden)

    Arvind Sharma

    2016-01-01

    Full Text Available Many techniques are available in data mining and in its subfield, spatial data mining, whose goal is to understand relationships between data objects. Databases of data objects with spatial features are called spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific purposes. A huge data set may be collected from different sources, such as satellite images, X-rays, medical images, traffic cameras, and GIS systems. The primary purpose of this paper is to handle this large amount of data and to establish relationships within it in a systematic way that yields definite results. The paper gives a complete process for understanding how spatial data differs from other kinds of data sets and how it is refined to obtain useful results and identify trends in geographic information systems and the spatial data mining process. Because clustering plays an indispensable role in the spatial data mining process, a new improved clustering algorithm is designed. Clustering methods are useful in various fields of human life, such as GIS (Geographic Information System), GPS (Global Positioning System), weather forecasting, air traffic control, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI design. This paper presents a study of various clustering methods and algorithms and an improved version of DBSCAN called IDBSCAN (Improved Density Based Spatial Clustering of Application of Noise). The algorithm adds some important attributes that are responsible for generating better clusters from existing data sets than other methods do.

  14. Effective approach toward Intrusion Detection System using data mining techniques

    Directory of Open Access Journals (Sweden)

    G.V. Nadiammai

    2014-03-01

    Full Text Available The tremendous growth in the use of computers over networks, and in applications running on various platforms, has drawn attention to network security. This paradigm exposes security vulnerabilities in all computer systems that are technically difficult and expensive to solve. Intrusions exploit these vulnerabilities to compromise the integrity, availability, and confidentiality of a computer resource. The Intrusion Detection System (IDS) plays a vital role in detecting anomalies and attacks in the network. In this work, data mining is integrated with an IDS to identify the relevant, hidden data of interest to the user effectively and with less execution time. Four issues, namely classification of data, a high level of human interaction, lack of labeled data, and the effectiveness of distributed denial-of-service attacks, are addressed by the proposed EDADT algorithm, hybrid IDS model, semi-supervised approach, and Varying HOPERAA algorithm, respectively. The proposed algorithms were tested on the KDD Cup dataset, and all of them show better accuracy and a lower false alarm rate than existing algorithms.

  15. Classifying unstructured textual data using the Product Score Model: an alternative text mining algorithm

    NARCIS (Netherlands)

    He, Qiwei; Veldkamp, Bernard P.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which creates difficulties in analysis. Text mining techniques that seek to extract useful

  16. Patriarchy, indoctrination and education. The therapeutic power of creation in women's literature. Two case studies: Carmen Martín Gaite's "The back room" (1978) and Ian McEwan's "Atonement" (2001)

    OpenAIRE

    García Lara, Antonia

    2011-01-01

    My dissertation deals with the patriarchal education that indoctrinates women and forces them to be domestic angels and how this education provokes a trauma that they try to heal by creation in the form of writing. I will use two case studies to illustrate my point: Carmen Martín Gaite's C., who is Martín Gaite's alter ego in El Cuarto de Atrás [The Back Room] (1978), and Briony Tallis, who is a fictional character in Ian McEwan's Atonement (2001). Briony is actually the one who writ...

  17. Attributed community mining using joint general non-negative matrix factorization with graph Laplacian

    Science.gov (United States)

    Chen, Zigang; Li, Lixiang; Peng, Haipeng; Liu, Yuhong; Yang, Yixian

    2018-04-01

    Community mining for complex social networks with link and attribute information plays an important role in meeting different application needs. In this paper, building on the general non-negative matrix factorization (GNMF) algorithm without dimension-matching constraints proposed in our previous work, we propose joint GNMF with graph Laplacian (LJGNMF) to implement community mining of complex social networks with link and attribute information according to different application needs. Theoretical derivation shows that the proposed LJGNMF is fully compatible with previous methods that integrate traditional NMF and symmetric NMF. In addition, experimental results show that LJGNMF can meet the needs of different community mining tasks by adjusting its parameters, and that it outperforms traditional NMF in terms of the attribute entropy of community vertices.

  18. QuadBase2: web server for multiplexed guanine quadruplex mining and visualization

    Science.gov (United States)

    Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    DNA guanine quadruplexes, or G4s, are non-canonical DNA secondary structures that affect genomic processes such as replication, transcription, and recombination. G4s are computationally identified by specific nucleotide motifs, also called putative G4 (PG4) motifs. Despite the general relevance of these structures, there is currently no tool available that allows batch queries and genome-wide analysis of these motifs in a user-friendly interface. QuadBase2 (quadbase.igib.res.in) presents a completely reinvented web server version of the previously published QuadBase database. QuadBase2 enables users to mine PG4 motifs in up to 178 eukaryotes through the EuQuad module. This module interfaces with the Ensembl Compara database to allow users to mine PG4 motifs in the orthologues of genes of interest across eukaryotes. PG4 motifs can be mined across genes and their promoter sequences in 1719 prokaryotes through the ProQuad module. This module includes a feature that allows genome-wide mining of PG4 motifs and their visualization as circular histograms. TetraplexFinder, the module for mining PG4 motifs in user-provided sequences, is now capable of handling up to 20 MB of data. QuadBase2 is a comprehensive PG4 motif mining tool that further expands the configurations and algorithms for mining PG4 motifs in a user-friendly way. PMID:27185890
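
Putative G4 motifs are commonly defined as four runs of at least three guanines separated by loops of one to seven nucleotides. This Python sketch applies that common definition with a regular expression on the given strand and reports C-rich matches as motifs on the opposite strand. The pattern is an assumption (the abstract notes that QuadBase2 offers multiple configurations), the sketch reports non-overlapping matches only, and none of the names correspond to QuadBase2's interface.

```python
import re

# Common PG4 definition (an assumption here): four runs of >= 3 G separated
# by loops of 1-7 nucleotides; C-runs mark motifs on the opposite strand.
PG4_PLUS = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
PG4_MINUS = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}")


def find_pg4(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping PG4 matches."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in PG4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in PG4_MINUS.finditer(seq)]
    return sorted(hits)
```

On the telomere-like sequence `ATGGGTTAGGGTTAGGGTTAGGGAT`, the sketch returns `[(2, 23, '+')]`.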

  19. Booster fans : some considerations for their usage in underground coal mines

    Energy Technology Data Exchange (ETDEWEB)

    Gillies, S.; Slaughter, C. [Missouri Univ. of Science and Technology, Rolla, MO (United States); Calizaya, F. [Utah Univ., Salt Lake City, UT (United States); Wu, H.W. [Gillies Wu Mining Technology Pty Ltd., Brisbane, QLD (Australia)

    2010-07-01

    This paper reported on a study that investigated the conditions under which booster fans can be used safely and efficiently in underground coal mines. Booster fans are installed in series with a main surface fan and are used to boost the pressure of the ventilation air passing through them. Several coal mining countries use booster fans, but in the United States they are used only in metal/non-metal mines because of concerns about uncontrolled recirculation. This study investigated booster fan installations in non-US underground coal mines where safe and efficient atmospheric conditions are achieved. The purpose was to collect reliable information on the airway resistances and flow requirements typical of large US coal mines. The study showed that safe booster fan installations are found in both high- and low-gas conditions, and sometimes where workings are located at great depths. The interlocking systems within the booster fan can control the underground fans and avoid recirculation when surface fans are unexpectedly turned off. Another purpose of the study was to determine when booster fans become a more viable solution in coal mines as air requirements increase with higher production rates. The authors concluded that a new fan selection algorithm for producing recirculation-free ventilation designs will be developed, enabling US coal mine operators to develop ventilation designs for extracting coal seams at depths greater than 1000 m. 17 refs., 1 fig.

  20. Mining Branching Rules from Past Survey Data with an Illustration Using a Geriatric Assessment Survey for Older Adults with Cancer

    Directory of Open Access Journals (Sweden)

    Daniel R. Jeske

    2016-05-01

    Full Text Available We construct a fast data mining algorithm that can be used to identify high-frequency response patterns in historical surveys. Identification of these patterns leads to the derivation of question branching rules that shorten the time required to complete a survey. The data mining algorithm allows the user to control the error rate that is incurred through the use of implied answers that go along with each branching rule. The context considered is binary response questions, which can be obtained from multi-level response questions through dichotomization. The algorithm is illustrated by the analysis of four sections of a geriatric assessment survey used by oncologists. Reductions in the number of questions that need to be asked in these four sections range from 33% to 54%.
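
A simplified Python sketch of the rule-mining idea for binary questions. For every ordered pair of questions and every answer value, it imputes the majority answer to the second question among respondents who gave that value to the first, and keeps the rule if the resulting error rate stays within a user-chosen bound. Only single-question antecedents are considered and rules are not chained, so this illustrates the error-controlled trade-off rather than reproducing the authors' algorithm; all names are hypothetical.

```python
from itertools import permutations


def branching_rules(responses, max_error=0.05, min_count=1):
    """Mine rules (q_if, v, q_then, w, error) from binary survey responses.

    Each response is a dict mapping question id -> 0/1 answer. A rule means:
    if q_if was answered v, skip q_then and imply the answer w.
    """
    questions = sorted(responses[0])
    rules = []
    for a, b in permutations(questions, 2):
        for v in (0, 1):
            matching = [r for r in responses if r[a] == v]
            if not matching or len(matching) < min_count:
                continue
            ones = sum(r[b] for r in matching)
            w = 1 if 2 * ones >= len(matching) else 0  # majority answer (ties -> 1)
            wrong = len(matching) - ones if w == 1 else ones
            error = wrong / len(matching)
            if error <= max_error:
                rules.append((a, v, b, w, error))
    return rules
```

With four hypothetical respondents and `max_error=0.0`, it finds four rules, including `('q1', 1, 'q2', 1, 0.0)`: anyone answering 1 to q1 could skip q2, whose answer is implied to be 1.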

  1. Implementation of Paste Backfill Mining Technology in Chinese Coal Mines

    Science.gov (United States)

    Chang, Qingliang; Zhou, Huaqiang; Bai, Jianbiao

    2014-01-01

    Implementing clean mining technology at coal mines is crucial to protect the environment and maintain a balance among energy resources, consumption, and ecology. After reviewing present clean coal mining technology, we introduce the principles and technological process of paste backfill mining in coal mines and discuss the components and features of backfill materials, the constitution of the backfill system, and the backfill process. The specific implementation and application of this technology are analyzed for paste backfill mining in the Daizhuang Coal Mine. The practical implementation shows that paste backfill mining can improve the safety and excavation rate of coal mining and can effectively resolve surface subsidence problems caused by underground mining activities, while utilizing solid waste such as coal gangue as a resource. Therefore, paste backfill mining is an effective clean coal mining technology with widespread application. PMID:25258737

  2. PRIVACY PRESERVING DATA MINING USING MULTIPLE OBJECTIVE OPTIMIZATION

    Directory of Open Access Journals (Sweden)

    V. Shyamala Susan

    2016-10-01

    Privacy preservation is the most targeted issue in information publication, because sensitive data should not be leaked. For this purpose, several privacy-preserving data mining algorithms have been proposed. In this work, feature selection using an evolutionary algorithm and data masking coupled with slicing are treated as a multiple objective optimisation problem to preserve privacy. First, a Genetic Algorithm (GA) is run over the datasets to identify the sensitive attributes and prioritise them for treatment according to their determined sensitivity level. In the next phase, to distort the data, noise is added to the more highly sensitive values using the Hybrid Data Transformation (HDT) method. In the following phase, the slicing algorithm groups the correlated attributes using the Advanced Clustering Algorithm (ACA) and thereby reduces the dimensionality. To obtain the optimal bucket dimensions, tuple segregation is performed by the Metaheuristic Firefly Algorithm (MFA). The experimental results indicate that the proposed technique can preserve confidentiality while keeping information utility high. The slicing algorithm preserves attribute association and usefulness, which results in reduced data dimensionality and information loss. Performance analysis was carried out on OCC 7 and OCC 15, where the optimization method proved its effectiveness on the two different datasets, achieving 92.98% and 96.92% respectively.
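
    The slicing step can be illustrated with the generic slicing technique: correlated attributes are kept together in column groups, tuples are split into buckets, and within each bucket the values of each column group are permuted independently, which breaks the link between groups. This is a minimal sketch under stated assumptions. The column groups and bucket size are passed in by hand here, whereas the paper chooses them with ACA and the firefly algorithm, and `slice_table` is a hypothetical name.

```python
import random


def slice_table(rows, column_groups, bucket_size, seed=0):
    """Generic slicing sketch (hypothetical helper).

    rows: list of tuples; column_groups: lists of attribute indices.
    Returns one tuple per row, made of one value-tuple per column group,
    where each group's values have been shuffled within the bucket.
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        pieces = []
        for group in column_groups:
            values = [tuple(row[i] for i in group) for row in bucket]
            rng.shuffle(values)  # permute this column group inside the bucket
            pieces.append(values)
        # reassemble: position k takes the k-th (shuffled) value of every group
        for k in range(len(bucket)):
            sliced.append(tuple(piece[k] for piece in pieces))
    return sliced
```

    Because each group is shuffled only inside its bucket, per-bucket value distributions are preserved, which is why slicing keeps much of the data's utility.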

  3. Evolutionary Data Mining Approach to Creating Digital Logic

    Science.gov (United States)

    2010-01-01

    To deal with this problem a genetic program (GP) based data mining (DM) procedure has been invented (Smith 2005). A genetic program is an algorithm...that can operate on the variables. When a GP was used as a DM function in the past to automatically create fuzzy decision trees, the Report...rules represents an approach to determining the effect of linguistic imprecision, i.e., the inability of experts to provide crisp rules. The

  4. Robust MST-Based Clustering Algorithm.

    Science.gov (United States)

    Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

    2018-06-01

    Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. To solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on minimax similarity. Finally, the assignment of all data points is achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
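
    The plain minimax grouping principle, which the abstract says is not robust, can be sketched with an MST. The minimax distance between two points is the largest edge on the MST path joining them, so grouping points whose minimax distance is at most a threshold is the same as dropping heavier MST edges and taking connected components. This illustrates only the baseline principle and not the proposed robust algorithm with density-based supernodes. The threshold and function names are assumptions.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (u, v, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def minimax_clusters(points, threshold):
    """Label points so that minimax distance <= threshold means same cluster."""
    parent = list(range(len(points)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for u, v, w in mst_edges(points):
        if w <= threshold:
            parent[find(u)] = find(v)
    return [find(i) for i in range(len(points))]
```

    With a distant outlier in the data, this baseline gives the outlier a cluster of its own, which is the first weakness the abstract points out.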

  5. 30 CFR 819.21 - Auger mining: Protection of underground mining.

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Auger mining: Protection of underground mining. 819.21 Section 819.21 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT... STANDARDS-AUGER MINING § 819.21 Auger mining: Protection of underground mining. Auger holes shall not extend...

  6. Application of data mining techniques for nuclear data and instrumentation

    International Nuclear Information System (INIS)

    Toshniwal, Durga

    2013-01-01

    Data mining is defined as the discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms, which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. Patterns in the data can be represented in many different forms, including classification rules, association rules, and clusters. Data mining thus deals with the discovery of hidden trends and patterns in large quantities of data. The field of data mining is emerging as a new, fundamental research area with important applications to science, engineering, medicine, business, and education. It is an interdisciplinary research area that draws on several roots, including database systems, machine learning, information systems, statistics and expert systems. Data mining performed on time series data is known as time series data mining (TSDM). A time series is a sequence of real numbers, each number representing a value at a point in time. During the past few years, there has been an explosion of research in time series data mining. This includes attempts to model time series data, to design languages to query such data, and to develop access structures to efficiently process queries on such data. Time series data arise naturally in many real-world applications. Efficient discovery of knowledge through time series data mining can be helpful in several domains, such as stock market analysis and weather forecasting. An important application area of data mining techniques is nuclear power plant and related data. Nuclear power plant data can be represented in the form of time sequences. Often it may be of prime importance to analyze such data to find trends and anomalies. The general goals of data mining include feature extraction, similarity search, clustering and classification, association rule mining and anomaly
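
    Finding anomalies in a time sequence, as mentioned above for plant data, can be illustrated with a generic rolling z-score detector. This is a sketch under stated assumptions rather than a technique from the source: the window length, the 3-standard-deviation threshold and the function name are illustrative choices.

```python
from statistics import mean, stdev


def zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the preceding `window` values
    by more than `threshold` sample standard deviations (hypothetical helper)."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # flat history: any change at all counts as anomalous
            if series[i] != mu:
                flagged.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

    For example, in a readings series such as `[10, 11, 10, 11, 10, 11, 10, 30, 11, 10]` only the spike at index 7 is flagged. The spike also inflates the standard deviation of the following windows, which is one reason practical detectors often use robust statistics.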

  7. Contract Mining versus Owner Mining – The Way Forward | Suglo ...

    African Journals Online (AJOL)

    Ghana Mining Journal ... By contracting out one or more of their mining operations, the mining companies can concentrate on their core businesses. This paper reviews ... The general trends in the mining industry show that contract mining will be the way forward for most mines under various circumstances in the future.

  8. Data clustering algorithms and applications

    CERN Document Server

    Aggarwal, Charu C

    2013-01-01

    Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea

  9. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    Science.gov (United States)

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
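
    The CHAID algorithm chooses each split by testing how strongly each predictor is associated with the target using a chi-square test. The sketch below shows only that split-selection idea in a simplified form. It ranks predictors by the raw Pearson chi-square statistic, while full CHAID also merges categories and compares Bonferroni-adjusted p-values. The field names are hypothetical.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of xs against ys."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    fx, fy = Counter(xs), Counter(ys)
    stat = 0.0
    for a in fx:
        for b in fy:
            expected = fx[a] * fy[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Pick the predictor with the largest chi-square association with the target."""
    ys = [r[target] for r in records]
    scores = {p: chi_square([r[p] for r in records], ys) for p in predictors}
    return max(scores, key=scores.get), scores
```

    Ranking by the raw statistic favours predictors with many categories, which is why CHAID proper works with adjusted p-values instead.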

  10. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Educational Data Mining (EDM) is gaining importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. These data keep growing and contain hidden knowledge that could be very useful to users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows better exploitation of the system. The latter is the tool for achieving this discovery. Data mining must cope with very complex and varied situations to reach quality solutions. Therefore, data mining is a research field where many advances are being made to accommodate and solve emerging problems, and many techniques are usually considered for this purpose. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Specifically, we have used top-down induction of decision trees algorithms to extract the patterns, because these models, decision trees, are easily understandable. In addition, the validation processes conducted have ensured high-quality models.
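
    Top-down induction of decision trees can be sketched in its classic ID3 form. The algorithm splits on the attribute with the highest information gain, recurses on each branch, and stops when a branch is pure or no attributes remain. This is a generic illustration and not necessarily the exact induction algorithm used with SIETTE. The attribute names in the test data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, attrs, target):
    """ID3-style induction: returns a class label (leaf) or {attr: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}
```

    The nested-dict output can be read directly as if-then rules, which is the interpretability advantage the abstract gives for choosing decision trees.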

  11. Selection of mining method for No.3 uranium ore body in the independent mining area at a uranium mine

    International Nuclear Information System (INIS)

    Ding Fulong; Ding Dexin; Ye Yongjun

    2010-01-01

    Mining operations in the existing mining area at a uranium mine are nearly complete, and it is necessary to mine the No.3 uranium ore body in another mining area at the mine. Based on the geological conditions, this paper used an analogical method to analyze the feasible methods, and a low-cost, high-efficiency mining method was suggested for the No.3 ore body in the independent mining area at the uranium mine. (authors)

  12. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

    Directory of Open Access Journals (Sweden)

    Yun Xue

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms that cannot reveal all OPSMs, because the problem is NP-complete. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequences (ACS) between every two row sequences, so that no deep OPSMs are missed. Then, an improved prefix tree data structure was used to store and traverse the ACS, and the Apriori principle was employed to efficiently mine the frequent sequential patterns. Finally, experiments were performed on gene and synthetic datasets. The results demonstrated the effectiveness and efficiency of this method.
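
    The link between OPSMs and sequential patterns can be sketched as follows. Each row induces an ordering of its columns by value, and a row supports a column pattern when its values increase along that pattern, that is, when the pattern is a subsequence of the row's ordering. Support is anti-monotone, so an Apriori-style levelwise search needs to extend only frequent patterns. This generic sketch assumes no tied values within a row and does not reproduce the paper's ACS enumeration or prefix-tree structure. All names are hypothetical.

```python
def row_order(row):
    """Column indices sorted by the row's values (the row's induced permutation)."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def is_subsequence(pattern, order):
    it = iter(order)
    return all(col in it for col in pattern)  # `in` consumes the iterator


def frequent_orders(matrix, min_support):
    """Levelwise search for column orders supported by >= min_support rows."""
    orders = [row_order(r) for r in matrix]
    n_cols = len(matrix[0])
    level = [(c,) for c in range(n_cols)]
    found = []
    while level:
        frequent = [p for p in level
                    if sum(is_subsequence(p, o) for o in orders) >= min_support]
        found.extend(p for p in frequent if len(p) >= 2)
        # Apriori principle: only frequent patterns are extended by one column
        level = [p + (c,) for p in frequent for c in range(n_cols) if c not in p]
    return found
```

    Each returned pattern, together with its supporting rows, defines an order-preserving submatrix.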

  13. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    Science.gov (United States)

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

  14. Study and application of data mining and data warehouse in CIMS

    Science.gov (United States)

    Zhou, Lijuan; Liu, Chi; Liu, Daxin

    2003-03-01

    The interest in analyzing data has grown tremendously in recent years. Analyzing data requires a multitude of technologies, namely technologies from the fields of Data Warehouse, Data Mining, and On-line Analytical Processing (OLAP). This paper presents a new data warehouse architecture in CIMS based on CRGC-CIMS application engineering. The data source of this architecture is the database of the CRGC-CIMS system. The data are placed in a global data set by extracting, filtering and integrating them, and are then loaded into the data warehouse according to information requests. We describe two advantages of the new model in the CRGC-CIMS application. In addition, a data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, in order to efficiently support decision-support and OLAP queries or data mining. It is important to select the right views to materialize to answer a given set of queries. In this paper, we also design algorithms for selecting a set of views to be materialized in a data warehouse in order to answer the most queries under a given space constraint. First, we give a cost model for selecting materialized views. Then we give algorithms that adopt a gradual, bottom-up recursive method, and we describe how the algorithms are realized. Finally, we discuss the advantages and shortcomings of our approach and future work.
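
    The view-selection problem described above can be illustrated with the well-known greedy heuristic of Harinarayan, Rajaraman and Ullman. This heuristic is swapped in here for illustration and is not the paper's bottom-up recursive algorithm. Its cost model assumes that a query on a set of grouping dimensions can be answered from any materialized view whose dimensions include it, at a cost equal to that view's row count. The sketch also assumes that the top view is always materialized and does not count against the space budget. View sizes in the test are hypothetical.

```python
def greedy_view_selection(sizes, space_budget):
    """Greedy materialized-view selection under a space budget (HRU-style sketch).

    sizes maps each view (a frozenset of grouping dimensions) to its row count.
    Each round adds the view with the largest query-cost benefit per unit of space.
    """
    top = max(sizes, key=len)  # the view with all dimensions
    chosen = {top}

    def cost(q, materialized):
        # cheapest materialized view that contains all of q's dimensions
        return min(sizes[v] for v in materialized if q <= v)

    used = 0
    while True:
        best, best_ratio = None, 0.0
        for v in sizes:
            if v in chosen or used + sizes[v] > space_budget:
                continue
            benefit = sum(cost(q, chosen) - cost(q, chosen | {v}) for q in sizes)
            ratio = benefit / sizes[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return chosen
        chosen.add(best)
        used += sizes[best]
```

    Using benefit per unit of space, rather than raw benefit, stops one large view from consuming the whole budget when several small views would save more query cost.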

  15. A Multi-Agent Framework for Anomalies Detection on Distributed Firewalls Using Data Mining Techniques

    Science.gov (United States)

    Karoui, Kamel; Ftima, Fakher Ben; Ghezala, Henda Ben

    The integration of agents and data mining has emerged as a promising area for distributed problem solving. Applying this integration to distributed firewalls facilitates the anomaly detection process. In this chapter, we present a set of algorithms and mining techniques to analyse, manage and detect anomalies in distributed firewalls' policy rules using a multi-agent approach. First, for each firewall, a static agent executes a set of data mining techniques to generate a new set of efficient firewall policy rules. Then, a mobile agent exploits these sets of optimized rules to detect possible anomalies on a specific firewall (intra-firewall anomalies) or between firewalls (inter-firewall anomalies). An experimental case study is presented to demonstrate the usefulness of our approach.
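
    Two common intra-firewall anomalies can be sketched for a first-match rule list. A rule is shadowed when an earlier rule already matches all of its traffic with a different action, and redundant when that earlier rule has the same action. This is a simplified illustration that does not come from the chapter. Rules are reduced to tuples of inclusive integer ranges (for example, source and destination ports), and coverage is checked only against single earlier rules, not unions of rules. Names are hypothetical.

```python
def covers(outer, inner):
    """True if every (lo, hi) range of inner lies inside the matching range of outer."""
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))


def intra_firewall_anomalies(rules):
    """Report ("shadowed" | "redundant", earlier_index, later_index) triples.

    Each rule is (fields, action) with fields a tuple of inclusive ranges.
    """
    anomalies = []
    for j, (fields_j, action_j) in enumerate(rules):
        for i in range(j):
            fields_i, action_i = rules[i]
            if covers(fields_i, fields_j):
                kind = "redundant" if action_i == action_j else "shadowed"
                anomalies.append((kind, i, j))
                break  # the first covering rule already decides rule j's fate
    return anomalies
```

    Checking rules between firewalls (inter-firewall anomalies) would additionally need the network topology, which this sketch leaves out.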

  16. CUDA-accelerated genetic feedforward-ANN training for data mining

    International Nuclear Information System (INIS)

    Patulea, Catalin; Peace, Robert; Green, James

    2010-01-01

    We present an implementation of genetic algorithm (GA) training of feedforward artificial neural networks (ANNs) targeting commodity graphics cards (GPUs). By carefully mapping the problem onto the unique GPU architecture, we achieve order-of-magnitude speedup over a conventional CPU implementation. Furthermore, we show that the speedup is consistent across a wide range of data set sizes, making this implementation ideal for large data sets. This performance boost enables the genetic algorithm to search a larger subset of the solution space, which results in more accurate pattern classification. Finally, we demonstrate this method in the context of the 2009 UC San Diego Data Mining Contest, achieving a world-class lift on a data set of 94682 e-commerce transactions.
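
    Genetic-algorithm training of a feedforward network can be sketched on the CPU. The GA evolves the network's weight vector directly with selection, crossover and mutation, with no gradients. This sketch shows only the training loop and none of the GPU mapping behind the reported speedup. The 2-2-1 architecture, the XOR-style data, the population size and the mutation scale are assumptions.

```python
import math
import random


def forward(w, x):
    """2-2-1 feedforward net: tanh hidden units, sigmoid output, 9 weights in total."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return 1 / (1 + math.exp(-(w[6] * h1 + w[7] * h2 + w[8])))


def fitness(w, data):
    """Negative mean squared error, so that larger is better."""
    return -sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)


def train_ga(data, pop_size=40, generations=60, sigma=0.5, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(9)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        history.append(fitness(pop[0], data))
        elite = pop[: pop_size // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 9)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([g + rng.gauss(0, sigma) for g in child])  # mutation
        pop = elite + children
    pop.sort(key=lambda w: fitness(w, data), reverse=True)
    return pop[0], history
```

    Because the elite is copied unchanged, the best fitness never decreases from one generation to the next. Each fitness evaluation is independent of the others, which is the property a GPU implementation can exploit.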

  17. CUDA-accelerated genetic feedforward-ANN training for data mining

    Energy Technology Data Exchange (ETDEWEB)

    Patulea, Catalin; Peace, Robert; Green, James, E-mail: cpatulea@sce.carleton.ca, E-mail: rpeace@sce.carleton.ca, E-mail: jrgreen@sce.carleton.ca [School of Systems and Computer Engineering, Carleton University, Ottawa, K1S 5B6 (Canada)]

    2010-11-01

  18. Clustering performance comparison using K-means and expectation maximization algorithms.

    Science.gov (United States)

    Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

    2014-11-14

    Clustering is an important means of data mining based on separating data categories by similar features. Unlike classification algorithms, clustering algorithms are unsupervised. Two representative clustering algorithms are K-means and the expectation maximization (EM) algorithm. Logistic regression extends linear regression analysis to categorical dependent variables by using a linear combination of independent variables, and this statistical approach is used to predict the probability that an event occurs. However, classifying all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to EM clusters and to the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
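
    K-means, one of the two clustering algorithms compared, alternates two steps. Each point is assigned to its nearest centroid, and each centroid then moves to the mean of its assigned points, until the centroids stop changing. The sketch below is a plain Lloyd's-algorithm illustration on toy 2-D points rather than the wine data. The random initialisation and the iteration cap are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(iterations):
        # assignment step: every point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # update step: move each centroid to its cluster mean (keep it if empty)
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[c]
               for c, cl in enumerate(clusters)]
        if new == centroids:
            break  # converged
        centroids = new
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels
```

    EM with a Gaussian mixture replaces the hard nearest-centroid assignment with soft membership probabilities, which is the main practical difference between the two methods.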

  19. SOMA: A Proposed Framework for Trend Mining in Large UK Diabetic Retinopathy Temporal Databases

    Science.gov (United States)

    Somaraki, Vassiliki; Harding, Simon; Broadbent, Deborah; Coenen, Frans

    In this paper, we present SOMA, a new trend mining framework, and Aretaeus, the associated trend mining algorithm. The proposed framework is able to detect different kinds of trends within longitudinal datasets. The prototype trends are defined mathematically so that they can be mapped onto the temporal patterns. Trends are defined and generated in terms of the frequency of occurrence of pattern changes over time. To evaluate the proposed framework, the process was applied to a large collection of medical records, forming part of the diabetic retinopathy screening programme at the Royal Liverpool University Hospital.
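
    The idea of describing a trend by how a pattern's frequency changes over time can be sketched in two steps. First compute the pattern's support in each time epoch, then map the sequence of changes onto a few simple prototype shapes. This is an illustration only and not the Aretaeus algorithm. The prototype labels, the tolerance and the function names are hypothetical.

```python
def pattern_frequencies(epochs, pattern):
    """Share of records in each epoch that contain every item of `pattern`."""
    return [sum(pattern <= rec for rec in recs) / len(recs) for recs in epochs]


def classify_trend(freqs, tol=0.05):
    """Map a frequency sequence onto simple prototype trends (hypothetical labels)."""
    changes = [b - a for a, b in zip(freqs, freqs[1:])]
    if all(abs(d) <= tol for d in changes):
        return "constant"
    if all(d >= -tol for d in changes):
        return "increasing"
    if all(d <= tol for d in changes):
        return "decreasing"
    return "fluctuating"
```

    Each epoch here is a list of records represented as sets of items, for example findings recorded at successive screening visits.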

  20. Combining complex networks and data mining: Why and how

    Science.gov (United States)

    Zanin, M.; Papo, D.; Sousa, P. A.; Menasalvas, E.; Nicchi, A.; Kubik, E.; Boccaletti, S.

    2016-05-01

    The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems. In the face of that, surprisingly few researchers use both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, and some of their fundamental concepts are illustrated. A variety of contexts in which complex network theory and data mining have been used synergistically are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research, are discussed.
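
    One simple way to integrate complex network metrics with classical data mining is to compute per-node metrics and append them to each node's feature vector before training an ordinary classifier. The sketch below computes two standard metrics, degree and local clustering coefficient, for an undirected edge list. How the features are then used is left open. The function name and the toy graph are assumptions.

```python
from itertools import combinations


def node_features(edges):
    """Return {node: (degree, local clustering coefficient)} for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    features = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # number of edges among the node's neighbours
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        features[node] = (k, clustering)
    return features
```

    For a triangle a-b-c with an extra node d attached to a, node a has degree 3 and clustering 1/3, because only one of its three neighbour pairs is connected.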

  1. Treatment of mine-water from decommissioning uranium mines

    International Nuclear Information System (INIS)

    Fan Quanhui

    2002-01-01

    Treatment methods for mine-water from decommissioning uranium mines are introduced and classified. Suggestions on optimal treatment methods are presented, based on experience with the decommissioned Chenzhou Uranium Mine.

  2. Data Mining the Internet Archive Collection

    Directory of Open Access Journals (Sweden)

    Caleb McDaniel

    2014-03-01

    The collections of the Internet Archive (IA) include many digitized sources of interest to historians, including early JSTOR journal content, John Adams’s personal library, and the Haiti collection at the John Carter Brown Library. In short, to quote Programming Historian Ian Milligan, “The Internet Archive rocks.” In this lesson, you’ll learn how to download files from such collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records, a widely used standard for formatting bibliographic metadata.
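
    The lesson uses dedicated Python modules for downloading and for MARC parsing. As a rough standard-library alternative, the sketch below reads subfields from a MARC XML record with `xml.etree.ElementTree` and the MARC 21 slim namespace. In MARC 21, field 245 subfield a holds the title proper and field 100 subfield a the personal-name main entry. The helper name and the sample record are hypothetical and not taken from the Internet Archive.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def marc_subfields(xml_text, tag, code):
    """Return the text of subfield `code` in every datafield `tag` of a MARC XML record."""
    root = ET.fromstring(xml_text)
    path = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in root.findall(path, MARC_NS)]
```

    Calling `marc_subfields(record, "245", "a")` on a downloaded record would list its titles. A dedicated MARC library handles many more edge cases, such as control fields and repeated subfields.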

  3. DATA MINING IN SPORTS BETTING

    Directory of Open Access Journals (Sweden)

    Cristian Georgescu

    2013-12-01

    In this paper, we briefly analyze how to make decisions when betting on European football with the help of data mining techniques. Both betting a few days in advance of the sporting event and live betting have been taken into consideration. By using a clustering algorithm to analyze both a database of events from football matches and the odds given by bookmakers, we obtained graphs indicating the probabilities associated with the analyzed events. Given the purely informative aspect of this paper, we analyzed only the number of corners in a match.

  4. Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree

    Science.gov (United States)

    Wahyuni, Sri

    2018-03-01

    Data mining is the process of finding useful information in large databases. One of the existing techniques in data mining is classification. The method used here is the decision tree method, with the C4.5 algorithm. The decision tree method transforms a very large set of facts into a decision tree that represents rules. It is useful for exploring data and for finding hidden relationships between a number of potential input variables and a target variable. The C4.5 decision tree is constructed in several stages: selecting an attribute as the root, creating a branch for each of its values, and dividing the cases among the branches. These stages are repeated for each branch until all the cases in the branch have the same class. Rules for a case can then be read from the resulting decision tree. In this study, the researcher classified data on prisoners at Labuhan Deli prison to identify the factors behind detainees committing drug crimes. Applying the C4.5 algorithm yielded knowledge that can serve as information for minimizing drug crimes. The research found that the most influential factor in detainees committing drug crimes was the address variable.
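
    The attribute-selection stage of C4.5 can be sketched with its split criterion, the gain ratio. This is the information gain of splitting on an attribute, divided by the split information, which is the entropy of the partition itself, so that attributes with many values are not unduly favoured. The sketch covers only this criterion for categorical attributes, without C4.5's handling of continuous values, missing data or pruning. The attribute names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """C4.5 split criterion: information gain normalised by split information."""
    labels = [r[target] for r in rows]
    values = [r[attr] for r in rows]
    remainder = sum(
        values.count(v) / len(rows) * entropy([r[target] for r in rows if r[attr] == v])
        for v in set(values))
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition induced by attr
    return gain / split_info if split_info > 0 else 0.0
```

    The attribute with the highest gain ratio becomes the root, and the same computation is repeated on each branch.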

  5. Spatio-Temporal Pattern Mining on Trajectory Data Using Arm

    Science.gov (United States)

    Khoshahval, S.; Farnaghi, M.; Taleai, M.

    2017-09-01

    Mobile phones were initially considered devices for making human connections easier, but today their use has evolved into a platform for gaming, web surfing, and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant sources of trajectory data. Raw GPS trajectory data is a series of points that contains hidden information, and trajectory data analysis is needed to reveal it. Among the most useful hidden information in trajectory data are user activity patterns. Each pattern contains multiple stops and moves, which identify the places users visited and their tasks. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the places users visited from the stops and moves of GPS trajectories; to locate stops and moves, we implemented a place recognition algorithm. After the visited places were extracted, the association rule mining algorithm Apriori was used to extract user activity patterns. This study shows that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques, revealing the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.
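
    The Apriori step can be sketched on daily sets of visited places, assuming the place-recognition stage has already turned each day's trajectory into such a set. Apriori finds all itemsets whose support reaches a threshold by growing candidates one item at a time and pruning any candidate that has an infrequent subset. Rules are then read off with confidence = support(X ∪ Y) / support(X). The place names and thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) / n for c in current}
        level = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # join frequent (k-1)-itemsets, then prune candidates with an infrequent subset
        current = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in current
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_conf):
    """Association rules (lhs, rhs, confidence) from the frequent itemsets."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_conf:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out
```

    A rule such as ({"work"}, {"home"}, 1.0) would read "on every day the user visited work, they also visited home".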

  6. DATA MINING IN EDUCATION: CURRENT STATE AND PERSPECTIVES OF DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    Yurii O. Kovalchuk

    2016-01-01

    The main tasks of Data Mining (classification and regression, association rules, clustering) and the basic principles of Data Mining algorithms are considered in the context of their use in a variety of educational research, which is the subject of a relatively new independent direction, Educational Data Mining. Findings about the most popular research topics in this area, as well as the perspectives for its development, are presented. The material is illustrated with simple examples. This article is intended for readers engaged in research in the field of education at various levels, especially those who use e-learning systems but are not very familiar with this area of data analysis.

  7. Using ontology network structure in text mining.

    Science.gov (United States)

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
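
    Computing term importance over an ontology graph with PageRank, as described above, can be sketched with power iteration. The sketch also shows one possible way of injecting the scores into term frequencies. The term graph is hypothetical, HITS is not shown, and the weighting scheme in `weight_terms` is an assumption rather than the paper's method. Dangling nodes (terms with no out-links) spread their rank uniformly, so the scores stay a probability distribution.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict: node -> list of successor nodes."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        # rank held by dangling nodes is redistributed uniformly
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, succ in graph.items():
            for v in succ:
                new[v] += damping * rank[u] / len(succ)
        rank = new
    return rank


def weight_terms(term_counts, importance):
    """Scale raw term frequencies by ontology importance (hypothetical scheme)."""
    return {t: c * (1 + importance.get(t, 0.0)) for t, c in term_counts.items()}
```

    In a graph where several terms point to "smoking", that term receives the highest score. Its count would then be boosted most before the document vectors are handed to the classifier.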

  8. 30 CFR 77.1712 - Reopening mines; notification; inspection prior to mining.

    Science.gov (United States)

    2010-07-01

    ... to mining. 77.1712 Section 77.1712 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION... prior to mining. Prior to reopening any surface coal mine after it has been abandoned or declared... an authorized representative of the Secretary before any mining operations in such mine are...

  9. Multiobjective Optimization of Irreversible Thermal Engine Using Mutable Smart Bee Algorithm

    Directory of Open Access Journals (Sweden)

    M. Gorji-Bandpy

    2012-01-01

    A new method, the mutable smart bee (MSB) algorithm, is proposed for the cooperative optimization of the maximum power output (MPO) and minimum entropy generation (MEG) of an Atkinson cycle as a multiobjective, multi-modal mechanical problem. This method uses mutable smart bees instead of classical bees. The results have been checked against some of the most common optimization algorithms, such as Karaboga’s original artificial bee colony, the bees algorithm (BA), improved particle swarm optimization (IPSO), Lukasik’s firefly algorithm (LFFA), and the self-adaptive penalty function genetic algorithm (SAPF-GA). Based on the results, it can be concluded that the Mutable Smart Bee (MSB) is capable of maintaining its historical memory of the location and quality of food sources, and a small chance of mutation is also considered for this bee. These features were found to be strong elements for mining data in constrained areas, and the results support this claim.
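
    The two features credited to the mutable smart bee can be sketched in a simplified, single-objective bee-colony search. The first is a memory of the best food source found, kept through greedy selection. The second is a small chance of mutation. This is not the MSB algorithm from the paper: it optimises one objective on a toy sphere function rather than MPO and MEG jointly, and the colony size, iteration count, mutation rate and function names are assumptions.

```python
import random


def sphere(x):
    return sum(v * v for v in x)


def bee_search(f, dim, bounds, n_bees=20, iterations=200, mutation_rate=0.05, seed=0):
    """Minimise f with a simplified bee-colony search (illustrative sketch).

    Each bee remembers its best food source; it probes a neighbour built from
    another bee's memory, occasionally mutates one coordinate at random, and
    keeps the neighbour only if it is better (greedy selection).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bees)]
    values = [f(m) for m in memory]
    for _ in range(iterations):
        for i in range(n_bees):
            partner = rng.choice([j for j in range(n_bees) if j != i])
            d = rng.randrange(dim)
            cand = list(memory[i])
            cand[d] += rng.uniform(-1, 1) * (memory[i][d] - memory[partner][d])
            if rng.random() < mutation_rate:  # small chance of mutation
                cand[rng.randrange(dim)] = rng.uniform(lo, hi)
            cand = [min(hi, max(lo, v)) for v in cand]  # stay inside the bounds
            val = f(cand)
            if val < values[i]:  # memory keeps only improvements
                memory[i], values[i] = cand, val
    best = min(range(n_bees), key=values.__getitem__)
    return memory[best], values[best]
```

    Greedy selection means each bee's remembered value can only improve, while the occasional mutation lets a bee jump out of a region that its neighbourhood moves alone would not leave.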

  10. Mining Building Metadata by Data Stream Comparison

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Kjærgaard, Mikkel Baun

    2016-01-01

    ... ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata by comparing data streams. Metafier ... enables semi-automatic labeling of metadata for building instrumentation. Metafier annotates points with metadata by comparing the data from a set of validated points with unvalidated points. Metafier has three different algorithms for comparing points based on their data. The three algorithms ... to handle data streams with only slightly similar patterns. We evaluated Metafier with points and data from one building located in Denmark, using 903 points; with only 3 known examples, the overall accuracy was 94.71%. Furthermore we found that using DTW for mining ...
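
    Comparing an unvalidated point's data stream with validated ones can be sketched with dynamic time warping (DTW), which the abstract mentions, combined with a nearest-neighbour label transfer. This is a generic illustration. Metafier's three comparison algorithms are not described in this excerpt, and the labels and stream values are hypothetical.

```python
import math


def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def label_point(stream, validated):
    """Give an unvalidated stream the label of its DTW-nearest validated stream.

    validated: list of (label, stream) pairs.
    """
    return min(validated, key=lambda item: dtw(stream, item[1]))[0]
```

    DTW tolerates streams that are shifted or sampled at slightly different rates, which suits sensor data whose patterns are only roughly similar.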

  11. Acid mine drainage: mining and water pollution issues in British Columbia

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1998-12-31

    The importance of protecting water quality and some of the problems associated with mineral development are described. Negative impacts of mining operations such as sedimentation, water disturbances, and water pollution from waste rock and tailings are considered. Mining wastes, types of water pollution from mining, the legacy of acid mine drainage, predicting acid mine drainage, preventing and mitigating acid mine drainage, examples from the past, and cyanide heap-leaching are discussed. The real costs of mining at the Telkwa open pit coal mine are assessed. British Columbia mines that are known for or are potentially acid generating are shown on a map. 32 refs., 10 figs.

  12. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    Science.gov (United States)

    Lee, Saro; Park, Inhye

    2013-09-30

    Ground subsidence caused by underground mines poses hazards to human life and property. This study analyzed the ground-subsidence hazard using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping as a probabilistic model for comparison. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision trees. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for the prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
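
The frequency ratio model and the AUC validation mentioned above each reduce to a short calculation. The sketch assumes a raster flattened into a list of cells, a single factor (a hypothetical distance-to-mine class), and made-up subsidence labels. In practice several factors would be combined, for example by summing each cell's FR values into a hazard index.

```python
from collections import Counter

def frequency_ratio(classes, subsided):
    """Frequency ratio per factor class: (share of subsidence cells falling in the class)
    divided by (share of all cells falling in the class). FR > 1 means above-average hazard."""
    total = Counter(classes)
    hits = Counter(c for c, s in zip(classes, subsided) if s)
    n_cells, n_hits = len(classes), sum(subsided)
    return {c: (hits[c] / n_hits) / (total[c] / n_cells) for c in total}

def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    subsidence cell scores higher than a randomly chosen stable cell (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical factor map: distance-to-mine class of 8 cells, and whether each cell subsided.
classes = ["near", "near", "near", "far", "far", "far", "far", "far"]
subsided = [1, 1, 0, 1, 0, 0, 0, 0]
fr = frequency_ratio(classes, subsided)  # near ≈ 1.78, far ≈ 0.53
print(fr)
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```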

  13. Responsible Mining: A Human Resources Strategy for Mine Development Project

    OpenAIRE

    Sampathkumar, Sriram (Ram)

    2012-01-01

    Mining is a global industry. Most mining companies operate internationally, often in remote, challenging environments, and consequently frequently have to respond to unusual and demanding Human Resource (HR) requirements. It is my opinion that the strategic imperative behind success in the mining industry is responsible mining. The purpose of this paper is to examine how an effective HR strategy can be a competitive advantage that contributes to the success of a mining project in the global mining in...

  14. Using remote sensing imagery to monitoring sea surface pollution cause by abandoned gold-copper mine

    Science.gov (United States)

    Kao, H. M.; Ren, H.; Lee, Y. T.

    2010-08-01

    The Chinkuashih Benshen mine was the largest gold-copper mine in Taiwan before its owner abandoned it in 1987. However, even though the mine has been closed, its minerals still interact with rain and groundwater, and the runoff flows into the sea. The polluted sea surface appears yellow, green, and even white, and the pollutants are carried by the coastal current. In this study, we used optical satellite images to monitor the sea surface. Several image processing algorithms are employed, especially the subpixel technique and the linear mixture model, to estimate the concentration of pollutants. A change detection approach is also applied to track them. We also conducted chemical analysis of the polluted water to provide ground-truth validation. Through correlation analysis between the satellite observations and the ground-truth chemical analysis, an effective approach to monitoring water pollution could be established.
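
The linear mixture model named above treats a pixel's spectrum as a weighted sum of pure "endmember" spectra. A minimal two-endmember sketch, assuming the fractions sum to one, solves for the pollutant fraction by least squares. The band values are hypothetical, and the study's actual endmembers and sensor bands are not given here.

```python
def unmix_two(pixel, water, pollutant):
    """Two-endmember linear mixture model with the fractions summing to one:
    pixel ≈ f * pollutant + (1 - f) * water. Returns the least-squares f, clipped to [0, 1]."""
    diff = [p - w for p, w in zip(pollutant, water)]
    resid = [x - w for x, w in zip(pixel, water)]
    # Minimising ||resid - f * diff||^2 gives f = (resid . diff) / (diff . diff).
    f = sum(r * d for r, d in zip(resid, diff)) / sum(d * d for d in diff)
    return min(max(f, 0.0), 1.0)

# Hypothetical reflectances in three bands for clean sea water and a pure pollutant.
water = [0.02, 0.03, 0.01]
pollutant = [0.10, 0.15, 0.05]
mixed = [0.04, 0.06, 0.02]  # built as 25% pollutant + 75% water
print(round(unmix_two(mixed, water, pollutant), 3))  # 0.25
```

Applied to every pixel, the estimated fraction gives a subpixel concentration map that can then be correlated with the chemical ground truth.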

  15. Implementation of a Multi-Robot Coverage Algorithm on a Two-Dimensional, Grid-Based Environment

    Science.gov (United States)

    2017-06-01

    … different approaches is based on “behavior-based” versus “system theory based” approaches to the problem. 1. Behavior-Based Approach. Most behavior-based … otherwise, it performs the behavior of IG. This kind of algorithm could be classified as a system theory based approach since the change in the … systems, robot agents are likely to take over mine countermeasure (MCM) missions one day. The path planning coverage algorithm is an essential topic for …
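
The snippet above does not describe the thesis's coverage algorithm, so the sketch below only illustrates the general problem: covering every cell of a two-dimensional grid with several robots. It assumes an obstacle-free grid and a simple split into strips with back-and-forth (boustrophedon) sweeps. `boustrophedon_paths` is a hypothetical helper, not the method implemented in the thesis.

```python
def boustrophedon_paths(rows, cols, n_robots):
    """Split an obstacle-free rows x cols grid into vertical strips, one strip per robot,
    and have each robot sweep its strip column by column, alternating direction."""
    paths = []
    base, extra = divmod(cols, n_robots)
    start = 0
    for r in range(n_robots):
        width = base + (1 if r < extra else 0)  # spread leftover columns over the first robots
        path = []
        for k, c in enumerate(range(start, start + width)):
            # Even columns of the strip go top-down, odd columns bottom-up.
            order = range(rows) if k % 2 == 0 else range(rows - 1, -1, -1)
            path.extend((row, c) for row in order)
        paths.append(path)
        start += width
    return paths

for robot, path in enumerate(boustrophedon_paths(3, 4, 2)):
    print(robot, path)
```

Each robot's path moves only between neighbouring cells and the strips do not overlap, so every cell is visited exactly once. Handling obstacles would require decomposing the free space first.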

  16. Variants of Evolutionary Algorithms for Real-World Applications

    CERN Document Server

    Weise, Thomas; Michalewicz, Zbigniew

    2012-01-01

    Evolutionary Algorithms (EAs) are population-based, stochastic search algorithms that mimic natural evolution. Due to their ability to find excellent solutions for conventionally hard and dynamic problems within acceptable time, EAs have attracted interest from many researchers and practitioners in recent years. This book “Variants of Evolutionary Algorithms for Real-World Applications” aims to promote the practitioner’s view on EAs by providing a comprehensive discussion of how EAs can be adapted to the requirements of various applications in the real-world domains. It comprises 14 chapters, including an introductory chapter re-visiting the fundamental question of what an EA is and other chapters addressing a range of real-world problems such as production process planning, inventory system and supply chain network optimisation, task-based jobs assignment, planning for CNC-based work piece construction, mechanical/ship design tasks that involve runtime-intense simulations, data mining for the predictio...
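
As a minimal sketch of the population-based, stochastic search described above (an illustration, not an algorithm taken from the book), here is an elitist (mu + lambda)-style evolutionary algorithm on the OneMax toy problem. The problem, mutation rate, and population sizes are assumptions chosen for brevity.

```python
import random

def onemax(bits):
    """Toy fitness: number of 1-bits (maximum = len(bits))."""
    return sum(bits)

def evolve(n_bits=20, pop_size=10, generations=50, seed=1):
    """Minimal (mu + lambda)-style EA: each parent produces one bit-flip mutant,
    and the best pop_size individuals of parents + children survive (elitist)."""
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Variation: flip each bit independently with probability p_mut.
        children = [[1 - b if rng.random() < p_mut else b for b in parent] for parent in pop]
        # Selection: keep the fittest individuals from parents and children together.
        pop = sorted(pop + children, key=onemax, reverse=True)[:pop_size]
    return max(pop, key=onemax)

best = evolve()
print(onemax(best), "of", len(best))
```

Because parents compete with their children, the best fitness never decreases from one generation to the next; real-world variants mostly change the representation, the variation operators, and the fitness evaluation.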

  17. Study on the Detection of Moving Target in the Mining Method Based on Hybrid Algorithm for Sports Video Analysis

    Directory of Open Access Journals (Sweden)

    Huang Tian

    2014-10-01

    Full Text Available Moving object detection and tracking is a hot research direction in computer vision and image processing. Based on an analysis of commonly used moving-target detection and tracking algorithms, this work focuses on tracking non-rigid targets in sports video. In sports video, non-rigid athletes often deform physically as they move, and moving targets may become occluded. The surge in multimedia data makes fast search and query more difficult. However, most users want to be able to quickly extract content of interest and implicit knowledge (concepts, rules, models, and correlations) from multimedia data, retrieve and query it quickly to take advantage of it, and obtain hierarchical decision support for problem solving. Taking moving objects in sports video as the object of study, this work conducts systematic research at both the theoretical level and the technical-framework level, mining layer by layer from low-level motion features to high-level motion semantics in video. This not only helps users find information quickly but can also provide decision support for solving problems.

  18. Extending mine life

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Mine layouts, new machines and techniques, and research into problem areas such as ground control are highlighted in this report on extending mine life. The main sectors considered are coal, uranium, molybdenum, and gold mining.

  19. SegMine workflows for semantic microarray data analysis in Orange4WS

    Directory of Open Access Journals (Sweden)

    Kulovesi Kimmo

    2011-10-01

    Full Text Available Abstract. Background: In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, knowledge discovery from diverse distributed data and knowledge sources (such as GO, KEGG, PubMed, and experimental databases). Specifically, cutting-edge data analysis approaches, such as semantic data mining, link discovery, and visualization, have not yet been made available to researchers investigating complex biological datasets. Results: We present a new methodology, SegMine, for semantic analysis of microarray data by exploiting general biological knowledge, and a new workflow environment, Orange4WS, with integrated support for web services in which the SegMine methodology is implemented. The SegMine methodology consists of two main steps. First, the semantic subgroup discovery algorithm is used to construct elaborate rules that identify enriched gene sets. Then, a link discovery service is used for the creation and visualization of new biological hypotheses. The utility of SegMine, implemented as a set of workflows in Orange4WS, is demonstrated in two microarray data analysis applications. In the analysis of senescence in human stem cells, the use of SegMine resulted in three novel research hypotheses that could improve understanding of the underlying mechanisms of senescence and identification of candidate marker genes. Conclusions: Compared to the available data analysis systems, SegMine offers improved hypothesis generation and data interpretation for bioinformatics in an easy-to-use integrated workflow environment.
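
SegMine's first step identifies enriched gene sets. The abstract does not state which statistic is used, so the sketch assumes a common choice, the one-sided hypergeometric test. It asks how surprising the overlap is between a list of selected genes and a gene set. The gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(n_genes, set_size, n_selected, overlap):
    """One-sided hypergeometric test, P(X >= overlap): the chance that at least `overlap`
    of `n_selected` genes drawn at random from `n_genes` fall in a gene set of `set_size`."""
    upper = min(set_size, n_selected)
    hits = sum(comb(set_size, k) * comb(n_genes - set_size, n_selected - k)
               for k in range(overlap, upper + 1))
    return hits / comb(n_genes, n_selected)

# Hypothetical numbers: 8 of 25 selected genes fall in a 20-gene set, out of 1,000 genes.
print(enrichment_p_value(1000, 20, 25, 8))
```

A small p-value marks the gene set as enriched among the selected genes. With many gene sets tested, a multiple-testing correction would normally follow.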

  20. MONITORING OF MINING

    Directory of Open Access Journals (Sweden)

    Berislav Šebečić

    1996-12-01

    Full Text Available The way mining was monitored in the past depended on knowledge, interest, and the existing legal regulations. Documentary evidence about this work can be found in archives, libraries, and museums. In particular, there is rich archival material (papers and books) concerning the work of the former Imperial and Royal Mining Captaincies in Zagreb, Zadar, Klagenfurt, and Split. A minor part of the documentation has not yet been transferred to Croatia. Mining handbooks and books also tell us about mining in Croatia in the context of Austria-Hungary. For example, we learn that the first governors in Zagreb and Zadar, Ban Count Jelacic and Baron Mamula, were also the top mining authorities, though this, probably for political motives, was suppressed in the guides and inventories of the Mining Captaincies. At the end of the 1850s, Croatia produced 92-94% of the sea salt, up to 8.5% of the sulphur, 19.5% of the asphalt, and 100% of the oil for the Austro-Hungarian empire. Data about mining in the Split Mining Captaincy, prepared for the Philadelphia Exhibition, show that in the exploratory mining operations, for which 33,372 independent mines were declared in 1925, the search was mainly for bauxite (60.0%), then dark coal (19.0%), asphalts (10.3%), and lignites (6.2%). In 1931, within the area covered by the same captaincy, of 74 declared mines only 9 were working: five coal mines, three bauxite mines, and one asphalt mine. I suggest that the Mining Captaincy or Authority be renewed within a state institution, or that a Mining and Geological Authority be set up, which would lead to a more complete affirmation of Croatian mining (the paper is published in Croatian).

  1. A New Feedback-Analysis based Reputation Algorithm for E-Commerce Communities

    Directory of Open Access Journals (Sweden)

    Hasnae Rahimi

    2014-12-01

    Full Text Available Dealing with the ever-growing content generated by users in e-commerce applications, Trust Reputation Systems (TRS) are widely used online to provide the trust reputation of each product using the customers’ ratings. However, there is also a good number of online customer reviews and feedback that must be used by the TRS. We therefore propose in this work a new architecture for TRS in e-commerce applications that includes feedback mining in order to calculate reputation scores. This architecture is based on an intelligent layer that proposes to each user (i.e., “feedback provider”) who has already given a recommendation a collection of prefabricated feedback to like or dislike. The proposed reputation algorithm then calculates the trust degree of the user and the feedback’s trustworthiness, and generates the global reputation score of the product according to the user’s ‘likes’ and ‘dislikes’. In this work, we also present a state of the art of text mining tools and algorithms that can be used to generate the prefabricated feedback and to classify it into different categories.

  2. Issues of Exploitation of Induction Motors in the Course of Underground Mining Operations

    Science.gov (United States)

    Gumula, Stanisław; Hudy, Wiktor; Piaskowska-Silarska, Malgorzata; Pytel, Krzysztof

    2017-09-01

    The mining industry is one of the most important customers for electric motors. The machines most commonly used in the contemporary mining industry are alternating current machines, which convert electrical energy into mechanical energy. This publication presents the operating problems and the influence of qualitative interference acting on the inputs of the individual regulators of a field-oriented control system in the course of underground mining operations. The object whose speed is controlled is a slip-ring induction motor. The regulator settings were calculated using an evolutionary algorithm. System dynamics were examined in computer simulations using MATLAB/Simulink. According to the analyses, large distortion of the regulators' input signals adversely affects the rotational speed tracked by the control system, which may cause large vibration of the whole system and, consequently, its much faster destruction. The designed system is characterized by significantly better resistance to interference. The system is stable with properly selected regulator settings, which is particularly important during the operation of machinery used in underground mining.

  3. Fast SS-ILM: A Computationally Efficient Algorithm to Discover Socially Important Locations

    Science.gov (United States)

    Dokuz, A. S.; Celik, M.

    2017-11-01

    Socially important locations are places that are frequently visited by social media users during their social media lifetime. Discovering socially important locations provides valuable information about user behaviour on social networking sites. However, discovering socially important locations is challenging due to data volume and dimensionality, spatial and temporal calculations, location sparseness in social media datasets, and the inefficiency of current algorithms. Several studies in the literature have addressed the discovery of important locations; however, the proposed approaches are not computationally efficient. In this study, we propose the Fast SS-ILM algorithm, a modification of the SS-ILM algorithm, to mine socially important locations efficiently. Experimental results show that the proposed Fast SS-ILM algorithm decreases the execution time of the socially important location discovery process by up to 20%.

  4. FAST SS-ILM: A COMPUTATIONALLY EFFICIENT ALGORITHM TO DISCOVER SOCIALLY IMPORTANT LOCATIONS

    Directory of Open Access Journals (Sweden)

    A. S. Dokuz

    2017-11-01

    Full Text Available Socially important locations are places that are frequently visited by social media users during their social media lifetime. Discovering socially important locations provides valuable information about user behaviour on social networking sites. However, discovering socially important locations is challenging due to data volume and dimensionality, spatial and temporal calculations, location sparseness in social media datasets, and the inefficiency of current algorithms. Several studies in the literature have addressed the discovery of important locations; however, the proposed approaches are not computationally efficient. In this study, we propose the Fast SS-ILM algorithm, a modification of the SS-ILM algorithm, to mine socially important locations efficiently. Experimental results show that the proposed Fast SS-ILM algorithm decreases the execution time of the socially important location discovery process by up to 20%.

  5. Gold-Mining

    DEFF Research Database (Denmark)

    Raaballe, J.; Grundy, B.D.

    2002-01-01

    Based on standard option pricing arguments and assumptions (including no convenience yield and sustainable property rights), we will not observe operating gold mines. We find that asymmetric information on the reserves in the gold mine is a necessary and sufficient condition for the existence … of operating gold mines. Asymmetric information on the reserves in the mine implies that, at a high enough price of gold, the high-type manager finds the extraction value of the company to be higher than the current market value of the non-operating gold mine. Due to this undervaluation, the maxim of market …

  6. The mining methods at the Fraisse mine

    International Nuclear Information System (INIS)

    Heurley, P.; Vervialle, J.P.

    1985-01-01

    The Fraisse mine is one of the four underground mines of Cogema's La Crouzille mining division. Faced with the need to mechanize its workings, this mine also had to satisfy a number of stringent demands. This has led to the design of four different mining methods for the four workings currently in active operation at this pit, which nevertheless preserve the basic ideas of top slicing under concrete slabs (TSS) or horizontal cut-and-fill stopes (CFS). An electric scooptram is used; with this type of vehicle, the stringent requirements for fire-fighting and fire-prevention equipment are reduced to a minimum. Finally, the dimensions of the vehicles and the operation of these methods result in a net-to-gross tonnage ratio close to 1, i.e., maximum output combined with minimum contamination (in French).

  7. Opinion mining on book review using CNN-L2-SVM algorithm

    Science.gov (United States)

    Rozi, M. F.; Mukhlash, I.; Soetrisno; Kimura, M.

    2018-03-01

    A review of a product can reflect the quality of the product itself. Extracting information from reviews can reveal the sentiment of the opinion. The process of extracting useful information from user reviews is called opinion mining. Deep learning models are increasingly used for review extraction, and many researchers have used them to obtain excellent performance in Natural Language Processing. In this research, one deep learning model, the Convolutional Neural Network (CNN), is used for feature extraction, and an L2 Support Vector Machine (SVM) is used as the classifier. These methods are applied to determine the sentiment of book review data. The results show state-of-the-art performance of 83.23% in the training phase and 64.6% in the testing phase.
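
In a CNN-L2-SVM setup, the network's learned features feed a linear SVM trained on the squared hinge loss. The sketch below only evaluates that L2-SVM objective. The weights, bias, and two-dimensional feature vectors are hypothetical, and the paper's architecture and training details are not reproduced.

```python
def l2_svm_loss(w, b, X, y, lam=0.0):
    """Objective of a linear L2-SVM: mean squared hinge loss plus an L2 penalty,
    mean(max(0, 1 - y_i * (w . x_i + b))^2) + (lam / 2) * ||w||^2, with labels y_i in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += max(0.0, 1.0 - yi * score) ** 2
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)

# Hypothetical 2-D "CNN features" for two reviews, both labelled positive (+1).
print(l2_svm_loss([1.0, -1.0], 0.0, [[2.0, 1.0], [0.0, 1.0]], [1, 1]))  # 2.0
```

The first review sits exactly on the margin and costs nothing. The second is misclassified with margin -1, so it contributes (1 - (-1))^2 = 4, and the mean is 2.0. Squaring makes the loss differentiable, which is convenient when it is backpropagated into the CNN.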

  8. Application of data mining techniques to explore predictors of HCC in Egyptian patients with HCV-related chronic liver disease.

    Science.gov (United States)

    Omran, Dalia Abd El Hamid; Awad, AbuBakr Hussein; Mabrouk, Mahasen Abd El Rahman; Soliman, Ahmad Fouad; Aziz, Ashraf Omar Abdel

    2015-01-01

    Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis that can explore tremendous volumes of information to discover hidden patterns and relationships. Our aim here was to develop a non-invasive algorithm for the prediction of HCC. Such an algorithm should be economical, reliable, easy to apply, and acceptable to domain experts. This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV)-related chronic liver disease (CLD): 135 with HCC, 116 cirrhotic patients without HCC, and 64 patients with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC. The decision tree algorithm was able to predict HCC with a recall (sensitivity) of 83.5% and a precision (specificity) of 83.3% using only routine data. There were 259 (82.2%) correctly classified instances and 56 (17.8%) incorrectly classified instances. Out of 29 attributes, serum alpha-fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml, was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST >64 U/L, and ascites were variables associated with HCC. Data mining analysis allows the discovery of hidden patterns and enables the development of models to predict HCC, using routine data as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP (≥50.3 ng/ml). The presence of more than 2 of the 5 risk variables can successfully predict HCC with a sensitivity of 96% and a specificity of 82%.
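
The reported score rule (HCC predicted when more than 2 of the 5 risk variables are present) is simple enough to write down directly. The sketch encodes only the thresholds quoted in the abstract. The function and argument names are hypothetical, and this is not the authors' decision tree.

```python
def hcc_risk_count(afp_ng_ml, male, cirrhosis, ast_u_l, ascites):
    """Count how many of the five risk variables named in the abstract are present."""
    return sum([
        afp_ng_ml >= 50.3,  # serum AFP at or above the reported cutoff (ng/ml)
        bool(male),
        bool(cirrhosis),
        ast_u_l > 64,       # AST above 64 U/L
        bool(ascites),
    ])

def predict_hcc(**patient):
    """Flag HCC when more than 2 of the 5 risk variables are present."""
    return hcc_risk_count(**patient) > 2

print(predict_hcc(afp_ng_ml=50.3, male=True, cirrhosis=True, ast_u_l=40, ascites=False))  # True
```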

  9. Parameters of Solidifying Mixtures Transporting at Underground Ore Mining

    Directory of Open Access Journals (Sweden)

    Golik Vladimir

    2017-01-01

    Full Text Available The article is devoted to the problem of supplying mining enterprises with solidifying filling mixtures in underground mining. It gives the results of analytical studies based on foreign and domestic practice in delivering solidifying mixtures to stopes. On the basis of experimental practice, parameters for transporting solidifying filling mixtures are given that increase mixture quality through the effect of vibration in the pipeline. The mechanism of the delivery process and the procedure for determining the parameters of the forced oscillations of the pipeline, the characteristics of the transport processes, the rigidity of the elastic elements of the pipeline section supports, and the magnitude of the vibrator's driving force are detailed. It is determined that the quality of solidifying filling mixtures can be increased through the rational use of technical resources during transport, and as a result the mixtures are characterized by a more even distribution of the aggregate. The algorithm for calculating the parameters of pipe vibro-transport of solidifying filling mixtures may be in demand in the design of technology for the underground mining of mineral deposits.
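
The abstract links the pipeline's forced oscillations, the rigidity of its elastic supports, and the vibrator's driving force. A textbook way to relate these is the steady-state amplitude of a damped single-degree-of-freedom system. Modelling a pipeline section this way is an assumption made for illustration; it is not the article's algorithm, and all numbers below are hypothetical.

```python
import math

def steady_state_amplitude(force_n, mass_kg, stiffness_n_per_m, damping_n_s_per_m, omega_rad_s):
    """Steady-state amplitude (m) of m*x'' + c*x' + k*x = F0*sin(omega*t):
    A = F0 / sqrt((k - m*omega^2)^2 + (c*omega)^2)."""
    return force_n / math.hypot(stiffness_n_per_m - mass_kg * omega_rad_s ** 2,
                                damping_n_s_per_m * omega_rad_s)

# Hypothetical pipeline section: 10 kg, support stiffness 1000 N/m, damping 20 N*s/m,
# vibrator force amplitude 100 N, swept over a few excitation frequencies.
for omega in (0.0, 5.0, 10.0, 20.0):
    print(omega, round(steady_state_amplitude(100.0, 10.0, 1000.0, 20.0, omega), 4))
```

The amplitude peaks near omega = sqrt(k/m) (here 10 rad/s). This is why the support rigidity and the vibrator's driving frequency have to be chosen together.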

  10. Parameters of Solidifying Mixtures Transporting at Underground Ore Mining

    Science.gov (United States)

    Golik, Vladimir; Dmitrak, Yury

    2017-11-01

    The article is devoted to the problem of supplying mining enterprises with solidifying filling mixtures in underground mining. It gives the results of analytical studies based on foreign and domestic practice in delivering solidifying mixtures to stopes. On the basis of experimental practice, parameters for transporting solidifying filling mixtures are given that increase mixture quality through the effect of vibration in the pipeline. The mechanism of the delivery process and the procedure for determining the parameters of the forced oscillations of the pipeline, the characteristics of the transport processes, the rigidity of the elastic elements of the pipeline section supports, and the magnitude of the vibrator's driving force are detailed. It is determined that the quality of solidifying filling mixtures can be increased through the rational use of technical resources during transport, and as a result the mixtures are characterized by a more even distribution of the aggregate. The algorithm for calculating the parameters of pipe vibro-transport of solidifying filling mixtures may be in demand in the design of technology for the underground mining of mineral deposits.

  11. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Baysian and SVM approaches

    Science.gov (United States)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in Blood Transfusion Organizations could be useful for improving the performance of blood donation services. The aim of this research is to predict the healthiness of blood donors in the Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, the C4.5 decision tree, the Naïve Bayesian classifier, and the Support Vector Machine, were chosen and applied to a real database of 11,006 donors. Seven fields (sex, age, job, education, marital status, type of donor, and results of blood tests, i.e., doctors' comments and lab results on whether a donor's blood is healthy or unhealthy) were selected as inputs to these algorithms. The results of the three algorithms were compared, and an error cost analysis was performed. According to the obtained results, the best algorithm, with low error cost and high accuracy, is SVM. This research helps the BTO build a model of blood donors in each area in order to predict whether donors' blood is healthy or unhealthy. This research could be useful if used in parallel with laboratory tests to better separate unhealthy blood.
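
An error cost analysis weighs the two kinds of mistakes differently instead of counting them equally. The sketch below assumes that unhealthy blood is the positive class and uses hypothetical costs (a missed unhealthy donation costing 10 units, a wrongly rejected healthy one costing 1). The study's actual cost values are not given here.

```python
def error_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total misclassification cost with 1 = unhealthy blood (positive class) and 0 = healthy.
    A false negative passes unhealthy blood as healthy; a false positive rejects healthy blood."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp

# Hypothetical labels and predictions from two classifiers; the costs are assumptions.
y_true = [1, 1, 0, 0, 0, 0]
preds = {"model_a": [1, 0, 0, 0, 0, 0], "model_b": [1, 1, 1, 1, 0, 0]}
for name, y_pred in preds.items():
    print(name, error_cost(y_true, y_pred, cost_fn=10, cost_fp=1))
```

With these assumed costs, model_b is cheaper (2 versus 10) even though it makes more errors. This is the kind of trade-off a plain accuracy comparison hides.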

  12. Mined-out land

    International Nuclear Information System (INIS)

    Reinsalu, Enno; Toomik, Arvi; Valgma, Ingo

    2002-01-01

    Estonian mineral resources lie at shallow depth and mining fields are large; therefore, vast areas are affected by mining. There are at least 800 deposits with a total area of 6,000 km², and about the same number of underground mines, surface mines, peat fields, quarries, and sand and gravel pits. The deposits cover more than 10% of the Estonian mainland. The total area of operating mine claims exceeds 150 km², which is 0.3% of Estonia's area. The book is written mainly for people who live or work in areas influenced by mining. The observations and research could benefit those who are interested in geography and the environment and who follow the formation and appearance of mined-out landscapes. The book also contains warnings for careless people on and under the surface of mined-out land. Part of the book contains results of research carried out in 1968-1993 by the first two authors while working at the Estonian branch of the A. Skochinsky Institute of Mining. Since 1990, Arvi Toomik has continued this study at the Northeastern section of the Institute of Ecology of Tallinn Pedagogical University. Enno Reinsalu studied the aftereffects of mining at the Mining Department of Tallinn Technical University from 1998 to 2000. Ingo Valgma studied a Geographical Information System for Mining in his doctoral dissertation, and this book is one of the applications of his study.

  13. Trust Mines

    Science.gov (United States)

    The United States and the Navajo Nation entered into settlement agreements that provide funds to conduct investigations and any needed cleanup at 16 of the 46 priority mines, including six mines in the Northern Abandoned Uranium Mine Region.

  14. Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov

    Directory of Open Access Journals (Sweden)

    Eric Wen Su

    2017-03-01

    Full Text Available Drug repositioning (i.e., drug repurposing) is the process of discovering new uses for marketed drugs. Historically, such discoveries were serendipitous. However, the rapid growth in electronic clinical data and text mining tools makes it feasible to systematically identify drugs with the potential to be repurposed. Described here is a novel method of drug repositioning by mining ClinicalTrials.gov. The text mining tools I2E (Linguamatics) and PolyAnalyst (Megaputer) were used. An I2E query extracts “Serious Adverse Events” (SAE) data from randomized trials in ClinicalTrials.gov. Using a statistical algorithm, a PolyAnalyst workflow ranks the drugs for which the treatment arm has fewer predefined SAEs than the control arm, indicating that the drug is potentially reducing the level of SAE. Hypotheses could then be generated for new uses of these drugs based on the predefined SAE that is indicative of disease (for example, cancer).
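
The ranking step described above compares SAE counts between treatment and control arms. The abstract does not name the statistic used in the PolyAnalyst workflow, so the sketch below assumes the absolute risk reduction. The drug names and counts are hypothetical.

```python
def rank_by_sae_reduction(trials):
    """trials maps drug -> (SAEs in treatment arm, treatment-arm size, SAEs in control arm, control-arm size).
    Keeps drugs whose treatment-arm SAE rate is below the control-arm rate and ranks them by the
    absolute risk reduction (an assumed statistic; the source does not name the one it uses)."""
    reduction = {}
    for drug, (sae_t, n_t, sae_c, n_c) in trials.items():
        arr = sae_c / n_c - sae_t / n_t
        if arr > 0:
            reduction[drug] = arr
    return sorted(reduction, key=reduction.get, reverse=True)

# Hypothetical counts for one predefined SAE (e.g. a cancer-related event).
trials = {
    "drug_a": (2, 100, 10, 100),
    "drug_b": (5, 50, 6, 50),
    "drug_c": (8, 100, 4, 100),
}
print(rank_by_sae_reduction(trials))  # ['drug_a', 'drug_b']
```

A practical workflow would also need a significance test and a minimum event count, since a small trial can show a large difference by chance.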

  15. Web Mining

    Science.gov (United States)

    Fürnkranz, Johannes

    The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems and web usage mining.

  16. Mining with communities

    International Nuclear Information System (INIS)

    Veiga, Marcello M.; Scoble, Malcolm; McAllister, Mary Louise

    2001-01-01

    To be considered as sustainable, a mining community needs to adhere to the principles of ecological sustainability, economic vitality and social equity. These principles apply over a long time span, covering both the life of the mine and post-mining closure. The legacy left by a mine to the community after its closure is emerging as a significant aspect of its planning. Progress towards sustainability is made when value is added to a community with respect to these principles by the mining operation during its life cycle. This article presents a series of cases to demonstrate the diverse potential challenges to achieving a sustainable mining community. These case studies of both new and old mining communities are drawn mainly from Canada and from locations abroad where Canadian companies are now building mines. The article concludes by considering various approaches that can foster sustainable mining communities and the role of community consultation and capacity building. (author)

  17. Ideate about building green mine of uranium mining and metallurgy

    International Nuclear Information System (INIS)

    Shi Zuyuan

    2012-01-01

    The paper analyses the current situation of uranium mining and metallurgy; sets up goals for green uranium mining and metallurgy, together with its fundamental conditions, contents, and measures; and puts forward the idea of combining green uranium mining and metallurgy with the state target for green mining while keeping its own characteristics. (author)

  18. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  19. Mining and mining authorities in Saarland 2016. Mining economy, mining technology, occupational safety, environmental protection, statistics, mining authority activities. Annual report; Bergbau und Bergbehoerden im Saarland 2016. Bergwirtschaft, Bergtechnik, Arbeitsschutz, Umweltschutz, Statistiken, Taetigkeiten der Bergbehoerden. Jahresbericht

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2016-07-01

    The annual report of the Saarland Upper Mining Authority provides an insight into the activities of the mining authorities. In particular, it stresses the development of hard coal mining, mining safety and technology, and the relationship between mining and the environment.

  20. Using Decision Trees in Data Mining for Predicting Factors Influencing of Heart Disease

    Directory of Open Access Journals (Sweden)

    Moloud Abdar

    2015-12-01

    Statistics from the World Health Organization (WHO) show that heart disease is one of the leading causes of mortality worldwide. Because of its importance, many studies in recent years have applied data mining to this disease. The main objective of this study is to find a better decision tree algorithm and then use it to extract rules for predicting heart disease. The Cleveland data set, comprising 303 records, is used for this study. These data include 13 features, which we have categorized into five classes. In this paper, the C5.0 algorithm, with an accuracy of 85.33%, performs better than the other algorithms used in this study. Among the rules created by this algorithm, the attributes Trestbps, Restecg, Thalach, Slope, Oldpeak and CP were identified as the most influential factors in predicting heart disease.
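
Like its predecessor C4.5, C5.0 ranks candidate splits with an entropy-based criterion, the information gain ratio. The sketch below shows only that split criterion on categorical attributes; it is not the study's implementation, and the toy records and attribute values are hypothetical rather than drawn from the Cleveland data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain ratio of splitting `rows` on categorical attribute `attr`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum((len(p) / n) * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records (not the Cleveland data set).
rows = [
    {"cp": "typical", "slope": "up"},
    {"cp": "typical", "slope": "flat"},
    {"cp": "asympt", "slope": "flat"},
    {"cp": "asympt", "slope": "down"},
]
labels = ["healthy", "healthy", "disease", "disease"]

best = max(["cp", "slope"], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # cp
```

Here `cp` separates the two classes perfectly (gain ratio 1.0), while `slope` scores 1/3, so a tree grown with this criterion would split on `cp` first.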

  1. Hybrid Type II fuzzy system & data mining approach for surface finish

    Directory of Open Access Journals (Sweden)

    Tzu-Liang (Bill) Tseng

    2015-07-01

    In this study, a new methodology for predicting system output has been investigated by applying a data mining technique and a hybrid type II fuzzy system to CNC turning operations. The purpose was to generate a supplemental control function for the dynamic machining environment, where unforeseeable changes may occur frequently. Two different types of membership functions were developed for the fuzzy logic systems, and a hybrid system was generated by combining the two types. A genetic algorithm was used for fuzzy adaptation in the control system: the fuzzy rules are automatically modified during genetic algorithm training. The computational results showed that the hybrid system with genetic adaptation achieved far better accuracy. The hybrid fuzzy system with genetic algorithm training demonstrated more effective prediction capability and strong potential for implementation in existing control functions.
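
Genetic adaptation of fuzzy parameters can be illustrated with a much-simplified case that is not the authors' type II or hybrid system: a real-coded genetic algorithm evolves the three parameters of a single type-1 triangular membership function so that it fits hypothetical target membership degrees. The population size, mutation scale and target data are all assumptions.

```python
import random

def triangular(x, a, b, c):
    """Type-1 triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def error(params, samples):
    """Sum of squared errors between the membership function and target degrees."""
    a, b, c = sorted(params)  # enforce a <= b <= c
    return sum((triangular(x, a, b, c) - y) ** 2 for x, y in samples)

def evolve(samples, generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(pop_size)]
    history = []  # best error per generation
    for _ in range(generations):
        pop.sort(key=lambda p: error(p, samples))
        history.append(error(pop[0], samples))
        elite = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Arithmetic crossover followed by Gaussian mutation.
            w = rng.random()
            children.append([w * g1 + (1 - w) * g2 + rng.gauss(0, 0.3)
                             for g1, g2 in zip(p1, p2)])
        pop = elite + children
    pop.sort(key=lambda p: error(p, samples))
    return sorted(pop[0]), history

# Hypothetical target degrees for a "medium" term peaking near x = 5.
samples = [(x, triangular(x, 2.0, 5.0, 8.0)) for x in range(11)]
best, history = evolve(samples)
print(best, history[-1])
```

Because the elite half is carried over unchanged, the best error never increases from one generation to the next.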

  2. 75 FR 17529 - High-Voltage Continuous Mining Machine Standard for Underground Coal Mines

    Science.gov (United States)

    2010-04-06

    ... High-Voltage Continuous Mining Machine Standard for Underground Coal Mines AGENCY: Mine Safety and... of high-voltage continuous mining machines in underground coal mines. It also revises MSHA's design...-- Underground Coal Mines III. Section-by-Section Analysis A. Part 18--Electric Motor-Driven Mine Equipment and...

  3. A Comparative Study to Predict Student’s Performance Using Educational Data Mining Techniques

    Science.gov (United States)

    Uswatun Khasanah, Annisa; Harwati

    2017-06-01

    Predicting student performance is essential for a university seeking to prevent students from failing. The number of student dropouts is one parameter that can be used to measure student performance, and it is an important point that must be evaluated in Indonesian university accreditation. Data mining has been widely used to predict student performance; data mining applied in this field is usually called Educational Data Mining. This study conducted feature selection to identify the attributes with high influence on student performance in the Department of Industrial Engineering, Universitas Islam Indonesia. Then, two popular classification algorithms, Bayesian Network and Decision Tree, were implemented and compared to determine which gives the better prediction. The outcome showed that students' attendance and first-semester GPA ranked at the top across all feature selection methods, and that Bayesian Network outperformed Decision Tree, achieving a higher accuracy rate.
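
A naive Bayes classifier is the simplest Bayesian network: every attribute depends only on the class node. The sketch below uses that structure as an assumption, since the study's actual network is not described, and trains it from counts with Laplace smoothing on hypothetical attendance and first-semester GPA categories.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Collect class counts and per-class attribute value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attr, class) -> Counter of values
    values = defaultdict(set)            # attr -> set of observed values
    for row, cls in zip(rows, labels):
        for attr, val in row.items():
            value_counts[(attr, cls)][val] += 1
            values[attr].add(val)
    return class_counts, value_counts, values

def predict(model, row):
    """Return the class with the highest (unnormalised) posterior probability."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, count in class_counts.items():
        score = count / n  # prior P(class)
        for attr, val in row.items():
            # Laplace-smoothed likelihood P(attr = val | class).
            score *= (value_counts[(attr, cls)][val] + 1) / (count + len(values[attr]))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Hypothetical, already-discretised student records.
rows = [
    {"attendance": "high", "gpa": "high"},
    {"attendance": "high", "gpa": "medium"},
    {"attendance": "low", "gpa": "low"},
    {"attendance": "low", "gpa": "medium"},
    {"attendance": "high", "gpa": "high"},
]
labels = ["pass", "pass", "dropout", "dropout", "pass"]
model = train(rows, labels)
print(predict(model, {"attendance": "low", "gpa": "low"}))  # dropout
```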

  4. SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM

    Directory of Open Access Journals (Sweden)

    S. Khoshahval

    2017-09-01

    Mobile phones were initially regarded as devices for making human connections easier, but today their use has evolved into a platform for gaming, web browsing and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant tools for gathering trajectory data. Raw GPS trajectory data are a series of points containing hidden information, and revealing that information requires trajectory data analysis. One of the most useful kinds of hidden information in trajectory data is user activity patterns. Each pattern contains multiple stops and moves, which identify the places users visited and the tasks they performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting users' visited places from the stops and moves of GPS trajectories; to locate stops and moves, we implemented a place recognition algorithm. After extracting the visited points, the association rule mining algorithm Apriori was used to extract user activity patterns. This study showed that each trajectory contains useful patterns that can be recovered from raw GPS data using association rule mining techniques, revealing the behaviour of multiple users in a system, and that these patterns can be used in various location-based applications.
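
The Apriori step can be illustrated with a compact version of the classic algorithm run over "transactions" of visited places, one per user-day. The place labels, minimum support and minimum confidence are hypothetical, and the paper's place recognition step is not reproduced.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    # Level 1: frequent single items.
    items = {i for t in sets for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unite frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) rules from frequent itemsets."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    yield set(lhs), set(itemset - lhs), conf

# Hypothetical visited places, one set per user-day.
days = [
    {"home", "work", "gym"},
    {"home", "work", "cafe"},
    {"home", "work", "gym"},
    {"home", "cafe"},
]
frequent = apriori(days, min_support=0.5)
for lhs, rhs, conf in rules(frequent, min_confidence=0.9):
    print(lhs, "->", rhs, conf)
```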

  5. Mine water treatment

    Energy Technology Data Exchange (ETDEWEB)

    Komissarov, S V

    1980-10-01

    This article discusses the composition of chemical compounds dissolved or suspended in mine waters in various coal basins of the USSR: the Moscow basin, Kuzbass, Pechora, Kizelovsk, Karaganda, Donetsk and Chelyabinsk basins. The percentage of suspended material in the water is evaluated as a function of the water source (water from the drainage system or from the dust suppression system). Pollution of mine waters with oils and coli bacteria is also described. Recommendations are presented on the construction and capacity of water settling tanks and on methods of mine water treatment. In mines working coal seams 2 m thick or more, a system of two settling tanks should be used: coarse grains settle in the upper tank and finer grains in the lower one. The upper tank should be large enough to store the mine water discharged over one month, and the lower one to store the water discharged over two months. Saline waters from coal mines working thin seams should be treated in a system of evaporation reservoirs (if climatic conditions permit). Mine waters from mines with thin coal seams but without high salt content can be treated in a system of long channels with aquatic plants, which increase the amount of oxygen in the treated water. A system for the biological treatment of waste water from mine wash-houses and baths is also described. The influence of temperature, sunshine and season on the efficiency of mine water treatment is also assessed. (In Russian)
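
The two-tank sizing rule for thick-seam mines reduces to simple arithmetic. This sketch assumes a constant daily discharge and a 30-day month, neither of which is stated in the abstract; the discharge figure in the example is hypothetical.

```python
def settling_tank_volumes(daily_discharge_m3, days_per_month=30):
    """Return (upper, lower) settling tank volumes in cubic metres.

    Upper tank: one month of discharge (coarse grains settle here).
    Lower tank: two months of discharge (finer grains settle here).
    """
    month = daily_discharge_m3 * days_per_month
    return month, 2 * month

upper, lower = settling_tank_volumes(1200)  # hypothetical 1,200 m^3/day
print(upper, lower)  # 36000 72000
```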

  6. Mining the protein data bank to differentiate error from structural variation in clustered static structures: an examination of HIV protease.

    Science.gov (United States)

    Venkatakrishnan, Balasubramanian; Palii, Miorel-Lucian; Agbandje-McKenna, Mavis; McKenna, Robert

    2012-03-01

    The Protein Data Bank (PDB) contains over 71,000 structures. Extensively studied proteins have hundreds of submissions available, including mutations, different complexes, and space groups, allowing for application of data-mining algorithms to analyze an array of static structures and gain insight about a protein's structural variation and possibly its dynamics. This investigation is a case study of HIV protease (PR) using in-house algorithms for data mining and structure superposition through generalized formulæ that account for multiple conformations and fractional occupancies. Temperature factors (B-factors) are compared with spatial displacement from the mean structure over the entire study set and separately over bound and ligand-free structures, to assess the significance of structural deviation in a statistical context. Space group differences are also examined.
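
Comparing B-factors with displacement from the mean structure rests on the isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction. For the full 3D displacement Δr this becomes B = (8π²/3)⟨|Δr|²⟩. The sketch below assumes the structures are already superposed and share atom ordering, and it ignores the multiple conformations and fractional occupancies that the authors' generalized formulae handle; it is not their in-house code.

```python
import numpy as np

def displacement_b_equivalent(coords):
    """coords: array of shape (n_structures, n_atoms, 3), already superposed.

    Returns the per-atom mean-square displacement from the mean structure (Å^2)
    and the equivalent isotropic B-factor, B = (8*pi^2/3) * <|dr|^2>.
    """
    coords = np.asarray(coords, dtype=float)
    mean_structure = coords.mean(axis=0)                     # (n_atoms, 3)
    sq_disp = ((coords - mean_structure) ** 2).sum(axis=2)   # (n_structures, n_atoms)
    msd = sq_disp.mean(axis=0)                               # (n_atoms,)
    return msd, (8 * np.pi ** 2 / 3) * msd

# Two hypothetical superposed structures of a two-atom fragment:
# atom 0 moves by 1 Å along x, atom 1 does not move.
coords = [[[0, 0, 0], [2, 2, 2]],
          [[1, 0, 0], [2, 2, 2]]]
msd, b_equiv = displacement_b_equivalent(coords)
print(msd, b_equiv)
```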

  7. Imaging and detection of mines from acoustic measurements

    Science.gov (United States)

    Witten, Alan J.; DiMarzio, Charles A.; Li, Wen; McKnight, Stephen W.

    1999-08-01

    A laboratory-scale acoustic experiment is described in which a target, a hockey puck cut in half, is shallowly buried in a sand box. To avoid the need to couple the source and receiver to the host sand, an acoustic wave is generated in the subsurface by a pulsed laser suspended above the air-sand interface. Similarly, an airborne microphone is suspended above this interface and moved in unison with the laser. After some pre-processing of the data, reflections from the target, although weak, could clearly be identified. While the existence and location of the target can be determined by inspecting the data, its distinctive shape cannot. Since target discrimination is important in mine detection, a 3D imaging algorithm was applied to the acquired acoustic data. This algorithm yielded a reconstructed image in which the shape of the target was resolved.

  8. Data Mining for ISHM of Liquid Rocket Propulsion Status Update

    Science.gov (United States)

    Srivastava, Ashok; Schwabacher, Mark; Oza, Nikunj; Martin, Rodney; Watson, Richard; Matthews, Bryan

    2006-01-01

    This document consists of presentation slides that review the current status of data mining in support of Integrated Systems Health Management (ISHM) for systems associated with liquid rocket propulsion. The aim of the project is to use test stand data from Rocketdyne to design algorithms that aid in the early detection of impending failures during operation. These methods will be extended and improved for future platforms (i.e., CEV/CLV).

  9. Uranium mining

    International Nuclear Information System (INIS)

    2008-01-01

    The economic and environmental sustainability of uranium mining has been analysed by Monash University researcher Dr Gavin Mudd in a paper that challenges the perception that uranium mining is an 'infinite quality source' that provides solutions to the world's demand for energy. Dr Mudd says that information on the uranium industry touted by politicians and mining companies is not necessarily inaccurate, but it does not tell the whole story: it is often just an average snapshot of the costs of uranium mining today, without reflecting the escalating costs associated with the process in years to come. 'From a sustainability perspective, it is critical to evaluate accurately the true lifecycle costs of all forms of electricity production, especially with respect to greenhouse emissions,' he says. 'For nuclear power, a significant proportion of greenhouse emissions are derived from the fuel supply, including uranium mining, milling, enrichment and fuel manufacture.' Dr Mudd found that financial and environmental costs escalate dramatically as uranium ore resources are depleted. The deeper the mining required to extract the ore, the higher the cost for mining companies, the greater the impact on the environment and the more resources needed to obtain the product. 'It is clear that there is a strong sensitivity of energy and water consumption and greenhouse emissions to ore grade, and that ore grades are likely to continue to decline gradually in the medium to long term. These issues are critical to the current debate over nuclear power and greenhouse emissions, especially with respect to ascribing sustainability to such activities as uranium mining and milling. For example, mining at Roxby Downs is responsible for the emission of over one million tonnes of greenhouse gases per year, and this could increase to four million tonnes if the mine is expanded.'

  10. Internet technologies in the mining industry. Towards unattended mining systems

    Energy Technology Data Exchange (ETDEWEB)

    Krzykawski, Michal [FAMUR Group, Katowice (Poland)]

    2009-08-27

    Global suppliers of longwall systems focus mainly on maximising the efficiency of the equipment they manufacture. Given the fact that, since 2004, coal demand on world markets has been constantly on the increase, even during an economic downturn, this endeavour seems fully justified. However, it should be remembered that maximum efficiency must be accompanied by maximum safety of all underground operations. This statement is based on the belief that the mining industry, which exploits increasingly deep and dangerous coal beds, faces the necessity to implement comprehensive IT systems for managing all mining processes and, in the near future, to use unmanned mining systems, fully controllable from the mine surface. The computerisation of mines is an indispensable element of the development of the world mining industry, a belief which has been put into practice with e-mine, developed by the FAMUR Group. (orig.)

  11. Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

    Science.gov (United States)

    Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J

    2017-08-01

    Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. These results highlight the superiority of text mining algorithms applied to electronic
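
The comparison relies on the positive predictive value, PPV = TP / (TP + FP). The sketch below also derives, from the counts reported above, how many patients each approach identified uniquely. Those unique groups are the pools from which the 200-250 patient review samples were drawn. The chart-review counts passed to `ppv` in the example are hypothetical.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: fraction of flagged patients who truly have the condition."""
    return true_positives / (true_positives + false_positives)

# Counts reported in the abstract: (text mining, ICD-9, identified by both).
reported = {"TAS": (7115, 8272, 4346), "CAD": (9247, 6913, 6024)}

for disease, (text, icd9, both) in reported.items():
    print(disease, "text mining only:", text - both, "ICD-9 only:", icd9 - both)

# Hypothetical chart review: 190 of 200 sampled patients confirmed.
print(ppv(190, 10))  # 0.95
```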

  12. A study on eco-environmental vulnerability of mining cities: a case study of Panzhihua city of Sichuan province in China

    Science.gov (United States)

    Shao, Huaiyong; Xian, Wei; Yang, Wunian

    2009-07-01

    Long-term, large-scale and high-intensity development of mineral resources in mining cities has contributed greatly to China's economic construction and development. At the same time, however, it has seriously damaged the ecological environment, even causing ecological imbalance, because environmental impacts were neglected and, to some extent, the environment was sacrificed. In this study, a scientific and practical index system for evaluating the eco-environmental vulnerability of mining cities was established according to their characteristics. Taking Panzhihua city in Sichuan province as an example, remote sensing and GIS technology were applied to various types of remote sensing imagery (TM, SPOT5, IKONOS) and statistical data, and the ecological environment evaluation data of the mining city were extracted effectively. Because the relationship between the evaluation indexes and the degree of eco-environmental vulnerability is non-linear, this study evaluated the eco-environmental vulnerability of the study area using an artificial neural network trained with the SCE-UA algorithm, which overcomes the slow learning and difficult convergence of traditional neural network training algorithms. The results of the evaluation were objective and reasonable, with high credibility. They showed that the area distribution of the five eco-environmental vulnerability grades was basically normal; that the overall ecological environment of Panzhihua city was at a medium level; that the degree of eco-environmental vulnerability was higher in the south than in the north; and that mining activities were the dominant factor causing eco-environmental damage and vulnerability. In this study, a comprehensive theory and technology system of regional eco-environmental vulnerability evaluation which included the establishment of eco-environmental vulnerability evaluation index

  13. Multiple-Feature Extracting Modules Based Leak Mining System Design

    Directory of Open Access Journals (Sweden)

    Ying-Chiang Cho

    2013-01-01

    mining system that is equipped with SQL injection vulnerability detection, by means of an algorithm developed for the web crawler. In addition, we analyze portal sites of the governments of various countries or regions in order to investigate the information leaking status of each site. Subsequently, we analyze the database structure and content of each site, using the data collected. Thus, we make use of practical verification in order to focus on information security and privacy through black-box testing.

  14. DrugQuest - a text mining workflow for drug association discovery.

    Science.gov (United States)

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

    2016-06-06

    Text mining and data integration methods are gaining ground in the health sciences due to the exponential growth of the biomedical literature and of the information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few are dedicated to mining other types of repositories, such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we use the TextQuest algorithm to find additional biologically significant words. Using a range of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds of Significant Terms along with the terms used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
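
Grouping records by shared terms typically starts from a term-weighting scheme such as TF-IDF and a similarity measure such as cosine similarity. The sketch below is a generic illustration under that assumption; it is not the TextQuest algorithm or DrugQuest's pipeline, and the drug descriptions are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))  # document frequency
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical description snippets.
docs = [
    "inhibits cyclooxygenase reducing prostaglandin synthesis",
    "nonselective cyclooxygenase inhibitor reducing prostaglandin levels",
    "blocks beta adrenergic receptors lowering heart rate",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

The pairwise similarities can then feed a clustering step; records that share mechanism terms (the first two) score higher than unrelated ones.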

  15. Requirements and opportunities for mining engineers in the mining industry abroad

    Energy Technology Data Exchange (ETDEWEB)

    Albrecht, E

    1987-04-09

    The decline of the German mining industry and the increasing industrialization of mining are forcing ever greater numbers of young German mining graduates to build their careers abroad. Apart from technical qualifications, the requirements for this are a good knowledge of foreign languages and a readiness to leave Germany for a long time, perhaps permanently. If young mining graduates accept these conditions, numerous professional opportunities will open up for them: with German mining companies that have interests abroad, with mining supply companies and consultancy firms, and with foreign companies. 6 references.

  16. Socioeconomic inequality of cancer mortality in the United States: a spatial data mining approach

    Directory of Open Access Journals (Sweden)

    Lam Nina SN

    2006-02-01

    Background The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988–1992 cancer mortality rates for colorectal, lung, breast and prostate cancers, defined at the Health Service Area level, and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data, which were defined at different spatial resolutions, and to visualize and analyze the results of the association rule mining process. Results Health Service Areas with high rates of low education, high unemployment and low-paying jobs were found to be associated with higher rates of cancer mortality. Conclusion Association rule mining with geographic information technology helps reveal the spatial patterns of socioeconomic inequality in cancer mortality in the United States and identify regions that need further attention.
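
Association rules of this kind are usually judged by support, confidence and lift. The sketch below computes them for one rule over hypothetical, already-discretized Health Service Area records; the variable names and values are illustrative and are not taken from the study.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Each record is a set of discretized attribute labels, e.g. "unemployment=high".
    """
    n = len(records)
    a = sum(antecedent <= r for r in records)
    c = sum(consequent <= r for r in records)
    both = sum((antecedent | consequent) <= r for r in records)
    support = both / n
    confidence = both / a
    lift = confidence / (c / n)  # > 1 means the rule beats the consequent's base rate
    return support, confidence, lift

# Hypothetical discretized area records.
areas = [
    {"education=low", "unemployment=high", "lung_mortality=high"},
    {"education=low", "unemployment=high", "lung_mortality=high"},
    {"education=high", "unemployment=low", "lung_mortality=low"},
    {"education=low", "unemployment=low", "lung_mortality=low"},
]
print(rule_metrics(areas, {"education=low", "unemployment=high"}, {"lung_mortality=high"}))
# (0.5, 1.0, 2.0)
```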

  17. Effect of stage development of mining operations on maximization the net present value in long-term planning of open pits

    OpenAIRE

    Kržanović, Daniel; Rajković, Radmilo; Mikić, Miomir; Ljubojev, Milenko

    2014-01-01

    Long-term planning in the mining industry has one main goal: maximizing the value realized by excavating and processing mineral resources. When designing open pits, determining the development stages of the mining operations (pushbacks) is one of the important factors in long-term production planning. Using the different scientific methods and mathematical algorithms underlying the operation of modern software for strategic planning of production, it is poss...
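
The value being maximized is typically the net present value, NPV = Σ_t CF_t / (1 + r)^t. The sketch compares two hypothetical pushback sequences that yield the same undiscounted total in a different order. The cash flows and the 10% discount rate are assumptions, chosen only to show why sequencing affects value.

```python
def npv(cash_flows, rate):
    """Net present value of end-of-year cash flows for years 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical yearly cash flows (M$) for two pushback sequences with equal totals.
high_grade_first = [120, 80, 40]
low_grade_first = [40, 80, 120]
print(round(npv(high_grade_first, 0.10), 1), round(npv(low_grade_first, 0.10), 1))
```

Mining the higher-value material earlier gives the larger NPV even though the undiscounted totals are identical.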

  18. PREVENTION OF ACID MINE DRAINAGE GENERATION FROM OPEN-PIT MINE HIGHWALLS

    Science.gov (United States)

    Exposed, open pit mine highwalls contribute significantly to the production of acid mine drainage (AMD) thus causing environmental concerns upon closure of an operating mine. Available information on the generation of AMD from open-pit mine highwalls is very limit...

  19. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization

    Directory of Open Access Journals (Sweden)

    Krasnogor Natalio

    2009-10-01

    Background Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.
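
Consensus clustering combines several partitions of the same samples. This sketch uses one common formulation, assumed here rather than taken from ArrayMining.net. It builds a co-association matrix recording the fraction of partitions in which each pair of samples shares a cluster, then groups samples whose agreement exceeds a threshold. The three input partitions are hypothetical.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    labels = np.asarray(partitions)                      # (n_partitions, n_samples)
    same = labels[:, :, None] == labels[:, None, :]      # (n_partitions, n, n)
    return same.mean(axis=0)                             # (n_samples, n_samples)

def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph linking pairs with agreement > threshold."""
    n = len(matrix)
    cluster = [-1] * n
    current = 0
    for start in range(n):
        if cluster[start] != -1:
            continue
        stack = [start]
        cluster[start] = current
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if cluster[j] == -1 and matrix[i][j] > threshold:
                    cluster[j] = current
                    stack.append(j)
        current += 1
    return cluster

# Three hypothetical clusterings of five samples (label values are arbitrary).
partitions = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 2], [0, 0, 0, 1, 1]]
print(consensus_clusters(co_association(partitions)))  # [0, 0, 1, 1, 1]
```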

  20. A direction of developing a mining method and mining complexes

    Energy Technology Data Exchange (ETDEWEB)

    Gabov, V.V.; Efimov, I.A. [St. Petersburg State Mining Institute, St. Petersburg (Russian Federation). Vorkuta Branch]

    1996-12-31

    An analysis of the mining method as the main factor determining the development stages of mining units is presented. The paper proposes a promising mining method that differs from known ones in the following features: directional selectivity of cuts with respect to coal seam structure; and the cutting speed, thickness and succession of dusts. This method may be implemented with modular complexes (a shield carrying a cutting head for coal mining) whose mining devices are supplied with hydraulic drive. An experimental model of the modular complex has been developed. 2 refs.

  1. An improved algorithm for MFR fragment assembly

    International Nuclear Information System (INIS)

    Kontaxis, Georg

    2012-01-01

    A method for generating protein backbone models from backbone-only NMR data is presented, based on molecular fragment replacement (MFR). In the first step, the PDB database is mined for homologous peptide fragments using experimental backbone-only data, i.e. backbone chemical shifts (CS) and residual dipolar couplings (RDC). Second, this fragment library is refined against the experimental restraints. Finally, the fragments are assembled into a protein backbone fold by a rigid body docking algorithm, with the RDCs as restraints. For improved performance, backbone nuclear Overhauser effects (NOEs) may be included at that stage. Compared with previous implementations of MFR-derived structure determination protocols, this model-building algorithm offers improved stability and reliability. Furthermore, relative to CS-ROSETTA-based methods, it provides faster performance and a straightforward implementation, with the option to easily include further types of restraints and additional energy terms.

  2. Real-time dispatching modelling for trucks with different capacities in open pit mines / Modelowanie w czasie rzeczywistym przewozów ciężarówek o różnej ładowności w kopalni odkrywkowej

    Science.gov (United States)

    Ahangaran, Daryoush Kaveh; Yasrebi, Amir Bijan; Wetherelt, Andy; Foster, Patrick

    2012-10-01

    Fully automated truck dispatching systems play a major role in decreasing transportation costs, which often represent the majority of the costs of open pit mining. Consequently, truck dispatching systems have become fundamentally important in most of the world's open pit mines. Recent experience indicates that applying a truck dispatching system considerably improves the production rate by decreasing trucks' travelling time and the associated waiting time of the shovels they serve. Computer-based truck dispatching systems built on algorithms and advanced, accurate software are examples of these innovations. Developing an algorithm for a computer-based program suited to a specific mine's conditions is considered one of the most important activities in computer-based dispatching in open pit mines. In this paper, the changing trends in programming, dispatching control algorithms and automation conditions are discussed. Furthermore, since the transportation fleets of most mines use trucks with different capacities, innovative methods, operational optimisation techniques and the best possible methods for developing the required real-time dispatching algorithm are selected through research on mathematically based planning methods. Finally, a real-time dispatching model that accommodates trucks with different capacities is developed using two techniques: flow networks and integer programming.

  3. Large Scale Frequent Pattern Mining using MPI One-Sided Model

    Energy Technology Data Exchange (ETDEWEB)

    Vishnu, Abhinav; Agarwal, Khushbu

    2015-09-08

    In this paper, we propose a work-stealing runtime, the Library for Work Stealing (LibWS), which uses the MPI one-sided model to design a scalable FP-Growth, the de facto frequent pattern mining algorithm, on large-scale systems. LibWS provides locality-efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attribute-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (87% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides a 38x communication speedup on 4096 cores.

  4. Multiplicative algorithms for constrained non-negative matrix factorization

    KAUST Repository

    Peng, Chengbin

    2012-12-01

    Non-negative matrix factorization (NMF) provides the advantage of parts-based data representation through additive-only combinations. It has been widely adopted in areas such as item recommendation, text mining, data clustering and speech denoising. In this paper, we provide an algorithm that allows the factorization to have linear or approximately linear constraints with respect to each factor. We prove that if the constraint function is linear, algorithms within our multiplicative framework will converge. This theory supports a large variety of equality and inequality constraints and can facilitate the application of NMF to a much larger domain. Taking the recommender system as an example, we demonstrate how a specialized weighted and constrained NMF algorithm can be developed to fit the problem exactly, and the tests show that our constraints improve the performance of both weighted and unweighted NMF algorithms under several different metrics. In particular, on the Movielens data with 94% of items, the constrained NMF improves the recall rate by 3% compared with SVD50 and by 45% compared with SVD150, which were reported as the best two in the top-N metric. © 2012 IEEE.
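
Multiplicative NMF frameworks of this kind build on the classic Lee-Seung updates, which minimize the Frobenius error ||V − WH||² while keeping W and H non-negative. The sketch shows only those standard unconstrained updates; the paper's constrained and weighted variants are not reproduced, and the random input matrix is hypothetical.

```python
import numpy as np

def nmf(V, rank, iterations=200, eps=1e-10, seed=0):
    """Unconstrained NMF via Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((6, 5))  # hypothetical non-negative data
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))  # reconstruction error
```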

  5. A New Method for Haul Road Design in Open-Pit Mines to Support Efficient Truck Haulage Operations

    Directory of Open Access Journals (Sweden)

    Jieun Baek

    2017-07-01

    The design of a haul road for an open-pit mine can significantly affect the cost associated with hauling ore and waste to the surface. This study proposes a new method for haul road design in open-pit mines to support efficient truck haulage operations. The road layout in open-pit mines was optimized by using raster-based least-cost path analysis, and the resulting zigzag road sections were simplified by applying the Douglas-Peucker algorithm. In addition, the road layout was modified by reflecting the radius of curvature suggested in the road design guides. Finally, a three-dimensional model reflecting the results of the road design was created by combining the road layout modification result with the slope of the open-pit mine and the bench design result. The application of the proposed method to an area containing gold deposits made it possible to design a haul road for open-pit mines such that it supported efficient truck haulage operations; furthermore, the time required for truck movement along the road could be estimated. The proposed method is expected to be useful for planning and designing open-pit mines and to facilitate the improvement of the road design function of existing mining software applications.
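
The Douglas-Peucker simplification of a zigzag least-cost path can be sketched as follows. Keep the two endpoints, find the vertex farthest from the chord between them, and recurse on both halves only if that distance exceeds a tolerance. The 2D vertex coordinates and the tolerance are hypothetical, and the study's raster cost analysis and curvature adjustment are not shown.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == dy == 0:  # degenerate chord: fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    distances = [perpendicular_distance(p, points[0], points[-1]) for p in points[1:-1]]
    index, dmax = max(enumerate(distances, start=1), key=lambda pair: pair[1])
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex

zigzag = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7.1), (6, 8)]
print(douglas_peucker(zigzag, tolerance=0.5))
# [(0, 0), (2, -0.1), (3, 5), (6, 8)]
```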

  6. 30 CFR 780.27 - Reclamation plan: Surface mining near underground mining.

    Science.gov (United States)

    2010-07-01

    ... Title 30, Mineral Resources, Vol. 3 (2010-07-01). Section 780.27, Reclamation plan: Surface mining near underground mining. OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR; SURFACE COAL MINING AND RECLAMATION OPERATIONS PERMITS AND COAL...

  7. Minimizing the Impact of Mining Activities for Sustainable Mined-Out ...

    African Journals Online (AJOL)

    Minimizing the Impact of Mining Activities for Sustainable Mined-Out Area ... sensing and Geographical Information System (GIS) in assessing environmental impact of ... Keywords: Solid mineral, Impact assessment, Mined-out area utilization, ...

  8. Archveyor™ automated mining system - implementation at the Conant mine

    Energy Technology Data Exchange (ETDEWEB)

    Hofmann, W.J. [Arch of Illinois, Percy, IL (United States)]

    1997-12-01

    Arch Mineral Corporation, through the Arch Technology Department, has developed an automated continuous haulage mining system called the 'Archveyor™'. The original technology came from a Russian patent. Kloeckner-Becorit (K-B) further developed the system and called it the 'Mobile Conveyor'. This system was used in both coal and trona mines in the United States and Canada. Consolidation Coal designed its own version of this continuous haulage system, called the 'Tramveyor', which is presently operating in its Dilworth Mine in Pennsylvania. This system has no computer guidance system for either the continuous miner or the Tramveyor. Arch Mineral Corporation has developed this continuous haulage mining system further into an automated mining system controlled by a programmable logic controller (PLC). A highwall version of the Archveyor™ is being operated at Arch of Wyoming near Hanna, Wyoming. This paper introduces the first underground version of the Archveyor™, to be implemented at the Conant Mine in southern Illinois. During development mining, the Archveyor™ mining system consists of a continuous miner, a bolter car, the Archveyor™ itself, a stageloader and an operator's cab. During secondary mining, the bolter car is taken out of the system.

  9. A multilevel layout algorithm for visualizing physical and genetic interaction networks, with emphasis on their modular organization

    OpenAIRE

    Tuikkala, Johannes; Vähämaa, Heidi; Salmela, Pekka; Nevalainen, Olli S; Aittokallio, Tero

    2012-01-01

    Background: Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, suc...

  10. Adaptive optimization as a design and management methodology for coal-mining enterprise in uncertain and volatile market environment - the conceptual framework

    Science.gov (United States)

    Mikhalchenko, V. V.; Rubanik, Yu T.

    2016-10-01

    The work is devoted to the problem of cost-effective adaptation of coal mines to volatile and uncertain market conditions. Conceptually, this can be achieved by aligning the dynamic characteristics of the coal mining system with the power spectrum of market demand for coal products. In practical terms, this ensures the viability and competitiveness of coal mines. The dynamic characteristics are to be transformed by changing the structure of the production system as well as corporate, logistics and management processes. The proposed control methods and algorithms are aimed at developing the theoretical foundations of adaptive optimization as a basic methodology for coal mining enterprise management under high variability and uncertainty of the economic and natural environment. Implementing the proposed methodology requires revising the basic design principles of open coal mining enterprises.

  11. Design of data warehouse in teaching state based on OLAP and data mining

    Science.gov (United States)

    Zhou, Lijuan; Wu, Minhua; Li, Shuang

    2009-04-01

    Data warehousing and data mining are among the hot topics in information technology research. At present they are widely applied in commerce, the financial industry, and enterprise production and marketing, but relatively little in education. Over the years, colleges and universities have accumulated large amounts of teaching and management data that cannot be used effectively. In light of the social needs of university development and the current state of data management, several steps are particularly important. The first is to establish a data warehouse of the university teaching state. The second is to make better use of the existing data. The third, on that basis, is to process the data at a higher level through data mining. In this paper, starting from decision-making needs, we design the data warehouse structure for the university teaching state. We then create a data warehouse model through structural design and data extraction, transformation and loading. Finally, we apply an association rule mining algorithm and obtain results that are effective in practice. The data analysis and mining yield a great deal of valuable information that can guide teaching management. This improves teaching quality, promotes commitment to teaching and strengthens teaching infrastructure in universities. At the same time, it can provide detailed, multi-dimensional information for university assessment and higher education research.

  12. Improved estimation of electricity demand function by integration of fuzzy system and data mining approach

    International Nuclear Information System (INIS)

    Azadeh, A.; Saberi, M.; Ghaderi, S.F.; Gitiforouz, A.; Ebrahimipour, V.

    2008-01-01

    This study presents an integrated fuzzy system, data mining and time series framework to estimate and predict electricity demand under seasonal and monthly changes in electricity consumption. It targets developing countries such as China and Iran, where the data are non-stationary. Furthermore, it is difficult to model the uncertain behavior of energy consumption with only a conventional fuzzy system or time series, and the integrated algorithm could be an ideal substitute in such cases. To construct fuzzy systems, a rule base is needed. Because no rule base is available for the demand function, a look-up table, one of the rule-extraction methods, is used to extract the rule base; this system is defined as FLT. Similarly, the decision tree method, a data mining approach, is used to extract the rule base; this system is defined as FDM. The preferred time series model is selected from linear (ARMA) and nonlinear models. To do so, the preferred ARMA model is selected first, and the McLeod-Li test is then applied to determine whether nonlinearity is present. If the nonlinearity condition is satisfied, the preferred nonlinear model is selected and compared with the preferred ARMA model, and one of the two is chosen as the time series model. Finally, ANOVA is used to select the preferred model from among the fuzzy models and the time series model. The algorithm also considers the impact of data preprocessing and postprocessing on fuzzy system performance. Another unique feature of the proposed algorithm is its use of the autocorrelation function (ACF) to define input variables, whereas conventional methods rely on trial and error. Monthly electricity consumption in Iran from 1995 to 2005 is used as the case study. Comparing the MAPE of the genetic algorithm (GA) and the artificial neural network (ANN) with that of the proposed algorithm shows the appropriateness of the proposed algorithm.
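    The models in this abstract are compared by their mean absolute percentage error (MAPE). For n periods with actual values A_t and forecasts F_t, MAPE = (100 / n) * sum(|A_t - F_t| / |A_t|). A minimal sketch follows. The function name and the sample numbers are hypothetical and are not taken from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)
```

    For example, forecasts of 110 and 180 against actual values of 100 and 200 are each 10% off, so the MAPE is 10%. A lower MAPE indicates a more accurate model.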

  13. Management of mining-related damages in abandoned underground coal mine areas using GIS

    International Nuclear Information System (INIS)

    Lee, U.J.; Kim, J.A.; Kim, S.S.; Kim, W.K.; Yoon, S.H.; Choi, J.K.

    2005-01-01

    Mining-related damages such as ground subsidence, acid mine drainage (AMD) and deforestation in abandoned underground coal mine areas have become an object of public concern. A system to manage these damages is therefore needed to drive rehabilitation activities effectively. The GIS-based management system for abandoned underground coal mines includes a database of mining records and of information associated with mining-related damages. It also includes application programs that support mine damage prevention work. The system would also support decision-making on rehabilitation policy and provide basic geological data for regional construction works in abandoned underground coal mine areas. (authors)

  14. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  15. Surface mining

    Science.gov (United States)

    Robert Leopold; Bruce Rowland; Reed Stalder

    1979-01-01

    The surface mining process consists of four phases: (1) exploration; (2) development; (3) production; and (4) reclamation. A variety of surface mining methods has been developed, including strip mining, auger, area strip, open pit, dredging, and hydraulic. Sound planning and design techniques are essential to implement alternatives to meet the myriad of laws,...

  16. Uranium mining

    International Nuclear Information System (INIS)

    Lange, G.

    1975-01-01

    The winning of uranium ore is the first stage of the fuel cycle. The whole complex of questions to be considered when evaluating the profitability of an ore mine is briefly outlined, and the possible mining techniques are described. Some data on uranium mining in the Western world are also given. (RB)

  17. Geospatial Image Mining For Nuclear Proliferation Detection: Challenges and New Opportunities

    Energy Technology Data Exchange (ETDEWEB)

    Vatsavai, Raju [ORNL; Bhaduri, Budhendra L [ORNL; Cheriyadat, Anil M [ORNL; Arrowood, Lloyd [Y-12 National Security Complex; Bright, Eddie A [ORNL; Gleason, Shaun Scott [ORNL; Diegert, Carl [Sandia National Laboratories (SNL); Katsaggelos, Aggelos K [ORNL; Pappas, Thrasos N [ORNL; Porter, Reid [Los Alamos National Laboratory (LANL); Bollinger, Jim [Savannah River National Laboratory (SRNL); Chen, Barry [Lawrence Livermore National Laboratory (LLNL); Hohimer, Ryan [Pacific Northwest National Laboratory (PNNL)

    2010-01-01

    With the increasing understanding and availability of nuclear technologies, and their increasing pursuit by several new countries, it is becoming ever more important to monitor nuclear proliferation activities. There is a great need to develop technologies that detect nuclear proliferation activities automatically or semi-automatically using remote sensing. Images acquired from Earth observation satellites are an important source of information for detecting proliferation activities. High-resolution remote sensing images are highly useful in verifying both the correctness and the completeness of any nuclear program. DOE national laboratories are interested in detecting nuclear proliferation by developing advanced geospatial image mining algorithms. In this paper we describe the current understanding of geospatial image mining techniques, enumerate key gaps and identify future research needs in the context of nuclear proliferation.

  18. Privacy Preserving Association Rule Mining Revisited: Privacy Enhancement and Resources Efficiency

    Science.gov (United States)

    Mohaisen, Abedelaziz; Jho, Nam-Su; Hong, Dowon; Nyang, Daehun

    Privacy-preserving association rule mining algorithms have been designed to discover the relations between variables in data while maintaining data privacy. In this article we revisit one of the recently introduced schemes for association rule mining using fake transactions (FS). In particular, our analysis shows that the FS scheme has excessive storage and high computation requirements for guaranteeing a reasonable level of privacy. We introduce a realistic definition of privacy that benefits from average-case privacy. This definition motivates the study of a structural weakness in FS: the filtering of fake transactions. To overcome this problem, we improve the FS scheme by presenting a hybrid scheme that treats privacy and resources as two concurrent guidelines. Analytical and empirical results show the efficiency and applicability of our proposed scheme.

  19. Data Mining for Business Intelligence Concepts, Techniques, and Applications in Microsoft Office Excel(r) with XLMiner(r)

    CERN Document Server

    Shmueli, Galit; Bruce, Peter C

    2011-01-01

    Data Mining for Business Intelligence, Second Edition uses real data and actual cases to illustrate the applicability of data mining (DM) in the development of successful business models. It features complimentary access to XLMiner, the Microsoft Office Excel add-in, which allows readers to follow along and implement algorithms at their own pace with a minimal learning curve. In addition, students and practitioners of DM techniques are presented with hands-on, business-oriented applications. An abundance of exercises and examples, now doubled in number in the second edit

  20. DATA MINING WORKSPACE AS AN OPTIMIZATION PREDICTION TECHNIQUE FOR SOLVING TRANSPORT PROBLEMS

    Directory of Open Access Journals (Sweden)

    Anastasiia KUPTCOVA

    2016-09-01

    Full Text Available This article addresses forecasting for real-time, high-speed decision making based on careful modelling of time series data. The study uses data-mining modelling for the algorithmic optimization of transport goals. Our findings point to adequate techniques for fitting a prediction model. This model is intended for analysing future transaction costs at the borders of the Czech Republic. The time series prediction methods compared in the Statistics package are exponential smoothing, ARIMA and neural network approaches. The primary aim of a predictive scenario in the data mining workspace is to provide modelling data faster and with more versatility than other management techniques.
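    Exponential smoothing is one of the prediction methods listed in this abstract. The article does not say which variant was used, so the sketch below assumes the simplest form, simple exponential smoothing. The level is updated as s_t = alpha * y_t + (1 - alpha) * s_(t-1), and the last level serves as the one-step-ahead forecast. The function name, the initialisation with the first observation and the value of `alpha` are illustrative assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return smoothed levels; the last one is the next-period forecast.

    Hypothetical helper: simple (single) exponential smoothing with the
    level initialised to the first observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    levels = [float(series[0])]
    for y in series[1:]:
        # Blend the new observation with the previous level.
        levels.append(alpha * y + (1.0 - alpha) * levels[-1])
    return levels
```

    With `alpha = 0.5`, the series 10, 20, 30 gives levels 10, 15 and 22.5, so the forecast for the next period is 22.5. A larger `alpha` reacts faster to recent values.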

  1. Realization of “zero emission” of mining water effluents from Sasa mine

    OpenAIRE

    Mirakovski, Dejan; Doneva, Nikolinka; Hadzi-Nikolova, Marija; Gocevski, Borce

    2015-01-01

    Sasa mine continuously takes action to minimize the environmental impact of mining activities, in order to comply with national environmental protection legislation, which is harmonized with European legislation. As part of these actions, this paper presents the drainage system of horizon 830, which was built to prevent the free leakage of mining groundwater. This system achieves zero emission of mining water from Sasa mine into the environment. Key words: mining water...

  2. Sustainable rehabilitation of mining waste and acid mine drainage using geochemistry, mine type, mineralogy, texture, ore extraction and climate knowledge.

    Science.gov (United States)

    Anawar, Hossain Md

    2015-08-01

    The oxidative dissolution of sulfidic minerals releases extremely acidic leachate, sulfate and potentially toxic elements (e.g., As, Ag, Cd, Cr, Cu, Hg, Ni, Pb, Sb, Th, U, Zn) from different mine tailings and waste dumps. For the sustainable rehabilitation and disposal of mining waste, the sources and mechanisms of contaminant generation, and the fate and transport of contaminants, should be clearly understood. This study therefore (1) reviews recent insights into the mechanisms of oxidation of sulfidic minerals, (2) reviews environmental contamination by mining waste, (3) reviews remediation and rehabilitation techniques, and (4) develops the GEMTEC conceptual model/guide [(bio)geochemistry, mine type, mineralogy, geological texture, ore extraction process, climatic knowledge]. The model provides a new scientific approach and knowledge for the remediation of mining wastes and acid mine drainage. For the sustainable rehabilitation of mining waste, the study recommends pre-mining geological, geochemical, mineralogical and microtextural characterization of different mineral deposits. It also recommends post-mining studies of ore extraction processes, of physical, geochemical, mineralogical and microbial reactions, of natural attenuation and of the effect of climate change. All components of this model should be considered for the effective and integrated management of mining waste and acid mine drainage. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    Science.gov (United States)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to data derived from Flight Operations Quality Assurance (FOQA) to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them with the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for further examination of large FOQA databases for operational and safety improvements.

  4. Philippine Mining Capitalism: The Changing Terrains of Struggle in the Neoliberal Mining Regime

    Directory of Open Access Journals (Sweden)

    Alvin A. Camba

    2016-06-01

    Full Text Available This article analyzes how the mining sector and anti-mining groups compete over mining outcomes in the Philippines. I argue that the transition to a neoliberal mineral regime has empowered the mining sector and weakened the anti-mining groups by shifting the terrains of struggle onto the domains of state agencies and scientific networks. Since the start of the neoliberal era, the mining sector has pursued two strategies. First, technologies of subjection elevate various public institutions to elect and select the processes aimed at making mining accountable and sensitive to the demands of local communities. However, these institutions often refuse or lack the capacity to intervene effectively. Second, technologies of subjectivities allow a select group of industry experts to single-handedly determine the environmental viability of mining projects. Mining consultants, specialists and scientists chosen by mining companies determine the potential environmental damage to water bodies, as well as air pollution and soil erosion. Because of mining capital's access to economic and legal resources, anti-mining communities across the Philippines have been forced to compete on unequal terrain for meaningful social dialogue and mining outcomes.

  5. Gesture Recognition from Data Streams of Human Motion Sensor Using Accelerated PSO Swarm Search Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2015-01-01

    Full Text Available Human motion sensing technology has gained tremendous popularity, with practical applications such as video surveillance for security, hand signing, smart homes and gaming. These applications capture human motion in real time from video sensors, and the resulting data patterns are nonstationary and ever changing. While the hardware of such motion sensing devices and their data collection processes have become relatively mature, the computational challenge lies in the real-time analysis of these live feeds. In this paper we argue that traditional data mining methods fall short of accurately analyzing human activity patterns from the sensor data stream. The shortcoming stems from algorithmic designs that do not adapt to dynamic changes in gesture motions. The successor of these algorithms, known as data stream mining, is evaluated against traditional data mining through a case study of gesture recognition on motion data captured with Microsoft Kinect sensors. Three subjects were asked to read three comic strips and to tell the stories in front of the sensor. The data stream contains the coordinates of articulation points and the various positions of body parts corresponding to the actions that the user performs. In particular, a novel feature selection technique using swarm search and accelerated PSO is proposed to enable fast preprocessing for inducing an improved classification model in real time. The experiment run on this empirical data stream shows superior results. The contribution of this paper is a comparative study of traditional and data stream mining algorithms. This comparison incorporates the novel improved feature selection technique, in a scenario where different gesture patterns are to be recognized from streaming sensor data.

  6. Comparative Data Mining Analysis for Information Retrieval of MODIS Images: Monitoring Lake Turbidity Changes at Lake Okeechobee, Florida

    Science.gov (United States)

    In the remote sensing field, a frequently recurring question is: Which computational intelligence or data mining algorithms are most suitable for the retrieval of essential information given that most natural systems exhibit very high non-linearity. Among potential candidates mig...

  7. Technological highwall mining

    Energy Technology Data Exchange (ETDEWEB)

    Davison, I. [Highwall Systems (United States)]

    2006-09-15

    The paper explores the issues facing highwall mining. American Highwall Systems, based in Chilhowie, Virginia, has developed a highwall mining system that can mine coal seams from 26 in to 10 ft thick. The first production model, the AH51, began mining in August 2006. The technologies incorporated into the company's mining machines to improve the performance, efficiency and reliability of the highwall mining equipment draw on many disciplines. Technology applied to design engineering, manufacturing and fabrication engineering, and control and monitoring computer hardware and software has played an important role in the evolution of the American Highwall Systems design concept. 5 photos.

  8. Ask and Ye Shall Receive? Automated Text Mining of Michigan Capital Facility Finance Bond Election Proposals to Identify Which Topics Are Associated with Bond Passage and Voter Turnout

    Science.gov (United States)

    Bowers, Alex J.; Chen, Jingjing

    2015-01-01

    The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…

  9. Kiruna research mine

    Energy Technology Data Exchange (ETDEWEB)

    Oestensen, A

    1983-12-01

    The research mine at Kiruna is the first large-scale mining research project sponsored by the Swedish government. Under the leadership of the Swedish Mining Research Foundation, a five-year project involving development of new mining systems and machinery will be carried out in cooperation with the Lulea Institute of Technology and a number of Swedish industrial companies.

  10. Virtual Observatories, Data Mining, and Astroinformatics

    Science.gov (United States)

    Borne, Kirk

    The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best of breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential

  11. Mathematical models of flat linear induction motors used in mining drives

    Energy Technology Data Exchange (ETDEWEB)

    Tall, M

    1984-01-01

    Design parameters are calculated for electric flat linear induction motors, widely employed in the coal and ore mining industries in Poland. A mathematical model of this motor with a single-layer ferromagnetic secondary part is presented. A three-dimensional electromagnetic field analysis is carried out, taking relative magnetic permeability variation, discrete winding distribution, influence of armature grooving and pulsating field influence into account. A computer calculation algorithm is proposed for determining motor characteristics. 17 refs.

  12. G2D: a tool for mining genes associated with disease

    OpenAIRE

    Perez-Iratxeta, Carolina; Wjst, Matthias; Bork, Peer; Andrade, Miguel A

    2005-01-01

    Background: Human inherited diseases can be associated by genetic linkage with one or more genomic regions. The availability of the complete sequence of the human genome allows examining those locations for an associated gene. We previously developed an algorithm to prioritize genes on a chromosomal region according to their possible relation to an inherited disease using a combination of data mining on biomedical databases and gene sequence analysis. Results: We have implemented this ...

  13. Overview of mine drainage geochemistry at historical mines, Humboldt River basin and adjacent mining areas, Nevada. Chapter E.

    Science.gov (United States)

    Nash, J. Thomas; Stillings, Lisa L.

    2004-01-01

    Reconnaissance hydrogeochemical studies of the Humboldt River basin and adjacent areas of northern Nevada have identified local sources of acidic waters generated by historical mine workings and mine waste. The mine-related acidic waters are rare and generally flow less than a kilometer before being neutralized by natural processes. Where waters have a pH of less than about 3, particularly in the presence of sulfide minerals, they take on high to extremely high concentrations of many potentially toxic metals. The processes that create these acidic, metal-rich waters in Nevada are the same as in other parts of the world. However, the scale of transport and the fate of metals are much more localized because of the ubiquitous presence of caliche soils. Acid mine drainage is rare in historical mining districts of northern Nevada, and the volume of drainage rarely exceeds about 20 gpm. My findings are in close agreement with those of Price and others (1995), who estimated that less than 0.05 percent of inactive and abandoned mines in Nevada are likely to be a concern for acid mine drainage. Most historical mining districts have no draining mines. Only in two districts (Hilltop and National) does water affected by mining flow into streams of significant size and length (more than 8 km). Even in the worst cases, water quality is naturally attenuated to meet water-quality standards within about 1 km of the source. Only a few historical mines release acidic water with elevated metal concentrations to small streams that reach the Humboldt River, and these contaminants are not detectable in the Humboldt. These reconnaissance studies offer encouraging evidence that abandoned mines in Nevada create only minimal and local water-quality problems. Natural attenuation processes are sufficient to compensate for these relatively small sources of contamination. These results may provide useful analogs for future mining in the Humboldt River basin, but attention must be given to

  14. Object-oriented spatial-temporal association rules mining on ocean remote sensing imagery

    International Nuclear Information System (INIS)

    Xue, C J; Dong, Q; Ma, W X

    2014-01-01

    Using long-term marine remote sensing imagery, we develop an object-oriented spatio-temporal association rule mining framework to explore association rules among marine environmental elements. Within the framework, two key issues are addressed: how to deal effectively with the related lattices, and how to reduce the related dimensions. For the first issue, this paper develops an object-oriented method for abstracting marine sensitive objects from raster pixels and representing them as quadruples. For the second issue, mutual information theory is embedded to construct a direct association pattern tree that reduces the related elements in a first step. The Apriori algorithm is then used to discover the spatio-temporal association rules. Finally, the Pacific Ocean is taken as the research area, and multiple kinds of marine remote sensing imagery from the past three decades are used as a case study. The results show that object-oriented spatio-temporal association rule mining can identify associations not only among marine environmental elements in the same region but also among different regions. In addition, the information from association rule mining is much more expressive and informative in space and time than traditional spatio-temporal analysis.
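    According to the abstract, the rules are discovered with the Apriori algorithm after the dimension-reduction step. The sketch below is a minimal textbook Apriori over transactions of discrete items, followed by confidence-based rule generation. It does not reproduce the object-oriented quadruple representation or the mutual-information pattern tree. The item labels in the usage note (`SST_high`, `Chl_low`, `wind_high`) are hypothetical, and so are the thresholds.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent, k = {}, 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules
```

    Consider four hypothetical observations: {SST_high, Chl_low}, {SST_high, Chl_low, wind_high}, {SST_high} and {Chl_low, wind_high}. With a minimum support of 0.5, the sketch finds five frequent itemsets. At a minimum confidence of 0.6, it yields four rules, including wind_high => Chl_low with confidence 1.0.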

  15. Present and future mine effluents management at Zirovski Vrh uranium mine

    International Nuclear Information System (INIS)

    Logar, Z.; Likar, B.; Gantar, I.

    2002-01-01

    The Zirovski Vrh uranium mine and its facilities are situated on the northeastern slopes of the Zirovski Vrh ridge (960 m) and the southern slopes of Crna gora (611 m), respectively. The mine lies at elevations from 430 m (bottom of the valley) to 580 m (P-1 adit). All effluents from the mine and mill facilities flow into the Brebovscica river (average yearly flow 0.74 m³/s): run-off mine water, outflow from the Jazbec mine waste pile, and outflows from the Borst mill tailings; effluents from the temporary mine waste piles P-1, P-9 and P-36 are of minor significance. The first three effluents and the receiving surface waters (the Todrascica brook and the Brebovscica river) are monitored extensively. The impact of the radioactively polluted outflows on these waters has been demonstrated, but it is far below the maximum permitted limit values. The authorised maximum limit values for mine effluents were obtained in 1996, and the detailed design will ensure that these values are not exceeded in the future. The long-term plans are to minimise the uranium concentrations in the run-off mine water by targeted underground drilling. The mine waste pile and the mill tailings will be covered by an engineered cover system to prevent contamination of clean water by weathering and ablution. The existing effluents from the mill tailings will diminish after remediation and consolidation of the tailings. The Government of Slovenia funds the remediation of the Zirovski Vrh uranium production site. The estimated funds needed for remediation of the main objects are shown in the table below; the total investment also includes the costs of effluent control.

        Area                              Mio US$
        Underground mine remediation        19.00
        Mine waste pile remediation          6.50
        Mill tailings remediation            2.24
        Total investment costs              27.74

    These figures do not include the operating costs of the Zirovski Vrh Mine, currently approximately US$ 2.2 Mio per year. The latest implementation schedule foresees the end of remediation works in 2005, after which a 5-year trial monitoring period begins.

  16. Hydrogeochemical assessment of mine-impacted water and sediment of iron ore mining

    Science.gov (United States)

    Nur Atirah Affandi, Fatin; Kusin, Faradiella Mohd; Aqilah Sulong, Nur; Madzin, Zafira

    2018-04-01

    This study was carried out to evaluate the hydrogeochemical behaviour of mine-impacted water and sediment in a former iron ore mining area. Mine water and sediment were sampled at selected locations within the mine, including the former mining ponds, the mine tailings and the nearby stream. The water samples were analysed for their hydrochemical facies and for major and trace elements, including heavy metals. The water in the mining ponds and the mine tailings was highly acidic (pH 2.54-3.07), whereas the nearby stream had near-neutral pH. Fe and Mn concentrations in the water exceeded the recommended guideline values, a finding also supported by the results of geochemical modelling. The potential ecological risk index values indicated that sediments in the mining area were contaminated with Cd and As. The contributions of the heavy metals in the sediment to the total risk index ranked in the order Cd > As > Pb > Cu > Zn > Cr. Overall, the mining area was categorised as having low to moderate potential ecological risk.
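
    The abstract does not give the index formula. Assuming the widely used potential ecological risk index of Hakanson (1980), each metal's single-factor risk is E_r = T_r × C / C_n, where C is the measured sediment concentration, C_n a background (reference) concentration and T_r a toxic-response factor; the total index RI is the sum of E_r over all metals. The sketch below uses the commonly quoted factors and class boundaries for that scheme; the concentrations in the usage example are hypothetical and are not the study's data.

```python
# Toxic-response factors T_r as commonly quoted from Hakanson (1980).
TOXIC_RESPONSE = {"Cd": 30, "As": 10, "Pb": 5, "Cu": 5, "Cr": 2, "Zn": 1}


def single_factor_risk(measured, background):
    """E_r = T_r * (C / C_n) for each metal in `measured` (same units as background)."""
    return {m: TOXIC_RESPONSE[m] * measured[m] / background[m] for m in measured}


def risk_index(measured, background):
    """Total potential ecological risk index RI = sum of the single-factor E_r values."""
    return sum(single_factor_risk(measured, background).values())


def classify_er(er):
    """Commonly used risk classes for a single metal's E_r."""
    for limit, label in ((40, "low"), (80, "moderate"), (160, "considerable"), (320, "high")):
        if er < limit:
            return label
    return "very high"


def classify_ri(ri):
    """Commonly used risk classes for the total index RI."""
    for limit, label in ((150, "low"), (300, "moderate"), (600, "considerable")):
        if ri < limit:
            return label
    return "very high"


if __name__ == "__main__":
    # Hypothetical concentrations in mg/kg; NOT the study's measurements.
    measured = {"Cd": 1.0, "As": 30.0, "Pb": 40.0, "Cu": 30.0, "Zn": 100.0, "Cr": 50.0}
    background = {"Cd": 0.5, "As": 15.0, "Pb": 20.0, "Cu": 30.0, "Zn": 100.0, "Cr": 100.0}
    er = single_factor_risk(measured, background)
    # Rank metals by their contribution to the total index.
    for metal in sorted(er, key=er.get, reverse=True):
        print(metal, er[metal], classify_er(er[metal]))
    ri = risk_index(measured, background)
    print("RI =", ri, classify_ri(ri))  # RI = 97.0 low
```

    Because T_r for Cd is thirty times that for Zn, a modest enrichment in Cd can dominate RI, which is why rankings of this kind often place Cd first even when its absolute concentration is small.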

  17. Collaborative Data Mining

    Science.gov (United States)

    Moyle, Steve

    Collaborative Data Mining is a setting in which the Data Mining effort is distributed among multiple collaborating agents, human or software. The objective of collaborative Data Mining is to produce solutions to the Data Mining problem at hand that are better, by some metric, than those that individual, non-collaborating agents would have achieved. The solutions require evaluation, comparison, and approaches for combination. Collaboration requires communication and implies some form of community. The human form of collaboration is a social task. Organizing communities effectively is non-trivial and often requires well-defined roles and processes. Data Mining, too, benefits from a standard process. This chapter explores how the standard Data Mining process, CRISP-DM, can be used in a collaborative setting.

  18. 76 FR 63238 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines

    Science.gov (United States)

    2011-10-12

    ... Detection Systems for Continuous Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health... Agency's proposed rule addressing Proximity Detection Systems for Continuous Mining Machines in... proposed rule for Proximity Detection Systems on Continuous Mining Machines in Underground Coal Mines. Due...

  19. 76 FR 70075 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines

    Science.gov (United States)

    2011-11-10

    ... Detection Systems for Continuous Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health... proposed rule addressing Proximity Detection Systems for Continuous Mining Machines in Underground Coal... Detection Systems for Continuous Mining Machines in Underground Coal Mines. MSHA conducted hearings on...

  20. A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2018-01-01

    Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. Although existing parallel algorithms have been applied successfully to frequent pattern mining of large-scale trajectory data, two major challenges remain: overcoming the inherent shortcomings of Hadoop in handling taxi trajectory big data, which includes massive numbers of small files, and discovering implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm that analyzes the spatiotemporal characteristics of taxi operations from large-scale taxi trajectories, using strategies for processing massive numbers of small files on a Hadoop platform. More specifically, we first implement three methods, namely Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing shortcomings of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we use MR-PFP to analyze the characteristics of taxi operations in both spatial and temporal dimensions in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
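
    FP-growth, the algorithm that MR-PFP parallelizes, avoids Apriori-style candidate generation by compressing the transactions into a prefix tree (FP-tree) and mining it recursively through conditional pattern bases. The following is a minimal single-machine sketch for illustration only; it does not reproduce the MapReduce sharding, the Sequence File input, or any other part of MR-PFP. Each transaction is assumed to be a set of hypothetical spatiotemporal tokens such as "A@morning" (grid cell A visited in the morning), and min_support is an absolute count.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its accumulated count, and parent/children links."""

    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, weight) pairs.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes that carry it; frequent maps each frequent item to its support.
    """
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name),
        # so that transactions with common prefixes share nodes.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    result = {}

    def mine(weighted, suffix):
        header, frequent = build_tree(weighted, min_support)
        for item, support in frequent.items():
            itemset = suffix | {item}
            result[itemset] = support
            # Conditional pattern base: the prefix path above every node of
            # `item`, weighted by that node's count.
            conditional = []
            for node in header[item]:
                path = []
                parent = node.parent
                while parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    conditional.append((path, node.count))
            if conditional:
                mine(conditional, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return result


if __name__ == "__main__":
    # Hypothetical trips: the set of (cell, time-slot) tokens each trip touches.
    trips = [
        {"A@morning", "B@morning", "C@evening"},
        {"A@morning", "B@morning"},
        {"A@morning", "C@evening"},
        {"B@morning", "C@evening"},
        {"A@morning", "B@morning", "C@evening"},
    ]
    patterns = fp_growth(trips, min_support=3)
    for itemset, support in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), support)
```

    Broadly, parallel FP-growth variants split the initial item-counting pass and the mining of groups of items across MapReduce jobs; in this sketch both run in one process, so it shows only the core tree-building and recursive mining logic.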