Sample records for stream mining algorithms

  1. A Survey of latest Algorithms for Frequent Itemset Mining in Data Stream



    Association rule mining and frequent pattern discovery in databases are long-standing topics. With the advent of Big Data, the need for stream mining has increased. This paper surveys recent frequent pattern mining algorithms for data streams to understand the problems to be solved and each algorithm's shortcomings and advantages over the others.

  2. Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

    Agustín Ortíz Díaz


    The treatment of large data streams in the presence of concept drift is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a block to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms on common benchmark datasets. The experiments show promising results for the proposed algorithm (regarding accuracy and runtime) in handling different types of concept drift.

  3. Data streams algorithms and applications

    Muthukrishnan, S


    Data stream algorithms as an active research agenda emerged only over the past few years, even though the concept of making few passes over the data for performing computations has been around since the early days of Automata Theory. The data stream agenda now pervades many branches of Computer Science including databases, networking, knowledge discovery and data mining, and hardware systems. Industry is in synch too, with Data Stream Management Systems (DSMSs) and special hardware to deal with data speeds. Even beyond Computer Science, data stream concerns are emerging in physics, atmospheric

  4. Big Data Mining: Tools & Algorithms

    Adeel Shiraz Hashmi


    We are now in the Big Data era, and there is a growing demand for tools that can process and analyze such data. Big data analytics deals with extracting valuable information from complex data that can't be handled by traditional data mining tools. This paper surveys the available tools that can handle large volumes of data as well as evolving data streams. The data mining tools and algorithms that can handle big data are also summarized, and one of the tools is used for mining large datasets with distributed algorithms.

  5. Mining Frequent Itemsets from Online Data Streams: Comparative Study

    HebaTallah Mohamed Nabil


    Online mining of data streams poses many challenges beyond those of mining static databases. In addition to the one-scan nature of the task, the unbounded memory requirement, the high arrival rate of data streams, and the combinatorial explosion of itemsets all exacerbate the mining task. The high complexity of the frequent itemset mining problem hinders the application of stream mining techniques. In this review, we present a comparative study of almost all of the algorithms known to us for mining frequent itemsets from online data streams. All of these techniques sacrifice accuracy because of the relatively limited storage, and therefore always produce approximate results.
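    The trade-off named in this abstract, bounded storage in exchange for approximate counts, can be illustrated with a minimal sketch. The example below uses the Misra-Gries summary, a standard one-pass technique for frequent single items; it is a swapped-in illustration and not one of the itemset algorithms compared in the review. The function name `misra_gries`, the parameter `k`, and the sample stream are assumptions made for this sketch.

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary that keeps at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to survive in the summary; the stored counts are
    underestimates of the true counts by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream in which "a" clearly dominates.
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
summary = misra_gries(stream, k=3)
```

    The summary never holds more than `k - 1` entries regardless of stream length, which is why the reported counts are approximate rather than exact.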

  6. Image/Time Series Mining Algorithms: Applications to Developmental Biology, Document Processing and Data Streams

    Tataw, Oben Moses


    Interdisciplinary research in computer science requires the development of computational techniques for practical application in different domains. This usually requires careful integration of different areas of technical expertise. This dissertation presents image and time series analysis algorithms, with practical interdisciplinary applications…

  8. Mining developer communication data streams

    Connor, Andy M.; Jacqui Finlay; Russel Pears


    This paper explores the concepts of modelling a software development project as a process that results in the creation of a continuous stream of data. In terms of the Jazz repository used in this research, one aspect of that stream of data would be developer communication. Such data can be used to create an evolving social network characterized by a range of metrics. This paper presents the application of data stream mining techni...

  9. Overview of streaming-data algorithms

    Madhulatha, T Soni


    Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Also, these data are potentially unbounded. Boundless streams of data collected from sensors, equipment, and other data sources are referred to as data streams. Various data mining tasks can be performed on data streams in search of interesting patterns. This paper studies a particular data mining task, clustering, which can be used as the first step in many knowledge discovery processes. By grouping data streams into homogeneous clusters, data miners can learn about data characteristics which can then be developed into classification models for new data or predictive models for unknown events. Recent research addresses the problem of data-stream mining to deal with applications that require processing huge amounts of data such as sensor data analysis and financial applications. For such analysis, single-pass algorithms that consume a small amount of memory are critical.
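    As an illustration of what a single-pass, small-memory clustering algorithm can look like, here is a minimal sketch of sequential (online) k-means. It is a generic textbook technique, not necessarily one of the methods covered in the paper. The function name `online_kmeans`, the seeding rule (the first `k` points become the initial centroids), and the sample points are assumptions made for this sketch.

```python
def online_kmeans(stream, k):
    """Single-pass k-means: each point is seen once and then discarded.

    Memory use is k centroids plus k counters, independent of the
    stream length. The first k points seed the centroids.
    """
    centroids, counts = [], []
    for point in stream:
        if len(centroids) < k:
            centroids.append(list(point))
            counts.append(1)
            continue
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], point)),
        )
        counts[j] += 1
        # Running-mean update: move the centroid 1/count of the way to the point.
        centroids[j] = [c + (x - c) / counts[j] for c, x in zip(centroids[j], point)]
    return centroids, counts


# Hypothetical two-dimensional stream with two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 9.0), (0.0, 2.0), (10.0, 8.0)]
centroids, counts = online_kmeans(points, k=2)
```

    Because each centroid is a running mean of the points assigned to it, no past point has to be stored; the price is sensitivity to the arrival order and to the initial seeds.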

  10. Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream

    马可; 李玲娟; 孙杜靖


    To improve the efficiency of mining frequent patterns on data streams, this paper designs a distributed parallel algorithm for mining frequent patterns on data streams, named DPFP-Stream (Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream), based on the classical FP-Stream algorithm and the principles of distributed parallel computing. The algorithm divides the task of building the frequent pattern tree into two parts, local and global, and introduces a new parameter, "current time". Arriving stream data are distributed equally among several local nodes. Each local node uses the FP-Growth algorithm to produce its candidate frequent itemsets for the current unit of time, packages them with their support counts per unit of time, and sends them to the global node. The global node merges the intermediate results from the local nodes according to the "current time" and updates the global Pattern-Tree. The results of implementing DPFP-Stream and testing its performance on Storm, a distributed data stream computing platform, show that the computing efficiency of DPFP-Stream improves as the number of local nodes or local bolt threads increases, making the algorithm suitable for efficiently mining frequent patterns in data streams.
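    The local/global division described above can be sketched in a few lines. This is a single-process illustration only: local counting is done by brute-force enumeration instead of FP-Growth, the global Pattern-Tree is replaced by a flat counter, and Storm is not involved. The names `local_candidates` and `global_merge`, the `max_size` limit, and the sample batch are assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations


def local_candidates(transactions, min_count, max_size=2):
    """Count itemsets of up to max_size items in one local node's share.

    Brute-force enumeration stands in for FP-Growth here.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


def global_merge(pattern_table, packets, current_time):
    """Add the support counts of every packet stamped with current_time."""
    for time, candidates in packets:
        if time == current_time:
            for itemset, count in candidates.items():
                pattern_table[itemset] += count
    return pattern_table


# One unit of time of hypothetical stream data, split evenly over two nodes.
batch = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"]]
node0, node1 = batch[0::2], batch[1::2]
packets = [
    (1, local_candidates(node0, min_count=1)),
    (1, local_candidates(node1, min_count=1)),
]
table = global_merge(Counter(), packets, current_time=1)
```

    With a local `min_count` above 1, an itemset that is infrequent on every node but frequent overall would be lost before the merge, so the choice of local threshold is a design decision that this sketch leaves open.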

  11. Mining Building Metadata by Data Stream Comparison

    Holmegaard, Emil; Kjærgaard, Mikkel Baun


    ...ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata by comparing data streams. Metafier enables semi-automatic labeling of metadata for building instrumentation. Metafier annotates points with metadata by comparing the data from a set of validated points with unvalidated points. Metafier has three different algorithms for comparing points based on their data. The three algorithms... to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore, we found that using DTW for mining...

  13. Overview Of Streaming-Data Algorithms

    T. Soni Madhulatha


    Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Also, these data are potentially unbounded. Boundless streams of data collected from sensors, equipment, and other data sources are referred to as data streams. Various data mining tasks can be performed on data streams in search of interesting patterns. This paper studies a particular data mining task, clustering, which can be used as the first step in many knowledge discovery processes. By grouping data streams into homogeneous clusters, data miners can learn about data characteristics which can then be developed into classification models for new data or predictive models for unknown events. Recent research addresses the problem of data-stream mining to deal with applications that require processing huge amounts of data such as sensor data analysis and financial applications. For such analysis, single-pass algorithms that consume a small amount of memory are critical.

  14. Mining Developer Communication Data Streams

    Andy M. Connor


    This paper explores the concepts of modelling a software development project as a process that results in the creation of a continuous stream of data. In terms of the Jazz repository used in this research, one aspect of that stream of data would be developer communication. Such data can be used to create an evolving social network characterized by a range of metrics. This paper presents the application of data stream mining techniques to identify the most useful metrics for predicting build outcomes. Results are presented from applying the Hoeffding Tree classification method used in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results indicate that only a small number of the available metrics considered have any significance for predicting the outcome of a build.
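    For readers unfamiliar with the Hoeffding Tree method mentioned above, its split decision rests on the Hoeffding bound: after n observations of a quantity with range R, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability 1 - delta. A minimal sketch of that test follows; the function names, the gain values, and the chosen `delta` are assumptions for illustration and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean of n samples is within epsilon
    of the true mean with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf once the best attribute leads the runner-up by more
    than the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains for the two best attributes at a leaf. For
# information gain on a two-class problem the range is log2(2) = 1.
early = should_split(0.30, 0.22, value_range=1.0, delta=1e-7, n=200)
late = should_split(0.30, 0.22, value_range=1.0, delta=1e-7, n=5000)
```

    The bound shrinks as more examples reach the leaf, so the same observed gap that is too uncertain to act on after 200 examples justifies a split after 5000.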

  15. An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams

    Kranen, Phillip; Assent, Ira; Seidl, Thomas


    Due to the ever growing presence of data streams there has been a considerable amount of research on stream data mining over the past years. Anytime algorithms are particularly well suited for stream mining, since they flexibly use all available time on streams of varying data rates, and are also...

  16. Streaming Algorithms for Line Simplification

    Abam, Mohammad; de Berg, Mark; Hachenberger, Peter


    this problem in a streaming setting, where we only have a limited amount of storage, so that we cannot store all the points. We analyze the competitive ratio of our algorithms, allowing resource augmentation: we let our algorithm maintain a simplification with 2k (internal) points and compare the error of our...

  17. Stream Deniable-Encryption Algorithms

    N.A. Moldovyan


    A method for stream deniable encryption of a secret message is proposed, which is computationally indistinguishable from the probabilistic encryption of some fake message. The method generates two key streams with a secure block cipher. One of the key streams is generated from the secret key and the other from the fake key. The key streams are mixed with the secret and fake data streams so that the output ciphertext looks like the ciphertext produced by some probabilistic encryption algorithm applied to the fake message using the fake key. When the receiver and/or sender of the ciphertext are coerced to reveal the encryption key and the source message, they reveal the fake key and the fake message. To expose their lie, the coercer would have to demonstrate that an alternative decryption of the ciphertext is possible; however, this is a computationally hard problem.

  18. Mining Frequent Itemsets with Weights over Data Stream Using Inverted Matrix

    Long Nguyen Hung


    In recent years, research on mining data streams has become prominent because it can be applied in many real-world areas. In this paper, we propose an algorithm called MFIWDSIM for mining frequent itemsets with weights over a data stream using an Inverted Matrix [10]. The main idea is to move the data stream into an inverted matrix saved on disk so that the algorithm can mine it many times with different support thresholds as well as different minimum weights. Moreover, this inverted matrix can be accessed for mining at different times according to the user's requirements without recalculation. Analysis and evaluation show that MFIWDSIM performs better than WSWFP-stream [9] for mining frequent itemsets with weights over a data stream.

  19. An analytical framework for data stream mining techniques based on challenges and requirements

    Kholghi, Mahnoosh


    A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are such examples. The imminent need for turning such data into useful information and knowledge drives the development of systems, algorithms and frameworks that address streaming challenges. The storage, querying and mining of such data sets are highly computationally challenging tasks. Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from non-stopping streams of information. Generally, the two main challenges are designing fast mining methods for data streams and promptly detecting changing concepts and data distributions, given the highly dynamic nature of data streams. The goal of this article is to analyze and classify the application of diverse data mining techniques to the different challenges of data stream mining. In this...

  20. Active Learning in Context-Driven Stream Mining With an Application to Image Mining.

    Tekin, Cem; van der Schaar, Mihaela


    We propose an image stream mining method in which images arrive with contexts (metadata) and need to be processed in real time by the image mining system (IMS), which needs to make predictions and derive actionable intelligence from these streams. After extracting the features of the image by preprocessing, IMS determines online the classifier to use on the extracted features to make a prediction using the context of the image. A key challenge associated with stream mining is that the prediction accuracy of the classifiers is unknown, since the image source is unknown; thus, these accuracies need to be learned online. Another key challenge of stream mining is that learning can only be done by observing the true label, but this is costly to obtain. To address these challenges, we model the image stream mining problem as an active, online contextual experts problem, where the context of the image is used to guide the classifier selection decision. We develop an active learning algorithm and show that it achieves regret sublinear in the number of images that have been observed so far. To further illustrate and assess the performance of our proposed methods, we apply them to diagnose breast cancer from the images of cellular samples obtained from the fine needle aspirate of breast mass. Our findings show that very high diagnosis accuracy can be achieved by actively obtaining only a small fraction of true labels through surgical biopsies. Other applications include video surveillance and video traffic monitoring.

  1. Middle matching mining algorithm

    GUO Ping; CHEN Li


    A new algorithm for fast discovery of sequential patterns, referred to as the middle matching algorithm, is presented to solve the problem of SPADE generating too many candidate sets. Experiments on a large customer transaction database consisting of customer_id, transaction time, and transaction items demonstrate that the proposed algorithm performs better than SPADE, because it generates a candidate set by matching two sequences in the middle and thereby reduces the number of candidate sets.

  2. Frequent Itemsets Mining Algorithm Based on Distributed Data Stream of Sensor Network



    This paper studies the problem of mining frequent itemsets from data streams in wireless sensor networks. Because centralized, static methods for mining frequent itemsets from data streams cannot be used directly in sensor networks, a frequent itemset mining algorithm, FIMDS, based on the distributed data streams of a sensor network is proposed. Based on the FP-tree, the algorithm quickly mines the local frequent itemsets of the single data stream at each sensor node. The local frequent itemsets are then uploaded through routing and merged layer by layer across the wireless sensor network. After they are collected at the sink node, the global frequent itemsets are obtained with an efficient top-down pruning strategy. The experimental results show that the algorithm greatly reduces the number of candidate itemsets and the amount of communication traffic in wireless sensor networks, and that it performs well in both time and space.

  3. Towards a New Approach for Mining Frequent Itemsets on Data Stream

    Shailendra Jain


    Since its advent, association rule mining has been one of the most researched areas of data exploration. In recent years, applying association rule mining methods to extract rules from a continuous flow of voluminous data, known as a data stream, has generated immense interest due to emerging applications such as network-traffic analysis and sensor-network data analysis. For such application domains, the ability to process an enormous amount of stream data in a single pass is critical. Nowadays, many organizations generate and utilize vast data streams (Huang, 2002). Employing data mining schemes on such massive data streams can unearth real-time trends and patterns which can be utilized for dynamic and timely decisions. Mining such high-speed, enormous data streams differs significantly from traditional data mining in several ways. First, the response time of the mining algorithm should be as small as possible due to the online nature of the data and the limited resources dedicated to mining activities (Charikar, 2004). Second, the underlying data is highly volatile and subject to change over time (Chang, 2003). Moreover, since there is no time for preprocessing the data in order to remove noise, the streamed data can have noise inherent in it. Due to all of the aforementioned problems, data stream mining is receiving increasing attention, and current research is focused on efficient solutions to the problems cited above. Although the field of data stream mining is being heavily investigated, there is still a lack of a holistic and generic approach for mining association rules from data streams. Thus, this research attempts to fill this gap by integrating ideas from previous work in data stream mining. This investigation focuses on the degree of effectiveness of using a probabilistic approach of sampling in the data stream together with an incremental approach to maintenance of frequent

  4. Mining Concurrent Topical Activity in Microblog Streams

    Panisson, A; Quaggiotto, M; Cattuto, C


    Streams of user-generated content in social media exhibit patterns of collective attention across diverse topics, with temporal structures determined both by exogenous factors and endogenous factors. Teasing apart different topics and resolving their individual, concurrent, activity timelines is a key challenge in extracting knowledge from microblog streams. Facing this challenge requires the use of methods that expose latent signals by using term correlations across posts and over time. Here we focus on content posted to Twitter during the London 2012 Olympics, for which a detailed schedule of events is independently available and can be used for reference. We mine the temporal structure of topical activity by using two methods based on non-negative matrix factorization. We show that for events in the Olympics schedule that can be semantically matched to Twitter topics, the extracted Twitter activity timeline closely matches the known timeline from the schedule. Our results show that, given appropriate techn...

  5. Link mining models, algorithms, and applications

    Yu, Philip S; Faloutsos, Christos


    This book presents in-depth surveys and systematic discussions on models, algorithms and applications for link mining. Link mining is an important field of data mining. Traditional data mining focuses on 'flat' data in which each data object is represented as a fixed-length attribute vector. However, many real-world data sets are much richer in structure, involving objects of multiple types that are related to each other. Hence, recently link mining has become an emerging field of data mining, which has a high impact in various important applications such as text mining, social network analysi

  6. Chopper: Efficient Algorithm for Tree Mining

    Chen Wang; Ming-Sheng Hong; Wei Wang; Bai-Le Shi


    With the development of the Internet, frequent pattern mining has been extended to more complex patterns such as tree mining and graph mining. Such applications arise in complex domains like bioinformatics, web mining, etc. In this paper, we present a novel algorithm, named Chopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the newly developed algorithm outperforms TreeMinerV, one of the fastest methods proposed previously, in mining large databases. At the end of this paper, potential improvements to Chopper are mentioned.

  7. Contrast data mining concepts, algorithms, and applications

    Dong, Guozhu


    A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems. Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains. Learn from Real Case Studies

  8. The Top Ten Algorithms in Data Mining

    Wu, Xindong


    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc

  9. Frequent Pattern Mining Algorithms for Data Clustering

    Zimek, Arthur; Assent, Ira; Vreeken, Jilles


    Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering; yet it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining...

  10. URL Mining Using Agglomerative Clustering Algorithm

    Chinmay R. Deshmukh


    The tremendous growth of the web has prompted the application of data mining techniques to web logs. Data mining on the World Wide Web is an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified into web content mining, web usage mining, and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL that the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and those on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.

  11. Recovery of a mining-damaged stream ecosystem

    Mebane, Christopher A.; Robert J. Eakins; Brian G. Fraser; William J. Adams


    This paper presents a 30+ year record of changes in benthic macroinvertebrate communities and fish populations associated with improving water quality in mining-influenced streams. Panther Creek, a tributary to the Salmon River in central Idaho, USA suffered intensive damage from mining and milling operations at the Blackbird Mine that released copper (Cu), arsenic (As), and cobalt (Co) into tributaries. From the 1960s through the 1980s, no fish and few aquatic invertebrates could be...

  12. PRESEE: an MDL/MML algorithm to time-series stream segmenting.

    Xu, Kaikuo; Jiang, Yexi; Tang, Mingjie; Yuan, Changan; Tang, Changjie


    Time-series streams are one of the most common data types in the data mining field. They are prevalent in fields such as the stock market, ecology, and medical care. Segmentation is a key step in accelerating time-series stream mining. Previous segmentation algorithms focused mainly on improving precision and paid little attention to efficiency. Moreover, the performance of these algorithms depends heavily on parameters, which are hard for users to set. In this paper, we propose PRESEE (parameter-free, real-time, and scalable time-series stream segmenting algorithm), which greatly improves the efficiency of time-series stream segmenting. PRESEE is based on both MDL (minimum description length) and MML (minimum message length) methods, which can segment the data automatically. To evaluate the performance of PRESEE, we conduct several experiments on time-series streams of different types and compare it with the state-of-the-art algorithm. The empirical results show that PRESEE is very efficient for real-time stream datasets, improving segmenting speed nearly ten times. The novelty of this algorithm is further demonstrated by the application of PRESEE to segmenting real-time stream datasets from the ChinaFLUX sensor network data stream.

  13. Recovery of a mining-damaged stream ecosystem

    Mebane, Christopher A.; Eakins, Robert J.; Fraser, Brian G.; Adams, William J.


    This paper presents a 30+ year record of changes in benthic macroinvertebrate communities and fish populations associated with improving water quality in mining-influenced streams. Panther Creek, a tributary to the Salmon River in central Idaho, USA suffered intensive damage from mining and milling operations at the Blackbird Mine that released copper (Cu), arsenic (As), and cobalt (Co) into tributaries. From the 1960s through the 1980s, no fish and few aquatic invertebrates could be found in 40 km of mine-affected reaches of Panther Creek downstream of the metals contaminated tributaries, Blackbird and Big Deer Creeks.

  14. A Quick Algorithm for Mining Exceptional Rules


    Exceptional rules are often ignored because of their small support. However, they have high confidence, so they are useful sometimes. A new algorithm for mining exceptional rules is presented, which creates a large itemset from a relatively small database and scans the whole database only one time to generate all exceptional rules. This algorithm is proved to be quick and effective through its application in a mushroom database.
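    The two measures this abstract relies on, support and confidence, can be computed directly. The sketch below does not reproduce the paper's algorithm; it only shows how a rule with small support and high confidence would be flagged. The function names, the thresholds `max_support` and `min_confidence`, and the mushroom-style records are assumptions made for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def is_exceptional(support, confidence, max_support, min_confidence):
    """Flag rules that are rare (low support) but reliable (high confidence)."""
    return support <= max_support and confidence >= min_confidence


# Ten hypothetical mushroom-style records.
transactions = (
    [{"cap=convex", "edible"}] * 4
    + [{"cap=convex", "poisonous"}] * 2
    + [{"ring=flaring", "poisonous"}] * 2
    + [{"cap=flat", "edible"}] * 2
)
rare = rule_stats(transactions, {"ring=flaring"}, {"poisonous"})
common = rule_stats(transactions, {"cap=convex"}, {"edible"})
rare_flag = is_exceptional(*rare, max_support=0.25, min_confidence=0.9)
common_flag = is_exceptional(*common, max_support=0.25, min_confidence=0.9)
```

    A conventional minimum-support filter would discard the first rule even though it holds every time its antecedent appears, which is the situation the abstract describes.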

  15. A New Algorithm for Mining Frequent Pattern

    李力; 靳蕃


    Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm, FP-growth, that generates frequent patterns without candidate sets. Based on an analysis of FP-growth, this paper introduces the concept of an equivalent FP-tree and proposes an improved algorithm, denoted FP-growth*, which is much faster and easy to implement. FP-growth* adopts a modified structure for the FP-tree and header table, generates only a header table in each recursive operation, and projects the tree onto the original FP-tree. The two algorithms produce the same frequent pattern set on the same transaction database, but the performance study shows that the improved algorithm, FP-growth*, is at least twice as fast as FP-growth.

  16. Optimal Rate Allocation Algorithm for Multiple Source Video Streaming

    戢彦泓; 郭常杰; 钟玉琢; 孙立峰


    Video streaming is one of the most important applications used in the best-effort Internet. This paper presents a new scheme for multiple source video streaming in which the traditional fine granular scalable coding was rebuilt into a multiple sub-streams based transmission model. A peak signal to noise ratio based stream rate allocation algorithm was then developed based on the transmission model. In tests, the algorithm performance is about 1 dB higher than that of a uniform rate allocation algorithm. Therefore, this scheme can overcome bottlenecks along a single link and smooth jitter to achieve high quality and stable video.

  17. Web Mining Using PageRank Algorithm

    Vignesh. V


    Full Text Available Web mining is the use of data mining to extract and automatically discover web-based information. It is one of the most universal and dominant applications on the Internet, and as the web grows in size, search tools that combine the results of multiple search engines are becoming more valuable. However, almost none of these studies deals with the genetic relation algorithm (GRA), an evolutionary method with a graph structure. GRA was designed to increase both the effectiveness and the efficiency of search engines. GRA considers the correlation coefficient between stock brands as strength, which indicates the relation between nodes in each individual of GRA. The reduced set of hyperlinks provided by GRA in the final generation consists of only the hyperlinks most similar to the query. Even so, the end user is not fully satisfied. To improve user satisfaction, the PageRank algorithm is used to measure the importance of a page and to prioritize the pages returned from GRA, which reduces the user’s searching time. The PageRank algorithm allocates a rank to the filtered links based on the number of keyword occurrences in the content.

  18. Enterprise Human Resources Information Mining Based on Improved Apriori Algorithm

    Lei He


    Full Text Available With the continuing development of information technology, enterprises’ demand for human resources information mining keeps growing. Based on the current state of enterprise human resources information mining, this paper puts forward a model based on an improved Apriori algorithm. The model introduces data mining technology and the traditional Apriori algorithm and improves on them: it divides the association rule mining task of the original algorithm into two subtasks, producing frequent item sets and producing rules, uses SQL to generate frequent item sets directly, and builds charts to extract the information of interest to customers. The experimental results show that the model based on the improved Apriori algorithm is more efficient than the original algorithm, and practical application tests show that the improved algorithm is practical and effective.

  19. Decomposition of Data Mining Algorithms into Unified Functional Blocks

    Ivan Kholod


    Full Text Available The present paper describes the method of creating data mining algorithms from unified functional blocks. This method splits algorithms into independently functioning blocks. These blocks must have unified interfaces and implement pure functions. The method allows us to create new data mining algorithms from existing blocks and improves the existing algorithms by optimizing single blocks or the whole structure of the algorithms. This becomes possible due to a number of important properties inherent in pure functions and hence functional blocks.

  20. An Association Rule Mining Algorithm Based on a Boolean Matrix

    Hanbing Liu


    Full Text Available Association rule mining is a very important research topic in the field of data mining. Discovering frequent itemsets is the key process in association rule mining. Traditional association rule algorithms adopt an iterative method of discovery, which requires very large calculations and a complicated transaction process. Because of this, a new association rule algorithm called ABBM is proposed in this paper. This new algorithm adopts a Boolean vector "relational calculus" method to discover frequent itemsets. Experimental results show that this algorithm can quickly discover frequent itemsets and effectively mine potential association rules.
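    The Boolean-vector idea can be sketched as follows. This is an illustration of the general technique only, not ABBM itself, whose details the abstract does not give; the helper names and toy transactions are assumptions. Each item gets one Boolean column, and the support count of an itemset is the number of rows where the element-wise AND of its columns is true.

```python
def item_vectors(transactions, items):
    """One Boolean vector per item: entry i is True if transaction i has it."""
    return {item: [item in t for t in transactions] for item in items}


def support_count(itemset, vectors):
    """AND the item vectors element-wise and count the True entries."""
    columns = [vectors[item] for item in itemset]
    return sum(all(bits) for bits in zip(*columns))


# Hypothetical toy database of five transactions over items a, b, c.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
vectors = item_vectors(transactions, "abc")

count_ab = support_count(["a", "b"], vectors)
count_abc = support_count(["a", "b", "c"], vectors)
```

    Once the vectors are built, counting support needs no further pass over the transactions, which is the property such matrix-based methods exploit.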

  1. An efficient algorithm for mining closed itemsets.

    Liu, Jun-qiang; Pan, Yun-he


    This paper presents a new efficient algorithm for mining frequent closed itemsets. It enumerates the closed set of frequent itemsets by using a novel compound frequent itemset tree that facilitates fast growth and efficient pruning of search space. It also employs a hybrid approach that adapts search strategies, representations of projected transaction subsets, and projecting methods to the characteristics of the dataset. Efficient local pruning, global subsumption checking, and fast hashing methods are detailed in this paper. The principle that balances the overhead of search space growth and pruning is also discussed. Extensive experimental evaluations on real world and artificial datasets showed that our algorithm outperforms CHARM by a factor of five and is one to three orders of magnitude more efficient than CLOSET and MAFIA.
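    For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The brute-force sketch below illustrates that definition only. It does not reproduce the paper's compound frequent itemset tree, pruning, or hashing, and the function names and toy data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Enumerate every itemset and keep those in at least min_count rows."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count >= min_count:
                result[candidate] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets with no proper superset of equal support.

    A superset with equal support is itself frequent, so it is enough
    to look among the frequent itemsets.
    """
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in frequent.items())
    }


# Hypothetical toy database.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
frequent = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(frequent)
```

    Here seven itemsets are frequent but only three are closed ({c}, {a, b}, and {a, b, c}), which shows why closed itemsets give a more compact result without losing support information.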

  3. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities

    Clapcott, Joanne E.; Goodwin, Eric O.; Harding, Jon S.


    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions.

  5. A distributed approach for optimizing cascaded classifier topologies in real-time stream mining systems.

    Foo, Brian; van der Schaar, Mihaela


    In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.

  6. Document stream clustering: experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends

    Lelu, Alain; Cuxac, Pascal


    We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent from any initial conditions and ordering of the data-vectors stream, 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t for directly generating the main conclusions about the dynamics of a data-stream. We illustrate these points with an application to a scientific information database covering two years and 2600 documents.

  7. An enhanced stream mining approach for network anomaly detection

    Bellaachia, Abdelghani; Bhatt, Rajat


    Network anomaly detection is one of the hot topics in the market today. Currently, researchers are trying to find a way in which machines could automatically learn both normal and anomalous behavior and thus detect anomalies if and when they occur. The most important applications that could arise from these systems are intrusion detection and spam mail detection. In this paper, the primary focus is on the problem and solution of "real time" network intrusion detection, although the underlying theory discussed may also be used for other applications of anomaly detection (such as spam detection or spyware detection). Since a machine needs a learning process of its own, data mining has been chosen as the preferred technique. The objective of this paper is to present a real-time clustering system, which we call Enhanced Stream Mining (ESM), that can analyze packet information (headers and data) to determine intrusions.


    Rolly Intan


    Full Text Available Association rule mining searches for interesting relationships among items in a large data set. Market basket analysis, a typical example of association rule mining, analyzes the buying habits of customers by finding associations between the different items that customers put in their shopping cart (basket). The Apriori algorithm is an influential algorithm for mining frequent itemsets for generating association rules. For some reasons, the Apriori algorithm is not based on human intuition. To provide a more human-based concept, this paper proposes an alternative algorithm for generating association rules by utilizing fuzzy sets in market basket analysis.

  9. Online Feature Extraction Algorithms for Data Streams

    Ozawa, Seiichi

    Along with the development of the network technology and high-performance small devices such as surveillance cameras and smart phones, various kinds of multimodal information (texts, images, sound, etc.) are captured real-time and shared among systems through networks. Such information is given to a system as a stream of data. In a person identification system based on face recognition, for example, image frames of a face are captured by a video camera and given to the system for an identification purpose. Those face images are considered as a stream of data. Therefore, in order to identify a person more accurately under realistic environments, a high-performance feature extraction method for streaming data, which can be autonomously adapted to the change of data distributions, is solicited. In this review paper, we discuss a recent trend on online feature extraction for streaming data. There have been proposed a variety of feature extraction methods for streaming data recently. Due to the space limitation, we here focus on the incremental principal component analysis.

  10. Data mining theories, algorithms, and examples

    Ye, Nong


    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal

  11. Recovery of a mining-damaged stream ecosystem

    Christopher A. Mebane


    Full Text Available This paper presents a 30+ year record of changes in benthic macroinvertebrate communities and fish populations associated with improving water quality in mining-influenced streams. Panther Creek, a tributary to the Salmon River in central Idaho, USA suffered intensive damage from mining and milling operations at the Blackbird Mine that released copper (Cu), arsenic (As), and cobalt (Co) into tributaries. From the 1960s through the 1980s, no fish and few aquatic invertebrates could be found in 40 km of mine-affected reaches of Panther Creek downstream of the metals contaminated tributaries, Blackbird and Big Deer Creeks. Efforts to restore water quality began in 1995, and by 2002 Cu levels had been reduced by about 90%, with incremental declines since. Rainbow Trout (Oncorhynchus mykiss) were early colonizers, quickly expanding their range as areas became habitable when Cu concentrations dropped below about 3X the U.S. Environmental Protection Agency’s biotic ligand model (BLM) based chronic aquatic life criterion. Anadromous Chinook Salmon (O. tshawytscha) and steelhead (O. mykiss) have also reoccupied Panther Creek. Full recovery of salmonid populations occurred within about 12 years after the onset of restoration efforts and about 4 years after the Cu chronic criteria had mostly been met, with recovery interpreted as similarity in densities, biomass, year class strength, and condition factors between reference sites and mining-influenced sites. Shorthead Sculpin (Cottus confusus) were slower than salmonids to disperse and colonize. While benthic macroinvertebrate biomass has increased, species richness has plateaued at about 70 to 90% of reference despite the Cu criterion having been met for several years. Different invertebrate taxa had distinctly different recovery trajectories. Among the slowest taxa to recover were Ephemerella, Cinygmula and Rhithrogena mayflies, Enchytraeidae oligochaetes, and Heterlimnius aquatic beetles. Potential

  12. Study on water loss of the surface stream affected by longwall mining

    GUO Wen-bing; Syd S.Peng


    To study the effect of longwall mining on surface stream water, monitoring stations for water flow rate were established. A large amount of water flow data was collected before, during, and after longwall mining. Based on the monitoring data, the effects of longwall mining on surface stream water were analyzed. The results demonstrate that longwall mining affects surface stream water: stream water is lost and decreases due to longwall mining but never goes underground through the fractured zone. The mechanism of water loss due to longwall mining is also presented. Stream water can enter surface cracks where the stream intersects them. Longwall mining subsidence can change the surface stream slope and the downstream water flow status. The results also show that the effects of longwall mining on stream water are temporary and that surface stream water can recover after about one or two years.

  13. Efficient mining of association rules based on gravitational search algorithm

    Fariba Khademolghorani


    Full Text Available Association rule mining is one of the most used tools for discovering relationships among attributes in a database. Many algorithms have been introduced for discovering these rules. These algorithms have to mine association rules in two separate stages. Most of them mine occurrence rules, which are easily predictable by users. Therefore, this paper discusses the application of the gravitational search algorithm to discovering interesting association rules. This evolutionary algorithm is based on Newtonian gravity and the laws of motion. Furthermore, contrary to previous methods, the proposed method is able to mine the best association rules without generating frequent itemsets and is independent of the minimum support and confidence values. The results of applying this method, compared with mining association rules based on particle swarm optimization, show that our method is successful.


    D.Kerana Hanirex


    Full Text Available Nowadays, association rules play an important role. The purchase of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. This algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on similarity measures between the transactions. It then finds the frequent itemsets directly from the transactions in the clusters using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency.

  15. A Mining Algorithm for Extracting Decision Process Data Models

    Cristina-Claudia DOLEAN


    Full Text Available The paper introduces an algorithm that mines logs of user interaction with simulation software. It outputs a model that explicitly shows the data perspective of the decision process, namely the Decision Data Model (DDM. In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and, then, provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to prove the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.

  16. A Fast Algorithm for Mining Sequential Patterns from Large Databases

    CHEN Ning; CHEN An; ZHOU Longxiang; LIU Lu


    Mining sequential patterns from large databases has been recognized by many researchers as an attractive task of data mining and knowledge discovery. Previous algorithms scan the database many times, which is often prohibitive given the very large size of the databases. In this paper, the authors introduce an effective algorithm for mining sequential patterns from large databases. In the algorithm, the original database is not used at all for counting the support of sequences after the first pass. Rather, a tidlist structure generated in the previous pass is employed for this purpose, based on set intersection operations, avoiding multiple scans of the database.
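    The idea of counting sequence support from id-lists instead of rescanning the database can be sketched as below. The abstract does not specify the tidlist layout, so this sketch assumes a common one: each item maps to the sequence ids that contain it, with the positions where it occurs. It also assumes each sequence element is a single item. All names are hypothetical.

```python
from collections import defaultdict


def build_idlists(sequences):
    """Single pass over the database: item -> {sequence id -> positions}."""
    idlists = defaultdict(dict)
    for sid, sequence in enumerate(sequences):
        for position, item in enumerate(sequence):
            idlists[item].setdefault(sid, []).append(position)
    return idlists


def extend(pattern_ends, item_idlist):
    """Join a pattern's id-list with one item's id-list.

    pattern_ends maps a sequence id to the earliest position at which the
    pattern can end. Only ids present in both lists can survive.
    """
    result = {}
    for sid in pattern_ends.keys() & item_idlist.keys():  # set intersection
        later = [p for p in item_idlist[sid] if p > pattern_ends[sid]]
        if later:
            result[sid] = later[0]  # positions are stored in ascending order
    return result


def support(pattern, idlists):
    """Number of sequences containing pattern as an ordered subsequence."""
    first = idlists.get(pattern[0], {})
    ends = {sid: positions[0] for sid, positions in first.items()}
    for item in pattern[1:]:
        ends = extend(ends, idlists.get(item, {}))
    return len(ends)


# Hypothetical toy database: each string is one sequence of single items.
sequences = ["abcb", "acb", "bca", "ab"]
idlists = build_idlists(sequences)
support_ab = support("ab", idlists)
support_abc = support("abc", idlists)
```

    After `build_idlists` runs once, every support count is computed from the id-lists alone, which mirrors the abstract's point about avoiding repeated database scans.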

  17. Mining The Data From Distributed Database Using An Improved Mining Algorithm

    Renjit, J Arokia


    Association rule mining (ARM) is an active data mining research area, and most ARM algorithms cater to a centralized environment. Centralized data mining to discover useful patterns in distributed databases isn't always feasible because merging data sets from different sites incurs huge network communication costs. In this paper, an improved algorithm with a good performance level for data mining is proposed. At local sites, it runs the application based on the improved LMatrix algorithm, which is used to calculate local support counts. Each local site also finds a centre site to manage every message exchanged to obtain all globally frequent item sets. Using LMatrix also reduces the scan time of the partitioned database, which increases the performance of the algorithm. Therefore, the research is to develop a distributed algorithm for geographically distributed data sets that reduces communication costs and offers superior running efficiency and stronger scalability than direct application of a sequential algorithm in d...

  18. Performance of a streaming mesh refinement algorithm.

    Thompson, David C.; Pebay, Philippe Pierre


    In SAND report 2004-1617, we outline a method for edge-based tetrahedral subdivision that does not rely on saving state or communication to produce compatible tetrahedralizations. This report analyzes the performance of the technique by characterizing (a) mesh quality, (b) execution time, and (c) traits of the algorithm that could affect quality or execution time differently for different meshes. It also details the method used to debug the several hundred subdivision templates that the algorithm relies upon. Mesh quality is on par with other similar refinement schemes and throughput on modern hardware can exceed 600,000 output tetrahedra per second. But if you want to understand the traits of the algorithm, you have to read the report!

  19. Algorithms for Constructing Overlay Networks For Live Streaming

    Andreev, Konstantin; Meyerson, Adam; Saks, Jevan; Sitaraman, Ramesh K


    We present a polynomial time approximation algorithm for constructing an overlay multicast network for streaming live media events over the Internet. The class of overlay networks constructed by our algorithm includes networks used by Akamai Technologies to deliver live media events to a global audience with high fidelity. We construct networks consisting of three stages of nodes. The nodes in the first stage are the entry points that act as sources for the live streams. Each source forwards each of its streams to one or more nodes in the second stage that are called reflectors. A reflector can split an incoming stream into multiple identical outgoing streams, which are then sent on to nodes in the third and final stage that act as sinks and are located in edge networks near end-users. As the packets in a stream travel from one stage to the next, some of them may be lost. A sink combines the packets from multiple instances of the same stream (by reordering packets and discarding duplicates) to form a single in...
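    The sink-side step described above, combining several lossy copies of one stream by reordering and discarding duplicates, can be sketched in a few lines. This is a batch illustration under stated assumptions: packets carry an integer sequence number (not mentioned in the abstract), and the function and variable names are hypothetical. It says nothing about the paper's network construction algorithm.

```python
def combine_streams(*copies):
    """Merge lossy copies of one stream into a single ordered stream.

    Each copy is an iterable of (sequence_number, payload) pairs.
    """
    merged = {}
    for copy in copies:
        for sequence_number, payload in copy:
            # Keep the first copy of each packet seen; later ones are dropped.
            merged.setdefault(sequence_number, payload)
    # Reorder by sequence number.
    return [(number, merged[number]) for number in sorted(merged)]


# Two hypothetical copies of the same five-packet stream.
via_reflector_a = [(1, "p1"), (3, "p3"), (2, "p2")]  # packets 4 and 5 lost
via_reflector_b = [(2, "p2"), (4, "p4"), (5, "p5")]  # packets 1 and 3 lost

combined = combine_streams(via_reflector_a, via_reflector_b)
```

    Neither copy is complete on its own, but together they cover all five packets, which is why routing a stream through more than one reflector can mask loss.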

  20. Indirect associations between multiple items and a mining algorithm

    Ni Min; Xu Xiaofei; Deng Shengchun


    Indirect association is a high-level relationship between items and frequent itemsets in data. Current approaches to indirect association mining are limited to indirect associations between itempairs, which discover too many rules from a dataset. A formal definition of indirect association between multiple items is presented, along with an algorithm, SET-NIA, for mining this kind of indirect association based on the anti-monotonicity of indirect associations and a frequent itempair support matrix. While the rules found contain the same information as those found by algorithms that mine indirect associations between itempairs, this notion saves storage space for the rules and makes them easier for humans to understand and apply. Experiments conducted on two real-world datasets show that SET-NIA finds fewer rules than existing algorithms that mine indirect associations between itempairs; the experimental results also show that SET-NIA performs better than existing algorithms.

  1. A Study on Selective Data Mining Algorithms

    A. N. Pathak


    Full Text Available Data mining is a field of database application that searches for unknown patterns in data that can be used to predict future behavior. Basically data mining is a technique not to change the presentation but to discover unknown relationships between the data. Data mining is termed as software, which is used to describe data in a new way, which is not true.

  2. A Survey of Association Rule Mining Using Genetic Algorithm

    Anubha Sharma


    Full Text Available Data mining is the analysis step of the "Knowledge Discovery in Databases" process, or KDD. It is the process that results in the discovery of new patterns in large data sets. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract knowledge from an existing data set and transform it into a human-understandable structure. In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms, which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. Previously, many researchers have proposed genetic algorithms for mining interesting association rules from quantitative data. In this paper we present a survey of association rule mining using genetic algorithms. The techniques are categorized based upon different approaches. This paper describes the major advances in approaches to association rule mining using genetic algorithms.

  3. Impact of potash mining in streams: the Llobregat basin (northeast Spain) as a case study

    Ruben Ladrera


    Full Text Available Potash mining is significantly increasing the salt concentration of rivers and streams due to lixiviates coming from the mine tailings. In the present study, we have focused on the middle Llobregat basin (northeast Spain), where significant potash mining has taken place since the beginning of the 20th century. Up to 50 million tonnes of saline waste, mainly composed of sodium chloride, have been disposed of in the area. We assessed the ecological status of streams adjacent to the mines by studying different physicochemical and hydromorphological variables, as well as aquatic macroinvertebrates. We found extraordinarily high values of salinity in the studied streams, reaching conductivities up to 132.4 mS/cm. Salt-polluted streams were characterized by a deterioration of the riparian vegetation and the fluvial habitat. Both macroinvertebrate richness and abundance decreased with increasing salinity. In the most polluted stream only two families of macroinvertebrates were found: Ephydridae and Ceratopogonidae. According to the biotic indices IBMWP and IMMi-T, none of the sites met the requirements of the Water Framework Directive (WFD), i.e., good ecological status. Overall, we can conclude that potash-mining activities have the potential to cause severe ecological damage to their surrounding streams. This is mainly related to an inadequate management of the mine tailings, leading to highly saline runoff and percolates entering surface waters. Thus, we urge water managers and policy makers to take action to prevent, detect and remediate salt pollution of rivers and streams in potash mining areas.

  4. An Efficient Algorithm for Mining Maximal Frequent Item Sets

    A. M.J.M.Z. Rahman


    Full Text Available Problem Statement: In today's life, the mining of frequent patterns is a basic problem in data mining applications. The algorithms which are used to generate these frequent patterns must perform efficiently. The objective was to propose an effective algorithm which generates frequent patterns in less time. Approach: We proposed an algorithm which was based on hashing technique and combines a vertical tidset representation of the database with effective pruning mechanisms. It removes all the non-maximal frequent item-sets to get exact set of MFI directly. It worked efficiently when the number of item-sets and tid-sets is more. Results: The performance of our algorithm had been compared with recently developed MAFIA algorithm and the results show how our algorithm gives better performance. Conclusions: Hence, the proposed algorithm performs effectively and generates frequent patterns faster.
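    Two ingredients named in the abstract, the vertical tidset representation and the removal of non-maximal frequent itemsets, can be illustrated with a brute-force sketch. It omits the paper's hashing technique and pruning mechanisms, which the abstract does not detail, and every name and the toy data are assumptions.

```python
from itertools import combinations


def vertical(transactions):
    """Vertical layout: item -> set of ids of the transactions holding it."""
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def frequent_itemsets(tidsets, min_count):
    """Support of an itemset is the size of its items' tidset intersection."""
    items = sorted(tidsets)
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            tids = set.intersection(*(tidsets[item] for item in combo))
            if len(tids) >= min_count:
                frequent[frozenset(combo)] = len(tids)
    return frequent


def maximal(frequent):
    """Drop every frequent itemset that has a frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical toy database.
transactions = [{"a", "b"}, {"a", "b"}, {"b", "c"}, {"b", "c"}, {"a", "c"}]
tidsets = vertical(transactions)
frequent = frequent_itemsets(tidsets, min_count=2)
mfi = maximal(frequent)
```

    With a minimum count of 2, five itemsets are frequent but only {a, b} and {b, c} are maximal; every other frequent itemset is a subset of one of them.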

  5. A New Parallel Algorithm for Mining Association Rules

    DING Yan-hui; WANG Hong-guo; GAO Ming; GU Jian-jun


    Mining association rules from large databases is very costly. We develop a parallel algorithm for this task on a shared-memory multiprocessor (SMP). Most proposed parallel algorithms for association rule mining have to scan the database at least two times. In this article, a parallel algorithm, Scan Once (SO), is proposed for SMP, which scans the database only once. This algorithm is fundamentally different from the known parallel algorithm Count Distribution (CD). It adopts a bit matrix to store the database information and obtains the support of the frequent itemsets by a Vector-And-Operation, which greatly improves the efficiency of generating all frequent itemsets. Empirical evaluation shows that the algorithm outperforms the known CD algorithm.

  6. Effects of coal strip mining on stream water quality and biology, southwestern Washington

    Fuste, L.A.; Meyer, D.F.


    Strip mining for coal in southwestern Washington may be affecting the water quality of streams. To investigate these possible effects, five streams were selected for study of water quality in each of the two coal bearing areas: the Centralia-Chehalis coal district, and Kelso-Castle Rock coal area. In the Centralia-Chehalis coal district, three of the streams have drainage basins in which mines are active. Water in streams that drain unmined basins is typical of western Washington streams and is characterized as a mixed water because calcium, magnesium, sodium, and bicarbonate ions predominate. A change in anionic composition from bicarbonate to sulfate in streams draining mined areas was not sufficient to change the general water composition and thus make the streams acidic. The largest downstream changes in water quality in both mined and unmined drainage basins were observed during summer low-flow conditions, when minimal dilution, increased water temperatures, and low dissolved oxygen concentrations occurred. High dissolved solids were found in the mined drainage basins during this period. High concentrations of iron, manganese, and zinc were present in the bottom sediments of the mined basins. Moderate concentrations of chromium, cobalt, copper, and zinc were also found in the bottom sediments of a few unmined basins. Streams with substrates of gravel-cobble or gravel-coarse sand had the most diverse benthic fauna and a higher number of ubiquitous taxa than streams with sand-silt substrates, which had the most dissimilar fauna. Mayflies, stoneflies, and caddisflies were rare at the site most affected by mining. The erosion potential of a basin appears to be related to the average basin slope and the amount of forested areas. Strip mining for coal in steep basins may lead to massive movements of unconsolidated spoils after vegetal cover is removed if the land disturbed is graded to pre-mining slopes. (Lantz-PTT)

  7. Evaluation of coal-mining impacts using numerical classification of benthic invertebrate data from streams draining a heavily mined basin in eastern Tennessee

    Bradfield, A.D.


    Coal-mining impacts on Smoky Creek, eastern Tennessee were evaluated using water quality and benthic invertebrate data. Data from mined sites were also compared with water quality and invertebrate fauna found at Crabapple Branch, an undisturbed stream in a nearby basin. Although differences in water quality constituent concentrations and physical habitat conditions at sampling sites were apparent, commonly used measures of benthic invertebrate sample data such as number of taxa, sample diversity, number of organisms, and biomass were inadequate for determining differences in stream environments. Clustering algorithms were more useful in determining differences in benthic invertebrate community structure and composition. Normal (collections) and inverse (species) analyses based on presence-absence data of species of Ephemeroptera, Plecoptera, and Tricoptera were compared using constancy, fidelity, and relative abundance of species found at stations with similar fauna. These analyses identified differences in benthic community composition due to seasonal variations in invertebrate life histories. When data from a single season were examined, sites on tributary streams generally clustered separately from sites on Smoky Creek. These analyses compared with differences in water quality, stream size, and substrate characteristics between tributary sites and the more degraded main stem sites, indicated that numerical classification of invertebrate data can provide discharge-independent information useful in rapid evaluations of in-stream environmental conditions. (Author 's abstract)

  8. Gesture Recognition from Data Streams of Human Motion Sensor Using Accelerated PSO Swarm Search Feature Selection Algorithm

    Simon Fong


    Full Text Available Human motion sensing technology has gained tremendous popularity, with practical applications such as video surveillance for security, hand signing, smart homes and gaming. These applications capture human motions in real time from video sensors; the data patterns are nonstationary and ever changing. While the hardware technology of such motion sensing devices and their data collection process have become relatively mature, the computational challenge lies in the real-time analysis of these live feeds. In this paper we argue that traditional data mining methods fall short of accurately analyzing the human activity patterns from the sensor data stream. The shortcoming is due to an algorithmic design that does not adapt to the dynamic changes in gesture motions. The successor of these algorithms, known as data stream mining, is evaluated against traditional data mining through a case of gesture recognition over motion data using Microsoft Kinect sensors. Three different subjects were asked to read three comic strips and to tell the stories in front of the sensor. The data stream contains coordinates of articulation points and various positions of the parts of the human body corresponding to the actions that the user performs. In particular, a novel technique of feature selection using swarm search and accelerated PSO is proposed to enable fast preprocessing and induce an improved classification model in real time. Superior results are shown in the experiment that runs on this empirical data stream. The contribution of this paper is a comparative study of traditional and data stream mining algorithms, together with the incorporation of the novel improved feature selection technique, in a scenario where different gesture patterns are to be recognized from streaming sensor data.

  9. Image Encryption Using a Lightweight Stream Encryption Algorithm

    Saeed Bahrami


    Full Text Available Security of the multimedia data including image and video is one of the basic requirements for the telecommunications and computer networks. In this paper, we consider a simple and lightweight stream encryption algorithm for image encryption, and a series of tests are performed to confirm suitability of the described encryption algorithm. These tests include visual test, histogram analysis, information entropy, encryption quality, correlation analysis, differential analysis, and performance analysis. Based on this analysis, it can be concluded that the present algorithm in comparison to A5/1 and W7 stream ciphers has the same security level, is better in terms of the speed of performance, and is used for real-time applications.

  10. A Fast Algorithm for Mining Association Rules

    黄刘生; 陈华平; 王洵; 陈国良


    In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.

  11. Randomized algorithms in automatic control and data mining

    Granichin, Oleg; Toledano-Kitai, Dvora


    In the fields of data mining and control, the huge amount of unstructured data and the presence of uncertainty in system descriptions have always been critical issues. The book Randomized Algorithms in Automatic Control and Data Mining introduces the readers to the fundamentals of randomized algorithm applications in data mining (especially clustering) and in automatic control synthesis. The methods proposed in this book guarantee that the computational complexity of classical algorithms and the conservativeness of standard robust control techniques will be reduced. It is shown that when a problem requires "brute force" in selecting among options, algorithms based on random selection of alternatives offer good results with certain probability for a restricted time and significantly reduce the volume of operations.

  12. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Jerry Chun-Wei Lin


    Full Text Available Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications, since the items purchased by a customer may carry various factors, such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed to handle a static database. Few studies address dynamic high-utility mining with transaction insertion, so existing methods require database rescans and suffer the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation, based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns.
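
    To make the difference between frequency and utility concrete, the following is a minimal sketch of how the utility of an itemset is commonly computed from quantities and unit profits. It does not implement the paper's utility-list structure or its incremental update; the function names and the toy data are assumptions for illustration only.

```python
def itemset_utility(transactions, profit, itemset):
    """Sum, over transactions containing every item of the itemset,
    of quantity * unit profit for the items of the itemset.

    transactions: list of dicts mapping item -> purchased quantity
    profit:       dict mapping item -> unit profit
    """
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def itemset_frequency(transactions, itemset):
    """Plain occurrence count, as used by frequency-based mining."""
    return sum(all(item in tx for item in itemset) for tx in transactions)


if __name__ == "__main__":
    txs = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"b": 4}]
    unit_profit = {"a": 5, "b": 2, "c": 1}
    for itemset in (("a",), ("b",), ("a", "b")):
        print(itemset,
              itemset_frequency(txs, itemset),
              itemset_utility(txs, unit_profit, itemset))
```

    In this toy data, items a and b occur equally often, yet their utilities differ, which is the kind of distinction frequency-only mining cannot express.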


    Xu Baowen; Yi Tong; Wu Fangjun; Chen Zhenqiang


    In this letter, on the basis of the Frequent Pattern (FP) tree, a support function to update the FP-tree is introduced, and an Incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm considers not only adding new data into the database but also removing old data from the database. Furthermore, it can reduce five cases to three cases. The algorithm proposed in this letter avoids generating large numbers of candidate items, and it is highly efficient.

  14. Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

    Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.


    Outlier detection in bioinformatics data streaming mining has received significant attention by research communities in recent years. The problems of how to distinguish noise from an exception and deciding whether to discard it or to devise an extra decision path for accommodating it are causing dilemma. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for taking care of outliers in the progress of continuous data learning. By using an adaptive interquartile-range based identification method, a tolerance threshold is set. It is then used to judge if a data of exceptional value should be included for training or otherwise. This is different from the traditional outlier detection/removal approaches which are two separate steps in processing through the data. The proposed algorithm is tested using datasets of five bioinformatics scenarios and comparing the performance of our model and other ones without ODR. The results show that ODR-ioVFDT has better performance in classification accuracy, kappa statistics, and time consumption. The ODR-ioVFDT applied onto bioinformatics streaming data processing for detecting and quantifying the information of life phenomena, states, characters, variables and components of the organism can help to diagnose and treat disease more effectively. PMID:28230161
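
    The interquartile-range test mentioned in the abstract above can be sketched as follows. This is an illustrative assumption of how such a gate might look, not the ODR-ioVFDT procedure itself: the class name, the sliding window, the multiplier k, and the choice to keep flagged values out of the window are all hypothetical.

```python
from collections import deque
from statistics import quantiles


class IqrGate:
    """Flags values outside [Q1 - k*IQR, Q3 + k*IQR] of a sliding window."""

    def __init__(self, window_size=50, k=1.5, min_points=5):
        self.window = deque(maxlen=window_size)
        self.k = k
        self.min_points = min_points

    def is_outlier(self, x):
        """Return True if x is flagged; unflagged values update the window."""
        flagged = False
        if len(self.window) >= self.min_points:
            q1, _, q3 = quantiles(self.window, n=4)
            iqr = q3 - q1
            flagged = x < q1 - self.k * iqr or x > q3 + self.k * iqr
        if not flagged:
            # Only accepted values shape the threshold for later items.
            self.window.append(x)
        return flagged


if __name__ == "__main__":
    gate = IqrGate()
    stream = [10, 12, 11, 13, 12, 11, 10, 12, 100, 11]
    print([x for x in stream if gate.is_outlier(x)])
```

    A learner would train only on values for which is_outlier returns False, so detection and learning happen in the same pass over the stream.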

  15. Interactive evolutionary algorithms and data mining for drug design

    Lameijer, Eric Marcel Wubbo


    One of the main problems of drug design is that it is quite hard to discover compounds that have all the required properties to become a drug (efficacy against the disease, good biological availability, low toxicity). This thesis describes the use of data mining and interactive evolutionary algorithms to design novel classes of molecules. Using data mining, we split a 250,000 compound database into ring systems, substituents and linkers. We then counted the occurrence of the different fragmen...

  16. NIA2: A fast indirect association mining algorithm

    NI Min; XU Xiao-fei; DENG Sheng-chun; WEN Xiao-xian


    Indirect association is a high level relationship between items and frequent item sets in data. There are many potential applications for indirect associations, such as database marketing, intelligent data analysis, web-log analysis, recommended system, etc. Existing indirect association mining algorithms are mostly based on the notion of post-processing of discovery of frequent item sets. In the mining process, all frequent item sets need to be generated first, and then they are filtered and joined to form indirect associations. We have presented an indirect association mining algorithm (NIA) based on anti-monotonicity of indirect associations whereas k candidate indirect associations can be generated directly from k-1 candidate indirect associations, without all frequent item sets generated. We also use the frequent itempair support matrix to reduce the time and memory space needed by the algorithm. In this paper, a novel algorithm (NIA2) is introduced based on the generation of indirect association patterns between itempairs through one item mediator sets from frequent itempair support matrix. A notion of mediator set support threshold is also presented. NIA2 mines indirect association patterns directly from the dataset, without generating all frequent item sets. The frequent itempair support matrix and the notion of using tm as the support threshold for mediator sets can significantly reduce the cost of joint operations and the search process compared with existing algorithms. Results of experiments on a real-world web log dataset have proved NIA2 one order of magnitude faster than existing algorithms.

  17. A New Hybrid Algorithm for Association Rule Mining

    ZHANG Min-cong; YAN Cun-liang; ZHU Kai-yu


    HA (hashing array), a new algorithm, for mining frequent itemsets of large database is proposed. It employs a structure hash array, ItemArray ( ) to store the information of database and then uses it instead of database in later iteration. By this improvement, only twice scanning of the whole database is necessary, thereby the computational cost can be reduced significantly. To overcome the performance bottleneck of frequent 2-itemsets mining, a modified algorithm of HA, DHA (direct-addressing hashing and array) is proposed, which combines HA with direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted in this paper to evaluate the performance of the proposed new algorithm, and the results prove the new algorithm is more efficient and reasonable.

  18. CIMTEL- Mining Algorithm for Big Data in Telecommunication



    Full Text Available The field of data mining has flourished into a research area of significant technological and social importance due to the advancement in technology. Mining frequent patterns or itemsets from a real data environment is a fundamental and essential problem in many data mining applications. The Apriori-inspired algorithms show good performance with sparse datasets such as market-basket data, where the frequent patterns are very short. However, in areas with dense datasets such as telecommunication, computational biology and census data, the performance of these algorithms degrades sharply because there are many long frequent patterns. The focus of this paper is to design a CPU-efficient algorithm, CIMTEL, for finding closed frequent calling patterns (long patterns) in a telecommunication database. Due to the evolution of next generation telecommunication networks, the amount of communication data rises in volume, variety and velocity (Big Data). Thus the algorithm provides a way for service providers to analyze consumers. This algorithm outperforms the former COLTEL, CHARM and EXPEDITE algorithms by an order of two for a worst case scenario. A performance analysis of this algorithm against the former algorithms is also presented.

  19. Automatic Mining of Numerical Classification Rules with Parliamentary Optimization Algorithm



    Full Text Available In recent years, classification rules mining has been one of the most important data mining tasks. In this study, one of the newest social-based metaheuristic methods, Parliamentary Optimization Algorithm (POA, is firstly used for automatically mining of comprehensible and accurate classification rules within datasets which have numerical attributes. Four different numerical datasets have been selected from UCI data warehouse and classification rules of high quality have been obtained. Furthermore, the results obtained from designed POA have been compared with the results obtained from four different popular classification rules mining algorithms used in WEKA. Although POA is very new and no applications in complex data mining problems have been performed, the results seem promising. The used objective function is very flexible and many different objectives can easily be added to. The intervals of the numerical attributes in the rules have been automatically found without any a priori process, as done in other classification rules mining algorithms, which causes the modification of datasets.

  20. Energy-Efficient Routing Algorithm for WSNs in Underground mining

    Feng Wang


    Full Text Available Wireless Sensor Networks (WSNs) technology has been used for security monitoring to provide a safe working environment for underground miners. However, energy consumption is an important issue in the design of routing protocols in wireless sensor networks. To address the problems of the coal mine chain-type network topology, namely unbalanced information flow and seriously uneven energy consumption among nodes, we propose a Liner Energy-Balanced Uneven Cluster Routing Protocol Algorithm for Underground Mining (LEBUC). The cluster head election mechanism of the algorithm considers the distance between nodes and the base station node, node energy and node density. At the same time, the algorithm reduces the amount of computation when candidate nodes are campaigning. We evaluate LEBUC and the results show that LEBUC effectively graded the wireless sensor network of coal mine’s "hot zone" problem and reduces network traffic and energy consumption. It extends the survival time of the network significantly.

  1. Integrating Economic Knowledge in Data Mining Algorithms

    Daniëls, H.A.M.; Feelders, A.J.


    The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others about the correctness and effectiveness of knowledge induced from data.The current data mining techniques do not contribute much to this process of persuasion.Part of this

  2. Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming

    Abdous, M'hammed; He, Wu


    Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology related-problems and to…

  4. An Optimized Distributed Association Rule Mining Algorithm in Parallel and Distributed Data Mining with XML Data for Improved Response Time

    Sujni Paul


    Many current data mining tasks can be accomplished successfully only in a distributed setting. The field of distributed data mining has therefore gained increasing importance in the last decade. The Apriori algorithm by Rakesh Agrawal has emerged as one of the best Association Rule mining algorithms. It also serves as the base algorithm for most parallel algorithms. The enormity and high dimensionality of datasets typically available as input to the problem of association rule discovery, makes it...

  5. An Efficient Hybrid Algorithm for Mining Web Frequent Access Patterns

    ZHAN Li-qiang; LIU Da-xin


    We propose an efficient hybrid algorithm, WDHP, in this paper for mining frequent access patterns. WDHP adopts the techniques of DHP to optimize its performance, namely using a hash table to filter the candidate set and trimming the database. Whenever the database is trimmed to a size less than a specified threshold, the algorithm puts the database into main memory by constructing a tree, and finds frequent patterns on the tree. The experiment shows that WDHP outperforms the DHP algorithm and the main-memory-based algorithm WAP in execution efficiency.

  6. Study And Implementation Of LCS Algorithm For Web Mining

    Vrishali P. Sonavane


    Full Text Available The Internet is the roads and the highways in the information world, the content providers are the road workers, and the visitors are the drivers. As in the real world, there can be traffic jams, wrong signs, blind alleys, and so on. The content providers, as the road workers, need information about their users to make possible Web site adjustments. Web logs store every motion on the provider's Web site. So the providers need only a tool to analyze these logs. This tool is called Web Usage Mining. Web Usage Mining is a part of Web Mining. It is the foundation for a Web site analysis. It employs various knowledge discovery methods to gain Web usage patterns. In this paper we used the LCS algorithm for improving accuracy of recommendation. The experimental results show that the approach can improve accuracy of classification in the architecture. Using the LCS algorithm we can predict users' future requests more accurately.
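
    The abstract does not expand the acronym LCS; assuming it refers to the longest common subsequence, a minimal sketch of comparing two page-visit sessions from a web log could look like this. The session contents and the function name are hypothetical, and how the paper turns the subsequence into a recommendation is not shown here.

```python
def longest_common_subsequence(a, b):
    """Standard dynamic-programming LCS over two sequences of page names."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])

    # Walk back through the table to recover one longest subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


if __name__ == "__main__":
    past_session = ["home", "products", "cart", "checkout"]
    current_session = ["home", "search", "products", "checkout"]
    print(longest_common_subsequence(past_session, current_session))
```

    A longer common subsequence indicates that two sessions follow a more similar navigation path, which is one plausible basis for ranking past sessions when predicting a next request.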

  7. Clone Detection Using DIFF Algorithm For Aspect Mining

    Rowyda Mohammed Abd El-Aziz


    Full Text Available Aspect mining is a reverse engineering process that aims at mining legacy systems to discover crosscutting concerns to be refactored into aspects. This process improves system reusability and maintainability. However, locating crosscutting concerns in legacy systems manually is very difficult and error-prone, so there is a need for automated techniques that can discover crosscutting concerns in source code. Aspect mining approaches are automated techniques that vary according to the type of crosscutting concern symptoms they search for. Code duplication is one such symptom, and it puts software maintenance and evolution at risk. Many code clone detection techniques have therefore been proposed to find this duplicated code in legacy systems. In this paper, we present a clone detection technique to extract exact clones from object-oriented source code using the Differential File Comparison Algorithm (DIFF) to improve system reusability and maintainability, which is a major objective of aspect mining.

  8. An Application of Data Mining Algorithms for Shipbuilding Cost Estimation

    Kaluzny, B.L.; Barbici, S.; Berg, G.; Chiomento, R.; Derpanis,D.; Jonsson, U.; Shaw, R.H.A.D.; Smit, M.C.; Ramaroson, F.


    This article presents a novel application of known data mining algorithms to the problem of estimating the cost of ship development and construction. The work is a product of North Atlantic Treaty Organization Research and Technology Organization Systems Analysis and Studies 076 Task Group “NATO Ind

  9. Analysing Customer Opinions with Text Mining Algorithms

    Consoli, Domenico


    Knowing what the customer thinks of a particular product/service helps top management to introduce improvements in processes and products, thus differentiating the company from their competitors and gain competitive advantages. The customers, with their preferences, determine the success or failure of a company. In order to know opinions of the customers we can use technologies available from the web 2.0 (blog, wiki, forums, chat, social networking, social commerce). From these web sites, useful information must be extracted, for strategic purposes, using techniques of sentiment analysis or opinion mining.

  10. A partition enhanced mining algorithm for distributed association rule mining systems

    A.O. Ogunde


    Full Text Available The extraction of patterns and rules from large distributed databases through existing Distributed Association Rule Mining (DARM) systems is still faced with enormous challenges such as high response times, high communication costs and inability to adapt to constantly changing databases. In this work, a Partition Enhanced Mining Algorithm (PEMA) is presented to address these problems. In PEMA, the Association Rule Mining Coordinating Agent receives a request and decides the appropriate data sites, partitioning strategy and mining agents to use. The mining process is divided into two stages. In the first stage, the data agents horizontally segment the databases with small average transaction length into relatively smaller partitions based on the number of available sites and the available memory. Databases with relatively large average transaction length, on the other hand, are vertically partitioned. After this, Mobile Agent-Based Association Rule Mining-Agents, which are the mining agents, carry out the discovery of the local frequent itemsets. In the second stage, the local frequent itemsets are incrementally integrated from one data site to another to get the global frequent itemsets. This reduces the response time and communication cost in the system. Results from experiments conducted on real datasets showed that the average response time of PEMA improved over existing algorithms. Similarly, PEMA incurred lower communication costs, with a lower average size of messages exchanged, when compared with benchmark DARM systems. This result showed that PEMA could be deployed for efficient discovery of valuable knowledge in distributed databases.

  11. The efficiency of algorithms DATA MINING

    Mohamed El far


    Full Text Available This paper presents a new classification and search method of 3D object features views. This method is an application of algorithms: Close+ for an object views classification purpose  Algorithm for extracting association rules in order to extract the characteristic view. We use the geometric descriptor of Zernike Moments to index 2D views of 3D object. The proposed method relies on a Bayesian probabilistic approach for search queries. The resulting outcome is presented by a collection of 120 3D models of the Princeton-based benchmark and then compared to those obtained from conventional methods.


    XuBaowen; YiTong; 等


    In this letter, on the basis of the Frequent Pattern (FP) tree, a support function to update the FP-tree is introduced, and an incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm considers not only adding new data into the database but also removing old data from the database. Furthermore, it can reduce five cases to three cases. The algorithm proposed in this letter avoids generating large numbers of candidate items, and it is highly efficient.

  13. An Approximate L p Difference Algorithm for Massive Data Streams

    Jessica H. Fong


    Full Text Available Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream. They approximate with small relative error. Using different techniques, we show how to approximate the $L^p$-difference $\sum_i |a_i - b_i|^p$ for any rational-valued $p \in (0,2]$, with comparable efficiency and error. We also show how to approximate $\sum_i |a_i - b_i|^p$ for larger values of $p$ but with a worse error guarantee. Our results fill in gaps left by recent work, by providing an algorithm that is precisely tunable for the application at hand. These results can be used to assess the difference between two chronologically or physically separated massive data sets, making one quick pass over each data set, without buffering the data or requiring the data source to pause. For example, one can use our techniques to judge whether the traffic on two remote network routers is similar without requiring either router to transmit a copy of its traffic. A web search engine could use such algorithms to construct a library of small "sketches," one for each distinct page on the web; one can approximate the extent to which new web pages duplicate old ones by comparing the sketches of the web pages. Such techniques will become increasingly important as the enormous scale, distributional nature, and one-pass processing requirements of data sets become more commonplace.
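
    As a hedged illustration of the sketching idea, the following covers only the special case $p = 2$, using random $\pm 1$ projections in the style of earlier work rather than the paper's technique for general $p$. The function names, the fully random sign table (a practical implementation would use limited-independence hashing to save space), and the parameter choices are assumptions.

```python
import random


def make_signs(n, k, seed=0):
    """k independent rows of random +/-1 signs over the index domain [0, n)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def sketch(stream, signs):
    """One pass over (index, value) pairs in any order; keeps k numbers."""
    s = [0] * len(signs)
    for i, value in stream:
        for r, row in enumerate(signs):
            s[r] += row[i] * value
    return s


def estimate_l2_squared(sketch_a, sketch_b):
    """Average of squared sketch differences estimates sum_i (a_i - b_i)^2."""
    k = len(sketch_a)
    return sum((x - y) ** 2 for x, y in zip(sketch_a, sketch_b)) / k


if __name__ == "__main__":
    a = [3, 0, 5, 1, 7, 2, 0, 4]
    b = [1, 2, 5, 0, 3, 2, 6, 4]
    signs = make_signs(n=len(a), k=2000)  # shared by both parties
    sa = sketch(enumerate(a), signs)
    sb = sketch(reversed(list(enumerate(b))), signs)  # arrival order differs
    exact = sum((x - y) ** 2 for x, y in zip(a, b))
    print(exact, estimate_l2_squared(sa, sb))
```

    Each party only ever stores its k-number sketch, and the two sketches can be compared without exchanging the underlying streams, which mirrors the router example in the abstract.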

  14. Arsenic partitioning among particle-size fractions of mine wastes and stream sediments from cinnabar mining districts.

    Silva, Veronica; Loredo, Jorge; Fernández-Martínez, Rodolfo; Larios, Raquel; Ordóñez, Almudena; Gómez, Belén; Rucandio, Isabel


    Tailings from abandoned mercury mines represent an important pollution source by metals and metalloids. Mercury mining in Asturias (north-western Spain) has been carried out since Roman times until the 1970s. Specific and non-specific arsenic minerals are present in the paragenesis of the Hg ore deposit. As a result of intensive mining operations, waste materials contain high concentrations of As, which can be geochemically dispersed throughout surrounding areas. Arsenic accumulation, mobility and availability in soils and sediments are strongly affected by the association of As with solid phases and granular size composition. The objective of this study was to examine phase associations of As in the fine grain size subsamples of mine wastes (La Soterraña mine site) and stream sediments heavily affected by acid mine drainage (Los Rueldos mine site). An arsenic-selective sequential procedure, which categorizes As content into seven phase associations, was applied. In spite of a higher As accumulation in the finest particle-size subsamples, As fractionation did not seem to depend on grain size since similar distribution profiles were obtained for the studied granulometric fractions. The presence of As was relatively low in the most mobile forms in both sites. As was predominantly linked to short-range ordered Fe oxyhydroxides, coprecipitated with Fe and partially with Al oxyhydroxides and associated with structural material in mine waste samples. As incorporated into short-range ordered Fe oxyhydroxides was the predominant fraction at sediment samples, representing more than 80% of total As.

  15. Predicting mining activity with parallel genetic algorithms

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.


    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
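
    For readers unfamiliar with the Kappa statistic mentioned above, here is a minimal sketch of Cohen's kappa for comparing ground truth labels with predicted labels cell by cell. The function name and the toy label lists are assumptions; the abstract does not say which variant of kappa the authors used.

```python
from collections import Counter


def cohen_kappa(truth, predicted):
    """Agreement between two label sequences, corrected for chance."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal label frequencies.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    ground_truth = [1, 1, 1, 0]   # e.g. 1 = mined cell, 0 = unmined cell
    model_output = [1, 1, 0, 0]
    print(cohen_kappa(ground_truth, model_output))
```

    A value of 1 means perfect agreement and 0 means agreement no better than chance, which makes kappa more informative than raw accuracy when one class dominates the map.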

  16. Web Based Genetic Algorithm Using Data Mining

    Ashiqur Rahman; Asaduzzaman Noman; Md. Ashraful Islam; Al-Amin Gaji


    This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in an education web-based system. A combination of multiple classifiers leads to a significant improvement in classification performance. Through weighting the feature vectors using a Genetic Algorithm we can optimize the prediction accuracy and get a marked improvement over raw classification. It further shows that when the number of features is few; fea...

  17. Segment-based traffic smoothing algorithm for VBR video stream


    Because VBR video traffic is bursty, transmission of variable bit rate (VBR) video has a highly fluctuating bandwidth requirement. Traffic smoothing algorithms are very efficient at reducing the burstiness of a VBR video stream by transmitting data at a series of fixed rates. We propose in this paper a novel segment-based bandwidth allocation algorithm which dynamically adjusts the segmentation boundary and changes the transmission rate at the latest possible point, so that each video segment is extended as long as possible and the number of rate changes is as small as possible while the peak rate is kept low. Simulation results showed that our approach has a small bandwidth requirement, high bandwidth utilization and low computation cost.

  18. A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems

    Brian Foo


    Networks of classifiers can offer improved accuracy and scalability over single classifiers by utilizing distributed processing resources and analytics. However, they also pose a unique combination of challenges. First, classifiers may be located across different sites that are willing to cooperate to provide services, but are unwilling to reveal proprietary information about their analytics, or are unable to exchange their analytics due to the high transmission overheads involved. Furthermore, processing of voluminous stream data across sites often requires load shedding approaches, which can lead to suboptimal classification performance. Finally, real stream mining systems often exhibit dynamic behavior and thus necessitate frequent reconfiguration of classifier elements to ensure acceptable end-to-end performance and delay under resource constraints. Under such informational constraints, resource constraints, and unpredictable dynamics, utilizing a single, fixed algorithm for reconfiguring classifiers can often lead to poor performance. In this paper, we propose a new optimization framework aimed at developing rules for choosing algorithms to reconfigure the classifier system under such conditions. We provide an adaptive, Markov model-based solution for learning the optimal rule when stream dynamics are initially unknown. Furthermore, we discuss how rules can be decomposed across multiple sites and propose a method for evolving new rules from a set of existing rules. Simulation results are presented for a speech classification system to highlight the advantages of using the rules-based framework to cope with stream dynamics.

  19. Developing and Implementing the Data Mining Algorithms in RAVEN

    Sen, Ramazan Sonat [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)


    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. Post-processing and analyzing such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus in discovering knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics codes model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. to recognize patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  20. Physical habitat and water chemistry changes induced by logging and gold mining in French Guiana streams

    Dedieu N.


    Understanding the effects of disturbances on the physical-chemical quality of ecosystems is a crucial step in the development of ecosystem assessment tools. Ninety-five sampling sites distributed among 4 categories of disturbance (reference, logging, former gold mining and current gold mining) were characterized using stream physical and chemical variables. Our hypotheses were: (i) logging and gold mining activities primarily affect the physical habitat structure of streams and (ii) both have an effect on chemical environments through nutrient and/or fine particulate resuspension. We demonstrate that physical variables describing the river bottom, and suspended solids, discriminate both currently and formerly gold-mined sites from reference sites, while, whatever the type of impact encountered, nutrient concentrations do not prove relevant for measuring human impacts. To understand distribution patterns of aquatic organisms across French Guiana, future research should thus aim at examining the match between physical-chemical and biological classifications of small streams under reference and impacted conditions.

  1. Real-Time Clinical Decision Support System with Data Stream Mining

    Yang Zhang


    This research describes a new design for a data stream mining system that can analyze medical data streams and make real-time predictions. The research is motivated by growing interest in combining software technology and medical functions to develop software applications that can be used in medical fields such as chronic disease prognosis and diagnosis, children's healthcare, diabetes diagnosis, and so forth. Most existing software technologies are case-based data mining systems. They can only analyze finite, structured data sets; they worked well in their early years but can hardly meet today's medical requirements. In this paper, we describe a clinical support system based on data stream mining technology; the design takes into account all the shortcomings of the existing clinical support systems.

  2. Metagenomic signatures of a tropical mining-impacted stream reveal complex microbial and metabolic networks.

    Reis, Mariana P; Dias, Marcela F; Costa, Patrícia S; Ávila, Marcelo P; Leite, Laura R; de Araújo, Flávio M G; Salim, Anna C M; Bucciarelli-Rodriguez, Mônica; Oliveira, Guilherme; Chartone-Souza, Edmar; Nascimento, Andréa M A


    Bacteria from aquatic ecosystems significantly contribute to biogeochemical cycles, but details of their community structure in tropical mining-impacted environments remain unexplored. In this study, we analyzed a bacterial community from circumneutral-pH tropical stream sediment by 16S rRNA and shotgun deep sequencing. Carrapatos stream sediment, which has been exposed to metal stress due to gold and iron mining (21 [g Fe]/kg), revealed a diverse community, with predominance of Proteobacteria (39.4%), Bacteroidetes (12.2%), and Parcubacteria (11.4%). Among Proteobacteria, the most abundant reads were assigned to neutrophilic iron-oxidizing taxa, such as Gallionella, Sideroxydans, and Mariprofundus, which are involved in Fe cycling and harbor several metal resistance genes. Functional analysis revealed a large number of genes participating in nitrogen and methane metabolic pathways despite the low concentrations of inorganic nitrogen in the Carrapatos stream. Our findings provide important insights into bacterial community interactions in a mining-impacted environment.

  3. An Algorithm of Association Rule Mining for Microbial Energy Prospection

    Shaheen, Muhammad; Shahbaz, Muhammad


    The presence of hydrocarbons beneath the earth’s surface produces some microbiological anomalies in soils and sediments. The detection of such microbial populations involves purely biochemical processes which are specialized, expensive and time consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of an existing algorithm that was designed for spatial databases only. The algorithm is applied to mine context-based association rules on a microbial database to extract interesting and useful associations of microbial attributes with the existence of a hydrocarbon reserve. The surface and soil manifestations caused by the presence of hydrocarbon-oxidizing microbes are selected from existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability of hydrocarbon’s existence. The numerical evaluation shows better accuracy for non-spatial data, as compared to conventional algorithms, at generating reliable and robust rules. PMID:28393846

  4. Research on Algorithms for Mining Distance-Based Outliers

    WANG Lizhen; ZOU Likun


    Outlier detection is an important and valuable research topic in KDD (knowledge discovery in databases). The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even weather forecasting. Among the existing methods that we have seen for finding outliers, the notion of DB (distance-based) outliers is not restricted computationally to small values of the number of dimensions k and goes beyond the data space. Here, we study algorithms for mining DB-outliers. We focus on developing algorithms that are not limited by k. First, we present a partition-based algorithm (the PBA). The key idea is to gain efficiency by divide-and-conquer. Second, we present an optimized algorithm called the object-class-based algorithm (the OCBA). The cost of this algorithm does not depend on k, and its efficiency is as good as that of the cell-based algorithm. We provide experimental results showing that the two new algorithms have better execution efficiency.
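
    The abstract does not spell out the outlier definition. A common formulation (assumed here, not taken from the paper) calls an object a DB(p, D)-outlier when at least a fraction p of the other objects lie farther than distance D from it. The sketch below is the naive quadratic baseline for that definition; it is not the PBA or the OCBA, and the function name and sample points are illustrative.

```python
import math


def db_outliers(points, p, d):
    """Return the points that are DB(p, d)-outliers, using a naive O(n^2) scan.

    Assumption: the fraction p is measured against the other n - 1 points.
    """
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # Count how many other points lie farther than d from point a.
        far = sum(
            1 for j, b in enumerate(points)
            if j != i and math.dist(a, b) > d
        )
        if far >= p * (n - 1):
            outliers.append(a)
    return outliers


# Hypothetical data: a tight cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(points, p=0.9, d=3.0))
```

    The distance computation works in any number of dimensions, but every pair of points is compared. Partition-based or cell-based schemes such as those in the abstract aim to avoid that full pairwise scan.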

  5. Ecological impacts of lead mining on Ozark streams: toxicity of sediment and pore water.

    Besser, John M; Brumbaugh, William G; Allert, Ann L; Poulton, Barry C; Schmitt, Christopher J; Ingersoll, Christopher G


    We studied the toxicity of sediments downstream of lead-zinc mining areas in southeast Missouri, using chronic sediment toxicity tests with the amphipod, Hyalella azteca, and pore-water toxicity tests with the daphnid, Ceriodaphnia dubia. Tests conducted in 2002 documented reduced survival of amphipods in stream sediments collected near mining areas and reduced survival and reproduction of daphnids in most pore waters tested. Additional amphipod tests conducted in 2004 documented significant toxic effects of sediments from three streams downstream of mining areas: Strother Creek, West Fork Black River, and Bee Fork. Greatest toxicity occurred in sediments from a 6-km reach of upper Strother Creek, but significant toxic effects occurred in sediments collected at least 14 km downstream of mining in all three watersheds. Toxic effects were significantly correlated with metal concentrations (nickel, zinc, cadmium, and lead) in sediments and pore waters and were generally consistent with predictions of metal toxicity risks based on sediment quality guidelines, although ammonia and manganese may also have contributed to toxicity at a few sites. Responses of amphipods in sediment toxicity tests were significantly correlated with characteristics of benthic invertebrate communities in study streams. These results indicate that toxicity of metals associated with sediments contributes to adverse ecological effects in streams draining the Viburnum Trend mining district.

  6. Web Log Mining using Improved Version of Proposed Algorithm

    Dr. Manish Shrivastava


    Association rule mining is one of the most important and popular data mining techniques. It extracts interesting correlations, frequent patterns and associations among sets of items in transaction databases or other data repositories. Most of the existing algorithms require multiple passes over the database to discover frequent patterns, resulting in a large number of disk reads and placing a huge burden on the input/output subsystem. In order to reduce repetitive disk reads, a novel top-down approach is proposed in this paper. The improved version of the Apriori algorithm greatly reduces the number of database scans and avoids generating unnecessary patterns, which reduces time and space consumption.
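
    For context, the baseline that the paper sets out to improve can be sketched as the classic level-wise Apriori procedure below. This is the textbook bottom-up algorithm with one database pass per itemset size, not the author's top-down variant; the function name and the sample transactions are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One full pass over the database per level to count candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical web-log sessions, each a set of visited pages.
sessions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, count in sorted(apriori(sessions, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

    The counting line inside the loop is the repeated database scan that the abstract identifies as the main cost.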

  7. Constructing Three-Dimension Space Graph for Outlier Detection Algorithms in Data Mining

    ZHANG Jing; SUN Zhi-hui


    Outlier detection has very important applied value in data mining literature. Different outlier detection algorithms based on distinct theories have different definitions and mining processes. The three-dimensional space graph for constructing applied algorithms and an improved GridOf algorithm were proposed in terms of analyzing the existing outlier detection algorithms from criterion and theory.

  8. Web Based Genetic Algorithm Using Data Mining

    Ashiqur Rahman


    This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in a web-based education system. A combination of multiple classifiers leads to a significant improvement in classification performance. By weighting the feature vectors using a Genetic Algorithm we can optimize the prediction accuracy and get a marked improvement over raw classification. It further shows that when the number of features is small, feature weighting works better than feature selection alone. Many leading educational institutions are working to establish an online teaching and learning presence. Several systems with different capabilities and approaches have been developed to deliver online education in an academic setting. In particular, Michigan State University (MSU) has pioneered some of these systems to provide an infrastructure for online instruction. The research presented here was performed on a part of the latest online educational system developed at MSU, the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA).

  9. On Resource Aware Algorithms in Epidemic Live Streaming

    Mathieu, Fabien


    Epidemic-style diffusion schemes have been previously proposed for achieving peer-to-peer live streaming. Their performance trade-offs have been deeply analyzed for homogeneous systems, where all peers have the same upload capacity. However, epidemic schemes designed for heterogeneous systems have not been completely understood yet. In this report we focus on the peer selection process and propose a generic model that encompasses a large class of algorithms. The process is modeled as a combination of two functions, an aware one and an agnostic one. By means of simulations, we analyze the awareness-agnostism trade-offs on the peer selection process and the impact of the source distribution policy in non-homogeneous networks. We highlight that the early diffusion of a given chunk is crucial for its overall diffusion performance, and a fairness trade-off arises between the performance of heterogeneous peers, as a function of the level of awareness.

  10. The algorithm of measuring parameters of separate oil streams components

    Kopteva, A. V.; Voytyuk, I. N.


    This paper describes a development in the area of non-contact measurement of moving flows, including mass flow, the number of components and their mass ratios in a multicomponent flow, as well as measurement of flows based on algorithms and functional developed for various industries and production processes. The paper demonstrates that at the core of the proposed systems, there is the physical information field created in the cross section of the moving flow by hard electromagnetic radiation. The substantiation and measurement of the information parameters are performed by the hardware and the software of the automatic measuring system. A new way of statistical pulsation measurements by the radioisotope technique is described, being alternative to the existing stream control methods and allowing improving accuracy of measurements. The basic formula fundamental for the method of calibration characteristics correction is shown.

  11. Evaluation of Stream Mining Classifiers for Real-Time Clinical Decision Support System: A Case Study of Blood Glucose Prediction in Diabetes Therapy

    Simon Fong


    Earlier, a conceptual design of a real-time clinical decision support system (rt-CDSS) with data stream mining was proposed and published. The new system can analyze medical data streams and make real-time predictions. It is based on a stream mining algorithm called VFDT. The VFDT is extended with the capability of using pointers to allow the decision tree to remember the mapping relationship between leaf nodes and the history records. In this paper, which is a sequel to the rt-CDSS design, several popular machine learning algorithms are investigated for their suitability as the classifier in the rt-CDSS implementation. A classifier essentially needs to accurately map the events input to the system into one of several predefined classes of assessments, so that the rt-CDSS can follow up by recommending the prescribed remedies to the clinicians. For a real-time system like rt-CDSS, the major technological challenges lie in the capability of the classifier to process, analyze and classify the dynamic input data quickly and with the utmost reliability. An experimental comparison is conducted. This paper offers insight into choosing and embedding a stream mining classifier into rt-CDSS, with a case study of diabetes therapy.
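
    VFDT (Very Fast Decision Tree) decides when a leaf has seen enough examples to split by applying the Hoeffding bound. A minimal sketch of that test is given below, assuming information gain as the split criterion, whose range is log2 of the number of classes. The function names and the numeric inputs are illustrative, and the pointer-based extension described in the abstract is not modeled.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range is within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """VFDT-style test: split once the best attribute leads the runner-up
    by more than the Hoeffding bound."""
    value_range = math.log2(n_classes)  # range of information gain
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical leaf statistics after 1000 examples of a two-class problem.
print(hoeffding_bound(1.0, 1e-7, 1000))
print(should_split(0.30, 0.10, n_classes=2, delta=1e-7, n=1000))
print(should_split(0.30, 0.25, n_classes=2, delta=1e-7, n=1000))
```

    Because the bound shrinks as n grows, a leaf that cannot yet separate two close attributes simply waits for more stream records instead of storing them.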

  12. Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval

    P. R. Kumar


    Problem statement: A study was done on hyperlink analysis and the algorithms used for link analysis in Web information retrieval. Approach: This research was initiated because of the dependence on search engines for information retrieval on the web. It aims to understand web structure mining and determine the importance of hyperlinks in web information retrieval, particularly with the Google search engine. Hyperlink analysis is an important methodology used by the Google search engine to rank pages. Results: The different algorithms used for link analysis, namely the PageRank (PR), Weighted PageRank (WPR) and Hyperlink-Induced Topic Search (HITS) algorithms, are discussed and compared. The PageRank algorithm was implemented as a Java program and the convergence of the PageRank values is shown in chart form. Conclusion: This study was done basically to explore the link structure algorithms for ranking and to compare those algorithms. Further research in this area will address the problems facing the PageRank algorithm and how to handle them.
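
    The convergence behaviour mentioned in the results can be illustrated with a short power-iteration sketch. The paper's implementation was in Java and is not reproduced here; the damping factor of 0.85, the handling of pages without outgoing links, and the three-page example graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict mapping page -> list of linked pages."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:  # values have converged
            break
    return rank


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page, value in ranks.items():
    print(page, round(value, 4))
```

    The ranks always sum to 1, and page C, which receives links from both other pages, ends up with the highest value.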

  13. A Control Chart Approach for Representing and Mining Data Streams with Shape Based Similarity

    Omitaomu, Olufemi A [ORNL


    The mining of data streams for online condition monitoring is a challenging task in several domains, including electric power grid systems, intelligent manufacturing, and consumer science. Consider a power grid application in which thousands of sensors, called phasor measurement units, are deployed on the power grid network to continuously collect streams of digital data for real-time situational awareness and system management. Depending on design, each sensor could stream between ten and sixty data samples per second. The myriad of sensory data captured could convey deeper insights about sequences of events in real time and before major damage is done. However, the timely processing and analysis of these high-velocity and high-volume data streams is a challenge. Hence, a new data processing and transformation approach, based on the concept of control charts, is proposed for representing sequences of data streams from sensors. In addition, an application of the proposed approach to enhancing data mining tasks such as clustering, using real-world power grid data streams, is presented. The results indicate that the proposed approach is very efficient for data stream storage and manipulation.

  14. Penguins Search Optimisation Algorithm for Association Rules Mining

    Youcef Gheraibia


    Association Rules Mining (ARM) is one of the most popular and well-known approaches for the decision-making process. Existing ARM algorithms are time consuming and generate a very large number of association rules with high overlap. To deal with this issue, we propose a new ARM approach based on the penguins search optimization algorithm (Pe-ARM for short). Moreover, an efficient measure is incorporated into the main process to evaluate the amount of overlap among the generated rules. The proposed approach also ensures good diversification over the whole solution space. To demonstrate the effectiveness of the proposed approach, several experiments have been carried out on different datasets, specifically biological ones. The results reveal that the proposed approach outperforms the well-known ARM algorithms in both execution time and solution quality.

  15. RStorm : Developing and testing streaming algorithms in R

    Kaptein, M.C.


    Streaming data, consisting of indefinitely evolving sequences, are becoming ubiquitous in many branches of science and in various applications. Computer scientists have developed streaming applications such as Storm and the S4 distributed stream computing platform to deal with data streams.

  19. Streaming algorithms for recognizing nearly well-parenthesized expressions

    Krebs, Andreas; Srinivasan, Srikanth


    We study the streaming complexity of the membership problem of 1-turn-Dyck2 and Dyck2 when there are a few errors in the input string. 1-turn-Dyck2 with errors: We prove that there exists a randomized one-pass algorithm that given x checks whether there exists a string x' in 1-turn-Dyck2 such that x is obtained by flipping at most k locations of x' using: - O(k log n) space, O(k log n) randomness, and poly(k log n) time per item and with error at most 1/poly(n). - O(k^{1+epsilon} + log n) space for every 0 <= epsilon <= 1, O(log n) randomness, O(polylog(n) + poly(k)) time per item, with error at most 1/8. Here, we also prove that any randomized one-pass algorithm that makes error at most k/n requires at least Omega(k log(n/k)) space to accept strings which are exactly k-away from strings in 1-turn-Dyck2 and to reject strings which are exactly (k+2)-away from strings in 1-turn-Dyck2. Since 1-turn-Dyck2 and the Hamming Distance problem are closely related we also obtain new upper and lower bounds for th...

  20. Origin and influence of coal mine drainage on streams of the United States

    Powell, J.D.


    Degradation of water quality related to oxidation of iron disulfide minerals associated with coal is a naturally occurring process that has been observed since the late seventeenth century, many years before commencement of commercial coal mining in the United States. Disturbing coal strata during mining operations accelerates this natural deterioration of water quality by exposing greater surface areas of reactive minerals to the weathering effects of the atmosphere, hydrosphere, and biosphere. Degraded water quality in the temperate eastern half of the United States is readily detected because of the low mineralization of natural water. Maps are presented showing areas in the eastern United States where concentrations of chemical constituents in water affected by coal mining (pH, dissolved sulfate, total iron, total manganese) exceed background values and indicate effects of coal mining. Areas in the East most affected by mine drainage are in western Pennsylvania, southern Ohio, western Maryland, West Virginia, southern Illinois, western Kentucky, northern Missouri, and southern Iowa. Effects of coal mining on water quality in the more arid western half of the United States are more difficult to detect because of the high degree of mineralization of natural water. Normal background concentrations of constituents are not useful in evaluating effects of coal mine drainage on streams in the more arid West. Three approaches to reduce the effects of coal mining on water quality are: (1) exclusion of oxygenated water from reactive minerals, (2) neutralization of the acid produced, (3) retardation of acid-producing bacteria population in spoil material, by application of detergents that do not produce byproducts requiring disposal. These approaches can be used to help prevent further degradation of water quality in streams by future mining. © 1988 Springer-Verlag New York Inc.

  1. Classification of stream basins in southeastern Ohio according to extent of surface coal mining

    Childress, C.J.


    Water-quality data were collected from streams draining 35 basins in the southeastern Ohio coal region to evaluate and categorize the effect of surface coal mining on stream quality. The study area is underlain by rocks of Pennsylvanian age, the most important coal-producing formations of which are the Allegheny and Monongahela Formations. The study area contains 276 data-collection sites, each of which was sampled four times over a 3-year period. Water and bed-material samples were collected. Each site was classified as 'abandoned,' 'reclaimed,' 'unmined,' or 'mixed,' depending on the proportion of the drainage basin disturbed by mining and, if mined, on the present condition of the mine. Of the 130 sites in the Monongahela Formation, 18 percent were classified as abandoned, 2 percent as reclaimed, 10 percent as unmined, and 70 percent as mixed. Of the 146 sites in the Allegheny Formation, 14 percent were classified as abandoned, 11 percent as unmined, and 75 percent as mixed. Streams draining the carbonate-bearing Monongahela Formation have a significantly greater buffering capacity than streams draining the Allegheny Formation. There are significant differences in specific conductance; pH; alkalinity; acidity; hardness; total and dissolved manganese and aluminum; dissolved nickel, zinc, and sulfate; and dissolved solids among mining-disturbance types in the Allegheny Formation. However, in streams draining the Monongahela Formation, only hardness, sulfur, dissolved solids, and dissolved manganese are significantly different among mining-disturbance types. Discriminant-function analysis of water-quality data was used to classify each 'mixed' site into one of four categories: abandoned, reclaimed, unmined, or uncertain. In addition, observations in each of the first three categories were classified as strongly, moderately, or weakly characteristic of that category. The discriminant function was based on specific conductance, pH, acidity, dissolved sulfate, dissolved

  2. Stream Response to Storm Events Downstream of Mine Tailings: Identifying Contaminant Sources Using Hydrograph Separation and Stream Chemistry

    Holmes, J.; Renshaw, C. E.; Feng, X.


    Quantifying sources of contamination is paramount to good remediation plans at abandoned mine sites. We collected surface water samples from Copperas Brook, a second order stream draining over 16 ha (40 acres) of mine tailings from the abandoned Elizabeth Copper Mine in east central Vermont. Streamflow exhibits a rapid response to rain events. Hydrograph separations using oxygen isotopes consistently indicate considerably higher percentages of new water during rain events compared to a nearby control catchment and to other northeastern U.S. catchments. We attribute most of the new water to direct precipitation on low-infiltration hardpans at the base of the mine tailings, as well as to direct precipitation on to the stream channel itself. In stormflow, base cations (Ca, Mg, Na, K) are diluted, consistent with other studies. By contrast, heavy metal concentrations (Cu, Zn, Cd, Co) increase by up to an order of magnitude. Other studies have suggested that the increased metals in stormflow may be the result of rapid dissolution and transport of the soluble efflorescent sulfate minerals coating the hardpans. Copperas Brook could be highly susceptible to this process given the high percentage of new water in its stormflow. However, multiple regression of stormflow chemical source end-members shows that neither dissolved sulfur salts nor groundwater seeps from the major tailings pile are primarily responsible for the increased metals concentrations at this site. Rather, the majority of heavy metals derive from an isolated 2 ha (5 acres) tailings pile via a pathway that is not connected with the major tailings. This may have profound implications for prioritizing the remediation of this site.
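
    The isotope-based hydrograph separation mentioned above is commonly done with a two-component mixing model, in which the new-water fraction equals (stream - old) / (new - old) for the tracer values. The sketch below assumes that standard model; the abstract does not give the authors' exact formulation, and the delta-18O values used are made up for illustration.

```python
def new_water_fraction(delta_stream, delta_old, delta_new):
    """Fraction of streamflow that is new (event) water, from a two-component
    mixing model on a conservative tracer such as delta-18O (per mil)."""
    if delta_new == delta_old:
        raise ValueError("end-members must differ to separate the hydrograph")
    return (delta_stream - delta_old) / (delta_new - delta_old)


# Hypothetical tracer values, not measurements from Copperas Brook.
pre_event_water = -10.0  # old water: baseflow before the storm
rainfall = -6.0          # new water: precipitation during the storm
stormflow = -7.0         # stream sample taken during the storm

print(new_water_fraction(stormflow, pre_event_water, rainfall))
```

    The calculation only resolves the sources when the rain and pre-event signatures differ clearly, which is why the function rejects identical end-members.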

  3. An Algorithm for Mining Multidimensional Fuzzy Association Rules

    Khare, Neelu; Pardasani, K R


    Multidimensional association rule mining searches for interesting relationships among the values from different dimensions or attributes in a relational database. In this method the correlation is among a set of dimensions, i.e., the items forming a rule come from different dimensions. Therefore each dimension should be partitioned at the fuzzy set level. This paper proposes a new algorithm for generating multidimensional association rules by utilizing fuzzy sets. For a database consisting of fuzzy transactions, the Apriori property is employed to prune useless candidate itemsets.
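
    The abstract does not define support for fuzzy transactions. One widely used convention, assumed here, takes the minimum membership degree across the items of an itemset in each record and averages it over all records. The (dimension, fuzzy set) item encoding, the function name and the membership values below are hypothetical.

```python
def fuzzy_support(transactions, itemset):
    """Fuzzy support of an itemset: the mean, over all transactions, of the
    minimum membership degree among the itemset's items (0 if an item is absent)."""
    total = 0.0
    for t in transactions:
        total += min(t.get(item, 0.0) for item in itemset)
    return total / len(transactions)


# Each record maps (dimension, fuzzy set) pairs to membership degrees.
records = [
    {("age", "young"): 0.8, ("income", "high"): 0.5},
    {("age", "young"): 0.2, ("income", "high"): 0.9},
    {("age", "young"): 1.0},
]

single = fuzzy_support(records, [("age", "young")])
pair = fuzzy_support(records, [("age", "young"), ("income", "high")])
print(single, pair)
```

    With the minimum operator, adding an item can never raise the support, so the Apriori property still holds and any candidate with an infrequent subset can be pruned.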

  4. FSRM: A Fast Algorithm for Sequential Rule Mining

    Anjali Paliwal


    Full Text Available Recent developments in computing and automation technologies have resulted in computerizing business and scientific applications in various areas. Turning the massive amounts of accumulated information into knowledge is attracting researchers in numerous domains, including databases, machine learning, statistics, and so on. From the viewpoint of information researchers, the emphasis is on discovering meaningful patterns hidden in the massive data sets. Hence, a central issue for knowledge discovery in databases, and the main focus of this paper, is to develop economical and scalable mining algorithms as integrated tools for management systems.

  5. The algorithm of malicious code detection based on data mining

    Yang, Yubo; Zhao, Yang; Liu, Xiabi


    Traditional malicious code detection technology has low accuracy and insufficient detection capability for new variants. Existing detection technology based on data mining uses indicators that are not accurate enough, and its classification efficiency is relatively low. This paper proposes an information gain ratio indicator based on N-grams for choosing signatures; the indicator accurately reflects the detection weight of a signature, and a C4.5 decision tree is used to improve the classification detection algorithm.
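
    As a rough sketch of the signature-selection step, the code below scores a byte N-gram by its information gain ratio (information gain divided by split information) with respect to the class labels. The paper's exact indicator is not given in the abstract, so this uses the textbook definition; the function names and the sample byte strings are hypothetical, and the C4.5 classification stage is not reproduced.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def byte_ngrams(data, n):
    """Set of all length-n byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def gain_ratio(ngram, samples):
    """Information gain ratio of 'ngram is present' for labelled byte samples."""
    n = len(ngram)
    labels = [label for _, label in samples]
    present = [label for data, label in samples if ngram in byte_ngrams(data, n)]
    absent = [label for data, label in samples if ngram not in byte_ngrams(data, n)]
    if not present or not absent:
        return 0.0  # the feature does not split the samples
    total = len(samples)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    info_gain = entropy(labels) - conditional
    split_info = entropy(["present"] * len(present) + ["absent"] * len(absent))
    return info_gain / split_info


# Hypothetical labelled samples (illustrative bytes, not real malware).
samples = [
    (b"\x90\x90\xeb", "malicious"),
    (b"\x90\x90\x00", "malicious"),
    (b"\x00\x01\x02", "benign"),
    (b"\x01\x02\x03", "benign"),
]
strong = gain_ratio(b"\x90\x90", samples)
weak = gain_ratio(b"\x90\xeb", samples)
```

    N-grams with the highest ratio would be kept as candidate signatures and passed to the classifier.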

  6. Effects of remediation on the bacterial community of an acid mine drainage impacted stream.

    Ghosh, Suchismita; Moitra, Moumita; Woolverton, Christopher J; Leff, Laura G


    Acid mine drainage (AMD) represents a global threat to water resources, and as such, remediation of AMD-impacted streams is a common practice. During this study, we examined bacterial community structure and environmental conditions in a low-order AMD-impacted stream before, during, and after remediation. Bacterial community structure was examined via polymerase chain reaction amplification of 16S rRNA genes followed by denaturing gradient gel electrophoresis. Also, bacterial abundance and physicochemical data (including metal concentrations) were collected and relationships to bacterial community structure were determined using BIO-ENV analysis. Remediation of the study stream altered environmental conditions, including pH and concentrations of some metals, and consequently, the bacterial community changed. However, remediation did not necessarily restore the stream to conditions found in the unimpacted reference stream; for example, bacterial abundances and concentrations of some elements, such as sulfur, magnesium, and manganese, were different in the remediated stream than in the reference stream. BIO-ENV analysis revealed that changes in pH and iron concentration, associated with remediation, primarily explained temporal alterations in bacterial community structure. Although the sites sampled in the remediated stream were in relatively close proximity to each other, spatial variation in community composition suggests that differences in local environmental conditions may have large impacts on the microbial assemblage.

  7. Microbiological and chemical characteristics of an acidic stream draining a disused copper mine.

    Walton, K C; Johnson, D B


    Water samples draining a disused copper mine (Parys Mountain) in Anglesey, North Wales, were analysed for distribution of acidophilic bacteria (iron oxidising and heterotrophic) and for changes in physicochemical composition along the length of the drainage stream. Ten samples were taken at regular distance intervals along a 1 km stretch from the source of the acid mine drainage. The stream remained highly acidic (pH iron was in the ferrous form in the upper reaches of the stream, but ferric iron became increasingly dominant downstream as a result of microbial oxidation. Although concentrations of nutrients such as nitrogen and phosphorus were low in the acid mine drainage, they were not limiting rates of bacterial iron oxidation, which appeared to be limited more by temperature. The iron oxidising bacteria Thiobacillus ferrooxidans and Leptospirillum ferrooxidans were both isolated from all sampling sites, although their relative abundances varied; L. ferrooxidans accounted for 57% of all iron oxidising isolates. Numbers of iron oxidising bacteria decreased with distance from drainage source, in contrast to those of acidophilic heterotrophic bacteria which increased. The diversity of heterotrophic isolates also increased with distance. The relationship between the chemistry and microbiology of the stream is discussed.

  8. Long Wall Mining Subsidence and Fluvial Stream Changes: an Ecological Perspective

    Spear, R.; Proch, T.


    Long wall mining is a high efficiency underground coal extraction technique that removes large panels of coal and causes immediate subsidence of the overburden. Surface subsidence may vary between 0.3 meters and 2 meters depending on the depth of the coal seam. Fractures in the overburden allow perched water tables to drain to greater depths and dry up their associated springs. Base flow in headwater streams is often eliminated. Subsidence also changes the physical characteristics of streams. Typical riffle pool sequences are replaced with long pools and glides. Benthic invertebrate and fish community assemblages reflect the physical habitat changes. EPT invertebrate taxa are replaced with Odonates and Diptera larvae associated with glide/pool habitat. Fish community diversity is negatively impacted. Diverse riffle dwelling assemblages are replaced and dominated by increasingly homogenous pool dwelling fish communities. Stream subsidence effectively increases the stream order without increasing its size or flow regime.

  9. Ecological effects of lead mining on Ozark streams: In-situ toxicity to woodland crayfish (Orconectes hylas)

    Allert, A.L.; Fairchild, J.F.; DiStefano, R.J.; Schmitt, C.J.; Brumbaugh, W.G.; Besser, J.M.


    The Viburnum Trend mining district in southeast Missouri, USA is one of the largest producers of lead-zinc ore in the world. Previous stream surveys found evidence of increased metal exposure and reduced population densities of crayfish immediately downstream of mining sites. We conducted an in-situ 28-d exposure to assess toxicity of mining-derived metals to the woodland crayfish (Orconectes hylas). Crayfish survival and biomass were significantly lower at mining sites than at reference and downstream sites. Metal concentrations in water, detritus, macroinvertebrates, fish, and crayfish were significantly higher at mining sites, and were negatively correlated with caged crayfish survival. These results support previous field and laboratory studies that showed mining-derived metals negatively affect O. hylas populations in streams draining the Viburnum Trend, and that in-situ toxicity testing was a valuable tool for assessing the impacts of mining on crayfish populations.

  10. Acid mine drainage and stream recovery: Effects of restoration on water quality, macroinvertebrates, and fish

    Williams K.M.


    Full Text Available Acid mine drainage (AMD) is a prominent threat to water quality in many of the world’s mining districts as it can severely degrade both the biological community and physical habitat of receiving streams. There are relatively few long-term studies investigating the ability of stream ecosystems to recover from AMD. Here we assess watershed scale recovery of a cold-water stream from pollution by AMD using a 1967 survey of the biological and chemical properties of the stream as a pre-restoration benchmark. We sampled water chemistry, benthic macroinvertebrates, and fish throughout the watershed during the spring and summer of 2011. Water chemistry results indicated that pH and total alkalinity increased post-restoration, while acidity, sulfate, and iron concentrations decreased. Watershed-level taxa richness, local taxa richness, biomass, diversity, and density of macroinvertebrates were significantly higher post-restoration; however, %EPT was not significantly different. Fish species richness, density, and brook trout density were all significantly higher post-restoration. These results provide clear evidence that both abiotic and biotic components of streams can recover from AMD pollution.

  11. Geochemistry of acid mine drainage from a coal mining area and processes controlling metal attenuation in stream waters, southern Brazil



    Full Text Available Acid drainage influence on the water and sediment quality was investigated in a coal mining area (southern Brazil). Mine drainage showed pH between 3.2 and 4.6 and elevated concentrations of sulfate, As and metals, of which Fe, Mn and Zn exceeded the limits for the emission of effluents stated in the Brazilian legislation. Arsenic also exceeded the limit, but only slightly. Groundwater monitoring wells from active mines and tailings piles showed pH interval and chemical concentrations similar to those of mine drainage. However, the river and ground water samples of municipal public water supplies revealed a pH range from 7.2 to 7.5 and low chemical concentrations, although Cd concentration slightly exceeded the limit adopted by Brazilian legislation for groundwater. In general, surface waters showed a large pH range (6 to 10.8), and changes caused by acid drainage in the chemical composition of these waters were not very significant. Locally, acid drainage seemed to have dissolved carbonate rocks present in the local stratigraphic sequence, attenuating the dispersion of metals and As. Stream sediments presented anomalies of these elements, which were strongly dependent on the proximity of tailings piles and abandoned mines. We found that precipitation processes in sediments and the dilution of dissolved phases were responsible for the attenuation of the concentrations of the metals and As in the acid drainage and river water mixing zone. In general, a larger influence of mining activities on the chemical composition of the surface waters and sediments was observed when enrichment factors in relation to regional background levels were used.

  12. Evaluation of Metal Toxicity in Streams Affected by Abandoned Mine Lands, Upper Animas River Watershed, Colorado

    Besser, John M.; Allert, Ann L.; Hardesty, Douglas K.; Ingersoll, Christopher G.; May, Thomas W.; Wang, Ning; Leib, Kenneth J.


    Acid drainage from abandoned mines and from naturally-acidic rocks and soil in the upper Animas River watershed of Colorado generates elevated concentrations of acidity and dissolved metals in stream waters and deposition of metal-contaminated particulates in streambed sediments, resulting in both toxicity and habitat degradation for stream biota. High concentrations of iron (Fe), aluminum (Al), zinc (Zn), copper (Cu), cadmium (Cd), and lead (Pb) occur in acid streams draining headwaters of the upper Animas River watershed, and high concentrations of some metals, especially Zn, persist in circumneutral reaches of the Animas River and Mineral Creek, downstream of mixing zones of acid tributaries. Seasonal variation of metal concentrations is reflected in variation in toxicity of stream water. Loadings of dissolved metals to the upper Animas River and tributaries are greatest during summer, during periods of high stream discharge from snowmelt and monsoonal rains, but adverse effects on stream biota may be greater during winter low-flow periods, when stream flows are dominated by inputs of groundwater and contain greatest concentrations of dissolved metals. Fine stream-bed sediments of the upper Animas River watershed also contain elevated concentrations of potentially toxic metals. Greatest sediment metal concentrations occur in the Animas River upstream from Silverton, where there are extensive deposits of mine and mill tailings, and in mixing zones in the Animas River and lower Mineral Creek, where precipitates of Fe and Al oxides also contain high concentrations of other metals. This report summarizes the findings of a series of toxicity studies in streams of the upper Animas River watershed, conducted on-site and in the laboratory between 1998 and 2000. The objectives of these studies were: (1) to determine the relative toxicity of stream water and fine stream-bed sediments to fish and invertebrates; (2) to determine the seasonal range of toxicity in stream

  13. Temporal variability in metal concentrations in a mine-impacted stream: Implications for metal bioavailability

    Maest, A.; Beltman, D.; Lipton, J. [Hagler Bailly, Boulder, CO (United States)]


    Variability in total and dissolved metal concentrations and dissolved organic carbon (DOC) in a stream impacted by a cobalt/copper mine were determined on several temporal scales, including hourly, daily, weekly, seasonally, and annually. Stream samples were collected during 1993 and 1994 spring runoff and 1993 low-flow conditions. Concentrations of mine-released metals varied from approximately one order of magnitude within a 24-hour period to several orders of magnitude seasonally. During spring runoff, dissolved metal concentrations peaked before total metal concentrations. Total metal concentrations tended to vary more widely on all temporal scales than did dissolved concentrations. Temporal changes in DOC concentrations did not follow those of metals: when metal concentrations were highest (early spring runoff), generally DOC concentrations were lowest. DOC concentrations increased as metal concentrations decreased during later spring runoff. The results show the temporal variability in metals bioavailability and demonstrate the importance of accounting for temporal variability in surface water sampling design and toxicity evaluations.

  14. Fingerprinting two metal contaminants in streams with Cu isotopes near the Dexing Mine, China

    Song, Shiming [Chinese Geological Survey, Nanjing Center, Nanjing (China)]; Mathur, Ryan [Department of Geology, Juniata College, Huntingdon, PA (United States)]; Ruiz, Joaquin [Department of Geosciences, University of Arizona, Tucson, AZ (United States)]; Chen, Dandan [Chinese Geological Survey, Nanjing Center, Nanjing (China)]; Allin, Nicholas [Department of Geology, Juniata College, Huntingdon, PA (United States)]; Guo, Kunyi; Kang, Wenkai [Chinese Geological Survey, Nanjing Center, Nanjing (China)]


    Transition metal isotope signatures are becoming useful for fingerprinting sources in surface waters. This study explored the use of Cu isotope values to trace dissolved metal contaminants in stream water throughout a watershed affected by mining by-products of the Dexing Mine, the largest porphyry Cu operation in Asia. Cu isotope values of stream water were compared to potential mineral sources of Cu in the mining operation, and to proximity to the known Cu sources. The first mineral source, chalcopyrite (CuFeS₂), has a ‘tight’ cluster of Cu isotope values (−0.15‰ to +1.65‰; +0.37 ± 0.6‰, 1σ, n = 10), and the second mineral source, pyrite (FeS₂), has a much larger range of Cu isotope values (−4‰ to +11.9‰; 2.7 ± 4.3‰, 1σ, n = 16). Dissolved Cu isotope values of stream water indicated metal derived from either chalcopyrite or pyrite. Above known Cu mineralization, stream waters are approximately +1.5‰ greater than the average chalcopyrite and are interpreted as derived from weathering of chalcopyrite. In contrast, dissolved Cu in stream water emanating from tailings piles had Cu isotope values similar to or greater than those of pyrite, a common mineral in the tailings (> +6‰). These values are interpreted as sourced from the tailings, even in solutions that possess significantly lower concentrations of Cu (< 0.05 ppm). Elevated Cu isotope values were also found in two soil and two tailings samples (δ⁶⁵Cu ranging from +2 to +5‰). These data point to pyrite in the tailings as the mineral source for the elevated Cu isotope values. Therefore, waters emanating from a clearly contaminated drainage possess different Cu isotope values, permitting the discrimination of Cu derived from chalcopyrite and pyrite in solution. The data demonstrate the utility of Cu isotopic values in waters, minerals, and soils to fingerprint metallic contamination for environmental problems.

  15. Estimating benthic secondary production from aquatic insect emergence in streams affected by mountaintop removal coal mining, West Virginia USA

    Mountaintop removal and valley fill (MTR/VF) coal mining recontours the Appalachian landscape, buries headwater stream channels, and degrades downstream water quality. The goal of this study was to compare benthic community production estimates, based on seasonal insect emergen...

  16. A Data Mining Algorithm Based on Distributed Decision-Tree in Grid Computing Environments

    Zhongda Lin; Yanfeng Hong; Kun Deng


    Recently, research on distributed data mining using grids has become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on the grid.

  17. Data mining algorithm for discovering matrix association regions (MARs)

    Singh, Gautam B.; Krawetz, Shephan A.


    Lately, there has been considerable interest in applying Data Mining techniques to scientific and data analysis problems in bioinformatics. Data mining research is being fueled by novel application areas that are helping the development of newer applied algorithms in the field of bioinformatics, an emerging discipline representing the integration of biological and information sciences. This is a shift in paradigm from the earlier and the continuing data mining efforts in marketing research and support for business intelligence. The problem described in this paper is along a new dimension in DNA sequence analysis research and supplements the previously studied stochastic models for evolution and variability. The discovery of novel patterns from genetic databases as described is quite significant because biological patterns play an important role in a large variety of cellular processes and constitute the basis for gene therapy. Biological databases containing the genetic codes from a wide variety of organisms, including humans, have continued their exponential growth over the last decade. At the time of this writing, the GenBank database contains over 300 million sequences and over 2.5 billion characters of sequenced nucleotides. The focus of this paper is on developing a general data mining algorithm for discovering regions of locus control, i.e. those regions that are instrumental for determining cell type. One such type of element of locus control are the MARs or the Matrix Association Regions. Our limited knowledge about MARs has hampered their detection using classical pattern recognition techniques. Consequently, their detection is formulated by utilizing a statistical interestingness measure derived from a set of empirical features that are known to be associated with MARs. 
    This paper presents a systematic approach for finding associations between such empirical features in genomic sequences, and for utilizing this knowledge to detect biologically interesting

  18. Transport and fate of mercury under different hydrologic regimes in polluted stream in mining area.

    Lin, Yan; Larssen, Thorjørn; Vogt, Rolf D; Feng, Xinbin; Zhang, Hua


    Seepage from Hg mine wastes and calcines contains high concentrations of mercury (Hg). Hg pollution is a major environmental problem in areas with abandoned mercury mines and retorting units. This study evaluates factors, especially the hydrological and sedimentary variables, governing temporal and spatial variation in levels and state of mercury in streams impacted by Hg contaminated runoff. Samples were taken during different flow regimes in the Wanshan Hg mining area in Guizhou Province, China. In their headwaters the sampled streams/rivers pass by several mine wastes and calcines with high concentrations of Hg. Seepage causes serious Hg contamination to the downstream area. Concentrations of Hg in water samples showed significant seasonal variations. Periods of higher flow showed high concentrations of total Hg (THg) in water due to more particles being re-suspended and transported. The concentrations of major anions (e.g., Cl-, F-, NO3- and SO4(2-)) were lower during higher flow due to dilution. Due to both sedimentation of particles and dilution from tributaries the concentration of THg decreased from 2100 ng/L to background levels (< 50 ng/L) within 10 km distance downstream. Sedimentation is the main reason for the fast decrease of the concentration; it accounts for 69% and 60% for higher flow and lower flow regimes respectively in the upper part of the stream. Speciation calculation of the dissolved Hg fraction (DHg) (using Visual MINTEQ) showed that Hg(OH)2 associated with dissolved organic matter is the main form of Hg in dissolved phase in surface waters in Wanshan (over 95%).

  19. Application of Data Mining Algorithm to Recipient of Motorcycle Installment

    Harry Dhika


    Full Text Available The study was conducted in subsidiaries that provide financing services for the purchase of a motorcycle on credit. At the time of applying, consumers enter their personal data. Based on the personal data, it is determined whether the consumer credit application is approved or rejected. From 224 consumer data obtained, it is known that the number of consumers whose applications are approved is 87% or about 217 consumers and consumers whose application is rejected is 16% or as much as 6 consumers. Acceptance of motorcycle financing on credit is modeled by applying the algorithm within CRISP-DM, the industry standard process for data mining. The algorithm used in the decision making is the C4.5 algorithm. The level of accuracy of the results is measured with the Confusion Matrix and the Receiver Operating Characteristic (ROC). Evaluation with the Confusion Matrix is intended to obtain the accuracy, precision, and recall values, while the Receiver Operating Characteristic (ROC) is used to obtain data tables and compare the Area Under Curve (AUC).
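
    The evaluation measures named above follow directly from the four confusion-matrix counts. The sketch below shows the standard definitions; the function name and the labels are hypothetical and do not reproduce the study's 224 records.

```python
def confusion_metrics(actual, predicted, positive):
    """Accuracy, precision and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical credit decisions (illustrative only).
actual = ["approved", "approved", "approved", "rejected", "rejected", "approved"]
predicted = ["approved", "approved", "rejected", "rejected", "approved", "approved"]
scores = confusion_metrics(actual, predicted, positive="approved")
```

    With a heavily imbalanced class split, accuracy alone can look high even when the minority class is poorly predicted, which is one reason precision and recall are reported alongside it.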

  20. A Novel Incremental Mining Algorithm of Frequent Patterns for Web Usage Mining

    DONG Yihong; ZHUANG Yueting; TAI Xiaoying


    Because a data warehouse changes frequently, incremental data can make previously mined knowledge obsolete. In order to maintain the discovered knowledge and patterns dynamically, this study presents IPARUC, a novel algorithm for updating global frequent patterns. First, a rapid clustering method is introduced in IPARUC to divide the database into n parts, such that the data within the same part are similar. Then, during insertion, the nodes in the tree are adjusted dynamically by "pruning and laying back" to keep the frequency descending order, so that they can be shared in a near-optimal way. Finally, local frequent itemsets mined from each local dataset are merged into global frequent itemsets. The results of the experimental study are very encouraging: in the experiments, IPARUC is more effective and efficient than the two methods it is compared with. Furthermore, it has significant application potential in a prototype Web log analyzer for web usage mining, which can help to discover useful knowledge effectively and even help managers make decisions.

  1. Roles of Benthic Algae in the Structure, Function, and Assessment of Stream Ecosystems Affected by Acid Mine Drainage

    Tens of thousands of stream kilometers around the world are degraded by a legacy of environmental impacts and acid mine drainage (AMD) caused by abandoned underground and surface mines, piles of discarded coal wastes, and tailings. Increased acidity, high concentrations of metals...


    S. Vasukipriya


    Full Text Available The information on the World Wide Web grows at an explosive rate. Societies are relying more on the Web for their miscellaneous information needs. Recommendation systems are active information filtering systems that attempt to present information items such as movies, music, images, book recommendations, tag recommendations, query suggestions, etc., to the users. Various kinds of databases are used for the recommendations; fundamentally, these databases can be modeled in the form of many types of graphs. This paper aims to provide a general framework for an effective DR (Recommendations by Diffusion) algorithm for web graph mining. It first introduces a novel graph diffusion model based on heat diffusion. This method can be applied to both undirected graphs and directed graphs. It then shows how to convert different Web data sources into the correct graphs for the models.
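
    The abstract does not specify the diffusion model in detail. As an assumption-laden sketch of the general idea, the code below spreads heat from a source node over an undirected graph with explicit time stepping, where each node exchanges heat with its neighbours in proportion to the difference between them; nodes that end up warmer would be ranked higher as recommendations. The function name, the `alpha` and `steps` parameters, and the toy graph are all hypothetical.

```python
def diffuse_heat(adjacency, initial_heat, alpha=1.0, steps=100):
    """Explicit Euler integration of heat diffusion on an undirected graph.

    adjacency: dict mapping each node to a list of neighbours (symmetric).
    alpha: total diffusion time; steps: number of time steps.
    """
    heat = {node: initial_heat.get(node, 0.0) for node in adjacency}
    dt = alpha / steps
    for _ in range(steps):
        updated = {}
        for node, neighbours in adjacency.items():
            # Net inflow is the sum of heat differences to each neighbour.
            flow = sum(heat[nb] - heat[node] for nb in neighbours)
            updated[node] = heat[node] + dt * flow
        heat = updated
    return heat


# Toy path graph a - b - c - d with all heat initially on node "a".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
heat = diffuse_heat(graph, {"a": 1.0})
ranking = sorted((n for n in heat if n != "a"), key=heat.get, reverse=True)
```

    The step size `alpha / steps` should stay well below the reciprocal of the largest node degree for the explicit scheme to remain stable; a directed variant would need a different flow rule.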

  4. Improved heuristic algorithm for selection of tear streams and precedence ordering in process flowsheeting computations

    Kristian M. Lien


    Full Text Available This paper presents a new algorithm based on the heuristic tearing algorithm by Gundersen and Hertzberg (1983). The basic idea in both the original and the proposed algorithm is sequential tearing of strong components which have been identified by an algorithm proposed by Tarjan (1972). The new algorithm has two alternative options for selection of tear streams, and alternative precedence orderings may be generated for the selected set of tear streams. The algorithm has been tested on several problems. It has identified minimal (optimal) tear sets for all of them, including the four problems presented in Gundersen and Hertzberg (1983) where the original algorithm could not find a minimal tear set. A Lisp implementation of the algorithm is described, and example problems are presented.
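
    Only the strong-component step is sketched here, using Tarjan's strongly connected components algorithm on a flowsheet represented as a directed graph of units connected by streams. The tear-stream heuristic itself is not described in the abstract and is not reproduced; the function name and the example flowsheet are hypothetical.

```python
def strong_components(graph):
    """Tarjan's algorithm: list of strongly connected components.

    graph: dict mapping each unit to the list of units its outlet streams feed.
    """
    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(node):
        index_of[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index_of:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index_of[succ])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


# Hypothetical flowsheet with one recycle loop (separator back to mixer).
flowsheet = {
    "feed": ["mixer"],
    "mixer": ["reactor"],
    "reactor": ["separator"],
    "separator": ["mixer", "product"],
    "product": [],
}
components = strong_components(flowsheet)
recycle_loops = [sorted(c) for c in components if len(c) > 1]
```

    Components with more than one unit contain recycle loops and are the ones that need tear streams; single-unit components can be computed directly in precedence order. The recursive form is adequate for small flowsheets but would hit Python's recursion limit on very deep graphs.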

  5. The ClusTree : indexing micro-clusters for anytime stream mining

    Kranen, Philipp; Assent, Ira; Baldauf, Corinna


    Clustering streaming data requires algorithms that are capable of updating clustering results for the incoming data. As data is constantly arriving, time for processing is limited. Clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival...

  6. Geochemical Characterization of Mine Waste, Mine Drainage, and Stream Sediments at the Pike Hill Copper Mine Superfund Site, Orange County, Vermont

    Piatak, Nadine M.; Seal, Robert R.; Hammarstrom, Jane M.; Kiah, Richard G.; Deacon, Jeffrey R.; Adams, Monique; Anthony, Michael W.; Briggs, Paul H.; Jackson, John C.


    The Pike Hill Copper Mine Superfund Site in the Vermont copper belt consists of the abandoned Smith, Eureka, and Union mines, all of which exploited Besshi-type massive sulfide deposits. The site was listed on the U.S. Environmental Protection Agency (USEPA) National Priorities List in 2004 due to aquatic ecosystem impacts. This study was intended to be a precursor to a formal remedial investigation by the USEPA, and it focused on the characterization of mine waste, mine drainage, and stream sediments. A related study investigated the effects of the mine drainage on downstream surface waters. The potential for mine waste and drainage to have an adverse impact on aquatic ecosystems, on drinking-water supplies, and on human health was assessed on the basis of mineralogy, chemical concentrations, acid generation, and potential for metals to be leached from mine waste and soils. The results were compared to those from analyses of other Vermont copper belt Superfund sites, the Elizabeth Mine and Ely Copper Mine, to evaluate if the waste material at the Pike Hill Copper Mine was sufficiently similar to that of the other mine sites that USEPA can streamline the evaluation of remediation technologies. Mine-waste samples consisted of oxidized and unoxidized sulfidic ore and waste rock, and flotation-mill tailings. These samples contained as much as 16 weight percent sulfides that included chalcopyrite, pyrite, pyrrhotite, and sphalerite. During oxidation, sulfides weather and may release potentially toxic trace elements and may produce acid. In addition, soluble efflorescent sulfate salts were identified at the mines; during rain events, the dissolution of these salts contributes acid and metals to receiving waters. Mine waste contained concentrations of cadmium, copper, and iron that exceeded USEPA Preliminary Remediation Goals. The concentrations of selenium in mine waste were higher than the average composition of eastern United States soils. Most mine waste was

  7. Transport and fate of mercury under different hydrologic regimes in polluted stream in mining area

    Yan Lin; Thorjørn Larssen; Rolf D. Vogt; Xinbin Feng; Hua Zhang


    Seepage from Hg mine wastes and calcines contains high concentrations of mercury (Hg). Hg pollution is a major environmental problem in areas with abandoned mercury mines and retorting units. This study evaluates factors, especially the hydrological and sedimentary variables, governing temporal and spatial variation in levels and state of mercury in streams impacted by Hg contaminated runoff. Samples were taken during different flow regimes in the Wanshan Hg mining area in Guizhou Province, China. In their headwaters the sampled streams/rivers pass by several mine wastes and calcines with high concentrations of Hg. Seepage causes serious Hg contamination to the downstream area. Concentrations of Hg in water samples showed significant seasonal variations. Periods of higher flow showed high concentrations of total Hg (THg) in water due to more particles being re-suspended and transported. The concentrations of major anions (e.g., Cl-, F-, NO3- and SO4(2-)) were lower during higher flow due to dilution. Due to both sedimentation of particles and dilution from tributaries the concentration of THg decreased from 2100 ng/L to background levels (< 50 ng/L) within 10 km distance downstream. Sedimentation is the main reason for the fast decrease of the concentration; it accounts for 69% and 60% for higher flow and lower flow regimes respectively in the upper part of the stream. Speciation calculation of the dissolved Hg fraction (DHg) (using Visual MINTEQ) showed that Hg(OH)2 associated with dissolved organic matter is the main form of Hg in dissolved phase in surface waters in Wanshan (over 95%).

  8. Prediction of fish and sediment mercury in streams using landscape variables and historical mining

    Alpers, Charles N.; Yee, Julie L.; Ackerman, Josh; Orlando, James; Slotton, Darrell G.; Marvin-DiPasquale, Mark C.


    Widespread mercury (Hg) contamination of aquatic systems in the Sierra Nevada of California, U.S., is associated with historical use to enhance gold (Au) recovery by amalgamation. In areas affected by historical Au mining operations, including the western slope of the Sierra Nevada and downstream areas in northern California, such as San Francisco Bay and the Sacramento River–San Joaquin River Delta, microbial conversion of Hg to methylmercury (MeHg) leads to bioaccumulation of MeHg in food webs, and increased risks to humans and wildlife. This study focused on developing a predictive model for THg in stream fish tissue based on geospatial data, including land use/land cover data, and the distribution of legacy Au mines. Data on total mercury (THg) and MeHg concentrations in fish tissue and streambed sediment collected during 1980–2012 from stream sites in the Sierra Nevada, California were combined with geospatial data to estimate fish THg concentrations across the landscape. THg concentrations of five fish species (Brown Trout, Rainbow Trout, Sacramento Pikeminnow, Sacramento Sucker, and Smallmouth Bass) within stream sections were predicted using multi-model inference based on Akaike Information Criteria, using geospatial data for mining history and landscape characteristics as well as fish species and length (r2 = 0.61, p size resulted in an improved fit (r2 = 0.63, p < 0.001). These models can be used to estimate THg concentrations in stream fish based on landscape variables in the Sierra Nevada in areas where direct measurements of THg concentration in fish are unavailable.

  9. Distribution, speciation, and transport of mercury in stream-sediment, stream-water, and fish collected near abandoned mercury mines in southwestern Alaska, USA

    Gray, J.E.; Theodorakos, P.M.; Bailey, E.A.; Turner, R.R.


    Concentrations of total Hg, Hg (II), and methylmercury were measured in stream-sediment, stream-water, and fish collected downstream from abandoned mercury mines in south-western Alaska to evaluate environmental effects to surrounding ecosystems. These mines are found in a broad belt covering several tens of thousands of square kilometers, primarily in the Kuskokwim River basin. Mercury ore is dominantly cinnabar (HgS), but elemental mercury (Hg(o)) is present in ore at one mine and near retorts and in streams at several mine sites. Approximately 1400 t of mercury have been produced from the region, which is approximately 99% of all mercury produced from Alaska. These mines are not presently operating because of low prices and low demand for mercury. Stream-sediment samples collected downstream from the mines contain as much as 5500 µg/g Hg. Such high Hg concentrations are related to the abundance of cinnabar, which is highly resistant to physical and chemical weathering, and is visible in streams below mine sites. Although total Hg concentrations in the stream-sediment samples collected near mines are high, Hg speciation data indicate that concentrations of Hg (II) are generally less than 5%, and methylmercury concentrations are less than 1% of the total Hg. Stream waters below the mines are neutral to slightly alkaline (pH 6.8-8.4), which is a result of the insolubility of cinnabar and the lack of acid-generating minerals such as pyrite in the deposits. Unfiltered stream-water samples collected below the mines generally contain 500-2500 ng/l Hg, whereas corresponding stream-water samples filtered through a 0.45-µm membrane contain less than 50 ng/l Hg. These stream-water results indicate that most of the Hg transported downstream from the mines is as finely suspended material rather than dissolved Hg.
Mercury speciation data show that concentrations of Hg (II) and methylmercury in stream-water samples are typically less than 22 ng/l, and generally less than

  10. ParaStream: A parallel streaming Delaunay triangulation algorithm for LiDAR points on multicore architectures

    Wu, Huayi; Guan, Xuefeng; Gong, Jianya


    This paper presents a robust parallel Delaunay triangulation algorithm called ParaStream for processing billions of points from nonoverlapped block LiDAR files. The algorithm targets ubiquitous multicore architectures. ParaStream integrates streaming computation with a traditional divide-and-conquer scheme, in which additional erase steps are implemented to reduce the runtime memory footprint. Furthermore, a kd-tree-based dynamic schedule strategy is also proposed to distribute triangulation and merging work onto the processor cores for improved load balance. ParaStream exploits most of the computing power of multicore platforms through parallel computing, demonstrating qualities of high data throughput as well as a low memory footprint. Experiments on a 2-Way-Quad-Core Intel Xeon platform show that ParaStream can triangulate approximately one billion LiDAR points (16.4 GB) in about 16 min with only 600 MB physical memory. The total speedup (including I/O time) is about 6.62 with 8 concurrent threads.
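    The figures reported above can be turned into two derived quantities with simple arithmetic. The following is a minimal sketch, not part of the paper; the function names are hypothetical, and only the numbers (speedup 6.62, 8 threads, about one billion points, about 16 min) come from the abstract.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / threads


def throughput(points, seconds):
    """Average number of points processed per second."""
    return points / seconds


# Values quoted in the abstract (both are stated there as approximate).
eff = parallel_efficiency(6.62, 8)
rate = throughput(1_000_000_000, 16 * 60)
print(f"efficiency: {eff:.2f}, throughput: {rate:,.0f} points/s")
```

    Under these assumptions the reported speedup corresponds to roughly 83% parallel efficiency and on the order of one million points per second, I/O included.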

  11. A Heuristic Clustering Algorithm for Mining Communities in Signed Networks

    Bo Yang; Da-You Liu


    Signed networks are an important kind of complex network, which include both positive relations and negative relations. Communities of a signed network are defined as the groups of vertices within which positive relations are dense and between which negative relations are also dense. Being able to identify communities of signed networks is helpful for analysis of such networks. Hitherto many algorithms for detecting network communities have been developed. However, most of them are designed exclusively for networks including only positive relations and are not suitable for signed networks. So the problem of mining communities of signed networks quickly and correctly has not been solved satisfactorily. In this paper, we propose a heuristic algorithm to address this issue. Compared with major existing methods, our approach has three distinct features. First, it is very fast, with a roughly linear time with respect to network size. Second, it exhibits a good clustering capability and in particular can work well with complex networks without well-defined community structures. Finally, it is insensitive to its built-in parameters and requires no prior knowledge.

  12. A genetic algorithm approach to recognition and data mining

    Punch, W.F.; Goodman, E.D.; Min, Pei [Michigan State Univ., East Lansing, MI (United States)] [and others]


    We review here our use of genetic algorithm (GA) and genetic programming (GP) techniques to perform "data mining," the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. Our first experiments concentrated on the use of a K-nearest neighbor algorithm in combination with a GA. The GA selected weights for each feature so as to optimize knn classification based on a linear combination of features. This combined GA-knn approach was successfully applied to both generated and real-world data. We later extended this work by substituting a GP for the GA. The GP-knn could not only optimize data classification via linear combinations of features but also determine functional relationships among the features. This allowed for improved performance and new information on important relationships among features. We review the effectiveness of the overall approach on examples from biology and compare the effectiveness of the GA and GP.
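    The GA-knn idea described above, a GA searching for per-feature weights that improve k-nearest-neighbor classification, can be illustrated roughly as follows. This is a toy sketch under stated assumptions, not the authors' implementation: the fitness is leave-one-out accuracy, the GA uses truncation selection, one-point crossover and Gaussian mutation, and all names and the toy data are hypothetical.

```python
import random


def weighted_knn_accuracy(weights, X, y, k=3):
    """Leave-one-out accuracy of k-NN with a per-feature weighted squared distance."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        dists = []
        for j, (xj, yj) in enumerate(zip(X, y)):
            if i == j:
                continue
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj))
            dists.append((d, yj))
        dists.sort(key=lambda t: t[0])
        votes = [label for _, label in dists[:k]]
        if max(set(votes), key=votes.count) == yi:
            correct += 1
    return correct / len(X)


def evolve_weights(X, y, pop_size=12, generations=15, seed=0):
    """Toy GA: keep the better half, refill with crossover plus mutation."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: weighted_knn_accuracy(w, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            idx = rng.randrange(n)
            # Mutate one weight and clamp it to [0, 1].
            child[idx] = min(1.0, max(0.0, child[idx] + rng.gauss(0, 0.2)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: weighted_knn_accuracy(w, X, y))


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
X = [[0.0, 0], [0.1, 10], [0.2, 20], [0.3, 30],
     [1.0, 1], [1.1, 11], [1.2, 21], [1.3, 31]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best = evolve_weights(X, y)
```

    On this toy data, weights that ignore the noisy feature classify every point correctly, while weights that rely only on the noisy feature misclassify every point; that gap is the signal the GA exploits.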

  13. Periphyton communities in New Zealand streams impacted by acid mine drainage

    Bray, J.P.; Broady, P.A.; Niyogi, D.K.; Harding, J.S. [University of Canterbury, Christchurch (New Zealand). School for Biological Science]


    Discharges from historic and current coal mines frequently generate waters low in pH (< 3) and high in heavy metals (e.g., Fe, Al), and cover streambeds in metal precipitates. The present study investigated periphyton communities at 52 stream sites on the West Coast, South Island, New Zealand, representing a range of impacts from acid mine drainage (AMD). Taxonomic richness was negatively related to acidity and metal oxides and biomass was negatively correlated with metal oxides, but positively related to acidity. Streams with low pH (< 3.5) had low periphyton richness (14 taxa across all sites) and were dominated by Klebsormidium acidophilum, Navicula cincta and Euglena mutabilis. As pH increased, so did taxonomic richness while community dominance decreased and community composition became more variable. Canonical correspondence analyses of algal assemblages revealed patterns influenced by pH. These findings indicate that streams affected by AMD possess a predictable assemblage composition of algal species that can tolerate the extreme water chemistry and substrate conditions. The predictability of algal communities declines with decreasing stress, as other abiotic and biotic factors become increasingly important.

  14. Anti Interference Mining of Web Data Stream Based on Kalman Filtering



    An anti-interference mining algorithm for massive Web data streams based on variable-dimension Kalman filtering is proposed. An information model of the massive data stream in the Web environment and a noise interference model are constructed. Combined with modern signal processing methods, a variable-dimension Kalman filtering algorithm is designed to filter and preprocess the massive data stream signal. The massive Web data stream is mapped to a set of nonlinear wideband frequency-modulated (FM) signal models, and a signal detection algorithm is used to achieve anti-interference mining of massive Web data. Simulation results show that the algorithm achieves high data detection precision and accurate mining performance, with strong anti-interference capability and robustness.
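    As a point of reference for the filtering step, the following is a minimal scalar Kalman filter with a constant-state model. It is only a sketch of the basic predict/update recursion; the paper's variable-dimension filter and its signal model are not reproduced here, and the parameter names (q for process noise variance, r for measurement noise variance) are assumptions.

```python
def kalman_1d(measurements, q=1e-4, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant signal observed in noise.

    q: process noise variance, r: measurement noise variance,
    x0, p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: state assumed constant, uncertainty grows by q
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update the estimate with the innovation
        p = (1.0 - k) * p      # update the estimate variance
        estimates.append(x)
    return estimates
```

    With q = 0, p0 = 1 and r = 1 the recursion reduces to a running average that also counts the initial guess x0 as one observation, which gives a convenient sanity check.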

  15. An initial peer configuration algorithm for multi-streaming peer-to-peer networks

    Ishii, Tomoyuki


    The growth of Internet technology enables us to use network applications for streaming audio and video. In particular, real-time streaming services using peer-to-peer (P2P) technology are currently emerging. An important issue in P2P streaming is how to construct a logical network (overlay network) on a physical network (IP network). In this paper, we propose an initial peer configuration algorithm for a multi-streaming peer-to-peer network. The proposed algorithm is based on a mesh-pull approach where any node has multiple parent and child nodes as neighboring nodes, and content transmitted between these neighboring nodes depends on their parent-child relationships. Our simulation experiments show that the proposed algorithm improves the number of joining nodes and the traffic load.

  16. Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets

    Noha Negm


    Full Text Available Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both availability and reliability of text mining applications. Some of the recent algorithms address the problem of high dimensionality of the text by using frequent termsets for clustering. Despite the drawbacks of the Apriori algorithm, it is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) instead of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets by building the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the Multi-Tire technique is utilized in the proposed hashing algorithm. Based on the generated frequent termsets, the documents are partitioned, and clustering occurs by grouping the partitions through the descriptive keywords. By using the MTHFT algorithm, the scanning and computational costs are reduced; moreover, performance is considerably increased and the clustering process is sped up. The CWDHFT approach improved accuracy, scalability and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.

  17. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun


    Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rules explosion is a serious problem which causes great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners and so on. Therefore, this paper focuses on a new data-mining algorithm, called ARGSA (Association Rules based on an improved Genetic Simulated Annealing Algorithm), which combines the advantages of the genetic algorithm and the simulated annealing algorithm to mine association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Annealing Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  18. Tuning, Diagnostics & Data Preparation for Generalized Linear Models Supervised Algorithm in Data Mining Technologies

    Sachin Bhaskar


    Full Text Available Data mining techniques are the result of a long process of research and product development. Data mining searches large amounts of data to find trends and patterns that go beyond simple analysis. Complex mathematical algorithms are used to segment the data and to evaluate the probability of future events. Each data mining model is produced by a specific algorithm, and some data mining problems are best solved by using more than one algorithm. Data mining technologies can be used through Oracle. The Generalized Linear Models (GLM) algorithm is used in the Regression and Classification Oracle Data Mining functions. GLM is one of the popular statistical techniques for linear modelling. Oracle Data Mining implements GLM for regression and binary classification. GLM provides row diagnostics as well as model statistics and extensive coefficient statistics. It also supports confidence bounds. This paper outlines and analyses the GLM algorithm as a guide to understanding the tuning, diagnostics and data preparation process and the importance of the Regression and Classification supervised Oracle Data Mining functions, which are utilized in marketing, time series prediction, financial forecasting, overall business planning, trend analysis, environmental modelling, biomedical and drug response modelling, etc.

  19. The Books Recommend Service System Based on Improved Algorithm for Mining Association Rules



    The Apriori algorithm is a classical method of association rule mining. Based on an analysis of this theory, the paper provides an improved Apriori algorithm. The proposed algorithm combines a hash table technique with reduction of candidate itemsets to enhance the usage efficiency of resources as well as the individualized service of the data library.
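    One common way to combine a hash table with candidate reduction in Apriori is to hash item pairs into buckets during the first scan and discard any candidate pair whose bucket count is below the support threshold. The sketch below illustrates that general idea for 2-itemsets only; it is an assumption about what such an improvement can look like, not the paper's algorithm, and the function name and bucket count are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_hashed(transactions, min_support, n_buckets=7):
    """Frequent 2-itemsets via Apriori with hash-bucket pruning of candidates."""
    item_counts = Counter()
    buckets = [0] * n_buckets

    def bucket(pair):
        return hash(pair) % n_buckets

    # Pass 1: count single items and hash every pair into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[bucket(pair)] += 1

    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs of frequent items that fall in a frequent bucket.
    # A bucket count is an upper bound on the pair count, so no frequent pair is lost.
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t) & frequent_items)
        for pair in combinations(items, 2):
            if buckets[bucket(pair)] >= min_support:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
pairs = frequent_pairs_hashed(baskets, min_support=3)
```

    The bucket filter only removes candidates that cannot be frequent, so the result does not depend on the hash function; a poor hash merely prunes less.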

  20. Mining Interesting Positive and Negative Association Rule Based on Improved Genetic Algorithm (MIPNAR_GA)

    Nikky Suryawanshi Rai


    Full Text Available Association rule mining is a very efficient technique for finding strong relations between correlated data. The correlation of data gives a meaningful extraction process. For the mining of positive and negative rules, a variety of algorithms are used, such as the Apriori algorithm and tree-based algorithms. A number of algorithms perform well but produce a large number of negative association rules and also suffer from the multi-scan problem. The idea of this paper is to eliminate these problems and reduce the large number of negative rules. Hence we propose an improved approach to mine interesting positive and negative rules based on genetic and MLMS algorithms. In this method we use a multi-level multiple support of the data table as 0 and 1. The divided process reduces the scanning time of the database. The proposed algorithm is a combination of MLMS and a genetic algorithm. This paper proposes a new algorithm (MIPNAR_GA) for mining interesting positive and negative rules from frequent and infrequent pattern sets. The algorithm is accomplished in three phases: (a) extract frequent and infrequent pattern sets by using the Apriori method; (b) efficiently generate positive and negative rules; (c) prune redundant rules by applying interestingness measures. Rule optimization is performed by the genetic algorithm, and the algorithm was evaluated on real-world datasets such as heart disease data and some standard data from the UCI machine learning repository.
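    The measures behind positive rules (A -> B) and negative rules (A -> not B) can be computed directly from support counts, since supp(A and not B) = supp(A) - supp(A and B). The sketch below shows these standard definitions only; it is not MIPNAR_GA, and the function name and toy transactions are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift for A -> B, plus confidence of A -> not B."""
    n = len(transactions)
    a, b = set(antecedent), set(consequent)
    supp_a = sum(a <= set(t) for t in transactions) / n
    supp_b = sum(b <= set(t) for t in transactions) / n
    supp_ab = sum((a | b) <= set(t) for t in transactions) / n
    if supp_a == 0 or supp_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    conf_pos = supp_ab / supp_a
    return {
        "support": supp_ab,
        "confidence": conf_pos,
        # Transactions containing A but not B: supp(A) - supp(A and B).
        "neg_confidence": (supp_a - supp_ab) / supp_a,
        # Lift below 1 hints at a negative association between A and B.
        "lift": conf_pos / supp_b,
    }


data = [{"tea", "sugar"}, {"tea"}, {"coffee", "sugar"}, {"tea", "sugar"}]
m = rule_measures(data, {"tea"}, {"sugar"})
```

    The two confidences always sum to one, so a rule miner only needs the positive counts to evaluate both rule types.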

  1. Algorithms for Deterministic Call Admission Control of Pre-stored VBR Video Streams

    Christos Tryfonas


    Full Text Available We examine the problem of accepting a new request for a pre-stored VBR video stream that has been smoothed using any of the smoothing algorithms found in the literature. The output of these algorithms is a piecewise constant-rate schedule for a Variable Bit-Rate (VBR) stream. The schedule guarantees that the decoder buffer does not overflow or underflow. The problem addressed in this paper is the determination of the minimal time displacement of each new requested VBR stream so that it can be accommodated by the network and/or the video server without overbooking the committed traffic. We prove that this call-admission control problem for multiple requested VBR streams is NP-complete and inapproximable within a constant factor, by reducing it from the VERTEX COLOR problem. We also present a deterministic morphology-sensitive algorithm that calculates the minimal time displacement of a VBR stream request. The complexity of the proposed algorithm along with the experimental results we provide indicate that the proposed algorithm is suitable for real-time determination of the time displacement parameter during the call admission phase.
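    For a single new request, the minimal time displacement can be found by brute force: try each shift in turn and accept the first one at which the new stream's rate plus the committed traffic never exceeds the link capacity. The sketch below assumes schedules are given as one rate per time slot (a discretized piecewise constant-rate schedule); it is a naive illustration, not the paper's morphology-sensitive algorithm, and all names are hypothetical.

```python
def minimal_displacement(committed, new_stream, capacity):
    """Smallest shift (in slots) at which new_stream fits on top of committed traffic.

    committed:  aggregate committed rate per time slot (zero after the list ends)
    new_stream: per-slot rates of the requested stream
    Returns None if the stream's peak rate alone exceeds the capacity.
    """
    if max(new_stream, default=0) > capacity:
        return None
    # Shifting past the end of the committed traffic always succeeds.
    for shift in range(len(committed) + 1):
        fits = True
        for t, rate in enumerate(new_stream):
            slot = shift + t
            base = committed[slot] if slot < len(committed) else 0
            if base + rate > capacity:
                fits = False
                break
        if fits:
            return shift
    return None


committed = [8, 8, 3, 3, 3, 8]
shift = minimal_displacement(committed, [5, 5, 2], capacity=10)
```

    This exhaustive scan costs time proportional to the product of the two schedule lengths, which is one reason a schedule-aware algorithm is attractive for real-time admission decisions.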

  2. Analyzing Conductivity Profiles in Stream Waters Influenced by Mine Water Discharges

    Räsänen, Teemu; Hämäläinen, Emmy; Hämäläinen, Matias; Turunen, Kaisa; Pajula, Pasi; Backnäs, Soile


    Conductivity is useful as a general measure of stream water quality. Each stream tends to have a fairly constant range of conductivity that can be used as a baseline for comparing and detecting the influence of contaminant sources. Conductivity in natural streams and rivers is affected primarily by the geology of the watershed. Thus discharges from ditches and streams affect not only the flow rate in the river but also the water quality and conductivity. In natural stream waters, the depth and the shape of the river channel change constantly, which also changes the water flow. Thus, accurate measurement of conductivity or other water quality indicators is difficult. Reliable measurements are needed in order to have a holistic view of the amount of contaminants, sources of discharges and seasonal variation in mixing and dilution processes controlling the conductivity changes in a river system. We tested the utility of the CastAway-CTD measuring device (SonTek Inc) to indicate the influence of mine waters as well as mixing and dilution occurring in the recipient river affected by treated dewatering and process effluent water discharges from a Finnish gold mine. The CastAway-CTD measuring device is small, rugged and designed for profiling depths of up to 100 m. The device measures temperature, salinity, conductivity and speed of sound with a 5 Hz response time. It also has a built-in GPS which produces location information. CTD casts are normally used to produce vertical conductivity profiles for rather deep waters like seas or lakes. We made multiple seasonal CastAway-CTD measurements during 2013 and 2014 and produced scaled vertical and horizontal profiles of conductivity and water temperature at the river. CastAway-CTD measurement pinpoints how possible contaminants behave and where they are located in stream waters. The conductivity profiles measured by the CastAway-CTD device show the variation in maximum conductivity values vertically in measuring locations and horizontally in measured cross

  3. Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R

    Michael Hahsler


    Full Text Available In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.

  4. Variations in heavy metal contamination of stream water and groundwater affected by an abandoned lead-zinc mine in Korea.

    Lee, Jin-Yong; Choi, Jung-Chan; Lee, Kang-Kun


    This study evaluated variations in heavy metal contamination of stream waters and groundwaters affected by an abandoned lead-zinc mine, where a rockfill dam for water storage will be built 11 km downstream. For these purposes, a total of 10 rounds of stream and groundwater samplings and subsequent chemical analyses were performed during 2002-2003. Results of an exploratory investigation of stream waters in 2000 indicated substantial contamination with heavy metals including zinc (Zn), iron (Fe) and arsenic (As) for at least 6 km downstream from the mine. Stream waters near the mine showed metal contamination as high as arsenic (As) 8,923 µg/L, copper (Cu) 616 µg/L, cadmium (Cd) 223 µg/L and lead (Pb) 10,590 µg/L, which greatly exceeded the Korean stream water guidelines. Remediation focused on the mine tailing piles largely improved the stream water qualities. However, there have still been quality problems for the waters containing relatively high concentrations of As (6-174 µg/L), Cd (1-46 µg/L) and Pb (2-26 µg/L). Rainfall infiltration into the mine tailing piles resulted in an increase of heavy metals in the stream waters due to direct discharge of waste effluent, while dilution of the contaminated stream waters improved the water quality due to mixing with metal-free rain waters. Levels of As, Cu and chromium (Cr) largely decreased after heavy rain but that of Pb was rather elevated. The stream waters were characterized by high concentrations of calcium (Ca) and sulfate (SO4), which were derived from dissolution and leaching of carbonate and sulfide minerals. It was observed that the proportions of Ca and SO4 increased while those of bicarbonate (HCO3) and sodium and potassium (Na+K) decreased after a light rainfall event. Most interestingly, the reverse was generally detected for the groundwaters. The zinc, being the metal mined, was the most dominant heavy metal in the groundwaters (1758

  5. On the Suitability of Genetic-Based Algorithms for Data Mining

    Choenni, R.S.

    Data mining has as goal to extract knowledge from large databases. A database may be considered as a search space consisting of an enormous number of elements, and a mining algorithm as a search strategy. In general, an exhaustive search of the space is infeasible. Therefore, efficient search

  6. Review of samples of tailings, soils and stream sediment adjacent to and downstream from the Ruth Mine, Inyo County, California

    Rytuba, James J.; Kim, Christopher S.; Goldstein, Daniel N.


    The Ruth Mine and mill are located in the western Mojave Desert in Inyo County, California (fig. 1). The mill processed gold-silver (Au-Ag) ores mined from the Ruth Au-Ag deposit, which is adjacent to the mill site. The Ruth Au-Ag deposit is hosted in Mesozoic intrusive rocks and is similar to other Au-Ag deposits in the western Mojave Desert that are associated with Miocene volcanic centers that formed on a basement of Mesozoic granitic rocks (Bateman, 1907; Gardner, 1954; Rytuba, 1996). The volcanic rocks consist of silicic domes and associated flows, pyroclastic rocks, and subvolcanic intrusions (fig. 2) that were emplaced into Mesozoic silicic intrusive rocks (Troxel and Morton, 1962). The Ruth Mine is on Federal land managed by the U.S. Bureau of Land Management (BLM). Tailings from the mine have been eroded and transported downstream into Homewood Canyon and then into Searles Valley (figs. 3, 4, 5, and 6). The BLM provided recreational facilities at the mine site for day-use hikers and restored and maintained the original mine buildings in collaboration with local citizen groups for use by visitors (fig. 7). The BLM requested that the U.S. Geological Survey (USGS), in collaboration with Chapman University, measure arsenic (As) and other geochemical constituents in soils and tailings at the mine site and in stream sediments downstream from the mine in Homewood Canyon and in Searles Valley (fig. 3). The request was made because initial sampling of the site by BLM staff indicated high concentrations of As in tailings and soils adjacent to the Ruth Mine. This report summarizes data obtained from field sampling of mine tailings and soils adjacent to the Ruth Mine and stream sediments downstream from the mine on June 7, 2009. Our results permit a preliminary assessment of the sources of As and associated chemical constituents that could potentially impact humans and biota.

  7. Soil Erosion from Agriculture and Mining: A Threat to Tropical Stream Ecosystems

    Jan H. Mol


    Full Text Available In tropical countries soil erosion is often increased due to high erodibility of geologically old and weathered soils, intensive rainfall, inappropriate soil management, removal of forest vegetation cover, and mining activities. Stream ecosystems draining agricultural or mining areas are often severely impacted by the high loads of eroded material entering the stream channel, increasing turbidity, covering instream habitat and affecting the riparian zone, and thereby modifying habitat and food web structures. The biodiversity is severely threatened by these negative effects as the aquatic and riparian fauna and flora are not adapted to cope with excessive rates of erosion and sedimentation. Eroded material may also be polluted by pesticides or heavy metals that have an aggravating effect on functions and ecosystem services. Loss of superficial material and deepening of erosion gullies impoverish the nutrient and carbon contents of the soils and lower the water tables, causing a “lose-lose” situation for agricultural productivity and environmental integrity. Several examples show how to interrupt this vicious cycle by integrated catchment management and by combining “green” and “hard” engineering for habitat restoration. In this review, we summarize current findings on this issue from tropical countries with a focus on case studies from Suriname and Brazil.

  8. Analysis on different Data mining Techniques and algorithms used in IOT

    Shweta Bhatia


    Full Text Available In this paper, we discuss five functionalities of data mining in IOT that affect performance: data anomaly detection, data clustering, data classification, feature selection and time series prediction. Some important algorithms for each functionality are also reviewed, showing their advantages and limitations, as well as some new algorithms that are research directions. Here we present a knowledge view of data mining in IOT.

  9. Solving for the RC4 stream cipher state register using a genetic algorithm

    Benjamin Ferriman


    Full Text Available The RC4 stream cipher has been shown to be quite resilient to cryptanalysis for the 26 years it has been around. The algorithm is still one of the most widely used methods of encryption over the Internet today, being implemented through the Secure Socket Layer and Transport Layer Security protocols. Genetic algorithms are a sub-class of evolutionary algorithms that have been used to help solve many different problems of optimization in a variety of disciplines. In this paper we will examine the abilities of the genetic algorithm as a tool to help solve the permutation that is stored as the state register of the RC4 stream cipher. Finally, we will show that on average the genetic algorithm can solve 100% of the keystream in 2^121.5 generations.
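    For orientation, the state register mentioned above is the 256-entry permutation S that RC4 builds from the key and then keeps swapping while it emits keystream bytes. The sketch below is a plain implementation of the standard RC4 key-scheduling and output steps, shown only to make the attacked object concrete; it does not reproduce the paper's genetic algorithm, the function names are hypothetical, and RC4 should not be used to protect real data.

```python
def rc4_state(key: bytes) -> list:
    """Key-scheduling algorithm (KSA): returns the initial 256-byte permutation S."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Output generation (PRGA): the first n keystream bytes for the given key."""
    S = rc4_state(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]          # the state register changes on every byte
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def rc4_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call decrypts."""
    stream = rc4_keystream(key, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))
```

    Because the keystream is fully determined by S and the two indices, recovering the permutation is equivalent to predicting all later output, which is why the state register is the natural search target.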

  10. Mining

    Khairullah Khan


    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  11. pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

    Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan


    The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with focus on data visualization, they have limitations such as speed, are rigid and are not available in the open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and link with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus sizes and with compute intensive functions. The pubmed.mineR is available at http://cran.r-project.org/web/packages/pubmed.mineR.



    Grace Lamudur Arta Sihombing


    Full Text Available Confidentiality of data is very important in communication. Many cyber crimes exploit security holes for entry and manipulation. To ensure the security and confidentiality of data, a technique for encrypting data or information, called cryptography, is required. It is one of the components that cannot be ignored in building security. This research aimed to analyze hybrid cryptography that combines a symmetric key, using a stream cipher algorithm, with an asymmetric key, using the RSA (Rivest Shamir Adleman) algorithm. The advantages of hybrid cryptography are the speed of processing data with a symmetric algorithm and the easy transfer of the key with an asymmetric algorithm. This can increase the speed of data transaction processing. The stream cipher algorithm uses the image digital signature as its key, which is secured by the RSA algorithm, so the keys for encryption and decryption are different. The Blum Blum Shub method is used to generate the values p and q for the RSA algorithm, which makes it very difficult for a cryptanalyst to break the key. Analysis of hybrid cryptography combining stream cipher and RSA algorithms with digital signatures as a key indicates that the size of the encrypted file is equal to the size of the plaintext, neither larger nor smaller, so that the time required for the encryption and decryption process is relatively short.
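
    The size claim in the abstract above follows from how a stream cipher works: each plaintext byte is XORed with one keystream byte, so the ciphertext has exactly the same length. The following is a minimal sketch, not the authors' implementation. As an assumption made only for illustration, Blum Blum Shub is used here as the keystream generator (the abstract uses it to derive the RSA values p and q, and takes the stream key from a digital signature image), and the RSA parameters are a toy textbook key that is far too small for real use. All function names are hypothetical.

```python
def bbs_bytes(p, q, seed, n):
    """Return n pseudo-random bytes from a toy Blum Blum Shub generator."""
    # p and q must be primes congruent to 3 mod 4 (tiny values here, illustration only).
    assert p % 4 == 3 and q % 4 == 3
    m = p * q
    x = (seed * seed) % m
    out = bytearray()
    for _ in range(n):
        byte = 0
        for _ in range(8):
            x = (x * x) % m                 # BBS step: x_{i+1} = x_i^2 mod m
            byte = (byte << 1) | (x & 1)    # keep the least significant bit
        out.append(byte)
    return bytes(out)


def xor_stream(data, keystream):
    """XOR data with a keystream of equal length (encrypts and decrypts)."""
    return bytes(d ^ k for d, k in zip(data, keystream))


# Toy RSA key (textbook example): n = 61 * 53, e = 17, d = 2753. Not secure.
RSA_N, RSA_E, RSA_D = 3233, 17, 2753


def wrap_seed(seed):
    """Protect the symmetric seed with the RSA public key."""
    return pow(seed, RSA_E, RSA_N)


def unwrap_seed(wrapped):
    """Recover the symmetric seed with the RSA private key."""
    return pow(wrapped, RSA_D, RSA_N)


if __name__ == "__main__":
    message = b"stream mining"
    seed = 1234                              # hypothetical shared secret, must be < RSA_N
    keystream = bbs_bytes(499, 547, seed, len(message))
    ciphertext = xor_stream(message, keystream)
    # The receiver unwraps the seed, regenerates the keystream, and decrypts.
    recovered_seed = unwrap_seed(wrap_seed(seed))
    plaintext = xor_stream(ciphertext, bbs_bytes(499, 547, recovered_seed, len(ciphertext)))
    print(len(message) == len(ciphertext), plaintext == message)
```

    The hybrid split is visible in the sketch: the bulk data only goes through the cheap XOR, while the asymmetric operation is applied once to the short seed.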

  14. Minería de datos sobre streams de redes sociales, una herramienta al servicio de la Bibliotecología = Data Mining Streams of Social Networks, A Tool to Improve The Library Services

    Sonia Jaramillo Valbuena


    , Facebook, RSS feeds and blogs, generate a large amount of unstructured data streams. They can be applied to problems such as mining topic-specific influence, graph mining, opinion mining and recommender systems, so that libraries can obtain maximum benefit from the use of Information and Communication Technologies. From the perspective of data stream mining, the processing of these streams poses significant challenges. The algorithms must be adapted to problems such as: high arrival rate, memory requirements without restrictions, diverse sources of data and concept-drift. In this work, we explore the current state-of-the-art solutions for mining data streams originating from social networks, specifically Facebook and Twitter. We present a review of the most representative algorithms and how they contribute to knowledge discovery in the area of librarianship. We conclude by presenting some of the problems that are the subject of active research.

  15. A Multiplexing Algorithm of Multiple Elementary Streams Based on Virtual Buffer Control

    YI Zhixiong; ZOU Xuecheng; LIU Weizhong; CHEN Weibing


    The paper presents a prototype of a virtual decoder for the transport stream's system target decoder (T-STD). By connecting the coding model and the decoding model, and feeding the overflow of the decoding buffer back to control coding, we obtain a self-adaptive coding model and propose an algorithm for multiplexing multiple elementary streams into a transport stream based on a virtual buffer control strategy. The transport stream (TS) produced by this method passes the tests of software unzipping and set-top-box (STB) playing, and all of the parameters detected by the code analyzer accord with the MPEG-2 standard. Some problems, namely that playing time becomes longer and that multiple TS streaming does not suit all players, are also analyzed.

  16. Vascular riffle flora of Appalachian streams: the ecology and effects of acid mine drainage on Justicia americana (L.) Vahl

    Koryak, M.; Reilly, R.J.


    Justicia americana is a stout-based colonial plant, abundant in most of the larger, low to moderate gradient streams of the upper Ohio River basin. The distribution of J. americana is related to acid drainage from bituminous coal mining operations in the upper Ohio River drainage basin. Possible fluvial and biological consequences of the colonization or absence of Justicia are considered. Luxuriant growths were noted on gravel bars and riffles of larger, unpolluted streams in the basin. Acid mine drainage severely depresses the growth of the plant, leaving gravel shoals and riffles in the acid streams either barren or dominated by other emergent species. Notable among these new species is Eleocharis acicularis. The elimination of J. americana from suitable habitat adversely affects channel morphology, substrate composition, general aesthetic quality and aquatic stream life in the region. 16 references, 2 figures, 3 tables.

  17. AMJoin: An Advanced Join Algorithm for Multiple Data Streams Using a Bit-Vector Hash Table

    Kwon, Tae-Hyung; Kim, Hyeon-Gyu; Kim, Myoung-Ho; Son, Jin-Hyun

    A multiple stream join is one of the most important but high cost operations in ubiquitous streaming services. In this paper, we propose a newly improved and practical algorithm for joining multiple streams called AMJoin, which improves the multiple join performance by guaranteeing the detection of join failures in constant time. To achieve this goal, we first design a new data structure called BiHT (Bit-vector Hash Table) and present the overall behavior of AMJoin in detail. In addition, we show various experimental results and their analyses for clarifying its efficiency and practicability.
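
    The constant-time failure check described above can be illustrated with a small sketch. It assumes, as a simplification and not as the paper's actual BiHT design, that the table maps each join key to a bit vector with one bit per stream; a key can only produce a join result if every stream's bit is set, which is a single integer comparison. Window expiry and the per-stream tuple storage are deliberately left out, and the class and method names are hypothetical.

```python
from collections import defaultdict


class BitVectorHashTable:
    """Toy bit-vector hash table: one bit per stream for every join key."""

    def __init__(self, num_streams):
        self.full = (1 << num_streams) - 1   # all stream bits set
        self.bits = defaultdict(int)

    def insert(self, stream_id, key):
        """Record that stream `stream_id` has seen a tuple with this join key."""
        self.bits[key] |= 1 << stream_id

    def can_join(self, key):
        """Constant-time test: False means the multi-way join must fail for key."""
        return self.bits.get(key, 0) == self.full


if __name__ == "__main__":
    table = BitVectorHashTable(num_streams=3)
    table.insert(0, "sensor-7")
    table.insert(1, "sensor-7")
    print(table.can_join("sensor-7"))   # stream 2 has not seen the key yet
    table.insert(2, "sensor-7")
    print(table.can_join("sensor-7"))   # every stream has seen the key
```

    The point of the design is that the expensive probing of per-stream hash tables is skipped whenever the bit vector already shows that at least one stream has no matching tuple.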

  18. An efficient reversible privacy-preserving data mining technology over data streams.

    Lin, Chen-Yi; Kao, Yuan-Hung; Lee, Wei-Bin; Chen, Rong-Chang


    With the popularity of smart handheld devices and the emergence of cloud computing, users and companies can save various data, which may contain private data, to the cloud. Topics relating to data security have therefore received much attention. This study focuses on data stream environments and uses the concept of a sliding window to design a reversible privacy-preserving technology to process continuous data in real time, known as a continuous reversible privacy-preserving (CRP) algorithm. Data with CRP algorithm protection can be accurately recovered through a data recovery process. In addition, by using an embedded watermark, the integrity of the data can be verified. The results from the experiments show that, compared to existing algorithms, CRP is better at preserving knowledge and is more effective in terms of reducing information loss and privacy disclosure risk. In addition, it takes far less time for CRP to process continuous data than existing algorithms. As a result, CRP is confirmed as suitable for data stream environments and fulfills the requirements of being lightweight and energy-efficient for smart handheld devices.

  19. Call Admission Control Algorithm for pre-stored VBR video streams

    Tryfonas, Christos; Mehler, Andrew; Skiena, Steven


    We examine the problem of accepting a new request for a pre-stored VBR video stream that has been smoothed using any of the smoothing algorithms found in the literature. The output of these algorithms is a piecewise constant-rate schedule for a Variable Bit-Rate (VBR) stream. The schedule guarantees that the decoder buffer does not overflow or underflow. The problem addressed in this paper is the determination of the minimal time displacement of each new requested VBR stream so that it can be accommodated by the network and/or the video server without overbooking the committed traffic. We prove that this call-admission control problem for multiple requested VBR streams is NP-complete and inapproximable within a constant factor, by reducing it from the VERTEX COLOR problem. We also present a deterministic morphology-sensitive algorithm that calculates the minimal time displacement of a VBR stream request. The complexity of the proposed algorithm makes it suitable for real-time determination of the time displacem...

  20. Study on Multi-stream Heat Exchanger Network Synthesis with Parallel Genetic/Simulated Annealing Algorithm

    魏关锋; 姚平经; LUO Xing; ROETZEL Wilfried


    The multi-stream heat exchanger network synthesis (HENS) problem can be formulated as a mixed integer nonlinear programming model according to Yee et al. Its nonconvex nature leads to the existence of more than one optimum and makes it computationally difficult for traditional algorithms to find the global optimum. Compared with deterministic algorithms, evolutionary computation provides a promising approach to tackle this problem. In this paper, a mathematical model of the multi-stream heat exchanger network synthesis problem is set up. In contrast to the assumption of isothermal mixing of stream splits, and thus the linear constraints, of Yee et al., non-isothermal mixing is supported. As a consequence, nonlinear constraints arise and the nonconvexity of the objective function increases. To solve the mathematical model, an algorithm named GA/SA (parallel genetic/simulated annealing algorithm) is detailed for application to the multi-stream heat exchanger network synthesis problem. The performance of the proposed approach is demonstrated with three examples, and the obtained solutions indicate that the presented approach is effective for multi-stream HENS.

  1. Activity of microorganisms in acid mine water. I. Influence of acid water on aerobic heterotrophs of a normal stream.

    Tuttle, J H; Randles, C I; Dugan, P R


    Comparison of microbial content of acid-contaminated and nonacid-contaminated streams from the same geographical area indicated that nonacid streams contained relatively low numbers of acid-tolerant heterotrophic microorganisms. The acid-tolerant aerobes survived when acid entered the stream and actually increased in number to about 2 × 10^3 per ml until the pH approached 3.0. The organisms then represented the heterotrophic aerobic microflora of the streams comprised of a mixture of mine drainage and nonacid water. A stream which was entirely acid drainage did not have a similar microflora. Most gram-positive aerobic and anaerobic bacteria died out very rapidly in acidic water, and they comprised a very small percentage of the microbial population of the streams examined. Iron- and sulfur-oxidizing autotrophic bacteria were present wherever mine water entered a stream system. The sulfur-oxidizing bacteria predominated over iron oxidizers. Ecological data from the field were verified by laboratory experiments designed to simulate stream conditions.

  2. A Frequent Pattern Mining Algorithm for Feature Extraction of Customer Reviews

    Seyed Hamid Ghorashi


    Full Text Available Online shoppers often have different ideas about the same product. They look for the product features that are consistent with their goals. A feature that is interesting to one shopper might make no impression on another. Unfortunately, identifying the target product with particular features is a tough task that is not achievable with the existing functionality provided by common websites. In this paper, we present a frequent pattern mining algorithm to mine a set of reviews and extract product features. Our experimental results indicate that the algorithm outperforms the older pattern mining techniques used by previous researchers.

  3. Algorithm of Intrusion Detection Based on Data Mining and Its Implementation

    SUN Hai-bin; XU Liang-xian; CHEN Yan-hua


    Intrusion detection is regarded as classification in the data mining field. However, instead of directly mining the classification rules, class association rules, which are then used to construct a classifier, are mined from audit logs. Some attributes in audit logs are important for detecting intrusion, but their values are distributed in a skewed way. A relative support concept is proposed to deal with this situation. To mine class association rules effectively, an algorithm based on the FP-tree is used. Experimental results show that this method has better performance.

  4. Multi-objective Genetic Algorithm for Association Rule Mining Using a Homogeneous Dedicated Cluster of Workstations

    S. Dehuri


    Full Text Available This study presents a fast and scalable multi-objective association rule mining technique using a genetic algorithm on large databases. The objective functions, such as confidence factor, comprehensibility and interestingness, can be thought of as different objectives of our association rule-mining problem and are treated as the basic input to the genetic algorithm. The outcomes of our algorithm are the set of non-dominated solutions. However, in data mining the quantity of data is growing rapidly in both size and dimensions. Furthermore, the multi-objective genetic algorithm (MOGA) tends to be slow in comparison with most classical rule mining methods. Hence, to overcome these difficulties we propose a fast and scalable technique using the inherent parallel processing nature of the genetic algorithm and a homogeneous dedicated network of workstations (NOWs). Our algorithm exploits both data and control parallelism by distributing the data being mined and the population of individuals across all available processors. The experimental results show that the algorithm is suitable for large databases, with an encouraging speedup.

  5. A Streaming Distance Transform Algorithm for Neighborhood-Sequence Distances

    Nicolas Normand


    Full Text Available We describe an algorithm that computes a “translated” 2D Neighborhood-Sequence Distance Transform (DT) using a look up table approach. It requires a single raster scan of the input image and produces one line of output for every line of input. The neighborhood sequence is specified either by providing one period of some integer periodic sequence or by providing the rate of appearance of neighborhoods. The full algorithm optionally derives the regular (centered) DT from the “translated” DT, providing the result image on-the-fly, with a minimal delay, before the input image is fully processed. Its efficiency can benefit all applications that use neighborhood-sequence distances, particularly when pipelined processing architectures are involved, or when the size of objects in the source image is limited.

  6. A gene pattern mining algorithm using interchangeable gene sets for prokaryotes

    Kim Sun


    Full Text Available Abstract. Background: Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use an optimization problem formulation which is solved using the dynamic programming technique. Unfortunately, extending these algorithms to multiple genome cases is not trivial due to the rapid increase in time and space complexity. Results: In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community to handle the situation where family classification information is not available, using interchangeable sets. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene pattern can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. Conclusion: The discovered gene patterns can be used for detecting orthologs and genes that collaborate for a common biological function.

  7. A genetic algorithm to reduce stream channel cross section data

    Berenbrock, C.


    A genetic algorithm (GA) was used to reduce cross section data for a hypothetical example consisting of 41 data points and for 10 cross sections on the Kootenai River. The number of data points for the Kootenai River cross sections ranged from about 500 to more than 2,500. The GA was applied to reduce the number of data points to a manageable dataset because most models and other software require fewer than 100 data points for management, manipulation, and analysis. Results indicated that the program successfully reduced the data. Fitness values from the genetic algorithm were lower (better) than those in a previous study that used standard procedures of reducing the cross section data. On average, fitnesses were 29 percent lower, and several were about 50 percent lower. Results also showed that cross sections produced by the genetic algorithm were representative of the original section and that near-optimal results could be obtained in a single run, even for large problems. Other data also can be reduced in a method similar to that for cross section data.

  8. Fast Algorithms of Mining Probability Functional Dependency Rules in Relational Database

    TAO Xiaopeng; ZHOU Aoying; HU Yunfa


    This paper defines a new kind of rule, the probability functional dependency rule, which can express the degree of functional dependency. Five algorithms, from the simple to the complex, are presented to mine this kind of rule under different conditions. The related theorems are proved to ensure the high efficiency and the correctness of these algorithms.

  9. Analysis of Distributed and Adaptive Genetic Algorithm for Mining Interesting Classification Rules

    YI Yunfei; LIN Fang; QIN Jun


    The distributed genetic algorithm can be combined with the adaptive genetic algorithm for mining interesting and comprehensible classification rules. The paper gives the method for encoding the rules and the fitness function, and designs the selection, crossover, mutation and migration operators for the distributed and adaptive genetic algorithm (DAGA).

  10. Web Data Mining Algorithm Based on Semi Structure Feature Segmentation



    A Web data mining algorithm based on semi-structure feature segmentation is proposed. An information stream signal model of Web hot-spot data is constructed, and envelope feature decomposition of the Web hot-spot information stream is performed. To improve the purity and anti-interference performance of data mining, a feedforward modulation filter is used to filter data interference, and semi-structure feature segmentation is used to extract features from the Web hot-spot data, yielding an improved data mining algorithm. Simulation results show that the algorithm improves the detection performance for Web data features, suffers little sidelobe interference during data mining, achieves high mining precision, and outperforms traditional algorithms.

  11. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Amineh Amini


    Full Text Available Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  12. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut


    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  13. A Streaming Algorithm for Online Estimation of Temporal and Spatial Extent of Delays

    Kittipong Hiriotappa


    Full Text Available Knowing traffic congestion and its impact on travel time in advance is vital for proactive travel planning as well as advanced traffic management. This paper proposes a streaming algorithm to estimate temporal and spatial extent of delays online which can be deployed with roadside sensors. First, the proposed algorithm uses streaming input from individual sensors to detect a deviation from normal traffic patterns, referred to as anomalies, which is used as an early indication of delay occurrence. Then, a group of consecutive sensors that detect anomalies are used to temporally and spatially estimate extent of delay associated with the detected anomalies. Performance evaluations are conducted using a real-world data set collected by roadside sensors in Bangkok, Thailand, and the NGSIM data set collected in California, USA. Using NGSIM data, it is shown qualitatively that the proposed algorithm can detect consecutive occurrences of shockwaves and estimate their associated delays. Then, using a data set from Thailand, it is shown quantitatively that the proposed algorithm can detect and estimate delays associated with both recurring congestion and incident-induced nonrecurring congestion. The proposed algorithm also outperforms the previously proposed streaming algorithm.
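
    The two stages described above (per-sensor anomaly detection, then grouping consecutive anomalous sensors) can be sketched as follows. The abstract does not say how a deviation from normal traffic is measured, so this sketch swaps in a plain running z-score computed with Welford's online mean and variance; the threshold, warm-up length and all names are assumptions for illustration, not the paper's method.

```python
import math


class RunningAnomalyDetector:
    """Flags readings that deviate strongly from the running mean of one sensor."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold   # hypothetical z-score cut-off
        self.warmup = warmup         # readings to absorb before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                # sum of squared deviations (Welford)

    def update(self, x):
        """Process one reading in O(1) time and memory; return True if anomalous."""
        anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomaly = std > 0 and abs(x - self.mean) > self.threshold * std
        if not anomaly:
            # Only normal readings update the model of "normal" traffic.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return anomaly


def consecutive_runs(flags):
    """Return (first, last) index pairs of consecutive sensors flagged as anomalous."""
    runs, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))
    return runs


if __name__ == "__main__":
    detector = RunningAnomalyDetector()
    speeds = [60, 62, 59, 61, 60, 58, 61, 20]   # made-up speeds in km/h
    print([detector.update(s) for s in speeds])
    print(consecutive_runs([False, True, True, False, True]))
```

    Each run returned by `consecutive_runs` gives a rough spatial extent (which stretch of road is affected); tracking how long a run persists would give the temporal extent.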

  14. Study and Implementation of Web Mining Classification Algorithm Based on Building Tree of Detection Class Threshold

    CHEN Jun-jie; SONG Han-tao; LU Yu-chang


    A new classification algorithm for web mining is proposed on the basis of a general classification algorithm for data mining in order to implement personalized information services. The tree-building method of detecting the class threshold is used to construct a decision tree according to the concept of user expectation, so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the overfitting problem of C4.5 is reduced, so that classification results not only have much higher accuracy but also statistical meaning.

  15. Mining the IPTV Channel Change Event Stream to Discover Insight and Detect Ads

    Matej Kren


    Full Text Available IPTV has been widely deployed throughout the world, bringing significant advantages to users in terms of the channel offering, video on demand, and interactive applications. One aspect that has been often neglected is the ability of precise and unobtrusive telemetry. TV set-top boxes that are deployed in modern IPTV systems can be thought of as capable sensor nodes that collect vast amounts of data, representing both the user activity and the quality of service delivered by the system itself. In this paper we focus on the user-generated events and analyze how the data stream of channel change events received from the entire IPTV network can be mined to obtain insight about the content. We demonstrate that it is possible to predict the occurrence of TV ads with high probability and show that the approach could be extended to model the user behavior and classify the viewership in multiple dimensions.

  16. Research on coal-mine gas monitoring system controlled by annealing simulating algorithm

    Zhou, Mengran; Li, Zhenbi


    This paper introduces the principle and schematic diagram of a gas monitoring system based on the infrared method. A simulated annealing algorithm is adopted to find the global optimum, and the Metropolis criterion is used to drive iterative combinatorial optimization with a decreasing control parameter, aimed at solving large-scale combinatorial optimization problems. Experimental results obtained from the training scheme and training flow of the algorithm indicate that simulated annealing applied to gas identification is better than the traditional linear local search method. It makes the algorithm iterate to the optimum value rapidly, so that the quality of the solution is improved efficiently. The CPU time is shortened and the gas identification rate is increased. For mines with a high risk of gas outbursts, advance forecasting of regional danger and disaster can be realized. The reliability of coal-mine safety is improved.
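
    The Metropolis criterion with a decreasing control parameter, mentioned above, works as follows: a move that lowers the cost is always accepted, and a move that raises the cost by Δ is accepted with probability exp(−Δ/T), where the temperature T shrinks over time. The sketch below is a generic illustration under assumed settings (geometric cooling, a toy one-dimensional cost function); it is not the gas-identification model from the paper, and all names and parameter values are hypothetical.

```python
import math
import random


def metropolis_accept(delta, temperature, rng):
    """Metropolis criterion: always accept improvements, sometimes accept worse moves."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def anneal(cost, neighbor, x0, t0=1.0, alpha=0.95, steps=2000, seed=0):
    """Simulated annealing with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    current, best = x0, x0
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        if metropolis_accept(cost(candidate) - cost(current), temperature, rng):
            current = candidate
            if cost(current) < cost(best):
                best = current
        # Decrease the control parameter; the floor avoids division by zero.
        temperature = max(temperature * alpha, 1e-12)
    return best


if __name__ == "__main__":
    # Toy problem: minimise (x - 3)^2 over the integers, starting far away.
    cost = lambda x: (x - 3) ** 2
    step = lambda x, rng: x + rng.choice([-1, 1])
    print(anneal(cost, step, x0=10))
```

    Accepting some uphill moves at high temperature is what lets the search escape local optima, which is the advantage over a pure local search that the abstract reports.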

  17. Pollution of the stream waters and sediments associated with the Crucea uranium mine (East Carpathians, Romania)

    Petrescu, L.; Bilal, E.; Iatan, E. L.


    Uranium and thorium are omnipresent in our environment. Various anthropogenic activities involving the processing or use of materials rich in uranium may modify the natural abundance of uranium in water. The study is related to uranium mineralization located within the Crucea ore deposit, in the East Carpathians, Romania. The Crucea uranium ore deposit is located in the eastern part of the Bistrita Mountains (40 km southeast of the town of Vatra Dornei) in the headwaters of the Crucea, Lesu and Livezi valleys. At present, this is the largest uranium mine in the country. In the past, the mining area covered 18 km², but was gradually overtaken by logging activities. The exploration and mining facilities include thirty-two galleries, situated between 780 and 1040 m above sea level. Radioactive waste resulting from mining is disposed of next to the mining facilities. The waste rock was disposed of in piles of variable size that are spread over an area of 364,000 m². Older dumps (18) have already been naturally reclaimed by forest vegetation. The vegetation cover played an important role in stabilizing the waste dump cover and in slowing down the uranium migration processes. A total of 46 water samples were taken in order to evaluate the impact of the ore deposit (including its exploitation process) on the chemical composition of waters down to the exploitation galleries. The sediment samples were collected at 16 sampling points from the bottom of the studied streams. ICP-OES, XRF and IC methods were used to evaluate the impact of uranium mine dumps on the surface waters of the Crucea region. According to the analytical data, the stream waters showed a Ca-carbonate character. In relation to salinity, the pH and the contents of the anions NO3⁻, CO3²⁻ and SO4²⁻ generally display non-linear relationships with chloride. Uranium is the most significant trace element in the river waters near the waste rock dumps, sometimes reaching levels up to 1 mg·L⁻¹, well in excess of the Romanian

  18. Environmental impact of mining activities in the Lousal area (Portugal): chemical and diatom characterization of metal-contaminated stream sediments and surface water of Corona stream.

    Luís, Ana Teresa; Teixeira, Paula; Almeida, Salomé Fernandes Pinheiro; Matos, João Xavier; da Silva, Eduardo Ferreira


    Lousal mine is a typical "abandoned mine" with all sorts of problems as a consequence of the cessation of the mining activity and lack of infrastructure maintenance. The mine is closed at present, but the heavy metal enriched tailings remain at the surface in oxidizing conditions. Surface water and stream sediments revealed much higher concentrations than the local geochemical background values, which the "Contaminated Sediment Standing Team" classifies as very toxic. High concentrations of Cu, Pb, Zn, As, Cd and Hg occurred within the stream sediments downstream of the tailings sites (up to: 817 mg kg⁻¹ As, 6.7 mg kg⁻¹ Cd, 1568 mg kg⁻¹ Cu, 1059 mg kg⁻¹ Pb, 82.4 mg kg⁻¹ Sb, 4373 mg kg⁻¹ Zn). The AMD waters showed values of pH ranging from 1.9 to 2.9 and concentrations of 9249 to 20,700 mg L⁻¹ SO4²⁻, 959 to 4830 mg L⁻¹ Fe and 136 to 624 mg L⁻¹ Al. Meanwhile, the acid effluents and mixed stream waters also carried high contents of SO4²⁻, Fe, Al, Cu, Pb, Zn, Cd, and As, generally exceeding the Fresh Water Aquatic Life Acute Criteria. Negative impacts in the diatom communities growing at different sites along a strong metal pollution gradient were shown through Canonical Correspondence Analysis: in the sites influenced by Acid Mine Drainage (AMD), the dominant taxon was Achnanthidium minutissimum. However, Pinnularia acoricola was the dominant species when the environmental conditions were extremely adverse: very low pH and high metal concentrations (sites 2 and 3). Teratological forms of Achnanthidium minutissimum (Kützing) Czarnecki, Brachysira vitrea (Grunow) Ross in Hartley, Fragilaria rumpens (Kützing) G. W. F. Carlson and Nitzschia hantzschiana Rabenhorst were found. A morphometric study of B. vitrea showed that a decrease in size was evident at the most contaminated sites. These results are evidence of metal and acidic pollution.

  19. Preservation procedures for arsenic speciation in a stream affected by acid mine drainage in southwestern Spain.

    Sánchez-Rodas, Daniel; Oliveira, Vanesa; Sarmiento, Aguasanta M; Gómez-Ariza, José Luis; Nieto, José Miguel


    A preservation study has been performed for arsenic speciation in surface freshwaters affected by acid mine drainage (AMD), a pollution source characterized by low pH and high metallic content. Two sample preservation procedures described in the literature were attempted using opaque glass containers and refrigeration: i) addition of 0.25 mol L⁻¹ EDTA to the samples, which maintained the stability of the arsenic species for 3 h; and ii) in situ sample clean-up with a cationic exchange resin, in order to reduce the metallic load, which resulted in a partial co-adsorption of arsenic onto Fe precipitates. A new proposed method was also tried: sample acidification with 6 mol L⁻¹ HCl followed by in situ clean-up with a cationic exchange resin, which allowed a longer preservation time of at least 48 h. The proposed method was successfully applied to water samples with high arsenic content, taken from the Aguas Agrias Stream (Odiel River Basin, SW Spain), which is severely affected by AMD that originates at the nearby polymetallic sulfide mine of Tharsis. The speciation results obtained by liquid chromatography-hydride generation-atomic fluorescence spectrometry (HPLC-HG-AFS) indicated that during the summer the main arsenic species was As(V) at the hundred µg L⁻¹ level, followed by DMA (dimethyl arsenic) and As(III) below the ten µg L⁻¹ level. In winter, As(V) and As(III) increased at least fivefold, whereas DMA was not detected.

  20. Pattern Discovery and Change Detection of Online Music Query Streams

    Li, Hua-Fu

    In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support threshold in only a single pass based on the concept of bit-sequence representation. It takes advantage of the "left" and "and" operations of the representation. Experiments show that the proposed algorithm scans the music query stream only once, and runs significantly faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment.
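
    The bit-sequence idea in this abstract can be illustrated with a minimal Python sketch. It is not the published FTP-stream algorithm: the class name `BitWindow`, the per-item integer encoding and the window width are assumptions made for illustration. It only shows how a left shift can slide a window and how a bitwise AND can count support.

```python
class BitWindow:
    """Sliding window over the last `width` transactions, one bit-sequence per item.

    Hypothetical illustration: bit 0 is the newest transaction in the window.
    """

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1   # keeps only the newest `width` bits
        self.bits = {}                 # item -> int used as a bit-sequence

    def slide(self, transaction):
        """Append one transaction; the oldest one drops out via the left shift."""
        items = set(transaction)
        for item in set(self.bits) | items:
            shifted = (self.bits.get(item, 0) << 1) & self.mask
            self.bits[item] = shifted | (1 if item in items else 0)
        # Forget items that no longer occur anywhere in the window.
        self.bits = {k: v for k, v in self.bits.items() if v}

    def support(self, itemset):
        """Number of window transactions containing every item (bitwise AND)."""
        combined = self.mask
        for item in itemset:
            combined &= self.bits.get(item, 0)
        return bin(combined).count("1")


window = BitWindow(width=3)
for notes in (["C", "E"], ["C", "G"], ["C", "E", "G"]):
    window.slide(notes)
print(window.support(["C", "E"]))  # prints 2
```

    Storing one integer per item means that sliding the window costs one shift per item, which is the kind of saving the abstract attributes to the representation.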

  1. Survey on algorithm of mining frequent itemsets from uncertain data

    汪金苗; 张龙波; 邓齐志; 王凤英; 王勇


    Uncertain data is widespread in application fields such as sensor networks and Web applications. Uncertain data mining has become a new research hotspot. It includes clustering, classification, frequent itemset mining, outlier detection, etc., of which frequent itemset mining is one of the key issues. This paper introduces two basic kinds of algorithms for mining frequent itemsets from traditional data, the Apriori algorithm and the FP-growth algorithm, and then analyses the methods proposed for mining frequent itemsets from uncertain data and uncertain data streams. Possible future research directions in frequent itemset mining over uncertain data are also discussed.
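
    The abstract does not say how support is defined when items are uncertain. One formulation commonly used in the uncertain-data literature is expected support, and the sketch below assumes it. It also assumes that item probabilities are independent, and the function name and sample data are hypothetical.

```python
from math import prod


def expected_support(itemset, transactions):
    """Expected support of `itemset` over uncertain transactions.

    Each transaction maps an item to its existence probability. Assuming
    independence, the probability that a transaction contains the whole
    itemset is the product of the item probabilities; a missing item has
    probability 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in transactions)


# Hypothetical sensor readings with existence probabilities.
readings = [
    {"a": 0.5, "b": 0.5},
    {"a": 1.0},
]
print(expected_support(["a"], readings))       # prints 1.5
print(expected_support(["a", "b"], readings))  # prints 0.25
```

    An Apriori-style miner could then keep an itemset when its expected support reaches a chosen minimum, instead of counting exact occurrences.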

  2. Transmission Algorithm with QoS Considerations for a Sustainable MPEG Streaming Service

    Sang-Hyong Kim


    Full Text Available With the proliferation of heterogeneous networks, there is a need to provide multimedia stream services in a sustainable manner. It is especially critical to maintain the Quality of Service (QoS) standards. Existing multimedia streaming services have been studied to guarantee QoS on the receiving side. QoS has not been ensured due to the fact that the loss of streaming data to be transmitted has not been considered in network conditions. With an algorithm that considers the QoS and can reduce the overhead of the network, it will be possible to reduce the transmission error and wastage of communication network resources. In this paper, we propose a scheme that improves the reliability of multimedia transmissions by using an adaptive algorithm that switches between UDP (User Datagram Protocol) and TCP (Transmission Control Protocol) based on the size of the data. In addition, we present a method that retransmits essential portions of the multimedia data, thus improving transmission efficiency. We simulate an MPEG (Moving Picture Experts Group) stream service and evaluate the performance of the proposed adaptive MPEG stream service.
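
    As a rough sketch of the size-based switching described above: the abstract gives neither the threshold nor which sizes go to which protocol, so the 1400-byte threshold, the small-goes-to-UDP rule, and the treatment of I-frames as the "essential portions" are all assumptions of this example, not the paper's scheme.

```python
def choose_transport(payload_size, threshold=1400):
    """Pick a transport for one chunk of stream data by its size in bytes.

    Assumption: chunks up to `threshold` bytes use UDP and larger ones use
    TCP. The abstract states neither the direction nor the threshold.
    """
    if payload_size < 0:
        raise ValueError("payload_size must be non-negative")
    return "UDP" if payload_size <= threshold else "TCP"


def frames_to_retransmit(lost_frames, essential_types=("I",)):
    """Keep only the lost frames whose type is treated as essential.

    Assumption: only MPEG I-frames count as essential.
    """
    return [frame for frame in lost_frames if frame["type"] in essential_types]


print(choose_transport(512))   # prints UDP
print(choose_transport(4096))  # prints TCP
```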

  3. Linked-Tree: An Aggregate Query Algorithm Based on Sliding Window over Data Stream

    YU Yaxin; WANG Guoren; SU Dong; ZHU Xinhua


    How to process aggregate queries over data streams efficiently and effectively has become a hot research topic in both the academic and industrial communities. To address this issue, a novel Linked-tree algorithm based on a sliding window is proposed in this paper. By introducing the concept of an area, the Linked-tree algorithm reuses many primary results from the last window and thus avoids many unnecessary repeated comparison operations between two successive windows. As a result, the execution efficiency of MAX queries is improved dramatically. In addition, since the memory size depends on the number of areas but not on the size of the sliding window, memory is saved considerably. Extensive experimental results show that the Linked-tree algorithm performs significantly better than the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
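
    The abstract does not give enough detail to reproduce the Linked-tree structure itself. The sketch below swaps in a different, well-known technique, a monotonic deque, to illustrate the same goal: answering a sliding-window MAX query while reusing comparisons made in the previous window. The function name is hypothetical.

```python
from collections import deque


def sliding_max(stream, window):
    """Return the maximum of each full window of `window` consecutive values."""
    candidates = deque()  # (index, value) pairs with strictly decreasing values
    result = []
    for i, value in enumerate(stream):
        # Older candidates that are not larger than the new value can never
        # be the maximum again, so they are discarded once and for all.
        while candidates and candidates[-1][1] <= value:
            candidates.pop()
        candidates.append((i, value))
        # Drop the front candidate when it has slid out of the window.
        if candidates[0][0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            result.append(candidates[0][1])
    return result


print(sliding_max([1, 3, 2, 5, 4, 1], window=3))  # prints [3, 5, 5, 5]
```

    Each value enters and leaves the deque at most once, so successive windows do not repeat comparisons that have already been resolved.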

  4. Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

    Řehůřek, Radim


    With the explosion of the size of digital datasets, the limiting factor for decomposition algorithms is the number of passes over the input, as the input is often stored out-of-core or even off-site. Moreover, we're only interested in algorithms that operate in constant memory w.r.t. the input size, so that arbitrarily large input can be processed. In this paper, we present a practical comparison of two such algorithms: a distributed method that operates in a single pass over the input vs. a streamed two-pass stochastic algorithm. The experiments track the effect of distributed computing, oversampling and memory trade-offs on the accuracy and performance of the two algorithms. To ensure meaningful results, we choose the input to be a real dataset, namely the whole of the English Wikipedia, in the application settings of Latent Semantic Analysis.

  5. Platinum group elements in stream sediments of mining zones: The Hex River (Bushveld Igneous Complex, South Africa)

    Almécija, Clara; Cobelo-García, Antonio; Wepener, Victor; Prego, Ricardo


    Assessment of the environmental impact of platinum group elements (PGE) and other trace elements from mining activities is essential to prevent potential environmental risks. This study evaluates the concentrations of PGE in stream sediments of the Hex River, which drains the mining area of the Bushveld Igneous Complex (South Africa), at four sampling points. Major, minor and trace elements (Fe, Ca, Al, Mg, Mn, V, Cr, Zn, Cu, As, Co, Ni, Cd, and Pb) were analyzed by FAAS and ETAAS in suspended particulate matter and different sediment fractions (rocks. The highest concentrations were observed closer to the mining area, decreasing with distance and in the cycle, increasing the presence of PGE in the fine fraction of river sediments. We propose that indicators such as airborne particulate matter, and soil and river sediment quality, should be added to the protocols for evaluating the sustainability of mining activities.

  6. Neural Network Based Algorithm and Simulation of Information Fusion in the Coal Mine


    The concepts of information fusion and the basic principles of neural networks are introduced. Neural networks were introduced as a way of building an information fusion model in a coal mine monitoring system. This assures the accurate transmission of the multi-sensor information that comes from the coal mine monitoring systems. The information fusion mode was analyzed. An algorithm was designed based on this analysis and some simulation results were given. Finally, conclusions that could provide auxiliary decision making information to the coal mine dispatching officers were presented.

  7. Selected Metals in Sediments and Streams in the Oklahoma Part of the Tri-State Mining District, 2000-2006

    Andrews, William J.; Becker, Mark F.; Mashburn, Shana L.; Smith, S. Jerrod


    The abandoned Tri-State mining district includes 1,188 square miles in northeastern Oklahoma, southeastern Kansas, and southwestern Missouri. The most productive part of the Tri-State mining district was the 40-square mile part in Oklahoma, commonly referred to as 'the Picher mining district' in north-central Ottawa County, Oklahoma. The Oklahoma part of the Tri-State mining district was a primary producing area of lead and zinc in the United States during the first half of the 20th century. Sulfide minerals of cadmium, iron, lead, and zinc that remained in flooded underground mine workings and in mine tailings on the land surface oxidized and dissolved with time, forming a variety of oxide, hydroxide, and hydroxycarbonate metallic minerals on the land surface and in streams that drain the district. Metals in water and sediments in streams draining the mining district can potentially impair the habitat and health of many forms of aquatic and terrestrial life. Lakebed, streambed and floodplain sediments and/or stream water were sampled at 30 sites in the Oklahoma part of the Tri-State mining district by the U.S. Geological Survey and the Oklahoma Department of Environmental Quality from 2000 to 2006 in cooperation with the U.S. Environmental Protection Agency, and the Quapaw and Seneca-Cayuga Tribes of Oklahoma. Aluminum and iron concentrations of several thousand milligrams per kilogram were measured in sediments collected from the upstream end of Grand Lake O' the Cherokees. Manganese and zinc concentrations in those sediments were several hundred milligrams per kilogram. Lead and cadmium concentrations in those sediments were about 10 percent and 0.1 percent of zinc concentrations, respectively. Sediment cores collected in a transect across the floodplain of Tar Creek near Miami, Oklahoma, in 2004 had similar or greater concentrations of those metals than sediment cores collected at the upstream end of Grand Lake O' the Cherokees. 
The greatest concentrations of

  8. A Novel High Dimensional and High Speed Data Streams Algorithm: HSDStream

    Irshad Ahmed


    Full Text Available This paper presents a novel high speed clustering scheme for high-dimensional data stream. Data stream clustering has gained importance in different applications, for example, network monitoring, intrusion detection, and real-time sensing. High dimensional stream data is inherently more complex when used for clustering because the evolving nature of the stream data and high dimensionality make it non-trivial. In order to tackle this problem, projected subspace within the high dimensions and limited window sized data per unit of time are used for clustering purpose. We propose a High Speed and Dimensions data stream clustering scheme (HSDStream) which employs exponential moving averages to reduce the size of the memory and speed up the processing of projected subspace data stream. It works in three steps: (i) initialization, (ii) real-time maintenance of core and outlier micro-clusters, and (iii) on-demand offline generation of the final clusters. The proposed algorithm is tested against high dimensional density-based projected clustering (HDDStream) for cluster purity, memory usage, and the cluster sensitivity. Experimental results are obtained for corrected KDD intrusion detection dataset. These results show that HSDStream outperforms the HDDStream in all performance metrics, especially, the memory usage and the processing speed.
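
    To illustrate why an exponential moving average saves memory, the sketch below updates a micro-cluster centre from each new point without storing past points. How HSDStream actually applies the average is not described in the abstract; the function name and the smoothing factor `alpha` are assumptions.

```python
def ema_update(center, point, alpha=0.1):
    """Move a micro-cluster centre towards a new point, one dimension at a time.

    new_center = (1 - alpha) * old_center + alpha * point
    Only the current centre is kept, so memory does not grow with the stream.
    """
    if len(center) != len(point):
        raise ValueError("center and point must have the same dimension")
    return [(1 - alpha) * c + alpha * x for c, x in zip(center, point)]


center = [0.0, 10.0]
center = ema_update(center, [10.0, 0.0], alpha=0.5)
print(center)  # prints [5.0, 5.0]
```

    A larger `alpha` makes the centre follow recent points more closely, which suits an evolving stream; a smaller one smooths out noise.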

  9. Interactive evolutionary algorithms and data mining for drug design

    Lameijer, Eric Marcel Wubbo


    One of the main problems of drug design is that it is quite hard to discover compounds that have all the required properties to become a drug (efficacy against the disease, good biological availability, low toxicity). This thesis describes the use of data mining and interactive evolutionary algorith

  10. A New Algorithm for Cartographic Simplification of Streams and Lakes Using Deviation Angles and Error Bands

    Türkay Gökgöz


    Full Text Available Multi-representation databases (MRDBs) are used in several geographical information system applications for different purposes. MRDBs are mainly obtained through model and cartographic generalizations. Simplification is the essential operator of cartographic generalization, and streams and lakes are essential features in hydrography. In this study, a new algorithm was developed for the simplification of streams and lakes. In this algorithm, deviation angles and error bands are used to determine the characteristic vertices and the planimetric accuracy of the features, respectively. The algorithm was tested using a high-resolution national hydrography dataset of Pomme de Terre, a sub-basin in the USA. To assess the performance of the new algorithm, the Bend Simplify and Douglas-Peucker algorithms, the medium-resolution hydrography dataset of the sub-basin, and Töpfer’s radical law were used. For quantitative analysis, the vertex numbers, the lengths, and the sinuosity values were computed. Consequently, it was shown that the new algorithm was able to meet the main requirements (i.e., accuracy, legibility and aesthetics, and storage).
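
    The three quantities used in the quantitative analysis (vertex number, length, sinuosity) can be computed as in the sketch below. It assumes planar coordinates and the usual definition of sinuosity as path length divided by the straight-line distance between the end points; the paper's exact definitions are not given in the abstract.

```python
import math


def polyline_length(points):
    """Sum of segment lengths along a polyline given as (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def sinuosity(points):
    """Path length divided by the straight-line distance between the end points."""
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("sinuosity is undefined for a closed line")
    return polyline_length(points) / straight


# Hypothetical stream centreline before and after simplification.
original = [(0, 0), (3, 4), (6, 0)]
simplified = [(0, 0), (6, 0)]
print(len(original), polyline_length(original))      # prints 3 10.0
print(len(simplified), polyline_length(simplified))  # prints 2 6.0
```

    Comparing these values before and after simplification shows how much shape a simplification algorithm removes along with the vertices.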

  11. Investigation of Web Mining Optimization Using Microbial Genetic Algorithm

    Dipali Tungar


    Full Text Available In today's internet era, people need searching the web and finding relevant information to be efficient and fast. However, traditional search engines such as Google, which are supposed to be more intelligent, still use traditional crawling algorithms to find data relevant to the search query, and they often return irrelevant data as well, which is confusing for the user. With XML data, the user enters the search query as a keyword or a question, and the answer should be more precise and more relevant. Using traditional crawling algorithms over XML data would therefore lead to irrelevant results. Genetic algorithms are modern algorithms that replicate the Darwinian theory of natural evolution. They are well suited to the traditional search problem, as they tend to return quality solutions for data of any domain. It is therefore a good approach to investigate how suitable genetic algorithms are for search over XML data of different domains. This system implements a steady-state tournament selection Microbial Genetic Algorithm over XML data of different domains, as an investigation of how accurately the genetic algorithm returns results over such data.

  12. Sources of alkalinity and acidity along an acid mine drainage remediated stream in SE Ohio: Hewett Fork

    Schleich, K. L.; Lopez, D. A.; Bowman, J. R.; Kruse, N. A.; Mackey, A. L.; VanDervort, D.; Korenowsky, R.


    In the remediation of acid mine drainage impacted streams, it is important to locate and quantify the sources of acidity and alkalinity inputs. These parameters affect the long-term recovery of the stream habitat. Previous studies have focused on treating the remediation of AMD as point source pollution, targeting the main acid seep for remediation. However, in the interest of biological and chemical recovery, it is important to understand how sources of alkalinity and acidity, throughout the stream, affect water and sediment quality. The Hewett Fork watershed in Southeastern Ohio is impacted by AMD from the AS-14 mine complex in Carbondale, Ohio. In attempts to remediate the stream, the water is being treated with a continuous alkaline input from a calcium oxide doser. While the section of watershed furthest downstream from the doser is showing signs of recovery, the water chemistry and aquatic life near the doser are still impacted. The objective of this study is to examine and model the chemistry of the tributaries of Hewett Fork to see how they contribute to the alkalinity and acidity budgets of the main stem of the stream. By examining the inputs of tributaries into the main stem, this project aims to understand processes occurring during remediation throughout the entire stream. Discharge was measured during a dry period in October, 2012 and at a high flow in May, 2013. Field parameters such as pH, TDS, DO, alkalinity and acidity were also determined. Low flow data collected during fall sampling shows variable flow along the stream path, the stream gains water from ground water at some points while it loses water at others, potentially due to variable elevation of the water table. Flow data collected during spring sampling shows that Hewett Fork is a gaining stream during that period with inputs from groundwater contributing to increasing flow downstream. 
When using this data to calculate the net alkalinity load along the stream, there are areas with alkaline

  13. Fast Algorithms for Mining Co-evolving Time Series


    pattern of an on-line magazine site that provides information about diet, nutrition and fitness. The access count increases rapidly after meal times. We...February 1996. 11, 20, 22, 161 A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for

  14. Memory Copy Optimization for Streaming Gateway Transcoding: Models and Algorithms

    LI Mingzhe; WANG Jinlin; CHEN Xiao; YE Xiaozhou


    Repeated memory copy during protocol translation inhibits capacity of a streaming media gateway. Unlike existing optimization techniques that rely on platform-specific features, this paper investigates algorithm-level platform-independent strategies. A mathematical concept of the buf-string is proposed to model the protocol transcoding process. Based on this model three payload extraction algorithms that can reduce memory copy are presented. The streaming gateway used in the Next-generation broadcasting (NGB) and the Next-generation on-demand (NGOD) system is taken as an example to demonstrate and evaluate our strategies. Experimental results from an x86 host and an embedded system prove that our strategies can reduce CPU overhead by 15% to 45%, and optimize the linear space complexity to a constant one.
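
    The buf-string model and the three extraction algorithms are not specified in the abstract. The general idea of extracting payloads without copying them can still be sketched with Python's built-in `memoryview`; the fixed header length and the function name are assumptions made for this example.

```python
def extract_payloads(packets, header_len):
    """Return zero-copy views of each packet's payload.

    Assumption: every packet starts with a fixed-length header of
    `header_len` bytes. Slicing a memoryview references the original
    buffer instead of copying the payload bytes.
    """
    return [memoryview(packet)[header_len:] for packet in packets]


packets = [b"HDRabc", b"HDRdef"]
views = extract_payloads(packets, header_len=3)
print(views[0].obj is packets[0])  # prints True: the view shares the buffer
print(b"".join(views))             # prints b'abcdef': one copy, at output time
```

    The payload bytes are copied only once, when the output is assembled, instead of once per protocol layer.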

  15. The effects of camera jitter for background subtraction algorithms on fused infrared-visible video streams

    Becker, Stefan; Scherer-Negenborn, Norbert; Thakkar, Pooja; Hübner, Wolfgang; Arens, Michael


    This paper is a continuation of the work of Becker et al.1 In their work, they analyzed the robustness of various background subtraction algorithms on fused video streams originating from visible and infrared cameras. In order to cover a broader range of background subtraction applications, we show the effects of fusing infrared-visible video streams from vibrating cameras on a large set of background subtraction algorithms. The effectiveness is quantitatively analyzed on recorded data of a typical outdoor sequence with a fine-grained and accurate annotation of the images. Thereby, we identify approaches which can benefit from fused sensor signals with camera jitter. Finally, conclusions are given on which fusion strategies should be preferred under such conditions.

  16. Analysis of data mining classification by comparison of C4.5 and ID algorithms

    Sudrajat, R.; Irianingsih, I.; Krisnawan, D.


    The rapid development of information technology has been driven by its intensive use. For example, data mining is widely used in investment. Of the many techniques that can assist in investment, the method used here for classification is the decision tree. Decision trees can be built with a variety of algorithms, such as C4.5 and ID3. The two algorithms can generate different models, with different accuracy, for similar data sets. With discrete data, the C4.5 and ID3 algorithms achieve accuracies of 87.16% and 99.83%, respectively, and the C4.5 algorithm with numerical data achieves 89.69%. With discrete data, the C4.5 and ID3 algorithms yield 520 and 598 customers, respectively, and the C4.5 algorithm with numerical data yields 546 customers. The analysis shows that both algorithms classify quite well, because the error rate is less than 15%.
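
    One standard reason the two algorithms build different trees from the same data is the split criterion: ID3 ranks attributes by information gain, while C4.5 uses the gain ratio, which divides the gain by the entropy of the split itself. The sketch below computes both on hypothetical toy data; it does not use the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """ID3 criterion: entropy reduction from splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 criterion: information gain normalised by the split entropy."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info


labels = ["yes", "yes", "no", "no"]
customer_id = ["1", "2", "3", "4"]        # unique per row, useless in practice
segment = ["gold", "gold", "basic", "basic"]
print(information_gain(customer_id, labels), gain_ratio(customer_id, labels))  # prints 1.0 0.5
print(information_gain(segment, labels), gain_ratio(segment, labels))          # prints 1.0 1.0
```

    Information gain cannot tell the identifier-like attribute from the meaningful one, whereas the gain ratio penalises the attribute with many distinct values.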

  17. Performance Evaluation of Multipath Discovery Algorithms for VoD Streaming in Wireless Mesh Network

    Praful C. Ramteke


    Full Text Available Transmission and routing of video data over wireless networks is a challenging task because of wireless interference. To improve the performance of video-on-demand transmission over wireless networks, multipath algorithms are used. IPD/S (Iterative Path Discovery/Selection) and PPD/S (Parallel Path Discovery/Selection) are two algorithms used for discovering the maximum number of edge-disjoint paths from source to destination for each VoD request, considering the effects of wireless interference. In this paper, a performance evaluation of these multipath discovery algorithms for VoD (video on demand) streaming in wireless mesh networks is presented. The algorithms are evaluated on the basis of the number of paths discovered and the packet drop ratio. Simulation results show that PPD/S works better than IPD/S because it is able to discover more paths than IPD/S under the same circumstances.
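
    Neither IPD/S nor PPD/S is specified in the abstract, so the sketch below uses a textbook substitute: the maximum number of edge-disjoint paths equals the maximum flow when every link has unit capacity, found here with breadth-first augmenting paths. It treats links as directed and ignores wireless interference; both are simplifying assumptions, and the function name is hypothetical.

```python
from collections import deque


def count_edge_disjoint_paths(edges, source, target):
    """Maximum number of edge-disjoint directed paths from source to target."""
    if source == target:
        return 0
    capacity = {}
    neighbours = {}
    for u, v in edges:
        capacity[(u, v)] = capacity.get((u, v), 0) + 1
        capacity.setdefault((v, u), 0)  # residual edge for undoing a choice
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    paths = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in parent and capacity[(node, nxt)] > 0:
                    parent[nxt] = node
                    queue.append(nxt)
        if target not in parent:
            return paths
        # Use up one unit of capacity along the path that was found.
        node = target
        while parent[node] is not None:
            prev = parent[node]
            capacity[(prev, node)] -= 1
            capacity[(node, prev)] += 1
            node = prev
        paths += 1


mesh = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
print(count_edge_disjoint_paths(mesh, "s", "t"))  # prints 2
```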

  18. Evaluating remedial alternatives for an acid mine drainage stream: Application of a reactive transport model

    Runkel, R.L.; Kimball, B.A.


    A reactive transport model based on one-dimensional transport and equilibrium chemistry is applied to synoptic data from an acid mine drainage stream. Model inputs include streamflow estimates based on tracer dilution, inflow chemistry based on synoptic sampling, and equilibrium constants describing acid/base, complexation, precipitation/dissolution, and sorption reactions. The dominant features of observed spatial profiles in pH and metal concentration are reproduced along the 3.5-km study reach by simulating the precipitation of Fe(III) and Al solid phases and the sorption of Cu, As, and Pb onto freshly precipitated iron-(III) oxides. Given this quantitative description of existing conditions, additional simulations are conducted to estimate the streamwater quality that could result from two hypothetical remediation plans. Both remediation plans involve the addition of CaCO3 to raise the pH of a small, acidic inflow from ~2.4 to ~7.0. This pH increase results in a reduced metal load that is routed downstream by the reactive transport model, thereby providing an estimate of post-remediation water quality. The first remediation plan assumes a closed system wherein inflow Fe(II) is not oxidized by the treatment system; under the second remediation plan, an open system is assumed, and Fe(II) is oxidized within the treatment system. Both plans increase instream pH and substantially reduce total and dissolved concentrations of Al, As, Cu, and Fe(II+III) at the terminus of the study reach. Dissolved Pb concentrations are reduced by ~18% under the first remediation plan due to sorption onto iron-(III) oxides within the treatment system and stream channel. In contrast, iron(III) oxides are limiting under the second remediation plan, and removal of dissolved Pb occurs primarily within the treatment system. This limitation results in an increase in dissolved Pb concentrations over existing conditions as additional downstream sources of Pb are not attenuated by

  19. Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity

    Louis, S.J.; Raines, G.L.


    We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automata to model mining activity on public land in Idaho and western Montana. The genetic algorithm searches through a space of transition rule parameters of a two dimensional cellular automata model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks - the genetic algorithm takes a day and produces rules leading to about the same (or better) fit to observed data. These preliminary results indicate that genetic algorithms are a viable tool in calibrating cellular automata for this application. Experience gained during the calibration of this cellular automata suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements of how the mineral-resource information is provided to the cellular automaton will probably improve our model.

  20. Improved mine blast algorithm for optimal cost design of water distribution systems

    Sadollah, Ali; Guen Yoo, Do; Kim, Joong Hoon


    The design of water distribution systems is a large class of combinatorial, nonlinear optimization problems with complex constraints such as conservation of mass and energy equations. Since feasible solutions are often extremely complex, traditional optimization techniques are insufficient. Recently, metaheuristic algorithms have been applied to this class of problems because they are highly efficient. In this article, a recently developed optimizer called the mine blast algorithm (MBA) is considered. The MBA is improved and coupled with the hydraulic simulator EPANET to find the optimal cost design for water distribution systems. The performance of the improved mine blast algorithm (IMBA) is demonstrated using the well-known Hanoi, New York tunnels and Balerma benchmark networks. Optimization results obtained using IMBA are compared to those using MBA and other optimizers in terms of their minimum construction costs and convergence rates. For the complex Balerma network, IMBA offers the cheapest network design compared to other optimization algorithms.

  1. Effective Application of Improved Profit-Mining Algorithm for the Interday Trading Model

    Yu-Lung Hsieh


    Full Text Available Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  2. Effective application of improved profit-mining algorithm for the interday trading model.

    Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin


    Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  3. Comparative Study of Clustering Algorithms in Text Mining Context

    Abdennour Mohamed Jalil


    Full Text Available The spectacular increase in data is due to the appearance of networks and smartphones. About 42% of the world population uses the internet [1], which has created a problem in processing the exchanged data: its volume is rising exponentially and it should be treated automatically. This paper presents a classical process of knowledge discovery in databases for treating textual data. The process is divided into three parts: preprocessing, processing and post-processing. In the processing step, we present a comparative study of several clustering algorithms such as KMeans, Global KMeans, Fast Global KMeans, Two Level KMeans and FWKmeans. The comparison is made on real textual data from the web obtained through RSS feeds. Experimental results identified two problems: the first concerns the quality of the results, which remains an issue for algorithms that converge rapidly; the second is the execution time, which needs to decrease for some algorithms.

  4. Sequential Extraction Results and Mineralogy of Mine Waste and Stream Sediments Associated With Metal Mines in Vermont, Maine, and New Zealand

    Piatak, N.M.; Seal, R.R.; Sanzolone, R.F.; Lamothe, P.J.; Brown, Z.A.; Adams, M.


    We report results from sequential extraction experiments and the quantitative mineralogy for samples of stream sediments and mine wastes collected from metal mines. Samples were from the Elizabeth, Ely Copper, and Pike Hill Copper mines in Vermont, the Callahan Mine in Maine, and the Martha Mine in New Zealand. The extraction technique targeted the following operationally defined fractions and solid-phase forms: (1) soluble, adsorbed, and exchangeable fractions; (2) carbonates; (3) organic material; (4) amorphous iron- and aluminum-hydroxides and crystalline manganese-oxides; (5) crystalline iron-oxides; (6) sulfides and selenides; and (7) residual material. For most elements, the sum of an element from all extractions steps correlated well with the original unleached concentration. Also, the quantitative mineralogy of the original material compared to that of the residues from two extraction steps gave insight into the effectiveness of reagents at dissolving targeted phases. The data are presented here with minimal interpretation or discussion and further analyses and interpretation will be presented elsewhere.

  5. Sediment and epilithon metabolism and hydrolytic activity in streams affected by mountaintop removal coal mining, West Virginia, U.S.A.

    Mountaintop removal and valley filling (MTR/VF) is a method of coal mining used in the Central Appalachians. Despite regulations requiring that potential impacts to stream function be considered in determining compensatory mitigation associated with permitted fill activities, asse...

  6. An imperialist competitive algorithm for solving the production scheduling problem in open pit mine

    Mojtaba Mokhtarian Asl


    Full Text Available Production scheduling (planning) of an open-pit mine is the procedure during which the rock blocks are assigned to different production periods in such a way that the highest net present value of the project is achieved subject to operational constraints. The paper introduces a new and computationally less expensive meta-heuristic technique known as the imperialist competitive algorithm (ICA) for long-term production planning of open-pit mines. The proposed algorithm modifies the original rules of the assimilation process. The ICA performance for different levels of the control factors has been studied and the results are presented. The results showed that ICA could be efficiently applied to the mine production planning problem.

  7. Pattern recognition algorithms for data mining scalability, knowledge discovery and soft granular computing

    Pal, Sankar K


    Pattern Recognition Algorithms for Data Mining addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. This volume presents various theories, methodologies, and algorithms, using both classical approaches and hybrid paradigms. The authors emphasize large datasets with overlapping, intractable, or nonlinear boundary classes, and datasets that demonstrate granular computing in soft frameworks. Organized into eight chapters, the book begins with an introduction to PR, data mining, and knowledge discovery concepts. The authors analyze the tasks of multi-scale data condensation and dimensionality reduction, then explore the problem of learning with support vector machine (SVM). They conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.

  8. Ecology of endangered damselfly Coenagrion ornatum in post-mining streams in relation to their restoration

    TICHÁNEK, Filip


    The thesis explores various aspects of the ecology of the endangered damselfly Coenagrion ornatum, a specialist of lowland headwaters, in post-mining streams of the Radovesicka spoil. The first part of the thesis is a manuscript which has already been submitted to the Journal of Insect Conservation. In the first part, we focused on a population estimate of the local population using the capture-recapture method, and explored its habitat requirements across life stages and spatial scales. In the next part, I assess m...

  9. Developing image processing meta-algorithms with data mining of multiple metrics.

    Leung, Kelvin; Cunha, Alexandre; Toga, A W; Parker, D Stott


    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation.

  10. Macroinvertebrate assemblages in agricultural, mining, and urban tropical streams: implications for conservation and management.

    Mwedzi, Tongayi; Bere, Taurai; Mangadze, Tinotenda


    The study evaluated the response of macroinvertebrate assemblages to changes in water quality in different land-use settings in Manyame catchment, Zimbabwe. Four land-use categories were identified: forested commercial farming, communal farming, Great Dyke mining (GDM) and urban areas. Macroinvertebrate community structure and physicochemical variables data were collected in two seasons from 41 sites following standard methods. Although not environmentally threatening, urban and GDM areas were characterised by higher conductivity, total dissolved solids, salinity, magnesium and hardness. Chlorides, total phosphates, total nitrogen, calcium, potassium and sodium were significantly highest in urban sites whilst dissolved oxygen (DO) was significantly higher in the forested commercial farming and GDM sites. Macroinvertebrate communities followed the observed changes in water quality. Macroinvertebrates in urban sites indicated severe pollution (e.g. Chironomidae) whilst those in forested commercial farming sites and GDM sites indicated relatively clean water (e.g. Notonemouridae). Forested watersheds together with good farm management practices are important in mitigating impacts of urbanisation and agriculture. Strategies that reduce oxygen-depleting substances must be devised to protect the health of Zimbabwean streams. The study affirms the wider applicability of the South African Scoring System in different land uses.

  11. Escape-Route Planning of Underground Coal Mine Based on Improved Ant Algorithm

    Guangwei Yan


    Full Text Available When a mine disaster occurs, to lessen disaster losses and improve survival chances of the trapped miners, good escape routes need to be found and used. Based on the improved ant algorithm, we proposed a new escape-route planning method of underground mines. At first, six factors which influence escape difficulty are evaluated and a weight calculation model is built to form a weighted graph of the underground tunnels. Then an improved ant algorithm is designed and used to find good escape routes. We proposed a tunnel network zoning method to improve the searching efficiency of the ant algorithm. We use max-min ant system method to optimize the meeting strategy of ants and improve the performance of the ant algorithm. In addition, when a small part of the mine tunnel network changes, the system may fix the optimal routes and avoid starting a new processing procedure. Experiments show that the proposed method can find good escape routes efficiently and can be used in the escape-route planning of large and medium underground coal mines.

  12. Association Rule Mining for Both Frequent and Infrequent Items Using Particle Swarm Optimization Algorithm



    Full Text Available In data mining research, generating frequent items from large databases is one of the important issues and the key factor for implementing association rule mining tasks. Mining infrequent items, such as relationships among rare but expensive products, is another demanding issue which has been shown in some recent studies. Therefore this study considers user-assigned threshold values as a constraint which helps users mine those rules which are more interesting for them. In addition, in the real world users may prefer to know relationships among frequent items along with infrequent ones. The particle swarm optimization algorithm has been an important heuristic technique in recent years, and this study uses this technique to mine association rules effectively. If this technique considers user-defined threshold values, interesting association rules can be generated more efficiently. Therefore this study proposes a novel approach which uses the particle swarm optimization algorithm to mine association rules from databases. Our implementation of the search strategy includes a bitmap representation of nodes in a lexicographic tree, and from the superset-subset relationship of the nodes it classifies frequent itemsets along with infrequent itemsets. In addition, this approach avoids the extra calculation overhead of generating frequent pattern trees and of handling the large memory which stores the support values of candidate itemsets. Our experimental results show that this approach efficiently mines association rules. It accesses the database to calculate a support value for fewer nodes to find frequent itemsets, and from those it generates association rules, which dramatically reduces search time. The main aim of the proposed algorithm is to show how a heuristic method works on real databases to find all the interesting association rules in an efficient way.
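
    Illustrative sketch (not from the paper): the abstract mentions a bitmap representation and support values but gives no implementation details. The Python below shows one common way such bitmaps can be used, assuming one integer bitmask per item with bit i set when transaction i contains the item. The function names build_bitmaps, support and confidence are hypothetical, and the particle swarm search itself is not reproduced.

```python
def build_bitmaps(transactions):
    """Map each item to an integer whose bit i is set if transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Fraction of transactions containing every item of the itemset."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # keep only transactions that contain the item
    return bin(mask).count("1") / n_transactions


def confidence(antecedent, consequent, bitmaps, n_transactions):
    """Confidence of the rule antecedent -> consequent."""
    s_antecedent = support(antecedent, bitmaps, n_transactions)
    if s_antecedent == 0:
        return 0.0
    s_both = support(set(antecedent) | set(consequent), bitmaps, n_transactions)
    return s_both / s_antecedent


# Toy data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
bitmaps = build_bitmaps(transactions)
```

    With this layout, the support of an itemset is obtained by AND-ing bitmasks instead of rescanning the transactions, which is presumably the kind of saving the abstract refers to.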

  13. Development and Testing of Data Mining Algorithms for Earth Observation

    Glymour, Clark


    The new algorithms developed under this project included a principled procedure for classification of objects, events or circumstances according to a target variable when a very large number of potential predictor variables is available but the number of cases that can be used for training a classifier is relatively small. These "high dimensional" problems require finding a minimal set of variables, called the Markov Blanket, sufficient for predicting the value of the target variable. An algorithm, the Markov Blanket Fan Search, was developed, implemented and tested on both simulated and real data in conjunction with a graphical model classifier, which was also implemented. Another algorithm developed and implemented in TETRAD IV for time series elaborated on work by C. Granger and N. Swanson, which in turn exploited some of our earlier work. The algorithms in question learn a linear time series model from data. Given such a time series, the simultaneous residual covariances, after factoring out time dependencies, may provide information about causal processes that occur more rapidly than the time series representation allows, so-called simultaneous or contemporaneous causal processes. Working with A. Monetta, a graduate student from Italy, we produced the correct statistics for estimating the contemporaneous causal structure from time series data using the TETRAD IV suite of algorithms. Two economists, David Bessler and Kevin Hoover, have independently published applications using TETRAD style algorithms to the same purpose. These implementations and algorithmic developments were separately used in two kinds of studies of climate data: short time series of geographically proximate climate variables predicting agricultural effects in California, and longer duration climate measurements of temperature teleconnections.

  14. Distributed Stable-Group Differentiated Admission Control Algorithm in Mobile Peer-to-Peer Media Streaming System

    XUE Guangtao; SHI Hua; YOU Jinyuan; YAO Wensheng


    Mobile peer-to-peer media streaming systems are expected to become as popular as peer-to-peer file sharing systems. In this paper, we study two key problems arising from mobile peer-to-peer media streaming: the stability of the interconnection between supplying peers and requesting peers in a mobile peer-to-peer streaming system, and fast capacity amplification of the entire mobile peer-to-peer streaming system. We use the Stable group algorithm to characterize user mobility in mobile ad hoc networks. Based on the stable group, we then propose a distributed Stable-group differentiated admission control algorithm (SGDACp2p), which quickly amplifies the system's total streaming capacity through its self-growing. Finally, extensive simulation results are presented that compare SGDACp2p with traditional methods to demonstrate the superiority of the algorithm.

  15. Influences of water and substrate quality for periphyton in a montane stream affected by acid mine drainage

    Niyogi, Dev K.; McKnight, Diane M.; Lewis, William M.


    St. Kevin Gulch, a headwater stream of the Rocky Mountains of Colorado, receives acid mine drainage that maintains low pH, high concentrations of heavy metals, and high rates of metal hydroxide deposition. An acid-tolerant alga, Ulothrix sp., is present below the source of mine drainage in St. Kevin Gulch, but its biomass is limited by the deposition rates of iron hydroxides, which are especially high near the source. An experimental diversion of the mine drainage increased the quality of water and improved the substrate condition through a reduction of deposition rates. During the first year of the experiment, Ulothrix became abundant in this reach. During the second year, pH increased to the point at which aluminum hydroxides precipitated from the stream water onto the streambed; this change inhibited the growth of all periphyton, including Ulothrix. The deposition rate of aluminum hydroxides, however, was less than that of iron hydroxides in stream reaches with high Ulothrix biomass, suggesting that metal hydroxides vary by type in their effect on periphyton.

  16. An Associate Rules Mining Algorithm Based on Artificial Immune Network for SAR Image Segmentation

    Mengling Zhao


    Full Text Available As a computational intelligence method, the artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In the existing artificial immune network algorithms, the affinity used for classifying is calculated from a certain distance, which may lead to unsatisfactory results when dealing with data with nominal attributes. To overcome this shortcoming, association rules are introduced into the AIN algorithm, and we propose a new classification algorithm, an associate rules mining algorithm based on artificial immune network (ARM-AIN). The new method uses association rules to represent immune cells and mines the best association rules rather than searching for optimal clustering centers. The proposed algorithm has been extensively compared with the artificial immune network classification (AINC) algorithm, the artificial immune network classification algorithm based on self-adaptive PSO (SPSO-AINC), and PSO-AINC over several large-scale data sets, target recognition of remote sensing images, and segmentation of three different SAR images. The experimental results indicate the superiority of ARM-AIN in classification accuracy and running time.

  17. Web mining based on chaotic social evolutionary programming algorithm


    Because the K-means clustering algorithm usually ends in a local optimum and rarely reaches the global optimum, a new web clustering method is presented based on the chaotic social evolutionary programming (CSEP) algorithm. In this method, a cognitive agent inherits a paradigm in clustering, which enables the cognitive agent to acquire a chaotic mutation operator in the betrayal. As shown in the experiment, this method not only effectively increases web clustering efficiency but also improves the precision of web clustering.

  18. Mineralogy and geochemistry of trace metals and REE in volcanic massive sulfide host rocks, stream sediments, stream waters and acid mine drainage from the Lousal mine area (Iberian Pyrite Belt, Portugal)

    Ferreira da Silva, E. [GeoBioTec - GeoBiosciences, Technologies and Engineering Research Center, Departamento de Geociencias, Universidade de Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal)]; Bobos, I. [Departamento de Geologia, Faculdade de Ciencias da Universidade do Porto, Rua Campo Alegre 687 4169-007 Porto (Portugal)]; Xavier Matos, J. [Centro de Estudos Geologicos e Mineiros de Beja, Rua Frei Amador Arrais No. 39 r/c, Apartado 104, 7801-902 Beja (Portugal)]; Patinha, C.; Reis, A.P.; Cardoso Fonseca, E. [GeoBioTec - GeoBiosciences, Technologies and Engineering Research Center, Departamento de Geociencias, Universidade de Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal)]


    Acid mine drainage represents a major source of water pollution in the Lousal area. The concentrations of trace metals and the rare earth elements (REE) in the host rocks, stream sediment, surface waters and acid mine drainage (AMD) associated with abandoned mine adits and tailings impoundments were determined, in order to fingerprint their sources and to understand their mobility and water-rock interaction. The results show that the Fe-SO₄-rich acid waters vary substantially in composition both spatially and seasonally. These waters include very low pH (mostly in the range 1.9-3.0), extreme SO₄ concentrations (4635-20,070 mg L⁻¹ SO₄²⁻), high metal contents (Fe, Al, Cu, Zn and Mn) and very high REE contents. The trace metal concentrations decrease downstream from the discharge points either due to precipitation of neoformed phases or to dilution. The North-American shale composite (NASC)-normalized patterns corresponding to sediment from one stream (Corona stream) show a flat tendency or are slightly enriched in light-REE (LREE). The NASC-normalized patterns corresponding to acidic mine waters show enrichment in the middle REE (MREE) with respect to the LREE and heavy REE (HREE). Moreover, the REE concentrations in acidic mine waters are 2 or 3 orders of magnitude higher than those of the surface waters. Changes of REE concentrations and variation of Eu anomaly show two spatially distinct patterns: (a) pond and spring waters with higher REE concentrations (ranging from 375 to 2870 μg L⁻¹), that records conspicuous negative Eu anomaly, and (b) seeps from tailings impoundments corresponding to lower REE concentrations than the first pattern (ranging from 350 to 1139 μg L⁻¹) with typically negative Eu anomaly. The stream water samples collected from the impacted stream during the spring show a low pH (2.8-3.1) and contain high concentrations of Fe and trace elements (up to 61 mg L⁻¹). Also, temporal variations of

  19. Near-infrared spectroscopy (NIRS) of epilithic material in streams has a potential for monitoring impact from mining.

    Persson, Jan; Nilsson, Mats; Bigler, Christian; Brooks, Stephen J; Renberg, Ingemar


    There is an increasing demand for cost-effective methods for environmental monitoring, and here we assess the potential of near-infrared spectroscopy (NIRS) on epilithic material from streams (material covering submerged stones) as a new method for monitoring the impact of pollution from mining and mining-related industries. NIRS, a routine technique in industry, registers the chemical properties of organic material on a molecular level and can detect minute alterations in the composition of epilithic material. Epilithic samples from 65 stream sites (42 uncontaminated and 23 contaminated) in northern Sweden were analyzed. The NIRS approach was evaluated by comparing it with the results of chemical analyses and diatom analyses of the same samples. Based on Principal Component Analysis, the NIRS data distinguished contaminated from uncontaminated sites and performed slightly better than chemical analyses and clearly better than diatom analyses. Of the streams designated a priori as contaminated, 74% were identified as contaminated by NIRS, 65% were identified by chemical analysis, and 26% were identified by diatom analysis. Unlike chemical analyses of water samples, NIRS data reflect biological impacts in the streams, and the epilithic material integrates impact over time. Given that, and the simplicity of NIRS-analyses, further studies to assess the use of NIRS of epilithic material as an inexpensive environmental monitoring method are justified.

  20. A Comparison Between Data Mining Prediction Algorithms for Fault Detection (Case study: Ahanpishegan co.)

    Amooee, Golriz; Bagheri-Dehnavi, Malihe


    In the current competitive world, industrial companies seek to manufacture products of higher quality, which can be achieved by increasing reliability, maintainability and thus the availability of products. On the other hand, improvement in product lifecycle is necessary for achieving high reliability. Typically, maintenance activities are aimed at reducing failures of industrial machinery and minimizing the consequences of such failures. So industrial companies try to improve their efficiency by using different fault detection techniques. One strategy is to process and analyze previously generated data to predict future failures. The purpose of this paper is to detect wasted parts using different data mining algorithms and compare the accuracy of these algorithms. A combination of thermal and physical characteristics has been used and the algorithms were implemented on Ahanpishegan's current data to estimate the availability of its produced parts. Keywords: Data Mining, Fault Detection, Availability, Predictio...

  1. Potential risk assessment in stream sediments, soils and waters after remediation in an abandoned W>Sn mine (NE Portugal).

    Antunes, I M H R; Gomes, M E P; Neiva, A M R; Carvalho, P C S; Santos, A C T


    The mining complex of Murçós belongs to the Terras de Cavaleiros Geopark, located in Trás-os-Montes region, northeast Portugal. A stockwork of NW-SE-trending W>Sn quartz veins intruded Silurian metamorphic rocks and a Variscan biotite granite. The mineralized veins contain mainly quartz, cassiterite, wolframite, scheelite, arsenopyrite, pyrite, sphalerite, chalcopyrite, galena, rare pyrrhotite, stannite, native bismuth and also later bismuthinite, matildite, joseite, roosveltite, anglesite, scorodite, zavaritskite and covellite. The exploitation produced 335t of a concentrate with 70% of W and 150t of another concentrate with 70% of Sn between 1948 and 1976. The exploitation took place mainly in four open pit mines as well as underground. Three lakes were left in the area. Remediation processes of confinement and control of tailings and rejected materials and phytoremediation with macrophytes from three lakes were carried out between 2005 and 2007. Stream sediments, soils and water samples were collected in 2008 and 2009, after the remediation process. Most stream sediments showed deficiency or minimum enrichment for metals. The sequential enrichment factor in stream sediments W>Bi>As>U>Cd>Sn=Ag>Cu>Sb>Pb>Be>Zn is mainly associated with the W>Sn mineralizations. Stream sediments receiving drainage of a mine dump were found to be significantly to extremely enriched with W, while stream sediments and soils were found to be contaminated with As. Two soil samples collected around mine dumps and an open pit lake were also found to be contaminated with U. The waters from the Murçós W>Sn mine area were acidic to neutral. After the remediation, the surface waters were contaminated with F⁻, Al, As, Mn and Ni and must not be used for human consumption, while open pit lake waters must also not be used for agriculture because of contamination with F⁻, Al, Mn and Ni. In most waters, the As occurred as As (III), which is toxic and is easily mobilized in the drainage

  2. Background Traffic-Based Retransmission Algorithm for Multimedia Streaming Transfer over Concurrent Multipaths

    Yuanlong Cao


    Full Text Available Content-rich multimedia streaming will be the most attractive service in next-generation networks. With its ability to distribute data across multiple end-to-end paths based on SCTP's multihoming feature, concurrent multipath transfer SCTP (CMT-SCTP) has been regarded as the most promising technology for efficient multimedia streaming transmission. However, current research on CMT-SCTP mainly focuses on algorithms related to data delivery performance and seldom considers background traffic factors. Actually, the background traffic of realistic network environments has an important impact on the performance of CMT-SCTP. In this paper, we first investigate the effect of background traffic on the performance of CMT-SCTP based on a close-to-realistic simulation topology with reasonable background traffic in NS2. Then, based on the localness nature of background flow, a further improved retransmission algorithm, named RTX_CSI, is proposed to achieve higher average throughput and a better quality of experience for users of multimedia streaming services.

  3. Sentiment Knowledge Discovery in Twitter Streaming Data

    Bifet, Albert; Frank, Eibe

    Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, and generated constantly, and well suited for knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing on classification problems, and then consider these streams for opinion mining and sentiment analysis. To deal with streaming unbalanced classes, we propose a sliding window Kappa statistic for evaluation in time-changing data streams. Using this statistic we perform a study on Twitter data using learning algorithms for data streams.
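
    Illustrative sketch (not from the paper): the abstract names a sliding window Kappa statistic but does not give its formula. The Python below assumes the standard Cohen's kappa, (p0 - pc) / (1 - pc), computed over only the most recent window_size examples, where p0 is the classifier's accuracy in the window and pc is the agreement expected by chance. The class name SlidingWindowKappa and its methods are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowKappa:
    """Cohen's kappa over the most recent `window_size` (true, predicted) pairs."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque with maxlen drops the oldest pair automatically
        self.window = deque(maxlen=window_size)

    def add(self, y_true, y_pred):
        self.window.append((y_true, y_pred))

    def kappa(self):
        n = len(self.window)
        if n == 0:
            return 0.0
        # p0: observed agreement (accuracy) inside the window
        p0 = sum(1 for t, p in self.window if t == p) / n
        # pc: chance agreement from the marginal class frequencies
        true_counts = Counter(t for t, _ in self.window)
        pred_counts = Counter(p for _, p in self.window)
        pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
        if pc == 1.0:
            return 0.0  # only one class present; kappa is undefined, report 0
        return (p0 - pc) / (1.0 - pc)
```

    On an unbalanced stream, a classifier that always predicts the majority class scores high accuracy but a kappa of zero, which is why a kappa-style measure suits the unbalanced classes mentioned in the abstract.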

  4. Incremental DataGrid Mining Algorithm for Mobility Prediction of Mobile Users

    U. Sakthi


    Full Text Available Problem statement: Mobility prediction is an important issue in Personal Communication Systems (PCS). Mobile users' moving logs are stored in a data grid located in different locations. A distributed data mining algorithm is applied to these moving logs to generate the mobility patterns of mobile users. As new moving logs are added to the data grid, the existing mobility patterns become invalid and should be updated. One existing way to derive the new mobility patterns is to re-execute the algorithm from scratch, which results in excessive computation. Approach: We designed a new incremental algorithm that maintains infrequent mobility patterns, which avoids unnecessary scans of the full database. The incremental data mining algorithm took less time to compute new mobility patterns. The discovered location patterns can be used to provide various location-based services to the mobile user by the application server in a mobile computing environment. The data grid provided a geographically distributed database for the computational grid, which implements the incremental data mining algorithm. We built the data grid system on a cluster of workstations using the open source Globus Toolkit 4.0 and Message Passing Interface extended with Grid Services (MPICH-G2). Results: The experiments were conducted on original data sets; data were added incrementally and the computation time was recorded for each data set. The performance improvement for an increment size of 100 K was about 55% for 0.20% support count and increased to 60% for 0.25% support count. The performance improvement was about 65% for the support count of 0.30%. Conclusion: We analyzed our results with various sizes of data sets, and the results show that the time taken to generate mobility patterns by the incremental mining algorithm is less than that of the re-computing approach. In future, the execution time can be further reduced by balancing the workload of grid nodes.
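
    Illustrative sketch (not from the paper): the abstract says the incremental algorithm keeps infrequent patterns so that new logs can be absorbed without rescanning the full database, but it does not describe the data structures. The single-machine Python below assumes a mobility pattern is a pair of consecutive locations in a moving log and keeps counts for every observed pair, frequent or not. The class name IncrementalPatternCounter is hypothetical, and the grid distribution (Globus, MPICH-G2) is not modelled.

```python
from collections import Counter


class IncrementalPatternCounter:
    """Keeps counts of all location pairs so new batches never need a full rescan."""

    def __init__(self, min_support):
        self.min_support = min_support  # fraction of logs, e.g. 0.5
        self.counts = Counter()         # counts for frequent AND infrequent pairs
        self.total_logs = 0

    def add_batch(self, logs):
        """Fold a batch of new moving logs into the stored counts."""
        for path in logs:
            self.total_logs += 1
            # count each consecutive location pair at most once per log
            self.counts.update(set(zip(path, path[1:])))

    def frequent(self):
        """Return the pairs whose support meets the threshold right now."""
        if self.total_logs == 0:
            return {}
        return {
            pair: count
            for pair, count in self.counts.items()
            if count / self.total_logs >= self.min_support
        }


# Toy moving logs, invented for illustration only.
miner = IncrementalPatternCounter(min_support=0.5)
miner.add_batch([["A", "B", "C"], ["A", "B"], ["C", "D"]])
first = miner.frequent()
miner.add_batch([["C", "D"], ["C", "D"], ["B", "C", "D"]])
second = miner.frequent()
```

    In the toy data the pair (C, D) is infrequent after the first batch and becomes frequent after the second, using only its retained count plus the new logs; this is the situation in which keeping infrequent patterns avoids recomputation.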

  5. Effects of land use and surficial geology on flow and water quality of streams in the coal-mining region of southwestern Indiana, October 1979 through September 1980

    Wilber, William G.; Renn, Danny E.; Crawford, Charles G.


    An assessment of streams in the coal-mining region of southwestern Indiana was done from October 1979 through September 1980 during stable stream flows to provide baseline hydrologic and water-quality information and to document the effect of several natural and human-induced factors on water quality in the region.

  6. Reconnaissance of stream biota and physical and chemical water quality in areas of selected land use in the coal-mining region, southwestern Indiana, 1979-80

    Wangsness, D.J.


    To help meet the goals of the Surface-Mining Control and Reclamation Act of 1977, the U.S. Geological Survey is assessing the physical, chemical, and biological characteristics of surface water within the coal-mining region of southwestern Indiana. This report discusses benthic-invertebrate and periphytic algal communities in streams draining homogeneous-agricultural, forested, active/reclaimed-mine, reclaimed-mine, and unreclaimed-mine watersheds, and relates the biological communities to the physical and chemical characteristics of the streams. Alkalinity and pH were lower and the concentrations of dissolved solids, suspended solids, calcium, magnesium, sodium, potassium, sulfate, iron, manganese, aluminum, and zinc were higher in unreclaimed-mine watersheds than in the other land-use watersheds. Numbers and community diversity of benthic invertebrates were less at sites affected by mining than at agricultural or forested sites, owing to (1) synergistic effects of low pH, metals, and unsuitable habitat and (2) lack of colonizing drift organisms because of the small drainage area upstream from the mined area. Only a few organisms, such as the caddisflies Cheumatopsyche and Hydropsyche and the chironomids Chironomus and Cricotopus, were found in streams draining mine areas.

  7. Assessment of stream bottom sediment quality in the vicinity of the Caldas uranium mine

    Oliveira, Priscila E.S. de [Universidade Federal de Ouro Preto (ProAmb/UFOP), Ouro Preto, MG (Brazil). Programa de Pos-Graduacao em Engenharia Ambiental]; Filho, Carlos A.C.; Moreira, Rubens M.; Ramos, Maria E.A.F.; Dutra, Pedro H.; Ferreira, Vinicius V.M. [Centro de Desenvolvimento da Tecnologia Nuclear (CDTN/CNEN-MG), Belo Horizonte (Brazil)]; Silva, Nivaldo C. [Comissao Nacional de Energia Nuclear (LAPOC/CNEN-MG), Pocos de Caldas, MG (Brazil). Laboratorio de Pocos de Caldas]


    An evaluation of the quality of stream bottom sediments was performed in the surroundings of the Caldas Uranium Mining and Milling Facilities (UMMF), sited on Pocos de Caldas Plateau (southeastern Brazil), to verify whether the sediments in the water bodies downstream of the plant were impacted by effluents from a large waste rock pile, named Waste Rock Pile 4 (WRP4), and from the Tailings Dam (TD). In order to perform the research, twelve sampling stations were established in the watersheds around Caldas UMMF: the Soberbo creek, the Consulta brook, and the Taquari river. One of the stations was located inside the Bacia Nestor Figueiredo, a retention pond that receives effluents from WRP4, and another in a settling tank (D2) for radium, which receives the effluents from TD. A monitoring scheme has been developed, comprising four sampling campaigns in 2010 and 2011, and the samples were analyzed for selected metals-metalloids and radionuclides, using Inductively Coupled Plasma Mass Spectrometry (ICP-MS), Inductively Coupled Plasma Atomic Emission Spectroscopy (ICP-AES), Ultraviolet-Visible (UV-Vis) Spectroscopy and Gamma-ray Spectrometry. The results suggest that effluents discharged from retention ponds to watercourses are causing an increase in the concentration of As, B, Ba, Cr, Mo, Mn, Pb, Zn, ²³⁸U, ²³²Th, ²²⁶Ra, ²²⁸Ra and ²¹⁰Pb in sediments. Detailed investigation in sub-superficial layers is recommended at these locations to evaluate the need for implementing mitigation actions such as lining and constructing hydraulic barriers downstream of the ponds. Actually, the UTM/Caldas operator is already implementing control measures.

  8. Stream-Sediment Geochemistry in Mining-Impacted Drainages of the Yankee Fork of the Salmon River, Custer County, Idaho

    Frost, Thomas P.; Box, Stephen E.


    This reconnaissance study was undertaken at the request of the USDA Forest Service, Region 4, to assess the geochemistry, in particular the mercury and selenium contents, of mining-impacted sediments in the Yankee Fork of the Salmon River in Custer County Idaho. The Yankee Fork has been the site of hard-rock and placer mining, primarily for gold and silver, starting in the 1880s. Major dredge placer mining from the 1930s to 1950s in the Yankee Fork disturbed about a 10-kilometer reach. Mercury was commonly used in early hard-rock mining and placer operations for amalgamation and recovery of gold. During the late 1970s, feasibility studies were done on cyanide-heap leach recovery of gold from low-grade ores of the Sunbeam and related deposits. In the mid-1990s a major open-pit bulk-vat leach operation was started at the Grouse Creek Mine. This operation shut down when gold values proved to be lower than expected. Mercury in stream sediments in the Yankee Fork ranges from below 0.02 ppm to 7 ppm, with the highest values associated with old mill locations and lode and placer mines. Selenium ranges from below the detection limit for this study of 0.2 ppm to 4 ppm in Yankee Fork sediment samples. The generally elevated selenium content in the sediment samples reflect the generally high selenium contents in the volcanic rocks that underlie the Yankee Fork and the presence of gold and silver selenides in some of the veins that were exploited in the early phases of mining.

  9. Compressed domain moving object extraction algorithm for MPEG-2 video stream

    Yang, Gaobo; Wang, Xiaojing; Zhang, Zhaoyang


    In this paper, a compressed domain moving object extraction algorithm is proposed for MPEG-2 video stream. It is mainly based on the histogram analysis of motion vectors, which can be easily obtained by partially decoding the MPEG-2 video stream. The whole algorithm framework can be divided into three key steps: motion vector pre-processing, histogram analysis of motion vectors, and motion vector similarity based region growing for final mask generation. A piecewise cubic Hermite interpolation is utilized to form a dense motion field. The outputs of the region growing algorithm based on similarity matching are the final segmentation results of the moving object. These final segmentation results are further smoothed and interpolated by B-spline curve estimation. Experimental results on several test sequences demonstrate that desirable segmentation results are obtained. The accuracy of the segmentation results is clearly improved, nearly to pixel-level accuracy, because of the B-spline curve representation of the segmented object. For segmentation efficiency, the processing speed is about 30ms per frame, which can meet the requirements of real time applications.

  10. HUITWU: An Efficient Algorithm for High-Utility Itemset Mining in Transaction Databases

    Shi-Ming Guo; Hong Gao


    Mining high-utility itemsets (HUIs) from a transaction database refers to the discovery of itemsets with high utilities such as profits. Most existing studies discover HUIs from a transaction database in two phases. In phase 1, different overestimation methods are applied to calculate upper bounds on the utilities of itemsets. Because overestimated utilities are used, the itemsets whose overestimated utilities are no less than a user-specified threshold are selected as candidate HUIs, and they are verified by scanning the database one more time in phase 2. However, a large number of candidate HUIs causes two problems: 1) excessive memory is required to store these candidates; 2) a large amount of running time is needed to calculate their exact utilities. The vertical data format has recently been applied to mine HUIs. However, this kind of method cannot deal effectively with transactions that contain the same items, so the size of the database cannot be reduced sufficiently. The overall performance of the algorithms is consequently degraded. Thus an algorithm, HUITWU, is proposed in this paper for mining HUIs. A novel data structure, the HUITWU-Tree, is adopted to efficiently calculate the utilities of itemsets in a database. Extensive studies with both sparse and dense datasets have demonstrated that our proposed algorithm is more than an order of magnitude faster and consumes less memory than the state-of-the-art algorithms.
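
    The two-phase scheme described above can be illustrated with a minimal sketch. It assumes transaction-weighted utility (TWU) as the overestimate, which is one common choice; the abstract does not say which overestimation method each prior study uses, and this is not the HUITWU-Tree itself. The toy database, the profit table and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

Transaction = Dict[str, int]  # item -> purchased quantity (toy representation)

# Hypothetical toy data, not taken from the paper.
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}
DB: List[Transaction] = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"b": 1, "c": 1}]


def transaction_utility(tx: Transaction, profit: Dict[str, int]) -> int:
    """Total utility of one transaction (quantity times unit profit, summed)."""
    return sum(qty * profit[item] for item, qty in tx.items())


def twu(itemset: FrozenSet[str], db: List[Transaction], profit: Dict[str, int]) -> int:
    """Upper bound: sum of the utilities of all transactions containing the itemset."""
    return sum(transaction_utility(tx, profit) for tx in db
               if all(i in tx for i in itemset))


def exact_utility(itemset: FrozenSet[str], db: List[Transaction], profit: Dict[str, int]) -> int:
    """Exact utility: only the itemset's own items count in each supporting transaction."""
    return sum(sum(tx[i] * profit[i] for i in itemset) for tx in db
               if all(i in tx for i in itemset))


def two_phase(candidates: Iterable[FrozenSet[str]], db: List[Transaction],
              profit: Dict[str, int], min_util: int) -> Dict[FrozenSet[str], int]:
    # Phase 1: keep itemsets whose overestimated utility reaches the threshold.
    phase1 = [c for c in candidates if twu(c, db, profit) >= min_util]
    # Phase 2: rescan the database to verify the exact utility of each candidate.
    verified = {c: exact_utility(c, db, profit) for c in phase1}
    return {c: u for c, u in verified.items() if u >= min_util}


if __name__ == "__main__":
    itemsets = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ab")]
    print(two_phase(itemsets, DB, PROFIT, min_util=10))
```

    In this toy run three of the four itemsets survive phase 1 but only one survives phase 2, which is the candidate blow-up that the abstract identifies as the cost of two-phase methods.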

  11. Benthic Communities of Low-Order Streams Affected by Acid Mine Drainages: A Case Study from Central Europe

    Marek Svitok


    Full Text Available Little attention has been paid to the impact of acid mine drainages (AMD) on aquatic ecosystems in Central Europe. In this study, we investigate the physico-chemical properties of low-order streams and the response of benthic invertebrates to AMD pollution in the Banská Štiavnica mining region (Slovakia). The studied streams showed typical signs of mine drainage pollution: higher conductivity, elevated iron, aluminum, zinc and copper loads and accumulations of ferric precipitates. Electric conductivity correlated strongly with most of the investigated elements (weighted mean absolute correlation = 0.95) and, therefore, can be recommended as a good proxy indicator for rapid AMD pollution assessments. The diversity and composition of invertebrate assemblages was related to water chemistry. Taxa richness decreased significantly along an AMD-intensity gradient. While moderately affected sites supported relatively rich assemblages, the harshest environmental conditions (pH < 2.5) were typical for the presence of a limited number of very tolerant taxa, such as Oligochaeta and some Diptera (Limnophyes, Forcipomyiinae). The trophic guild structure correlated significantly with AMD chemistry, whereby predators completely disappeared under the most severe AMD conditions. We also provide a brief review of the AMD literature and outline the needs for future detailed studies involving functional descriptors of the impact of AMD on aquatic ecosystems.

  12. Optimizing Live Digital Evidence Mining Using Structural Subroutines of Apriori Algorithm

    Akshay Zadgaonkar


    Full Text Available The scope and complexity of the Internet have grown exponentially. This growth has made digital forensic investigation a very challenging task. Even modest intra-organizational networks have sufficient network traffic to pose a problem for digital crime investigators who must police them and collect evidence. Another problem in network-based crime investigation is that offline mining techniques do not yield pervasive evidence. At the same time, due to voluminous traffic, live evidence mining becomes a challenge. This paper presents a technique to optimize live evidence mining by using the principles of the Apriori algorithm to trigger the evidence collection mechanism at the right and opportune moment. The crux of this technique is answering “When & What Information” to collect about a subject of investigation or data.

  13. Advances in Educational Data Mining Models and the Application of Its Algorithms

    Chi Zhang∗; Huan Yan; Ying Fu; Guofeng Han; Fan Feng


    In order to find an effective way to improve the quality of school management, it is necessary to find valuable information in students’ original data and to provide feedback for student management. Firstly, some new and successful educational data mining models were analyzed and compared. These models perform better than traditional models (such as the Knowledge Tracing Model) in efficiency, comprehensiveness, ease of use, stability and so on. Then, a neural network algorithm was applied to explore the feasibility of using educational data mining in student management, and the results show that it has enough predictive accuracy and reliability to be put into practice. In the end, the possibility and prospect of applying educational data mining in teaching management systems for university students was assessed.

  14. Force-Based Incremental Algorithm for Mining Community Structure in Dynamic Network

    Bo Yang; Da-You Liu


    Community structure is an important property of networks. Being able to identify communities can provide invaluable help in exploiting and understanding both social and non-social networks. Several algorithms have been developed up till now. However, all these algorithms work well only with small or moderate networks with on the order of 10^4 vertices. Besides, all the existing algorithms are off-line and cannot work well with highly dynamic networks such as the web, in which web pages are updated frequently. When an already clustered network is updated, the entire network, including the original and incremental parts, has to be recalculated, even though only slight changes are involved. To address this problem, an incremental algorithm is proposed, which allows for mining community structure in large-scale and dynamic networks. Based on the community structure detected previously, the algorithm takes little time to reclassify the entire network, including both the original and incremental parts. Furthermore, the algorithm is faster than most of the existing algorithms, such as Girvan and Newman's algorithm and its improved versions. The algorithm can also help to visualize these community structures in a network and provides a new approach to research on the evolving process of dynamic networks.


    V. Ganesh Kumar


    Full Text Available As shopping increasingly becomes a shared experience and a joint process with friends or family members, the most important problems arise from the variety of products and the product information available in supermarkets. This study proposes a system that uses an Intelligent Apriori algorithm to support consumers in getting the required items from various supermarkets. The system also suggests the best route, reducing unnecessary movement of the customer, and quickly determines the next operation, including the next supermarket the customer should visit for the next item he/she purchases. This approach can be extended to mobile communication, where the next movement of the mobile user can be predicted and used to arrange necessary requirements at the destination before the user actually reaches it. The feasibility of this approach is tested under simple conditions and the results are presented in this study.

  16. Comparison of arsenic co-precipitation and adsorption by iron minerals and the mechanism of arsenic natural attenuation in a mine stream.

    Park, Jin Hee; Han, Young-Soo; Ahn, Joo Sung


    Mine stream precipitate collected from Ilkwang mine, Korea, contained high concentrations of arsenic (As), while water collected from the same site had negligible As concentrations, indicating natural attenuation of As occurred in the mine stream. The mechanism of attenuation was explained by comparison of X-ray absorption near edge structure (XANES) of As(V) co-precipitated with or adsorbed to iron (Fe) minerals in mine precipitates. Arsenic in the mine precipitate was present as As(V) and schwertmannite was the main Fe mineral. Arsenic co-precipitation with schwertmannite was the major mechanism of As removal in the mine stream, followed by As adsorption by goethite and As co-precipitation with ferrihydrite. Schwertmannite and ferrihydrite were formed in acid mine drainage and As was incorporated in their structure during formation. Additionally, schwertmannite and ferrihydrite may transform to goethite with As adsorbed onto the goethite surface. Based on the results of batch experiments of As co-precipitation and adsorption, co-precipitation of As with ferrihydrite and schwertmannite was the most effective As sequestration mechanism in the removal of As(V) from acid mine drainage. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Slope orientation assessment for open-pit mines, using GIS-based algorithms

    Grenon, Martin; Laflamme, Amélie-Julie


    Standard stability analysis in geomechanical rock slope engineering for open-pit mines relies on a simplified representation of slope geometry, which does not take full advantage of available topographical data in the early design stages of a mining project; consequently, this may lead to nonoptimal slope design. The primary objective of this paper is to present a methodology that allows for the rigorous determination of interramp and bench face slope orientations on a digital elevation model (DEM) of a designed open pit. Common GIS slope algorithms were tested to assess slope orientations on the DEM of the Meadowbank mining project's Portage pit. Planar regression algorithms based on principal component analysis provided the best results at both the interramp and the bench face levels. The optimal sampling window for interramp was 21×21 cells, while a 9×9-cell window was best at the bench level. Subsequent slope stability analysis relying on those assessed slope orientations would provide a more realistic geometry for potential slope instabilities in the design pit. The presented methodology is flexible, and can be adapted depending on a given mine's block sizes and pit geometry.

  18. Implementation of Web Usage Mining Using APRIORI and FP Growth Algorithms

    B.Santhosh Kumar


    Full Text Available Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining itself can be classified further depending on the kind of usage data considered: web server data, application server data and application level data. Web server data correspond to the user logs that are collected at the Web server. Typical data collected at a Web server include IP addresses, page references, and access times of the users, and these are the main input to the present research. This research work concentrates on web usage mining and in particular focuses on discovering the web usage patterns of websites from the server log files. Memory usage and running time are compared for the Apriori algorithm and the Frequent Pattern Growth algorithm.
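
    As a rough illustration of the Apriori side of this comparison, the sketch below mines frequent page sets level by level from user sessions. The session data and the function name are hypothetical, and the paper's own implementation and log preprocessing are not described in the abstract.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(sessions: Iterable[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every page set that occurs in at least `min_support` sessions."""
    transactions: List[FrozenSet[str]] = [frozenset(s) for s in sessions]

    # Level 1: count single pages.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        # Count step: one pass over the sessions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical sessions, as they might be reconstructed from a server log.
    sessions = [
        ["/home", "/products", "/cart"],
        ["/home", "/products"],
        ["/home", "/about"],
        ["/products", "/cart"],
    ]
    for pages, count in apriori(sessions, min_support=2).items():
        print(sorted(pages), count)
```

    Apriori rescans the sessions once per level, whereas FP-growth compresses them into a prefix tree and avoids explicit candidate generation; that difference is what a memory and time comparison of the two would measure.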

  19. A Business Intelligence Model to Predict Bankruptcy using Financial Domain Ontology with Association Rule Mining Algorithm

    Martin, A; Venkatesan, Dr V Prasanna


    Today in every organization financial analysis provides the basis for understanding and evaluating the results of business operations and showing how well a business is doing. This means that organizations can control the operational activities primarily related to corporate finance. One way of doing this is by analysis of bankruptcy prediction. This paper develops an ontological model from the financial information of an organization by analyzing the semantics of the financial statement of a business. One of the best bankruptcy prediction models is the Altman Z-score model. The Altman Z-score method uses financial ratios to predict bankruptcy. From the financial ontological model, the relation between financial data is discovered by using a data mining algorithm. By combining the financial domain ontological model with an association rule mining algorithm and the Z-score model, a new business intelligence model is developed to predict bankruptcy.
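
    For orientation, the Z-score component can be sketched as follows. The abstract does not state which variant of the model is used, so the coefficients and cut-offs below are an assumption: they are those of the commonly cited original formulation for publicly traded manufacturing firms. The function names and the example figures are hypothetical.

```python
def altman_z(working_capital: float, retained_earnings: float, ebit: float,
             market_value_equity: float, sales: float,
             total_assets: float, total_liabilities: float) -> float:
    """Altman Z-score, original public-manufacturer variant (assumed here)."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z: float) -> str:
    """Conventional interpretation bands for the original model."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


if __name__ == "__main__":
    # Hypothetical balance-sheet figures in a single currency unit.
    z = altman_z(working_capital=200, retained_earnings=300, ebit=150,
                 market_value_equity=800, sales=1200,
                 total_assets=1000, total_liabilities=400)
    print(round(z, 3), zone(z))
```

    In a pipeline like the one the abstract outlines, the five ratios would presumably be derived from the ontology-structured financial statement before scoring.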

  20. Analysis of Process Mining Model Using Frequentgroup Based Noise Filtering Algorithm

    V. Priyadharshini


    Full Text Available Process mining is a process management technique used to analyze business processes based on event logs. Knowledge is extracted from event logs by using knowledge retrieval techniques. Process mining algorithms are capable of automatically discovering models that account for all the events registered in the log traces provided as input. The theory of regions is a valuable tool in process discovery: it aims at learning a formal model (Petri nets) from a set of traces. The main objective of this paper is to propose a new concept, the Frequentgroup-based noise filtering algorithm. The experiment is done on the standard benchmark datasets HELIX and RALIC. The performance of the proposed system is better than that of the existing method.

  1. Research on Algorithm for Mining Negative Association Rules Based on Frequent Pattern Tree


    Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e. absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, classifiers that build their classification model based on association rules. Indeed, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, very few algorithms to mine them have been proposed to date. In this paper, an algorithm based on FP-tree is presented to discover negative association rules.
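
    To make the notion of a negated item concrete, the sketch below computes support and confidence for a rule of the form A => not B by direct counting, using the identity supp(A and not B) = supp(A) - supp(A and B). This is brute-force counting for illustration only, not the FP-tree-based algorithm the paper presents; the basket data and function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def negative_rule(antecedent: FrozenSet[str], absent: FrozenSet[str],
                  transactions: List[FrozenSet[str]]) -> Tuple[float, float]:
    """Support and confidence of the rule: antecedent => NOT absent."""
    supp_a = support(antecedent, transactions)
    supp_ab = support(antecedent | absent, transactions)
    supp_neg = supp_a - supp_ab  # transactions containing A but not all of B
    conf_neg = supp_neg / supp_a if supp_a else 0.0
    return supp_neg, conf_neg


if __name__ == "__main__":
    # Hypothetical market baskets.
    baskets = [frozenset(b) for b in (
        {"coffee", "sugar"}, {"coffee"}, {"tea", "sugar"}, {"coffee", "milk"},
    )]
    print(negative_rule(frozenset({"coffee"}), frozenset({"tea"}), baskets))
```

    A high-confidence rule such as coffee => not tea in this toy data is the kind of "conflicting products" signal the abstract mentions for market-basket analysis.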

  2. Data mining of public building energy consumption based on Apriori algorithm

    Cao, Ling; Zhang, Jichang


    Aiming at the problem of ineffective use of large amount of energy consumption data in public buildings at present, this article proposes data mining to subentry measuring data collected by data center of construction committees in various provinces and cities by utilizing improved Apriori algorithm, to find out the relation between different air conditioning refrigerating system forms and unit area energy consumption of air conditioning of the same building types in the same region. It briefly introduces the basic idea and process of the Apriori algorithm, and preliminarily designs preprocessing of experimental data and result analysis method; the result after data mining can provide reference for energy conservation design of new buildings and energy conservation reconstruction of old buildings.

  3. Pair Triplet Association Rule Generation in Streams

    Manisha Thool


    Full Text Available Many applications involve the generation and analysis of a new kind of data, called stream data, where data flows in and out of an observation platform or window dynamically. Such data streams have unique features: huge or possibly infinite volume, dynamic change, flow in or out in a fixed order, and allowance for only one or a small number of scans. An important problem in data stream mining is that of finding frequent items in the stream. This problem finds application across several domains such as financial systems, web traffic monitoring, internet advertising, retail and e-business. This raises new issues that need to be considered when developing association rule mining techniques for stream data. The Space-Saving algorithm reports both frequent and top-k elements with tight guarantees on errors. We also develop the notion of association rules in streams of elements. The Streaming-Rules algorithm is integrated with the Space-Saving algorithm to report 1-1 association rules with tight guarantees on errors, using minimal space and limited processing per element. We use the Apriori algorithm to generate association rules for static datasets and implement the Streaming-Rules algorithm for pair and triplet association rules. We compare the top-k rules of the static datasets with the output for the stream datasets and compute the percentage of error.
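
    The Space-Saving step mentioned above can be sketched in a few lines. This version keeps the counters in a plain dictionary and finds the minimum by scanning, which is simpler but slower than the linked Stream-Summary structure usually paired with the algorithm; the function name and the sample stream are hypothetical, and the Streaming-Rules extension is not shown.

```python
from typing import Dict, Hashable, Iterable, Tuple


def space_saving(stream: Iterable[Hashable], capacity: int) -> Dict[Hashable, Tuple[int, int]]:
    """Track at most `capacity` items; return item -> (estimated count, max overestimation)."""
    counters: Dict[Hashable, list] = {}  # item -> [count, error]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Evict the item with the smallest count; the newcomer inherits that
            # count plus one and records the inherited part as its possible error.
            victim = min(counters, key=lambda k: counters[k][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    return {item: (count, error) for item, (count, error) in counters.items()}


if __name__ == "__main__":
    summary = space_saving("aabacabdaa", capacity=3)
    for item, (count, error) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
        # The true frequency lies between count - error and count.
        print(item, count, error)
```

    Each reported count overestimates the true frequency by at most the stored error, which is the kind of bound behind the "tight guarantees on errors" quoted in the abstract.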

  4. GriMa: a Grid Mining Algorithm for Bag-of-Grid-Based Classification

    Deville, Romain; Fromont, Elisa; Jeudy, Baptiste; Solnon, Christine


    General-purpose exhaustive graph mining algorithms have seldom been used in real life contexts due to the high complexity of the process that is mostly based on costly isomorphism tests and countless expansion possibilities. In this paper, we explain how to exploit grid-based representations of problems to efficiently extract frequent grid subgraphs and create Bag-of-Grids which can be used as new features for classification purposes. We provide an efficient grid minin...

  5. Numerical computation algorithm of explosion equations and thermodynamics parameters of mine explosives

    李守巨; 刘迎曦; 何翔; 周圆π


    A new numerical algorithm is presented to simulate the explosion reaction process of mine explosives based on the equation of state, the equation of mass conservation and the thermodynamic balance equation of the explosion products. A computer program has been developed that takes into account the effect of reversible reactions of the explosion products on the explosion reaction equations and thermodynamic parameters. The computed values show that the computer simulation results agree with the test results.

  6. Numerical computation algorithm of explosion equations and thermodynamics parameters of mine explosives

    LI Shou-ju; LIU Ying-xi; HE Xiang; ZHOU Yuan-pai


    A new numerical algorithm is presented to simulate the explosion reaction process of mine explosives based on the equation of state, the equation of mass conservation and the thermodynamic balance equation of the explosion products. A computer program has been developed that takes into account the effect of reversible reactions of the explosion products on the explosion reaction equations and thermodynamic parameters. The computed values show that the computer simulation results agree with the test results.

  7. Data Stream Clustering Algorithm Based on Neighborhood Covering

    章季阳; 王伦文


    Data stream clustering analysis is one of the key techniques in data stream mining. To meet the requirements of evolution and high-speed processing, a data stream clustering algorithm based on Neighborhood Covering, namely NCStream, is proposed. By building a Neighborhood Covering model, the algorithm defines and analyzes at length the operations applied as the data stream evolves, including the adjustment, creation, deletion and merging of covering clusters, while maintaining the cluster features online. Compared with similar clustering methods, NCStream does not require the number of clusters to be specified in advance, which avoids the influence of parameter settings on the clustering result. Moreover, NCStream makes it easy to build a spatial index. Hence, the evolution of the data stream is reflected more effectively. Experimental results on real radio monitoring data sets demonstrate that NCStream performs better in clustering shape, quality and processing time.


    Pardeep Kumar


    Full Text Available In today’s business scenario, we perceive major changes in how managers use computerized support in making decisions. As more decision-makers use computerized support in decision making, the decision support system (DSS) is developing from its start as a personal support tool and is becoming a common resource in an organization. DSS serve the management, operations, and planning levels of an organization and help to make decisions, which may be rapidly changing and not easily specified in advance. Data mining has a vital role in extracting important information to help the decision making of a decision support system. It has been an active field of research in the last two to three decades. Integration of data mining and decision support systems (DSS) can lead to improved performance and can enable the tackling of new types of problems. Artificial Intelligence methods are improving the quality of decision support, and have become embedded in many applications, ranging from anti-lock automobile brakes to today’s interactive search engines. It provides various machine learning techniques to support data mining. Classification is one of the main and valuable tasks of data mining. Several types of classification algorithms have been suggested, tested and compared to determine future trends based on unseen data. No single algorithm has been found to be superior to all others for all data sets. Various issues such as predictive accuracy, training time to build the model, robustness and scalability must be considered and can involve tradeoffs, further complicating the quest for an overall superior method. The objective of this paper is to compare various classification algorithms that have been frequently used in data mining for decision support systems. Three decision-tree-based algorithms, one artificial neural network, one statistical, one support vector machine with and without AdaBoost, and one clustering algorithm are tested and compared on

  9. Gas Emission Prediction Model of Coal Mine Based on CSBP Algorithm

    Xiong Yan


    Full Text Available In view of the nonlinear characteristics of gas emission in a coal working face, a prediction method is proposed based on a BP neural network optimized by the cuckoo search algorithm (CSBP). In the CSBP algorithm, cuckoo search is adopted to optimize the weight and threshold parameters of the BP network and to obtain the global optimal solutions. Furthermore, the twelve main factors affecting gas emission in the coal working face are taken as the input vector of the CSBP algorithm, the gas emission is taken as the output vector, and then the prediction model of the BP neural network with optimal parameters is established. The results show that the CSBP algorithm has better generalization ability and higher prediction accuracy, and can be utilized effectively in the prediction of coal mine gas emission.

  10. A Novel Approach for Discovery Quantitative Fuzzy Multi-Level Association Rules Mining Using Genetic Algorithm

    Saad M. Darwish


    Full Text Available Quantitative multilevel association rule mining is a central field for discovering interesting associations among data components at multiple levels of abstraction. The problem of extending procedures to handle quantitative data has been attracting the attention of many researchers. Algorithms usually discretize the attribute domains into sharp intervals, and then apply simple algorithms established for Boolean attributes. Fuzzy association rule mining approaches are intended to overcome such shortcomings based on fuzzy set theory. Furthermore, most of the current algorithms on this topic rely on exhaustive search methods to determine the ideal support and confidence thresholds, and suffer from high computational cost in searching for association rules. To accelerate the search for quantitative multilevel association rules and avoid excessive computation, in this paper we propose a new genetic-based method to determine threshold values for frequent item sets. In this approach, a sophisticated coding method is established, and the qualified confidence is employed as the fitness function. With the genetic algorithm, a comprehensive search can be achieved and the system can be automated, because our model does not need a user-specified minimum support threshold. Experimental results indicate that the proposed algorithm can effectively generate non-redundant fuzzy multilevel association rules.

  11. Natural decrease of dissolved arsenic in a small stream receiving drainages of abandoned silver mines in Guanajuato, Mexico.

    Arroyo, Yann Rene Ramos; Muñoz, Alma Hortensia Serafín; Barrientos, Eunice Yanez; Huerta, Irais Rodriguez; Wrobel, Kazimierz; Wrobel, Katarzyna


    Arsenic release from the abandoned mines and its fate in a local stream were studied. Physicochemical parameters, metals/metalloids and arsenic species were determined. One of the mine drainages was found to be a point source of contamination with 309 μg L⁻¹ of dissolved arsenic; this concentration declined rapidly to 10.5 μg L⁻¹ about 2 km downstream. Data analysis confirmed that oxidation of As(III) released from the primary sulfide minerals was favored by the increase of pH and oxidation reduction potential; the results obtained in a multivariate approach indicated that self-purification of water was due to association of As(V) with a secondary solid phase containing Fe, Mn, Ca.

  12. Occurrence and transport of selected constituents in streams near the Stibnite mining area, Central Idaho, 2012–14

    Etheridge, Alexandra B.


    Mining of stibnite (antimony sulfide), tungsten, gold, silver, and mercury near the town of Stibnite in central Idaho has left a legacy of trace element contamination in local streams. Water-quality and streamflow monitoring data from a network of five streamflow-gaging stations were used to estimate trace-element and suspended-sediment loads and flow-weighted concentrations in the Stibnite mining area between 2012 and 2014. Measured concentrations of arsenic exceeded human health-based water-quality criteria at each streamflow-gaging station, except for Meadow Creek (site 2), which was selected to represent background conditions in the study area. Measured concentrations of antimony exceeded human health-based water-quality criteria at sites 3, 4, and 5.
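
    A flow-weighted concentration weights each sample by the streamflow at the time it was taken, so that high-flow samples count for more than low-flow ones. The sketch below shows only that basic definition with hypothetical numbers; the abstract does not describe how the report actually estimated loads, and the function name is an assumption.

```python
from typing import Sequence


def flow_weighted_concentration(concentrations: Sequence[float],
                                flows: Sequence[float]) -> float:
    """Sum of (concentration x flow) divided by total flow, for paired samples."""
    if len(concentrations) != len(flows) or not flows:
        raise ValueError("need equally long, non-empty sequences")
    total_flow = sum(flows)
    if total_flow <= 0:
        raise ValueError("total flow must be positive")
    return sum(c * q for c, q in zip(concentrations, flows)) / total_flow


if __name__ == "__main__":
    # Hypothetical paired samples: concentration in micrograms per liter,
    # streamflow in cubic meters per second.
    conc = [10.0, 20.0, 40.0]
    flow = [1.0, 2.0, 1.0]
    print(flow_weighted_concentration(conc, flow))  # flow-weighted mean
    print(sum(conc) / len(conc))                    # plain arithmetic mean
```

    The result keeps the concentration units of the inputs as long as all flows share one unit.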

  13. Feature Reduction Based on Genetic Algorithm and Hybrid Model for Opinion Mining

    P. Kalaivani


    Full Text Available With the rapid growth of websites and web forms, a large number of product reviews are available on these sites. An opinion mining system is needed to help people evaluate the emotions, opinions, attitudes, and behavior of others, which is used to make decisions based on user preference. In this paper, we propose an optimized feature reduction approach that incorporates an ensemble of machine learning methods and uses information gain and a genetic algorithm as feature reduction techniques. We conducted comparative experiments on a multidomain review dataset and a movie review dataset in opinion mining. The effectiveness of the single classifiers Naïve Bayes, logistic regression, and support vector machine, and of the ensemble technique, for opinion mining is compared on five datasets. The proposed hybrid method is evaluated, and the experimental results show that information gain and the genetic algorithm combined with the ensemble technique perform better in terms of various measures for multidomain reviews and movie reviews. Classification algorithms are evaluated using McNemar’s test to compare the level of significance of the classifiers.

  14. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream

    Byrne, Patrick; Runkel, Robert L.; Walton-Day, Katie


    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH <3.1) seeps that enter along the left bank of Lion Creek. Investigation of inflow water (trace metal and major ion) chemistry using PCA suggests a hydraulic connection between many of the left bank inflows and mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.

  15. A dietary assessment of selenium risk to aquatic birds on a coal mine affected stream in Alberta, Canada

    Wayland, M.; Casey, R.; Woodsworth, E. [Environmental Canada, Saskatoon, SK (Canada)]


    In this article, we present the results of a dietary-based assessment of the risk that selenium may pose to two aquatic bird species, the American Dipper (Cinclus mexicanus) and the Harlequin Duck (Histrionicus histrionicus), on one of the coal mine-affected streams, the Gregg River. The study consisted of (1) a literature-based toxicity assessment, (2) simulation of selenium exposure in the diets and eggs of the two species, and (3) a risk assessment that coupled information on toxicity and exposure. Diet and egg selenium concentrations associated with a 20% hatch failure rate were 6.4 and 17 μg·g⁻¹ dry wt, respectively. Simulated dietary selenium concentrations were about 2.0-2.5 μg·g⁻¹ higher on the Gregg River than on reference streams for both species. When simulated dietary concentrations were considered, hatch failure rates on the Gregg River were predicted to average 12% higher in American Dippers and 8% higher in Harlequin Ducks than at reference streams. Corresponding values were only 3% for both species when predicted egg concentrations were used. Elevated levels of selenium in insects in some of the reference streams were unexpected and raised a question as to whether aquatic birds have evolved a higher tolerance level for dietary selenium in these areas.

  16. Acid mine pollution: effects on survival, reproduction and aging of stream bottom microinvertebrates. Completion report

    Hummon, W.D.


    Warburg manometry was used to assess the effect of acid mine water on respiratory processes in three species of aquatic insect larvae. Field collections and laboratory toxicity tests indicated short longevity under strong acid mine conditions. Mixed results were found with respect to weight-dependent respiratory rates. Sequential respiration determinations, under control-control or control-treatment fluids, indicated that acid mine water did not consistently alter rates. Animals maintained in mine water until death showed gradual decreases in respiratory rates over time, rather than stepwise drops that would accompany ionic interference. For these species the toxic mode of action of acid mine water does not appear to operate through mechanisms that are detectable by respirometry.

  17. Water quality of streams draining abandoned and reclaimed mined lands in the Kantishna Hills area, Denali National Park and Preserve, Alaska, 2008–11

    Brabets, Timothy P.; Ourso, Robert T.


    The Kantishna Hills are an area of low elevation mountains in the northwest part of Denali National Park and Preserve, Alaska. Streams draining the Kantishna Hills are clearwater streams that support several species of fish and are derived from rain, snowmelt, and subsurface aquifers. However, the water quality of many of these streams has been degraded by mining. Past mining practices generated acid mine drainage and excessive sediment loads that affected water quality and aquatic habitat. Because recovery through natural processes is limited owing to a short growing season, several reclamation projects have been implemented on several streams in the Kantishna Hills region. To assess the current water quality of streams in the Kantishna Hills area and to determine if reclamation efforts have improved water quality, a cooperative study between the U.S. Geological Survey and the National Park Service was undertaken during 2008-11. High levels of turbidity, an indicator of high concentrations of suspended sediment, were documented in water-quality data collected in the mid-1980s when mining was active. Mining ceased in 1985 and water-quality data collected during this study indicate that levels of turbidity have declined significantly. Turbidity levels generally were less than 2 Formazin Nephelometric Units and suspended sediment concentrations generally were less than 1 milligram per liter during the current study. Daily turbidity data at Rock Creek, an unmined stream, and at Caribou Creek, a mined stream, documented nearly identical patterns of turbidity in 2009, indicating that reclamation as well as natural revegetation in mined streams has improved water quality. Specific conductance and concentrations of dissolved solids and major ions were highest from streams that had been mined. Most of these streams flow into Moose Creek, which functions as an integrator stream, and dilutes the specific conductance and ion concentrations. Calcium and magnesium are the

  18. STREAM

    Godsk, Mikkel

    This paper presents a flexible model, ‘STREAM’, for transforming higher science education into blended and online learning. The model is inspired by ideas of active and collaborative learning and builds on feedback strategies well-known from Just-in-Time Teaching, Flipped Classroom, and Peer...... Instruction. The aim of the model is to provide both a concrete and comprehensible design toolkit for adopting and implementing educational technologies in higher science teaching practice and at the same time comply with diverse ambitions. As opposed to the above-mentioned feedback strategies, the STREAM...

  19. A fast calculating two-stream-like multiple scattering algorithm that captures azimuthal and elevation variations

    Fiorino, Steven T.; Elmore, Brannon; Schmidt, Jaclyn; Matchefts, Elizabeth; Burley, Jarred L.


    Properly accounting for multiple scattering effects can have important implications for remote sensing and possibly directed energy applications. For example, increasing path radiance can affect signal noise. This study describes the implementation of a fast-calculating two-stream-like multiple scattering algorithm that captures azimuthal and elevation variations into the Laser Environmental Effects Definition and Reference (LEEDR) atmospheric characterization and radiative transfer code. The multiple scattering algorithm fully solves for molecular, aerosol, cloud, and precipitation single-scatter layer effects with a Mie algorithm at every calculation point/layer rather than an interpolated value from a pre-calculated look-up-table. This top-down cumulative diffusivity method first considers the incident solar radiance contribution to a given layer accounting for solid angle and elevation, and it then measures the contribution of diffused energy from previous layers based on the transmission of the current level to produce a cumulative radiance that is reflected from a surface and measured at the aperture at the observer. Then a unique set of asymmetry and backscattering phase function parameter calculations are made which account for the radiance loss due to the molecular and aerosol constituent reflectivity within a level and allows for a more accurate characterization of diffuse layers that contribute to multiple scattered radiances in inhomogeneous atmospheres. The code logic is valid for spectral bands between 200 nm and radio wavelengths, and the accuracy is demonstrated by comparing the results from LEEDR to observed sky radiance data.

  20. Competitive evaluation of data mining algorithms for use in classification of leukocyte subtypes with Raman microspectroscopy.

    Maguire, A; Vega-Carrascal, I; Bryant, J; White, L; Howe, O; Lyng, F M; Meade, A D


    Raman microspectroscopy has been investigated for some time for use in label-free cell sorting devices. These approaches require coupling of the Raman spectrometer to complex data mining algorithms for identification of cellular subtypes such as the leukocyte subpopulations of lymphocytes and monocytes. In this study, three distinct multivariate classification approaches (PCA-LDA, SVMs and Random Forests) are developed and tested on their ability to classify the cellular subtype in extracted peripheral blood mononuclear cells (T-cell lymphocytes from myeloid cells), and are evaluated in terms of their respective classification performance. A strategy for optimisation of each classification algorithm is presented, with emphasis on reduction of model complexity in each of the algorithms. The relative classification performance and performance characteristics are highlighted, overall suggesting the radial basis function SVM as a robust option for classification of leukocytes with Raman microspectroscopy.

  1. A hybrid GA-TS algorithm for open vehicle routing optimization of coal mines material

    Yu, S.W.; Ding, C.; Zhu, K.J. [China University of Geoscience, Wuhan (China)]


    In the open vehicle routing problem (OVRP), the objective is to minimize the number of vehicles and the total distance (or time) traveled. This study primarily focuses on solving the OVRP by applying a novel hybrid of a genetic algorithm and Tabu search (GA-TS), which combines the GA's parallel computing and global optimization with TS's Tabu search skill and fast local search. Firstly, the proposed algorithm uses natural number coding according to the customer demands and the capacity of the vehicle for global optimization. Secondly, individuals in the population undergo TS local search with a certain probability, that is, local routing optimization of all customer sites belonging to one vehicle. The mechanism not only improves the ability of global optimization, but also ensures the speed of operation. The algorithm was used in Zhengzhou Coal Mine and Power Supply Co., Ltd.'s transport vehicle routing optimization.


    Li Haiying; Zhuang Zhenquan; Li Bin; Wan Ke


    In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data that are dynamically collected from a web site. The algorithm uses the concept of C-V to denote characteristics; at the same time, it adopts two-value [0,1] input and a self-defined vigilance parameter to design the clustering architecture. Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of a typical characteristic. Using stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications of net-users according to the actual web resource; the administering system can then map them to the web resource space to implement the intelligent configuration effectively and rapidly.

  3. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Yang Ou


    Full Text Available This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM), which was announced by IMO in 2010. The goal of this paper is to mine the association rules among the items of BRM and the vessel accidents. However, because only indirect data can be collected, which seems useless for analysing the relationship between items of BRM and the accidents, cross level association rules need to be studied to build the relation between the indirect data and items of BRM. In this paper, firstly, a cross level coding scheme for mining the multilevel association rules is proposed. Secondly, we execute the immune genetic algorithm with the coding scheme for analyzing BRM. Thirdly, based on the basic maritime investigation reports, some important association rules of the items of BRM are mined and studied. Finally, according to the results of the analysis, we provide suggestions for the work of seafarer training, assessment, and management.

  4. Denitrification potential in stream sediments impacted by acid mine drainage: effects of pH, various electron donors, and iron.

    Baeseman, J L; Smith, R L; Silverstein, J


    Acid mine drainage (AMD) contaminates thousands of kilometers of stream in the western United States. At the same time, nitrogen loading to many mountain watersheds is increasing because of atmospheric deposition of nitrate and increased human use. Relatively little is known about nitrogen cycling in acidic, heavy-metal-laden streams; however, it has been reported that one key process, denitrification, is inhibited under low pH conditions. The objective of this research was to investigate the capacity for denitrification in acidified streams. Denitrification potential was assessed in sediments from several Colorado AMD-impacted streams, ranging from pH 2.60 to 4.54, using microcosm incubations with fresh sediment. Added nitrate was immediately reduced to nitrogen gas without a lag period, indicating that denitrification enzymes were expressed and functional in these systems. First-order denitrification potential rate constants varied from 0.046 to 2.964 day⁻¹. The pH of the microcosm water increased between 0.23 and 1.49 pH units during denitrification. Additional microcosm studies were conducted to examine the effects of initial pH, various electron donors, and iron (added as ferrous and ferric iron). Decreasing initial pH decreased denitrification; however, increasing pH had little effect on denitrification rates. The addition of ferric and ferrous iron decreased observed denitrification potential rate constants. The addition of glucose and natural organic matter stimulated denitrification potential. The addition of hydrogen had little effect, however, and denitrification activity in the microcosms decreased after acetate addition. These results suggest that denitrification can occur in AMD streams, and if stimulated within the environment, denitrification might reduce acidity.
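
    As a rough illustration of what the reported first-order rate constants imply, the sketch below assumes simple first-order loss of the added nitrate, C(t) = C0 * exp(-k t), and converts the two extreme rate constants quoted in the abstract into half-lives. The function names are illustrative and do not come from the paper, and the abstract does not say that the authors reported half-lives.

```python
import math


def half_life_days(k_per_day: float) -> float:
    """Half-life in days for first-order decay with rate constant k (day^-1)."""
    if k_per_day <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / k_per_day


def fraction_remaining(k_per_day: float, t_days: float) -> float:
    """Fraction of the initial nitrate left after t days: C(t)/C0 = exp(-k t)."""
    return math.exp(-k_per_day * t_days)


if __name__ == "__main__":
    # Lowest and highest rate constants quoted in the abstract (day^-1).
    for k in (0.046, 2.964):
        print(f"k = {k} day^-1: half-life = {half_life_days(k):.2f} days")
```

    Under this assumption the slowest sediments would take roughly two weeks to halve the added nitrate, while the fastest would do so within a few hours.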

  5. Denitrification potential in stream sediments impacted by acid mine drainage: Effects of pH, various electron donors, and iron

    Baeseman, J.L.; Smith, R.L.; Silverstein, J.


    Acid mine drainage (AMD) contaminates thousands of kilometers of stream in the western United States. At the same time, nitrogen loading to many mountain watersheds is increasing because of atmospheric deposition of nitrate and increased human use. Relatively little is known about nitrogen cycling in acidic, heavy-metal-laden streams; however, it has been reported that one key process, denitrification, is inhibited under low pH conditions. The objective of this research was to investigate the capacity for denitrification in acidified streams. Denitrification potential was assessed in sediments from several Colorado AMD-impacted streams, ranging from pH 2.60 to 4.54, using microcosm incubations with fresh sediment. Added nitrate was immediately reduced to nitrogen gas without a lag period, indicating that denitrification enzymes were expressed and functional in these systems. First-order denitrification potential rate constants varied from 0.046 to 2.964 day⁻¹. The pH of the microcosm water increased between 0.23 and 1.49 pH units during denitrification. Additional microcosm studies were conducted to examine the effects of initial pH, various electron donors, and iron (added as ferrous and ferric iron). Decreasing initial pH decreased denitrification; however, increasing pH had little effect on denitrification rates. The addition of ferric and ferrous iron decreased observed denitrification potential rate constants. The addition of glucose and natural organic matter stimulated denitrification potential. The addition of hydrogen had little effect, however, and denitrification activity in the microcosms decreased after acetate addition. These results suggest that denitrification can occur in AMD streams, and if stimulated within the environment, denitrification might reduce acidity. © Springer Science+Business Media, Inc. 2006.

  6. A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

    Ge Song


    Full Text Available Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. Given the characteristics of textual streams, it is technically difficult to classify them, especially in an imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest, for learning from the textual imbalanced stream with concept drift (CFIM). The CFIM is based on ensemble learning by integrating a set of clustering trees (CTs). An adaptive selection method, which flexibly chooses the useful CTs by the property of the stream, is presented in CFIM. In particular, to deal with the problem of class imbalance, we collect and reuse both rare-class instances and misclassified instances from the historical chunks. Compared to most existing approaches, it is worth pointing out that our approach assumes that both the majority class and the rare class may suffer from concept drift. Thus the distribution of resampled instances is similar to the current concept. The effectiveness of CFIM is examined in five real-world textual streams under an imbalanced nonstationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models.

  7. A Hybrid Algorithm of Traffic Accident Data Mining on Cause Analysis

    Jianfeng Xi


    Full Text Available Road traffic accident databases provide the basis for road traffic accident analysis, the data inside which usually has a radial, multidimensional, and multilayered structure. Traditional data mining algorithms such as association rules, when applied alone, often yield uncertain and unreliable results. An improved association rule algorithm based on Particle Swarm Optimization (PSO), put forward by this paper, can be used to analyze the correlation between accident attributes and causes. The new algorithm focuses on characteristics of the hyperstereo structure of road traffic accident data, and the association rules of accident causes can be calculated more accurately and in higher rates. A new concept of Association Entropy is also defined to help compare the importance between different accident attributes. T-test model and Delphi method were deployed to test and verify the accuracy of the improved algorithm, the result of which was a ten times faster speed for random traffic accident data sampling analyses on average. In the paper, the algorithms were tested on a sample database of more than twenty thousand items, each with 56 accident attributes. The final result indicates that the improved algorithm was accurate and stable.

  8. Environmental Geochemistry of Heavy Metal Contaminants in Soil and Stream Sediment in Panzhihua Mining and Smelting Area, Southwestern China

    滕彦国; 庹先国; 倪师军; 张成江; 徐争启


    Mining and smelting activities are the main causes for the increasing pollution of heavy metals in soil, water body and stream sediment. An environmental geochemical investigation was carried out in and around the Panzhihua mining and smelting area to determine the extent of chemical contamination in soil and sediment. The main objective of this study was to investigate the environmental geochemistry of Ti, V, Cr, Mn, Cu, Pb, Zn and As in soil and sediment and to assess the degree of pollution in the study area. The data of heavy metal concentrations reveal that soils and sediments in the area have been slightly contaminated. Geochemical maps of Igeo of each heavy metal show that the contaminated sites are located in V-Ti-magnetite sloping and smelting, gangues dam. The pollution sources of the selected elements come mainly from dusts resultant from mining activities and other three-waste-effluents. The area needs to be monitored regularly for trace metal, especially heavy metal enrichment.

  9. Streamflow, water-quality, and biological data on streams in an area of longwall coal mining, southern Ohio, water years 1987-89

    Coen, A. W.


    This report presents data on the first 3 years of a 5-year study of the effects of longwall coal mining on six streams near a mining complex in Meigs, Gallia, and Vinton Counties, Ohio. Longwall coal mining is a method of underground mining in which 75 to 90 percent of the coal is removed; conventional methods, such as room-and-pillar mining, remove only about 50 percent of the coal. Use of the longwall method is expected to increase in Ohio. Collapse or subsidence of the overburden and land surface occurs immediately after the removal of the coal. Such collapse can disrupt surface drainage and the recharge of ground water. The data include streamflow, water quality, and the abundance and diversity of aquatic macroinvertebrates and fish. The data were collected from eight sites on six streams from July 1987 through September 1989. The drainage areas of these sites range from 2.04 to 80.8 square miles and include the major drainages of the area being mined. Total precipitation in 1987 and 1988 in the study area was 78 and 81 percent, respectively, of the annual average (from 1939 to 1989) of 39.59 inches. The total precipitation in 1989 was 135 percent of the annual average. Streams at six of the eight sites were dry for parts of the first 2 years. Specific conductance ranged from 180 to 3,500 microsiemens per centimeter at 25 degrees Celsius, pH ranged from 6.9 to 8.0, and the concentration of total recoverable iron ranged from 80 to 1,800 micrograms per liter. Macroinvertebrate and fish populations indicate a warmwater-habitat rating of fair to good according to Ohio Environmental Protection Agency standards. This information will help provide a data base from which the effects of longwall mining on streams in southern Ohio can be evaluated. Correlations of surface-water quality and quantity with longwall mining were not attempted in this study.

  10. A Fast Algorithm for Finding Point Sources in the Fermi Data Stream: FermiFAST

    Ashathaman, Asha; Heyl, Jeremy S


    This paper presents a new and efficient algorithm for finding point sources in the photon event data stream from the Fermi Gamma-Ray Space Telescope. It can rapidly construct about the most significant half of the Fermi Third Point Source catalogue (3FGL) with nearly 80% purity from the four years of data used to construct the catalogue. If a higher purity sample is desirable, one can achieve a sample that includes the most significant third of the Fermi 3FGL with only five percent of the sources unassociated with Fermi sources. Outside the galactic plane, the contamination is essentially negligible. This software allows for rapid exploration of the Fermi data, simulation of the source detection to calculate the selection function of various sources and the errors in the obtained parameters of the sources detected.

  11. The STRatospheric Estimation Algorithm from Mainz (STREAM): estimating stratospheric NO2 from nadir-viewing satellites by weighted convolution

    Beirle, Steffen; Hörmann, Christoph; Jöckel, Patrick; Liu, Song; Penning de Vries, Marloes; Pozzer, Andrea; Sihler, Holger; Valks, Pieter; Wagner, Thomas


    The STRatospheric Estimation Algorithm from Mainz (STREAM) determines stratospheric columns of NO2 which are needed for the retrieval of tropospheric columns from satellite observations. It is based on the total column measurements over clean, remote regions as well as over clouded scenes where the tropospheric column is effectively shielded. The contribution of individual satellite measurements to the stratospheric estimate is controlled by various weighting factors. STREAM is a flexible and robust algorithm and does not require input from chemical transport models. It was developed as a verification algorithm for the upcoming satellite instrument TROPOMI, as a complement to the operational stratospheric correction based on data assimilation. STREAM was successfully applied to the UV/vis satellite instruments GOME 1/2, SCIAMACHY, and OMI. It overcomes some of the artifacts of previous algorithms, as it is capable of reproducing gradients of stratospheric NO2, e.g., related to the polar vortex, and reduces interpolation errors over continents. Based on synthetic input data, the uncertainty of STREAM was quantified as about 0.1-0.2 × 10¹⁵ molecules cm⁻², in accordance with the typical deviations between stratospheric estimates from different algorithms compared in this study.


    Yukinobu Fukushima


    Full Text Available Minimum Physical Hop (MPH) has been proposed as a peer selection algorithm for decreasing inter-AS (Autonomous System) traffic volume in P2P live streaming. In MPH, a newly joining peer selects a peer whose physical hop count (i.e., the number of ASes traversed on the content delivery path) from it is the minimum as its providing peer. However, MPH shows high inter-AS traffic volume when the number of joining peers is large. In this paper, we propose IMPH that tries to further decrease the inter-AS traffic volume by distributing peers with one logical hop count (i.e., the number of peers or origin streaming servers (OSSes) traversed on the content delivery path from an OSS to the peer) to many ASes and encouraging the following peers to find their providing peers within the same AS. Numerical examples show that IMPH achieves up to 64% lower inter-AS traffic volume than MPH.
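
    The MPH selection rule described above can be sketched in a few lines. This is only a minimal illustration of the rule as the abstract states it, not the authors' implementation: the `Peer` class, the `as_path` field, and the tie-breaking by list order are assumptions made for the example, and the IMPH extension is not modelled because the abstract does not give enough detail.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Peer:
    peer_id: str
    # Hypothetical field: AS numbers traversed on the content delivery path
    # between this candidate and the newly joining peer.
    as_path: Tuple[int, ...]


def physical_hop_count(peer: Peer) -> int:
    """Physical hop count, taken here as the number of ASes on the path."""
    return len(peer.as_path)


def select_provider_mph(candidates: Sequence[Peer]) -> Peer:
    """Return the candidate with the minimum physical hop count.

    Ties are broken by position in `candidates` (an assumption; the
    abstract does not specify a tie-breaking rule).
    """
    if not candidates:
        raise ValueError("no candidate peers available")
    return min(candidates, key=physical_hop_count)


if __name__ == "__main__":
    peers = [Peer("a", (64500, 64501, 64502)), Peer("b", (64510,)), Peer("c", (64520,))]
    print(select_provider_mph(peers).peer_id)
```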

  13. Use of NTRIP for Optimizing the Decoding Algorithm for Real-Time Data Streams

    Zhanke He


    Full Text Available As a network transmission protocol, Networked Transport of RTCM via Internet Protocol (NTRIP) is widely used in GPS and Global Orbiting Navigational Satellite System (GLONASS) Augmentation systems, such as Continuous Operational Reference System (CORS), Wide Area Augmentation System (WAAS) and Satellite Based Augmentation Systems (SBAS). With the deployment of BeiDou Navigation Satellite system (BDS) to serve the Asia-Pacific region, there are increasing needs for ground monitoring of the BeiDou Navigation Satellite system and the development of the high-precision real-time BeiDou products. This paper aims to optimize the decoding algorithm of NTRIP Client data streams and the user authentication strategies of the NTRIP Caster based on NTRIP. The proposed method greatly enhances the handling efficiency and significantly reduces the data transmission delay compared with the Federal Agency for Cartography and Geodesy (BKG) NTRIP. Meanwhile, a transcoding method is proposed to facilitate the data transformation from the BINary EXchange (BINEX) format to the RTCM format. The transformation scheme thus solves the problem of handling real-time data streams from Trimble receivers in the BeiDou Navigation Satellite System indigenously developed by China.

  14. Use of NTRIP for Optimizing the Decoding Algorithm for Real-Time Data Streams

    He, Zhanke; Tang, Wenda; Yang, Xuhai; Wang, Liming; Liu, Jihua


    As a network transmission protocol, Networked Transport of RTCM via Internet Protocol (NTRIP) is widely used in GPS and Global Orbiting Navigational Satellite System (GLONASS) Augmentation systems, such as Continuous Operational Reference System (CORS), Wide Area Augmentation System (WAAS) and Satellite Based Augmentation Systems (SBAS). With the deployment of BeiDou Navigation Satellite system (BDS) to serve the Asia-Pacific region, there are increasing needs for ground monitoring of the BeiDou Navigation Satellite system and the development of the high-precision real-time BeiDou products. This paper aims to optimize the decoding algorithm of NTRIP Client data streams and the user authentication strategies of the NTRIP Caster based on NTRIP. The proposed method greatly enhances the handling efficiency and significantly reduces the data transmission delay compared with the Federal Agency for Cartography and Geodesy (BKG) NTRIP. Meanwhile, a transcoding method is proposed to facilitate the data transformation from the BINary EXchange (BINEX) format to the RTCM format. The transformation scheme thus solves the problem of handling real-time data streams from Trimble receivers in the BeiDou Navigation Satellite System indigenously developed by China. PMID:25310474

  15. pH dependence of iron photoreduction in a rocky mountain stream affected by acid mine drainage

    McKnight, Diane M.; Kimball, B.A.; Runkel, R.L.


    The redox speciation of dissolved iron and the transport of iron in acidic, metal-enriched streams is controlled by precipitation and dissolution of iron hydroxides, by photoreduction of dissolved ferric iron and hydrous iron oxides, and by oxidation of the resulting dissolved ferrous iron. We examined the pH dependence of these processes in an acidic mine-drainage stream, St Kevin Gulch, Colorado, by experimentally increasing the pH of the stream from about 4.0 to 6.5 and following the downstream changes in iron species. We used a solute transport model with variable flow to evaluate biogeochemical processes controlling downstream transport. We found that at pH 6.4 there was a rapid and large initial loss of ferrous iron concurrent with the precipitation of aluminium hydroxide. Below this reach, ferrous iron was conservative during the morning but there was a net downstream loss of ferrous iron around noon and in the afternoon. Calculation of net oxidation rates shows that the noontime loss rate was generally much faster than rates for the ferrous iron oxidation at pH 6 predicted by Singer and Stumm (1970. Science 167: 1121). The maintenance of ferrous iron concentrations in the morning is explained by the photoreduction of photoreactive ferric species, which are then depleted by noon. Copyright © 2001 John Wiley & Sons, Ltd.

  16. Water quality changes in acid mine drainage streams in Gangneung, Korea, 10 years after treatment with limestone

    Shim, Moo Joon; Choi, Byoung Young; Lee, Giehyeon; Hwang, Yun Ho; Yang, Jung-Seok; O'Loughlin, Edward J.; Kwon, Man Jae


    To determine the long-term effectiveness of the limestone treatment for acid mine drainage (AMD) in Gangneung, Korea, we investigated the elemental distribution in streams impacted by AMD and compared the results of previous studies before and approximately 10 years after the addition of limestone. Addition of limestone in 1999 leads to a pH increase in 2008, and with the exception of Ca, the elemental concentrations (e.g., Fe, Mn, Mg, Sr, Ni, Zn, S) in the streams decreased. The pH was 2.5–3 before the addition of limestone and remained stable at around 4.5–5 from 2008 to 2011, suggesting the reactivity of the added limestone was diminished and that an alternative approach is needed to increase the pH up to circumneutral range and maintain effective long-term treatment. To identify the processes causing the decrease in the elemental concentrations, we also examined the spatial (approximately 7 km) distribution over three different types of streams affected by the AMD. The elemental distribution was mainly controlled by physicochemical processes including redox reactions, dilution on mixing, and co-precipitation/adsorption with Fe (hydr)oxides.

  17. Modified Structural and Attribute Clustering Algorithm for Improving Cluster Quality in Data Mining: A Quality Oriented Approach

    G. Abel Thangaraja


    Full Text Available The need for data mining arises from the explosive growth of data from terabytes to petabytes. Data mining preprocessing aims to produce quality mining results in descriptive and predictive analysis. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. A straightforward way to combine structural and attribute similarities is to use a weighted distance function. Clustering results are arrived at based on attribute similarities. The clusters balance the attribute and structural similarities. The existing Structural and Attribute cluster algorithm is analyzed and a new algorithm is proposed. The two algorithms are compared and the results are analyzed. It is found that the modified algorithm gives better quality clusters.
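
    The abstract mentions combining structural and attribute similarities through a weighted distance function but does not give the formula. The sketch below shows one common way this could be done, as an assumption rather than the paper's method: structural distance is taken as the Jaccard distance between neighbour sets, attribute distance as the fraction of mismatching attributes, and `alpha` is a hypothetical weight between the two.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Structural distance: 1 - |a & b| / |a | b| over two neighbour sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def attribute_distance(x: dict, y: dict) -> float:
    """Fraction of attributes (union of keys) on which the two nodes differ."""
    keys = set(x) | set(y)
    if not keys:
        return 0.0
    return sum(x.get(k) != y.get(k) for k in keys) / len(keys)


def combined_distance(neigh_u: set, neigh_v: set,
                      attrs_u: dict, attrs_v: dict,
                      alpha: float = 0.5) -> float:
    """Weighted sum: alpha * structural + (1 - alpha) * attribute distance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * jaccard_distance(neigh_u, neigh_v)
            + (1.0 - alpha) * attribute_distance(attrs_u, attrs_v))


if __name__ == "__main__":
    d = combined_distance({1, 2}, {2, 3}, {"dept": "cs"}, {"dept": "cs"}, alpha=0.5)
    print(round(d, 4))
```

    Setting `alpha` to 1 would cluster on graph structure alone and setting it to 0 on attributes alone; intermediate values trade one off against the other.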

  18. EM&AA: An Algorithm for Predicting the Course Selection by Student in e-Learning Using Data Mining Techniques

    Aher, Sunita B.


    Recommendation systems, whose aim is to present important and useful information to the user with little effort, have been widely used in internet activities. A Course Recommendation System is a system that recommends to students the best combination of courses in an engineering education system; e.g., if a student is interested in a course like system programming, then he would like to take the course entitled compiler construction. An algorithm that combines two data mining algorithms, Expectation Maximization Clustering and the Apriori Association Rule Algorithm, has been developed. The result of this developed algorithm is compared with that of the Apriori Association Rule Algorithm, which is an existing algorithm in the open source data mining tool Weka.
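
    To make the course example concrete, the sketch below computes the two standard measures that Apriori-style association rule mining relies on, support and confidence, for the rule "system programming -> compiler construction". The enrolment records are invented purely for illustration; they are not data from the paper, and the sketch does not reproduce the EM clustering step or Weka's implementation.

```python
def support(transactions, itemset) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent


if __name__ == "__main__":
    # Hypothetical enrolment records, one set of courses per student.
    enrolments = [
        {"system programming", "compiler construction"},
        {"system programming", "compiler construction", "operating systems"},
        {"system programming"},
        {"operating systems"},
    ]
    rule = ({"system programming"}, {"compiler construction"})
    print("support:", support(enrolments, rule[0] | rule[1]))
    print("confidence:", round(confidence(enrolments, *rule), 3))
```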

  19. Hybrid ants-like search algorithms for P2P media streaming distribution in ad hoc networks


    Media streaming delivery in wireless ad hoc networks is challenging due to the stringent resource restrictions, potential high loss rate and the decentralized architecture. To support long and high-quality streams, one viable approach is that a media stream is partitioned into segments, and then the segments are replicated in a network and served in a peer-to-peer (P2P) fashion. However, the searching strategy for segments is one key problem with the approach. This paper proposes a hybrid ants-like search algorithm (HASA) for P2P media streaming distribution in ad hoc networks. It takes the advantages of random walks and ants-like algorithms for searching in unstructured P2P networks, such as low transmitting latency, less jitter times, and low unnecessary traffic. We quantify the performance of our scheme in terms of response time, jitter times, and network messages for media streaming distribution. Simulation results showed that it can effectively improve the search efficiency for P2P media streaming distribution in ad hoc networks.

  20. Reduction of Negative and Positive Association Rule Mining and Maintain Superiority of Rule Using Modified Genetic Algorithm

    Nikhil Jain, Vishal Sharma, Mahesh Malviya


    Full Text Available Association rule mining plays an important role in market data analysis and also in the medical diagnosis of correlated problems. Various techniques are used to generate association rules, such as the Apriori algorithm, FP-growth and tree-based algorithms. Some algorithms perform very well but generate negative association rules and also suffer from the superiority measure problem. In this paper we propose a multi-objective association rule mining method based on a genetic algorithm and the Euclidean distance formula. In this method we find the near distance of the rule set using the Euclidean distance formula and generate two classes, a higher class and a lower class. The validity of each class is checked by a distance weight vector. Basically, the distance weight vector maintains a threshold value for the rule itemsets. Throughout the process we use a genetic algorithm to optimize the rule set. Here we set the population size to 1000, and the selection process is validated by the distance weight vector. Our proposed algorithm, distance weight optimization of association rule mining with a genetic algorithm, is compared with multi-objective association rule optimization using a genetic algorithm. Our proposed algorithm generates a better rule set than the MORA method.


    LiHaiying; ZuangZhenquan; et al.


    In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data that are dynamically collected from a web site. The algorithm uses the concept of the C-V to denote characteristics; at the same time it adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of the typical characteristic. By means of stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications from net users according to the actual web resource; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.

  2. An overview of data mining algorithms in drug induced toxicity prediction.

    Omer, Ankur; Singh, Poonam; Yadav, N K; Singh, R K


    The growth in chemical diversity has increased the need to adjudicate the toxicity of different chemical compounds raising the burden on the demand of animal testing. The toxicity evaluation requires time consuming and expensive undertaking, leading to the deprivation of the methods employed for screening chemicals pointing towards the need to develop more efficient toxicity assessment systems. Computational approaches have reduced the time as well as the cost for evaluating the toxicity and kinetic behavior of any chemical. The accessibility of a large amount of data and the intense need of turning this data into useful information have attracted the attention towards data mining. Machine Learning, one of the powerful data mining techniques has evolved as the most effective and potent tool for exploring new insights on combinatorial relationships among various experimental data generated. The article accounts on some sophisticated machine learning algorithms like Artificial Neural Networks (ANN), Support Vector Machine (SVM), k-mean clustering and Self Organizing Maps (SOM) with some of the available tools used for classification, sorting and toxicological evaluation of data, clarifying, how data mining and machine learning interact cooperatively to facilitate knowledge discovery. Addressing the association of some commonly used expert systems, we briefly outline some real world applications to consider the crucial role of data set partitioning.

  3. An Integrated Framework to Access and Mine Distributed Heterogeneous Data Streams with Uncertainty


    journal.pone.0054215 Xiaoxiao Shi, Jean-Francois Paiement, David Grangier , Philip S. Yu. GBC: Gradient boosting consensus model for heterogeneous data...Shi, J. Paiement, and D. Grangier , and P.S. Yu, "GBC: Gradient Boosting Consensus Model for Heterogeneous Data", Statistical Analysis and Data Mining

  4. Hydrolytic activity and metabolism of sediment and epilithon in streams draining mountaintop removal coal mining, West Virginia, U.S.A.

    Mountaintop removal and valley filling (MTR/VF) is a method of coal mining used in the Central Appalachians. Regulations require that potential impacts to stream functions must be considered when determining the compensatory mitigation necessary for replacing aquatic resources un...

  5. Assessing Lost Ecosystem Service Benefits Due to Mining-Induced Stream Degradation in the Appalachian Region: Economic Approaches to Valuing Recreational Fishing Impacts

    Sport fishing is a popular activity for Appalachian residents and visitors. The region’s coldwater streams support a strong regional outdoor tourism industry. We examined the influence of surface coal mining, in the context of other stressors, on freshwater sport fishing in...

  6. Stream-sediment geochemistry in mining-impacted streams : sediment mobilized by floods in the Coeur d'Alene-Spokane River system, Idaho and Washington

    Box, Stephen E.; Bookstrom, Arthur A.; Ikramuddin, Mohammed


    Environmental problems associated with the dispersion of metal-enriched sediment into the Coeur d'Alene-Spokane River system downstream from the Coeur d'Alene Mining District in northern Idaho have been a cause of litigation since 1903, 18 years after the initiation of mining for lead, zinc, and silver. Although direct dumping of waste materials into the river by active mining operations stopped in 1968, metal-enriched sediment continues to be mobilized during times of high runoff and deposited on valley flood plains and in Coeur d'Alene Lake (Horowitz and others, 1993). To gauge the geographic and temporal variations in the metal contents of flood sediment and to provide constraints on the sources and processes responsible for those variations, we collected samples of suspended sediment and overbank deposits during and after four high-flow events in 1995, 1996, and 1997 in the Coeur d'Alene-Spokane River system with estimated recurrence intervals ranging from 2 to 100 years. Suspended sediment enriched in lead, zinc, silver, antimony, arsenic, cadmium, and copper was detected over a distance of more than 130 mi (the downstream extent of sampling) downstream of the mining district. Strong correlations of all these elements in suspended sediment with each other and with iron and manganese are apparent when samples are grouped by reach (tributaries to the South Fork of the Coeur d'Alene River, the South Fork of the Coeur d'Alene River, the main stem of the Coeur d'Alene River, and the Spokane River). Elemental correlations with iron and manganese, along with observations by scanning electron microscopy, indicate that most of the trace metals are associated with Fe and Mn oxyhydroxide compounds. Changes in elemental correlations by reach suggest that the sources of metal-enriched sediment change along the length of the drainage. 
Metal contents of suspended sediment generally increase through the mining district along the South Fork of the Coeur d'Alene River, decrease

  7. An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks


    Background Motif mining has always been a hot research topic in bioinformatics. Most of current research on biological networks focuses on exact motif mining. However, due to the inevitable experimental error and noisy data, biological network data represented as the probability model could better reflect the authenticity and biological significance, therefore, it is more biological meaningful to discover probability motif in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery which is usually based on the possible world model having a relatively high computational complexity. Methods In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in the uncertain biological networks. First, the partition based efficient search is applied to the non-tree like subgraph mining where the probability of occurrence in random networks is small. Then, an algorithm of probability isomorphic based on circuit simulation is proposed. The probability isomorphic combines the analysis of circuit topology structure with related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs. The circuit simulation based probability isomorphic can avoid using traditional possible world model. Finally, based on the algorithm of probability subgraph isomorphism, two-step hierarchical clustering method is used to cluster subgraphs, and discover frequent probability patterns from the clusters. Results The experiment results on data sets of the Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. 
Conclusions The algorithm of probability graph isomorphism

  8. An Optimal Pull-Push Scheduling Algorithm Based on Network Coding for Mesh Peer-to-Peer Live Streaming

    Cui, Laizhong; Jiang, Yong; Wu, Jianping; Xia, Shutao

    Most large-scale Peer-to-Peer (P2P) live streaming systems are constructed as a mesh structure, which can provide robustness in the dynamic P2P environment. The pull scheduling algorithm is widely used in this mesh structure, which degrades the performance of the entire system. Recently, network coding was introduced in mesh P2P streaming systems to improve the performance, which makes the push strategy feasible. One of the most famous scheduling algorithms based on network coding is R2, with a random push strategy. Although R2 has achieved some success, the push scheduling strategy still lacks a theoretical model and optimal solution. In this paper, we propose a novel optimal pull-push scheduling algorithm based on network coding, which consists of two stages: the initial pull stage and the push stage. The main contributions of this paper are: 1) we put forward a theoretical analysis model that considers the scarcity and timeliness of segments; 2) we formulate the push scheduling problem to be a global optimization problem and decompose it into local optimization problems on individual peers; 3) we introduce some rules to transform the local optimization problem into a classical min-cost optimization problem for solving it; 4) We combine the pull strategy with the push strategy and systematically realize our scheduling algorithm. Simulation results demonstrate that decode delay, decode ratio and redundant fraction of the P2P streaming system with our algorithm can be significantly improved, without losing throughput and increasing overhead.

  9. Design a Weight Based sorting distortion algorithm using Association rule Hiding for Privacy Preserving Data mining



    Full Text Available The security of a large database that contains crucial information becomes a serious issue when data are shared over a network and must be protected against unauthorized access. Privacy preserving data mining is a new research trend concerning privacy in data mining and statistical databases. Association analysis is a powerful tool for discovering relationships that are hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and crucial data. Data modification and rule hiding are among the most important approaches for securing data. The objective of the proposed Weight Based Sorting Distortion (WBSD) algorithm is to distort certain data that satisfy a particular sensitive rule. It then hides the transactions that support a sensitive rule, assigns them a priority, and sorts them in ascending order according to the priority value of each rule. It then uses these weights to compute the priority value for each transaction according to how weak the rule is that the transaction supports. Data distortion is one of the important methods to avoid this kind of scalability issue

  10. Mining-impacted sources of metal loading to an alpine stream based on a tracer-injection study, Clear Creek County, Colorado

    Fey, David L.; Wirt, Laurie


    Base flow water in Leavenworth Creek, a tributary to South Clear Creek in Clear Creek County, Colorado, contains copper and zinc at levels toxic to aquatic life. The metals are predominantly derived from the historical Waldorf mine, and sources include an adit, a mine-waste dump, and mill-tailings deposits. Tracer-injection and water-chemistry synoptic studies were conducted during low-flow conditions to quantify metal loads of mining-impacted inflows and their relative contributions to nearby Leavenworth Creek. During the 2-year investigation, the adit was rerouted in an attempt to reduce metal loading to the stream. During the first year, a lithium-bromide tracer was injected continuously into the stream to achieve steady-state conditions prior to synoptic sampling. Synoptic samples were collected from Leavenworth Creek and from discrete surface inflows. One year later, synoptic sampling was repeated at selected sites to evaluate whether rerouting of the adit flow had improved water quality.

  11. Stream Processing for Solar Physics: Applications and Implications for Big Solar Data

    Battams, Karl


    Modern advances in space technology have enabled the capture and recording of unprecedented volumes of data. In the field of solar physics this is most readily apparent with the advent of the Solar Dynamics Observatory (SDO), which returns in excess of 1 terabyte of data daily. While we now have sufficient capability to capture, transmit and store this information, the solar physics community now faces the new challenge of analysis and mining of high-volume and potentially boundless data sets such as this: a task known to the computer science community as stream mining. In this paper, we survey existing and established stream mining methods in the context of solar physics, with a goal of providing an introductory overview of stream mining algorithms employed by the computer science fields. We consider key concepts surrounding stream mining that are applicable to solar physics, outlining existing algorithms developed to address this problem in other fields of study, and discuss their applicability to massive s...

  12. The precipitation of indium at elevated pH in a stream influenced by acid mine drainage.

    White, Sarah Jane O; Hussain, Fatima A; Hemond, Harold F; Sacco, Sarah A; Shine, James P; Runkel, Robert L; Walton-Day, Katherine; Kimball, Briant A


    Indium is an increasingly important metal in semiconductors and electronics and has uses in important energy technologies such as photovoltaic cells and light-emitting diodes (LEDs). One significant flux of indium to the environment is from lead, zinc, copper, and tin mining and smelting, but little is known about its aqueous behavior after it is mobilized. In this study, we use Mineral Creek, a headwater stream in southwestern Colorado severely affected by heavy metal contamination as a result of acid mine drainage, as a natural laboratory to study the aqueous behavior of indium. At the existing pH of ~3, indium concentrations are 6-29μg/L (10,000× those found in natural rivers), and are completely filterable through a 0.45μm filter. During a pH modification experiment, the pH of the system was raised to >8, and >99% of the indium became associated with the suspended solid phase (i.e. does not pass through a 0.45μm filter). To determine the mechanism of removal of indium from the filterable and likely primarily dissolved phase, we conducted laboratory experiments to determine an upper bound for a sorption constant to iron oxides, and used this, along with other published thermodynamic constants, to model the partitioning of indium in Mineral Creek. Modeling results suggest that the removal of indium from the filterable phase is consistent with precipitation of indium hydroxide from a dissolved phase. This work demonstrates that nonferrous mining processes can be a significant source of indium to the environment, and provides critical information about the aqueous behavior of indium. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. The precipitation of indium at elevated pH in a stream influenced by acid mine drainage

    White, Sarah Jane O.; Hussain, Fatima A.; Hemond, Harold F.; Sacco, Sarah A.; Shine, James P.; Runkel, Robert L.; Walton-Day, Katherine; Kimball, Briant A.


    Indium is an increasingly important metal in semiconductors and electronics and has uses in important energy technologies such as photovoltaic cells and light-emitting diodes (LEDs). One significant flux of indium to the environment is from lead, zinc, copper, and tin mining and smelting, but little is known about its aqueous behavior after it is mobilized. In this study, we use Mineral Creek, a headwater stream in southwestern Colorado severely affected by heavy metal contamination as a result of acid mine drainage, as a natural laboratory to study the aqueous behavior of indium. At the existing pH of ~ 3, indium concentrations are 6–29 μg/L (10,000 × those found in natural rivers), and are completely filterable through a 0.45 μm filter. During a pH modification experiment, the pH of the system was raised to > 8, and > 99% of the indium became associated with the suspended solid phase (i.e. does not pass through a 0.45 μm filter). To determine the mechanism of removal of indium from the filterable and likely primarily dissolved phase, we conducted laboratory experiments to determine an upper bound for a sorption constant to iron oxides, and used this, along with other published thermodynamic constants, to model the partitioning of indium in Mineral Creek. Modeling results suggest that the removal of indium from the filterable phase is consistent with precipitation of indium hydroxide from a dissolved phase. This work demonstrates that nonferrous mining processes can be a significant source of indium to the environment, and provides critical information about the aqueous behavior of indium.

  14. The Optimization of Algorithms in the Process of Temporal Data Mining Using the Compute Unified Device Architecture

    Alexandru PIRJAN


    Full Text Available Considering the importance and usefulness of real-time data mining, in recent years the concern of researchers to discover new hardware architectures that can manage and process large volumes of data has increased significantly. In this paper the performance of algorithms for temporal data mining that are implemented in the new Compute Unified Device Architecture (CUDA) from the latest generation of graphics processing units (GPU) will be analyzed and reviewed. The performance will be evaluated taking into account the type of algorithm, data access, the problem's size, the GPU's processor generation, the number of threads processed

  15. Symbolic Computing with Incremental Mindmaps to Manage and Mine Data Streams - Some Applications

    Brucks, Claudine; Schommer, Christoph; Wagner, Cynthia; Weires, Ralph


    In our understanding, a mind-map is an adaptive engine that basically works incrementally on the fundament of existing transactional streams. Generally, mind-maps consist of symbolic cells that are connected with each other and that become either stronger or weaker depending on the transactional stream. Based on the underlying biologic principle, these symbolic cells and their connections as well may adaptively survive or die, forming different cell agglomerates of arbitrary size. In this work, we intend to prove mind-maps' eligibility following diverse application scenarios, for example being an underlying management system to represent normal and abnormal traffic behaviour in computer networks, supporting the detection of the user behaviour within search engines, or being a hidden communication layer for natural language interaction.

  16. Which is a more accurate predictor in colorectal survival analysis? Nine data mining algorithms vs. the TNM staging system.

    Gao, Peng; Zhou, Xin; Wang, Zhen-ning; Song, Yong-xi; Tong, Lin-lin; Xu, Ying-ying; Yue, Zhen-yu; Xu, Hui-mian


    Over the past decades, many studies have used data mining technology to predict the 5-year survival rate of colorectal cancer, but there have been few reports that compared multiple data mining algorithms to the TNM classification of malignant tumors (TNM) staging system using a dataset in which the training and testing data were from different sources. Here we compared nine data mining algorithms to the TNM staging system for colorectal survival analysis. Two different datasets were used: 1) the National Cancer Institute's Surveillance, Epidemiology, and End Results dataset; and 2) the dataset from a single Chinese institution. An optimization and prediction system based on nine data mining algorithms as well as two variable selection methods was implemented. The TNM staging system was based on the 7th edition of the American Joint Committee on Cancer TNM staging system. When the training and testing data were from the same sources, all algorithms had slight advantages over the TNM staging system in predictive accuracy. When the data were from different sources, only four algorithms (logistic regression, general regression neural network, Bayesian networks, and Naïve Bayes) had slight advantages over the TNM staging system. Also, there were no significant differences among all the algorithms (p>0.05). The TNM staging system is simple and practical at present, and data mining methods are not accurate enough to replace the TNM staging system for colorectal cancer survival prediction. Furthermore, there were no significant differences in the predictive accuracy of all the algorithms when the data were from different sources. Building a larger dataset that includes more variables may be important for furthering predictive accuracy.

  17. A fast algorithm for finding point sources in the Fermi data stream: FermiFAST

    Asvathaman, Asha; Omand, Conor; Barton, Alistair; Heyl, Jeremy S.


    We present a new and efficient algorithm for finding point sources in the photon event data stream from the Fermi Gamma-Ray Space Telescope, FermiFAST. The key advantage of FermiFAST is that it constructs a catalogue of potential sources very fast by arranging the photon data in a hierarchical data structure. Using this structure, FermiFAST rapidly finds the photons that could have originated from a potential gamma-ray source. It calculates a likelihood ratio for the contribution of the potential source using the angular distribution of the photons within the region of interest. It can find within a few minutes the most significant half of the Fermi Third Point Source catalogue (3FGL) with nearly 80 per cent purity from the 4 yr of data used to construct the catalogue. If a higher purity sample is desirable, one can achieve a sample that includes the most significant third of the Fermi 3FGL with only 5 per cent of the sources unassociated with Fermi sources. Outside the Galactic plane, all but eight of the 580 FermiFAST detections are associated with 3FGL sources. And of these eight, six yield significant detections of greater than 5σ when a further binned likelihood analysis is performed. This software allows for rapid exploration of the Fermi data, simulation of the source detection to calculate the selection function of various sources and the errors in the obtained parameters of the sources detected.

  18. A new alley in Opinion Mining using Senti Audio Visual Algorithm

    Mukesh Rawat,


    Full Text Available People share their views about products and services over social media, blogs, forums etc. Someone who is willing to spend resources and money on these products and services will want to learn about them from the past experiences of their peers. Opinion mining plays a vital role in tracking the growing interests of a particular community, social and political events, business strategies, marketing campaigns etc. This data is in unstructured form on the internet but, if analyzed properly, can be of great use. Sentiment analysis focuses on polarity detection of emotions like happy, sad or neutral. In this paper we propose an algorithm, Senti Audio Visual, for examining video as well as audio sentiments. A review in the form of video/audio may contain several opinions/emotions; this algorithm classifies the reviews with the help of Bayes classifiers into three different classes, i.e., positive, negative or neutral. The algorithm uses smiles, cries, gazes, pauses, pitch, and intensity as relevant audio-visual features.

  19. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen


    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains were used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
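
    Perplexity, the measure mentioned above for choosing the number of topics, is conventionally the exponential of the negative mean per-token log-likelihood on held-out data; lower is better. The sketch below shows only that arithmetic. It is not the authors' pipeline, and the list of log-probabilities is a made-up stand-in for what a fitted LDA model would assign.

```python
import math


def perplexity(token_log_probs):
    """exp(-mean log-likelihood) over a sequence of per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical model that assigns probability 0.25 to each of 8 held-out tokens.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))  # close to 4: as uncertain as a fair four-way choice
```

    In a topic-number sweep one would compute this value for each candidate number of topics and prefer the setting at which it stops improving.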

  20. Composition and spectra of copper-carotenoid sediments from a pyrite mine stream in Spain

    Garcia-Guinea, Javier; Furio, Marta; Sanchez-Moral, Sergio; Jurado, Valme; Correcher, Virgilio; Saiz-Jimenez, Cesareo


    Mine drainages of La Poderosa (El Campillo, Huelva, Spain), located in the Rio Tinto Basin (Iberian Pyrite Belt) generate carotenoid complexes mixed with copper sulfates presenting good natural models for the production of carotenoids from microorganisms. The environmental conditions of Rio Tinto Basin include important environmental stresses to force the microorganisms to accumulate carotenoids. Here we show that carotenoid compounds in sediments can be analyzed directly in the solid state by Raman and Luminescence spectroscopy techniques to identify solid carotenoid, avoiding dissolution and pre-concentration treatments, since the hydrous copper-salted paragenesis does not mask the Raman emission of carotenoids. Raman spectra recorded from one of these specimens exhibit major features at approximately 1006, 1154, and 1520 cm⁻¹. The bands at 1520 cm⁻¹ and 1154 cm⁻¹ can be assigned to in-phase C=C (γ-1) and C-C stretching (γ-2) vibrations of the polyene chain in carotenoids. The in-plane rocking deformations of CH3 groups linked to this chain coupled with C-C bonds are observed in the 1006 cm⁻¹ region. X-irradiation pretreatments enhance the cathodoluminescence spectra emission of carotenoids enough to distinguish organic compounds including hydroxyl and carboxyl groups. Carotenoids in copper-sulfates could be used as biomarkers and useful proxies for understanding remote mineral formations as well as for terrestrial environmental investigations related to mine drainage contamination including biological activity and photo-oxidation processes.

  1. Microbial communities and geochemical dynamics in an extremely acidic, metal-rich stream at an abandoned sulfide mine (Huelva, Spain) underpinned by two functional primary production systems.

    Rowe, Owen F; Sánchez-España, Javier; Hallberg, Kevin B; Johnson, D Barrie


    An extremely acidic (pH 2.5-2.75) metal-rich stream draining an abandoned mine in the Iberian Pyrite Belt, Spain, was ramified with stratified macroscopic gelatinous microbial growths ('acid streamers' or 'mats'). Microbial communities of streamer/mat growths sampled at different depths, as well as those present in the stream water itself, were analysed using a combined biomolecular and cultivation-based approach. The oxygen-depleted mine water was dominated by the chemolithotrophic facultative anaerobe Acidithiobacillus ferrooxidans, while the streamer communities were found to be highly heterogeneous and very different to superficially similar growths reported in other extremely acidic environments. Microalgae accounted for a significant proportion of surface streamer biomass, while subsurface layers were dominated by heterotrophic acidophilic bacteria (Acidobacteriaceae and Acidiphilium spp.). Sulfidogenic bacteria were isolated from the lowest depth streamer growths, where there was also evidence for selective biomineralization of copper sulfide. Archaeal clones (exclusively Euryarchaeota) were recovered from streamer samples, as well as the mine stream water. Both sunlight and reduced inorganic chemicals (predominantly ferrous iron) served as energy sources for primary producers in this ecosystem, promoting complex microbial interactions involving transfer of electron donors and acceptors and of organic carbon, between microorganisms in the stream water and the gelatinous streamer growths. Microbial transformations were shown to impact the biogeochemical cycling of iron and sulfur in the acidic stream, severely restricting the net oxidation of ferrous iron even when the initially anoxic waters were oxygenated by indigenous acidophilic algae. A model accounting for the biogeochemistry of iron and sulfur in the mine waters is described, and the significance of the acidophilic communities in regulating the geochemistry of acidic, metal-rich waters is described.

  2. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy


    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. Pellet quality was expressed by shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations performed by means of eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and longer time of spheronization. The described data mining approach enhances knowledge about pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Using Data Mining to Find Patterns in Ant Colony Algorithm Solutions to the Travelling Salesman Problem

    YAN Shiliang; WANG Yinling


    The Travelling Salesman Problem (TSP) is a classical optimization problem and belongs to the class of NP-hard problems. The purpose of this work is to apply data mining methodologies to explore the patterns in data generated by an Ant Colony Algorithm (ACA) performing a search, and to develop a rule-set searcher that approximates the ACA's searcher. An attribute-oriented induction methodology was used to explore the relationship between an operation sequence and its attributes, and a set of rules was developed. The experimental results presented at the end of the paper show that the proposed approach performs well with respect to solution quality and computation speed.

  4. A P2P Botnet Virus Detection System Based on Data-Mining Algorithms

    Wernhuar Tarng


    Full Text Available A P2P botnet virus detection system based on data-mining algorithms is proposed in this study to detect infected computers quickly using a Bayes classifier and a neural network (NN) classifier. The system can detect P2P botnet viruses in the early stage of infection and report them to network managers to avoid further infection. The system adopts real-time flow identification techniques to detect traffic flows produced by P2P application programs and botnet viruses by comparing them with the known flow patterns in the database. After training, in which the system parameters were adjusted using test samples, the experimental results show that the accuracy of the Bayes classifier is 95.78% and that of the NN classifier is 98.71% in detecting P2P botnet viruses and suspected flows, achieving the goal of infection control in a short time.

  5. Study of the mapping of Navier-Stokes algorithms onto multiple-instruction/multiple-data-stream computers

    Eberhardt, D. S.; Baganoff, D.; Stevens, K.


    Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.
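
    To put the reported figure in context, the short sketch below computes speedup and parallel efficiency from a pair of run times. The timings are invented for illustration (the abstract gives only the speedup), and the function names are assumptions; the definitions used are the standard ones, speedup = serial time / parallel time and efficiency = speedup / number of processors.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor run time to multi-processor run time."""
    return t_serial / t_parallel

def efficiency(s, n_processors):
    """Fraction of the ideal (linear) speedup that was achieved."""
    return s / n_processors

# Hypothetical timings in seconds, chosen so that the speedup is 1.9
# on a two-processor machine such as the dual-VAX facility described.
t1, t2 = 190.0, 100.0
s = speedup(t1, t2)
e = efficiency(s, 2)
print(f"speedup {s:.2f}, efficiency {e:.0%}")
```

    Under these assumed timings, a speedup of 1.9 on two processors corresponds to an efficiency of about 95%, so speedups exceeding 1.9 leave less than 5% of the ideal two-processor performance unused.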

  6. A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems

    National Aeronautics and Space Administration — In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the...

  7. Optimization of bioenergy crop selection and placement based on a stream health indicator using an evolutionary algorithm.

    Herman, Matthew R; Nejadhashemi, A Pouyan; Daneshvar, Fariborz; Abouali, Mohammad; Ross, Dennis M; Woznicki, Sean A; Zhang, Zhen


    The emission of greenhouse gases continues to amplify the impacts of global climate change. This has led to the increased focus on using renewable energy sources, such as biofuels, due to their lower impact on the environment. However, the production of biofuels can still have negative impacts on water resources. This study introduces a new strategy to optimize bioenergy landscapes while improving stream health for the region. To accomplish this, several hydrological models, including the Soil and Water Assessment Tool, Hydrologic Integrity Tool, and Adaptive Neuro-Fuzzy Inference System, were linked to develop stream health predictor models. These models are capable of estimating stream health scores based on the Index of Biological Integrity. The coupling of the aforementioned models was used to guide a genetic algorithm to design watershed-scale bioenergy landscapes. Thirteen bioenergy managements were considered based on the high probability of adaptation by farmers in the study area. Results from two thousand runs identified an optimum bioenergy crop placement that maximized the stream health for the Flint River Watershed in Michigan. The final overall stream health score was 50.93, which was improved from the current stream health score of 48.19. This was shown to be a significant improvement at the 1% significance level. For this final bioenergy landscape the most often used management was miscanthus (27.07%), followed by corn-soybean-rye (19.00%), corn stover-soybean (18.09%), and corn-soybean (16.43%). The technique introduced in this study can be successfully modified for use in different regions and can be used by stakeholders and decision makers to develop bioenergy landscapes that maximize stream health in the area of interest.

  8. Geochemical study of stream waters affected by mining activities in the SE Spain

    Garcia-Lorenzo, Maria Luz; Perez-Sirvent, Carmen; Martinez-Sanchez, Maria Jose; Bech, Jaime


    Water pollution by dissolved metals in mining areas has mainly been associated with the oxidation of sulphide-bearing minerals exposed to weathering conditions, resulting in low quality effluents of acidic pH and containing a high level of dissolved metals. According to the transport process, three types of pollution can be established: a) Primary contamination, formed by residues placed close to the contamination sources; b) Secondary contamination, produced as a result of transport out of its production areas; c) Tertiary contamination. The aim of this work was to study trace elements in water samples affected by mining activities and to apply the MINTEQ model for calculating aqueous geochemical equilibria. The studied area constituted an important mining centre for more than 2500 years, ceasing activity in 1991. The ore deposits of this zone have iron, lead and zinc as the main metal components. As a result, many contamination sources, formed by mining steriles, waste piles and foundry residues, are present. For this study, 36 surficial water samples were collected after a rain episode in 4 different areas. In these samples, the trace element content was determined by flame atomic absorption spectrometry (Fe and Zn), electrothermal atomization atomic absorption spectrometry (Pb and Cd), atomic fluorescence spectrometry (As) and ICP-MS (Al). MINTEQA2 is a geochemical equilibrium speciation model capable of computing equilibria among the dissolved, adsorbed, solid, and gas phases in an environmental setting and was applied to the collected waters. Zone A: A5 is strongly influenced by tailing dumps and showed high trace element content. In addition, it is influenced by sea water and therefore showed high bromide, chloride, sodium and magnesium content, together with a basic pH. The MINTEQ model application suggested that Zn and Cd could precipitate as carbonates (hydrozincite, smithsonite and otavite). A9 also showed acid pH and high trace element content; is

  9. [Fraction distribution and risk assessment of heavy metals in stream sediments from a typical nonferrous metals mining city].

    Li, Ru-Zhong; Jiang, Yan-Min; Pan, Cheng-Rong; Chen, Jing; Xu, Jing-Jing


    A modified Tessier sequential extraction procedure was used to investigate the fractions of seven heavy metals (Cd, Cr, Cu, Zn, Ni, Pb, As) in the surface sediments of Huixi Stream in Tongling City, China, a typical nonferrous metals mining city. Based on speciation distribution analysis of these metals, the contamination degree and ecological risk of the heavy metals were assessed by means of the risk assessment code (RAC) and the mean sediment quality guideline quotient (SQG-Q). The results show that: (1) Cr and As are mainly composed of residual fractions; Zn, Ni and Pb consist mainly of the residual fraction and the fraction bound to iron and manganese oxides; Cu is dominated by the fraction bound to organic matter; and Cd exists in similar mass fractions of exchangeable, bound to carbonates, bound to iron and manganese oxides, and residual forms. (2) The carbonate and exchangeable mass fractions of Cd, Cr, Cu, Zn, Ni, Pb and As reach 46.48%, 4.62%, 4.05%, 4.12%, 9.17%, 0.97% and 0.03%, respectively. According to the RAC, Cd is of high risk to the environment, Cr, Cu, Zn and Ni are of low risk to the environment, while Pb and As pose extremely low risk to the environment. (3) The SQG index, calculated with SQG-Q, is 10.42, which is far higher than the threshold value of 1.0, indicating that the sediment in Huixi Stream has a very high potential for biological toxicity effects. The PEL-Q indexes corresponding to Cd, Cr, Cu, Zn, Ni, Pb and As are approximately 4.23, 1.14, 20.75, 6.04, 2.33, 4.58 and 41.71, respectively, suggesting that all these metals have great potential for biological toxicity and that adverse effects will occur frequently.
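
    The RAC step described above can be sketched as a simple lookup on the percentage of each metal found in the exchangeable and carbonate-bound fractions. The band limits in `rac_class` are the commonly cited RAC thresholds and are an assumption here, since the abstract does not list them; the percentages are the ones reported in the abstract.

```python
def rac_class(mobile_percent):
    """Map the exchangeable + carbonate-bound share (%) to a RAC class.

    Band limits are the commonly cited ones (assumed, not from the abstract).
    """
    if mobile_percent < 1:
        return "no risk"
    if mobile_percent <= 10:
        return "low risk"
    if mobile_percent <= 30:
        return "medium risk"
    if mobile_percent <= 50:
        return "high risk"
    return "very high risk"

# Carbonate + exchangeable mass fractions (%) as reported in the abstract.
fractions = {"Cd": 46.48, "Cr": 4.62, "Cu": 4.05, "Zn": 4.12,
             "Ni": 9.17, "Pb": 0.97, "As": 0.03}

classes = {metal: rac_class(p) for metal, p in fractions.items()}
for metal, label in classes.items():
    print(f"{metal}: {fractions[metal]:.2f}% -> {label}")
```

    With these assumed bands the sketch reproduces the ranking in the abstract: Cd falls in the high-risk band, Cr, Cu, Zn and Ni in the low-risk band, and Pb and As in the lowest band, which the abstract describes as extremely low risk.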

  10. Periphyton communities in a pristine mountain stream above and below heavy metal mining operations

    Deniseger, J.; Austin, A.; Lucey, W.P.


    Changes in species composition of the periphyton on introduced substrates were determined in an oligotrophic mountain stream subject to long-term heavy metal contamination. At the upstream control site, the numerically most abundant taxa were Bacillariophyta (Achnanthes minutissima, Achnanthes microcephala and Achnanthes linearis) as well as, in summer, the Chlorophyta (Mougeotia spp. and Ulothrix subtilissima). At the downstream contaminated site the periphyton community was totally dominated by Bacillariophyta throughout the sampling period. A. minutissima and A. microcephala were co-dominants during spring. Seasonal succession patterns did not parallel those at the upstream site. Chlorophyta were virtually absent and A. minutissima comprised 94% of the community during summer. Species diversity, species evenness and dissimilarity index were utilized to detect differences in species composition, abundance and number. Slight differences were found in spring samples while summer samples indicated major differences between sampling sites.

  11. Real-Time Data Mining of Massive Data Streams from Synoptic Sky Surveys

    Djorgovski, S G; Donalek, C; Mahabal, A A; Drake, A J; Turmon, M; Fuchs, T


    The nature of scientific and technological data collection is evolving rapidly: data volumes and rates grow exponentially, with increasing complexity and information content, and there has been a transition from static data sets to data streams that must be analyzed in real time. Interesting or anomalous phenomena must be quickly characterized and followed up with additional measurements via optimal deployment of limited assets. Modern astronomy presents a variety of such phenomena in the form of transient events in digital synoptic sky surveys, including cosmic explosions (supernovae, gamma ray bursts), relativistic phenomena (black hole formation, jets), potentially hazardous asteroids, etc. We have been developing a set of machine learning tools to detect, classify and plan a response to transient events for astronomy applications, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed. The ability to respond rapidly to the potentially most interesting events is a k...

  12. Effective classification of microRNA precursors using feature mining and AdaBoost algorithms.

    Zhong, Ling; Wang, Jason T L; Wen, Dongrong; Aris, Virginie; Soteropoulos, Patricia; Shapiro, Bruce A


    MicroRNAs play important roles in most biological processes, including cell proliferation, tissue differentiation, and embryonic development, among others. They originate from precursor transcripts (pre-miRNAs), which contain phylogenetically conserved stem-loop structures. An important bioinformatics problem is to distinguish the pre-miRNAs from pseudo pre-miRNAs that have similar stem-loop structures. We present here a novel method for tackling this bioinformatics problem. Our method, named MirID, accepts an RNA sequence as input, and classifies the RNA sequence either as positive (i.e., a real pre-miRNA) or as negative (i.e., a pseudo pre-miRNA). MirID employs a feature mining algorithm for finding combinations of features suitable for building pre-miRNA classification models. These models are implemented using support vector machines, which are combined to construct a classifier ensemble. The accuracy of the classifier ensemble is further enhanced by the utilization of an AdaBoost algorithm. When compared with two closely related tools on twelve species analyzed with these tools, MirID outperforms the existing tools on the majority of the twelve species. MirID was also tested on nine additional species, and the results showed high accuracies on the nine species. The MirID web server is fully operational and freely accessible at . Potential applications of this software in genomics and medicine are also discussed.

  13. A genetic algorithm approach for open-pit mine production scheduling

    Aref Alipour


    Full Text Available In an Open-Pit Production Scheduling (OPPS) problem, the goal is to determine the mining sequence of an orebody represented as a block model. In this article, a linear programming formulation is used to achieve this goal. The OPPS problem is known to be NP-hard, so an exact mathematical model cannot be solved at real scale. The Genetic Algorithm (GA) is a well-known member of the evolutionary algorithms, which are widely used to solve NP-hard problems. Herein, a GA is implemented on a hypothetical two-dimensional (2D) copper orebody model. The orebody is represented as a 2D array of blocks. Likewise, a counterpart 2D GA array was used to represent the OPPS problem's solution space. The fitness function is then defined according to the OPPS problem's objective function to assess the solution domain. A new normalization method was also used to handle the block sequencing constraint. A numerical study is performed to compare the solutions of the exact and GA-based methods. It is shown that the gap between the GA and the optimal solution of the exact method is less than 5%; the GA is therefore found to be efficient in solving the OPPS problem.
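
    The general shape of such a GA can be sketched on a toy 2D block model. Everything below is an assumption made for illustration and is not the authors' formulation: the block values are invented, the chromosome stores one pit depth per column, the slope rule allows neighbouring columns to differ by at most one block, and `repair` is a simple stand-in for the paper's normalization method. The toy also ignores time periods and discounting, so it selects a pit outline rather than a full schedule. An exhaustive search over the tiny model plays the role of the exact method.

```python
import random
from itertools import product

# Hypothetical block values (row 0 is the surface); negative means waste.
VALUES = [
    [-1, -1, -1, -1, -1],
    [-1,  2,  3, -1, -1],
    [-2,  1,  6,  2, -2],
]
ROWS, COLS = len(VALUES), len(VALUES[0])

def repair(depths):
    """Lower depths until neighbouring columns differ by at most one block."""
    d = list(depths)
    for i in range(1, COLS):
        d[i] = min(d[i], d[i - 1] + 1)
    for i in range(COLS - 2, -1, -1):
        d[i] = min(d[i], d[i + 1] + 1)
    return d

def feasible(depths):
    return all(abs(a - b) <= 1 for a, b in zip(depths, depths[1:]))

def fitness(depths):
    """Total value of all blocks above the pit floor."""
    return sum(VALUES[r][c] for c, depth in enumerate(depths) for r in range(depth))

def mutate(genes, rng, rate):
    """Shift each depth by one block up or down with probability `rate`."""
    return [min(ROWS, max(0, g + rng.choice((-1, 1)))) if rng.random() < rate else g
            for g in genes]

def run_ga(generations=60, pop_size=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[0] * COLS]  # the empty pit guarantees a non-negative best value
    pop += [repair([rng.randint(0, ROWS) for _ in range(COLS)])
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the best pit
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, COLS - 1)             # one-point crossover
            child = mutate(p1[:cut] + p2[cut:], rng, mutation_rate)
            children.append(repair(child))
        pop = children
    return max(pop, key=fitness)

def exact_optimum():
    """Exhaustive search over all feasible depth profiles (tiny models only)."""
    return max(fitness(d) for d in product(range(ROWS + 1), repeat=COLS) if feasible(d))

best = run_ga()
print("GA pit depths:", best, "value:", fitness(best), "exact optimum:", exact_optimum())
```

    Repairing every child keeps the whole population feasible, which avoids penalty terms in the fitness function; the price is a bias towards shallower pits, because the repair only ever lowers depths.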

  14. Clustering Text Data Streams

    Yu-Bao Liu; Jia-Rong Cai; Jian Yin; Ada Wai-Chee Fu


    Clustering text data streams is an important issue in the data mining community and has a number of applications, such as newsgroup filtering, text crawling, document organization, and topic detection and tracking. However, most methods are similarity-based approaches that use only the TF*IDF scheme to represent the semantics of text data, which often leads to poor clustering quality. Recently, researchers have argued that the semantic smoothing model is more efficient than the existing TF*IDF scheme for improving text clustering quality. However, the existing semantic smoothing model is not suitable for a dynamic text data context. In this paper, we first extend the semantic smoothing model to the text data stream context. Based on the extended model, we then present two online clustering algorithms, OCTS and OCTSM, for clustering massive text data streams. In both algorithms, we also present a new cluster statistics structure, named cluster profile, which can capture the semantics of text data streams dynamically and at the same time speed up the clustering process. Some efficient implementations of our algorithms are also given. Finally, we present a series of experimental results illustrating the effectiveness of our technique.

  15. Influences of acid mine drainage and thermal enrichment on stream fish reproduction and larval survival

    Hafs, Andrew W.; Horn, C.D.; Mazik, P.M.; Hartman, K.J.


    Potential effects of acid mine drainage (AMD) and thermal enrichment on the reproduction of fishes were investigated through a larval-trapping survey in the Stony River watershed, Grant County, WV. Trapping was conducted at seven sites from 26 March to 2 July 2004. Overall larval catch was low (379 individuals in 220 hours of trapping). More larval White Suckers were captured than all other species. Vectors fitted to nonparametric multidimensional scaling ordinations suggested that temperature was highly correlated to fish communities captured at our sites. Survival of larval Fathead Minnows was examined in situ at six sites from 13 May to 11 June 2004 in the same system. Larval survival was lower, but not significantly different, between sites directly downstream of AMD-impacted tributaries (40% survival) and non-AMD sites (52% survival). The lower survival was caused by a significant mortality event at one site that coincided with acute pH depression in an AMD tributary immediately upstream of the site. Results from a Cox proportional hazard test suggest that low pH is having a significant negative influence on larval fish survival in this system. The results from this research indicate that the combination of low pH events and elevated temperature is negatively influencing the larval fish populations of the Stony River watershed. Management actions that address these problems would have the potential to substantially increase both reproduction rates and larval survival, therefore greatly enhancing the fishery.

  16. Evaluating remedial alternatives for an acid mine drainage stream: A model post audit

    Runkel, Robert L.; Kimball, Briant A.; Walton-Day, Katherine; Verplanck, Philip L.; Broshears, Robert E.


    A post audit for a reactive transport model used to evaluate acid mine drainage treatment systems is presented herein. The post audit is based on a paired synoptic approach in which hydrogeochemical data are collected at low (existing conditions) and elevated (following treatment) pH. Data obtained under existing, low-pH conditions are used for calibration, and the resultant model is used to predict metal concentrations observed following treatment. Predictions for Al, As, Fe, H+, and Pb accurately reproduce the observed reduction in dissolved concentrations afforded by the treatment system, and the information provided in regard to standard attainment is also accurate (predictions correctly indicate attainment or nonattainment of water quality standards for 19 of 25 cases). Errors associated with Cd, Cu, and Zn are attributed to misspecification of sorbent mass (precipitated Fe). In addition to these specific results, the post audit provides insight in regard to calibration and sensitivity analysis that is contrary to conventional wisdom. Steps taken during the calibration process to improve simulations of As sorption were ultimately detrimental to the predictive results, for example, and the sensitivity analysis failed to bracket observed metal concentrations.

  17. New algorithm of mine slope reliability based on limiting state hyper-plane and its engineering application

    刘志祥; 唐志祥; 王卫华; 孙晶晶; 彭康


    Due to the influence of joint fissures, mining intensity, designed slope angle, underground water and rainfall, the failure process of a mine slope is extremely complicated. The current safety factor calculation method has certain limitations, and it is difficult to obtain the reliability index when the performance function of the reliability analysis is implicit or has high order terms. Therefore, with the help of the logistic equation of chaos theory, a new algorithm of mine slope reliability based on a limiting state hyper-plane is proposed. It is shown that this new reliability algorithm avoids the calculation of partial derivatives of the performance function, and that it is simple and easy to program. The new algorithm is suitable for calculating the reliability index of complex performance functions containing high order terms. Furthermore, limiting state hyper-plane models of both the simplified Bishop’s and Janbu’s methods, adapted to slope projects, are obtained, and they gave satisfactory results in the study of mine slope stability in the Dexing copper open pit.

  18. A survey of temporal data mining

    Srivatsan Laxman; P S Sastry


    Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

  19. Methylmercury degradation and exposure pathways in streams and wetlands impacted by historical mining

    Donovan, Patrick M.; Blum, Joel D.; Singer, Michael B.; Marvin-DiPasquale, Mark C.; Tsui, Martin T.K.


    Monomethyl mercury (MMHg) and total mercury (THg) concentrations and Hg stable isotope ratios (δ202Hg and Δ199Hg) were measured in sediment and aquatic organisms from Cache Creek (California Coast Range) and Yolo Bypass (Sacramento Valley). Cache Creek sediment had a large range in THg (87 to 3870 ng/g) and δ202Hg (−1.69 to −0.20‰) reflecting the heterogeneity of Hg mining sources in sediment. The δ202Hg of Yolo Bypass wetland sediment suggests a mixture of high and low THg sediment sources. Relationships between %MMHg (the percent ratio of MMHg to THg) and Hg isotope values (δ202Hg and Δ199Hg) in fish and macroinvertebrates were used to identify and estimate the isotopic composition of MMHg. Deviation from linear relationships was found between %MMHg and Hg isotope values, which is indicative of the bioaccumulation of isotopically distinct pools of MMHg. The isotopic composition of pre-photodegraded MMHg (i.e., subtracting fractionation from photochemical reactions) was estimated and contrasting relationships were observed between the estimated δ202Hg of pre-photodegraded MMHg and sediment IHg. Cache Creek had mass dependent fractionation (MDF; δ202Hg) of at least −0.4‰ whereas Yolo Bypass had MDF of +0.2 to +0.5‰. This result supports the hypothesis that Hg isotope fractionation between IHg and MMHg observed in rivers (−MDF) is unique compared to +MDF observed in non-flowing water environments such as wetlands, lakes, and the coastal ocean.

  20. Methylmercury degradation and exposure pathways in streams and wetlands impacted by historical mining.

    Donovan, Patrick M; Blum, Joel D; Singer, Michael Bliss; Marvin-DiPasquale, Mark; Tsui, Martin T K


    Monomethyl mercury (MMHg) and total mercury (THg) concentrations and Hg stable isotope ratios (δ(202)Hg and Δ(199)Hg) were measured in sediment and aquatic organisms from Cache Creek (California Coast Range) and Yolo Bypass (Sacramento Valley). Cache Creek sediment had a large range in THg (87 to 3870 ng/g) and δ(202)Hg (-1.69 to -0.20‰) reflecting the heterogeneity of Hg mining sources in sediment. The δ(202)Hg of Yolo Bypass wetland sediment suggests a mixture of high and low THg sediment sources. Relationships between %MMHg (the percent ratio of MMHg to THg) and Hg isotope values (δ(202)Hg and Δ(199)Hg) in fish and macroinvertebrates were used to identify and estimate the isotopic composition of MMHg. Deviation from linear relationships was found between %MMHg and Hg isotope values, which is indicative of the bioaccumulation of isotopically distinct pools of MMHg. The isotopic composition of pre-photodegraded MMHg (i.e., subtracting fractionation from photochemical reactions) was estimated and contrasting relationships were observed between the estimated δ(202)Hg of pre-photodegraded MMHg and sediment IHg. Cache Creek had mass dependent fractionation (MDF; δ(202)Hg) of at least -0.4‰ whereas Yolo Bypass had MDF of +0.2 to +0.5‰. This result supports the hypothesis that Hg isotope fractionation between IHg and MMHg observed in rivers (-MDF) is unique compared to +MDF observed in non-flowing water environments such as wetlands, lakes, and the coastal ocean. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Use of microbes for cost reduction of metal removal from metals and mining industry waste streams

    Cohen, R.R.H. [Colorado School of Mines, Golden, CO (United States). Division for Environmental Science & Engineering]


    Acid-rock drainage (ARD) - also known as acid-mine drainage (AMD) - results from the exposure of sulfide minerals, particularly pyritic and pyrrhotitic minerals, to atmospheric oxygen and water. Recent developments and improvements have resulted in construction of bioreactors that have a smaller footprint, and treat the metals and acidity more effectively. Many studies have demonstrated that the primary removal mechanisms for the metals are sulphate-reducing bacteria (SRB). These microbes facilitate the conversion of sulphate to sulphide. The sulphides react with metals to precipitate them as metal sulfides, many of which are stable in the anaerobic conditions of the treatment system. Plants have been shown to remove metals by uptake or oxidative precipitation near the roots. Plants seem to account for only a small percentage of the metal removal capacity of the wetland treatment systems. Adsorption of metals to the organic substrates of the treatment systems can result in metal removal, but adsorption capacity is saturated in short periods of time. The SRB are obligate anaerobes which prefer conditions between pH 5 and 8. Thus, the input water characteristics could impact the efficiency and life expectancy of the treatment systems. The most important characteristic of input waters seems to be pH. Oxyanions such as chromate and arsenate can be removed using the wetland treatment system (passive bioreactor) technology. Arsenic is removed as an arsenic sulfide compound and chromate is reduced to Cr(III) and precipitated as a hydroxide. The passive bioreactor - wetland treatment system offers a less expensive alternative to the conventional chemical precipitation technologies. There still are problems of system hydraulics and useful life to be addressed.

  2. Assessing the responses of creek chub (Semotilus atromaculatus) and pearl dace (Semotilus margarita) to metal mine effluents using in situ artificial streams in Sudbury, Ontario, Canada.

    Dubé, Monique G; MacLatchy, Deborah L; Hruska, Kimberly A; Glozier, Nancy E


    Mining of the world's second-largest nickel deposits in the area of Sudbury, Ontario, Canada, has caused acidification and metal saturation of some catchments. We conducted artificial stream studies in the years 2001 and 2002 to assess the effects of treated metal mine effluents (MMEs) from three different mining operations discharging to Junction Creek, Sudbury, on two fish species, creek chub (Semotilus atromaculatus) and pearl dace (Semotilus margarita). Treatments tested for 35 to 41 d included reference water, Garson MME (30%), Nolin MME (20%), and Copper Cliff MME (45%). In 2001, effects on chub included reduced survival and depressed testosterone levels (fivefold reduction) after exposure to all MMEs. In 2002, chub and dace survival were reduced to less than 60% in the Copper Cliff and Garson treatments. In addition, the total body weights of male and female dace were reduced after exposure to the Garson and Copper Cliff treatments. In 2001 and 2002, responses were most common to the 45% Copper Cliff and 30% Garson effluents, with consistent increases in nickel, rubidium, strontium, iron, lithium, thallium, and selenium observed across treatment waters and body tissues. More work is required to link observed effects to field effects and to identify multitrophic level responses of the ecosystem to the MMEs. The artificial stream studies provided a mechanism to identify changes in the endpoints of relevant fish species exposed to present-day metal mine discharges independent of historical depositions of metals in the Sudbury area.

  3. A potential causal association mining algorithm for screening adverse drug reactions in postmarketing surveillance.

    Ji, Yanqing; Ying, Hao; Dews, Peter; Mansour, Ayman; Tran, John; Miller, Richard E; Massanari, R Michael


    Early detection of unknown adverse drug reactions (ADRs) in postmarketing surveillance saves lives and prevents harmful consequences. We propose a novel data mining approach to signaling potential ADRs from electronic health databases. More specifically, we introduce potential causal association rules (PCARs) to represent the potential causal relationship between a drug and ICD-9 (CDC. (2010). International Classification of Diseases, Ninth Revision (ICD-9). [Online]. Available: ) coded signs or symptoms representing potential ADRs. Due to the infrequent nature of ADRs, the existing frequency-based data mining methods cannot effectively discover PCARs. We introduce a new interestingness measure, potential causal leverage, to quantify the degree of association of a PCAR. This measure is based on the computational, experience-based fuzzy recognition-primed decision (RPD) model that we developed previously (Y. Ji, R. M. Massanari, J. Ager, J. Yen, R. E. Miller, and H. Ying, "A fuzzy logic-based computational recognition-primed decision model," Inf. Sci., vol. 177, pp. 4338-4353, 2007) on the basis of the well-known, psychology-originated qualitative RPD model (G. A. Klein, "A recognition-primed decision making model of rapid decision making," in Decision Making in Action: Models and Methods, 1993, pp. 138-147). The potential causal leverage assesses the strength of the association of a drug-symptom pair given a collection of patient cases. To test our data mining approach, we retrieved electronic medical data for 16,206 patients treated by one or more of eight drugs of our interest at the Veterans Affairs Medical Center in Detroit between 2007 and 2009. We selected enalapril as the target drug for this ADR signal generation study. We used our algorithm to preliminarily evaluate the associations between enalapril and all the ICD-9 codes associated with it. The experimental results indicate that our approach has a potential to

  4. Selenium and other trace elements in aquatic insects in coal mine-affected streams in the Rocky Mountains of Alberta, Canada

    Wayland, M.; Crosley, R. [Environment Canada, Saskatoon, SK (Canada)]


    We determined levels of Se, As, Cd, Pb, and Zn in aquatic insects at coal mine-impacted and reference sites in streams in the Rocky Mountain foothills of west central Alberta from 2001-2003. Selenium levels were greater at coal mine-impacted sites than at reference sites in caddisflies but not in mayflies or stoneflies. Arsenic levels were greater at coal mine-impacted sites than at reference sites in caddisflies and stoneflies but not in mayflies. Zn levels were higher at coal mine-impacted sites than at reference sites in all three groups of insects. At coal mine-impacted sites, Se levels in mayflies and caddisflies were greater than those in stoneflies while at reference sites mayflies contained greater concentrations of Se than either caddisflies or stoneflies. Arsenic levels in mayflies were greater than those in caddisflies at reference and coal mine-impacted sites and were greater than those in stoneflies at reference sites. At both types of sites Cd differed amongst insect taxa in the order of mayflies < caddisflies < stoneflies. The same was true of Zn at coal mine-affected sites. At reference sites, stoneflies had greater concentrations of Zn than both mayflies and caddisflies. At both types of sites, Pb levels were greater in mayflies and caddisflies than they were in stoneflies. Of the five trace elements considered in this study, only Se was sufficiently elevated in aquatic invertebrates to be of potential concern for consumers such as fish and aquatic birds. Such was the case at both coal mine-impacted and reference sites.

  5. Occurrence, distribution, and volume of metals-contaminated sediment of selected streams draining the Tri-State Mining District, Missouri, Oklahoma, and Kansas, 2011–12

    Smith, D. Charlie


    Lead and zinc were mined in the Tri-State Mining District (TSMD) of southwest Missouri, northeast Oklahoma, and southeast Kansas for more than 100 years. The effects of mining on the landscape are still evident, nearly 50 years after the last mine ceased operation. The legacies of mining are the mine waste and discharge of groundwater from underground mines. The mine-waste piles and underground mines are continuous sources of trace metals (primarily lead, zinc, and cadmium) to the streams that drain the TSMD. Many previous studies characterized the horizontal extent of mine-waste contamination in streams, but little information exists on the depth of mine-waste contamination in these streams. Characterizing the vertical extent of contamination is difficult because of the large amount of coarse-grained material, ranging from coarse gravel to boulders, within channel sediment. The U.S. Geological Survey, in cooperation with the U.S. Fish and Wildlife Service, collected channel-sediment samples at depth for subsequent analyses that would allow attainment of the following goals: (1) determination of the relation between concentration and depth for lead, zinc, and cadmium in channel sediments and flood-plain sediments, and (2) determination of the volume of gravel-bar sediment from the surface to the maximum depth with concentrations of these metals that exceeded sediment-quality guidelines. For the purpose of this report, volume of gravel-bar sediment is considered to be distributed in two forms, gravel bars and the wetted channel, and this study focused on gravel bars. Concentrations of lead, zinc, and cadmium in samples were compared to the consensus probable effects concentration (CPEC) and Tri-State Mining District specific probable effects concentration (TPEC) sediment-quality guidelines. During the study, more than 700 sediment samples were collected from borings at multiple sites, including gravel bars and flood plains, along Center Creek, Turkey Creek, Shoal Creek

  6. A Huge Dimension Table Join Algorithm for Construction of StreamCube

    甘亮; 贾焰; 李爱平; 金鑫


    Join is one of the most important operations in relational databases, and it is equally important in data stream management systems. When building the aggregate (group-by) queries that construct StreamCube, joining a data stream with a huge dimension table (such as an IP address table) consumes a large share of the limited CPU and memory resources. A huge dimension table must be partitioned into blocks that are loaded into memory in turn, which causes frequent disk I/O. To avoid this drawback, the paper exploits the characteristics of dimension tables and their join-key layer to reduce join-key redundancy and losslessly compress a huge dimension table into ranged join-key dimension tables (RJ-DT) that fit in memory. A dimension table with n concept columns is compressed into n RJ-DTs using the decomposed storage model of column stores; each RJ-DT is composed of start and end columns and several concept columns. This raises a new issue: a non-equijoin, called a range join, between the data stream and the RJ-DT. The paper then proposes a multi-join algorithm for huge dimension tables, the multi dynamic index nested-loop join, which performs an efficient non-equijoin between the data stream and the compressed dimension tables and extends to multi-table joins. Theoretical analysis and experimental results show that the algorithm markedly improves join performance on huge dimension tables, with speedups of up to an order of magnitude, and is highly practical.
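
    The range join described above can be illustrated with a small sketch. This is not the multi dynamic index nested-loop join from the paper, whose details the abstract does not give; it swaps in a plain binary search over range start keys. The names `compress_dimension` and `range_join`, the integer join keys, and the single concept column are all assumptions made for illustration.

```python
from bisect import bisect_right


def compress_dimension(rows):
    """Collapse (key, concept) rows into (start, end, concept) ranges.

    Assumes integer join keys that are unique; runs of consecutive keys
    sharing a concept value are merged into one range (lossless).
    """
    ranges = []
    for key, concept in sorted(rows):
        if ranges and ranges[-1][2] == concept and ranges[-1][1] + 1 == key:
            ranges[-1] = (ranges[-1][0], key, concept)  # extend current run
        else:
            ranges.append((key, key, concept))  # start a new run
    return ranges


def range_join(stream, ranges):
    """Non-equijoin: match each stream key to the range that contains it."""
    starts = [r[0] for r in ranges]
    for key, payload in stream:
        i = bisect_right(starts, key) - 1  # last range starting at or before key
        if i >= 0 and ranges[i][1] >= key:
            yield key, payload, ranges[i][2]


rows = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (9, "A")]
ranges = compress_dimension(rows)
joined = list(range_join([(2, "x"), (5, "y"), (7, "z"), (9, "w")], ranges))
```

    Six dimension rows shrink to three ranges here, and the stream tuple with key 7 finds no match because no range covers it.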

  7. A data mining algorithm for automated characterisation of fluctuations in multichannel timeseries

    Pretty, D. G.; Blackwell, B. D.


    We present a data mining technique for the analysis of multichannel oscillatory timeseries data and show an application using poloidal arrays of magnetic sensors installed in the H-1 heliac. The procedure is highly automated, and scales well to large datasets. The timeseries data is split into short time segments to provide time resolution, and each segment is represented by a singular value decomposition (SVD). By comparing power spectra of the temporal singular vectors, related singular values are grouped into subsets which define fluctuation structures. Thresholds for the normalised energy of the fluctuation structure and the normalised entropy of the SVD can be used to filter the dataset. We assume that distinct classes of fluctuations are localised in the space of phase differences Δψ(n,n+1) between each pair of nearest neighbour channels. An expectation maximisation clustering algorithm is used to locate the distinct classes of fluctuations and assign mode numbers where possible, and a cluster tree mapping is used to visualise the results.

  8. A Set Operation Based Algorithm for Association Rules Mining

    铁治欣; 陈奇; 俞瑞钊


    Mining association rules is an important data mining problem. In this paper, an association rule mining algorithm based on set operations, ARDBSO, is given. It can find all large itemsets in the database while scanning the database only once, so the I/O time is greatly reduced and efficiency improves. The experiments show that ARDBSO is 80 to 150 times as efficient as Apriori.
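
    The abstract does not describe ARDBSO's internals, so the following is only a sketch of one well-known way to find all large itemsets with set operations after a single database scan: record the set of transaction ids for each item, then intersect those sets to count larger itemsets. The function name `frequent_itemsets` and the sample transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} using transaction-id set intersections."""
    # Single scan: for each item, collect the ids of transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    level = {k: v for k, v in tidsets.items() if len(v) >= min_support}
    result = {}
    while level:
        result.update({k: len(v) for k, v in level.items()})
        next_level = {}
        # Join two k-itemsets that differ in one item; the support of the
        # union is the size of the intersection of their id sets.
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) == len(a) + 1 and candidate not in next_level:
                tids = tids_a & tids_b
                if len(tids) >= min_support:
                    next_level[candidate] = tids
        level = next_level
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = frequent_itemsets(transactions, min_support=3)
```

    After the first loop the database is never read again; every later support count comes from set intersection in memory, which is where the I/O saving would come from in an approach of this kind.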

  9. Quality of water and sediment in streams affected by historical mining, and quality of Mine Tailings, in the Rio Grande/Rio Bravo Basin, Big Bend Area of the United States and Mexico, August 2002

    Lambert, Rebecca B.; Kolbe, Christine M.; Belzer, Wayne


    The U.S. Geological Survey, in cooperation with the International Boundary and Water Commission - U.S. and Mexican Sections, the National Park Service, the Texas Commission on Environmental Quality, the Secretaria de Medio Ambiente y Recursos Naturales in Mexico, the Area de Proteccion de Flora y Fauna Canon de Santa Elena in Mexico, and the Area de Proteccion de Flora y Fauna Maderas del Carmen in Mexico, collected samples of stream water, streambed sediment, and mine tailings during August 2002 for a study to determine whether trace elements from abandoned mines in the area in and around Big Bend National Park have affected the water and sediment quality in the Rio Grande/Rio Bravo Basin of the United States and Mexico. Samples were collected from eight sites on the main stem of the Rio Grande/Rio Bravo, four Rio Grande/Rio Bravo tributary sites downstream from abandoned mines or mine-tailing sites, and 11 mine-tailing sites. Mines in the area were operated to produce fluorite, germanium, iron, lead, mercury, silver, and zinc during the late 1800s through at least the late 1970s. Moderate (relatively neutral) pHs in stream-water samples collected at the 12 Rio Grande/Rio Bravo main-stem and tributary sites indicate that water is well mixed, diluted, and buffered with respect to the solubility of trace elements. The highest sulfate concentrations were in water samples from tributaries draining the Terlingua mining district. Only the sample from the Rough Run Draw site exceeded the Texas Surface Water Quality Standards general-use protection criterion for sulfate. All chloride and dissolved solids concentrations in water samples were less than the general-use protection criteria. Aluminum, copper, mercury, nickel, selenium, and zinc were detected in all water samples for which each element was analyzed. Cadmium, chromium, and lead were detected in samples less frequently, and silver was not detected in any of the samples. None of the sample concentrations of

  10. Application of Tracer-Injection Techniques to Demonstrate Surface-Water and Ground-Water Interactions Between an Alpine Stream and the North Star Mine, Upper Animas River Watershed, Southwestern Colorado

    Wright, Winfield G.; Moore, Bryan


    Tracer-injection studies were done in Belcher Gulch in the upper Animas River watershed, southwestern Colorado, to determine whether the alpine stream infiltrates into underground mine workings of the North Star Mine and other nearby mines in the area. The tracer-injection studies were designed to determine if and where along Belcher Gulch the stream infiltrates into the mine. Four separate tracer-injection tests were done using lithium bromide (LiBr), optical brightener dye, and sodium chloride (NaCl) as tracer solutions. Two of the tracers (LiBr and dye) were injected continuously for 24 hours, one of the NaCl tracers was injected continuously for 12 hours, and one of the NaCl tracers was injected over a period of 1 hour. Concentration increases of tracer constituents were detected in water discharging from the North Star Mine, substantiating a surface-water and ground-water connection between Belcher Gulch and the North Star Mine. Different timing and magnitude of tracer breakthroughs indicated multiple flow paths with different residence times from the stream to the mine. The Pittsburgh and Sultan Mines were thought to physically connect to the North Star Mine, but tracer breakthroughs were inconclusive in water from these mines. From the tracer-injection tests and synoptic measurements of streamflow discharge, a conceptual model was developed for surface-water and ground-water interactions between Belcher Gulch and the North Star Mine. This information, combined with previous surface geophysical surveys indicating the presence of subsurface voids, may assist with decision-making process for preventing infiltration and for the remediation of mine drainage from these mines.

  11. A Local Distributed Peer-to-Peer Algorithm Using Multi-Party Optimization Based Privacy Preservation for Data Mining Primitive Computation

    National Aeronautics and Space Administration — This paper proposes a scalable, local privacy-preserving algorithm for distributed peer-to-peer (P2P) data aggregation useful for many advanced data mining/analysis...

  12. Scalable Distributed Change Detection from Astronomy Data Streams using Local, Asynchronous Eigen Monitoring Algorithms

    National Aeronautics and Space Administration — This paper considers the problem of change detection using local distributed eigen monitoring algorithms for next generation of astronomy petascale data pipelines...

  13. A QoE Aware Fairness Bi-level Resource Allocation Algorithm for Multiple Video Streaming in WLAN

    Hu Zhou


    With the increasing number of smart devices such as mobile phones and tablets, the scenario of multiple users watching video streams simultaneously in one wireless local area network (WLAN) is becoming more and more common. However, the quality of experience (QoE) and the fairness among multiple users are seriously impacted by the limited bandwidth and shared resources of the WLAN. In this paper, we propose a novel bi-level resource allocation algorithm. To maximize the total throughput of the network, the WLAN is first tuned to the optimal operation point. Then the wireless resource is carefully allocated at the first level, i.e., between the AP and uplink background traffic users, and at the second level, i.e., among downlink video users. The simulation results show that the proposed algorithm can guarantee the QoE and the fairness for all the video users, and there is little impact on the average throughput of the background traffic users.

  14. Abandoned mine drainage in the Swatara Creek Basin, southern anthracite coalfield, Pennsylvania, USA: 1. stream quality trends coinciding with the return of fish

    Cravotta, Charles A.; Brightbill, Robin A.; Langland, Michael J.


    Acidic mine drainage (AMD) from legacy anthracite mines has contaminated Swatara Creek in eastern Pennsylvania. Intermittently collected base-flow data for 1959–1986 indicate that fish were absent immediately downstream from the mined area where pH ranged from 3.5 to 7.2 and concentrations of sulfate, dissolved iron, and dissolved aluminum were as high as 250, 2.0, and 4.7 mg/L, respectively. However, in the 1990s, fish returned to upper Swatara Creek, coinciding with the implementation of AMD treatment (limestone drains, limestone diversion wells, limestone sand, constructed wetlands) in the watershed. During 1996–2006, as many as 25 species of fish were identified in the reach downstream from the mined area, with base-flow pH from 5.8 to 7.6 and concentrations of sulfate, dissolved iron, and dissolved aluminum as high as 120, 1.2, and 0.43 mg/L, respectively. Several of the fish taxa are intolerant of pollution and low pH, such as river chub (Nocomis micropogon) and longnose dace (Rhinichthys cataractae). Cold-water species such as brook trout (Salvelinus fontinalis) and warm-water species such as rock bass (Ambloplites rupestris) varied in predominance depending on stream flow and stream temperature. Storm flow data for 1996–2007 indicated pH, alkalinity, and sulfate concentrations decreased as the stream flow and associated storm-runoff component increased, whereas iron and other metal concentrations were poorly correlated with stream flow because of hysteresis effects (greater metal concentrations during rising stage than falling stage). Prior to 1999, pH < 5.0 was recorded during several storm events; however, since the implementation of AMD treatments, pH has been maintained near neutral. Flow-adjusted trends for 1997–2006 indicated significant increases in calcium; decreases in hydrogen ion, dissolved aluminum, dissolved and total manganese, and total iron; and no change in sulfate or dissolved iron in Swatara Creek immediately downstream from the

  15. Unison as a Self-Stabilizing Wave Stream Algorithm in Asynchronous Anonymous Networks

    Boulinier, Christian


    How to pass from local to global scales in anonymous networks? How to organize a self-stabilizing propagation of information with feedback? From the Angluin impossibility results, we cannot elect a leader in a general anonymous network. Thus, it is impossible to build a rooted spanning tree. Many problems can only be solved by probabilistic methods. In this paper we show how to use Unison to design a self-stabilizing barrier synchronization in an anonymous network. We show that the communication structure of this barrier synchronization designs a self-stabilizing wave-stream, or pipelining wave, in anonymous networks. We introduce two variants of Wave: the strong waves and the wavelets. A strong wave can be used to solve the idempotent r-operator parametrized computation problem. A wavelet deals with k-distance computation. We show how to use Unison to design a self-stabilizing wave stream, a self-stabilizing strong wave stream and a self-stabilizing wavelet stream.

  16. Influence of the contaminated wastes/soils on the geochemical characteristics of the Bodelhão stream waters and sediments from Panasqueira mine area, Portugal

    Abreu, Maria Manuela; Godinho, Berta; Magalhães, Maria Clara F.; Anjos, Carla; Santos, Erika


    Panasqueira is a famous Portuguese tin-tungsten mine operating more or less continuously since the end of the nineteenth century. This mine is located in the Central Iberian Zone, northwest of Castelo Branco, about 35 km from Fundão, being the greatest producer of tungsten in Europe. Panasqueira mine also produces copper and tin. The ore exploitation has caused huge local visual and chemical impact from the large waste tailings, together with water drainage from mine galleries, seepage and effluents from water plant treatment. The objective of this work was to evaluate the influence of the contaminated wastes and soils on the water and sediments characteristics of the Bodelhão stream. This stream crosses the mine area at the bottom of the main tailings, receiving sediments, seepage and drainage waters from wastes and/or soils developed on the waste materials which cover the host rocks (schists), and also from the water treatment plant. Waste materials contain different levels of hazardous chemical elements depending on their age and degree of weathering (mg/kg - As: 466-632; Cd: 2.6-4.2; Cu: 264-457; Zn: 340-456; W: 40-1310). Soils developed on old wastes (60-80 years old) are mainly silty loam, acidic (except one soil (pH 8.2) developed on waste materials covered by leakage mud from a pipe conducting effluent to a pond), with relatively high concentration of organic carbon (median 48.6 g/kg). The majority of soils are heavily contaminated in As (158-7790 mg/kg), Cd (0.6-138 mg/kg), Cu (51-4081 mg/kg), W (19-1450 mg/kg), and Zn (142-12300 mg/kg). The fraction of these elements extracted with DTPA solution, relatively to total concentration, varies from low to As (bank sediments (g/kg, As: 5.56-44.0; Cu: 1.99- >10; Zn: 1.29-14.1; S: 7.2-66.9; W: 1.04-6.32, and Cd: 11.4-138 mg/kg) when compared with the same elements in soils, indicate high dispersion of the chemical elements through waters both in solution and particulate material. Bed and river banks are

  17. Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION(TM) sequencing.

    Cao, Minh Duc; Ganesamoorthy, Devika; Elliott, Alysha G; Zhang, Huihui; Cooper, Matthew A; Coin, Lachlan J M


    The recently introduced Oxford Nanopore MinION platform generates DNA sequence data in real-time. This has great potential to shorten the sample-to-results time and is likely to have benefits such as rapid diagnosis of bacterial infection and identification of drug resistance. However, there are few tools available for streaming analysis of real-time sequencing data. Here, we present a framework for streaming analysis of MinION real-time sequence data, together with probabilistic streaming algorithms for species typing, strain typing and antibiotic resistance profile identification. Using four culture isolate samples, as well as a mixed-species sample, we demonstrate that bacterial species and strain information can be obtained within 30 min of sequencing and using about 500 reads, initial drug-resistance profiles within two hours, and complete resistance profiles within 10 h. While strain identification with multi-locus sequence typing required more than 15x coverage to generate confident assignments, our novel gene-presence typing could detect the presence of a known strain with 0.5x coverage. We also show that our pipeline can process over 100 times more data than the current throughput of the MinION on a desktop computer.

  18. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    Rajendran, P.; Madheswaran, M.


    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are pre-processing, feature extraction, association rule mining, and a hybrid classifier. Pre-processing is done using median filtering, and edge features are extracted using the Canny edge detection technique. Two image mining approaches are combined in a hybrid manner in this paper....

  19. A Tracking Algorithm Based on Rank Two Modifications for Canonical Correlation Analysis of Multidimensional Data Streams

    杨静; 李文平; 张健沛


    Existing algorithms for canonical correlation analysis (CCA) of multidimensional data streams are mostly based on approximation techniques and are not, in essence, exact algorithms with continuous updates. To track the correlations between data streams continuously, rapidly, and accurately in time-varying environments, this study proposes a canonical correlation tracking algorithm for multidimensional data streams called TCCA (Tracking CCA). Based on the theory of rank-two modifications, the algorithm continuously updates the eigen-subspace of the sample covariance matrix in parallel and thereby tracks the canonical correlations of multidimensional data streams rapidly. Theoretical analysis and simulation results indicate that TCCA has good stability and high computational efficiency and accuracy. It can serve as a basic tool for correlation detection on data streams, feature fusion, dimension reduction, and other areas of data stream mining.

  20. Research of Improved FP-Growth Algorithm in Association Rules Mining

    Yi Zeng


    with FP-Growth algorithm. Experimental results show that Painting-Growth algorithm is more than 1050 and N Painting-Growth algorithm is less than 10000 in data volume; the performance of the two kinds of improved algorithms is better than that of FP-Growth algorithm.

  1. Applicability of data mining algorithms in the identification of beach features/patterns on high-resolution satellite data

    Teodoro, Ana C.


    The available beach classification algorithms and sediment budget models are mainly based on in situ parameters, usually unavailable for several coastal areas. A morphological analysis using remotely sensed data is a valid alternative. This study focuses on the application of data mining techniques, particularly decision trees (DTs) and artificial neural networks (ANNs) to an IKONOS-2 image in order to identify beach features/patterns in a stretch of the northwest coast of Portugal. Based on knowledge of the coastal features, five classes were defined. In the identification of beach features/patterns, the ANN algorithm presented an overall accuracy of 98.6% and a kappa coefficient of 0.97. The best DTs algorithm (with pruning) presents an overall accuracy of 98.2% and a kappa coefficient of 0.97. The results obtained through the ANN and DTs were in agreement. However, the ANN presented a classification more sensitive to rip currents. The use of ANNs and DTs for beach classification from remotely sensed data resulted in an increased classification accuracy when compared with traditional classification methods. The association of remotely sensed high-spatial resolution data and data mining algorithms is an effective methodology with which to identify beach features/patterns.
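
    The two figures reported above, overall accuracy and the kappa coefficient, are both derived from a classification confusion matrix. A minimal sketch of that calculation follows; the function name `accuracy_and_kappa` and the 2x2 example matrix are illustrative only and are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i that the
    classifier assigned to class j.
    """
    size = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example, not taken from the paper.
accuracy, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
```

    Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.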

  2. Flood risk zoning using a rule mining based on ant colony algorithm

    Lai, Chengguang; Shao, Quanxi; Chen, Xiaohong; Wang, Zhaoli; Zhou, Xiaowen; Yang, Bing; Zhang, Lilan


    Risk assessment is a preliminary step in flood management and mitigation, and risk zoning provides a quantitative measure of flood risk. The difficulty in flood risk zoning lies in handling the complicated non-linear relationship between indices and risk levels. To solve this problem, the ant colony algorithm based on rule mining (Ant-Miner) is introduced in this paper to map the regional flood risk at grid scale. For the case study in the Dongjiang River Basin in Southern China, 11 and 14 indices (without and with the socio-economic indices considered) are respectively chosen to construct the zoning model based on Ant-Miner. The results show that Ant-Miner achieves higher accuracy and yields simpler rules than the decision tree method (DT), so that a flood risk zoning map can be generated quickly and easily; compared to random forest (RF) and fuzzy comprehensive evaluation (FCE), Ant-Miner has significant advantages in requiring fewer implementation steps and less computing time. Although the comprehensive measure and the natural hazard measure of flood risk are distributed similarly over the entire region, the former, which considers the socio-economic indices, is more reasonable in terms of real impact on nature and the socio-economy. The high-risk areas obtained in this paper match well with the integrated risk zoning map and the inundation areas of historical floods, suggesting that the proposed Ant-Miner method is capable of zoning the flood risk at grid scale. This study shows the potential to provide a novel and successful approach to flood risk zoning. Evaluation results provide a reference for flood risk management, prevention, and reduction of natural disasters in the study basin.

  3. Talking about Data Mining Algorithm Research and Implementation



    Starting from interactive mining, multi-level mining, similarity mining of complex data types (time series), and integrated mining, and taking the perspective of data mining platform construction and industry applications, this paper studies the relevant data mining algorithms and, on this basis, discusses how data mining algorithms should be implemented in practical applications.

  4. Data on stream-water and bed-sediment quality in the vicinity of Leviathan Mine, Alpine County, California, and Douglas County, Nevada, September 1998

    Thomas, Karen A.; Lico, Michael S.


    The U.S. Geological Survey (USGS) conducted a chemical assessment of streams in the Leviathan Mine and adjacent areas in September 1998. On-site measurements of streamflow, pH, dissolved oxygen, temperature, specific conductance, and at most sites alkalinity, bicarbonate, and carbonate were made at 14 sites. Water samples were collected for chemical analyses of nutrients, major ions, trace elements, and organic carbon. Bed-sediment samples of fine-grained sediment in representative depositional areas at each sampling location were collected for chemical analyses of major and trace elements, total carbon, inorganic carbon, and organic carbon.

  5. Sources and fates of heavy metals in a mining-impacted stream: temporal variability and the role of iron oxides.

    Schaider, Laurel A; Senn, David B; Estes, Emily R; Brabander, Daniel J; Shine, James P


    Heavy metal contamination of surface waters at mining sites often involves complex interactions of multiple sources and varying biogeochemical conditions. We compared surface and subsurface metal loading from mine waste pile runoff and mine drainage discharge and characterized the influence of iron oxides on metal fate along a 0.9-km stretch of Tar Creek (Oklahoma, USA), which drains an abandoned Zn/Pb mining area. The importance of each source varied by metal; mine waste pile runoff contributed 70% of Cd, while mine drainage contributed 90% of Pb, and both sources contributed similarly to Zn loading. Subsurface inputs accounted for 40% of flow and 40-70% of metal loading along this stretch. Streambed iron oxide aggregate material contained highly elevated Zn (up to 27,000 μg g−1), Pb (up to 550 μg g−1) and Cd (up to 200 μg g−1) and was characterized as a heterogeneous mixture of iron oxides, fine-grain mine waste, and organic material. Sequential extractions confirmed preferential sequestration of Pb by iron oxides, as well as substantial concentrations of Zn and Cd in iron oxide fractions, with additional accumulation of Zn, Pb, and Cd during downstream transport. Comparisons with historical data show that while metal concentrations in mine drainage have decreased by more than an order of magnitude in recent decades, the chemical composition of mine waste pile runoff has remained relatively constant, indicating less attenuation and increased relative importance of pile runoff. These results highlight the importance of monitoring temporal changes at contaminated sites associated with evolving speciation and simultaneously addressing surface and subsurface contamination from both mine waste piles and mine drainage. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.

  6. Sources and fates of heavy metals in a mining-impacted stream: Temporal variability and the role of iron oxides

    Schaider, Laurel A.; Senn, David B.; Estes, Emily R.; Brabander, Daniel J.; Shine, James P.


    Heavy metal contamination of surface waters at mining sites often involves complex interactions of multiple sources and varying biogeochemical conditions. We compared surface and subsurface metal loading from mine waste pile runoff and mine drainage discharge and characterized the influence of iron oxides on metal fate along a 0.9-km stretch of Tar Creek (Oklahoma, USA), which drains an abandoned Zn/Pb mining area. The importance of each source varied by metal: mine waste pile runoff contributed 70% of Cd, while mine drainage contributed 90% of Pb, and both sources contributed similarly to Zn loading. Subsurface inputs accounted for 40% of flow and 40-70% of metal loading along this stretch. Streambed iron oxide aggregate material contained highly elevated Zn (up to 27,000 μg g−1), Pb (up to 550 μg g−1) and Cd (up to 200 μg g−1) and was characterized as a heterogeneous mixture of iron oxides, fine-grain mine waste, and organic material. Sequential extractions confirmed preferential sequestration of Pb by iron oxides, as well as substantial concentrations of Zn and Cd in iron oxide fractions, with additional accumulation of Zn, Pb, and Cd during downstream transport. Comparisons with historical data show that while metal concentrations in mine drainage have decreased by more than an order of magnitude in recent decades, the chemical composition of mine waste pile runoff has remained relatively constant, indicating less attenuation and increased relative importance of pile runoff. These results highlight the importance of monitoring temporal changes at contaminated sites associated with evolving speciation and simultaneously addressing surface and subsurface contamination from both mine waste piles and mine drainage. PMID:24867708

  7. Hydrogeochemical and mineralogical characteristics related to heavy metal attenuation in a stream polluted by acid mine drainage:A case study in Dabaoshan Mine, China

    Huarong Zhao; Beicheng Xia; Jianqiao Qin; Jiaying Zhang


    Dabaoshan Mine, the largest mine in south China, has been developed since the 1970s. Acid mine drainage (AMD) discharged from the mine has caused severe environmental pollution and human health problems. In this article, chemical characteristics, mineralogy of ocher precipitations and heavy metal attenuation in the AMD are discussed based on physicochemical analysis, mineral analysis, sequential extraction experiments and hydrogeochemistry. The AMD chemical characteristics were determined from the initial water composition, water-rock interactions and dissolved sulfide minerals in the mine tailings. The waters, affected and unaffected by AMD, were Ca-SO₄ and Ca-HCO₃ types, respectively. The affected water had a low pH, high SO₄²⁻ and high heavy metal content and oxidation as determined by the Fe²⁺/Fe³⁺ couple. Heavy metal and SO₄²⁻ contents of Hengshi River water decreased, while pH increased, downstream. Schwertmannite was the major mineral at the waste dump, while goethite and quartz were dominant at the tailings dam and streambed. Schwertmannite was transformed into goethite at the tailings dam and streambed. The sulfate ions of the secondary minerals changed from bidentate- to monodentate-complexes downstream. Fe-Mn oxide phases of Zn, Cd and Pb in sediments increased downstream. However, organic matter complexes of Cu in sediments increased further away from the tailings. Fe³⁺ mineral precipitates and transformations controlled the AMD water chemistry.

  8. Improved algorithm for mining weighted frequent itemsets

    李彦伟; 戴月明; 王金鑫


    The shortcomings of the New-Apriori and MWFI (Mining Weighted Frequent Itemsets) algorithms are analyzed, and the New-MWFI algorithm for mining weighted frequent itemsets is proposed. The algorithm classifies transactions according to item weights and mines the weighted frequent itemsets within each category in turn. Since the frequent itemsets within each category satisfy the Apriori property, the Apriori algorithm or other improved algorithms can be used for mining, which overcomes the unreasonableness and inefficiency of the original algorithms. Experiments show that the new algorithm mines weighted frequent itemsets from datasets more effectively.

  9. Analysis of Medical Domain Using CMARM: Confabulation Mapreduce Association Rule Mining Algorithm for Frequent and Rare Itemsets

    Dr. Jyoti Gautam


    Full Text Available Disease is a major cause of illness and death in modern society. Many factors contribute to disease, such as work environment, living and working conditions, agriculture and food production, housing, unemployment and individual lifestyle. Early diagnosis of diseases that occur frequently or rarely with growing age can help cure the disease completely or to some extent. The long-term prognosis of patient records might be useful for finding the causes responsible for particular diseases. People can therefore take early preventive measures to minimize the risk of diseases that may supervene with growing age, and hence increase life expectancy. In this paper, a new Confabulation-MapReduce based association rule mining algorithm (CMARM) is proposed for the analysis of a medical data repository for both rare and frequent itemsets, using an iterative MapReduce-based framework inspired by cogency. Cogency is the probability of the assumed facts being true if the conclusion is true; because it is based on pairwise item conditional probability, the proposed algorithm mines association rules with only one pass through the file. The proposed algorithm is also valuable for dealing with infrequent items due to its cogency-inspired approach.
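
    The pairwise conditional probability behind cogency can be illustrated with a minimal sketch. This is not the CMARM implementation and involves no MapReduce; the function name `pairwise_cogency` and the sample records are hypothetical. It assumes cogency of an antecedent item a for a conclusion item c is estimated as P(a | c) = count(a and c) / count(c), with all counts gathered in one pass.

```python
from collections import Counter
from itertools import permutations

def pairwise_cogency(transactions):
    """Return {(a, c): estimated P(a | c)} from a single pass over the data."""
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:  # the only pass over the records
        items = set(t)
        item_count.update(items)
        # Ordered pairs (a, c) with a != c that co-occur in this record.
        pair_count.update(permutations(sorted(items), 2))
    return {(a, c): n / item_count[c] for (a, c), n in pair_count.items()}

# Hypothetical patient records, for illustration only.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"cough"},
    {"fever"},
]
cog = pairwise_cogency(records)
```

    Because the estimate is conditioned on the conclusion item, a rare item such as "fatigue" can still yield a high value (here "cough" given "fatigue"), which is consistent with the abstract's remark about infrequent items.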

  10. Algorithms for Regular Tree Grammar Network Search and Their Application to Mining Human-viral Infection Patterns.

    Smoly, Ilan; Carmel, Amir; Shemer-Avni, Yonat; Yeger-Lotem, Esti; Ziv-Ukelson, Michal


    Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.

  11. A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams.

    Mohamad, Saad; Bouchachia, Abdelhamid; Sayed-Mouchaweh, Moamar


    Active learning (AL) is a promising way to efficiently build up training sets with minimal supervision. A learner deliberately queries specific instances to tune the classifier's model using as few labels as possible. The challenge for streaming is that the data distribution may evolve over time, and therefore the model must adapt. Another challenge is the sampling bias where the sampled training set does not reflect the underlying data distribution. In the presence of concept drift, sampling bias is more likely to occur as the training set needs to represent the whole evolving data. To tackle these challenges, we propose a novel bi-criteria AL (BAL) approach that relies on two selection criteria, namely, label uncertainty criterion and density-based criterion. While the first criterion selects instances that are the most uncertain in terms of class membership, the latter dynamically curbs the sampling bias by weighting the samples to reflect on the true underlying distribution. To design and implement these two criteria for learning from streams, BAL adopts a Bayesian online learning approach and combines online classification and online clustering through the use of online logistic regression and online growing Gaussian mixture models, respectively. Empirical results obtained on standard synthetic and real-world benchmarks show the high performance of the proposed BAL method compared with the state-of-the-art AL methods.

  12. L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises

    Xu-Bin Deng; Yang-Yong Zhu


    In this paper, a new method, named as L-tree match, is presented for extracting data from complex data sources. Firstly, based on data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Secondly, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text stream. Thirdly, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Lastly, this method is applied to extract accurate data from biological documents amounting to 100GB for the first online integrated biological data warehouse of China.

  13. Frequent itemsets mining algorithm based on clique

    吴六爱; 刘应东


    Finding all frequent itemsets efficiently is the core problem of association rule mining. Based on a study of existing matrix-based algorithms for mining frequent itemsets (MFI), a fast clique-based algorithm for generating frequent itemsets is presented. The algorithm uses an association graph to store the information on frequent 2-itemsets, finds the cliques of the association graph, and searches for all maximal frequent itemsets by gradually reducing the items in each clique, while scanning the database only once. The algorithm is verified on standard datasets and compared with other algorithms; the experimental results show that it mines frequent itemsets faster.
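
    The idea of an association graph whose cliques are shrunk into frequent itemsets can be sketched as follows. This is a simplified illustration and not the authors' algorithm: it assumes the single database scan stores a transaction-id set per item, it uses the basic Bron-Kerbosch procedure to enumerate maximal cliques, and the function name `maximal_frequent_itemsets` is hypothetical.

```python
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    # One scan: record the transaction ids (tidset) of every item.
    tids = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            tids.setdefault(item, set()).add(tid)

    def support(itemset):
        return len(set.intersection(*(tids[i] for i in itemset)))

    frequent_items = sorted(i for i in tids if len(tids[i]) >= min_support)
    # Association graph: an edge joins two items forming a frequent 2-itemset.
    adj = {i: set() for i in frequent_items}
    for a, b in combinations(frequent_items, 2):
        if len(tids[a] & tids[b]) >= min_support:
            adj[a].add(b)
            adj[b].add(a)

    cliques = []

    def bron_kerbosch(r, p, x):
        if not p and not x:
            if r:  # ignore the empty clique of an empty graph
                cliques.append(frozenset(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(frequent_items), set())

    # Shrink each clique item by item until a frequent itemset remains.
    found = set()

    def shrink(itemset):
        if support(itemset) >= min_support:
            found.add(itemset)
        elif len(itemset) > 1:
            for item in itemset:
                shrink(itemset - {item})

    for clique in cliques:
        shrink(clique)
    # Keep only itemsets that are not contained in a larger frequent itemset.
    return {s for s in found if not any(s < other for other in found)}

db1 = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"c", "d"}]
db2 = [{"a", "b"}, {"a", "b"}, {"b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "c"}]
```

    Every frequent itemset is a clique of the association graph, so starting from maximal cliques and removing items cannot miss a maximal frequent itemset. The shrinking step here is exponential in the worst case and is meant only to show the principle.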

  14. Building Real-Time Network Intrusion Detection System Based on Parallel Time-Series Mining Techniques

    Zhao Feng; Li Qinghua


    A new real-time model based on parallel time-series mining is proposed to improve the accuracy and efficiency of network intrusion detection systems (NIDS). In this model, a multidimensional dataset is constructed to describe network events, and a sliding-window updating algorithm is used to maintain the network stream. Moreover, parallel frequent-pattern and frequent-episode mining algorithms are applied to implement a parallel time-series mining engine that can intelligently generate rules to distinguish intrusions from normal activities. Analysis and study on the DAWNING 3000 indicate that this parallel time-series mining-based model provides a more accurate and efficient way to build a real-time NIDS.
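
    A sliding-window update over a stream of events can be sketched as below. The abstract does not describe the authors' update algorithm, so this is only a generic count-based window; the class name `SlidingWindowCounter` and the event labels are hypothetical.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Keep per-event counts over the most recent `size` events."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, event):
        # Insert the new event, then evict the oldest one if the window is full.
        self.window.append(event)
        self.counts[event] += 1
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, min_count):
        # Events seen at least `min_count` times in the current window.
        return {e for e, n in self.counts.items() if n >= min_count}

w = SlidingWindowCounter(size=3)
for event in ["syn", "syn", "ack", "syn", "fin"]:
    w.add(event)
```

    Each update costs constant time, so the frequent events of the current window are always available without rescanning older traffic.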

  15. An Undersea Mining Microseism Source Location Algorithm Considering Wave Velocity Probability Distribution


    Traditional mine microseism locating methods are mainly based on the assumption that the wave velocity is uniform through space, which leads to errors because the assumption goes against the laws of nature. In this paper, the wave velocity is regarded as a random variable, and the probability distribution information of the wave velocity is fused into the traditional locating method. This paper puts forward the microseism source location method for the undersea mining on condition o...

  16. An approach to quantify sources, seasonal change, and biogeochemical processes affecting metal loading in streams: Facilitating decisions for remediation of mine drainage

    Kimball, B.A.; Runkel, R.L.; Walton-Day, K.


    Historical mining has left complex problems in catchments throughout the world. Land managers are faced with making cost-effective plans to remediate mine influences. Remediation plans are facilitated by spatial mass-loading profiles that indicate the locations of metal mass-loading, seasonal changes, and the extent of biogeochemical processes. Field-scale experiments during both low- and high-flow conditions and time-series data over diel cycles illustrate how this can be accomplished. A low-flow experiment provided spatially detailed loading profiles to indicate where loading occurred. For example, SO42- was principally derived from sources upstream from the study reach, but three principal locations also were important for SO42- loading within the reach. During high-flow conditions, Lagrangian sampling provided data to interpret seasonal changes and indicated locations where snowmelt runoff flushed metals to the stream. Comparison of metal concentrations between the low- and high-flow experiments indicated substantial increases in metal loading at high flow, but little change in metal concentrations, showing that toxicity at the most downstream sampling site was not substantially greater during snowmelt runoff. During high-flow conditions, a detailed temporal sampling at fixed sites indicated that Zn concentration more than doubled during the diel cycle. Monitoring programs must account for diel variation to provide meaningful results. Mass-loading studies during different flow conditions and detailed time-series over diel cycles provide useful scientific support for stream management decisions.

  17. The Influence of Surface Coal Mining on Runoff Processes and Stream Chemistry in the Elk Valley, British Columbia, Canada

    Carey, S. K.; Wellen, C. C.; Shatilla, N. J.


    Surface mining is a common method of accessing coal. In high-elevation environments, vegetation and soils are typically removed prior to the blasting of overburden rock, thereby allowing access to mineable ore. Following this, the removed overburden rock is deposited in adjacent valleys as waste rock spoils. Previous research has identified that areas downstream of surface coal mining have impaired water quality, yet there is limited information about the interaction of hydrology and geochemistry across a range of mining conditions, particularly at the headwater scale. Here, we provide an analysis of an extensive long-term data set of geochemistry and flows across a gradient of coal mining in the Elk Valley, British Columbia, Canada. This work is part of a broader R&D program examining the influence of surface coal mining on hydrological and water quality responses in the Elk Valley aimed at informing effective management responses. Results indicate that water from waste rock piles has an ionic profile distinct from unimpacted catchments. While the concentration of geochemicals increased with the degree of mine impact, the control of hydrological transport capacity over geochemical export did not vary with degree of mine impact. Geochemical export in mine-influenced catchments was limited more strongly by transport capacity than supply, implying that more water moving through the waste rock mobilized more geochemicals. Placement of waste rock within the catchment (headwaters or outlet) did not affect chemical concentrations but did alter the timing with which chemically distinct water mixed. This work advances on results reported earlier using empirical models of selenium loading and further highlights the importance of limiting water inputs into waste rock piles.

  18. Evolution of Microbial “Streamer” Growths in an Acidic, Metal-Contaminated Stream Draining an Abandoned Underground Copper Mine

    D. Barrie Johnson; Hallberg, Kevin B.; Laura Rocchetti; Kris Coupland; Rowe, Owen F.; Catherine M. Kay


    A nine-year study was carried out on the evolution of macroscopic “acid streamer” growths in acidic, metal-rich mine water from the point of construction of a new channel to drain an abandoned underground copper mine. The new channel became rapidly colonized by acidophilic bacteria: two species of autotrophic iron-oxidizers (Acidithiobacillus ferrivorans and “Ferrovum myxofaciens”) and a heterotrophic iron-oxidizer (a novel genus/species with the propos...

  19. Modeling and clustering users with evolving profiles in usage streams

    Zhang, Chongsheng


    Today, there is an increasing need for data stream mining technology to discover important patterns on the fly. Existing data stream models and algorithms commonly assume that users' records or profiles in data streams will not be updated or revised once they arrive. Nevertheless, in various applications such as Web usage, the records/profiles of the users can evolve over time. This kind of streaming data evolves in two forms: the streaming of tuples or transactions, as in the case of traditional data streams, and, more importantly, the evolution of user records/profiles inside the streams. Such data streams make it difficult to model and cluster users in order to explore their behaviors. In this paper, we propose three models to summarize this kind of data stream: the batch model, the Evolving Objects (EO) model and the Dynamic Data Stream (DDS) model. Through creating, updating and deleting user profiles, these models summarize the behaviors of each user as a profile object. Based upon these models, clustering algorithms are employed to discover interesting user groups from the profile objects. We have evaluated all the proposed models on a large real-world data set, showing that the DDS model summarizes the data streams with evolving tuples more efficiently and effectively, and provides a better basis for clustering users than the other two models. © 2012 IEEE.

  20. Estimate of heavy metals in soil and streams using combined geochemistry and field spectroscopy in Wan-sheng mining area, Chongqing, China

    Song, Lian; Jian, Ji; Tan, De-Jun; Xie, Hong-Bing; Luo, Zhen-Fu; Gao, Bo


    Soils and water contaminated by heavy metals will become a major environmental issue in mining areas. This paper uses field hyperspectral data to estimate the heavy metals in the soil and water of the Wan-sheng mining area in Chongqing. By analyzing the spectra of soil and water, spectral features are derived and used to build models relating these features to the contents of Al, Cu and Cr in the soil and water by Stepwise Multiple Linear Regression (SMLR). The spectral features of Al are: 480 nm, 500 nm, 565 nm, 610 nm, 680 nm, 750 nm, 1000 nm, 1430 nm, 1755 nm, 1887 nm, 1920 nm, 1950 nm, 2210 nm, 2260 nm; the spectral features of Cu are: 480 nm, 500 nm, 610 nm, 750 nm, 860 nm, 1300 nm, 1430 nm, 1920 nm, 2150 nm, 2260 nm; and the spectral features of Cr are: 480 nm, 500 nm, 610 nm, 715 nm, 750 nm, 860 nm, 1300 nm, 1430 nm, 1755 nm, 1920 nm, 1950 nm. With these features, the best models for estimating the heavy metals in the study area were selected according to the maximal R2. The R2 values of the models estimating Al, Cu and Cr are 0.813, 0.638 and 0.604 in the soil and 0.742, 0.584 and 0.513 in the water, respectively. Gradient maps of the concentrations of these three heavy metals can be created using inverse distance weighted (IDW) interpolation. The gradient maps indicate that the heavy metals in the soil have similar patterns, but in the north-west of the streams in the study area the contents differ greatly. These results show that it is feasible to predict heavy metal contamination of soils and streams due to mining activities using rapid and cost-effective field spectroscopy.
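
    Inverse distance weighted interpolation, the technique named for the gradient maps, can be sketched in a few lines. The abstract gives no parameters, so the power of 2 is an assumed common default, and the function name `idw` and the sample points are hypothetical.

```python
import math

def idw(samples, x, y, power=2.0):
    """Estimate the value at (x, y) from samples given as (xi, yi, value)."""
    num = 0.0
    den = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value  # exactly on a sample point: use its measured value
        weight = 1.0 / d ** power  # nearer samples get larger weights
        num += weight * value
        den += weight
    return num / den

# Hypothetical concentrations at two sampling sites, for illustration only.
sites = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(sites, 1.0, 0.0)
```

    Evaluating `idw` on a regular grid of (x, y) positions yields the kind of gradient map the abstract refers to.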

  1. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is eligible for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.

  2. Optimization of air quantity regulation in mine ventilation networks using the improved differential evolution algorithm and critical path method

    Chen Kaiyan; Si Junhong; Zhou Fubao; Zhang Renwei; Shao He; Zhao Hongmei


    In mine ventilation networks, a reasonable airflow distribution is very important for production safety and economy. Three basic problems, namely natural, fully controlled and semi-controlled splitting, are reviewed in the paper. For the highly difficult semi-controlled splitting problem, a general nonlinear multi-objective optimization model with constraints was established based on the theory of mine ventilation networks. A new algorithm, which combines improved differential evolution and the critical path method (CPM) based on a multivariable separate solution strategy, was put forward to search for the global optimal solution more efficiently. In each step of evolution, feasible solutions of the air quantity distribution are first produced by the improved differential evolution algorithm, and then the optimal solutions of the regulator pressure drop are obtained by the CPM. After a finite number of iterations, the optimal solution is obtained. In this new algorithm, the population of feasible solutions is sorted and grouped to enhance the global search ability, and the individuals in the general group are randomly initialized to maintain diversity. Meanwhile, the neighborhoods of individuals in the fine group, which may be close to the optimal solutions, are searched locally and finely to balance global and local search, thus improving the convergence rate. A computer program was developed based on this method. Finally, two ventilation networks, with a single fan and with multiple fans, were solved. The results show that this algorithm has the advantages of high effectiveness, fast convergence, good robustness and flexibility. This computer program could be used to solve large-scale generalized ventilation network optimization problems in the future.

  3. An IPSO-SVM algorithm for security state prediction of mine production logistics system

    Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang


    A theoretical basis for the regulation of corporate security warning and resources is provided in order to reveal the laws behind the security state in mine production logistics. Considering that the mine production logistics system is complex and its variables are difficult to acquire, a security status prediction model of the mine production logistics system based on improved particle swarm optimization and a support vector machine (IPSO-SVM) is proposed in this paper. Firstly, through linear adjustments of the inertia weight and learning weights, the convergence speed and search accuracy are enhanced in order to deal with the changeable complexity and the difficulty of data acquisition. The improved particle swarm optimization (IPSO) is then introduced to resolve the problem of parameter settings in traditional support vector machines (SVM). At the same time, a security status index system is built to determine the classification standards of safety status. The feasibility and effectiveness of this method are finally verified using experimental results.
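
    Linear adjustment of the inertia weight and learning factors in particle swarm optimization can be sketched as follows. The abstract does not give the schedules, so the bounds (inertia 0.9 to 0.4, learning factors 2.5 to 0.5 and 0.5 to 2.5) and the direction of each change are assumptions, and the names `linear_schedule` and `pso` are hypothetical. A simple sphere function stands in for the SVM validation error that the paper's optimizer would minimize.

```python
import random

def linear_schedule(start, end, step, total_steps):
    """Interpolate linearly from `start` (step 0) to `end` (last step)."""
    return start + (end - start) * step / max(total_steps - 1, 1)

def pso(cost, dim, n_particles=20, steps=50, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    history = [g_cost]  # global best cost after each iteration
    for t in range(steps):
        w = linear_schedule(0.9, 0.4, t, steps)   # inertia weight shrinks (assumed)
        c1 = linear_schedule(2.5, 0.5, t, steps)  # cognitive factor shrinks (assumed)
        c2 = linear_schedule(0.5, 2.5, t, steps)  # social factor grows (assumed)
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
        history.append(g_cost)
    return g_pos, g_cost, history

def sphere(p):
    # Stand-in objective; in an IPSO-SVM setting this would be a validation error.
    return sum(v * v for v in p)

best_point, best_value, trace = pso(sphere, dim=2)
```

    A large inertia weight early on favors exploration and a small one later favors refinement, which is the usual motivation for a decreasing linear schedule.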

  4. Improved SpiderMine Algorithm Based on Cloud Computing in Big Graph Mining

    刘莹; 杜奕智; 邹乐


    Existing graph mining algorithms have difficulty mining the high-frequency patterns of a massive graph effectively in a cloud environment. To solve this problem, this paper improves the SpiderMine algorithm and proposes a cloud-based SpiderMine algorithm (c-SpiderMine). First, a large graph is divided into several subgraphs by a minimum cut algorithm to minimize partition/merge costs. Then SpiderMine is used to mine the patterns, which significantly lowers the combinatorial complexity of generating large patterns. Finally, a pattern key (PK) function is proposed to preserve the patterns, which guarantees that all patterns can be successfully recovered and merged. Simulation experiments on three real data sets demonstrate that c-SpiderMine can efficiently mine the top-k large patterns in the cloud, and that it outperforms SpiderMine in memory usage and execution time under different data sizes and minimum support settings.

  5. Research of video stream splicing algorithm based on GPU

    张燕; 赵新灿; 谭同德


    To achieve stable, real-time splicing of video streams, a stream splicing algorithm based on the GPU is presented that exploits the powerful parallel computing capability of the graphics processing unit (GPU). Frame images are extracted from the video streams; feature extraction and matching on the frame images are then implemented on the GPU using the scale-invariant feature transform (SIFT) algorithm to stitch the images, and thereby to splice the video streams stably and in real time. The GPU-based SIFT algorithm makes full use of the GPU's parallel processing capability, which accelerates the video stream splicing algorithm and achieves fast and stable splicing of several video streams that differ considerably but share a common field of view.

  6. Data Mining Learning Models and Algorithms on a Scada System Data Repository

    Mircea Rîşteiu


    Full Text Available This paper presents three data mining techniques applied on a SCADA system data repository: Naive Bayes, k-Nearest Neighbor and Decision Trees. Based on the mining results and their explanation, it concludes that k-Nearest Neighbor is a suitable method for classifying the large amount of data considered. The experiments are built on the training data set and evaluated on a new test set with the machine learning tool WEKA.

  7. Potential utility of data-mining algorithms for early detection of potentially fatal/disabling adverse drug reactions: a retrospective evaluation.

    Hauben, Manfred; Reich, Lester


    The objective of this study was to apply 2 data-mining algorithms to a drug safety database to determine if these methods would have flagged potentially fatal/disabling adverse drug reactions that triggered black box warnings/drug withdrawals in advance of initial identification via "traditional" methods. Relevant drug-event combinations were identified from a journal publication. Data-mining algorithms using commonly cited disproportionality thresholds were then applied to the US Food and Drug Administration database. Seventy drug-event combinations were considered sufficiently specific for retrospective data mining. In a minority of instances, potential signals of disproportionate reporting were provided clearly in advance of initial identification via traditional pharmacovigilance methods. Data-mining algorithms have the potential to improve pharmacovigilance screening; however, for the majority of drug-event combinations, there was no substantial benefit of either over traditional methods. They should be considered as potential supplements to, and not substitutes for, traditional pharmacovigilance strategies. More research and experience will be needed to optimize deployment of data-mining algorithms in pharmacovigilance.
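
    A disproportionality measure of the kind used in pharmacovigilance screening can be illustrated with the proportional reporting ratio (PRR). The abstract does not name the two algorithms or thresholds that were applied, so the choice of PRR, the screening rule (PRR of at least 2 with at least 3 cases) and the counts below are assumptions for illustration only.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous reports.

    a: drug of interest, event of interest
    b: drug of interest, all other events
    c: all other drugs, event of interest
    d: all other drugs, all other events
    """
    # PRR = [a / (a + b)] / [c / (c + d)], rearranged into a single division.
    return (a * (c + d)) / (c * (a + b))

def is_signal(a, b, c, d, min_prr=2.0, min_cases=3):
    # Illustrative screening rule; the thresholds are assumed, not from the study.
    if a < min_cases or c == 0:
        return False  # too few cases, or the ratio is undefined in this sketch
    return proportional_reporting_ratio(a, b, c, d) >= min_prr

# Hypothetical report counts.
prr = proportional_reporting_ratio(20, 80, 100, 9900)
```

    A PRR well above 1 means the event makes up a larger share of the reports for the drug of interest than for all other drugs, which is what a signal of disproportionate reporting refers to.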

  8. An Associate Rules Mining Algorithm Based on Artificial Immune Network for SAR Image Segmentation

    Mengling Zhao; Hongwei Liu


    As a computational intelligence method, artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In the existing artificial immune network algorithms, the calculating affinity for classifying is based on calculating a certain distance, which may lead to some unsatisfactory results in dealing with data with nominal attributes. To overcome the shortcoming, the association rules are introduced into AIN algorithm, and we propose a new class...

  9. Hierarchical Approach for Online Mining--Emphasis towards Software Metrics

    Saradhi, M V Vijaya; Satish, P


    Several multi-pass algorithms have been proposed for association rule mining from static repositories. However, such algorithms are incapable of online processing of transaction streams. In this paper we introduce an efficient single-pass algorithm for mining association rules, given a hierarchical classification among items. Processing efficiency is achieved by utilizing two optimizations, hierarchy-aware counting and transaction reduction, which become possible in the context of hierarchical classification. This paper considers the problem of integrating constraints that are Boolean expressions over the presence or absence of items into the association discovery algorithm. This paper presents three integrated algorithms for mining association rules with item constraints and discusses their tradeoffs. It is concluded that the variation of complexity depends on the measures of DIT (Depth of Inheritance Tree) and NOC (Number of Children) in the context of hierarchical classification.

  10. An Algorithm of Mining Frequent Set Based on Frequent Link

    袁鼎荣; 张师超


    Mining frequent sets is a key issue in data mining. In this paper, a new method of mining frequent sets based on the frequent link is proposed. The algorithm constructs alternate frequent links from the transactions; the alternate link is produced by accumulating the alternate frequent links constructed while scanning the transaction database in order. The frequent link, which comprises all the information, is constructed from the frequent nodes selected according to requirements. The algorithm needs to scan the transaction database only once and can easily monitor changes in the frequent sets in order to guarantee the correctness of the association rules.

  11. Some physiochemical and heavy metal concentration in surface water streams of Tutuka in the Kenyasi mining catchment area

    Boateng, Louis [University of Education, Winneba Ghana, P. O. Box 40, Mampong (Ghana)]


    This research was conducted in the Akantansu stream of Tutuka in Kenyasi in the Brong Ahafo Region of Ghana in the months of October and November 2010 and January 2011. The major objectives of the study were to measure levels of pH, BOD (biochemical oxygen demand), lead, chromium, and arsenic in the Akantansu stream of Tutuka and to find ways that the community could ensure safe water use. To achieve the objectives of the study, sampling was done over a period of three months and data was collected and analyzed into graphs and ANOVA tables. The research revealed that the levels of arsenic and BOD were high as compared to the standards of WHO and EPA. If the people of Tutuka continue to use the stream, they may experience negative health effects (e.g., nausea, vomiting, diarrhea, etc.). The level of pH, chromium and lead was acceptable as compared to the standard of WHO and EPA. (authors)

  12. Optimized adaptation algorithm for HEVC/H.265 dynamic adaptive streaming over HTTP using variable segment duration

    Irondi, Iheanyi; Wang, Qi; Grecos, Christos


    Adaptive video streaming using HTTP has become popular in recent years for commercial video delivery. The recent MPEG-DASH standard allows interoperability and adaptability between servers and clients from different vendors. The delivery of the MPD (Media Presentation Description) files in DASH and the DASH client behaviours are beyond the scope of the DASH standard. However, the different adaptation algorithms employed by the clients do affect the overall performance of the system and users' QoE (Quality of Experience), hence the need for research in this field. Moreover, standard DASH delivery is based on fixed segments of the video. However, there is no standard segment duration for DASH where various fixed segment durations have been employed by different commercial solutions and researchers with their own individual merits. Most recently, the use of variable segment duration in DASH has emerged but only a few preliminary studies without practical implementation exist. In addition, such a technique requires a DASH client to be aware of segment duration variations, and this requirement and the corresponding implications on the DASH system design have not been investigated. This paper proposes a segment-duration-aware bandwidth estimation and next-segment selection adaptation strategy for DASH. Firstly, an MPD file extension scheme to support variable segment duration is proposed and implemented in a realistic hardware testbed. The scheme is tested on a DASH client, and the tests and analysis have led to an insight on the time to download next segment and the buffer behaviour when fetching and switching between segments of different playback durations. Issues like sustained buffering when switching between segments of different durations and slow response to changing network conditions are highlighted and investigated. An enhanced adaptation algorithm is then proposed to accurately estimate the bandwidth and precisely determine the time to download the next

  13. Clustering Time Series Data Stream - A Literature Survey

    Kavitha, V


    Interest in mining time series data has grown tremendously. To provide an overview, various implementations are studied and summarized to identify the different problems in existing applications. Clustering time series is a problem that has applications in a wide range of fields and has recently attracted a large amount of research. Time series data are frequently large and may contain outliers. In addition, time series are a special type of data set in which elements have a temporal ordering. Clustering such data streams is therefore an important issue in the data mining process. Numerous techniques and clustering algorithms have been proposed to assist the clustering of time series data streams. The clustering algorithms and their effectiveness in various applications are compared in order to develop a new method that solves the existing problem. This paper presents a survey on various clustering algorithms available for time series datasets. Moreover, the distinctiveness and restriction ...

  14. Studies on Application of Mining Association Rules algorithm in Storage Location Configuration


    How to reduce in-and-out movement distance and improve work efficiency is not only the key question for logistics storage and distribution centers, but also a primary factor in improving the competitive power of an enterprise. To address this question, this article puts forward a method that uses association rule mining to solve the storage location configuration problem, with the purpose of improving work efficiency.

  15. Mercury Concentrations in Fish and Sediment within Streams are Influenced by Watershed and Landscape Variables including Historical Gold Mining in the Sierra Nevada, California

    Alpers, C. N.; Yee, J. L.; Ackerman, J. T.; Orlando, J. L.; Slotton, D. G.; Marvin-DiPasquale, M. C.


    We compiled available data on total mercury (THg) and methylmercury (MeHg) concentrations in fish tissue and streambed sediment from stream sites in the Sierra Nevada, California, to assess whether spatial data, including information on historical mining, can be used to make robust predictions of fish fillet tissue THg concentrations. A total of 1,271 fish from five species collected at 103 sites during 1980-2012 were used for the modeling effort: 210 brown trout, 710 rainbow trout, 79 Sacramento pikeminnow, 93 Sacramento sucker, and 179 smallmouth bass. Sediment data were used from 73 sites, including 106 analyses of THg and 77 analyses of MeHg. The dataset included 391 fish (mostly rainbow trout) and 28 sediment samples collected explicitly for this study during 2011-12. Spatial data on historical mining included the USGS Mineral Resources Data System and publicly available maps and satellite photos showing the areas of hydraulic mine pits and other placer mines. Modeling was done using multivariate linear regression and multi-model inference using Akaike Information Criteria. Results indicate that fish THg, accounting for species and length, can be predicted using geospatial data on mining history together with other landscape characteristics including land use/land cover. A model requiring only geospatial data, with an R2 value of 0.61, predicted fish THg correctly with respect to over-or-under 0.2 μg/g wet weight (a California regulatory threshold) for 108 of 121 (89 %) size-species combinations tested. Data for THg in streambed sediment did not improve the geospatial-only model. However, data for sediment MeHg, loss on ignition (organic content), and percent of sediment less than 0.063 mm resulted in a slightly improved model, with an R2 value of 0.63. It is anticipated that these models will be useful to the State of California and others to predict areas where mercury concentrations in fish are likely to exceed regulatory criteria.

  16. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid


    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with a sensitivity, specificity and accuracy of 96%, 87% and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
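
    As a reminder of how the three reported figures are defined, the sketch below computes sensitivity, specificity and accuracy from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # share of true cases that were detected
    specificity = tn / (tn + fp)                  # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # share of all predictions that were right
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT taken from the abstract above.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

    Accuracy depends on the class balance of the sample, which is why it is usually reported next to sensitivity and specificity rather than instead of them.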

  17. Relations of benthic macroinvertebrates to concentrations of trace elements in water, streambed sediments, and transplanted bryophytes and stream habitat conditions in nonmining and mining areas of the upper Colorado River basin, Colorado, 1995-98

    Mize, Scott V.; Deacon, Jeffrey R.


    Intensive mining activity and highly mineralized rock formations have had significant impacts on surface-water and streambed-sediment quality and aquatic life within the upper reaches of the Uncompahgre River in western Colorado. A synoptic study by the U.S. Geological Survey National Water-Quality Assessment Program was completed in the upper Uncompahgre River Basin in 1998 to better understand the relations of trace elements (with emphasis on aluminum, arsenic, copper, iron, lead, and zinc concentrations) in water, streambed sediment, and aquatic life. Water-chemistry, streambed-sediment, and benthic macroinvertebrate samples were collected during low-flow conditions between October 1995 and July 1998 at five sites on the upper Uncompahgre River, all downstream from historical mining, and at three sites in drainage basins of the Upper Colorado River where mining has not occurred. Aquatic bryophytes were transplanted to all sites for 15 days of exposure to the water column during which time field parameters were measured and chemical water-quality and benthic macroinvertebrate samples were collected. Stream habitat characteristics also were documented at each site. Certain attributes of surface-water chemistry among streams were significantly different. Concentrations of total aluminum, copper, iron, lead, and zinc in the water column and concentrations of dissolved aluminum, copper, and zinc were significantly different between nonmining and mining sites. Some sites associated with mining exceeded Colorado acute aquatic-life standards for aluminum, copper, and zinc and exceeded Colorado chronic aquatic-life standards for aluminum, copper, iron, lead, and zinc. Concentrations of copper, lead, and zinc in streambed sediments were significantly different between nonmining and mining sites. Generally, concentrations of arsenic, copper, lead, and zinc in streambed sediments at mining sites exceeded the Canadian Sediment Quality Guidelines probable effect level (PEL

  18. Some physiochemical and heavy metal concentration in surface water stream of Tutuka in the Kenyasi mining catchment area

    B.M. Tiimub


    Full Text Available The research was conducted in the Akantansu stream of Tutuka in Kenyasi in the Brong Ahafo Region of Ghana from October 2010 to January 2011. The objective of the study was to determine the pH and the levels of BOD5, lead, chromium, and arsenic in the Akantansu stream of Tutuka in order to promote the public health safety of people using the stream for bathing and cooking. pH was determined using an Etech instrument (PC 300 series), whereas the BOD5 level was assessed by means of an empirical standard laboratory test which determined the relative oxygen requirements of waste water, effluents and polluted water using the standard procedure of the American Public Health Association (2006). An AAS 220 atomic absorption spectrometer was used for the analyses of heavy metals (lead, chromium and arsenic). The research revealed that the geometric mean levels of 0.01-0.02, 0.03-0.26, 0-0.01 and 3.99-7.06 mg/L and 5.64-6.40 for arsenic, lead, chromium, BOD5 and pH, respectively, were within the acceptable standards when compared to the EPA Maximum Permissible Limits of 0.5, 0.1, 0.1 and 50 mg/L and 6-9. However, due to the slightly higher concentration of chromium (0.26 mg/L) upstream, the people of Tutuka may develop health effects such as nausea, vomiting, diarrhea, hallucinations, headaches, depression, sleeping disorders, skin cancers, and tumours in the lungs, bladder, kidney and liver if they continue to use water from the stream for bathing and cooking.

  19. High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework

    Guru Prasad M S


    Full Text Available Huge amounts of Big Data arrive constantly with the rapid development of business organizations, which are interested in extracting useful knowledge from the collected data. Frequent item mining of Big Data helps with business decisions and with providing high-quality service. Applying traditional frequent item set mining algorithms to Big Data is not effective and leads to high computation time. Apache Hadoop MapReduce is the most popular data-intensive distributed computing framework for large-scale data applications such as data mining. In this paper, the author identifies the factors affecting the performance of a frequent item mining algorithm based on Hadoop MapReduce technology and proposes an approach for optimizing the performance of large-scale frequent item set mining. The experimental results show the potential of the proposed approach: performance is significantly improved for large-scale data mining with the MapReduce technique. The author believes that this is a valuable contribution to the high performance computing of Big Data.

  20. A Tutorial on Nonlinear Time-Series Data Mining in Engineering Asset Health and Reliability Prediction: Concepts, Models, and Algorithms

    Ming Dong


    Full Text Available The primary objective of engineering asset management is to optimize the service delivery potential of assets and to minimize the related risks and costs over their entire life through the development and application of asset health and usage management, in which health and reliability prediction plays an important role. In real-life situations where an engineering asset operates under dynamic operational and environmental conditions, the lifetime of an engineering asset is generally described as monitored nonlinear time-series data and is subject to high levels of uncertainty and unpredictability. It has been shown that the application of data mining techniques is very useful for extracting relevant features which can be used as parameters for asset diagnosis and prognosis. In this paper, a tutorial on nonlinear time-series data mining in engineering asset health and reliability prediction is given. Besides an overview of health and reliability prediction techniques for engineering assets, this tutorial focuses on concepts, models, algorithms, and applications of hidden Markov models (HMMs) and hidden semi-Markov models (HSMMs) in engineering asset health prognosis, which are representative of recent engineering asset health prediction techniques.

  1. Analysis Of Data Mining For Car Sales Sparepart Using Apriori Algorithm (Case Study: PT. IDK 1 FIELD)

    Khairul Ummi


    Full Text Available PT. IDK 1 is a branch office of a Honda car dealership that sells various Honda variants, automatic or manual, and car and motorcycle parts. Every sale is entered into a database connected directly to the central office. However, PT. IDK 1 does not know which pairs of parts are frequently purchased together. When the stock of a spare part runs low, the office simply asks the central office to send more of that part, without knowing which other parts tend to be purchased along with it. Restocking is considered difficult because of the many types of auto parts. Data mining techniques have been widely used to solve such problems; one of them, the Apriori algorithm, obtains information about the associations between products in a transaction database. Sales transaction data for Honda car parts at PT. IDK 1 can be processed using a data mining application to produce association rules, that is, strong links between itemsets of spare parts sold, which can provide recommendations and facilitate restocking as well as the arrangement or placement of strongly interdependent goods.
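
    To make the idea of Apriori-style association mining concrete, here is a minimal sketch of the textbook algorithm on a handful of made-up spare-part transactions. The item names, the support threshold and the function names are assumptions for illustration only; the abstract does not describe the authors' implementation or data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset bought at least min_support times."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: build k-item candidates from items that are still frequent.
        items = sorted({i for s in current for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)]
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))]
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical transactions, not data from the case study.
sales = [
    {"oil filter", "engine oil"},
    {"oil filter", "engine oil", "spark plug"},
    {"brake pad", "engine oil"},
    {"oil filter", "engine oil"},
    {"brake pad", "spark plug"},
]
freq = frequent_itemsets(sales, min_support=3)
rule_conf = confidence(freq, {"engine oil"}, {"oil filter"})
```

    A rule such as "engine oil -> oil filter" with high confidence is the kind of output that could inform which parts to restock or shelve together.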

  2. Temporal Data Mining Using Genetic Algorithm and Neural Network--A Case Study of Air Pollutant Forecasts

    Shine-Wei Lin; Chih-Hong Sun; Chin-Han Chen


    This paper integrates genetic algorithm and neural network techniques to build new temporal prediction and analysis tools for geographic information systems (GIS). These new GIS tools can be readily applied in a practical and appropriate manner in spatial and temporal research to fill the gaps in GIS data mining and knowledge discovery functions. The specific achievement here is the integration of related artificial intelligence technologies into GIS software to establish a conceptual spatial and temporal analysis framework. This framework is then used to develop an artificial intelligent spatial and temporal information analyst (ASIA) system, which is then fully utilized in the existing GIS package. This study of air pollutant forecasting provides a practical geographical case that demonstrates the soundness and validity of the conceptual temporal analysis framework.

  3. Assessing element distribution and speciation in a stream at abandoned Pb-Zn mining site by combining classical, in-situ DGT and modelling approaches.

    Omanović, Dario; Pižeta, Ivanka; Vukosav, Petra; Kovács, Elza; Frančišković-Bilinski, Stanislav; Tamás, János


    The distribution and speciation of elements along a stream subjected to neutralised acid mine drainage (NAMD) effluent waters (Mátra Mountain, Hungary; Toka stream) were studied by a multi-methodological approach: dissolved and particulate fractions of elements were determined by HR-ICPMS, whereas speciation was carried out by DGT, supported by speciation modelling performed by Visual MINTEQ. Before the NAMD discharge, the Toka is considered as a pristine stream, with averages of dissolved concentrations of elements lower than world averages. A considerable increase of element concentrations caused by effluent water inflow is followed by a sharp or gradual concentration decrease. A large difference between total and dissolved concentrations was found for Fe, Al, Pb, Cu, Zn and As in effluent water and at the first downstream site, with high correlation factors between elements in particulate fraction, indicating their common behaviour, governed by the formation of ferri(hydr)oxides (co)precipitates. In-situ speciation by the DGT technique revealed that Zn, Cd, Ni, Co, Mn and U were predominantly present as a labile, potentially bioavailable fraction (>90%). The formation of strong complexes with dissolved organic matter (DOM) resulted in a relatively low DGT-labile concentration of Cu (42%), while low DGT-labile concentrations of Fe (5%) and Pb (12%) were presumably caused by their existence in colloidal (particulate) fraction which is not accessible to DGT. Except for Fe and Pb, a very good agreement between DGT-labile concentrations and those predicted by the applied speciation model was obtained, with an average correlation factor of 0.96. This study showed that the in-situ DGT technique in combination with model-predicted speciation and classical analysis of samples could provide a reasonable set of data for the assessment of the water quality status (WQS), as well as for the more general study of overall behaviour of the elements in natural waters subjected

  4. Exploratory text mining algorithm based on high-dimensional clustering

    张爱科; 符保龙


    Because of the unstructured characteristics of free text, text mining has become an important branch of data mining, and in recent years many types of text mining algorithms have emerged. In this paper, an exploratory text mining algorithm based on high-dimensional clustering is proposed, which uses the guiding role of text mining to carry out data mining in data-oriented text. The algorithm requires only a small number of iterations to produce good clusters from a very large text collection. Mapping to other recorded data and assigning the text records to user groups can further improve the results of the algorithm. The feasibility and validity of the proposed method are verified by tests on related data and analysis of the experimental results.

  5. The Algorithm of Development the World Ocean Mining of the Industry During the Global Crisis

    Nyrkov, Anatoliy; Budnik, Vladislav; Sokolov, Sergei; Chernyi, Sergei


    The article reviews the effect of hydrocarbon extraction on a country's overall development under the impact of economic, demographic and technological factors, as well as its future role in the world energy balance. It also presents facts that point to offshore and deep-water production of conventional and unconventional hydrocarbons, including the mining of marine mineral resources, as a promising area of future development despite all the difficulties of this sector. The article considers the state and prospects of the Russian continental shelf, taking into account its geographical location and all of its existing problems.

  6. An Improved ID3 Decision Tree Mining Algorithm

    潘大胜; 屈迟文


    By analyzing the problems of the classic ID3 decision tree mining algorithm, the entropy calculation process is improved and an improved ID3 decision tree mining algorithm is built. The entropy calculation in decision tree construction is redesigned in order to obtain globally optimal mining results. Mining experiments are carried out on six data sets from the UCI collection. Experimental results show that the improved mining algorithm is clearly better than the ID3 decision tree mining algorithm in both the compactness of the constructed decision tree and the mining accuracy.
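
    For context, the entropy calculation that classic ID3 uses to pick a split can be sketched as follows. This is the standard information-gain criterion only; the redesigned entropy calculation proposed in the paper is not given in the abstract and is not reproduced here. The toy rows and attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the partitions created by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Classic ID3 choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

    Because the choice is made greedily at each node, classic ID3 gives no guarantee of a globally optimal tree, which is the kind of limitation an improved entropy calculation would aim to address.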

  7. Efficient Data Mining Algorithms for Screening Potential Proteins of Drug Target

    Qi Wang


    Full Text Available The past few decades have witnessed a boom in pharmacology as well as the dilemma of drug development. The screening of potential human drug target proteins from open access databases with well-measured physical and chemical properties plays a crucial role in drug design and is a challenging but significant task. In this paper, the screening of potential drug target proteins (DTPs) from a carefully collected dataset containing 5376 unlabeled proteins and 517 known DTPs was studied. Our objective is to screen potential DTPs from the 5376 proteins. We propose two strategies that assist the construction of a dataset of reliable nondrug target proteins (NDTPs); bagging of decision trees was then employed for the final prediction. These two-stage algorithms showed their effectiveness and superior performance on the testing set. Both algorithms maintained high recall ratios for DTPs, 93.5% and 97.4%, respectively. In one round of experiments, the strategy1-based bagging of decision trees algorithm screened about 558 possible DTPs, while the second algorithm predicted 1782 potential DTPs. In addition, the two strategy-based algorithms showed consensus in their predictions, with approximately 442 potential DTPs in common. These selected DTPs provide reliable choices for further verification based on biomedical experiments.

  8. Finding Recently Frequent Items over Online Data Streams

    YIN Zhi-wu; HUANG Shang-teng


    In this paper, a new algorithm, HCOUNT+, is proposed to find frequent items over data streams based on the HCOUNT algorithm. The new algorithm adopts auxiliary measures that greatly improve the precision of HCOUNT. In addition, HCOUNT+ is applied to time-critical applications, and a novel sliding-window-based algorithm, SL-HCOUNT+, is proposed to mine the most frequent items occurring recently. This algorithm uses limited memory (nB · (1+α) · e/ε · ln(-M/ln ρ) counters, α < 1), requires constant processing time per packet (only (1+α) · ln(-M/ln ρ) counters are updated, α < 1), makes only one pass over the streaming data, and is shown to work well in the experimental results.
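
    The abstract does not give the internals of HCOUNT or HCOUNT+. As a generic illustration of one-pass frequency estimation with a fixed number of counters and a constant number of counter updates per item, the sketch below implements a Count-Min sketch. This is a swapped-in, well-known technique and not the authors' algorithm; the class name, width and depth are assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount an item's true frequency."""

    def __init__(self, width, depth):
        self.width = width          # counters per row
        self.depth = depth          # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Exactly `depth` counters are touched per arriving item.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the tightest bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
for packet in ["a"] * 50 + ["b"] * 5 + ["c"]:
    sketch.add(packet)           # single pass over the stream
print(sketch.estimate("a"))      # at least 50; exact unless hash collisions occur
```

    Widening the table lowers the overestimation error and adding rows lowers the chance of a bad estimate, so memory use is set by the desired accuracy rather than by the stream length.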

  9. Finite-volume versus streaming-based lattice Boltzmann algorithm for fluid-dynamics simulations: A one-to-one accuracy and performance study.

    Shrestha, Kalyan; Mompean, Gilmar; Calzavarini, Enrico


    A finite-volume (FV) discretization method for the lattice Boltzmann (LB) equation, which combines high accuracy with limited computational cost is presented. In order to assess the performance of the FV method we carry out a systematic comparison, focused on accuracy and computational performances, with the standard streaming lattice Boltzmann equation algorithm. In particular we aim at clarifying whether and in which conditions the proposed algorithm, and more generally any FV algorithm, can be taken as the method of choice in fluid-dynamics LB simulations. For this reason the comparative analysis is further extended to the case of realistic flows, in particular thermally driven flows in turbulent conditions. We report the successful simulation of high-Rayleigh number convective flow performed by a lattice Boltzmann FV-based algorithm with wall grid refinement.

  10. Data Mining at NASA: From Theory to Applications

    Srivastava, Ashok N.


    This slide presentation, prepared for Knowledge Discovery and Data Mining 2009, demonstrates the data mining/machine learning capabilities of NASA Ames and the Intelligent Data Understanding (IDU) group. It encompasses the work done recently in the group by various group members. The IDU group develops novel algorithms to detect, classify, and predict events in large data streams for scientific and engineering systems.

  11. A guest molecule-host cavity fitting algorithm to mine PDB for small molecule targets.

    Byrem, William C; Armstead, Stephen C; Kobayashi, Shunji; Eckenhoff, Roderic G; Eckmann, David M


    Inhaled anesthetic molecule occupancy of a protein internal cavity depends in part on the volumes of the guest molecule and the host site. Current algorithms to determine volume and surface area of cavities in proteins whose structures have been determined and cataloged make no allowance for shape or small degrees of shape adjustment to accommodate a guest. We developed an algorithm to determine spheroid dimensions matching cavity volume and surface area and applied it to screen the cavities of 6,658 nonredundant structures stored in the Protein Data Bank (PDB) for potential targets of halothane (2-bromo-2-chloro-1,1,1-trifluoroethane). Our algorithm determined sizes of prolate and oblate spheroids matching dimensions of each cavity found. If those spheroids could accommodate halothane (radius 2.91 Å) as a guest, we determined the packing coefficient. A total of 394,766 cavities were identified. Of 58,681 cavities satisfying the fit criteria for halothane, 11,902 cavities had packing coefficients in the range of 0.46-0.64. This represents 20.3% of cavities large enough to hold halothane and 3.0% of all cavities processed; these cavities were found in 2,432 protein structures. Our algorithm incorporates shape dependence to screen guest-host relationships for potential small molecule occupancy of protein cavities. Proteins with large numbers of such cavities are more likely to be functionally altered by halothane.

  12. Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms.

    Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew


    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. In practice, however, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel solution approach to minimize tardiness, which consists of four phases. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  13. Research on the measurement of belt speed by video in coal mine based on improved template matching algorithm

    ZHU Ai-chun; HUA Wei; WANG Chun; WANG Yong-xing


    In order to improve the intelligence of belt video monitoring systems and to make up for the high failure rate and poor real-time performance of traditional belt speed measurement systems, various algorithms for speed measurement by video were studied, taking into account that lighting in a coal mine is uneven, the light intensity changes greatly, the direction of belt movement is constant, and the camera position is fixed. An algorithm for template matching based on the sum of absolute differences (SAD) and the correlation coefficient was proposed and improved, and the tracking of feature regions was realized. Then, a camera calibration method using the invariance of the cross-ratio was adopted, and real-time measurement of belt speed was realized on a hardware platform based on the DM642. Finally, experimental results show that this method not only has the advantages of high precision and strong anti-jamming capability but also reflects changes in belt speed in real time, so it is widely applicable.
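
    The core of SAD template matching can be sketched in a few lines: slide the template over the frame and keep the position with the smallest sum of absolute differences. The example below covers only the plain SAD step on tiny synthetic grayscale frames; the correlation-coefficient check, the improvements and the cross-ratio calibration described in the abstract are not reproduced, and the calibration constants are hypothetical.

```python
def sad(window, template):
    """Sum of absolute differences between two equally sized pixel grids."""
    return sum(abs(x - y)
               for row_w, row_t in zip(window, template)
               for x, y in zip(row_w, row_t))


def match_template(frame, template):
    """Return (best_score, row, col) of the window that best matches the template."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best = None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = [row[c:c + tw] for row in frame[r:r + th]]
            score = sad(window, template)
            if best is None or score < best[0]:
                best = (score, r, c)
    return best


def make_frame(rows, cols, template, top, left):
    """Build a black frame with the template pasted at (top, left)."""
    frame = [[0] * cols for _ in range(rows)]
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame


template = [[9, 8], [7, 9]]                              # feature region on the belt
frame_a = make_frame(4, 6, template, top=1, left=1)      # earlier frame
frame_b = make_frame(4, 6, template, top=1, left=3)      # later frame
_, _, col_a = match_template(frame_a, template)
_, _, col_b = match_template(frame_b, template)
shift_px = col_b - col_a                                 # displacement along the belt direction

METRES_PER_PIXEL = 0.01    # hypothetical value that would come from camera calibration
FRAME_INTERVAL_S = 0.04    # hypothetical time between the two frames
speed_m_per_s = shift_px * METRES_PER_PIXEL / FRAME_INTERVAL_S
```

    Plain SAD is sensitive to brightness changes between frames, which is one reason to pair it with a correlation-based measure under uneven mine lighting.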

  14. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    Amir Hossein Azadnia


    Full Text Available One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. In practice, however, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel solution approach to minimize tardiness, which consists of four phases. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  15. An Unsupervised Opinion Mining Approach for Japanese Weblog Reputation Information Using an Improved SO-PMI Algorithm

    Wang, Guangwei; Araki, Kenji

    In this paper, we propose an improved SO-PMI (Semantic Orientation Using Pointwise Mutual Information) algorithm for use in Japanese Weblog Opinion Mining. SO-PMI is an unsupervised approach proposed by Turney that has been shown to work well for English. When this algorithm was naively carried over to Japanese, most phrases, whether positive or negative in meaning, received a negative SO. To deal with this slanting phenomenon, we propose three improvements: expanding the reference words to sets of words, introducing a balancing factor, and detecting neutral expressions. In our experiments, the proposed improvements obtained a well-balanced result: both positive and negative accuracy exceeded 62% when evaluated on 1,200 opinion sentences sampled from three different domains (reviews of Electronic Products, Cars and Travels from Kakaku.com). In a comparative experiment on the same corpus, a supervised approach (SA-Demo) achieved an accuracy very similar to that of our method. This shows that our proposed approach effectively adapted SO-PMI for Japanese, and it also shows the generality of SO-PMI.
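
    For reference, Turney's original SO-PMI score compares how often a phrase co-occurs with a positive reference word versus a negative one, normalised by how common the reference words are. The sketch below implements that baseline formula only; the balancing factor and neutral-expression detection proposed in the paper are not specified in the abstract and are not included. The hit counts are hypothetical, and the small smoothing constant is an assumption used to avoid division by zero.

```python
from math import log2


def so_pmi(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation of a phrase from co-occurrence counts.

    near_pos / near_neg: hits for the phrase near the positive / negative reference word
    hits_pos / hits_neg: hits for the positive / negative reference word alone
    A positive score suggests positive orientation, a negative score the opposite.
    """
    return log2(((near_pos + smoothing) * hits_neg) /
                ((near_neg + smoothing) * hits_pos))


# Hypothetical counts: the phrase appears four times as often near the positive word.
score = so_pmi(near_pos=40, near_neg=10, hits_pos=1000, hits_neg=1000, smoothing=0)
print(score)  # 2.0
```

    If the negative reference word is much rarer in the corpus than the positive one, the hits_neg / hits_pos ratio pulls scores downward across the board, which is consistent with the skew toward negative SO that the abstract reports for the naive Japanese adaptation.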

  16. Landscape Characterization of Arctic Ecosystems Using Data Mining Algorithms and Large Geospatial Datasets

    Langford, Z. L.; Kumar, J.; Hoffman, F. M.


    Observations indicate that over the past several decades, landscape processes in the Arctic have been changing or intensifying. A dynamic Arctic landscape has the potential to alter ecosystems across a broad range of scales. Accurate characterization is useful for understanding the properties and organization of the landscape, for optimal sampling network design, and for measurement and process upscaling, and it helps to establish a landscape-based framework for multi-scale modeling of ecosystem processes. This study seeks to delineate the landscape of the Seward Peninsula of Alaska into ecoregions using large volumes (terabytes) of high spatial resolution satellite remote-sensing data. Defining high-resolution ecoregion boundaries is difficult because many ecosystem processes in Arctic ecosystems occur at small local to regional scales, which are often not resolved by coarse-resolution satellites (e.g., MODIS). We seek to use data-fusion techniques and data analytics algorithms applied to Phased Array type L-band Synthetic Aperture Radar (PALSAR), Interferometric Synthetic Aperture Radar (IFSAR), Satellite for Observation of Earth (SPOT), WorldView-2, WorldView-3, and QuickBird-2 to develop high-resolution (~5 m) ecoregion maps for multiple time periods. Traditional analysis methods and algorithms are insufficient for analyzing and synthesizing such large geospatial data sets, and those algorithms rarely scale out onto large distributed-memory parallel computer systems. We seek to develop computationally efficient algorithms and techniques using high-performance computing for characterization of Arctic landscapes. We will apply a variety of data analytics algorithms, such as cluster analysis, complex object-based image analysis (COBIA), and neural networks. We also propose to use representativeness analysis within the Seward Peninsula domain to determine optimal sampling locations for fine-scale measurements. This methodology should provide an initial framework for analyzing dynamic landscape

  17. Evolution of Microbial “Streamer” Growths in an Acidic, Metal-Contaminated Stream Draining an Abandoned Underground Copper Mine

    D. Barrie Johnson


    Full Text Available A nine-year study was carried out on the evolution of macroscopic “acid streamer” growths in acidic, metal-rich mine water from the point of construction of a new channel to drain an abandoned underground copper mine. The new channel became rapidly colonized by acidophilic bacteria: two species of autotrophic iron-oxidizers (Acidithiobacillus ferrivorans and “Ferrovum myxofaciens”) and a heterotrophic iron-oxidizer (a novel genus/species with the proposed name “Acidithrix ferrooxidans”). The same bacteria dominated the acid streamer communities for the entire nine-year period, with the autotrophic species accounting for ~80% of the micro-organisms in the streamer growths (as determined by terminal restriction enzyme fragment length polymorphism (T-RFLP) analysis). Biodiversity of the acid streamers became somewhat greater over time, and included species of heterotrophic acidophiles that reduce ferric iron (Acidiphilium, Acidobacterium, Acidocella and gammaproteobacterium WJ2) and other autotrophic iron-oxidizers (Acidithiobacillus ferrooxidans and Leptospirillum ferrooxidans). The diversity of archaea in the acid streamers was far more limited; relatively few clones were obtained, all of which were very distantly related to known species of euryarchaeotes. Some differences were apparent between the acid streamer community and planktonic-phase bacteria. This study has provided unique insights into the evolution of an extremophilic microbial community, and identified several novel species of acidophilic prokaryotes.

  18. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Schomburg Dietmar


    Full Text Available Background: The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description: Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions: The presented algorithm delivers a considerable amount of information and therefore may help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at The source code of the algorithm is provided under the GNU General Public

  19. Mining Algorithm of Normalized Weighted Association Rules in Database

    杜鹢; 藏海霞


    Previous algorithms for mining association rules assume that every item in the database is equally important. This paper presents a method for mining normalized weighted association rules in a database, which addresses the problems caused by items of unequal importance.

  20. A Novel Sub-Stream-Oriented Low-Delay Scheduling Algorithm

    吴国福; 窦强; 吴吉庆; 窦文华


    Peer-to-peer (P2P) streaming is an effective and promising way to distribute media content, and data transmission delay is a key parameter of P2P streaming system performance. Building on an analysis of the transmission delay of pull-mode data scheduling, this paper presents a novel sub-stream-oriented low-delay scheduling strategy under the push-pull hybrid framework. First, the sub-stream scheduling problem is transformed into an equivalent matching problem on a weighted bipartite graph. Then the well-known Hungarian algorithm is improved for the transformed graph, yielding a heuristic minimum-delay, maximum-matching algorithm. The new algorithm preserves maximum matching while keeping the transmission delay of each sub-stream as low as possible. Simulation results show that the method can greatly reduce transmission delay.

  1. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.

    Kargarfard, Fatemeh; Sami, Ashkan; Ebrahimie, Esmaeil


    Pandemic influenza is a major concern worldwide. Availability of advanced technologies and the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provide a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning alteration of non-pandemic sequences to pandemic ones. We hypothesized that the extracted rules can lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules which potentially present the undiscovered antigenic sites at influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K, M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein level. Finding hotspots in influenza sequences is a significant finding as they represent the regions with low antibody reactivity. We argue that the virus breaks host immunity response by mutation at these spots. Based on the discovered rules, we developed the software, "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in discovery of association rules between mutation points during evolution of pandemic influenza.

  2. A New Algorithm of Mining Classification Rules Based on Decision Table

    谢娟英; 冯德民


    The mining of classification rules is an important field in Data Mining. Decision table of rough sets theory is an efficient tool for mining classification rules. The elementary concepts corresponding to decision table of Rough Sets Theory are introduced in this paper. A new algorithm for mining classification rules based on Decision Table is presented, along with a discernable function in reduction of attribute values, and a new principle for accuracy of rules. An example of its application to the car's classification problem is included, and the accuracy of rules discovered is analyzed. The potential fields for its application in data mining are also discussed.

  3. Data Mining: The Art of Automated Knowledge Extraction

    Karimabadi, H.; Sipes, T.


    Data mining algorithms are used routinely in a wide variety of fields and they are gaining adoption in the sciences. The realities of real-world data analysis are that (a) data has flaws, and (b) the models and assumptions that we bring to the data are inevitably flawed, and/or biased and misspecified in some way. Data mining can improve data analysis by detecting anomalies in the data, checking the consistency of the user's model assumptions, and deciphering complex patterns and relationships that would not be accessible otherwise. The common form of data collected from in situ spacecraft measurements is multivariate time series, which represents one of the most challenging problems in data mining. We have successfully developed algorithms to deal with such data and have extended the algorithms to handle streaming data. In this talk, we illustrate the utility of our algorithms through several examples including automated detection of reconnection exhausts in the solar wind and flux ropes in the magnetotail. We also show examples from successful applications of our technique to analysis of 3D kinetic simulations. With an eye to the future, we provide an overview of our upcoming plans that include collaborative data mining, expert outsourcing data mining, and computer vision for image analysis, among others. Finally, we discuss the integration of data mining algorithms with web-based services such as VxOs and other Heliophysics data centers and the resulting capabilities that it would enable.

  4. Discovery of Patterns and evaluation of Clustering Algorithms in Social Network Data (Facebook 100 Universities) through Data Mining Techniques and Methods



    Full Text Available Data mining involves the use of advanced data analysis tools to find new, suitable patterns and to project relationships among patterns that were not previously known. In data mining, association rule learning is a popular and familiar method for ascertaining new relations between variables in large databases. One of the emerging research areas under data mining is social networks. This paper focuses on the formulation of association rules that can support decisions for future endeavours. The research applies the Apriori algorithm, one of the classical algorithms for deriving association rules. The algorithm is applied to the Facebook 100 university dataset, which originated from Adam D’Angelo of Facebook. It contains self-defined characteristics of a person, including variables such as residence, year, major, second major, gender and school. To begin with, the research uses only ten universities; it highlights the formation of association rules between the attributes or variables, explores the association rule between a course and gender, and discovers the influence of gender on studying a course. This paper also attempts to cover the main algorithms used for clustering, with a brief and simple description of each. Previous research with this dataset applied only regression models, and this is the first time association rules have been applied.
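The Apriori step named in this abstract can be sketched in a few lines of Python. This is a minimal textbook version, not the authors' implementation; the function names `apriori` and `confidence` and the toy records (attribute=value strings for gender and major) are assumptions invented for illustration and are not taken from the Facebook 100 dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy records, invented for this sketch.
records = [
    {"gender=F", "major=biology"},
    {"gender=F", "major=biology"},
    {"gender=M", "major=physics"},
    {"gender=F", "major=physics"},
    {"gender=M", "major=physics"},
]
frequent = apriori(records, min_support=2)
rule_conf = confidence(frequent, {"gender=M"}, {"major=physics"})
```

A rule such as gender=M -> major=physics is then judged by its support (how many records contain both items) and its confidence (the fraction of gender=M records that also contain major=physics).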

  5. Dropping down the Maximum Item Set: Improving the Stylometric Authorship Attribution Algorithm in the Text Mining for Authorship Investigation

    Tareef K. Mustafa


    Full Text Available Problem statement: Stylometric authorship attribution is a text mining approach that analyzes texts, e.g., novels and plays by famous authors, to measure an author's style. It selects attributes that capture the author's style of writing, on the assumption that each writer has a distinctive way of writing that no other writer shares; authorship attribution is thus the task of identifying the author of a given text. In this study, we propose an authorship attribution algorithm that improves the accuracy of stylometric features of different professionals so that they can be discriminated nearly as well as fingerprints discriminate different persons. Approach: The main goal of this study is to build an algorithm that supports a decision-making system, enabling users to predict and choose the right author for a specific anonymous novel under consideration, by using a learning procedure to teach the system the stylometric map of the author so that it behaves like an expert opinion. Stylometric Authorship Attribution (AA) usually relies on the frequent word as the best available attribute. Many studies have sought other useful attributes, but the frequent word still gives better results in research and experiments, and the best technique used to date is counting the bag-of-words with the maximum item set. Results: To improve AA techniques, we need a new pack of attributes with a new measurement tool. The first pack of attributes used in this study is the frequent pair, meaning a pair of words that always appear together. This attribute is clearly not new, but it was not a successful attribute compared with the frequent word when using the maximum item set counters. The word pair made some mistakes, as we see in the experiment results, improving the winnow algorithm by combining it with the computational

  6. Productivity of Stream Definitions

    Endrullis, Jörg; Grabmayer, Clemens; Hendriks, Dimitri; Isihara, Ariya; Klop, Jan


    We give an algorithm for deciding productivity of a large and natural class of recursive stream definitions. A stream definition is called ‘productive’ if it can be evaluated continuously in such a way that a uniquely determined stream is obtained as the limit. Whereas productivity is undecidable

  7. Productivity of stream definitions

    Endrullis, J.; Grabmayer, C.A.; Hendriks, D.; Isihara, A.; Klop, J.W.


    We give an algorithm for deciding productivity of a large and natural class of recursive stream definitions. A stream definition is called ‘productive’ if it can be evaluated continually in such a way that a uniquely determined stream in constructor normal form is obtained as the limit. Whereas prod

  8. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    Alfarizy, A. D.; Indahwati; Sartono, B.


    In 2015, Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia. Hollywood movies distributed in Indonesia target people of all ages, including children. Low awareness of the need to guide children while they watch movies means they can watch films of any rating, even those unsuitable for their age. Even after movies are translated into Bahasa and pass the censorship phase, words that are unsuitable for children remain. The purpose of this research is to cluster box office Hollywood movies based on Indonesian subtitles, revenue, IMDb user rating and genres, as a reference for adults choosing suitable movies for their children. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words and terror words), while the Partition Around Medoids (PAM) algorithm with the Gower similarity coefficient as the proximity matrix is used as the clustering method. We clustered 624 movies from 2006 until the first half of 2016 from IMDb. The clustering with the highest silhouette coefficient (0.36) is the one with 5 clusters. Animation, Adventure and Comedy movies with high revenue, as in cluster 5, are recommended for children to watch, while Comedy movies with high revenue, as in cluster 4, should be avoided.

  9. Paediatric pharmacovigilance: use of pharmacovigilance data mining algorithms for signal detection in a safety dataset of a paediatric clinical study conducted in seven African countries

    Kajungu, Dan K.; Annette Erhart; Ambrose Otau Talisuna; Quique Bassat; Corine Karema; Carolyn Nabasumba; Michael Nambozi; Halidou Tinto; Peter Kremsner; Martin Meremikwu; Umberto D'Alessandro; Niko Speybroeck


    BACKGROUND: Pharmacovigilance programmes monitor and help ensure the safe use of medicines, which is critical to the success of public health programmes. The commonest method used for discovering previously unknown safety risks is spontaneous notifications. In this study we examine the use of data mining algorithms to identify signals from adverse events reported in a phase IIIb/IV clinical trial evaluating the efficacy and safety of several Artemisinin-based combination therapies (ACTs) for...

  10. An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity

    Yu Wang


    Full Text Available Feature space heterogeneity often exists in real-world data sets, so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might change dynamically over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.

  11. Algorithm of Web Hot Data Mining Based on Structured Segmentation



    With the development of big data technology, online data monitoring and data mining have become active research topics in computer science. Segmenting and mining Web hot-spot data improves hot-topic tracking and Web data classification. Traditional approaches use unstructured data mining algorithms, which cannot accurately locate Web hot-spot data or mine it hierarchically. This paper proposes a Web hot-spot data mining algorithm based on semi-structured segmentation. Features are segmented using semi-structured data, and differential evolution based on elite gene positions makes the optimization curve flatten progressively, while comparison scripts run in parallel on multiple nodes. The code maps unstructured data to data blocks so that the data can be stored in a relational data model. With semi-structured segmentation, Web hot-spot feature mining achieves adaptive optimization, which yields the allocation factor of the Web hot-spot data and improves mining performance. Simulation results show that the algorithm achieves good efficiency and accuracy and improves the adaptive optimization ability of Web hot-spot data mining.

  12. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Catchments (Version 2.1) for the Conterminous United States: Mine Density Active Mines and Mineral Plants in the US

    U.S. Environmental Protection Agency — This dataset represents the mine density within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on mine plants and operations...

  14. An Improved Concept Lattice-Based Data Mining Algorithm

    李志坚; 莫建麟


    To address two-way data system querying between multidimensional and relational data models, data cleansing, data conversion, and the accuracy and consistency of centralized and distributed data, this paper studies concept lattices and combines global data mining with local data mining. It proposes an improved data mining algorithm for a global concept lattice based on local information, decomposes the mining process into ETL (Extraction-Transformation-Loading) actions, and, combined with the ETL workflow, achieves parallel, distributed sequential mining of massive data. Experiments show that the algorithm is of practical use in enhancing data processing capability.

  15. Frequent Item Set Mining Algorithm Based on NFP-tree

    常睿; 陈志伟


    To address the low time and space efficiency of frequent itemset mining, an efficient algorithm based on the New FP-tree is proposed. The algorithm uses the New FP-tree structure to store the frequent itemset information of the transaction database. Without recursively constructing conditional pattern trees, it generates all frequent itemsets with only two scans of the database. Experiments demonstrate the effectiveness of the algorithm.
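The abstract does not describe the New FP-tree structure itself, so the sketch below shows only the classical FP-tree construction, as a way to see what "two scans of the database" means in practice. The names `Node` and `build_fp_tree` and the toy transactions are assumptions made for illustration, not part of the paper.

```python
from collections import Counter


class Node:
    """One node of a classical FP-tree (prefix tree with counts)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Scan 2: insert each transaction's frequent items, most frequent first,
    # so that transactions sharing a prefix share a path in the tree.
    root = Node(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical toy transaction database.
db = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
root, frequent = build_fp_tree(db, min_support=2)
```

In the classical FP-growth method the tree is then mined by recursively building conditional pattern trees; the abstract states that the New FP-tree avoids that recursion, which this sketch does not attempt to reproduce.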

  16. Data stream sliding window clustering algorithm applied in IDS

    朱琳; 朱参世


    Traditional intrusion detection systems (IDS) struggle to meet the real-time processing demands of ever-growing data volumes. To address this, the paper uses sliding-window and data stream clustering techniques to design a sliding-window-based data stream clustering algorithm, and builds an IDS network security defense model on that algorithm. Simulation of the model shows that it adapts well to the intrusion detection requirements of high-speed networks.

  17. Monitoring, field experiments, and geochemical modeling of Fe(II) oxidation kinetics in a stream dominated by net-alkaline coal-mine drainage, Pennsylvania, USA

    Cravotta, Charles A.


    Watershed-scale monitoring, field aeration experiments, and geochemical equilibrium and kinetic modeling were conducted to evaluate interdependent changes in pH, dissolved CO2, O2, and Fe(II) concentrations that typically take place downstream of net-alkaline, circumneutral coal-mine drainage (CMD) outfalls and during aerobic treatment of such CMD. The kinetic modeling approach, using PHREEQC, accurately simulates observed variations in pH, Fe(II) oxidation, alkalinity consumption, and associated dissolved gas concentrations during transport downstream of the CMD outfalls (natural attenuation) and during 6-h batch aeration tests on the CMD using bubble diffusers (enhanced attenuation). The batch aeration experiments demonstrated that aeration promoted CO2 outgassing, thereby increasing pH and the rate of Fe(II) oxidation. The rate of Fe(II) oxidation was accurately estimated by the abiotic homogeneous oxidation rate law −d[Fe(II)]/dt = k1·[O2]·[H+]^−2·[Fe(II)], which indicates that an increase in pH by 1 unit at pH 5–8 and at constant dissolved O2 (DO) concentration results in a 100-fold increase in the rate of Fe(II) oxidation. Adjusting for sample temperature, a narrow range of values for the apparent homogeneous Fe(II) oxidation rate constant (k1′) of 0.5–1.7 times the reference value of k1 = 3 × 10^−12 mol/L/min (for pH 5–8 and 20 °C), reported by Stumm and Morgan (1996), was indicated by the calibrated models for the 5-km stream reach below the CMD outfalls and the aerated CMD. The rates of CO2 outgassing and O2 ingassing in the model were estimated with first-order asymptotic functions, whereby the driving force is the gradient of the dissolved gas concentration relative to equilibrium with the ambient atmosphere. Although the progressive increase in DO concentration to saturation could be accurately modeled as a kinetic function for the conditions evaluated, the simulation of DO as an instantaneous equilibrium process did not affect the
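The 100-fold figure quoted in this abstract follows directly from the inverse-square dependence on [H+]: raising pH by one unit divides [H+] by 10, so the [H+]^−2 factor grows by 10^2. A small Python check makes the arithmetic explicit. The function name `fe2_oxidation_rate` and the O2 and Fe(II) concentrations are assumptions chosen only for illustration; k1 is the reference value quoted in the abstract.

```python
import math


def fe2_oxidation_rate(k1, o2, ph, fe2):
    """Rate law from the abstract: -d[Fe(II)]/dt = k1 * [O2] * [H+]^-2 * [Fe(II)]."""
    h = 10.0 ** (-ph)  # [H+] in mol/L from pH
    return k1 * o2 * h ** -2 * fe2


K1 = 3e-12     # reference rate constant quoted in the abstract (pH 5-8, 20 C)
O2 = 2.5e-4    # mol/L, hypothetical dissolved O2 concentration
FE2 = 1.0e-4   # mol/L, hypothetical Fe(II) concentration

rate_ph6 = fe2_oxidation_rate(K1, O2, 6.0, FE2)
rate_ph7 = fe2_oxidation_rate(K1, O2, 7.0, FE2)
ratio = rate_ph7 / rate_ph6  # effect of +1 pH unit at constant O2 and Fe(II)
```

Because O2 and Fe(II) cancel in the ratio, the result does not depend on the hypothetical concentrations chosen here.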

  18. Optimization of explicit time-stepping algorithms and Stream-Function-Coordinate (SFC) concept for fluid dynamics problems

    Huang, Chung-Yuan

    A new formulation of the stream function based on a stream function coordinate (SFC) concept for inviscid flow field calculations is presented. In addition, a new method is developed not only to accelerate, but also to stabilize the iterative schemes for steady and unsteady, linear and non-linear, scalar and system of coupled, partial differential equations. With this theory, the limitation on the time step size of an explicit scheme for solving unsteady problems and the limitation on the relaxation factors of an iterative scheme for solving steady state problems could be analytically determined. Moreover, this theory allows the determination of the optimal time steps for explicit time-stepping schemes and the optimal values of the acceleration factors for iterative schemes, if the transient behavior is immaterial.

  19. Summary on Algorithms for Mining Spatial Co-location Patterns



    Spatial co-location patterns are traditionally defined as subsets of features whose instances are frequently located together in geographic space. Mining them is an important research direction in spatial data mining. First, the basic concepts of co-location patterns are reviewed. Then, popular algorithms proposed for different data domains are described, with emphasis on the ideas behind them and their main characteristics. Finally, future work on co-location pattern mining algorithms is discussed.

  20. Distributed Frequent Item Sets Mining over P2P Networks

    Zahra Farzanyar; Mohammadreza Kangavari


    Data intensive peer-to-peer (P2P) networks are becoming increasingly popular in applications like social networking, file sharing networks, etc. Data mining in such P2P environments is the new generation of advanced P2P applications. Unfortunately, most of the existing data mining algorithms do not fit well in such environments since they require data that can be accessed in its entirety. It also is not easy due to the requirements of online transactional data streams. In this paper, we have ...

  1. Data mining with SPSS modeler theory, exercises and solutions

    Wendler, Tilo


    Introducing the IBM SPSS Modeler, this book guides readers through data mining processes and presents relevant statistical methods. There is a special focus on step-by-step tutorials and well-documented examples that help demystify complex mathematical algorithms and computer programs. The variety of exercises and solutions as well as an accompanying website with data sets and SPSS Modeler streams are particularly valuable. While intended for students, the simplicity of the Modeler makes the book useful for anyone wishing to learn about basic and more advanced data mining, and put this knowledge into practice.

  2. Web Mining Based on Hybrid Simulated Annealing Genetic Algorithm and HMM

    邹腊梅; 龚向坚


    The algorithm used to train a hidden Markov model (HMM) is a local search algorithm and is sensitive to its initial parameters. Training a typical HMM from random parameters often ends in a local optimum, which makes it ineffective for mining Web information. The genetic algorithm (GA) has strong global search ability but is prone to premature convergence and converges slowly. Simulated annealing (SA) has strong local search ability but roams randomly and lacks global search ability. Combining the advantages of both, this paper proposes a hybrid simulated annealing genetic algorithm (SGA). The SGA parameters are chosen by experiment, and SGA optimizes the initial HMM parameters in combination with Baum-Welch during Web mining, compensating for the sensitivity of Baum-Welch to initial parameters. Experimental results on Web mining show that SGA significantly improves precision and recall for the extraction of five fields.

  3. Research of resource allocation algorithm based on classification data mining



    Using a classification data mining algorithm to analyze users' historical access information for grid resources, this paper derives the rules and patterns by which users access resources in a cluster environment. First, a classification-mining-based resource scheduling model is constructed. Second, a user scheduling algorithm, UA, is designed to allocate all users' tasks among the clusters. Finally, a new resource scheduling algorithm, CDMRA, assigns users' tasks to idle CPU resources on each node. Experiments show that, compared with other algorithms, CDMRA reduces the number of resource reallocations, improves the efficiency and accuracy of resource allocation, and increases the utilization of grid resources.

  4. Implementing a parametric maximum flow algorithm for optimal open pit mine design under uncertain supply and demand

    M W A Asad; R Dimitrakopoulos


    Conventional open pit mine optimization models for designing mining phases and ultimate pit limit do not consider expected variations and uncertainty in metal content available in a mineral deposit (supply) and commodity prices (market demand). Unlike the conventional approach, a stochastic framework relies on multiple realizations of the input data so as to account for uncertainty in metal content and financial parameters, reflecting potential supply and demand. This paper presents a new met...

  5. Parallel Algorithms of Mining Frequent Patterns from Sparse Data Source

    郑晓艳; 孙济洲


    For a special kind of sparse data source in frequent pattern mining, a linked-list structure named FI-list is designed, and on this basis a parallel algorithm, PMFSD (parallel mining frequent itemsets from sparse data source), is proposed for searching frequent itemsets. PMFSD is built on a distributed shared memory system called VODCA (view-oriented, distributed, cluster-based approach to parallel computing). The paper describes the design and construction of the FI-list in detail, explains the basis and results of view partitioning when mining frequent patterns from sparse data sources on VODCA, and discusses the dynamic task allocation strategy of the implementation. Experimental results demonstrate the correctness and effectiveness of PMFSD.

  6. Implementation of Vertical-Rectification and CNN Models for an Analogic Range-Estimation Algorithm from a Stream of Images

    Derrouich, Salah; Izumida, Kiichiro; Murao, Kenji; Shiiya, Kazuhisa

    The implementation of autonomous mobile robots in real-life environments still faces numerous challenges. The most crucial problem is real-time decision-making, using appropriate methods with the right hardware. Recovering the three-dimensional scene geometry and detecting moving targets simultaneously from a stream of images are important tasks with wide applicability in the creation of autonomous mobile robots, such as persistent choice of a safe route free of obstacles, targeting objects to avoid collisions, autonomous navigation and robot manipulation. In the present work, we focus on exploiting the robustness of the analogic array processing introduced by the Cellular Nonlinear Network paradigm to develop a real-time tracking method for a stream of general signals coming from space-distributed sources for monocular autonomous mobile robots. The motivation for developing the new tracking method is that, on the one hand, the matching operation has to be performed in real time, while on the other hand 32-bit floating-point accuracy is often not required; together with a vertical rectification, used as an intermediate process to minimize the relative token displacements between two frames, this can lead to a robust real-time object tracking system. The technique has been successfully applied to several indoor sequences of images. The results of the simulations are presented and discussed.

  7. A FEC-MDC Based Fast Multiuser Stream Partition Algorithm

    赵明; 胡栋; 范德一


    FEC-based multiple description coding (FEC-MDC) is an effective approach for sending scalable image and video data over channels with heavy packet loss. For the scenario in which a single source serves multiple clients over separate links, we investigate how to achieve the best stream transmission by adjusting the packet length L according to each channel's rate when the number of descriptions N is fixed, and propose an improved fast, optimized stream partition algorithm. Starting from the already computed partition of a reference channel and using minimum expected distortion as the criterion, the algorithm first searches the neighborhood of each target rate, dividing the search region into a low-rate part and a high-rate part, and then performs a coarse secondary search in the high-rate part to obtain the final stream partition. This reduces the number of searches and the computational complexity while still finding the best partition for each channel rate. Experimental results show that the improved algorithm achieves the same average PSNR as the previous method while reducing total computation time by nearly 40%.

  8. Final Report: Sampling-Based Algorithms for Estimating Structure in Big Data.

    Matulef, Kevin Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)


    The purpose of this project was to develop sampling-based algorithms to discover hidden structure in massive data sets. Inferring structure in large data sets is an increasingly common task in many critical national security applications. These data sets come from myriad sources, such as network traffic, sensor data, and data generated by large-scale simulations. They are often so large that traditional data mining techniques are time consuming or even infeasible. To address this problem, we focus on a class of algorithms that do not compute an exact answer, but instead use sampling to compute an approximate answer using fewer resources. The particular class of algorithms that we focus on are streaming algorithms, so called because they are designed to handle high-throughput streams of data. Streaming algorithms have only a small amount of working storage, much less than the size of the full data stream, so they must necessarily use sampling to approximate the correct answer. We present two results. The first is a streaming algorithm called HyperHeadTail, which estimates the degree distribution of a graph (i.e., the distribution of the number of connections for each node in a network). The degree distribution is a fundamental graph property, but prior work on estimating the degree distribution in a streaming setting was impractical for many real-world applications. We improve upon prior work by developing an algorithm that can handle streams with repeated edges and graph structures that evolve over time. The second is an algorithm for maintaining a weighted subsample of items in a stream, when the items must be sampled according to their weight and the weights are dynamically changing. To our knowledge, this is the first such algorithm designed for dynamically evolving weights. We expect it may be useful as a building block for other streaming algorithms on dynamic data sets.
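
    To make the idea of a weighted subsample of a stream concrete, the following is a minimal sketch of a standard weighted reservoir technique (the key-based scheme usually attributed to Efraimidis and Spirakis). It assumes each item's weight is fixed when the item arrives, so it is not the dynamic-weight algorithm the report describes; the function name and the (item, weight) input format are illustrative assumptions.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, rng=None):
    """Return up to k items sampled without replacement, favoring heavier items.

    `stream` yields (item, weight) pairs. Weights are assumed static here,
    unlike the dynamically changing weights discussed in the report.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival_index, item); smallest key is evicted first
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight are never sampled
        # Each item draws a random key u ** (1 / weight); larger weights push keys toward 1.
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

    Only k entries are kept in memory regardless of stream length, which is the property that makes this kind of sampler usable as a building block in streaming settings.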

  9. Algorithms of mining data records from website automatically

    邱勇; 兰永杰


    To improve the accuracy and completeness of mining data records from the web, the concepts of isomorphic page and directory page are introduced and three algorithms are proposed. Isomorphic pages are a set of web pages that share a uniform structure and differ only in their main information. A web page containing many links that point to isomorphic pages is called a directory page. Algorithm 1 finds directory pages in a website by analyzing the similarity of adjacent links. It first sorts the links and then counts the links in each directory. If the count is greater than a given threshold, it compares the linked sub-pages in that directory for similarity and returns the result. A function for judging whether web pages are isomorphic is also proposed. Algorithm 2 mines data records from isomorphic pages using a noise information filter. It is based on the fact that the noise information is the same in two isomorphic pages and only the main information differs. Algorithm 3 mines data records from an entire website using spider (crawler) technology. Experiments show that the proposed algorithms mine data records more completely than existing algorithms, and that mining data records from isomorphic pages is an efficient method.
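
    The first steps of Algorithm 1 (sort the links, count them per directory, keep directories whose count exceeds a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the use of the URL path's parent directory as the grouping key, and the return format are all hypothetical, and the later sub-page similarity comparison is not shown.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def candidate_directories(links, threshold):
    """Group links by (host, parent directory) and keep groups larger than threshold."""
    groups = defaultdict(list)
    for link in sorted(links):  # sorting places links from the same directory next to each other
        parsed = urlparse(link)
        directory = posixpath.dirname(parsed.path) or "/"
        groups[(parsed.netloc, directory)].append(link)
    # Only directories with more than `threshold` links go on to the similarity check.
    return {key: group for key, group in groups.items() if len(group) > threshold}
```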

  10. The study of Kruskal's and Prim's algorithms on the Multiple Instruction and Single Data stream computer system

    A. Yu. Popov


    Full Text Available Bauman Moscow State Technical University is implementing a project to develop the operating principles of a computer system with a radically new architecture. A working model of the system allowed us to evaluate the efficiency of the developed hardware and software. The experimental results presented in previous studies, together with an analysis of the operating principles of the new computer system, permit conclusions to be drawn regarding its efficiency in solving discrete optimization problems related to the processing of sets. The new architecture is based on direct hardware support for operations of discrete mathematics, which is reflected in the use of special facilities for processing sets and data structures. Within the framework of the project a special device, a structure processor (SP), was designed, which improved performance without limiting the scope of applications of such a computer system. Previous works presented the basic principles of organizing the computational process in a MISD (Multiple Instructions, Single Data) system, showed the structure and features of the structure processor, and described the general principles for solving discrete optimization problems on graphs. This paper examines two algorithms for finding the minimum spanning tree, namely Kruskal's and Prim's algorithms. It studies implementations of the algorithms for two SP operation modes: coprocessor mode and MISD mode. The paper presents the results of an experimental comparison of MISD system performance in coprocessor mode with mainframes.
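
    For readers unfamiliar with the algorithms being benchmarked, here is a minimal software sketch of Kruskal's algorithm using a union-find structure. It is a conventional single-processor version given only for orientation; it says nothing about how the structure processor or the MISD system implements the set operations, and the function name and edge format are assumptions.

```python
def kruskal_mst(num_vertices, edges):
    """Return the edges of a minimum spanning forest.

    `edges` is an iterable of (weight, u, v) tuples with vertices numbered
    from 0 to num_vertices - 1.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the set representative, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # consider edges in order of increasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different components, so it creates no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree
```

    The set membership tests and unions inside the loop are exactly the kind of set-processing operations that the abstract says the new architecture supports in hardware.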

  11. Data mining and well logging interpretation: application to a conglomerate reservoir

    Shi, Ning; Li, Hong-Qi; Luo, Wei-Ping


    Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in the interpretation of well logging data of complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification task target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from well curves. We then used the branch-and-bound algorithm to look for the optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision-tree algorithm to set up disaggregated models of the well logging curves. The modeling and actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.

  12. A Comparison of Machine Learning Algorithms for Mapping of Complex Surface-Mined and Agricultural Landscapes Using ZiYuan-3 Stereo Satellite Imagery

    Xianju Li


    Full Text Available Land cover mapping (LCM) in complex surface-mined and agricultural landscapes could contribute greatly to regulating mine exploitation and protecting mine geo-environments. However, there are some special and spectrally similar land covers in these landscapes which increase the difficulty of LCM when employing high spatial resolution images. There is currently no research on these mixed complex landscapes. The present study focused on LCM in such a mixed complex landscape located in Wuhan City, China. A procedure combining ZiYuan-3 (ZY-3) stereo satellite imagery, the feature selection (FS) method, and machine learning algorithms (MLAs) (random forest, RF; support vector machine, SVM; artificial neural network, ANN) was proposed and first examined for both LCM of surface-mined and agricultural landscapes (MSMAL) and classification of surface-mined land (CSML). The mean and standard deviation filters of spectral bands and topographic features derived from ZY-3 stereo images were newly introduced. Comparisons of three MLAs, including their sensitivities to FS and whether FS resulted in significant influences, were conducted for the first time in the present study. The following conclusions are drawn. Textures were of little use, and the novel features contributed to improved classification accuracy. Regarding the influence of FS: FS substantially reduced the feature set (by 68% for MSMAL and 87% for CSML), and often improved classification accuracies (with an average value of 4.48% for MSMAL using three MLAs, and 11.39% for CSML using RF and SVM); FS showed statistically significant improvements except for ANN-based MSMAL; SVM was most sensitive to FS, followed by ANN and RF. Regarding comparisons of MLAs: for MSMAL based on the feature subset, RF achieved the greatest overall accuracy of 77.57%, followed by SVM and ANN; for CSML, SVM had the highest accuracy (87.34%), followed by RF and ANN; based on the feature subsets, significant differences were

  13. Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis.

    Gardiner, Eleanor J; Gillet, Valerie J


    Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one or other method. For example, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine both structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development

  14. Algorithm of Automatic Recommender System Based on Data Mining



    A typical online recommendation system is described. By combining Adaptive Resonance Theory (ART) neural networks with data mining technology, it can automatically cluster population characteristics and mine the associated characteristics. For online recommendation systems deployed on a network, the paper discusses how to use data mining techniques effectively to mine complete knowledge from large databases, so that appropriate information can be recommended to users, helping them find the documents or information they really need in a vast flow of information. An approach that integrates the ART neural network with data mining technology is put forward. In view of the characteristics of recommendation systems, a modified ART algorithm (the MART algorithm) is proposed. The results show that the proposed algorithm is effective.

  15. Nonnegative Matrix Factorization-Based Spatial-Temporal Clustering for Multiple Sensor Data Streams

    Di-Hua Sun


    Full Text Available Cyber physical systems have grown exponentially and have been attracting a lot of attention over the last few years. Retrieving and mining useful information from massive amounts of sensor data streams with spatial, temporal, and other multidimensional information has become an active research area. Moreover, recent research has shown that clusters of streams change with a comprehensive spatial-temporal viewpoint in real applications. In this paper, we propose a spatial-temporal clustering algorithm (STClu) based on nonnegative matrix trifactorization that utilizes time-series observational data streams and geospatial relationships for clustering multiple sensor data streams. Instead of directly clustering multiple data streams periodically, STClu incorporates the spatial relationship between two sensors in proximity and takes historical information into consideration. Furthermore, we develop an iterative updating optimization algorithm for STClu. The effectiveness and efficiency of STClu are both demonstrated in experiments on real and synthetic data sets. The results show that the proposed STClu algorithm outperforms existing methods for clustering sensor data streams.

  16. Situation-Aware Adaptive Processing (SAAP) of Data Streams

    Haghighi, Pari Delir; Gaber, Mohamed Medhat; Krishnaswamy, Shonali; Zaslavsky, Arkady

    The growth and proliferation of technologies in the field of sensor networking and mobile computing have led to the emergence of diverse applications that process and analyze sensory data on mobile devices such as a smart phone. However, the real power to make a significant impact on the area of developing these applications rests not merely on deploying the technologies, but on the ability to perform real-time, intelligent analysis of the data streams that are generated by the various sensors. In this chapter, we present a novel approach for Situation-Aware Adaptive Processing (SAAP) of data streams for pervasive computing environments. This approach uses fuzzy logic principles for modelling and reasoning about uncertain situations, and performs gradual adaptation of parameters of data stream mining algorithms in real-time according to availability of resources and the occurring situations.

  17. Scientific Data Mining in Astronomy

    Borne, Kirk


    We describe the application of data mining algorithms to research problems in astronomy. We posit that data mining has always been fundamental to astronomical research, since data mining is the basis of evidence-based discovery, including classification, clustering, and novelty discovery. These algorithms represent a major set of computational tools for discovery in large databases, which will be increasingly essential in the era of data-intensive astronomy. Historical examples of data mining...

  18. The influence of Zihe Stream on the groundwater resources of the Dawu well field and on the discharge at the Heiwang iron mine, Zibo City area, Shandong Province, China

    Zhu, Xue-Yu; Liu, Jian-Li; Qian, Xiao-Xing

    The Dawu well field, one of the largest in China, supplies most of the water for the Zibo City urban area in Shandong Province. The field yields 522,400-535,400 m³/d from an aquifer in fractured karstic Middle Ordovician carbonate rocks. Much of the recharge to the aquifer is leakage of surface water from Zihe Stream, the major drainage in the area. Installation of the Taihe Reservoir in 1972 severely reduced the downstream flow in Zihe Stream, resulting in a marked reduction in the water table in the Dawu field. Since 1994, following the installation of a recharge station on Zihe Stream upstream from the well field that injects water from the Taihe Reservoir into the stream, the groundwater resources of the field have recovered. An average of 61.2×10³ m³/d of groundwater, mostly from the Ordovician aquifer, is pumped from the Heiwang iron mine, an open pit in the bed of Zihe Stream below the Taihe Reservoir. A stepwise regression equation, used to evaluate the role of discharge from the reservoir into the stream, confirms that reservoir water is one of the major sources of groundwater in the mine.

  19. A Memory-Based Learning Approach as Compared to Other Data Mining Algorithms for the Prediction of Soil Texture Using Diffuse Reflectance Spectra

    Asa Gholizadeh


    Full Text Available Successful determination of soil texture using reflectance spectroscopy across the Visible and Near-Infrared (VNIR, 400–1200 nm) and Short-Wave-Infrared (SWIR, 1200–2500 nm) ranges depends largely on the selection of a suitable data mining algorithm. The objective of this research was to explore whether the new Memory-Based Learning (MBL) method performs better than the other methods, namely Partial Least Squares Regression (PLSR), Support Vector Machine Regression (SVMR), and Boosted Regression Trees (BRT). For this purpose, we chose soil texture (contents of clay, silt and sand) as testing attributes. A selected set of soil samples, classified as Technosols, were collected from brown coal mining dumpsites in the Czech Republic (a total of 264 samples). Spectral readings were taken in the laboratory with a fiber optic ASD FieldSpec III Pro FR spectroradiometer. Leave-one-out cross-validation was used to optimize and validate the models. Comparisons were made in terms of the coefficient of determination (R2cv) and the Root Mean Square Error of Prediction of Cross-Validation (RMSEPcv). Predictions of the three soil properties by MBL outperformed the accuracy of the remaining algorithms. We found that MBL performs better than the other three methods by about 10% (largest R2cv and smallest RMSEPcv), followed by SVMR. It should be pointed out that the other methods (PLSR and BRT) still provided reliable results. The study concluded that in this examined dataset, reflectance spectroscopy combined with the MBL algorithm is rapid and accurate, offers major efficiency and cost-saving possibilities in other datasets, and can lead to better targeting of management interventions.
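
    The abstract does not state how the two comparison metrics were computed. Under the commonly used definitions, and assuming n samples with measured values y_i, mean measured value ȳ, and ŷ_(-i) denoting the prediction for sample i from a model fitted without that sample, they could be written as follows. Some spectroscopy studies instead report R2cv as the squared correlation between measured and predicted values, so the second formula in particular is an assumption.

```latex
\[
\mathrm{RMSEP}_{cv} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(-i)}\bigr)^2},
\qquad
R^2_{cv} = 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(-i)}\bigr)^2}{\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2}
\]
```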

  20. Continuous Outlier Monitoring on Uncertain Data Streams

    曹科研; 王国仁; 韩东红; 丁国辉; 王爱侠; 石凌旭


    Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve the efficiency. Furthermore, we propose a pruning approach - Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD) to reduce the detection cost. It is an estimated outlier probability method which can effectively reduce the amount of calculations. The cost of PCUOD incremental algorithm can satisfy the demand of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams which can handle parameter variable queries simultaneously. Our methods are verified using both real data and synthetic data. The results show that they are able to reduce the required storage and running time.

  1. Mining text data

    Aggarwal, Charu C


    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. "Mining Text Data" introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book covers a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including

  2. Anxiety and depression factors mining based on improved BUS algorithm

    刘峰斌; 袁志勇; 肖玲; 王惠玲; 王高华


    To support early prevention and diagnosis for patients with anxiety and depression, this paper applies association rule mining and summarization methods to medical records to discover sets of risk factors strongly associated with anxiety and depression. Using a frequent itemset mining algorithm on its own produces too many frequent itemsets and association rules, which greatly reduces its practicality. The medical records are first preprocessed. The FP-growth algorithm is then used to find frequent itemsets in the preprocessed data. Finally, the latest improved Bottom-Up Summarization (BUS) algorithm is used to summarize the discovered frequent itemsets. The resulting association rules are compared with the unsummarized association rules and with the rules obtained by the original BUS algorithm and the Top-K algorithm. Experimental results show that the rules obtained by the improved BUS algorithm are moderate in number and contain less redundant information, and that the people covered by these rules are at higher risk of anxiety or depression.

  3. A new algorithm to create a profile for users of web site benefiting from web usage mining

    masomeh khabazfazli


    Full Text Available With the spread of the internet and its various applications and the growth in the number of web pages, finding information through search engines becomes difficult. Web page recommendation systems are used to address this problem. In this paper, the recommender engine is improved using web usage mining methods. In the recommendation system, clustering was used to group users' behavior. We applied usage mining to the data related to each user in order to build that user's movement pattern. Web pages were then recommended using a neural network and a Markov model. The performance of the recommendation engine was thus improved using users' movement patterns, clustering, a neural network, and a Markov model, and it obtained better results than other methods. Two measures, accuracy and coverage, were used to evaluate the quality of the recommendations.

  4. Classification Rule Mining Based on Improved Ant-miner Algorithm

    肖菁; 梁燕辉


    To improve the prediction accuracy of classification rules produced by the classical Ant-miner algorithm, this paper proposes an improved Ant-miner algorithm for classification rule mining. A heuristic function is constructed from the density and proportion of examples in the whole sample, to avoid the bias that arises when several candidate conditions have the same probability. Pruned rules undergo single-point mutation according to a mutation coefficient, which expands the search space of rules and improves their prediction accuracy. An evaporation coefficient is added to Ant-miner's pheromone update formula, bringing it closer to the foraging behavior of real ants and preventing premature convergence. Experimental results on UCI benchmark datasets show that the proposed algorithm achieves higher prediction accuracy than the original Ant-miner algorithm.

  5. Distributed Data Mining Algorithm based on Rough Set Theory and BP Neural Network



    In the research and application of Wireless Sensor Networks (WSN), using data mining to improve energy efficiency is an important direction. A distributed data mining algorithm based on rough set theory and a BP neural network was designed and applied to wireless sensor networks. The raw data in each node are discretized and their attributes reduced using rough sets, yielding a minimal decision table. The reduced decision table is then used to train a BP neural network for classifying the data. The resulting BP neural network can be integrated into each sensor network node. We simulated the distributed data mining algorithm. The simulation results indicate that the algorithm can reduce data dimensionality, eliminate redundant data, decrease communication traffic, and lengthen the working lifetime of the WSN.

  6. Performance Analysis of Anti-Phishing Tools and Study of Classification Data Mining Algorithms for a Novel Anti-Phishing System

    Rajendra Gupta


    Full Text Available Phishing is a kind of website spoofing used to steal sensitive and important information from web users, such as online banking passwords, credit card information, and user passwords. In a phishing attack, the attacker sends the user warning messages about security issues, asks for confidential information through phishing emails, asks the user to update account information, and so on. Several experimental designs have been proposed earlier to counter phishing attacks. The earlier systems do not achieve success rates above 90 percent, and in some cases a tool achieves only 50-60 percent. In this paper, a novel algorithm is developed to check the performance of the anti-phishing system, and the resulting data set is compared with the data sets of existing anti-phishing tools. The performance of the novel anti-phishing system is evaluated with four different classification data mining algorithms, namely Class Imbalance Problem (CIP), Rule-based Classifier (Sequential Covering Algorithm, SCA), Nearest Neighbour Classification (NNC), and Bayesian Classifier (BC), on a data set of phishing and legitimate websites. The proposed system shows a lower error rate and better performance than the other existing tools.

  7. On the Mining Algorithm Based on BDIF Association Rule



    This article reviews research on association rule mining and the classification of association rules, and analyzes and evaluates the classic Apriori algorithm. On this basis it proposes BDIF (Based Transactional Databases Including Frequent Item Set), an algorithm that generates frequent itemsets efficiently. By dividing the data into blocks and quickly searching for frequent itemsets, it reduces the number of scans over the data blocks and improves efficiency. The algorithm was debugged and verified in the Borland C++ Builder 6.0 development environment.
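
    As background for the baseline that the article evaluates, the following is a minimal sketch of the classic Apriori level-wise search. It is not the BDIF algorithm, whose block-partitioning details the abstract does not give; the function name, the use of an absolute support count, and the sample data in the test are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # One pass over the data per candidate; these repeated scans are the cost Apriori is known for.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent
```

    Each level rescans the whole transaction list, which is the overhead that approaches reducing the number of data scans aim to cut.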

  8. Efficient constraint-based Sequential Pattern Mining (SPM) algorithm to understand customers’ buying behaviour from time stamp-based sequence dataset

    Niti Ashish Kumar Desai


    Full Text Available Business strategies are formulated based on an understanding of customer needs. This requires a strategy for understanding customer behaviour and buying patterns, both current and future. It involves, first, understanding how an organization currently perceives customer needs and, second, predicting future trends to drive growth. This article focuses on customers' purchase trends, where the timing of a purchase is more important than the association between purchased items, and which can be discovered with Sequential Pattern Mining (SPM) methods. Conventional SPM algorithms rely purely on frequency, identifying the more frequent patterns but suffering from problems such as the generation of a huge number of uninteresting patterns, the absence of patterns the user is interested in, and the rare item problem. The article attempts a solution by developing an SPM algorithm based on several constraints, namely Gap, Compactness, Item, Recency, Profitability, and Length, in addition to the Frequency constraint. The six additional constraints are incorporated to ensure that all patterns are recently active (Recency), active within a certain time span (Compactness), profitable, and indicative of the next timeline for purchase (Length, Item, Gap). The article also attempts to show how the proposed constraint-based PrefixSpan algorithm helps in understanding customer buying behaviour that is still in a formative stage.

  9. Personal continuous route pattern mining

    Qian YE; Ling CHEN; Gen-cai CHEN


    In daily life, people often repeat regular routes in certain periods. In this paper, a mining system is developed to find the continuous route patterns of personal past trips. To account for the diversity of personal moving status, the mining system employs adaptive GPS data recording and five data filters to guarantee clean trip data. The mining system uses a client/server architecture to protect personal privacy and to reduce the computational load. The server conducts the main mining procedure but with insufficient information to recover real personal routes. To improve the scalability of sequential pattern mining, a novel pattern mining algorithm, continuous route pattern mining (CRPM), is proposed. This algorithm can tolerate the different disturbances in real routes and extract the frequent patterns. Experimental results based on nine persons' trips show that CRPM can extract route patterns more than twice as long as those found by traditional route pattern mining algorithms.

  10. The Application of Data Stream Classification Algorithm in Water Quality Environment

    曹红; 郑鑫


    In many real-world applications, the characteristics of data streams make it difficult to obtain class labels for all of the data. To address the classification of data streams with incomplete class labels, this paper first analyzes how the labeled data set affects the classification error of semi-supervised classification algorithms based on the cluster assumption. Then, using this error analysis together with the characteristics of data streams, it proposes a semi-supervised data stream ensemble classifier algorithm under the cluster assumption (SSDSEC) and discusses how to set the weights of the individual classifiers. Finally, simulation experiments verify the effectiveness of the proposed algorithm.

  11. Use of diatom assemblages as biomonitor of the impact of treated uranium mining effluent discharge on a stream: case study of the Ritord watershed (Center-West France).

    Herlory, Olivier; Bonzom, Jean-Marc; Gilbin, Rodolphe; Frelon, Sandrine; Fayolle, Stéphanie; Delmas, François; Coste, Michel


    The rehabilitation of former French uranium mining sites has not prevented the contamination of the surrounding aquatic ecosystems with metal elements. This study assesses the impact of the discharge of treated uranium mining effluents on periphytic diatom communities to evaluate their potential for bioindication. A 7-month survey was conducted on the Ritord watercourse to measure the environmental conditions of microalgae and the non-taxonomic attributes of periphyton (photosynthesis and biomass), and to determine the specific composition of diatom assemblages grown on artificial substrates. The environmental conditions were altered by the mine waters, which contaminate the watercourse with uranium and with chemicals used in the pit-water treatment plants (BaCl2 and Al2(SO4)3). The biomass and photosynthetic activity of periphyton did not appear to respond to the stress induced by the treated mining effluents, whereas the altered environmental conditions clearly impacted the composition of diatom communities. Downstream of the discharges, the communities tended to be characterized by indicator species belonging to the genera Fragilaria, Eunotia and Brachysira and were highly similar to assemblages at acid mine drainage sites. The species Eunotia pectinalis var. undulata, Psammothidium rechtensis, Gomphonema lagenula and Pinnularia major were found to be sensitive to uranium effluents, whereas Neidium alpinum and several species of Gomphonema tolerated this contamination. The relevance of diatoms as ecological indicators was illustrated through the changes in community structure induced by the discharge of uranium mining effluents, which creates prospects for the development of a bioindicator tool for this kind of impairment of water quality.

  12. Mine Video Surveillance Image Algorithm Based on DWT and Improved Median Filtering Algorithm

    吕振雷; 吴丰


    Uneven underground illumination, high coal dust concentration, and unstable circuit voltage in video acquisition equipment introduce a large amount of noise into the images captured by mine video surveillance systems, which hinders the accurate interpretation of mine production information. To address this, the discrete wavelet transform (DWT) is combined with an improved median filtering algorithm to give an efficient denoising algorithm for mine video surveillance images. First, adaptive noise detection is applied to the acquired mine video image and, according to the detection results, the image is processed with the improved median filtering algorithm. Next, a 3-level discrete wavelet transform is applied to the filtered image; because most of the noise in the image is concentrated in the high-frequency decomposition coefficients, the low-frequency coefficients are left unprocessed. Finally, the high-frequency coefficients are denoised with an improved soft-threshold denoising function, and the denoised high-frequency coefficients are reconstructed together with the original low-frequency coefficients to obtain a clearer denoised image. Experiments were carried out on underground video images acquired on site at a coal mine in Lu'an, Shanxi, and the denoising results were compared with those of wavelet soft-threshold denoising and median filtering; the results of each algorithm were evaluated by signal-to-noise ratio (SNR) and running time. The results show that the new algorithm denoises mine video surveillance images better than the other two algorithms and also has some advantage in running time.
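
    The pipeline described above (median filtering, wavelet decomposition, soft thresholding of the high-frequency coefficients, reconstruction) can be sketched in a simplified form. The sketch below is an illustration under stated assumptions, not the authors' method: it works on a 1-D signal of even length, uses a plain median filter and a single-level Haar transform instead of the paper's adaptive noise detection and 3-level DWT, and applies the standard soft-threshold function rather than the improved one. All function names and the threshold value are hypothetical.

```python
import math
from statistics import median


def median_filter(signal, k=3):
    """Replace each sample by the median of a k-wide window (truncated at the edges)."""
    half = k // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def haar_forward(signal):
    """One-level Haar DWT of an even-length signal: (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, t):
    """Standard soft thresholding: shrink magnitudes by t, zero anything below t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t=0.5):
    """Median-filter, threshold only the detail (high-frequency) part, reconstruct."""
    filtered = median_filter(signal)
    approx, detail = haar_forward(filtered)  # approximation is left untouched
    return haar_inverse(approx, soft_threshold(detail, t))
```

    For example, an impulse in an otherwise constant signal such as `[1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 1.0, 1.0]` is removed by the median stage, and the reconstruction returns a signal that is constant up to floating-point rounding.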

  13. Frequent pattern mining algorithm based on improved FP-tree

    李也白; 唐辉; 张淳; 贺玉明


    FP-growth is an efficient frequent pattern mining algorithm based on the FP-tree data structure, and it does not generate candidate sets. Constructing the frequent pattern tree (FP-tree) requires scanning the database twice, and transactions that contain only infrequent items are still scanned during the second pass. To address this problem, after analyzing the properties of the FP-tree in depth, this paper improves the FP-tree construction process and employs an auxiliary storage structure based on a hash table, which saves item lookup time and improves mining efficiency.
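
    As a rough illustration of two-pass FP-tree construction with hash-table lookups, the sketch below counts item supports in a first pass and inserts only frequent items in a second pass, using a Python dict as the hash table for child lookup. It is a minimal sketch of the textbook construction, not the paper's improved algorithm: it still iterates over every transaction in the second pass and merely skips insertion work for transactions with no frequent item. The names `FPNode` and `build_fp_tree` are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree; children are kept in a hash table (dict)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode, O(1) average lookup


def build_fp_tree(transactions, min_support):
    # Pass 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Pass 2: insert the frequent items of each transaction, most frequent first.
    for t in transactions:
        items = [i for i in set(t) if i in frequent]
        if not items:
            continue  # nothing to insert for transactions with only infrequent items
        items.sort(key=lambda i: (-support[i], i))  # ties broken by item name
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, support
```

    With the transactions `[["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]]` and a minimum support of 2, the tree has a single branch under the root starting at `b`, and the transaction containing only `d` contributes no nodes.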

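  14. Dynamic Fast Database Mining Algorithm Based on Association Rules



    The Dynamic Fast Mining Algorithm (DFMA) for association rules does not need to rescan the original database repeatedly. It overcomes several weaknesses of the Apriori algorithm, the most representative association rule mining method, such as long running time and the inability to mine online, and it supports both online mining and incremental mining. By combining DFMA's multi-level synchronous processing and updating with the definition of a sensitivity index, it can be used to mine real-time information that is useful to decision makers.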
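
    The abstract does not give the details of DFMA, so the following is only a generic sketch of the underlying idea of mining without rescanning: itemset counts at several levels (itemset sizes) are updated together as each transaction arrives, and frequent itemsets can be read off at any time. The class name, the `max_size` limit, and the relative-support threshold are all assumptions made for illustration; the sensitivity index mentioned above is not modelled.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Keeps running counts of all itemsets up to max_size; never rescans old data."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def add(self, transaction):
        """Update the counts of every level (itemset size) for one new transaction."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        for k in range(1, self.max_size + 1):
            for combo in combinations(items, k):
                self.counts[combo] += 1

    def frequent(self, min_support):
        """Itemsets whose relative support is at least min_support (0..1)."""
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


counter = IncrementalItemsetCounter()
for t in (["milk", "bread"], ["milk"], ["bread", "milk", "eggs"]):
    counter.add(t)
result = counter.frequent(0.6)
```

    Enumerating all combinations grows quickly with transaction length, so a practical system would bound `max_size` or prune infrequent itemsets; this sketch does neither beyond the fixed size limit.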

  15. Application Research on the classification of Data Mining Using Ant Colony Algorithm

    熊斌; 熊娟


    Classification is an important task in data mining. This paper studies the application of the ant colony algorithm to classification in data mining. In essence, the algorithm uses the foraging principle of ant colonies to search the database, selecting and optimizing a randomly generated set of rules until the rules cover the database, thereby mining the rules implicit in the database and establishing an optimal classification model.

  16. Interestingness Rule Mining Algorithm Based on Information Entropy

    金洲; 王儒敬


    With the development of data collection and storage techniques, traditional association rule mining generates excessive and disordered rules that do not match users' interests. To solve this problem, an interestingness measure for association rules based on information entropy is proposed for mining interesting association rules. Correlation analysis for categorical variables is adopted to eliminate false and erroneous rules from the original rule set, and a framework for evaluating the interestingness of rules based on information entropy is proposed. Since the method does not depend on users' prior knowledge, it can represent the information hidden in the data accurately. Experiments on both real and synthetic datasets show that the proposed algorithm performs better than traditional algorithms and discovers interesting rules in large databases efficiently.

  17. Advance Mining of Temporal High Utility Itemset

    Swati Soni


    Full Text Available The stock market domain is a dynamic and unpredictable environment. Traditional techniques, such as fundamental and technical analysis, can provide investors with some tools for managing their stocks and predicting their prices. However, these techniques cannot discover all the possible relations between stocks, and thus there is a need for a different approach that provides a deeper kind of analysis. Data mining can be used extensively in the financial markets and can help in stock-price forecasting. Therefore, we propose in this paper a portfolio management solution with business intelligence characteristics. Temporal high utility itemsets are the itemsets with support larger than a pre-specified threshold in the current time window of a data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns, such as association rules, from data streams. We propose a novel algorithm for temporal association mining with a utility approach. It finds the temporal high utility itemsets while generating fewer candidate itemsets.

  18. Effects of a small-scale, abandoned gold mine on the geochemistry of fine stream-bed and floodplain sediments in the Horsefly River watershed, British Columbia, Canada

    Clark, Deirdre E.; Vogels, Marjolein; van der Perk, Marcel; Owens, Philip N.; Petticrew, Ellen L.


    Mining is known to be a major source of metal contamination for fluvial systems worldwide. Monitoring and understanding the effects on downstream water and sediment quality is essential for its management and to mitigate against detrimental environmental impacts. This study aimed to examine the effe

  19. Classification and Segregation of Abnormal Lymphocytes through Image Mining for Diagnosing Rheumatoid Arthritis Using Min-max Algorithm

    S.P. Chokkalingam


    Full Text Available Advances in the acquisition of complex medical images and their storage for further analysis through image mining have significantly helped to identify the root causes of various diseases. Mining medical image data sets, such as scanned images or blood cell images, requires extracting implicit knowledge from the data set through hierarchical image processing techniques and identifying relationships and patterns that are not explicitly stored in a single image. Rheumatoid Arthritis (RA) is an autoimmune disease that causes chronic inflammation of the joints. The causes of RA are unknown, which makes detection at an early stage necessary. Diagnosis of RA based on blood cell types and shapes requires computational analysis. An assistive technology for the doctor to detect and investigate rheumatoid arthritis is therefore required. The objective of the proposed work is to analyze the shapes of lymphocytes, a key component of blood cells that causes RA complications, and to automate the process of identifying abnormal lymphocytes by estimating the centroids of lymphocytes using the AIT centroid technique and thereby finding a differential count. The process involves cropping the nucleus from the blood cell image, segmenting it, and investigating further whether the shapes of the lymphocytes are irregular and dissimilar. Features are extracted from each cell component for comparison, and the abnormal lymphocytes are segregated from the normal ones. To enhance the segregation process, a neural network based perceptron classifier tool is used.

  20. Multidimensional data mining using a K-mean algorithm based on the forest management inventory of Fujian Province, China

    Yanrong Guo


    Full Text Available To determine relationships between stand volume and site factors in the absence of information about stand age and density, a classification pattern was established using a clustering analysis algorithm and applied to China fir in Fujian Province. The results showed that slope position, elevation, and humus depth were important factors affecting the stand volumes of young/immature forests, near-mature forests, and mature/overmature forests, respectively. The K-mean algorithm could be used to evaluate the influences of site factors on stand volume under different stand age groups and density conditions.

  1. A Puzzle-Based Genetic Algorithm with Block Mining and Recombination Heuristic for the Traveling Salesman Problem

    Pei-Chann Chang; Wei-Hsiu Huang; Zhen-Zhen Zhang


    In this research, we introduce a new heuristic approach using the concept of ant colony optimization (ACO) to extract patterns from the chromosomes generated by previous generations for solving the generalized traveling salesman problem. The proposed heuristic is composed of two phases. In the first phase, the ACO technique is adopted to establish an archive consisting of a set of non-overlapping blocks and of a set of remaining cities (nodes) to be visited. The second phase is a block recombination phase where the set of blocks and the rest of the cities are combined to form an artificial chromosome. The generated artificial chromosomes (ACs) will then be injected into a standard genetic algorithm (SGA) to speed up the convergence. The proposed method is called "Puzzle-Based Genetic Algorithm" or "p-ACGA". We demonstrate that p-ACGA performs very well on all TSPLIB problems, which have been solved to optimality by other researchers. The proposed approach can prevent the early convergence of the genetic algorithm (GA) and lead the algorithm to explore and exploit the search space by taking advantage of the artificial chromosomes.

  2. User Identification Algorithm in Web Log Mining

    肖慧; 王立华


    The paper reviews existing user identification algorithms and proposes the IASR (IP, Agent, Session and Referrer) user identification algorithm to address current problems in user identification. The proposed algorithm uses URL rewriting to track users and introduces sessions to identify them. It efficiently and accurately distinguishes different users accessing the site through the same proxy server, and it satisfactorily solves the "multi-user problem" that arises when the same user accesses the site by typing URLs directly into the browser's address bar. Finally, the paper discusses future trends in user identification algorithms.
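
    To make the role of the session identifier concrete, here is a minimal sketch of session-first user identification over parsed log entries. It is not the IASR algorithm itself, whose rules the abstract does not spell out: it assumes each entry is a dict with hypothetical keys `ip`, `agent`, and an optional `session` (as would be recovered from a rewritten URL), falls back to the (IP, agent) pair when no session is present, and ignores the referrer entirely.

```python
def identify_users(entries):
    """Assign an integer user label to each log entry.

    Entries that carry a session id are grouped by that id, so two users behind
    the same proxy (same IP and agent) are still told apart. Entries without a
    session id fall back to the (ip, agent) pair.
    """
    users = {}   # identification key -> user label
    labels = []
    for e in entries:
        if e.get("session"):
            key = ("session", e["session"])
        else:
            key = ("ip_agent", e["ip"], e["agent"])
        labels.append(users.setdefault(key, len(users)))
    return labels


log = [
    {"ip": "10.0.0.1", "agent": "A", "session": "s1"},
    {"ip": "10.0.0.1", "agent": "A", "session": "s2"},  # same proxy, other user
    {"ip": "10.0.0.1", "agent": "A", "session": "s1"},
    {"ip": "10.0.0.2", "agent": "B", "session": None},
    {"ip": "10.0.0.2", "agent": "B"},
]
labels = identify_users(log)
```

    In this example the first and third entries share a label, the second entry gets its own label despite the identical IP and agent, and the two entries without a session are grouped by IP and agent.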

  3. Streams with Strahler Stream Order

    Minnesota Department of Natural Resources — Stream segments with Strahler stream order values assigned. As of 01/08/08 the linework is from the DNR24K stream coverages and will not match the updated...

  4. Data Mining Approaches for Intrusion Detection


    In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques...two general data mining algorithms that we have implemented: the association rules algorithm and the frequent episodes algorithm. These algorithms can

  5. Spatial Co-location Patterns Mining Algorithm over Massive Spatial Data Sets

    姚华传; 王丽珍; 陈红梅; 邹目权


    Spatial co-location pattern mining is an important task in spatial data mining, but existing algorithms, whether for certain or uncertain data, have low time and space efficiency, let alone for massive data. Therefore, based on an in-depth analysis of why traditional mining approaches consume excessive time and space, this paper proposes a grid differential algorithm for mining spatial co-location patterns. The new algorithm subdivides the traditional grid into differential cells, calculates for each differential cell the centroid of the instances that belong to the same feature, and then mines co-location patterns from these centroids with a multiresolution pruning method. While maintaining high accuracy, the proposed algorithm resolves the efficiency problems of traditional approaches and thus makes spatial co-location pattern mining over massive data sets feasible. Extensive experiments show that the grid differential algorithm is efficient, robust, and highly accurate.
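
    The data-reduction step described above, replacing the instances of each feature inside a grid cell by their centroid, can be sketched as follows. This covers only that step, under the assumption that instances are `(feature, x, y)` tuples and that cells are axis-aligned squares of side `cell_size`; the grid subdivision policy and the multiresolution pruning are not shown, and the function name is hypothetical.

```python
from collections import defaultdict


def cell_centroids(instances, cell_size):
    """Map (cell, feature) -> centroid of that feature's instances in the cell.

    instances: iterable of (feature, x, y) tuples.
    A cell is identified by its integer (column, row) index.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum of x, sum of y, count
    for feature, x, y in instances:
        cell = (int(x // cell_size), int(y // cell_size))
        acc = sums[(cell, feature)]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {key: (sx / n, sy / n) for key, (sx, sy, n) in sums.items()}


points = [("A", 0.5, 0.5), ("A", 1.5, 1.5), ("B", 0.2, 0.2), ("A", 2.5, 0.5)]
centroids = cell_centroids(points, cell_size=2.0)
```

    Here the two `A` instances in cell `(0, 0)` collapse to the single point `(1.0, 1.0)`, so later neighbourhood tests operate on three representatives instead of four instances.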

  6. Using non-aliasing Contourlet transform in reconstruction algorithm of mine images

    刘丽虹; 俞啸; 胡延军


    This article applies the non-aliasing Contourlet transform to a mine image reconstruction algorithm based on compressed sensing theory. Simulations indicate that, when a mine image is reconstructed with the OMP algorithm under the same observation system, compressed sensing reconstruction based on the non-aliasing Contourlet transform recovers the image better than reconstruction based on the traditional Contourlet transform or the Sym4 wavelet transform.

  7. Distribution of chemical elements in soils and stream sediments in the area of abandoned Sb-As-Tl Allchar mine, Republic of Macedonia.

    Bačeva, Katerina; Stafilov, Trajče; Šajn, Robert; Tănăselia, Claudiu; Makreski, Petre


    The aim of this study was to investigate the distribution of some toxic elements in topsoil and subsoil, focusing on the identification of natural and anthropogenic element sources in the small region of rare As-Sb-Tl mineralization outcrop and abandoned mine Allchar known for the highest natural concentration of Tl in soil worldwide. The samples of soil and sediments after total digestion were analyzed by inductively coupled plasma-mass spectrometry (ICP-MS) and inductively coupled plasma-atomic emission spectrometry (ICP-AES). Factor analysis (FA) was used to identify and characterize element associations. Six associations of elements were determined by the method of multivariate statistics: Rb-Ta-K-Nb-Ga-Sn-Ba-Bi-Li-Be-(La-Eu)-Hf-Zr-Zn-In-Pd-Ag-Pt-Mg; Tl-As-Sb-Hg; Te-S-Ag-Pt-Al-Sc-(Gd-Lu)-Y; Fe-Cu-V-Ge-Co-In; Pd-Zr-Hf-W-Be and Ni-Mn-Co-Cr-Mg. The purpose of the assessment was to determine the nature and extent of potential contamination as well as to broadly assess possible impacts to human health and the environment. The results from the analysis of the collected samples in the vicinity of the mine revealed that As and Tl elements have the highest median values. Higher median values for Sb are obviously as a result of the past mining activities and as a result of area surface phenomena in the past. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Concentrations of cadmium, Cobalt, Lead, Nickel, and Zinc in Blood and Fillets of Northern Hog Sucker (Hypentelium nigricans) from streams contaminated by lead-Zinc mining: Implications for monitoring

    Schmitt, C.J.; Brumbaugh, W.G.; May, T.W.


    Lead (Pb) and other metals can accumulate in northern hog sucker (Hypentelium nigricans) and other suckers (Catostomidae), which are harvested in large numbers from Ozark streams by recreational fishers. Suckers are also important in the diets of piscivorous wildlife and fishes. Suckers from streams contaminated by historic Pb-zinc (Zn) mining in southeastern Missouri are presently identified in a consumption advisory because of Pb concentrations. We evaluated blood sampling as a potentially nonlethal alternative to fillet sampling for Pb and other metals in northern hog sucker. Scaled, skin-on, bone-in "fillet" and blood samples were obtained from northern hog suckers (n = 75) collected at nine sites representing a wide range of conditions relative to Pb-Zn mining in southeastern Missouri. All samples were analyzed for cadmium (Cd), cobalt (Co), Pb, nickel (Ni), and Zn. Fillets were also analyzed for calcium as an indicator of the amount of bone, skin, and mucus included in the samples. Pb, Cd, Co, and Ni concentrations were typically higher in blood than in fillets, but Zn concentrations were similar in both sample types. Concentrations of all metals except Zn were typically higher at sites located downstream from active and historic Pb-Zn mines and related facilities than at nonmining sites. Blood concentrations of Pb, Cd, and Co were highly correlated with corresponding fillet concentrations; log-log linear regressions between concentrations in the two sample types explained 94% of the variation for Pb, 73-83% of the variation for Co, and 61% of the variation for Cd. In contrast, relations for Ni and Zn explained <12% of the total variation. Fillet Pb and calcium concentrations were correlated (r = 0.83), but only in the 12 fish from the most contaminated site; concentrations were not significantly correlated across all sites. Conversely, fillet Cd and calcium were correlated across the range of sites (r = 0.78), and the inclusion of calcium in the fillet

  9. Design and Implementation of Data Mining Algorithm Package Prototype Based on DMX

    李由; 孙蕾


    Data Mining Extensions (DMX) is a data mining language proposed by Microsoft and implemented in SSAS (SQL Server Analysis Services) that conforms to the OLE DB for Data Mining specification. It provides a data storage method and an open-source, extensible model-calling system for creating, accessing, and managing data mining algorithms and information. To meet the pressing demand of domestic small and medium-sized enterprises for data mining tools, a data mining algorithm package prototype is designed and implemented through layered design and successive encapsulation. The prototype builds on the features and functions of the DMX language, is platform independent, and is suited to wide application. Its feasibility and usability were verified on public test data.

  10. Data Mining of Determinants of Intrauterine Growth Retardation Revisited Using Novel Algorithms Generating Semantic Maps and Prototypical Discriminating Variable Profiles.

    Massimo Buscema

    Full Text Available Intra-uterine growth retardation is often of unknown origin, and is of great interest as a "Fetal Origin of Adult Disease" has now been well recognized. We built a benchmark based upon a previously analysed data set related to Intrauterine Growth Retardation with 46 subjects described by 14 variables, related to the insulin-like growth factor system and pro-inflammatory cytokines, namely interleukin-6 and tumor necrosis factor-α. We used new algorithms for optimal information sorting based on the combination of two neural network algorithms: Auto-contractive Map and Activation and Competition System. Auto-Contractive Map spatializes the relationships among variables or records by constructing a suitable embedding space where 'closeness' among variables or records accurately reflects their associations. The Activation and Competition System algorithm instead works as a dynamic non-linear associative memory on the weight matrices of other algorithms, and is able to produce a prototypical variable profile of a given target. Classical statistical analysis proved to be unable to distinguish intrauterine growth retardation (IUGR) from appropriate-for-gestational age (AGA) subjects due to the high non-linearity of underlying functions. Auto-contractive map succeeded in clustering and completely differentiating the conditions under study, while Activation and Competition System made it possible to develop the profile of variables which discriminated the two conditions under study better than any previous attempt. In particular, Activation and Competition System showed that appropriateness for gestational age was explained by IGF-2 relative gene expression, and by IGFBP-2 and TNF-α placental contents. IUGR instead was explained by IGF-I, IGFBP-1, IGFBP-2 and IL-6 gene expression in placenta. This further analysis provided further insight into the placental key-players of fetal growth within the insulin-like growth factor and cytokine systems. Our previous

  11. Fast Vertical Mining Using Boolean Algebra

    Hosny M. Ibrahim


    Full Text Available The vertical association rules mining algorithm is an efficient mining method that uses the support sets of frequent itemsets to calculate the support of candidate itemsets. It avoids the repeated database scans required by the Apriori algorithm. In vertical mining, frequent itemsets can be represented as a set of bit vectors in memory, which enables fast computation. The size of the bit vectors for itemsets is the main space cost of the algorithm and restricts its scalability. Therefore, this paper presents an algorithm that compresses the bit vectors of frequent itemsets. The new bit vector schema relies on Boolean algebra rules to compute the intersection of two compressed bit vectors without any costly decompression operation. The experimental results show that the proposed algorithm, Vertical Boolean Mining (VBM), is better than both the Apriori algorithm and the classical vertical association rule mining algorithm in mining time and memory usage.
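
    The classical (uncompressed) vertical representation that this abstract builds on can be shown in a few lines: each item gets a bit vector with one bit per transaction, and the support of an itemset is the number of set bits in the bitwise AND of its items' vectors. The sketch below uses Python integers as bit vectors and does not implement the compressed schema of VBM; the function names are hypothetical and `support` assumes a non-empty itemset.

```python
def build_bitvectors(transactions):
    """Return {item: int}, where bit t of the int is set if transaction t has the item."""
    vectors = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """Support of a non-empty itemset = popcount of the AND of its bit vectors."""
    items = list(itemset)
    bits = vectors.get(items[0], 0)
    for item in items[1:]:
        bits &= vectors.get(item, 0)  # intersect the transaction-id sets
    return bin(bits).count("1")


db = [["a", "b", "c"], ["a", "c"], ["b"], ["a", "b"]]
vectors = build_bitvectors(db)
```

    The database is read once to build the vectors; every later support query is a sequence of AND operations, which is why the size of the vectors, rather than the number of scans, becomes the limiting cost.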

  12. Data mining methods

    Chattamvelli, Rajan


    DATA MINING METHODS, Second Edition discusses both the theoretical foundations and practical applications of data mining in a wide range of fields, including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data types, data categories and applications of data mining. The second chapter briefly reviews data visualization technology and its importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithms for sample covariants are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithms is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  13. Application of artificial neural network coupled with genetic algorithm and simulated annealing to solve groundwater inflow problem to an advancing open pit mine

    Bahrami, Saeed; Doulati Ardejani, Faramarz; Baafi, Ernest


    In this study, hybrid models are designed to predict groundwater inflow to an advancing open pit mine and the hydraulic head (HH) in observation wells at different distances from the centre of the pit during its advance. Hybrid methods coupling artificial neural network (ANN) with genetic algorithm (GA) methods (ANN-GA), and simulated annealing (SA) methods (ANN-SA), were utilised. Ratios of depth of pit penetration in aquifer to aquifer thickness, pit bottom radius to its top radius, inverse of pit advance time and the HH in the observation wells to the distance of observation wells from the centre of the pit were used as inputs to the networks. To achieve the objective two hybrid models consisting of ANN-GA and ANN-SA with 4-5-3-1 arrangement were designed. In addition, by switching the last argument of the input layer with the argument of the output layer of two earlier models, two new models were developed to predict the HH in the observation wells for the period of the mining process. The accuracy and reliability of models are verified by field data, results of a numerical finite element model using SEEP/W, outputs of simple ANNs and some well-known analytical solutions. Predicted results obtained by the hybrid methods are closer to the field data compared to the outputs of analytical and simple ANN models. Results show that despite the use of fewer and simpler parameters by the hybrid models, the ANN-GA and to some extent the ANN-SA have the ability to compete with the numerical models.

  14. Mine your own business! Mine other's news!

    Pham, Quang-Khai; Saint-Paul, Régis; Benatallah, Boualem; Mouaddib, Noureddine; Raschia, Guillaume


    Major media companies such as The Financial Times, the Wall Street Journal or Reuters generate huge amounts of textual news data on a daily basis. Mining frequent patterns in this mass of information is critical for knowledge workers such as financial analysts, stock traders or economists. Using existing frequent pattern mining (FPM) algorithms for the analysis of news data is difficult because of the size and lack of structuring of the free text news content. In this ...

  15. Visual analytics of anomaly detection in large data streams

    Hao, Ming C.; Dayal, Umeshwar; Keim, Daniel A.; Sharma, Ratnesh K.; Mehta, Abhay


    Most data streams usually are multi-dimensional, high-speed, and contain massive volumes of continuous information. They are seen in daily applications, such as telephone calls, retail sales, data center performance, and oil production operations. Many analysts want insight into the behavior of this data. They want to catch the exceptions in flight to reveal the causes of the anomalies and to take immediate action. To guide the user in finding the anomalies in the large data stream quickly, we derive a new automated neighborhood threshold marking technique, called AnomalyMarker. This technique is built on cell-based data streams and user-defined thresholds. We extend the scope of the data points around the threshold to include the surrounding areas. The idea is to define a focus area (marked area) which enables users to (1) visually group the interesting data points related to the anomalies (i.e., problems that occur persistently or occasionally) for observing their behavior; (2) discover the factors related to the anomaly by visualizing the correlations between the problem attribute with the attributes of the nearby data items from the entire multi-dimensional data stream. Mining results are quickly presented in graphical representations (i.e., tooltip) for the user to zoom into the problem regions. Different algorithms are introduced which try to optimize the size and extent of the anomaly markers. We have successfully applied this technique to detect data stream anomalies in large real-world enterprise server performance and data center energy management.

  16. Automatic trace metal monitoring station use for early warning and short term events in polluted rivers: application to streams loaded by mining tailing.

    Lourino-Cabana, Beatriz; Iftekhar, Shafia; Billon, Gabriel; Mikkelsen, Oyvind; Ouddane, Baghdad


    An automatic trace metal monitoring station (ATMS) system was implemented to study seasonal and short time changes in selected metal concentrations in two river courses influenced by mine drainage. High frequency monitoring over periods of months revealed daily variations of zinc, iron and copper, and also proved the use of ATMS as an early warning system in such polluted environments. Complementary measurements with ICP-MS (inductively coupled plasma-mass spectrometry), ionic chromatography, and thermodynamic equilibrium calculations also gave some new insights into the geochemical behaviour of the metals in these two rivers.

  17. Security Visualization Analytics Model in Online Social Networks Using Data Mining and Graph-based Structure Algorithms

    Prajit Limsaiprom


    Full Text Available The rise of the Internet has accelerated the creation of various large-scale online social networks, which can describe the relationships and activities between human beings. The relationships in real-world online social networks are too large to present in a form useful for identifying criminals or cyber-attacks. This research proposes a new information security analytic model for online social networks, called the Security Visualization Analytics (SVA) Model. The SVA Model uses two sets of algorithms: (1) a graph-based structure algorithm that analyzes key factors of influencing nodes (density, centrality, and the cohesive subgroup) to identify the influencing nodes of anomaly and attack patterns; and (2) supervised learning with the oneR classification algorithm, which predicts new links from such influencing nodes, that is, which nodes in the online social network will be linked next from the attacked influencing nodes, in order to monitor the risk. The SVA Model identified 42 influencing nodes of anomaly and attack patterns and predicted 31 new links from such nodes, with accuracy at a confidence level of 95.0%. The proposed model and results show that the SVA Model provides a significant analysis. Such understanding can lead to efficient implementation of tools for link prediction in online social networks. They could be applied as a guide to further investigation of social network behavior, to improve the security model, and to give advance notice of risks, computer viruses, or cyber-attacks for online social networks.


    李伟; 郑烇


    In mesh-based peer-to-peer streaming media systems, media content is usually divided into data segments. The scheduling algorithm that coordinates the data segments from multiple sending peers is a key factor in the video quality perceived by users. To improve the overall performance of streaming media systems, this paper proposes the context-aware adaptive (CAA) streaming media data scheduling algorithm. The algorithm defines the priority of data segments based on context information and dynamically estimates the network bandwidth to neighbouring nodes. It then calculates the order and direction of data segment requests according to context information such as segment priority, the assessed quality of sending peers, and network capacity. Simulation results show that the proposed CAA scheduling algorithm requires smaller buffering delays. Moreover, it achieves higher peer throughput and more balanced load distribution across peers than conventional P2P streaming media scheduling algorithms, and it also improves the continuity index of peers.

  19. Dynamic Scheduling Algorithms for Streaming Media Based on Hybrid Content Delivery Network

    叶剑虹; 叶双


    This paper presents HyCDN, a hybrid content delivery network for streaming media that combines the complementary advantages of CDN and P2P. For peers inside a domain, the CVCR4P2P (Comprehensive Value Cache Replacement Algorithm for P2P) algorithm is proposed, which considers the byte benefit of prefix data, the transmission cost, and the access rate of streaming media. For proxies between domains, a second algorithm, DSA4ProxyC (Dynamic Scheduling Algorithm for Proxy Caching), combines proxy caching with server scheduling strategies. It allocates cache space based on the current batching intervals that have non-zero requests, and the allocation can be updated periodically according to the popularity of each streaming media object. The guiding principle is that the amount of data cached for each streaming media object at the proxy server is proportional to its popularity. Theoretical analysis and simulation results show that the hybrid dynamic scheduling effectively reduces server and network bandwidth usage and adapts well to variations in the request arrival rate.
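
    The stated principle, that cached data per object is proportional to its popularity, can be illustrated with a small sketch. It is not DSA4ProxyC itself: the function name, the use of raw request counts as the popularity measure, and the rule for handing out rounding leftovers are all assumptions.

```python
def allocate_cache(capacity, request_counts):
    """Split `capacity` cache units across objects in proportion to requests.

    capacity       -- total cache size in whole units (e.g. segments)
    request_counts -- dict: object name -> requests seen in the current interval
    Hypothetical helper; objects with zero requests receive no cache.
    """
    total = sum(request_counts.values())
    if total == 0:
        return {name: 0 for name in request_counts}
    # Proportional share, rounded down to whole units.
    alloc = {name: capacity * n // total for name, n in request_counts.items()}
    # Assumed tie-break: give the units lost to rounding to the most popular objects.
    leftover = capacity - sum(alloc.values())
    by_popularity = sorted(request_counts, key=request_counts.get, reverse=True)
    for name in by_popularity[:leftover]:
        alloc[name] += 1
    return alloc

allocation = allocate_cache(10, {"news": 6, "movie": 3, "clip": 1})
```

    Re-running the function at the start of each interval with fresh request counts would mimic the periodic update described in the abstract.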


    宋擒豹; 沈钧毅


    Similar customer groups, relevant Web pages, and frequent access paths can be discovered by analyzing Web log files and customer databases. This paper presents novel Web log mining algorithms. First, based on a directed-graph definition of the Web site, a URL-UserID relevance matrix is built, with URLs as rows and UserIDs as columns; each element of the matrix is the number of hits by that user on that URL. Second, similar customer groups are discovered by measuring the similarity between column vectors, and relevant Web pages are obtained by measuring the similarity between row vectors; frequent access paths can then be discovered by further processing the latter. Experiments show the effectiveness of the algorithms.
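
    The URL-UserID matrix and the row and column comparisons described above can be sketched in a few lines. The abstract does not say which similarity measure the authors use, so cosine similarity is an assumption here, as are the function names and the sample log.

```python
import math
from collections import defaultdict

def build_matrix(hits):
    """Build matrix[url][user_id] = hit count from (url, user_id) log entries."""
    matrix = defaultdict(lambda: defaultdict(int))
    for url, user_id in hits:
        matrix[url][user_id] += 1
    return matrix

def column(matrix, user_id):
    """Extract one user's column as a sparse vector {url: hits}."""
    return {url: row[user_id] for url, row in matrix.items() if user_id in row}

def cosine(a, b):
    """Cosine similarity of two sparse vectors (an assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical log: u1 and u2 visit the same pages, u3 visits a different one.
log = [("/a", "u1"), ("/a", "u2"), ("/b", "u1"), ("/b", "u2"), ("/c", "u3")]
m = build_matrix(log)
user_similarity = cosine(column(m, "u1"), column(m, "u2"))  # column vectors
page_similarity = cosine(m["/a"], m["/b"])                  # row vectors
```

    Users whose column vectors score close to 1 would fall into the same customer group, and pages whose row vectors score close to 1 would be treated as related.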