data mining: Topics by WorldWideScience.org

Sample records for data mining

Data mining, mining data : energy consumption modelling

Energy Technology Data Exchange (ETDEWEB)

Dessureault, S. [Arizona Univ., Tucson, AZ (United States)]

2007-09-15

Most modern mining operations are accumulating large amounts of data on production and business processes. Data, however, provides value only if it can be translated into information that appropriate users can utilize. This paper emphasized that a new technological focus should emerge, notably how to concentrate data into information; analyze information sufficiently to become knowledge; and, act on that knowledge. Researchers at the Mining Information Systems and Operations Management (MISOM) laboratory at the University of Arizona have created a method to transform data into action. The data-to-action approach was exercised in the development of an energy consumption model (ECM), in partnership with a major US-based copper mining company, 2 software companies, and the MISOM laboratory. The approach begins by integrating several key data sources using data warehousing techniques, and increasing the existing level of integration and data cleaning. An online analytical processing (OLAP) cube was also created to investigate the data and identify a subset of several million records. Data mining algorithms were applied using the information that was isolated by the OLAP cube. The data mining results showed that traditional cost drivers of energy consumption are poor predictors. A comparison was made between traditional methods of predicting energy consumption and the prediction formed using data mining. Traditionally, in the mines for which data were available, monthly averages of tons and distance are used to predict diesel fuel consumption. However, this article showed that new information technology can be used to incorporate many more variables into the budgeting process, resulting in more accurate predictions. The ECM helped mine planners improve the prediction of energy use through more data integration, measure development, and workflow analysis. 5 refs., 11 figs.
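As a rough illustration of the OLAP step described in this abstract, the following Python sketch rolls a handful of haul records up along one dimension at a time and derives a fuel-intensity measure. The record layout, the dimension names (truck, shift), and all values are hypothetical and are not taken from the paper; the sketch only shows the idea of aggregating integrated records before applying data mining.

```python
from collections import defaultdict

# Hypothetical haul records: (truck_id, shift, tons_hauled, distance_km, fuel_litres).
# All values are invented for illustration; none come from the paper.
RECORDS = [
    ("T1", "day", 220.0, 4.0, 95.0),
    ("T1", "night", 210.0, 4.5, 110.0),
    ("T2", "day", 230.0, 3.5, 90.0),
    ("T2", "night", 200.0, 5.0, 120.0),
]


def roll_up(records, key_index):
    """Aggregate along one dimension and return litres of fuel per tonne-km."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [fuel, tonne_km]
    for rec in records:
        tons, dist, fuel = rec[2], rec[3], rec[4]
        totals[rec[key_index]][0] += fuel
        totals[rec[key_index]][1] += tons * dist
    return {key: fuel / tkm for key, (fuel, tkm) in totals.items()}


by_truck = roll_up(RECORDS, 0)  # one slice of the cube: fuel intensity per truck
by_shift = roll_up(RECORDS, 1)  # another slice: fuel intensity per shift
```

Slices such as `by_shift` are the kind of summary an analyst might use to decide which subset of records deserves closer modelling.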
Social big data mining

CERN Document Server

Ishikawa, Hiroshi

2015-01-01

Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.
Collaborative Data Mining

Science.gov (United States)

Moyle, Steve

Collaborative Data Mining is a setting where the Data Mining effort is distributed to multiple collaborating agents - human or software. The objective of the collaborative Data Mining effort is to produce solutions to the tackled Data Mining problem which are considered better by some metric, with respect to those solutions that would have been achieved by individual, non-collaborating agents. The solutions require evaluation, comparison, and approaches for combination. Collaboration requires communication, and implies some form of community. The human form of collaboration is a social task. Organizing communities in an effective manner is non-trivial and often requires well defined roles and processes. Data Mining, too, benefits from a standard process. This chapter explores the standard Data Mining process CRISP-DM utilized in a collaborative setting.
Data mining in radiology

International Nuclear Information System (INIS)

Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

2014-01-01

Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, clusters, associations, sequential patterns, classification, prediction and decision trees are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining.
Data preprocessing in data mining

CERN Document Server

García, Salvador; Herrera, Francisco

2015-01-01

Data Preprocessing for Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely contain inconsistencies and errors or, most importantly, will not be ready to be considered for a data mining process. Furthermore, the increasing amount of data in recent science, industry and business applications calls for more complex tools to analyze it. Thanks to data preprocessing, it is possible to convert the impossible into possible, adapting the data to fulfill the input demands of each data mining algorithm. Data preprocessing includes data reduction techniques, which aim at reducing the complexity of the data by detecting or removing irrelevant and noisy elements. This book is intended to review the tasks that fill the gap between the data acquisition from the source and the data mining process. A comprehensive look from a practical point of view, including basic concepts and surveying t...
Data mining in Cloud Computing

Directory of Open Access Journals (Sweden)

Ruxandra-Ştefania PETRE

2012-10-01

This paper describes how data mining is used in cloud computing. Data mining is used for extracting potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day people are confronted with targeted advertising, and data mining techniques help businesses to become more efficient by reducing costs. Data mining techniques and applications are very much needed in the cloud computing paradigm. The implementation of data mining techniques through cloud computing will allow users to retrieve meaningful information from a virtually integrated data warehouse, which reduces the costs of infrastructure and storage.
Data mining applications in healthcare.

Science.gov (United States)

Koh, Hian Chye; Tan, Gerald

2005-01-01

Data mining has been used intensively and extensively by many organizations. In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Data mining applications can greatly benefit all parties involved in the healthcare industry. For example, data mining can help healthcare insurers detect fraud and abuse, healthcare organizations make customer relationship management decisions, physicians identify effective treatments and best practices, and patients receive better and more affordable healthcare services. The huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods. Data mining provides the methodology and technology to transform these mounds of data into useful information for decision making. This article explores data mining applications in healthcare. In particular, it discusses data mining and its applications within healthcare in major areas such as the evaluation of treatment effectiveness, management of healthcare, customer relationship management, and the detection of fraud and abuse. It also gives an illustrative example of a healthcare data mining application involving the identification of risk factors associated with the onset of diabetes. Finally, the article highlights the limitations of data mining and discusses some future directions.
Security Measures in Data Mining

OpenAIRE

Anish Gupta; Vimal Bibhu; Rashid Hussain

2012-01-01

Data mining is a technique for digging data out of large databases for analysis and executive decision making. Security is one of the major requirements for data mining applications. In this paper we present security requirement measures for data mining. We summarize the security requirements for data mining in tabular format. The summary organizes the requirements by the different aspects of security measures for data mining. The performances and outcomes are determin...
Data Mining for CRM

Science.gov (United States)

Thearling, Kurt

Data Mining technology allows marketing organizations to better understand their customers and respond to their needs. This chapter describes how Data Mining can be combined with customer relationship management to help drive improved interactions with customers. An example showing how to use Data Mining to drive customer acquisition activities is presented.
Data mining for service

CERN Document Server

2014-01-01

Virtually all nontrivial and modern service related problems and systems involve data volumes and types that clearly fall into what is presently meant as "big data", that is, are huge, heterogeneous, complex, distributed, etc. Data mining is a series of processes which include collecting and accumulating data, modeling phenomena, and discovering new information, and it is one of the most important steps to scientific analysis of the processes of services. Data mining application in services requires a thorough understanding of the characteristics of each service and knowledge of the compatibility of data mining technology within each particular service, rather than knowledge only in calculation speed and prediction accuracy. Varied examples of services provided in this book will help readers understand the relation between services and data mining technology. This book is intended to stimulate interest among researchers and practitioners in the relation between data mining technology and its application to ...
Implications of Emerging Data Mining

Science.gov (United States)

Kulathuramaiyer, Narayanan; Maurer, Hermann

Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increase, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.
Data mining for bioinformatics applications

CERN Document Server

Zengyou, He

2015-01-01

Data Mining for Bioinformatics Applications provides valuable information on the data mining methods that have been widely used for solving real bioinformatics problems, including problem definition, data collection, data preprocessing, modeling, and validation. The text uses an example-based method to illustrate how to apply data mining techniques to solve real bioinformatics problems, containing 45 bioinformatics problems that have been investigated in recent research. For each example, the entire data mining process is described, ranging from data preprocessing to modeling and result validation.
Data mining

CERN Document Server

Gorunescu, Florin

2011-01-01

The knowledge discovery process is as old as Homo sapiens. Until some time ago, this process was solely based on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be solved based on the development of the Data mining technology, aided by the huge computational power of the 'artificial' computers. Digging intelligently in different large databases, data mining aims to extract implicit, previously unknown and potentially useful information from data, since 'knowledge is power'. The goal of this book is to provide, in a friendly way
Data Stream Mining

Science.gov (United States)

Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).
Data mining methods

CERN Document Server

Chattamvelli, Rajan

2015-01-01

DATA MINING METHODS, Second Edition discusses both theoretical foundation and practical applications of data mining in a web field including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data type, data category and applications of data mining. The second chapter briefly reviews data visualization technology and importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithm for sample covariants are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithm is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...
DATA MINING TECHNIQUES FOR EDUCATIONAL DATA: A REVIEW

OpenAIRE

Pragati Sharma; Dr. Sanjiv Sharma

2018-01-01

Recently, data mining has been gaining popularity among researchers. Data mining provides various techniques and methods for analysing data produced by applications in different domains. Similarly, educational mining provides a way of analyzing educational data sets. Educational mining is concerned with developing methods for discovering knowledge from data that come from the educational field, and it helps to extract the hidden patterns and to discover new knowledge from large educational da...
Data Mining Applications in Livestock

Directory of Open Access Journals (Sweden)

Feyza ALEV ÇETİN

2016-03-01

Data mining enables the discovery of required and applicable knowledge from very large amounts of information collected in one centre. Data mining has been used in the information industry and society. Although many data mining methods have been used, these techniques have become notable in animal husbandry in recent years. Many methods have been discussed and developed for solving complex problems in animal husbandry. The study gives brief information on data mining techniques such as the k-means approach, the k-nearest neighbor approach, multivariate adaptive regression splines (MARS), naive Bayesian classifiers (NBC), artificial neural networks (ANN), support vector machines (SVM), and decision trees. Some data mining methods are presented, and examples of the application of data mining in animal husbandry around the world are provided.
Big data mining: In-database Oracle data mining over hadoop

Science.gov (United States)

Kovacheva, Zlatinka; Naydenova, Ina; Kaloyanova, Kalinka; Markov, Krasimir

2017-07-01

Big data challenges different aspects of storing, processing and managing data, as well as analyzing and using data for business purposes. Applying Data Mining methods over Big Data is another challenge because of huge data volumes, the variety of information, and the dynamics of the sources. Different applications are made in this area, but their successful usage depends on understanding many specific parameters. In this paper we present several opportunities for using Data Mining techniques provided by the analytical engine of RDBMS Oracle over data stored in Hadoop Distributed File System (HDFS). Some experimental results are given and discussed.
Granular-relational data mining: how to mine relational data in the paradigm of granular computing?

CERN Document Server

Hońko, Piotr

2017-01-01

This book provides two general granular computing approaches to mining relational data, the first of which uses abstract descriptions of relational objects to build their granular representation, while the second extends existing granular data mining solutions to a relational case. Both approaches make it possible to perform and improve popular data mining tasks such as classification, clustering, and association discovery. How can different relational data mining tasks best be unified? How can the construction process of relational patterns be simplified? How can richer knowledge from relational data be discovered? All these questions can be answered in the same way: by mining relational data in the paradigm of granular computing! This book will allow readers with previous experience in the field of relational data mining to discover the many benefits of its granular perspective. In turn, those readers familiar with the paradigm of granular computing will find valuable insights on its application to mining r...
Mining Views : database views for data mining

NARCIS (Netherlands)

Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.; Nijssen, S.; De Raedt, L.

2007-01-01

We propose a relational database model towards the integration of data mining into relational database systems, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules, decision trees and clusterings, can be

Mining Views : database views for data mining

NARCIS (Netherlands)

Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.

2008-01-01

We present a system towards the integration of data mining into relational databases. To this end, a relational database model is proposed, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules and decision
Data mining in agriculture

CERN Document Server

Mucherino, Antonio; Pardalos, Panos M

2009-01-01

Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environmental related fields. This book presents both theoretical and practical insights with a focus on presenting the context of each data mining technique rather intuitively with ample concrete examples represented graphically and with algorithms written in MATLAB®. Examples and exercises with solutions are provided at the end of each chapter to facilitate the comprehension of the material. For each data mining technique described in the book variants and improvements of the basic algorithm are also given.
Data Mining Tools in Science Education

OpenAIRE

Premysl Zaskodny

2012-01-01

The main principle of the paper is Data Mining in Science Education (DMSE) as Problem Solving. The main goal of the paper is the delimitation of the Complex Data Mining Tool and the Partial Data Mining Tool of DMSE. The procedure of the paper consists of Data Preprocessing in Science Education, Data Processing in Science Education, Description of Curricular Process as Complex Data Mining Tool (CP-DMSE), Description of Analytical Synthetic Modeling as Partial Data Mining Tool (ASM-DMSE) and finally...
Real world data mining applications

CERN Document Server

Abou-Nasr, Mahmoud; Stahlbock, Robert; Weiss, Gary M

2014-01-01

Data mining applications range from commercial to social domains, with novel applications appearing swiftly; for example, within the context of social networks. The expanding application sphere and social reach of advanced data mining raise pertinent issues of privacy and security. Present-day data mining is a progressive multidisciplinary endeavor. This inter- and multidisciplinary approach is well reflected within the field of information systems. The information systems research addresses software and hardware requirements for supporting computationally and data-intensive applications. Furthermore, it encompasses analyzing system and data aspects, and all manual or automated activities. In that respect, research at the interface of information systems and data mining has significant potential to produce actionable knowledge vital for corporate decision-making. The aim of the proposed volume is to provide a balanced treatment of the latest advances and developments in data mining; in particular, exploring s...
Data mining in pharma sector: benefits.

Science.gov (United States)

Ranjan, Jayanthi

2009-01-01

The amount of data being generated in any sector at present is enormous. The information flow in the pharma industry is huge. Pharma firms are progressing into increased technology-enabled products and services. Data mining, which is knowledge discovery from large sets of data, helps pharma firms to discover patterns that improve the quality of drug discovery and delivery methods. The paper aims to present how data mining is useful in the pharma industry, how its techniques can yield good results in the pharma sector, and how data mining can enhance decision making using pharmaceutical data. This conceptual paper is based on secondary study, research and observations from magazines, reports and notes. The author has listed the types of patterns that can be discovered using data mining in pharma data. The paper shows how data mining is useful in the pharma industry and how its techniques can yield good results in the pharma sector. Although much work can be produced for discovering knowledge in pharma data using data mining, the paper is limited to conceptualizing the ideas and viewpoints at this stage; future work may include applying data mining techniques to pharma data based on primary research using the available, well-known data mining tools. Research papers and conceptual papers related to data mining in the pharma industry are rare; this is the motivation for the paper.
Mining High-Dimensional Data

Science.gov (United States)

Wang, Wei; Yang, Jiong

With the rapid growth of computational biology and e-commerce applications, high-dimensional data has become very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and, more crucially, (2) the meaningfulness of the similarity measure in the high-dimensional space. In this chapter, we present several state-of-the-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.
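The concern about the similarity measure can be made concrete with a small experiment. The Python sketch below is not from the chapter; it assumes points drawn uniformly from the unit hypercube and uses Euclidean distance, and the function name `relative_contrast` is a hypothetical helper. It measures how much farther the farthest neighbour of a query point is than its nearest neighbour; under these assumptions the gap tends to shrink as the dimension grows, which is one common way of describing why distance-based similarity becomes less informative.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min over distances from a random query
    to n_points uniform random points in the dim-dimensional unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


low_dim_contrast = relative_contrast(2)
high_dim_contrast = relative_contrast(500)
```

Comparing `low_dim_contrast` with `high_dim_contrast` shows the effect for this synthetic setup only; real datasets with correlated attributes can behave differently.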
Data mining theories, algorithms, and examples

CERN Document Server

Ye, Nong

2013-01-01

AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal
Data-Mining Research in Education

OpenAIRE

Cheng, Jiechao

2017-01-01

As an interdisciplinary discipline, data mining (DM) is popular in the education area, especially when examining students' learning performances. It focuses on analyzing education-related data to develop models for improving learners' learning experiences and enhancing institutional effectiveness. Therefore, DM does help education institutions provide high-quality education for their learners. Applying data mining in education is also known as educational data mining (EDM), which enables to better un...
Data Mining Web Services for Science Data Repositories

Science.gov (United States)

Graves, S.; Ramachandran, R.; Keiser, K.; Maskey, M.; Lynnes, C.; Pham, L.

2006-12-01

The maturation of web services standards and technologies sets the stage for a distributed "Service-Oriented Architecture" (SOA) for NASA's next generation science data processing. This architecture will allow members of the scientific community to create and combine persistent distributed data processing services and make them available to other users over the Internet. NASA has initiated a project to create a suite of specialized data mining web services designed specifically for science data. The project leverages the Algorithm Development and Mining (ADaM) toolkit as its basis. The ADaM toolkit is a robust, mature and freely available science data mining toolkit that is being used by several research organizations and educational institutions worldwide. These mining services will give the scientific community a powerful and versatile data mining capability that can be used to create higher order products such as thematic maps from current and future NASA satellite data records with methods that are not currently available. The package of mining and related services is being developed using Web Services standards so that community-based measurement processing systems can access and interoperate with them. These standards-based services allow users different options for utilizing them, from direct remote invocation by a client application to deployment of a Business Process Execution Language (BPEL) solutions package where a complex data mining workflow is exposed to others as a single service. The ability to deploy and operate these services at a data archive allows the data mining algorithms to be run where the data are stored, a more efficient scenario than moving large amounts of data over the network. This will be demonstrated in a scenario in which a user uses a remote Web-Service-enabled clustering algorithm to create cloud masks from satellite imagery at the Goddard Earth Sciences Data and Information Services Center (GES DISC).
Mastering SQL Server 2014 data mining

CERN Document Server

Bassan, Amarpreet Singh

2014-01-01

If you are a developer who is working on data mining for large companies and would like to enhance your knowledge of SQL Server Data Mining Suite, this book is for you. Whether you are brand new to data mining or are a seasoned expert, you will be able to master the skills needed to build a data mining solution.
Applied data mining

CERN Document Server

Xu, Guandong

2013-01-01

Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.
The Hazards of Data Mining in Healthcare.

Science.gov (United States)

Househ, Mowafa; Aldosari, Bakheet

2017-01-01

From the mid-1990s, data mining methods have been used to explore and find patterns and relationships in healthcare data. During the 1990s and early 2000s, data mining was a topic of great interest to healthcare researchers, as data mining showed some promise in the use of its predictive techniques to help model the healthcare system and improve the delivery of healthcare services. However, it was soon discovered that mining healthcare data had many challenges relating to the veracity of healthcare data and limitations around predictive modelling leading to failures of data mining projects. As the Big Data movement has gained momentum over the past few years, there has been a reemergence of interest in the use of data mining techniques and methods to analyze healthcare generated Big Data. Much has been written on the positive impacts of data mining on healthcare practice relating to issues of best practice, fraud detection, chronic disease management, and general healthcare decision making. Little has been written about the limitations and challenges of data mining use in healthcare. In this review paper, we explore some of the limitations and challenges in the use of data mining techniques in healthcare. Our results show that the limitations of data mining in healthcare include reliability of medical data, data sharing between healthcare organizations, and inappropriate modelling leading to inaccurate predictions. We conclude that there are many pitfalls in the use of data mining in healthcare and more work is needed to show evidence of its utility in facilitating healthcare decision-making for healthcare providers, managers, and policy makers, and more evidence is needed on data mining's overall impact on healthcare services and patient care.
Privacy-Preserving Data Mining of Medical Data Using Data Separation-Based Techniques

Directory of Open Access Journals (Sweden)

Gang Kou

2007-08-01

Data mining is concerned with the extraction of useful knowledge from various types of data. Medical data mining has been a popular data mining topic of late. Compared with other data mining areas, medical data mining has some unique characteristics. Because medical files are related to human subjects, privacy concerns are taken more seriously than in other data mining tasks. This paper applied data separation-based techniques to preserve privacy in classification of medical data. We take two approaches to protect privacy: one approach is to vertically partition the medical data and mine these partitioned data at multiple sites; the other approach is to horizontally split data across multiple sites. In the vertical partition approach, each site uses a portion of the attributes to compute its results, and the distributed results are assembled at a central trusted party using a majority-vote ensemble method. In the horizontal partition approach, data are distributed among several sites. Each site computes its own data, and a central trusted party is responsible for integrating these results. We implement these two approaches using medical datasets from UCI KDD archive and report the experimental results.
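The vertical partition approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records are invented, each site is assumed to know the class labels, and the per-site classifier is a 1-nearest-neighbour rule chosen only for brevity (the abstract does not say which classifier each site runs). For compactness the sketch keeps one shared table and restricts each site to its own columns; in a real deployment each site would physically hold only its columns.

```python
from collections import Counter

# Hypothetical toy records: six attributes per patient plus a binary label.
# The values are invented for illustration only.
TRAIN = [
    ((1.0, 1.2, 0.9, 1.1, 1.0, 0.8), 1),
    ((0.9, 1.0, 1.1, 0.9, 1.2, 1.0), 1),
    ((1.1, 0.8, 1.0, 1.2, 0.9, 1.1), 1),
    ((0.1, 0.2, 0.0, 0.1, 0.3, 0.2), 0),
    ((0.2, 0.0, 0.1, 0.3, 0.1, 0.0), 0),
    ((0.0, 0.1, 0.2, 0.0, 0.2, 0.1), 0),
]

# Vertical partition: each site may look at only its own attribute columns.
SITE_COLUMNS = [(0, 1), (2, 3), (4, 5)]


def site_predict(columns, train, record):
    """Local 1-nearest-neighbour prediction using one site's columns only."""
    def sq_dist(a, b):
        return sum((a[c] - b[c]) ** 2 for c in columns)

    _, label = min(train, key=lambda row: sq_dist(row[0], record))
    return label


def majority_vote(votes):
    """Central trusted party: return the most common label among site votes."""
    return Counter(votes).most_common(1)[0][0]


def classify(record):
    votes = [site_predict(cols, TRAIN, record) for cols in SITE_COLUMNS]
    return majority_vote(votes)


# The first four attributes resemble class 1, the last two resemble class 0,
# so two sites vote 1 and one site votes 0.
prediction = classify((1.0, 1.0, 1.0, 1.0, 0.1, 0.1))
```

Only the per-site votes reach the central party in this sketch, which is the property the vertical partition approach relies on; an odd number of sites avoids tied votes for a binary label.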
Accounting and Financial Data Analysis Data Mining Tools

Directory of Open Access Journals (Sweden)

Diana Elena Codreanu

2011-05-01

Computerized accounting systems have grown more complex in recent years because of the competitive economic environment. Data analysis solutions such as OLAP and data mining make it possible to analyze data multidimensionally, detect fraud, and discover knowledge hidden in the data, yielding information that is useful for decision making within the organization. The literature offers many definitions of data mining, but they all boil down to the same idea: a process that extracts new information from large data collections, information that would be very difficult to obtain without data mining tools. Information obtained through data mining has the advantage that it not only answers the question of what is happening but also shows why certain things are happening. In this paper we present advanced techniques for the analysis and exploitation of data stored in a multidimensional database.
Contrast data mining concepts, algorithms, and applications

CERN Document Server

Dong, Guozhu

2012-01-01

A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems. Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains. Learn from Real Case Studies.
Data Mining Solutions for the Business Environment

Directory of Open Access Journals (Sweden)

Ruxandra-Stefania PETRE

2014-02-01

In recent years, data mining has become a matter of considerable importance due to the large amounts of data available in applications belonging to various domains. Data mining is a dynamic and fast-expanding field that applies advanced data analysis techniques from statistics, machine learning, database systems and artificial intelligence in order to discover relevant patterns, trends and relations contained within the data, information that is impossible to observe using other techniques. The paper focuses on presenting the applications of data mining in the business environment. It contains a general overview of data mining, providing a definition of the concept, enumerating six primary data mining techniques and mentioning the main fields in which data mining can be applied. The paper also presents the main business areas that can benefit from the use of data mining tools, along with their use cases: retail, banking and insurance. The main commercially available data mining tools and their key features are also presented. Besides the analysis of data mining and the business areas that can successfully apply it, the paper presents the main features of a data mining solution for the business environment and the architecture of that solution, with its main components, which would help improve customer experiences and decision-making.
Collaborative Data Mining Tool for Education

Science.gov (United States)

Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; Gea, Miguel; de Castro, Carlos

2009-01-01

This paper describes a collaborative educational data mining tool based on association rule mining for the continuous improvement of e-learning courses, allowing teachers with similar course profiles to share and score the discovered information. This mining tool is oriented to be used by instructors who are not experts in data mining such that, its…
Set-oriented data mining in relational databases

NARCIS (Netherlands)

Houtsma, M.A.W.; Swami, Arun

1995-01-01

Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are
Application of data mining techniques for nuclear data and instrumentation

International Nuclear Information System (INIS)

Toshniwal, Durga

2013-01-01

Data mining is defined as the discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. Patterns in the data can be represented in many different forms, including classification rules, association rules, clusters, etc. Data mining thus deals with the discovery of hidden trends and patterns from large quantities of data. The field of data mining is emerging as a new, fundamental research area with important applications to science, engineering, medicine, business, and education. It is an interdisciplinary research area and draws upon several roots, including database systems, machine learning, information systems, statistics and expert systems. Data mining, when performed on time series data, is known as time series data mining (TSDM). A time series is a sequence of real numbers, each number representing a value at a point in time. During the past few years, there has been an explosion of research in the area of time series data mining. This includes attempts to model time series data, to design languages to query such data, and to develop access structures to efficiently process queries on such data. Time series data arises naturally in many real-world applications. Efficient discovery of knowledge through time series data mining can be helpful in several domains, such as stock market analysis and weather forecasting. An important application area of data mining techniques is nuclear power plant and related data. Nuclear power plant data can be represented in the form of time sequences. Often it may be of prime importance to analyze such data to find trends and anomalies. The general goals of data mining include feature extraction, similarity search, clustering and classification, association rule mining and anomaly
Process mining online assessment data

NARCIS (Netherlands)

Pechenizkiy, M.; Trcka, N.; Vasilyeva, E.; Aalst, van der W.M.P.; De Bra, P.M.E.; Barnes, T.; Desmarais, M.; Romero, C.; Ventura, S.

2009-01-01

Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of

Applied data mining for business and industry

CERN Document Server

Giudici, Paolo

2009-01-01

The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies a...
Pocket data mining big data on small devices

CERN Document Server

Gaber, Mohamed Medhat; Gomes, Joao Bartolo

2014-01-01

Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of techniques used and thorough experimental studies are given. More importantly and exclusive to this book, the authors provide a detailed practical guide on the depl...
Data mining for social network data

CERN Document Server

Memon, Nasrullah; Hicks, David L; Chen, Hsinchun

2010-01-01

Driven by counter-terrorism efforts, marketing analysis and an explosion in online social networking in recent years, data mining has moved to the forefront of information science. This proposed Special Issue on "Data Mining for Social Network Data" will present a broad range of recent studies in social networking analysis. It will focus on emerging trends and needs in discovery and analysis of communities, solitary and social activities, and activities in open fora, and commercial sites as well. It will also look at network modeling, infrastructure construction, dynamic growth and evolution
Organizational Data Mining

Science.gov (United States)

Nemati, Hamid R.; Barko, Christopher D.

Many organizations today possess substantial quantities of business information but have very little real business knowledge. A recent survey of 450 business executives reported that managerial intuition and instinct are more prevalent than hard facts in driving organizational decisions. To reverse this trend, businesses of all sizes would be well advised to adopt Organizational Data Mining (ODM). ODM is defined as leveraging Data Mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage. ODM has helped many organizations optimize internal resource allocations while better understanding and responding to the needs of their customers. The fundamental aspects of ODM can be categorized into Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT), with OT being the key distinction between ODM and Data Mining. In this chapter, we introduce ODM, explain its unique characteristics, and report on the current status of ODM research. Next we illustrate how several leading organizations have adopted ODM and are benefiting from it. Then we examine the evolution of ODM to the present day and conclude our chapter by contemplating ODM's challenging yet opportunistic future.
Process Mining Online Assessment Data

Science.gov (United States)

Pechenizkiy, Mykola; Trcka, Nikola; Vasilyeva, Ekaterina; van der Aalst, Wil; De Bra, Paul

2009-01-01

Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of the underlying educational processes, for…
Data Mining Based on Cloud-Computing Technology

Directory of Open Access Journals (Sweden)

Ren Ying

2016-01-01

There are performance bottlenecks and scalability problems when a traditional data-mining system is used in cloud computing. In this paper, we present a data-mining platform based on cloud computing. Compared with a traditional data-mining system, this platform is highly scalable, has massive data processing capacity, is service-oriented, and has a low hardware cost. This platform can support the design and application of a wide range of distributed data-mining systems.
Data mining mobile devices

CERN Document Server

Mena, Jesus

2013-01-01

With today's consumers spending more time on their mobiles than on their PCs, new methods of empirical stochastic modeling have emerged that can provide marketers with detailed information about the products, content, and services their customers desire. Data Mining Mobile Devices defines the collection of machine-sensed environmental data pertaining to human social behavior. It explains how the integration of data mining and machine learning can enable the modeling of conversation context, proximity sensing, and geospatial location throughout large communities of mobile users
A Survey of Educational Data-Mining Research

Science.gov (United States)

Huebner, Richard A.

2013-01-01

Educational data mining (EDM) is an emerging discipline that focuses on applying data mining tools and techniques to educationally related data. The discipline focuses on analyzing educational data to develop models for improving learning experiences and improving institutional effectiveness. A literature review on educational data mining topics…
Data Mining Mining Data: MSHA Enforcement Efforts, Underground Coal Mine Safety, and New Health Implications

OpenAIRE

Kniesner, Thomas J.; Leeth, John D.

2003-01-01

Studies of industrial safety regulations, OSHA in particular, often find little effect on worker safety. Critics of the regulatory approach argue that safety standards have little to do with industrial injuries, and defenders of the regulatory approach cite infrequent inspections and low penalties for violating safety standards. We use recently assembled data from the Mine Safety and Health Administration (MSHA) concerning underground coal mine production, safety regulatory activities, and wo...
Open-source tools for data mining.

Science.gov (United States)

Zupan, Blaz; Demsar, Janez

2008-03-01

With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
Learning data mining with R

CERN Document Server

Makhabel, Bater

2015-01-01

This book is intended for the budding data scientist or quantitative analyst with only a basic exposure to R and statistics. This book assumes familiarity with only the very basics of R, such as the main data types, simple functions, and how to move data around. No prior experience with data mining packages is necessary; however, you should have a basic understanding of data mining concepts and processes.
Quantification of Operational Risk Using A Data Mining

Science.gov (United States)

Perera, J. Sebastian

1999-01-01

What is Data Mining?
- Data Mining is the process of finding actionable information hidden in raw data.
- Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data.
- Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process.
Transparent data mining for big and small data

CERN Document Server

Quercia, Daniele; Pasquale, Frank

2017-01-01

This book focuses on new and emerging data mining solutions that offer a greater level of transparency than existing solutions. Transparent data mining solutions with desirable properties (e.g. effective, fully automatic, scalable) are covered in the book. Experimental findings of transparent solutions are tailored to different domain experts, and experimental metrics for evaluating algorithmic transparency are presented. The book also discusses societal effects of black box vs. transparent approaches to data mining, as well as real-world use cases for these approaches. As algorithms increasingly support different aspects of modern life, a greater level of transparency is sorely needed, not least because discrimination and biases have to be avoided. With contributions from domain experts, this book provides an overview of an emerging area of data mining that has profound societal consequences, and provides the technical background for readers to contribute to the field or to put existing approaches to prac...
Biomedical Data Mining

NARCIS (Netherlands)

Peek, N.; Combi, C.; Tucker, A.

2009-01-01

Objective: To introduce the special topic of Methods of Information in Medicine on data mining in biomedicine, with selected papers from two workshops on Intelligent Data Analysis in bioMedicine (IDAMAP) held in Verona (2006) and Amsterdam (2007). Methods: Defining the field of biomedical data
Data mining for dummies

CERN Document Server

Brown, Meta S

2014-01-01

Delve into your data for the key to success. Data mining is quickly becoming integral to creating value and business momentum. The ability to detect unseen patterns hidden in the numbers exhaustively generated by day-to-day operations allows savvy decision-makers to exploit every tool at their disposal in the pursuit of better business. By creating models and testing whether patterns hold up, it is possible to discover new intelligence that could change your business's entire paradigm for a more successful outcome. Data Mining for Dummies shows you why it doesn't take a data scientist to gain
Finding Gold in Data Mining

Science.gov (United States)

Flaherty, Bill

2013-01-01

Data-mining systems provide a variety of opportunities for school district personnel to streamline operations and focus on student achievement. This article describes the value of data mining for school personnel, finance departments, teacher evaluations, and in the classroom. It suggests that much could be learned about district practices if one…
Exploring the Integration of Data Mining and Data Visualization

Science.gov (United States)

Zhang, Yi

2011-01-01

Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be…
Data Mining and Privacy of Social Network Sites' Users: Implications of the Data Mining Problem.

Science.gov (United States)

Al-Saggaf, Yeslam; Islam, Md Zahidul

2015-08-01

This paper explores the potential of data mining as a technique that could be used by malicious data miners to threaten the privacy of social network sites (SNS) users. It applies a data mining algorithm to a real dataset to provide empirically-based evidence of the ease with which characteristics about the SNS users can be discovered and used in a way that could invade their privacy. One major contribution of this article is the use of the decision forest data mining algorithm (SysFor) to the context of SNS, which does not only build a decision tree but rather a forest allowing the exploration of more logic rules from a dataset. One logic rule that SysFor built in this study, for example, revealed that anyone having a profile picture showing just the face or a picture showing a family is less likely to be lonely. Another contribution of this article is the discussion of the implications of the data mining problem for governments, businesses, developers and the SNS users themselves.
Data mining in e-commerce: A survey

Indian Academy of Sciences (India)

R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

it is only apposite to seek the services of data mining to make (business) sense out of these data sets. Data mining ..... for the simple reason that for practical purposes, it is sufficient to include snapshots of data taken at say, weekly ..... of the mining environment and the expenses the user is willing to incur). The authors have.
Data Mining Mining Data: MSHA Enforcement Efforts, Underground Coal Mine Safety, and New Health Policy Implications

OpenAIRE

Thomas J. Kniesner; John D. Leeth

2003-01-01

Studies of industrial safety regulations, Occupational Safety and Health Administration (OSHA) in particular, often find little effect on worker safety. Critics of the regulatory approach argue that safety standards have little to do with industrial injuries and defenders of the regulatory approach cite infrequent inspections and low fines for violating safety standards. We use recently assembled data from the Mine Safety and Health Administration (MSHA) concerning underground coal mine produ...

Educational data mining and learning analytics

OpenAIRE

Vera Hernández, Joan Carles

2017-01-01

Work based on Educational Data Mining & Learning Analytics: an analysis of student enrollment and its impact on the decision to re-enroll.
MouseMine: a new data warehouse for MGI.

Science.gov (United States)

Motenko, H; Neuhauser, S B; O'Keefe, M; Richardson, J E

2015-08-01

MouseMine (www.mousemine.org) is a new data warehouse for accessing mouse data from Mouse Genome Informatics (MGI). Based on the InterMine software framework, MouseMine supports powerful query, reporting, and analysis capabilities, the ability to save and combine results from different queries, easy integration into larger workflows, and a comprehensive Web Services layer. Through MouseMine, users can access a significant portion of MGI data in new and useful ways. Importantly, MouseMine is also a member of a growing community of online data resources based on InterMine, including those established by other model organism databases. Adopting common interfaces and collaborating on data representation standards are critical to fostering cross-species data analysis. This paper presents a general introduction to MouseMine, presents examples of its use, and discusses the potential for further integration into the MGI interface.
A survey of temporal data mining

Indian Academy of Sciences (India)

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams ...
Data mining methods and applications

CERN Document Server

Lawrence, Kenneth D; Klimberg, Ronald K

2007-01-01

With today's information explosion, many organizations are now able to access a wealth of valuable data. Unfortunately, most of these organizations find they are ill-equipped to organize this information, let alone put it to work for them. Gain a Competitive Advantage: Employ data mining in research and forecasting. Build models with data management tools and methodology optimization. Gain sophisticated breakdowns and complex analysis through multivariate, evolutionary, and neural net methods. Learn how to classify data and maintain quality. Transform Data into Business Acumen: Data Mining Methods and
Data Mining for Intrusion Detection

Science.gov (United States)

Singhal, Anoop; Jajodia, Sushil

Data mining techniques have been successfully applied in many different fields, including marketing, manufacturing, fraud detection and network management. In recent years there has been a lot of interest in security technologies such as intrusion detection, cryptography, authentication and firewalls. This chapter discusses the application of data mining techniques to computer security. Conclusions are drawn and directions for future research are suggested.
Data Mining Tools for Malware Detection

CERN Document Server

Masud, Mehedy; Thuraisingham, Bhavani; Andreasson, Kim J

2011-01-01

Although the use of data mining for security and malware detection is quickly on the rise, most books on the subject provide high-level theoretical discussions to the near exclusion of the practical aspects. Breaking the mold, Data Mining Tools for Malware Detection provides a step-by-step breakdown of how to develop data mining tools for malware detection. Integrating theory with practical techniques and experimental results, it focuses on malware detection applications for email worms, malicious code, remote exploits, and botnets. The authors describe the systems they have designed and devel
A survey of temporal data mining

Indian Academy of Sciences (India)

other subtle relationships in the data using a combination of techniques from ... stamped list of items bought by customers lends itself to data mining analysis that ...... Frequent episode mining can be used here as part of an alarm management.
IT Data Mining Tool Uses in Aerospace

Science.gov (United States)

Monroe, Gilena A.; Freeman, Kenneth; Jones, Kevin L.

2012-01-01

Data mining has a broad spectrum of uses throughout the realms of aerospace and information technology. Each of these areas has useful methods for processing, distributing, and storing its corresponding data. This paper focuses on ways to leverage the data mining tools and resources used in NASA's information technology area to meet the similar data mining needs of aviation and aerospace domains. This paper details the searching, alerting, reporting, and application functionalities of the Splunk system, used by NASA's Security Operations Center (SOC), and their potential shared solutions to address aircraft and spacecraft flight and ground systems data mining requirements. This paper also touches on capacity and security requirements when addressing sizeable amounts of data across a large data infrastructure.
Applications of Data Mining in Higher Education

OpenAIRE

Monika Goyal; Rajan Vohra

2012-01-01

Data analysis plays an important role in decision support irrespective of the type of industry, whether a manufacturing unit or an education system. There are many domains in which data mining techniques play an important role. This paper proposes the use of data mining techniques to improve the efficiency of higher education institutions. If data mining techniques such as clustering, decision trees and association are applied to higher education processes, it would help to improve students performa...
Comparative analysis of data mining techniques for business data

Science.gov (United States)

Jamil, Jastini Mohd; Shaharanee, Izwan Nizal Mohd

2014-12-01

Data mining is the process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data contained within a database. Companies are using this tool to further understand their customers, to design targeted sales and marketing campaigns, to predict what products customers will buy and the frequency of purchase, and to spot trends in customer preferences that can lead to new product development. In this paper, we take a systematic approach to exploring several data mining techniques in business applications. The experimental results reveal that all the data mining techniques accomplish their goals perfectly, but each technique has its own characteristics and specifications that demonstrate its accuracy, proficiency and preference.
Software tool for data mining and its applications

Science.gov (United States)

Yang, Jie; Ye, Chenzhou; Chen, Nianyi

2002-03-01

A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and in the diagnosis of brain glioma.
Utility Independent Privacy Preserving Data Mining - Horizontally Partitioned Data

Directory of Open Access Journals (Sweden)

E Poovammal

2010-06-01

Micro data is a valuable source of information for research. However, publishing data about individuals for research purposes, without revealing sensitive information, is an important problem. The main objective of privacy preserving data mining algorithms is to obtain accurate results/rules by analyzing the maximum possible amount of data without unintended information disclosure. Data sets for analysis may be in a centralized server or in a distributed environment. In a distributed environment, the data may be horizontally or vertically partitioned. We have developed a simple technique by which horizontally partitioned data can be used for any type of mining task without information loss. The partitioned sensitive data at 'm' different sites are transformed using a mapping table or graded grouping technique, depending on the data type. This transformed data set is given to a third party for analysis. This may not be a trusted party, but it is still allowed to perform mining operations on the data set and to release the results to all the 'm' parties. The results are interpreted among the 'm' parties involved in the data sharing. The experiments conducted on real data sets prove that our proposed simple transformation procedure preserves one hundred percent of the performance of any data mining algorithm as compared to the original data set while preserving privacy.
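The mapping-table transformation described above might look roughly like the following. This is a sketch under stated assumptions rather than the authors' procedure: the abstract does not say how the mapping table is built, so `build_mapping`, the toy records, and the use of frequency counting as the mining task are hypothetical, and the graded grouping technique is not covered.

```python
from collections import Counter

def build_mapping(values):
    """Hypothetical mapping table: one opaque code per distinct sensitive value."""
    return {value: f"code_{i}" for i, value in enumerate(sorted(set(values)))}

def transform(rows, mapping):
    """Replace the sensitive first field of each row with its code."""
    return [(mapping[sensitive], other) for sensitive, other in rows]

# Horizontally partitioned toy data: same columns, different rows at each site.
site_a = [("flu", "recovered"), ("asthma", "recovered")]
site_b = [("flu", "readmitted"), ("flu", "recovered")]

# Assumption: the sites agree on the mapping among themselves and keep it private.
mapping = build_mapping(
    diagnosis for site in (site_a, site_b) for diagnosis, _ in site
)
inverse = {code: value for value, code in mapping.items()}

# The third party sees only transformed rows and runs the mining task on them.
pooled = transform(site_a, mapping) + transform(site_b, mapping)
mined = Counter(pooled)

# The sites interpret the released result by applying the inverse mapping.
decoded = Counter(
    {(inverse[code], other): count for (code, other), count in mined.items()}
)
```

Because the mapping is one-to-one, the counts the third party computes on the coded rows are the same counts it would have obtained on the original rows, which is the sense in which this kind of transformation loses no information for the mining task.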
Data preprocessing for data mining

OpenAIRE

Ren, Yifei

2013-01-01

People have increasing amounts of data in the current prosperous information age. In order to improve competitive power and work efficiency, discovering knowledge from data is becoming more and more important. Data mining, as an emerging interdisciplinary applications field, plays a significant role in decision making in various trades and industries. However, it is known that original data is always dirty and not suitable for further analysis, which has become a major obstacle of finding knowled...
Physics Mining of Multi-Source Data Sets

Science.gov (United States)

Helly, John; Karimabadi, Homa; Sipes, Tamara

2012-01-01

Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures than ever before of environmental parameters by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as Artificial Neural Nets, which yield a blackbox solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, handle multi-type data sets, and parallelize it.
Big data mining analysis method based on cloud computing

Science.gov (United States)

Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

2017-08-01

In the era of information explosion, big data's very large scale and its discrete, unstructured or semi-structured nature have gone far beyond what traditional data management can handle. Cloud computing provides a new technical approach to mining massive data and can effectively address the inability of traditional data mining methods to adapt to it. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs an association rule mining algorithm based on the MapReduce parallel processing architecture, and verifies it experimentally. The parallel association rule mining algorithm running on a cloud computing platform can greatly improve the execution speed of data mining.
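The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of how association rule support counting can be phrased as a map step and a reduce step. The function names (`map_transaction`, `reduce_counts`, `frequent_itemsets`, `rule_confidence`) and the toy transactions are illustrative assumptions, not the authors' design; a real MapReduce job would distribute the map calls across workers.

```python
from collections import Counter
from itertools import combinations


def map_transaction(transaction, max_size=2):
    """Map step: emit (itemset, 1) for every itemset of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each itemset key."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return totals


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets whose support meets min_support."""
    emitted = (pair for t in transactions for pair in map_transaction(t))
    totals = reduce_counts(emitted)
    n = len(transactions)
    return {k: v / n for k, v in totals.items() if v / n >= min_support}


def rule_confidence(supports, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    union = tuple(sorted(set(antecedent) | set(consequent)))
    return supports[union] / supports[tuple(sorted(antecedent))]


# Hypothetical toy transactions, used only to exercise the sketch.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
supports = frequent_itemsets(transactions, min_support=0.5)
confidence = rule_confidence(supports, ["bread"], ["milk"])
```

Because each transaction is mapped independently and the reduce step only sums counts per key, the map calls can in principle run in parallel, which is the property the abstract relies on for speed.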
Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

Directory of Open Access Journals (Sweden)

Knaus William A

2006-03-01

Full Text Available Abstract. Background: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. Methods: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. Results: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. Conclusion: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of
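The abstract describes the surprise score only qualitatively (a high score when the database and knowledgebase strength estimates disagree). As a rough illustration, the sketch below assumes the score is the absolute difference between the two estimates; that definition, the function names, and the example triples are all hypothetical, and the p-value step mentioned in the abstract is not reproduced.

```python
def surprise_score(db_strength, kb_strength):
    """Absolute gap between the two strength estimates (an assumed definition)."""
    return abs(db_strength - kb_strength)


def rank_by_surprise(patterns, threshold):
    """Keep patterns whose estimates differ by at least `threshold`,
    ordered from most to least surprising."""
    scored = [(name, surprise_score(db, kb)) for name, db, kb in patterns]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical (pattern, database strength, knowledgebase strength) triples.
patterns = [
    ("disease A ~ lab X", 0.90, 0.85),  # both sources agree: unsurprising
    ("disease B ~ lab Y", 0.80, 0.10),  # strong in data, weak in literature
    ("disease C ~ lab Z", 0.20, 0.60),  # weak in data, stronger in literature
]
ranked = rank_by_surprise(patterns, threshold=0.3)
```

Patterns on which both sources agree fall below the threshold and are pruned, which mirrors the abstract's claim that most strongly associated patterns turn out to be uninteresting.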
Possibility of Integrated Data Mining of Clinical Data

Directory of Open Access Journals (Sweden)

Akinori Abe

2007-03-01

Full Text Available In this paper, we introduce integrated data mining. Because of recent rapid progress in medical science as well as clinical diagnosis and treatment, integrated and cooperative research among medical researchers, biology, engineering, cultural science, and sociology is required. Therefore, we propose a framework called Cyber Integrated Medical Infrastructure (CIMI). Within this framework, we can deal with various types of data and consequently need to integrate those data prior to analysis. In this study, for medical science, we analyze the features and relationships among various types of data and show the possibility of integrated data mining.
Data Mining and Analysis

Science.gov (United States)

Samms, Kevin O.

2015-01-01

The Data Mining project seeks to bring the capability of data visualization to NASA anomaly and problem reporting systems for the purpose of improving data trending, evaluations, and analyses. Currently NASA systems are tailored to meet the specific needs of its organizations. This tailoring has led to a variety of nomenclatures and levels of annotation for procedures, parts, and anomalies making difficult the realization of the common causes for anomalies. Making significant observations and realizing the connection between these causes without a common way to view large data sets is difficult to impossible. In the first phase of the Data Mining project a portal was created to present a common visualization of normalized sensitive data to customers with the appropriate security access. The tool of the visualization itself was also developed and fine-tuned. In the second phase of the project we took on the difficult task of searching and analyzing the target data set for common causes between anomalies. In the final part of the second phase we have learned more about how much of the analysis work will be the job of the Data Mining team, how to perform that work, and how that work may be used by different customers in different ways. In this paper I detail how our perspective has changed after gaining more insight into how the customers wish to interact with the output and how that has changed the product.
Mining Product Data Models: A Case Study

Directory of Open Access Journals (Sweden)

Cristina-Claudia DOLEAN

2014-01-01

Full Text Available This paper presents two case studies used to prove the validity of some data-flow mining algorithms. We proposed the data-flow mining algorithms because most mining algorithms focus on the control-flow perspective. The first case study uses event logs generated by an ERP system (Navision) after we set several trackers on the data elements needed in the process analyzed, while the second case study uses the event logs generated by the YAWL system. We offer a general solution for data-flow model extraction from different data sources. To apply the data-flow mining algorithms, the event logs must comply with a certain format (using the InputOutput extension), and meeting this format requires a set of conversion tools. We describe the conversion tools used and how we obtained the data-flow models. Moreover, the data-flow model is compared to the control-flow model.
Process mining : data science in action

NARCIS (Netherlands)

Van der Aalst, W.M.P.

2016-01-01

This is the second edition of Wil van der Aalst’s seminal book on process mining, which now discusses the field also in the broader context of data science and big data approaches. It includes several additions and updates, e.g. on inductive mining techniques, the notion of alignments, a

Data Mining and Machine Learning in Astronomy

Science.gov (United States)

Ball, Nicholas M.; Brunner, Robert J.

We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
DATA MINING AND APPLICATION OF IT TO CAPITAL MARKETS

Directory of Open Access Journals (Sweden)

Cenk AKKAYA

2011-07-01

Full Text Available With the development of technology, the importance given to knowledge is gradually increasing. Data mining makes it possible to build forecasts and models of the future from past data. Any method that helps to discover data can be used as a data mining method. Enterprises gain an important competitive advantage through data mining methods. Data mining is used in different fields. In finance, it is used especially in financial performance applications, predicting enterprise bankruptcies and failures, detecting transaction manipulation, financial risk management, determining customer profiles and depth management. Gaining knowledge can be costly, risky and time consuming for enterprises. Thus today enterprises use data mining as an innovative competitive means. The aim of the study is to determine the importance of data mining applications to capital markets.
Data Mining Practical Machine Learning Tools and Techniques

CERN Document Server

Witten, Ian H; Hall, Mark A

2011-01-01

Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place
Data Mining and Statistics for Decision Making

CERN Document Server

Tufféry, Stéphane

2011-01-01

Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized lin
4D seismic data acquisition method during coal mining

International Nuclear Information System (INIS)

Du, Wen-Feng; Peng, Su-Ping

2014-01-01

In order to observe overburden media changes caused by mining processing, we take the fully-mechanized working face of the BLT coal mine in Shendong mine district as an example to develop a 4D seismic data acquisition methodology during coal mining. The 4D seismic data acquisition is implemented to collect 3D seismic data four times in different periods, such as before mining, during the mining process and after mining to observe the changes of the overburden layer during coal mining. The seismic data in the research area demonstrates that seismic waves are stronger in energy, higher in frequency and have better continuous reflectors before coal mining. However, all this is reversed after coal mining because the overburden layer has been mined, the seismic energy and frequency decrease, and reflections have more discontinuities. Comparing the records collected in the survey with those from newly mined areas and other records acquired in the same survey with the same geometry and with a long time for settling after mining, it clearly shows that the seismic reflections have stronger amplitudes and are more continuous because the media have recovered by overburden layer compaction after a long time of settling after mining. By 4D seismic acquisition, the original background investigation of the coal layers can be derived from the first records, then the layer structure changes can be monitored through the records of mining action and compaction action after mining. This method has laid the foundation for further research into the variation principles of the overburden layer under modern coal-mining conditions. (paper)
Data Mining and Data Fusion for Enhanced Decision Support

Energy Technology Data Exchange (ETDEWEB)

Khan, Shiraj [ORNL; Ganguly, Auroop R [ORNL; Gupta, Amar [University of Arizona

2008-01-01

The process of Data Mining converts information to knowledge by utilizing tools from the disciplines of computational statistics, database technologies, machine learning, signal processing, nonlinear dynamics, process modeling, simulation, and allied disciplines. Data Mining allows business problems to be analyzed from diverse perspectives, including dimensionality reduction, correlation and co-occurrence, clustering and classification, regression and forecasting, anomaly detection, and change analysis. The predictive insights generated from Data Mining can be further utilized through real-time analysis and decision sciences, as well as through human-driven analysis based on management by exceptions or by objectives, to generate actionable knowledge. The tools that enable the transformation of raw data to actionable predictive insights are collectively referred to as Decision Support tools. This chapter presents a new formalization of the decision process, leading to a new Decision Superiority model, partially motivated by the Joint Directors of Laboratories (JDL) Data Fusion Model. In addition, it examines the growing importance of Data Fusion concepts.
The Top Ten Algorithms in Data Mining

CERN Document Server

Wu, Xindong

2009-01-01

From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc
Research on Customer Value Based on Extension Data Mining

Science.gov (United States)

Chun-Yan, Yang; Wei-Hua, Li

Extenics is a new discipline for dealing with contradiction problems using formalized models. Extension data mining (EDM) combines Extenics with data mining. It seeks to acquire knowledge based on extension transformations, called extension knowledge (EK), by taking advantage of extension methods and data mining technology. EK includes extensible classification knowledge, conductive knowledge and so on. Extension data mining technology (EDMT) is a new data mining technology that mines EK in databases or data warehouses. Customer value (CV) can weigh the essentiality of the customer relationship for an enterprise, treating the enterprise as a subject of tasting value and customers as objects of tasting value at the same time. CV varies continually. Mining the changing knowledge of CV in databases using EDMT, including quantitative change knowledge and qualitative change knowledge, can provide a foundation for an enterprise to decide its customer relationship management (CRM) strategy. It can also provide a new idea for studying CV.
Advanced Data Mining of Leukemia Cells Micro-Arrays

Directory of Open Access Journals (Sweden)

Richard S. Segall

2009-12-01

Full Text Available This paper continues and extends previous research by Segall and Pierce (2009a) that discussed data mining for micro-array databases of Leukemia cells, primarily with self-organized maps (SOM). As in Segall and Pierce (2009a) and Segall and Pierce (2009b), the results of applying data mining are shown and discussed for the data categories of microarray databases of HL60, Jurkat, NB4 and U937 Leukemia cells that are also described in this article. First, a background section is provided on the work of others pertaining to the applications of data mining to micro-array databases of Leukemia cells and micro-array databases in general. As noted in the predecessor article by Segall and Pierce (2009a), micro-array databases are one of the most popular functional genomics tools in use today. The research in this paper is intended to use advanced data mining technologies for better interpretations and knowledge discovery as generated by the patterns of gene expressions of HL60, Jurkat, NB4 and U937 Leukemia cells. The advanced data mining performed entailed using other data mining tools such as cubic clustering criterion, variable importance rankings, decision trees, and more detailed examinations of data mining statistics and study of other self-organized maps (SOM) clustering regions of workspace as generated by SAS Enterprise Miner version 4. Conclusions and future directions of the research are also presented.
Data mining applications in the context of casemix.

Science.gov (United States)

Koh, H C; Leong, S K

2001-07-01

In October 1999, the Singapore Government introduced casemix-based funding to public hospitals. The casemix approach to health care funding is expected to yield significant benefits, including equity and rationality in financing health care, the use of comparative casemix data for quality improvement activities, and the provision of information that enables hospitals to understand their cost behaviour and reinforces the drive for more cost-efficient services. However, there is some concern about the "quicker and sicker" syndrome (that is, the rapid discharge of patients with little regard for the quality of outcome). As it is likely that consequences of premature discharges will be reflected in the readmission data, an analysis of possible systematic patterns in readmission data can provide useful insight into the "quicker and sicker" syndrome. This paper explores potential data mining applications in the context of casemix by using readmission data as an illustration. In particular, it illustrates how data mining can be used to better understand readmission data and to detect systematic patterns, if any. From a technical perspective, data mining (which is capable of analysing complex non-linear and interaction relationships) supplements and complements traditional statistical methods in data analysis. From an applications perspective, data mining provides the technology and methodology to analyse mass volume of data to detect hidden patterns in data. Using readmission data as an illustrative data mining application, this paper explores potential data mining applications in the general casemix context.
Recurrent process mining with live event data

NARCIS (Netherlands)

Syamsiyah, A.; van Dongen, B.F.; van der Aalst, W.M.P.; Teniente, E.; Weidlich, M.

2018-01-01

In organizations, process mining activities are typically performed in a recurrent fashion, e.g. once a week, an event log is extracted from the information systems and a process mining tool is used to analyze the process’ characteristics. Typically, process mining tools import the data from a
Application and Exploration of Big Data Mining in Clinical Medicine.

Science.gov (United States)

Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

2016-03-20

To review theories and technologies of big data mining and their application in clinical medicine. Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.
Robust processing of mining subsidence monitoring data

Energy Technology Data Exchange (ETDEWEB)

Mingzhong, Wang; Guogang, Huang [Pingdingshan Mining Bureau (China); Yunjia, Wang; Guogangli, [China Univ. of Mining and Technology, Xuzhou (China)

1997-12-31

Since China began research on mining subsidence in the 1950s, more than one thousand lines have been observed. Yet monitoring data sometimes contain many outliers because of the limits of observation and of geological and mining conditions. In China, nowadays, mining subsidence monitoring data are processed on the principle of the least squares method, which can produce lower accuracy, less reliability, or even errors. For the reasons given above, the authors, in view of the actual situation in China, have done research on the robust processing of mining subsidence monitoring data with respect to obtaining prediction parameters. The authors have derived related formulas, designed computational programmes, carried out a large amount of actual calculation and simulation, and achieved good results. (orig.)
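The abstract does not say which robust estimator the authors derived. Purely to illustrate why least squares is sensitive to outliers and how a robust alternative behaves, the sketch below fits a straight line with ordinary least squares and with iteratively reweighted least squares using Huber weights. The choice of Huber weights, the tuning constant 1.345, the function names, and the synthetic data are all assumptions for illustration.

```python
from statistics import median


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line_fit(xs, ys, k=1.345, iterations=20):
    """Iteratively reweighted least squares with Huber weights."""
    ws = [1.0] * len(xs)
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, ws)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Robust scale estimate from the median absolute residual.
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break  # at least half the points are fitted exactly; stop here
        # Full weight for small residuals, reduced weight for large ones.
        ws = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
              for r in residuals]
    return weighted_line_fit(xs, ys, ws)


# Synthetic observations on the line y = 1 + 2x, with one gross outlier.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[9] = 60  # the true value would be 19

ols_intercept, ols_slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
robust_intercept, robust_slope = huber_line_fit(xs, ys)
```

In this toy case the single outlier pulls the ordinary least-squares slope well away from 2, while the reweighted fit gives that observation less influence and stays closer to the underlying line.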
Robust processing of mining subsidence monitoring data

Energy Technology Data Exchange (ETDEWEB)

Wang Mingzhong; Huang Guogang [Pingdingshan Mining Bureau (China); Wang Yunjia; Guogangli [China Univ. of Mining and Technology, Xuzhou (China)

1996-12-31

Since China began research on mining subsidence in the 1950s, more than one thousand lines have been observed. Yet monitoring data sometimes contain many outliers because of the limits of observation and of geological and mining conditions. In China, nowadays, mining subsidence monitoring data are processed on the principle of the least squares method, which can produce lower accuracy, less reliability, or even errors. For the reasons given above, the authors, in view of the actual situation in China, have done research on the robust processing of mining subsidence monitoring data with respect to obtaining prediction parameters. The authors have derived related formulas, designed computational programmes, carried out a large amount of actual calculation and simulation, and achieved good results. (orig.)
Mining and Integration of Environmental Data

Science.gov (United States)

Tran, V.; Hluchy, L.; Habala, O.; Ciglan, M.

2009-04-01

The ADMIRE project (Advanced Data Mining and Integration Research for Europe) is a 7th FP EU ICT project that aims to deliver a consistent and easy-to-use technology for extracting information and knowledge. The project is motivated by the difficulty of extracting meaningful information by data mining combinations of data from multiple heterogeneous and distributed resources. It will also provide an abstract view of data mining and integration, which will give users and developers the power to cope with the complexity and heterogeneity of services, data and processes. The data sets describing phenomena from domains like business, society, and environment often contain spatial and temporal dimensions. Integration of spatio-temporal data from different sources is a challenging task due to those dimensions. Different spatio-temporal data sets contain data at different resolutions (e.g. size of the spatial grid) and frequencies. This heterogeneity is the principal challenge of integrating geo-spatial and temporal data sets: the integrated data set should hold homogeneous data of the same resolution and frequency. Thus, to integrate heterogeneous spatio-temporal data from distinct sources, transformation of one or more data sets is necessary. The following transformation operations are required:
• transformation to a common spatial and temporal representation (e.g. transformation to a common coordinate system),
• spatial and/or temporal aggregation: data from a detailed data source are aggregated to match the resolution of the other resources involved in the integration process,
• spatial and/or temporal record decomposition: records from a source with lower resolution data are decomposed to match the granularity of the other data source. This operation decreases data quality (e.g. transformation of data from a 50 km grid to a 10 km grid): data from the lower resolution data set in the integrated schema are imprecise, but it allows us to preserve higher resolution data.
We can decompose the
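The aggregation and decomposition operations described in this record can be sketched on a small regular grid. The code below is a minimal illustration, assuming square blocks, grid dimensions divisible by the block factor, averaging as the aggregation rule, and simple replication as the decomposition rule; none of these choices, nor the function names, come from the ADMIRE project itself.

```python
def aggregate(grid, factor):
    """Average factor x factor blocks of a fine grid to get a coarser grid.
    Assumes both grid dimensions are divisible by `factor`."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def decompose(grid, factor):
    """Replicate each coarse cell into factor x factor fine cells.
    The fine values are imprecise: every cell in a block gets the same value."""
    fine = []
    for row in grid:
        expanded = [value for value in row for _ in range(factor)]
        fine.extend(list(expanded) for _ in range(factor))
    return fine


# Hypothetical 4 x 4 fine grid (for example, cells of 10 km).
fine_grid = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
coarse_grid = aggregate(fine_grid, 2)    # 2 x 2 grid of block averages
rebuilt_grid = decompose(coarse_grid, 2)  # back to 4 x 4, detail lost
```

Aggregating and then decomposing does not recover the original fine grid, which illustrates the loss of precision that the record attributes to decomposition.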
On data mining in context : cases, fusion and evaluation

NARCIS (Netherlands)

Putten, Petrus Wilhelmus Henricus van der

2010-01-01

Data mining can be seen as a process, with modeling as the core step. However, other steps such as planning, data preparation, evaluation and deployment are of key importance for applications. This thesis studies data mining in the context of these other steps with the goal of improving data mining
Warehousing Structured and Unstructured Data for Data Mining.

Science.gov (United States)

Miller, L. L.; Honavar, Vasant; Barta, Tom

1997-01-01

Describes an extensible object-oriented view system that supports the integration of both structured and unstructured data sources in either the multidatabase or data warehouse environment. Discusses related work and data mining issues. (AEF)
WEKA-G: Parallel data mining on computational grids

Directory of Open Access Journals (Sweden)

PIMENTA, A.

2009-12-01

Full Text Available Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires high computational power. To resolve this problem, this paper presents a tool (Weka-G) that runs the algorithms used in the data mining process in parallel. As the environment for doing so, we use a computational grid by adding several features within a WAN.
Application and Exploration of Big Data Mining in Clinical Medicine

Science.gov (United States)

Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

2016-01-01

Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378
Usage of Data Mining at Financial Decision Making

Directory of Open Access Journals (Sweden)

Levent BORAN

2014-06-01

Full Text Available The knowledge age requires controlling every kind of information. Recognition of patterns in data may provide previously unknown and useful information that can provide competitive advantages. If related techniques are applied to financial statements, it is possible to acquire valuable information about companies’ financial situations. It is considered that data mining could be an alternative to common financial analysis techniques such as vertical analysis, horizontal analysis, trend analysis and ratio analysis. Compared with existing financial analysis methods, data mining provides some advantages, namely the ability to handle huge volumes of data and to obtain previously unknown information. There are two major constraints on data mining implementation: the lack of experts in both data mining and the related domains, and the cost of the computer software and hardware used.

Ensemble Data Mining Methods

Data.gov (United States)

National Aeronautics and Space Administration — Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve...
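As a small illustration of what a model combiner does, the sketch below implements majority voting over several classifiers. The three threshold rules are hypothetical stand-ins for trained models, and majority voting is only one of several ways to combine models; the record does not say which combiners it covers.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical weak classifiers over a two-feature input.
models = [
    lambda x: "high" if x[0] > 5 else "low",
    lambda x: "high" if x[1] > 5 else "low",
    lambda x: "high" if x[0] + x[1] > 12 else "low",
]

prediction_a = majority_vote(models, (6, 7))  # all three models say "high"
prediction_b = majority_vote(models, (6, 2))  # only the first model says "high"
```

Using an odd number of models with two possible labels avoids ties; with more labels or an even number of models, a tie-breaking rule would have to be chosen.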
A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases.

Science.gov (United States)

Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva

2015-11-01

It is known that the data preparation phase is the most time consuming in the data mining process, using up to 50% or even up to 70% of the total project time. Currently, data mining methodologies are general purpose, and one of their limitations is that they do not provide guidance on which particular tasks to carry out in a specific domain. This paper presents a new data preparation methodology oriented to the epidemiological domain in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks concerning this domain. To validate the proposed methodology, we developed a data mining system and the entire process was applied to real mortality databases. The results were encouraging because it was observed that the use of the methodology reduced some of the time consuming tasks and the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico.
Spatiotemporal Data Mining: A Computational Perspective

Directory of Open Access Journals (Sweden)

Shashi Shekhar

2015-10-01

Full Text Available Explosive growth in geospatial and temporal data as well as the emergence of new technologies emphasize the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful patterns from large spatiotemporal databases. It has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. The complexity of spatiotemporal data and intrinsic relationships limits the usefulness of conventional data science techniques for extracting spatiotemporal patterns. In this survey, we review recent computational techniques and tools in spatiotemporal data mining, focusing on several major pattern families: spatiotemporal outlier, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change detection. Compared with other surveys in the literature, this paper emphasizes the statistical foundations of spatiotemporal data mining and provides comprehensive coverage of computational approaches for various pattern families. We also list popular software tools for spatiotemporal data analysis. The survey concludes with a look at future research needs. (ISPRS Int. J. Geo-Inf. 2015, 4 2307)
Using Data Mining to Teach Applied Statistics and Correlation

Science.gov (United States)

Hartnett, Jessica L.

2016-01-01

This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
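The kind of correlation the activities ask students to compute can be sketched in a few lines of Python. This is only an illustration, not the article's classroom material: the function name `pearson_r` and the sample values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied vs. quiz score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)
```

A value of `r` near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.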
Data Mining at NASA: From Theory to Applications

Science.gov (United States)

Srivastava, Ashok N.

2009-01-01

This slide presentation, prepared for Knowledge Discovery and Data Mining 2009, demonstrates the data mining and machine learning capabilities of NASA Ames and its Intelligent Data Understanding (IDU) group, covering work done recently by various group members. The IDU group develops novel algorithms to detect, classify, and predict events in large data streams for scientific and engineering systems.
Data mining for the social sciences an introduction

CERN Document Server

Attewell, Paul

2015-01-01

We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining
Privacy Preserving Distributed Data Mining

Data.gov (United States)

National Aeronautics and Space Administration — Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring...
Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

CERN Document Server

Ratner, Bruce

2011-01-01

The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has
Improving diagnostic accuracy using agent-based distributed data mining system.

Science.gov (United States)

Sridhar, S

2013-09-01

The use of data mining techniques to improve the diagnostic system accuracy is investigated in this paper. The data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Generally, the expert systems are constructed for automating diagnostic procedures. The learning component uses the data mining algorithms to extract the expert system rules from the database automatically. Learning algorithms can assist the clinicians in extracting knowledge automatically. As the number and variety of data sources is dramatically increasing, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from data. As data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge and data mining knowledge improves the performance of the diagnostic system. This work suggests a framework of combining the human knowledge and knowledge gained by better data mining algorithms on a renal and gallstone data set.
PROGRAMS WITH DATA MINING CAPABILITIES

Directory of Open Access Journals (Sweden)

Ciobanu Dumitru

2012-03-01

Full Text Available The fact that the Internet has become a commodity around the world has created a framework for a new economy. Traditional businesses migrate to this new environment, which offers many features and options at relatively low prices. However, competition is fierce, and a successful Internet business depends on rigorous use of all available information. This information is often hidden in data, and retrieving it requires software capable of applying data mining algorithms and techniques. In this paper we review some of the programs with data mining capabilities currently available in this area. We also propose some classifications of this software to assist those who wish to use it.
Data Mining Solutions for the Business Environment

OpenAIRE

Ruxandra-Stefania PETRE

2013-01-01

Over the past years, data mining has become a matter of considerable importance due to the large amounts of data available in applications belonging to various domains. Data mining is a dynamic and fast-expanding field that applies advanced data analysis techniques from statistics, machine learning, database systems or artificial intelligence in order to discover relevant patterns, trends and relations contained within the data, information that is impossible to observe using other techniques. The p...
Data Analysis and Data Mining: Current Issues in Biomedical Informatics

Science.gov (United States)

Bellazzi, Riccardo; Diomidous, Marianna; Sarkar, Indra Neil; Takabayashi, Katsuhiko; Ziegler, Andreas; McCray, Alexa T.

2011-01-01

Summary. Background: Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. Objectives: To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. Methods: On the occasion of the 50th year of Methods of Information in Medicine a symposium was organized that reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. Results: The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology. Conclusions: Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers. PMID:22146916
Multipass mining sequence room closures: In situ data report

International Nuclear Information System (INIS)

Munson, D.E.; Jones, R.L.; Northrop-Salazar, C.L.; Woerner, S.J.

1992-12-01

During the construction of the Thermal/Structural In Situ Test Rooms at the Waste Isolation Pilot Plant (WIPP) facility, measurements of the salt displacements were obtained at very early times, essentially concurrent with the mining activity. This was accomplished by emplacing manually read closure gage stations directly at the mining face, actually between the face and the mining machine, immediately upon mining of the intended gage location. Typically, these mining sequence closure measurements were taken within one hour of mining of the location and within one meter of the mining face. Readings were taken at these gage stations as the multipass mining continued, with the gage station reestablished as each successive mining pass destroyed the earlier gage points. Data reduction yields the displacement history during the mining operation. These early mining sequence closure data, when combined with the later data of the permanently emplaced closure gages, give the total time-dependent closure displacements of the test rooms. This complete closure history is an essential part of assuring that the in situ test databases will provide an adequate basis for validation of the predictive technology of salt creep behavior, as required by the WIPP technology development program for disposal of radioactive waste in bedded salt.
Educational data mining applications and trends

CERN Document Server

2014-01-01

This book is devoted to the Educational Data Mining arena. It highlights works that show relevant proposals, developments, and achievements that shape trends and inspire future research. After a rigorous revision process sixteen manuscripts were accepted and organized into four parts as follows: · Profile: The first part comprises three chapters that: 1) describe the nature of educational data mining (EDM); 2) describe how to pre-process raw data to facilitate data mining (DM); 3) explain how EDM supports government policies to enhance education. · Student modeling: The second part contains five chapters that: 4) explore the factors having an impact on students' academic success; 5) detect students' personality and behaviors in an educational game; 6) predict students' performance to adjust content and strategies; 7) identify students who will most benefit from tutor support; 8) hypothesize the correctness of a student's answer based on eye metrics and mouse clicks. · As...
Data mining in healthcare: decision making and precision

Directory of Open Access Journals (Sweden)

Ionuţ ŢĂRANU

2016-05-01

Full Text Available The application of data mining in healthcare is increasing because the health sector is rich in information and data mining has become a necessity. Healthcare organizations generate and collect large volumes of information on a daily basis. Information technology enables the automation of data mining and knowledge discovery, which helps surface interesting patterns; this means eliminating manual tasks, extracting data easily and directly from electronic records, using electronic transfer systems that secure medical records, saving lives, and reducing the cost of medical services, as well as enabling early detection of infectious diseases on the basis of advanced data collection. Data mining can enable healthcare organizations to anticipate trends in a patient's medical condition and behaviour by analysing different perspectives and by making connections between seemingly unrelated information. The raw data from healthcare organizations are voluminous and heterogeneous. They need to be collected and stored in an organized form, and their integration allows the formation of a unified medical information system. Data mining in health offers unlimited possibilities for analyzing data patterns that are less visible or hidden to common analysis techniques. These patterns can be used by healthcare practitioners to make forecasts, make diagnoses, and set treatments for patients in healthcare organizations.
Large Data Set Mining

NARCIS (Netherlands)

Leemans, I.B.; Broomhall, Susan

2017-01-01

Digital emotion research has yet to make history. Until now large data set mining has not been a very active field of research in early modern emotion studies. This is indeed surprising since first, the early modern field has such rich, copyright-free, digitized data sets and second, emotion studies
Design of data warehouse in teaching state based on OLAP and data mining

Science.gov (United States)

Zhou, Lijuan; Wu, Minhua; Li, Shuang

2009-04-01

Data warehousing and data mining are among the hot topics of information technology research. At present these technologies are widely applied in commerce, the financial industry, enterprise production and marketing, but relatively little in education. Over the years, colleges and universities have accumulated large amounts of teaching and management data that cannot be used effectively. In light of the social demands on university development and the current state of data management, it is particularly important to establish a data warehouse of university teaching state, make better use of existing data, and on that basis carry out higher-level processing, namely data mining. In this paper, starting from decision-making needs, we design the data warehouse structure for university teaching state, then create a data warehouse model through structure design and data extraction, loading and conversion, and finally apply an association rule mining algorithm to obtain results that can be used in practice. The data analysis and mining yield much valuable information that can guide teaching management, thereby improving the quality of teaching, promoting commitment to teaching in universities, and enhancing teaching infrastructure. It can also provide detailed, multi-dimensional information for university assessment and higher education research.
Expressive power of an algebra for data mining

NARCIS (Netherlands)

Calders, T.; Lakshmanan, L.V.S.; Ng, R.T.; Paredaens, J.

2006-01-01

The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can
A Data Mining Classification Approach for Behavioral Malware Detection

Directory of Open Access Journals (Sweden)

Monire Norouzi

2016-01-01

Full Text Available Data mining techniques have numerous applications in malware detection. Classification is one of the most popular data mining techniques. In this paper we present a data mining classification approach to detect malware behavior. We propose different classification methods to detect malware based on the features and behavior of each malware sample. A dynamic analysis method is presented for identifying malware features, along with a program that converts an XML file recording a malware sample's execution history into suitable input for the WEKA tool. To illustrate performance, we apply the proposed approaches to training and test data from a real case study data set using the WEKA tool. The evaluation results demonstrate the viability of the proposed data mining approach. The proposed approach is also more efficient for detecting malware, and behavioral classification of malware can be useful for detecting malware in a behavioral antivirus.
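The XML-to-WEKA conversion step mentioned above can be sketched as follows. The abstract does not give the log schema, so the element and attribute names (`sample`, `call`, `name`, `count`, `label`) and the function `xml_to_arff` are assumptions made for illustration; only the ARFF layout (`@relation`, `@attribute`, `@data`) is WEKA's standard text format.

```python
import xml.etree.ElementTree as ET

# Hypothetical behavior log: element and attribute names are invented for this sketch.
SAMPLE_XML = """
<samples>
  <sample label="malware">
    <call name="CreateFile" count="3"/>
    <call name="RegSetValue" count="5"/>
  </sample>
  <sample label="benign">
    <call name="CreateFile" count="1"/>
  </sample>
</samples>
"""

def xml_to_arff(xml_text, relation="behavior"):
    """Turn per-sample API-call counts into ARFF text, one numeric attribute per call name."""
    root = ET.fromstring(xml_text)
    rows = []
    for sample in root.iter("sample"):
        counts = {c.get("name"): int(c.get("count")) for c in sample.iter("call")}
        rows.append((counts, sample.get("label")))
    # Attribute order is alphabetical so every data row uses the same columns.
    features = sorted({name for counts, _ in rows for name in counts})
    labels = sorted({label for _, label in rows})
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in features]
    lines.append("@attribute class {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for counts, label in rows:
        values = [str(counts.get(name, 0)) for name in features]  # absent call -> 0
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"

arff_text = xml_to_arff(SAMPLE_XML)
```

Writing `arff_text` to a `.arff` file would give a data set that WEKA classifiers can load, with the nominal `class` attribute as the prediction target.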
Data mining concepts and techniques

CERN Document Server

Han, Jiawei

2005-01-01

Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge. Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and app...

Data mining and business analytics with R

CERN Document Server

Ledolter, Johannes

2013-01-01

Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets. As a result, readers are provided with the needed guidance to model and interpret complicated data and become adept at building powerful models for prediction and classification. Highlighting both underlying concepts and practical computational skills, Data Mining
Solar Data Mining at Georgia State University

Science.gov (United States)

Angryk, R.; Martens, P. C.; Schuh, M.; Aydin, B.; Kempton, D.; Banda, J.; Ma, R.; Naduvil-Vadukootu, S.; Akkineni, V.; Küçük, A.; Filali Boubrahimi, S.; Hamdi, S. M.

2016-12-01

In this talk we give an overview of research projects related to solar data analysis that are conducted at Georgia State University. We will provide an update on multiple advances made by our research team in the analysis of image parameters, spatio-temporal pattern mining, temporal data analysis, and our experiences with big, heterogeneous solar data visualization, analysis, processing and storage. We will talk about up-to-date data mining methodologies and their importance for big data-driven solar physics research.
Data mining and knowledge discovery for big data methodologies, challenge and opportunities

CERN Document Server

2014-01-01

The field of data mining has made significant and far-reaching advances over the past three decades. Because of its potential power for solving complex problems, data mining has been successfully applied to diverse areas such as business, engineering, social media, and biological science. Many of these applications search for patterns in complex structural information. In biomedicine for example, modeling complex biological systems requires linking knowledge across many levels of science, from genes to disease. Further, the data characteristics of the problems have also grown from static to dynamic and spatiotemporal, complete to incomplete, and centralized to distributed, and grow in their scope and size (this is known as big data). The effective integration of big data for decision-making also requires privacy preservation. The contributions to this monograph summarize the advances of data mining in the respective fields. This volume consists of nine chapters that address subjects ranging from mining da...
Spatio-Temporal Data Mining for Location-Based Services

DEFF Research Database (Denmark)

Gidofalvi, Gyozo

Location-Based Services (LBS) are continuously gaining popularity. Innovative LBSes integrate knowledge about the users into the service. Such knowledge can be derived by analyzing the location data of users. Such data contain two unique dimensions, space and time, which need to be analyzed... The objectives of the presented thesis are three-fold. First, to extend popular data mining methods to the spatio-temporal domain. Second, to demonstrate the usefulness of the extended methods and the derived knowledge in promising LBS examples. Finally, to eliminate privacy concerns in connection with spatio-temporal data mining by devising systems for privacy-preserving location data collection and mining.
76 FR 14637 - State Medicaid Fraud Control Units; Data Mining

Science.gov (United States)

2011-03-17

...] State Medicaid Fraud Control Units; Data Mining AGENCY: Office of Inspector General (OIG), HHS. ACTION... and analyzing State Medicaid claims data, known as data mining. To support and modernize MFCU efforts... (FFP) in the costs of defined data mining activities under specified conditions. In addition, we...
Data mining in distributed database

International Nuclear Information System (INIS)

Ghunaim, A.A.A.

2007-01-01

As we march into the age of digital information, the collection and storage of large quantities of data are increasing, and the problem of data overload looms ominously ahead. It is estimated today that the volume of data stored by a company doubles every year, while the amount of meaningful information decreases rapidly. The ability to analyze and understand massive datasets lags far behind the ability to gather and store the data. The unbridled growth of data will inevitably lead to a situation in which it is increasingly difficult to access the desired information; it will always be like looking for a needle in a haystack, where only the amount of hay keeps growing. So, a new generation of computational techniques and tools is required to analyze and understand the rapidly growing volumes of data. And, because information technology (IT) has become a strategic weapon in modern life, new decision support tools are needed to be a powerful international competitor. Data mining is one of these tools: its methods make it possible to extract the decisive knowledge needed by an enterprise, and it is concerned with inferring models from data, drawing on statistical pattern recognition, applied statistics, machine learning, and neural networks. Data mining is a tool for increasing the productivity of people trying to build predictive models. Data mining techniques have been applied successfully to several real-world problem domains, but their application in the nuclear reactor field has received little attention. One of the main reasons is the difficulty of obtaining the data sets
Web Mining of Hotel Customer Survey Data

Directory of Open Access Journals (Sweden)

Richard S. Segall

2008-12-01

Full Text Available This paper provides an extensive literature review and list of references on the background of web mining as applied specifically to hotel customer survey data. This research applies the techniques of web mining to actual text of written comments for hotel customers using Megaputer PolyAnalyst®. Web mining functionalities utilized include those such as clustering, link analysis, key word and phrase extraction, taxonomy, and dimension matrices. This paper provides screen shots of the web mining applications using Megaputer PolyAnalyst®. Conclusions and future directions of the research are presented.
Graduate Data Analysis with Data Mining to Support the Promotion Strategy of Universitas Lancang Kuning

Directory of Open Access Journals (Sweden)

Elvira Asril

2015-11-01

Full Text Available Every company or organization that wants to survive needs to determine an appropriate promotional strategy. Choosing the right strategy can reduce promotion costs and reach the right promotion targets. One way to determine a promotional strategy is to use data mining techniques. The technique used here is the K-Means clustering algorithm. Clustering is the grouping of records, observations, or cases into classes of similar objects. K-Means is a non-hierarchical data clustering method that tries to partition the data into one or more clusters. The study observed several variables that universities often consider when determining their promotion targets, namely school of origin, region, and department. The result of this study is a set of interesting patterns mined from the data, which provide important information to support an appropriate promotional strategy for attracting prospective new students. Keywords: Data Mining, Clustering, K-Means
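For readers unfamiliar with K-Means, the partitioning step can be sketched as below. This is a generic illustration of Lloyd's algorithm, not the study's implementation: the function name `kmeans`, the naive first-k initialization, and the sample points are assumptions, and the study's categorical variables (school of origin, region, department) would first need a numeric encoding that the abstract does not describe.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on equal-length numeric tuples; returns (centroids, assignments)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic init: first k points
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to the centroid with the smallest squared distance.
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical, already numeric data with two visually separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignments = kmeans(points, k=2)
```

Production use would normally rely on a library implementation with better initialization; the sketch only shows the alternating assign-and-update loop.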
Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling

Science.gov (United States)

Zhang, Wen-Ran

2002-03-01

An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mines agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as an a priori threshold, the new algorithm uses agent similarity as a priori knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first-order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.
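For contrast with Neighbor-Miner, the frequency threshold that Apriori applies to customer transactions can be sketched as below. This illustrates classic Apriori frequent-itemset mining only, not Neighbor-Miner itself; the function name `apriori` and the sample baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # level 1 candidates: single items
    frequent = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate, then apply the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: union pairs of frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent

# Hypothetical customer baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=2)
```

Here the three-item candidate is pruned without being counted because one of its pairs, milk and butter, is not frequent; this downward-closure property is what the frequency threshold provides.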
Mining Staff Assignment Rules from Event-Based Data

NARCIS (Netherlands)

Ly, Linh Thao; Rinderle, Stefanie; Dadam, Peter; Reichert, Manfred; Bussler, Christoph J.; Haller, Armin

2006-01-01

Process mining offers methods and techniques for capturing process behaviour from log data of past process executions. Although many promising approaches on mining the control flow have been published, no attempt has been made to mine the staff assignment situation of business processes. In this
Data warehousing and data mining: A case study

Directory of Open Access Journals (Sweden)

Suknović Milija

2005-01-01

Full Text Available This paper shows the design and implementation of a data warehouse as well as the use of data mining algorithms for knowledge discovery, as the basic resource for an adequate business decision-making process. The project was carried out for the needs of the Student's Service Department of the Faculty of Organizational Sciences (FOS), University of Belgrade, Serbia and Montenegro. This system represents a good base for analysis and predictions in the following time period for the purpose of quality business decision-making by top management. Thus, the first part of the paper shows the steps in designing and developing the data warehouse of the mentioned business system. The second part of the paper shows the implementation of data mining algorithms for the purpose of deducing rules, patterns and knowledge as a resource for support in the process of decision making.
Data-Throughput Enhancement Using Data Mining-Informed Cognitive Radio

Directory of Open Access Journals (Sweden)

Khashayar Kotobi

2015-03-01

Full Text Available We propose the data mining-informed cognitive radio, which uses non-traditional data sources and data-mining techniques for decision making and improving the performance of a wireless network. To date, the application of information other than wireless channel data in cognitive radios has not been significantly studied. We use a novel dataset (Twitter traffic as an indicator of network load in a wireless channel. Using this dataset, we present and test a series of predictive algorithms that show an improvement in wireless channel utilization over traditional collision-detection algorithms. Our results demonstrate the viability of using these novel datasets to inform and create more efficient cognitive radio networks.
Integrating Data Mining Techniques into Telemedicine Systems

Directory of Open Access Journals (Sweden)

Mihaela GHEORGHE

2014-01-01

Full Text Available The medical system is facing a wide range of challenges nowadays due to changes that are taking place in global healthcare systems. These challenges are mostly economic constraints (spiraling costs, financial issues), but also include the increased emphasis on accountability and transparency, changes made in the education field, and the growing complexity of biomedical research studies. New partnerships in medical care systems and the great advances in the IT industry also suggest that a predominant paradigm shift is occurring. This requires a focus on interaction, collaboration and increased sharing of information and knowledge, all of which may in turn lead healthcare organizations to embrace data mining techniques in order to create and sustain optimal healthcare outcomes. Data mining is a domain of great importance nowadays, as it provides advanced data analysis techniques for extracting knowledge from the huge volumes of data collected and stored by every system on a daily basis. In healthcare organizations, data mining can provide valuable information for patient diagnosis and treatment planning, customer relationship management, organizational resource management or fraud detection. In this article we describe the importance of data mining techniques and systems for healthcare organizations, with a focus on developing and implementing telemedicine solutions to improve the healthcare services provided to patients. We provide an architecture for integrating data mining techniques into telemedicine systems and also offer an overview of how to understand and improve the implemented solution using Business Process Management methods.
Data mining in e-commerce: A survey

Indian Academy of Sciences (India)

Data mining has matured as a field of basic and applied research in computer science in general and e-commerce in particular. In this paper, we survey some of the recent approaches and architectures where data mining has been applied in the fields of e-commerce and e-business. Our intent is not to survey the plethora ...
Data Mining Techniques for Customer Relationship Management

Science.gov (United States)

Guo, Feng; Qin, Huilin

2017-10-01

Data mining has made customer relationship management (CRM) a new area where firms can gain a competitive advantage, and it plays a key role in firms’ management decisions. In this paper, we first analyze the value and application fields of data mining techniques for CRM, and then explore how data mining is applied to customer churn analysis. A new business culture is developing today. The conventional production-centered and sales-oriented market strategy is gradually shifting to a customer-centered and service-oriented one. Customers’ value orientation increasingly affects that of firms, and customer resources have become one of the most important strategic resources. Therefore, understanding customers’ needs and identifying the customers who contribute most has become the driving force of most modern businesses.
Advances in Machine Learning and Data Mining for Astronomy

Science.gov (United States)

Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

2012-03-01

Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.
Mining multi-dimensional data for decision support

Energy Technology Data Exchange (ETDEWEB)

Donato, J.M.; Schryver, J.C.; Hinkel, G.C.; Schmoyer, R.L. Jr. [Oak Ridge National Lab., TN (United States)]; Grady, N.W.; Leuze, M.R. [Oak Ridge National Lab., TN (United States); Joint Inst. for Computational Science, Knoxville, TN (United States)]

1998-06-01

While it is widely recognized that data can be a valuable resource for any organization, extracting information contained within the data is often a difficult problem. Attempts to obtain information from data may be limited by legacy data storage formats, lack of expert knowledge about the data, difficulty in viewing the data, or the volume of data needing to be processed. The rapidly developing field of Data Mining or Knowledge Data Discovery is a blending of Artificial Intelligence, Statistics, and Human-Computer Interaction. Sophisticated data navigation tools to obtain the information needed for decision support do not yet exist. Each data mining task requires a custom solution that depends upon the character and quantity of the data. This paper presents a two-stage approach for handling the prediction of personal bankruptcy using credit card account data, combining decision tree and artificial neural network technologies. Topics to be discussed include the pre-processing of data, including data cleansing, the filtering of data for pertinent records, and the reduction of data for attributes contributing to the prediction of bankruptcy, and the two steps in the mining process itself.
Data and Statistics on New York's Mining Resources - NYS Dept. of

Science.gov (United States)

Statistics on New York's Mining Resources: Mines in New York - Information on active mines in New York State
Marine data users clustering using data mining technique

Directory of Open Access Journals (Sweden)

Farnaz Ghiasi

2015-09-01

Full Text Available The objective of this research is to cluster marine data users using a data mining technique. Achieving this objective enables marine organizations to understand their data and their users' requirements. The CRISP-DM standard model was used to implement the data mining technique. The required data was extracted from the profile database of 500 marine data users of the Iranian National Institute for Oceanography and Atmospheric Sciences (INIOAS) from 1386 to 1393. The TwoStep algorithm was used for clustering. For the first time, clustering was used to discover patterns between marine data users (such as student, organization and scientist) and their data requests (data source, data type, data set, parameter and geographic area). The most important clusters are: Student with International data source, Chemistry data type, “World Ocean Database” dataset, Persian Gulf geographic area and Organization with Nitrate parameter. Senior managers of marine organizations will be able to make sound decisions concerning their existing data and to plan for better data collection in the future. Data users can also be guided with respect to their requests. Finally, suggestions were offered to improve the performance of marine organizations.
BOOK REVIEW EDUCATIONAL DATA MINING: APPLICATIONS AND TRENDS

Directory of Open Access Journals (Sweden)

Aylin OZTURK

2016-04-01

Full Text Available Educational Data Mining (EDM) is a developing field based on data mining techniques. EDM emerged as a combination of areas such as machine learning, statistics, computer science, education, cognitive science, and psychometrics. EDM focuses on learner characteristics, behaviors, academic achievements, process of learning, educational functionalities, domain knowledge content, assessments, and applications. Educational data mining is defined by Baker (2010) as “an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in”. EDM is concerned with improving the learning process and environment.

Mining Educational Data to Analyze the Student Motivation Behavior

OpenAIRE

Kunyanuth Kularbphettong; Cholticha Tongsiri

2012-01-01

This research aims to discover knowledge for analyzing student motivation behavior in e-Learning using data mining techniques, in the case of the Information Technology for Communication and Learning course at Suan Sunandha Rajabhat University. The data mining techniques applied in this research include association rules and classification techniques. The results showed that using data mining techniques can indicate the important variables that influenc...
Frequent Pattern Mining Algorithms for Data Clustering

DEFF Research Database (Denmark)

Zimek, Arthur; Assent, Ira; Vreeken, Jilles

2014-01-01

Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering, yet it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining.
Data mining, knowledge discovery and data-driven modelling

NARCIS (Netherlands)

Solomatine, D.P.; Velickov, S.; Bhattacharya, B.; Van der Wal, B.

2003-01-01

The project was aimed at exploring the possibilities of a new paradigm in modelling - data-driven modelling, often referred to as "data mining". Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration
A Data Mining Approach for Cardiovascular Diagnosis

Directory of Open Access Journals (Sweden)

Pereira Joana

2017-12-01

Full Text Available The large amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analysed by traditional methods. Data mining can improve decision-making by discovering patterns and trends in large amounts of complex data. In the healthcare industry specifically, data mining can be used to decrease costs by increasing efficiency, improve patient quality of life, and perhaps most importantly, save the lives of more patients. The main goal of this project is to apply data mining techniques to predict the degree of disability that patients will present when they are discharged from hospital. The clinical data that compose the data set were obtained from a single hospital and contain information about patients who were hospitalized in the Cardio Vascular Disease (CVD) unit in 2016 after suffering a cardiovascular accident. The project will be developed using the Waikato Environment for Knowledge Analysis (WEKA) machine learning workbench, since it allows users to quickly try out and compare different machine learning methods on new data sets.
Data Mining and Knowledge Management in Higher Education -Potential Applications.

Science.gov (United States)

Luan, Jing

This paper introduces a new decision support tool, data mining, in the context of knowledge management. The most striking features of data mining techniques are clustering and prediction. The clustering aspect of data mining offers comprehensive characteristics analysis of students, while the predicting function estimates the likelihood for a…
Academic Performance: An Approach From Data Mining

Directory of Open Access Journals (Sweden)

David L. La Red Martinez

2012-02-01

Full Text Available The relatively low percentage of students promoted and regularized (academic success) in the Operating Systems course of the LSI (Bachelor’s Degree in Information Systems) of FaCENA (Faculty of Exact and Natural Sciences and Surveying - Facultad de Ciencias Exactas, Naturales y Agrimensura) of UNNE prompted this work, whose objective is to determine the variables that affect academic performance, considering the final status of the student according to Res. 185/03 CD (scheme for evaluation and promotion): promoted, regular or free. The variables considered are: status of the student, educational level of parents, secondary education, socio-economic level, and others. Data warehouse (DW) and data mining (DM) techniques were used to search for profiles of students and to identify situations of potential academic success or failure. Classifications were performed using clustering techniques according to different criteria. Some of the criteria were: classification mining according to academic program, according to final status of the student, and according to importance given to study; demographic clustering mining; and Kohonen clustering according to final status of the student. The following were produced: partition statistics, partition details, cluster details, field details and field frequencies, overall quality of each process and detailed quality (precision, classification, reliability), confusion matrices, gain/lift charts, trees, node distributions, field importance, field correspondence tables and cluster statistics. Once profiles of students with low academic performance have been determined, actions can be taken to avoid potential academic failures. This work aims to provide a brief description of aspects of the data warehouse built and of some of the data mining processes developed on it.
Data Mining Smart Energy Time Series

Directory of Open Access Journals (Sweden)

Janina POPEANGA

2015-07-01

Full Text Available With the advent of smart metering technology, the amount of energy data will increase significantly, and the utilities industry will have to face another big challenge: finding relationships within time-series data and, beyond that, analyzing huge numbers of time series to find useful patterns and trends with fast or even real-time response. This study offers a brief review of the literature in the field, aiming to demonstrate how essential the application of data mining techniques to time series is for making the best use of this large quantity of data, despite all the difficulties. The most important time series data mining techniques are also presented, highlighting their applicability in the energy domain.
Knowledge-Based Reinforcement Learning for Data Mining

Science.gov (United States)

Kudenko, Daniel; Grzes, Marek

Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human
Methodologies of Knowledge Discovery from Data and Data Mining Methods in Mechanical Engineering

Directory of Open Access Journals (Sweden)

Rogalewicz Michał

2016-12-01

Full Text Available The paper reviews methodologies for the process of knowledge discovery from data and the methods of data exploration (Data Mining) most frequently used in mechanical engineering. The methodologies contain various scenarios of data exploration, while DM methods are used within their scope. The paper shows the premises for using DM methods in industry, as well as their advantages and disadvantages. The development of methodologies of knowledge discovery from data is also presented, along with a classification of the most widespread Data Mining methods, divided by type of task performed. The paper concludes with a presentation of selected Data Mining applications in mechanical engineering.
Data mining-aided materials discovery and optimization

Directory of Open Access Journals (Sweden)

Wencong Lu

2017-09-01

Full Text Available Recent developments in data mining-aided materials discovery and optimization are reviewed in this paper, and an introduction to the materials data mining (MDM process is provided using case studies. Both qualitative and quantitative methods in machine learning can be adopted in the MDM process to accomplish different tasks in materials discovery, design, and optimization. State-of-the-art techniques in data mining-aided materials discovery and optimization are demonstrated by reviewing the controllable synthesis of dendritic Co3O4 superstructures, materials design of layered double hydroxide, battery materials discovery, and thermoelectric materials design. The results of the case studies indicate that MDM is a powerful approach for use in materials discovery and innovation, and will play an important role in the development of the Materials Genome Initiative and Materials Informatics.
Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes

OpenAIRE

Anjewierden , Anjo; Kolloffel , Bas; Hulshof , Casper

2007-01-01

In this paper we investigate the application of data mining methods to provide learners with real-time adaptive feedback on the nature and patterns of their on-line communication while learning collaboratively. We derived two models for classifying chat messages using data mining techniques and tested these on an actual data set [16]. The reliability of the classification of chat messages is established by comparing the models' performance to that of humans. Results indicate that the classifica...
Data Mining for Education Decision Support: A Review

Directory of Open Access Journals (Sweden)

Suhirman Suhirman

2014-12-01

Full Text Available Higher education management must carry out evaluation on an ongoing basis in order to improve the quality of institutions. This requires evaluating various data, information, and knowledge from both inside and outside the institutions. Institutions plan to use the collected data more efficiently and to develop tools to collect and direct management information in order to support managerial decision making. The collected data could be used to evaluate quality, perform analyses and diagnoses, evaluate adherence to the standards and practices of curricula and syllabi, and suggest alternatives in decision processes. Data mining methods are well suited to providing decision support in education environments, by generating and presenting relevant information and knowledge for improving the quality of education processes. In the educational domain, this information is very useful since it can be used as a basis for investigating and enhancing current educational standards and management. This paper presents a review of data mining for academic decision support in the education field. It reviews recent data mining work in the educational field and outlines future research in educational data mining.
Data Mining Application for Displaying Student Graduation Rate Information (APLIKASI DATA MINING UNTUK MENAMPILKAN INFORMASI TINGKAT KELULUSAN MAHASISWA)

Directory of Open Access Journals (Sweden)

Yuli Asriningtias

2014-01-01

Full Text Available Higher education institutions are required to have a competitive advantage by making use of their resources, including human resources, which in this case are the students. Not all students can complete their studies on time, and their grade point averages (GPA) vary. The length of time students take to complete their studies and their GPA are among the factors in an institution's level of excellence. This potential value can be explored using data mining techniques. Data mining is the activity of finding interesting patterns in large amounts of data; the data may be stored in a database, a data warehouse, or other information storage. A data warehouse is a data store that is object-oriented, integrated, time-variant, and nonvolatile, supporting management in the decision-making process. This research was developed by scanning the data in the database directly to produce the required information. The data mining application was built using the Borland Delphi 7 programming language, with a SQL Server 2000 database as the data storage medium. The results show that students' on-time completion rates and graduation grades can be determined in relation to the attributes of their admission data. Keywords: data mining, data warehouse, student graduation.
A Study of Data Mining for Customer Relationship Management in Microfinance Institutions (Kajian Data Mining Customer Relationship Management pada Lembaga Keuangan Mikro)

Directory of Open Access Journals (Sweden)

Tikaridha Hardiani

2016-01-01

Full Text Available Companies, including microfinance institutions, must be ready to face intense competition from other companies. Increasingly intense competition has led many microfinance businesses to seek profitable strategies that distinguish them from others. One applicable strategy is to implement Customer Relationship Management (CRM) and data mining. Data mining can be used by microfinance institutions that hold sufficiently large data sets. Identifying potential customers through customer segmentation can support decisions on the marketing strategy to be implemented. This paper discusses several data mining techniques that can be used for customer segmentation. The proposed data mining technique is fuzzy clustering with the fuzzy C-Means algorithm and fuzzy RFM. Keywords: Customer relationship management; Data mining; Fuzzy clustering; Micro-finance institutions; Fuzzy C-Means; Fuzzy RFM
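The fuzzy C-Means step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier m = 2, the fixed iteration count, and the idea of feeding it (recency, frequency, monetary) tuples are all assumptions made for illustration, and real RFM values would normally be rescaled first.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Hypothetical helper: cluster equal-length numeric tuples into c fuzzy clusters.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dists = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (dists[j] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return centers, u
```

Unlike hard clustering, each customer receives a membership degree in every segment, so a customer lying between two segments is not forced into one of them.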
Data Mining in Education : A Review on the Knowledge Discovery Perspective

OpenAIRE

Pratiyush Guleria; Manu Sood

2014-01-01

Knowledge Discovery in Databases is the process of finding knowledge in massive amounts of data, where data mining is the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining is the process of extracting the information and patterns derived by the KDD process, which helps in crucial decision-making. Data mining works with data warehouse and...
Time Dependent Data Mining in RAVEN

Energy Technology Data Exchange (ETDEWEB)

Cogliati, Joshua Joseph [Idaho National Lab. (INL), Idaho Falls, ID (United States); Chen, Jun [Idaho National Lab. (INL), Idaho Falls, ID (United States); Patel, Japan Ketan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Mandelli, Diego [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Talbot, Paul William [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

2016-09-01

RAVEN is a generic software framework to perform parametric and probabilistic analysis based on the response of complex system codes. The goal of this type of analysis is to understand the response of such systems, in particular with respect to their probabilistic behavior, and to understand their predictability and drivers or lack thereof. Data mining capabilities are the cornerstones to perform such deep learning of system responses. For this reason static data mining capabilities were added last fiscal year (FY 15). In real applications, when dealing with complex multi-scale, multi-physics systems, it seems natural that, during transients, the relevance of the different scales and physics would evolve over time. For these reasons the data mining capabilities have been extended to allow their application over time. This report describes the new RAVEN capabilities, with several simple analytical tests that explain their application and demonstrate their proper implementation. The report concludes with the application of the newly implemented capabilities to the analysis of a simulation performed with the Bison code.
Data Warehouse, Data Mining and the Cross-Selling Concept in Product Sales Analysis (Data Warehouse, Data Mining Dan Konsep Cross-Selling Pada Analisis Penjualan Produk)

Directory of Open Access Journals (Sweden)

Eka Miranda

2010-12-01

Full Text Available This paper is about designing and implementing data warehousing and data mining, along with their roles in supporting decision-making related to product sales analysis under the cross-selling concept at PT XYZ. The database the company used did not support data analysis and decision-making. Therefore, a data warehouse was designed that could store large amounts of data and produce reports and answers to users’ ad hoc questions. The method used to design and implement the data warehouse and data mining consists of a literature study, analysis of the company's problems, data warehouse design, and testing of results. The results are a data warehouse and data mining design and an implementation of the cross-selling concept to analyze sales, purchase, and customer cancellation data. The data can be displayed and analyzed from several points of view, helping managers to analyze it and obtain more information.
Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques.

Science.gov (United States)

Sanmiquel, Lluís; Bascompta, Marc; Rossell, Josep M; Anticoi, Hernán Francisco; Guash, Eduard

2018-03-07

An analysis of occupational accidents in the mining sector was conducted using data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. Data was processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining their characteristics and context. This study presents the 20 most important association rules in the sector, for either surface or underground mining, based on the statistical confidence levels of each rule as obtained by Weka. The outcomes show the most typical immediate causes, along with the percentage of accidents associated with each association rule. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident differ between the two scenarios. Data-mining techniques were chosen as a useful tool to find the root cause of the accidents.
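Ranking association rules by confidence, as described above, can be sketched in a few lines. This is not the study's Weka workflow: the function `rank_rules`, the restriction to one-item antecedents and consequents, the `attribute=value` encoding, and the support threshold are all assumptions made for illustration.

```python
from itertools import permutations

def rank_rules(records, min_support=0.2, top_k=20):
    """Hypothetical helper: rank single-item rules A -> C by confidence.

    records is a list of sets of 'attribute=value' strings.
    support(A -> C)    = fraction of records containing both A and C
    confidence(A -> C) = count(A and C) / count(A)
    Returns up to top_k tuples (confidence, support, antecedent, consequent).
    """
    n = len(records)
    items = sorted({item for record in records for item in record})
    rules = []
    for a, c in permutations(items, 2):
        n_a = sum(1 for r in records if a in r)
        n_ac = sum(1 for r in records if a in r and c in r)
        if n_a == 0 or n_ac / n < min_support:
            continue  # skip rules that are too rare to be meaningful
        rules.append((n_ac / n_a, n_ac / n, a, c))
    # Highest confidence first; ties broken by support, then alphabetically.
    rules.sort(key=lambda rule: (-rule[0], -rule[1], rule[2], rule[3]))
    return rules[:top_k]
```

A rule with confidence 1.0 means that every record containing the antecedent also contains the consequent; support shows how much of the data set the rule actually covers, which is why both figures are usually reported together.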
A Proposed Data Fusion Architecture for Micro-Zone Analysis and Data Mining

Energy Technology Data Exchange (ETDEWEB)

Kevin McCarthy; Milos Manic

2012-08-01

Data Fusion requires the ability to combine or “fuse” data from multiple data sources. Time Series Analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, Time Series places special emphasis on periodicity and how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture.
Mining Outlier Data in Mobile Internet-Based Large Real-Time Databases

Directory of Open Access Journals (Sweden)

Xin Liu

2018-01-01

Full Text Available Mining outlier data guarantees access security and data scheduling of parallel databases and maintains high-performance operation of real-time databases. Traditional mining methods generate abundant interference data with reduced accuracy, efficiency, and stability, causing severe deficiencies. This paper proposes a new mining outlier data method, which is used to analyze real-time data features, obtain magnitude spectra models of outlier data, establish a decisional-tree information chain transmission model for outlier data in mobile Internet, obtain the information flow of internal outlier data in the information chain of a large real-time database, and cluster data. Upon local characteristic time scale parameters of information flow, the phase position features of the outlier data before filtering are obtained; the decision-tree outlier-classification feature-filtering algorithm is adopted to acquire signals for analysis and instant amplitude and to achieve the phase-frequency characteristics of outlier data. Wavelet transform threshold denoising is combined with signal denoising to analyze data offset, to correct formed detection filter model, and to realize outlier data mining. The simulation suggests that the method detects the characteristic outlier data feature response distribution, reduces response time, iteration frequency, and mining error rate, improves mining adaptation and coverage, and shows good mining outcomes.

Report from Dagstuhl Seminar 12331 Mobility Data Mining and Privacy

OpenAIRE

Clifton, Christopher W.; Kuijpers, Bart; Morik, Katharina; Saygin, Yucel

2012-01-01

This report documents the program and the outcomes of Dagstuhl Seminar 12331 “Mobility Data Mining and Privacy”. Mobility data mining aims to extract knowledge from movement behaviour of people, but this data also poses novel privacy risks. This seminar gathered a multidisciplinary team for a conversation on how to balance the value in mining mobility data with privacy issues. The seminar focused on four key issues: Privacy in vehicular data, in cellular data, context-dependent privacy, and ...
Mining algorithm for association rules in big data based on Hadoop

Science.gov (United States)

Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

2018-04-01

Traditional association rule mining algorithms can no longer meet the efficiency and scalability needs of mining large amounts of data. To address this problem, taking FP-Growth as an example, the algorithm is parallelized on the Hadoop framework using the MapReduce model. On this basis, it is improved with the transaction reduction method to further enhance mining efficiency. The experiments, run on a Hadoop cluster, cover verification of the parallel mining results, an efficiency comparison between serial and parallel execution, and the relationships between mining time and number of nodes and between mining time and data volume. They show that the parallelized FP-Growth algorithm accurately mines frequent item sets, with better performance and scalability. It can better meet the requirements of big data mining and efficiently mine frequent item sets and association rules from large datasets.
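The first pass of a parallel frequent-pattern miner, counting item support per partition and merging the counts, can be simulated locally. The sketch below is neither Hadoop code nor the authors' algorithm; the function names are hypothetical, and reading "transaction reduction" as dropping infrequent items and emptied transactions before tree construction is an assumption.

```python
from collections import Counter
from functools import reduce

def map_count(partition):
    """Map phase: count each item once per transaction within one partition."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def reduce_counts(partial_counts):
    """Reduce phase: merge the per-partition counters into global counts."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())

def frequent_items(partitions, min_count):
    """Items whose global support count reaches min_count."""
    totals = reduce_counts(map_count(p) for p in partitions)
    return {item for item, n in totals.items() if n >= min_count}

def prune_transactions(partition, frequent):
    """Assumed form of transaction reduction: drop infrequent items,
    then drop transactions that are left empty."""
    pruned = ([item for item in t if item in frequent] for t in partition)
    return [t for t in pruned if t]
```

Because the map step only needs its own partition and the reduce step only adds counters, the work splits across nodes without shared state, which is the property that makes this pass a natural fit for MapReduce.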
Data mining in time series databases

CERN Document Server

Kandel, Abraham; Bunke, Horst

2004-01-01

Adding the time dimension to real-world databases produces Time Series Databases (TSDB) and introduces new aspects and difficulties to data mining and knowledge discovery. This book covers the state-of-the-art methodology for mining time series databases. The novel data mining methods presented in the book include techniques for efficient segmentation, indexing, and classification of noisy and dynamic time series. A graph-based method for anomaly detection in time series is described and the book also studies the implications of a novel and potentially useful representation of time series as strings. The problem of detecting changes in data mining models that are induced from temporal databases is additionally discussed.
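One common way to represent a time series as a string is a SAX-style discretisation: z-normalise, average over segments, then map each average to a letter. The sketch below shows that general idea and is not necessarily the representation studied in the book; the function name `to_symbols`, the default alphabet, and the equal-probability breakpoints under a standard normal distribution are assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

def to_symbols(series, segments, alphabet="abc"):
    """Hypothetical helper: turn a numeric series into a short string.

    Requires 1 <= segments <= len(series).
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros.
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each near-equal segment.
    n = len(z)
    bounds = [round(i * n / segments) for i in range(segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(segments)]
    # Breakpoints that split the standard normal into equal-probability bins.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(cuts, value)] for value in paa)

print(to_symbols([1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9], segments=3))  # -> abc
```

Once a series is a string, standard string techniques such as substring matching or edit distance can be applied to it, at the cost of losing detail within each segment.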
Data Mine and Forget It?: A Cautionary Tale

Science.gov (United States)

Tada, Yuri; Kraft, Norbert Otto; Orasanu, Judith M.

2011-01-01

With the development of new technologies, data mining has become increasingly popular. However, caution should be exercised in choosing the variables to include in data mining. A series of regression trees was created to demonstrate the change in the selection by the program of significant predictors based on the nature of variables.
Randomized algorithms in automatic control and data mining

CERN Document Server

Granichin, Oleg; Toledano-Kitai, Dvora

2015-01-01

In the fields of data mining and control, the huge amount of unstructured data and the presence of uncertainty in system descriptions have always been critical issues. The book Randomized Algorithms in Automatic Control and Data Mining introduces the readers to the fundamentals of randomized algorithm applications in data mining (especially clustering) and in automatic control synthesis. The methods proposed in this book guarantee that the computational complexity of classical algorithms and the conservativeness of standard robust control techniques will be reduced. It is shown that when a problem requires "brute force" in selecting among options, algorithms based on random selection of alternatives offer good results with certain probability for a restricted time and significantly reduce the volume of operations.
Event metadata records as a testbed for scalable data mining

International Nuclear Information System (INIS)

Gemmeren, P van; Malon, D

2010-01-01

At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
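As a small illustration of the one-pass algorithms mentioned above, the sketch below uses Welford's method to keep a running mean and variance over a stream without storing the records. It is a generic example, not ATLAS or TAG software; the class name and the sample values are assumptions.

```python
class RunningStats:
    """One-pass (streaming) mean and sample variance via Welford's method."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 for fewer than two observations."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # stand-in for a stream
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```

Because each record is touched once and only three numbers are kept, an accumulator like this could in principle sit at any point in a processing chain.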
Introduction to the special section on educational data mining

NARCIS (Netherlands)

Calders, T.G.K.; Pechenizkiy, M.

2012-01-01

Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science, as well as a rich application area for data mining, due to
HSM: Heterogeneous Subspace Mining in High Dimensional Data

DEFF Research Database (Denmark)

Müller, Emmanuel; Assent, Ira; Seidl, Thomas

2009-01-01

Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional...... challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant...... for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines...
ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.

Science.gov (United States)

Krishnakumar, Vivek; Contrino, Sergio; Cheng, Chia-Yi; Belyaeva, Irina; Ferlanti, Erik S; Miller, Jason R; Vaughn, Matthew W; Micklem, Gos; Town, Christopher D; Chan, Agnes P

2017-01-01

ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
A Survey on Accessing Data over Cloud Environment using Data mining Algorithms

OpenAIRE

B.Prasanalakshmi; A.Selvaraj

2015-01-01

In today's world, accessing large sets of data is increasingly complex because the data may be structured or unstructured, in the form of text, images, videos, etc., and cannot be controlled by internet users; this is known as Big Data. Useful data can be extracted from big data with the help of data mining algorithms. Data mining is a technique for determining patterns, classifying data, and clustering within large sets of data. In this paper we will discuss how large s...
A Big Data Platform for Storing, Accessing, Mining and Learning Geospatial Data

Science.gov (United States)

Yang, C. P.; Bambacus, M.; Duffy, D.; Little, M. M.

2017-12-01

Big Data is becoming a norm in geoscience domains. A platform that is capable of efficiently managing, accessing, analyzing, mining, and learning from big data for new information and knowledge is desired. This paper introduces our latest effort on developing such a platform based on our past years' experiences with cloud and high performance computing, analyzing big data, comparing big data containers, and mining big geospatial data for new information. The platform includes four layers: a) the bottom layer includes a computing infrastructure with proper network, computer, and storage systems; b) the 2nd layer is a cloud computing layer based on virtualization to provide on-demand computing services for upper layers; c) the 3rd layer is big data containers that are customized for dealing with different types of data and functionalities; d) the 4th layer is a big data presentation layer that supports the efficient management, access, analysis, mining and learning of big geospatial data.
Implementasi Data Warehouse dan Data Mining: Studi Kasus Analisis Peminatan Studi Siswa

Directory of Open Access Journals (Sweden)

Eka Miranda

2011-06-01

This paper discusses the implementation of data mining and its role in supporting decisions related to students’ choice of specialization program. Currently, the university uses a database to store transaction records, which cannot be used directly to assist analysis and decision making. To address this, a data warehouse was designed to store large amounts of data; it also offers new perspectives on data distribution, allows ad hoc questions to be answered, and supports data analysis. The method consists of analysis of records related to students’ academic achievement, data warehouse design, and data mining. The paper’s results are a data warehouse and data mining design and its implementation using classification techniques and association rules. The results reveal students’ tendencies and background patterns in choosing a specialization, which can help them make decisions.
The viability of business data mining in the sports environment ...

African Journals Online (AJOL)

Data mining can be viewed as the process of extracting previously unknown information from large databases and utilising this information to make crucial business decisions (Simoudis, 1996: 26). This paper considers the viability of using data mining tools and techniques in sports, particularly with regard to mining the ...
Challenges in computational statistics and data mining

CERN Document Server

Mielniczuk, Jan

2016-01-01

This volume contains nineteen research papers belonging to the areas of computational statistics, data mining, and their applications. Those papers, all written specifically for this volume, are their authors’ contributions to honour and celebrate Professor Jacek Koronacki on the occasion of his 70th birthday. The book’s related and often interconnected topics represent Jacek Koronacki’s research interests and their evolution. They also clearly indicate how close the areas of computational statistics and data mining are.
Data Mining and Complex Problems: Case Study in Composite Materials

Science.gov (United States)

Rabelo, Luis; Marin, Mario

2009-01-01

Data mining is defined as the discovery of useful, possibly unexpected, patterns and relationships in data using statistical and non-statistical techniques in order to develop schemes for decision and policy making. Data mining can be used to discover the sources and causes of problems in complex systems. In addition, data mining can support simulation strategies by finding the different constants and parameters to be used in the development of simulation models. This paper introduces a framework for data mining and its application to complex problems. To further explain some of the concepts outlined in this paper, the potential application to the NASA Shuttle Reinforced Carbon-Carbon structures and genetic programming is used as an illustration.
Data mining concepts, methods and applications in management and engineering design

CERN Document Server

Yin, Yong; Tang, Jiafu; Zhu, JianMing

2011-01-01

Data Mining introduces in clear and simple ways how to use existing data mining methods to obtain effective solutions for a variety of management and engineering design problems. Data Mining is organised into two parts: the first provides a focused introduction to data mining and the second goes into greater depth on subjects such as customer analysis. It covers almost all managerial activities of a company, including: * supply chain design, * product development, * manufacturing system design, * product quality control, and * preservation of privacy. Incorporating recent developments of data
Data mining and education.

Science.gov (United States)

Koedinger, Kenneth R; D'Mello, Sidney; McLaughlin, Elizabeth A; Pardos, Zachary A; Rosé, Carolyn P

2015-01-01

An emerging field of educational data mining (EDM) is building on and contributing to a wide variety of disciplines through analysis of data coming from various educational technologies. EDM researchers are addressing questions of cognition, metacognition, motivation, affect, language, social discourse, etc. using data from intelligent tutoring systems, massive open online courses, educational games and simulations, and discussion forums. The data include detailed action and timing logs of student interactions in user interfaces such as graded responses to questions or essays, steps in rich problem solving environments, games or simulations, discussion forum posts, or chat dialogs. They might also include external sensors such as eye tracking, facial expression, body movement, etc. We review how EDM has addressed the research questions that surround the psychology of learning with an emphasis on assessment, transfer of learning and model discovery, the role of affect, motivation and metacognition on learning, and analysis of language data and collaborative learning. For example, we discuss (1) how different statistical assessment methods were used in a data mining competition to improve prediction of student responses to intelligent tutor tasks, (2) how better cognitive models can be discovered from data and used to improve instruction, (3) how data-driven models of student affect can be used to focus discussion in a dialog-based tutoring system, and (4) how machine learning techniques applied to discussion data can be used to produce automated agents that support student learning as they collaborate in a chat room or a discussion board. © 2015 John Wiley & Sons, Ltd.
Improve Data Mining and Knowledge Discovery through the use of MatLab

Science.gov (United States)

Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

2011-01-01

Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern-based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. Various algorithms, techniques and methods are used to mine data, including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods, used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; they lack exploratory data mining and discovery of latent information. This presentation introduces MatLab™ (MATrix LABoratory), an engineering and scientific data analysis tool, for performing data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes. MatLab's ease of data processing, visualization and
Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining

OpenAIRE

Chen, D

2012-01-01

Many small online retailers and new entrants to the online retail sector are keen to practice data mining and consumer-centric marketing in their businesses yet technically lack the necessary knowledge and expertise to do so. In this article a case study of using data mining techniques in customer-centric business intelligence for an online retailer is presented. The main purpose of this analysis is to help the business better understand its customers and therefore conduct customer-centric ma...
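The RFM (recency, frequency, monetary) measures named in the title can be computed from a transaction log roughly as follows. This is a generic sketch, not the article's procedure: the order records, the customer identifiers, and the reference date are invented for illustration.

```python
from datetime import date

def rfm_table(orders, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    orders: iterable of (customer_id, order_date, amount) tuples (assumed layout).
    """
    table = {}
    for customer, order_date, amount in orders:
        last, freq, money = table.get(customer, (None, 0, 0.0))
        if last is None or order_date > last:
            last = order_date  # keep the most recent purchase date
        table[customer] = (last, freq + 1, money + amount)
    return {
        customer: ((today - last).days, freq, money)
        for customer, (last, freq, money) in table.items()
    }

# Hypothetical orders for two customers.
orders = [
    ("C1", date(2011, 11, 1), 120.0),
    ("C1", date(2011, 12, 1), 80.0),
    ("C2", date(2011, 6, 15), 40.0),
]
rfm = rfm_table(orders, today=date(2011, 12, 9))
print(rfm)
```

Segmentation would then cluster customers on these three values, for example with k-means; that step is left out here.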
Data mining in soft computing framework: a survey.

Science.gov (United States)

Mitra, S; Pal, S K; Mitra, P

2002-01-01

The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

Uncertainty modeling for data mining a label semantics approach

CERN Document Server

Qin, Zengchang

2014-01-01

Outlining a new research direction in fuzzy set theory applied to data mining, this volume proposes a number of new data mining algorithms and includes dozens of figures and illustrations that help the reader grasp the complexities of the concepts.
The handbook of data mining

CERN Document Server

Ye, Nong

2003-01-01

This book is the first comprehensive one to feature systematic coverage of the concepts, techniques, examples, issues, software tools and future advancements of data mining. The demand for data mining applications is increasing in industry, government, and academia.
Highly Robust Methods in Data Mining

Czech Academy of Sciences Publication Activity Database

Kalina, Jan

2013-01-01

Vol. 8, No. 1 (2013), pp. 9-24. ISSN 1452-4864. Institutional support: RVO:67985807. Keywords: data mining; robust statistics; high-dimensional data; cluster analysis; logistic regression; neural networks. Subject RIV: BB - Applied Statistics, Operational Research
Is Europe Falling Behind in Data Mining? Copyright’s Impact on Data Mining in Academic Research

NARCIS (Netherlands)

Handke, C.; Guibault, L.; Vallbé, J.J.; Schmidt, B.; Dobreva, M.

2015-01-01

With the diffusion of digital information technology, data mining (DM) is widely expected to increase the productivity of all kinds of research activities. Based on bibliometric data, we demonstrate that the share of DM-related research articles in all published academic papers has increased
Supporting Solar Physics Research via Data Mining

Science.gov (United States)

Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.

2012-05-01

In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.
Open data mining for Taiwan's dengue epidemic.

Science.gov (United States)

Wu, ChienHsing; Kao, Shu-Chen; Shih, Chia-Hung; Kan, Meng-Hsuan

2018-07-01

By using a quantitative approach, this study examines the applicability of data mining technique to discover knowledge from open data related to Taiwan's dengue epidemic. We compare results when Google trend data are included or excluded. Data sources are government open data, climate data, and Google trend data. Research findings from analysis of 70,914 cases are obtained. Location and time (month) in open data show the highest classification power followed by climate variables (temperature and humidity), whereas gender and age show the lowest values. Both prediction accuracy and simplicity decrease when Google trends are considered (respectively 0.94 and 0.37, compared to 0.96 and 0.46). The article demonstrates the value of open data mining in the context of public health care. Copyright © 2018 Elsevier B.V. All rights reserved.
Data Mining Learning Models and Algorithms on a Scada System Data Repository

Directory of Open Access Journals (Sweden)

Mircea Rîşteiu

2010-06-01

This paper presents three data mining techniques applied to a SCADA system data repository: Naïve Bayes, k-Nearest Neighbor and Decision Trees. Based on the mining results and their reasonable explanation, it concludes that k-Nearest Neighbor is a suitable method for classifying the large amount of data considered. The experiments are built on the training data set and evaluated on a new test set using the machine learning tool WEKA.
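For readers unfamiliar with k-Nearest Neighbor, a minimal pure-Python sketch follows. It does not reproduce the paper's WEKA experiments; the two-feature readings and the "normal"/"alarm" labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical sensor readings (two features each) with made-up labels.
train = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.9), "normal"),
    ((0.8, 1.1), "normal"),
    ((5.0, 5.2), "alarm"),
    ((5.1, 4.8), "alarm"),
]
low = knn_predict(train, (1.1, 1.0))
high = knn_predict(train, (4.9, 5.0))
print(low, high)
```

Each prediction scans the whole training set, which is why k-NN on large repositories is usually paired with indexing or sampling.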
DATA MINING. CONCEPTS AND APPLICATIONS IN BANKING SECTOR

Directory of Open Access Journals (Sweden)

ADRIAN IONUT PASCU

2018-02-01

The concept of banking refers to the multitude of services and products that commercial banks offer to clients, which include, besides transactional accounts, both passive and active products. Due to increased competitiveness in banking, the relationship between the bank and the client has become an essential factor in strategies to increase customer satisfaction. Currently the banking system is able to store impressive amounts of data collected daily, from customer data and transaction details to data on customers' transactional or risk profiles. The process through which large amounts of data are analyzed and extracted, and the information obtained using mathematical and statistical models is interpreted, is known as data mining. The discovery of knowledge from data involves identifying models and patterns with which certain events or possible risks can be anticipated. This process helps banks to develop strategies in areas such as customer retention and loyalty, customer satisfaction, fraud detection and prevention, risk management, and money laundering prevention. The aim of this paper is to present the concept of data mining and the concept of data discovery (KDD), as well as the impact and important uses of data mining techniques in the banking sector. This paper explores and reviews various data mining techniques that are applied in the banking sector and also provides insight into how these techniques are used in different areas to make decision-making easier and more efficient.
2nd International Conference on Computational Intelligence in Data Mining

CERN Document Server

Mohapatra, Durga

2016-01-01

The book is a collection of high-quality peer-reviewed research papers presented in the Second International Conference on Computational Intelligence in Data Mining (ICCIDM 2015) held at Bhubaneswar, Odisha, India during 5 – 6 December 2015. The two-volume Proceedings address the difficulties and challenges for the seamless integration of two core disciplines of computer science, i.e., computational intelligence and data mining. The book addresses different methods and techniques of integration for enhancing the overall goal of data mining. The book helps to disseminate the knowledge about some innovative, active research directions in the field of data mining, machine and computational intelligence, along with some current issues and applications of related topics.
Optimal sampling strategy for data mining

International Nuclear Information System (INIS)

Ghaffar, A.; Shahbaz, M.; Mahmood, W.

2013-01-01

Recent technologies such as the Internet, corporate intranets, data warehouses, ERP systems, satellites, digital sensors, embedded systems, and mobile networks are generating such massive amounts of data that it is becoming very difficult to analyze and understand all these data, even using data mining tools. Huge datasets are becoming a difficult challenge for classification algorithms. With increasing amounts of data, data mining algorithms are getting slower and analysis is getting less interactive. Sampling can be a solution. Using a fraction of the computing resources, sampling can often provide the same level of accuracy. The process of sampling requires much care because many factors are involved in determining the correct sample size. The approach proposed in this paper tries to find a solution to this problem. Based on a statistical formula, after setting some parameters, it returns a sample size called the sufficient sample size, which is then selected through probability sampling. Results indicate the usefulness of this technique in coping with the problem of huge datasets. (author)
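The abstract does not state which statistical formula the authors use. As a stand-in, the sketch below applies Cochran's sample-size formula with a finite population correction and then draws a simple random sample; the formula choice, the parameter defaults, and the function name are all assumptions.

```python
import math
import random

def sufficient_sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Cochran's formula with finite population correction (assumed stand-in).

    z: z-score for the confidence level (1.96 is roughly 95%)
    p: expected proportion (0.5 is the most conservative choice)
    margin: tolerated margin of error
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

random.seed(0)  # fixed seed so the draw is repeatable
records = range(1_000_000)  # stand-in for record identifiers
n = sufficient_sample_size(len(records))
sample = random.sample(records, n)  # simple random (probability) sampling
print(n, len(sample))
```

With these defaults the required size grows very slowly with the population, which is the property that makes sampling attractive for huge datasets.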
Using data mining to segment healthcare markets from patients' preference perspectives.

Science.gov (United States)

Liu, Sandra S; Chen, Jie

2009-01-01

This paper aims to provide an example of how to use data mining techniques to identify patient segments regarding preferences for healthcare attributes and their demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with average linkage and Pearson correlation procedures are employed and compared to show how each procedure best determines segmentation variables. Data mining tools identified three differentiable segments by means of cluster analysis. These three clusters have significantly different demographic profiles. The study reveals, when compared with traditional statistical methods, that data mining provides an efficient and effective tool for market segmentation. When there are numerous cluster variables involved, researchers and practitioners need to incorporate factor analysis for reducing variables to clearly and meaningfully understand clusters. Interests and applications in data mining are increasing in many businesses. However, this technology is seldom applied to healthcare customer experience management. The paper shows that efficient and effective application of data mining methods can aid the understanding of patient healthcare preferences.
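Hierarchical clustering with average linkage on a Pearson-correlation distance, the conventional procedure named above, can be sketched in pure Python. The patient preference scores are invented and this is not the study's code; a profile with zero variance would cause a division by zero and is not handled.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation, so similar profiles have distance near 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    average pairwise distance is smallest, until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        total = sum(pearson_distance(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical preference scores of four patients on four care attributes.
patients = [
    [5, 4, 2, 1],
    [5, 5, 2, 1],
    [1, 2, 4, 5],
    [2, 1, 5, 5],
]
segments = average_linkage(patients, n_clusters=2)
print(segments)
```

The result lists patient indices per segment; segment profiles could then be cross-tabulated against demographics, as the study does with its own data.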
Informatics, Data Mining, Econometrics and Financial Economics: A Connection

NARCIS (Netherlands)

C-L. Chang (Chia-Lin); M.J. McAleer (Michael); W.-K. Wong (Wing-Keung)

2015-01-01

This short communication reviews some of the literature in econometrics and financial economics that is related to informatics and data mining. We then discuss some of the research on econometrics and financial economics that could be extended to informatics and data mining beyond the
Class association rules mining from students’ test data (Abstract)

NARCIS (Netherlands)

Romero, C.; Ventura, S.; Vasilyeva, E.; Pechenizkiy, M.; Baker, de R.S.J.; Merceron, A.; Pavlik Jr., P.I.

2010-01-01

In this paper we propose the use of a special type of association rules mining for discovering interesting relationships from the students’ test data collected in our case with Moodle learning management system (LMS). Particularly, we apply Class Association Rule (CAR) mining to different data
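A class association rule has a set of attribute values as its antecedent and a class label as its consequent. The brute-force sketch below enumerates such rules with support and confidence thresholds; the student records, attribute names, and thresholds are hypothetical and are not taken from the Moodle data used in the paper.

```python
from itertools import combinations

def class_association_rules(records, min_support, min_confidence, max_len=2):
    """Enumerate rules `antecedent -> class label`.

    records: list of (set_of_attribute_values, class_label) pairs.
    Returns tuples (antecedent, label, support, confidence).
    """
    n = len(records)
    items = sorted({item for attrs, _ in records for item in attrs})
    labels = sorted({label for _, label in records})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            wanted = set(antecedent)
            covered = [label for attrs, label in records if wanted <= attrs]
            if not covered:
                continue
            for label in labels:
                both = covered.count(label)
                support = both / n               # share of all records
                confidence = both / len(covered)  # share of matching records
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, support, confidence))
    return rules

# Hypothetical test records: attribute values and final outcome.
records = [
    ({"quiz=high", "forum=active"}, "pass"),
    ({"quiz=high", "forum=passive"}, "pass"),
    ({"quiz=low", "forum=passive"}, "fail"),
    ({"quiz=low", "forum=active"}, "fail"),
    ({"quiz=high", "forum=active"}, "pass"),
]
rules = class_association_rules(records, min_support=0.4, min_confidence=0.9)
for antecedent, label, support, confidence in rules:
    print(antecedent, "->", label, support, confidence)
```

Exhaustive enumeration is only practical for small attribute sets; real CAR miners prune candidates in the style of Apriori.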
Combining complex networks and data mining: Why and how

Science.gov (United States)

Zanin, M.; Papo, D.; Sousa, P. A.; Menasalvas, E.; Nicchi, A.; Kubik, E.; Boccaletti, S.

2016-05-01

The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING

OpenAIRE

Arumugam.S

2016-01-01

The main process of data mining is to collect, extract and store valuable information, and nowadays many enterprises do this actively. Predictive analytics is a branch of advanced analytics that is mainly used to make predictions about unknown future events. Predictive analytics uses various techniques from machine learning, statistics, data mining, modeling, and artificial intelligence for analyzing current data and making predictions about futu...
Detecting Structural Damage of Nuclear Power Plant by Interactive Data Mining Approach

International Nuclear Information System (INIS)

Yufei Shu

2006-01-01

This paper presents a nonlinear structural damage identification technique, based on an interactive data mining approach, which integrates a human cognitive model in a data mining loop. A mining control agent emulating human analysts is developed, which directly interacts with the data miner, analyzing and verifying the output of the data miner and controlling the data mining process. Additionally, an artificial neural network method, which is adopted as a core component of the proposed interactive data mining method, is evolved by adding a novelty detecting and retraining function for handling complicated nuclear power plant quake-proof data. Plant quake-proof testing data has been applied to the system to show the validation of the proposed method. (author)
Development of an Enhanced Generic Data Mining Life Cycle (DMLC)

OpenAIRE

Hofmann, Markus; Tierney, Brendan

2017-01-01

Data mining projects are complex and have a high failure rate. In order to improve project management and success rates of such projects a life cycle is vital to the overall success of the project. This paper reports on a research project that was concerned with the life cycle development for large scale data mining projects. The paper provides a detailed view of the design and development of a generic data mining life cycle called DMLC. The life cycle aims to support all members of data mini...
High Performance Data mining by Genetic Neural Network

Directory of Open Access Journals (Sweden)

Dadmehr Rahbari

2013-10-01

Full Text Available Data mining in computer science is the process of discovering interesting and useful patterns and relationships in large volumes of data. Most methods for mining problems are based on artificial intelligence algorithms. Neural network optimization based on three basic parameters (topology, weights, and learning rate) is a powerful method, and we introduce an optimal method for solving this problem. In this paper, a genetic algorithm with mutation and crossover operators changes the network structure and optimizes it. The dataset used for our work is a stroke disease dataset with twenty features, the number of which is optimized by the new hybrid algorithm. The results of this work compare very well with other similar methods. The low percentage of error shows that our method is an efficient, high-performance approach to data mining problems.
SparkText: Biomedical Text Mining on Big Data Framework.

Science.gov (United States)

Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
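The classification step described above can be illustrated with a small sketch. This is not SparkText itself, which runs on Apache Spark with a Cassandra database; it is a plain-Python multinomial Naive Bayes with add-one smoothing, and the three toy "articles" and their labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs is a list of (tokens, label) pairs; returns the counts the model needs."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training snippets, one per cancer type.
training = [
    ("brca1 mutation breast tumor".split(), "breast"),
    ("psa prostate gland tumor".split(), "prostate"),
    ("smoking lung nodule tumor".split(), "lung"),
]
model = train_nb(training)
predicted = predict_nb(model, "brca1 breast screening".split())
```

On real literature the token lists would come from a proper text-processing pipeline, and a distributed implementation would be needed to reach the scale reported in the abstract.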
The study on privacy preserving data mining for information security

Science.gov (United States)

Li, Xiaohui

2012-04-01

Privacy-preserving data mining has developed rapidly in a short time, but it still faces many challenges. First, the level of privacy is defined differently in different fields, so the measures by which privacy-preserving data mining technology protects private information are not the same. It is therefore urgent to present a unified definition and measure of privacy. Second, most research in privacy-preserving data mining is presently confined to theoretical study.

Some remarks on parallel data mining using a persistent object manager

International Nuclear Information System (INIS)

Araujo, Neil; Grossman, Robert; Hanley, David

1996-01-01

Our underlying assumption is that high performance data management will be as important as high performance computing by the beginning of the next millennium. Given this, data mining will take on increasing importance. In this paper, we discuss our experience with parallel data mining on an IBM SP-2, focusing on four issues which we feel are emerging as critical for data mining applications in general. (author)
Identifying Drug–Drug Interactions by Data Mining

DEFF Research Database (Denmark)

Hansen, Peter Wæde; Clemmensen, Line Katrine Harder; Sehested, Thomas S.G.

2016-01-01

Background: Knowledge about drug–drug interactions commonly arises from preclinical trials, from adverse drug reports, or based on knowledge of mechanisms of action. Our aim was to investigate whether drug–drug interactions were discoverable without prior hypotheses using data mining. We focused on warfarin–drug interactions as the prototype. Methods and Results: We analyzed altered prothrombin time (measured as international normalized ratio [INR]) after initiation of a novel prescription in previously INR-stable warfarin-treated patients with nonvalvular atrial fibrillation. Data sets were retrieved... registries. Additionally, we discovered a few potentially novel interactions. This opens up for the use of data mining to discover unknown drug–drug interactions in cardiovascular medicine.
Predictive models in churn data mining: a review

OpenAIRE

García, David L.; Vellido Alcacena, Alfredo; Nebot Castells, M. Àngela

2007-01-01

The development of predictive models of customer abandonment plays a central role in any churn management strategy. These models can be developed using either qualitative approaches or can take a data-centred point of view. In the latter case, the use of Data Mining procedures and techniques can provide useful and actionable insights into the processes leading to abandonment. In this report, we provide a brief and structured review of some of the Data Mining approaches that have been put forw...
Data Mining and Homeland Security: An Overview

National Research Council Canada - National Science Library

Seifert, Jeffrey W

2008-01-01

.... Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets...
Data Mining and Homeland Security: An Overview

National Research Council Canada - National Science Library

Seifert, Jeffrey W

2007-01-01

.... Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets...
Data Mining and Homeland Security: An Overview

National Research Council Canada - National Science Library

Seifert, Jeffrey W

2006-01-01

.... Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets...
An XML-Enabled Data Mining Query Language XML-DMQL

NARCIS (Netherlands)

Feng, L.; Dillon, T.

2005-01-01

Inspired by the good work of Han et al. (1996) and Elfeky et al. (2001) on the design of data mining query languages for relational and object-oriented databases, in this paper, we develop an expressive XML-enabled data mining query language by extension of XQuery. We first describe some
Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques

Science.gov (United States)

Sanmiquel, Lluís; Bascompta, Marc; Rossell, Josep M.; Anticoi, Hernán Francisco; Guash, Eduard

2018-01-01

An analysis of occupational accidents in the mining sector was conducted using the data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. Data was processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining its characteristics and context. This study exposes the 20 most important association rules in the sector—either surface or underground mining—based on the statistical confidence levels of each rule as obtained by Weka. The outcomes display the most typical immediate causes, along with the percentage of accidents with a basis in each association rule. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident are different between the two scenarios. Data-mining techniques were chosen as a useful tool to find out the root cause of the accidents. PMID:29518921
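The confidence measure used to rank association rules can be illustrated with a small sketch. The study itself used Weka; the plain-Python code below is only a stand-in, and the accident records and attribute values are invented for illustration. For a rule A → B, support is the fraction of records containing both A and B, and confidence is support(A ∪ B) divided by support(A).

```python
# Toy accident records; every attribute value here is hypothetical.
records = [
    {"underground", "overexertion", "manual_handling"},
    {"underground", "overexertion", "manual_handling"},
    {"underground", "fall", "walking"},
    {"surface", "overexertion", "manual_handling"},
    {"surface", "collision", "driving"},
]

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    hits = sum(1 for r in records if itemset <= r)
    return hits / len(records)

def confidence(antecedent, consequent, records):
    """Estimated probability of the consequent given the antecedent."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, records) / base

# Rule: manual_handling -> overexertion
rule_supp = support({"manual_handling", "overexertion"}, records)
rule_conf = confidence({"manual_handling"}, {"overexertion"}, records)
```

Ranking candidate rules by `confidence` and keeping the top entries gives a list in the spirit of the 20 rules the abstract mentions, although an algorithm such as Apriori is needed to enumerate candidates efficiently on a real database.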
Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques

Directory of Open Access Journals (Sweden)

Lluís Sanmiquel

2018-03-01

Full Text Available An analysis of occupational accidents in the mining sector was conducted using the data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. Data was processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining its characteristics and context. This study exposes the 20 most important association rules in the sector—either surface or underground mining—based on the statistical confidence levels of each rule as obtained by Weka. The outcomes display the most typical immediate causes, along with the percentage of accidents with a basis in each association rule. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident are different between the two scenarios. Data-mining techniques were chosen as a useful tool to find out the root cause of the accidents.
A novel water quality data analysis framework based on time-series data mining.

Science.gov (United States)

Deng, Weihui; Wang, Guoyin

2017-07-01

The rapid development of time-series data mining provides an emerging method for water resource management research. In this paper, based on the time-series data mining methodology, we propose a novel and general analysis framework for water quality time-series data. It consists of two parts: implementation components and common tasks of time-series data mining in water quality data. In the first part, we propose to granulate the time series into several two-dimensional normal clouds and calculate the similarities at the granulated level. On the basis of the similarity matrix, the similarity search, anomaly detection, and pattern discovery tasks in the water quality time-series instance dataset can be easily implemented in the second part. We present a case study of this analysis framework on weekly dissolved oxygen (DO) time-series data collected from five monitoring stations on the upper reaches of the Yangtze River, China. The case study revealed the relationship between water quality in the mainstream and the tributary as well as the main changing patterns of DO. The experimental results show that the proposed analysis framework is a feasible and efficient method for mining hidden and valuable knowledge from historical water quality time-series data. Copyright © 2017 Elsevier Ltd. All rights reserved.
Advanced Data Mining of Leukemia Cells Micro-Arrays

OpenAIRE

Richard S. Segall; Ryan M. Pierce

2009-01-01

This paper provides a continuation and extension of previous research by Segall and Pierce (2009a), which discussed data mining of microarray databases of leukemia cells, primarily using self-organized maps (SOM). As in Segall and Pierce (2009a) and Segall and Pierce (2009b), the results of applying data mining are shown and discussed for the data categories of microarray databases of HL60, Jurkat, NB4 and U937 leukemia cells that are also described in this article. First, a background section is pro...
Towards Cooperative Predictive Data Mining in Competitive Environments

Science.gov (United States)

Lisý, Viliam; Jakob, Michal; Benda, Petr; Urban, Štěpán; Pěchouček, Michal

We study the problem of predictive data mining in a competitive multi-agent setting, in which each agent is assumed to have some partial knowledge required for correctly classifying a set of unlabelled examples. The agents are self-interested and therefore need to reason about the trade-offs between increasing their classification accuracy by collaborating with other agents and disclosing their private classification knowledge to other agents through such collaboration. We analyze the problem and propose a set of components which can enable cooperation in this otherwise competitive task. These components include measures for quantifying private knowledge disclosure, data-mining models suitable for multi-agent predictive data mining, and a set of strategies by which agents can improve their classification accuracy through collaboration. The overall framework and its individual components are validated on a synthetic experimental domain.
Student Privacy and Educational Data Mining: Perspectives from Industry

Science.gov (United States)

Sabourin, Jennifer; Kosturko, Lucy; FitzGerald, Clare; McQuiggan, Scott

2015-01-01

While the field of educational data mining (EDM) has generated many innovations for improving educational software and student learning, the mining of student data has recently come under a great deal of scrutiny. Many stakeholder groups, including public officials, media outlets, and parents, have voiced concern over the privacy of student data…
Prediction of thermodynamic properties of refrigerants using data mining

International Nuclear Information System (INIS)

Kuecueksille, Ecir Ugur; Selbas, Resat; Sencan, Arzu

2011-01-01

The analysis of vapor compression refrigeration systems requires the availability of simple and efficient mathematical formulations for the determination of thermodynamic properties of refrigerants. The aim of this study is to determine thermodynamic properties such as enthalpy, entropy and specific volume of alternative refrigerants using a data mining method. The alternative refrigerants used in the study are R134a, R404a, R407c and R410a. The results obtained from data mining have been compared to actual data from the literature. The study shows that the data mining methodology can be applied successfully to determine enthalpy, entropy and specific volume values for any temperature and pressure of refrigerants. As a result, computation time is reduced and the simulation of vapor compression refrigeration systems is made considerably easier.
Visual Data Mining of Robot Performance Data, Phase II

Data.gov (United States)

National Aeronautics and Space Administration — We propose to design and develop VDM/RP, a visual data mining system that will enable analysts to acquire, store, query, analyze, and visualize recent and historical...
Web based parallel/distributed medical data mining using software agents

Energy Technology Data Exchange (ETDEWEB)

Kargupta, H.; Stafford, B.; Hamzaoglu, I.

1997-12-31

This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.
Data Mining Supercomputing with SAS JMP® Genomics

Directory of Open Access Journals (Sweden)

Richard S. Segall

2011-02-01

Full Text Available JMP® Genomics is statistical discovery software that can uncover meaningful patterns in high-throughput genomics and proteomics data. JMP® Genomics is designed for biologists, biostatisticians, statistical geneticists, and those engaged in analyzing the vast stores of data that are common in genomic research (SAS, 2009). Data mining was performed using JMP® Genomics on the two collections of microarray databases available from the National Center for Biotechnology Information (NCBI) for lung cancer and breast cancer. The Gene Expression Omnibus (GEO) of NCBI serves as a public repository for a wide range of high-throughput experimental data, including the two collections of lung cancer and breast cancer that were used for this research. The results of applying data mining using the JMP® Genomics software are shown in this paper with numerous screen shots.
SparkText: Biomedical Text Mining on Big Data Framework

Science.gov (United States)

He, Karen Y.; Wang, Kai

2016-01-01

Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
SparkText: Biomedical Text Mining on Big Data Framework.

Directory of Open Access Journals (Sweden)

Zhan Ye

Full Text Available Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
DATA MINING THE GALAXY ZOO MERGERS

Data.gov (United States)

National Aeronautics and Space Administration — DATA MINING THE GALAXY ZOO MERGERS STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER Abstract. Collisions between pairs of galaxies usually end in the...

Kernel Methods for Mining Instance Data in Ontologies

Science.gov (United States)

Bloehdorn, Stephan; Sure, York

The number of ontologies and the amount of metadata available on the Web are constantly growing. The successful application of machine learning techniques for learning of ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principled approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable for directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by means of decomposing the kernel computation into specialized kernels for selected characteristics of an ontology which can be flexibly assembled and tuned. Initial experiments on real-world Semantic Web data yield promising results and show the usefulness of our approach.
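The idea of assembling a kernel from specialized sub-kernels can be sketched as follows. This is not the authors' framework: the instance representation, the two sub-kernels, and the weights are all assumptions chosen for illustration. The sketch relies only on two standard facts, namely that the size of a set intersection is a valid kernel (a dot product of indicator vectors) and that a non-negative weighted sum of valid kernels is again a valid kernel.

```python
# Hypothetical instances: each has a set of class memberships and a set of
# (property, value) pairs. These names are illustrative, not from the paper.
instances = {
    "alice": {"classes": {"Person", "Researcher"},
              "props": {("worksAt", "uniA"), ("topic", "kernels")}},
    "bob":   {"classes": {"Person", "Researcher"},
              "props": {("worksAt", "uniB"), ("topic", "kernels")}},
    "uniA":  {"classes": {"Organization"},
              "props": {("locatedIn", "cityX")}},
}

def intersection_kernel(a, b):
    """Size of the set intersection: a dot product of indicator vectors."""
    return float(len(a & b))

def class_kernel(x, y):
    """Similarity based on shared class memberships."""
    return intersection_kernel(instances[x]["classes"], instances[y]["classes"])

def property_kernel(x, y):
    """Similarity based on shared (property, value) pairs."""
    return intersection_kernel(instances[x]["props"], instances[y]["props"])

def combined_kernel(x, y, w_class=1.0, w_prop=0.5):
    """Non-negative weighted sum of the two sub-kernels."""
    return w_class * class_kernel(x, y) + w_prop * property_kernel(x, y)

names = sorted(instances)
gram = [[combined_kernel(x, y) for y in names] for x in names]
```

The resulting Gram matrix could be handed to any kernel-based learner that accepts precomputed kernels; tuning then amounts to adjusting the weights or swapping sub-kernels.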
A Tools-Based Approach to Teaching Data Mining Methods

Science.gov (United States)

Jafar, Musa J.

2010-01-01

Data mining is an emerging field of study in Information Systems programs. Although the course content has been streamlined, the underlying technology is still in a state of flux. The purpose of this paper is to describe how we utilized Microsoft Excel's data mining add-ins as a front-end to Microsoft's Cloud Computing and SQL Server 2008 Business…
Experienced ethical issues of personalized data-mined media services

DEFF Research Database (Denmark)

Sørensen, Jannick Kirk

2008-01-01

This tentative PhD project description concerns the ethnographic examination of users’ experience of privacy issues and usability related to personalized data mined (web-) services for media content.
Data processing in management of Dolni Rozinka uranium mines

International Nuclear Information System (INIS)

Benes, B.

1987-01-01

In 1985, a qualitative innovation in data processing was introduced with the commissioning of the EC 1026 computer with a terminal network and a remote data communication system. The design tasks, which are being implemented gradually, are mainly oriented toward creating an automated information system for the operative control of mining production, toward data preparation in mining plants, and toward areas such as personnel, wages, and material consumption. (J.B.)
[Aspects for data mining implementation in gerontology and geriatrics].

Science.gov (United States)

Mikhal'skiĭ, A I

2014-01-01

Current challenges facing theory and practice in ageing sciences need new methods of investigating experimental data. This results both from developments in the experimental basis of biological research and from progress in information technology. These achievements make it possible to apply data mining methods, well proven in different fields of science and engineering, to tasks in gerontology and geriatrics. Some examples of the implementation of data mining methods in gerontology are presented.
Teaching Financial Data Mining using Stocks and Futures Contracts

Directory of Open Access Journals (Sweden)

Gary Boetticher

2005-06-01

Full Text Available Building financial data mining models is considered to be "the hardest way to make easy money." Data miners are certainly motivated by the prospect of discovering a financial "Holy Grail." However, designing and implementing a successful model poses many intellectual challenges. These include securing and cleaning data; acquiring a sufficient amount of financial domain knowledge; bounding the complexity of the problem; and properly validating results. Teaching financial data mining is especially difficult due to the student's limited financial domain knowledge and the relatively short period (one semester) for building financial models. This paper describes an application of a financial data mining term project based on Stock and E-Mini futures contracts and discusses "lessons learned" from assigning similar term projects over six different semesters. Results of each case study are presented and discussed.
Research on forecast technology of mine gas emission based on fuzzy data mining (FDM)

Energy Technology Data Exchange (ETDEWEB)

Xu Chang-kai; Wang Yao-cai; Wang Jun-wei [CUMT, Xuzhou (China). School of Information and Electrical Engineering

2004-07-01

The safety of coal mine production can be further improved by forecasting the quantity of gas emission from the real-time and historical data that the gas monitoring system has saved. Making use of the advantages of data warehouse and data mining technology for processing large quantities of redundant data, the method of forecasting mine gas emission quantity based on FDM and its application were studied. The construction of a fuzzy resemblance relation and clustering analysis were proposed, by which potential relationships within the gas emission data may be found. The pattern-finding model and the forecast model were presented, and a detailed approach to realizing this forecast was also proposed, which has been applied to forecast the gas emission quantity efficiently.
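One textbook way to cluster with a fuzzy resemblance relation is sketched below. The abstract does not give the authors' exact construction, so everything here is an assumption: the similarity formula (one minus the normalized absolute difference), the max-min transitive closure, the lambda-cut threshold, and the gas readings themselves are all illustrative.

```python
def similarity_matrix(values):
    """Reflexive, symmetric fuzzy relation: 1 - |a - b| / (max - min)."""
    span = max(values) - min(values)
    if span == 0:
        return [[1.0 for _ in values] for _ in values]
    return [[1.0 - abs(a - b) / span for b in values] for a in values]

def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Compose the relation with itself until it stops changing."""
    while True:
        nxt = max_min_compose(r, r)
        if nxt == r:
            return r
        r = nxt

def lambda_cut_clusters(closure, lam):
    """Group indices whose closure similarity is at least lam."""
    clusters, assigned = [], set()
    for i in range(len(closure)):
        if i in assigned:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        assigned.update(group)
        clusters.append(group)
    return clusters

readings = [0.2, 0.25, 0.3, 0.8, 0.85]  # hypothetical gas concentration readings
closure = transitive_closure(similarity_matrix(readings))
clusters = lambda_cut_clusters(closure, 0.9)
```

Lowering the threshold merges clusters and raising it splits them, so the lambda value plays the role of a resolution setting when looking for structure in the monitoring data.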
1st International Conference on Computational Intelligence in Data Mining

CERN Document Server

Behera, Himansu; Mandal, Jyotsna; Mohapatra, Durga

2015-01-01

The contributed volume aims to explicate and address the difficulties and challenges for the seamless integration of two core disciplines of computer science, i.e., computational intelligence and data mining. Data Mining aims at the automatic discovery of underlying non-trivial knowledge from datasets by applying intelligent analysis techniques. The interest in this research area has experienced a considerable growth in the last years due to two key factors: (a) knowledge hidden in organizations’ databases can be exploited to improve strategic and managerial decision-making; (b) the large volume of data managed by organizations makes it impossible to carry out a manual analysis. The book addresses different methods and techniques of integration for enhancing the overall goal of data mining. The book helps to disseminate the knowledge about some innovative, active research directions in the field of data mining, machine and computational intelligence, along with some current issues and applications of relate...
Traffic Flow Management: Data Mining Update

Science.gov (United States)

Grabbe, Shon R.

2012-01-01

This presentation provides an update on recent data mining efforts that have been designed to (1) identify like/similar days in the national airspace system, (2) cluster/aggregate national-level rerouting data and (3) apply machine learning techniques to predict when Ground Delay Programs are required at a weather-impacted airport
Engaging Business Students with Data Mining

Science.gov (United States)

Brandon, Dan

2016-01-01

The Economist calls it "a golden vein", and many business experts now say it is the new science of winning. Business and technologists have many names for this new science; "business intelligence" (BI), "data analytics," and "data mining" are among the most common. The job market for people skilled in this…
EXTRACTING KNOWLEDGE FROM DATA - DATA MINING

Directory of Open Access Journals (Sweden)

DIANA ELENA CODREANU

2011-04-01

Full Text Available Managers of economic organizations have a large volume of information at their disposal and in practice face an avalanche of it, but they cannot work by studying reports that contain volumes of detailed, uncorrelated data, because decisions affecting the good of an organization may have to be made in a fraction of the time. Thus, to take the best and most effective decisions in real time, managers need the correct information presented quickly, in a synthetic but relevant way that allows for predictions and analysis. This paper aims to highlight solutions for extracting knowledge from data, namely data mining. This technology is used not only to verify hypotheses but also to discover new knowledge, so that an economic organization can cope with fierce competition in the market.
On the classification techniques in data mining for microarray data classification

Science.gov (United States)

Aydadenta, Husna; Adiwijaya

2018-03-01

Cancer is one of the deadliest diseases; according to WHO data, more than 8.8 million deaths were caused by cancer in 2015, and this number will increase every year if the disease is not addressed earlier. Microarray data has become one of the most popular subjects of cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples, making it possible to analyze thousands of genes simultaneously. By using data mining techniques, we can classify a sample of microarray data so that it can be identified as cancerous or not. In this paper we discuss research that applies several data mining techniques to microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, together with a simulation of the Random Forest algorithm with dimensionality reduction using Relief. The paper reports the performance measure (accuracy) of each classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forest). The results show that the accuracy of the Random Forest algorithm is higher than that of the other classification algorithms (SVM, ANN, Naive Bayes, kNN, and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost of each data mining classification technique on microarray data.
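The Relief step used for dimensionality reduction can be sketched in a few lines. This is a simplified, deterministic variant of the basic two-class Relief algorithm (it visits every sample instead of drawing random ones), not the configuration used in the paper, and the tiny "expression" matrix is invented. A feature gains weight when it differs on the nearest sample of the other class (the miss) and loses weight when it differs on the nearest sample of the same class (the hit).

```python
def relief_weights(X, y):
    """Two-class Relief over all samples; needs at least two samples per class."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale differences into [0, 1].
    ranges = []
    for f in range(d):
        col = [row[f] for row in X]
        ranges.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / ranges[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights

# Toy matrix: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5], [0.9, 0.8], [0.8, 0.2], [0.85, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

Keeping only the highest-weighted features before training a classifier such as Random Forest is the usual way such weights are used; with thousands of genes the quadratic neighbor search above would need a faster implementation.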
Knowledge Discovery and Data Mining in Iran's Climatic Researches

Science.gov (United States)

Karimi, Mostafa

2013-04-01

Advances in measurement technology and data collection are making databases ever larger, and large databases require powerful tools for data analysis. The iterative process of acquiring knowledge from information obtained through data processing takes various forms in all scientific fields. However, when the data volume is large, traditional methods cannot respond to many of the problems. In recent years, the use of databases in various scientific fields, especially atmospheric databases in climatology, has expanded. In addition, the increase in the amount of data generated by climate models makes it a challenge to analyze those data and extract hidden patterns and knowledge. The approach taken to this problem in recent years uses the process of knowledge discovery and data mining techniques, together with concepts from machine learning, artificial intelligence, and expert (professional) systems. Data mining is an analytical process for mining massive volumes of data. The ultimate goal of data mining is access to information and, finally, knowledge. Climatology is a part of science that uses varied and massive volumes of data. The goal of climate data mining is to obtain information from varied and massive atmospheric and non-atmospheric data. In fact, knowledge discovery performs these activities in a logical, predetermined, and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, the studies were subjected to content (descriptive) analysis and classified by method and issue. The results show that, in climatic research on Iran, clustering (k-means and Ward's method) was applied most often and, in terms of issues, precipitation and atmospheric circulation patterns were addressed most often. Although several studies on geography and climate issues have been done with statistical techniques such as clustering and pattern extraction, due to the nature of statistics and data mining, but cannot say for
A survey on Big Data Stream Mining

African Journals Online (AJOL)

pc

2018-03-05

Mar 5, 2018 ... huge amount of stream like telecommunication systems. So, there ... streams have many challenges for data mining algorithm design like using of ..... A. Bifet and R. Gavalda, "Learning from Time-Changing Data with. Adaptive ...
Combining Data Warehouse and Data Mining Techniques for Web Log Analysis

DEFF Research Database (Denmark)

Pedersen, Torben Bach; Jespersen, Søren; Thorhauge, Jesper

2008-01-01

a number of approaches that combine data warehousing and data mining techniques in order to analyze Web logs. After introducing the well-known click and session data warehouse (DW) schemas, the chapter presents the subsession schema, which allows fast queries on sequences...
A Mining Algorithm for Extracting Decision Process Data Models

Directory of Open Access Journals (Sweden)

Cristina-Claudia DOLEAN

2011-01-01

The paper introduces an algorithm that mines logs of user interaction with simulation software. It outputs a model that explicitly shows the data perspective of the decision process, namely the Decision Data Model (DDM). In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and then provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to demonstrate the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.
Workshop on Educational Data Mining @ ICALT07 (EDM@ICALT07)

NARCIS (Netherlands)

Beck, J.E.; Calders, T.; Pechenizkiy, M.; Viola, S.R.; Spector, J.M.; Sampson, D.G.; Okamoto, T.; Cerri, S.A.; Ueno, M.; Kashihara, A.

2007-01-01

The educational data mining workshop was held in conjunction with the 7th IEEE International Conference on Advanced Learning Technologies (ICALT) in Niigata, Japan on July 18-20, 2007. EDM@ICALT07 continues the series of Workshops organized by the International Working Group on Educational Data Mining
Data Mining Process Optimization in Computational Multi-agent Systems

OpenAIRE

Kazík, O.; Neruda, R. (Roman)

2015-01-01

In this paper, we present an agent-based solution to the metalearning problem which focuses on the optimization of data mining processes. We exploit the framework of computational multi-agent systems, in which various meta-learning problems have already been studied, e.g. parameter-space search or simple method recommendation. In this paper, we examine the effect of data preprocessing for machine learning problems. We perform a set of experiments in the search-space of data mining processes which is...
Post-acquisition data mining techniques for LC-MS/MS-acquired data in drug metabolite identification.

Science.gov (United States)

Dhurjad, Pooja Sukhdev; Marothu, Vamsi Krishna; Rathod, Rajeshwari

2017-08-01

Metabolite identification is a crucial part of the drug discovery process. LC-MS/MS-based metabolite identification has gained widespread use, but the data acquired by the LC-MS/MS instrument is complex, and thus the interpretation of data becomes troublesome. Fortunately, advancements in data mining techniques have simplified the process of data interpretation with improved mass accuracy and provide a potentially selective, sensitive, accurate and comprehensive way for metabolite identification. In this review, we have discussed the targeted (extracted ion chromatogram, mass defect filter, product ion filter, neutral loss filter and isotope pattern filter) and untargeted (control sample comparison, background subtraction and metabolomic approaches) post-acquisition data mining techniques, which facilitate the drug metabolite identification. We have also discussed the importance of integrated data mining strategy.
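One of the targeted techniques named in the review above, the mass defect filter, can be illustrated with a short sketch. This is a minimal illustration only: it assumes the mass defect is approximated as the measured m/z minus its nearest integer, and it assumes a hypothetical window of ±0.050 Da around the parent drug. The function names, the window and the example m/z values are illustrative and do not come from the review.

```python
def mass_defect(mz):
    """Approximate mass defect: measured m/z minus the nearest integer mass."""
    return mz - round(mz)


def mass_defect_filter(ions, parent_mz, window=0.050):
    """Keep ions whose mass defect lies within +/- window (Da) of the parent's.

    `window` is a hypothetical tolerance chosen for illustration.
    """
    target = mass_defect(parent_mz)
    return [mz for mz in ions if abs(mass_defect(mz) - target) <= window]


# Hypothetical parent drug and candidate ions (illustrative values only).
parent = 310.1500
ions = [326.1450, 296.1340, 300.4000, 486.1820]
kept = mass_defect_filter(ions, parent)
print(kept)
```

The ion at 300.4000 is discarded because its mass defect differs from the parent's by far more than the assumed window, which is the sense in which the filter removes background ions unrelated to the drug.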
Data mining: childhood injury control and beyond.

Science.gov (United States)

Tepas, Joseph J

2009-08-01

Data mining is defined as the automatic extraction of useful, often previously unknown information from large databases or data sets. It has become a major part of modern life and is extensively used in industry, banking, government, and health care delivery. The process requires a data collection system that integrates input from multiple sources containing critical elements that define outcomes of interest. Appropriately designed data mining processes identify and adjust for confounding variables. The statistical modeling used to manipulate accumulated data may involve any number of techniques. As predicted results are periodically analyzed against those observed, the model is consistently refined to optimize precision and accuracy. Whether applying integrated sources of clinical data to inferential probabilistic prediction of risk of ventilator-associated pneumonia or population surveillance for signs of bioterrorism, it is essential that modern health care providers have at least a rudimentary understanding of what the concept means, how it basically works, and what it means to current and future health care.

Spatial data mining of pipeline data provides new wave of O and M capital cost optimization opportunities

Energy Technology Data Exchange (ETDEWEB)

Richardson, D. [QM4 Engineering Ltd., Calgary, AB (Canada)

2010-07-01

This paper discussed the cost optimization benefits of spatial data mining in upstream oil and gas pipeline operations. The data mining method was used to enhance the characterization and management of internal corrosion risk and to optimize pipeline corrosion inhibition, as well as to identify pipeline network hydraulic bottlenecks. The data mining method formed part of a quality-based pipeline integrity management program. Results of the data mining study highlighted trends in well operational data and historical pipeline failure events. Use of the methodology resulted in significant savings. It was demonstrated that the key to a successful pipeline management model is a complete inventory characterization and determination of failure susceptibility profiles through the application of rigorous data standards. 4 tabs., 8 figs.
G-Tunnel welded tuff mining experiment data summary

International Nuclear Information System (INIS)

Zimmerman, R.M.; Bellman, R.A. Jr.; Mann, K.L.; Zerga, D.P.; Fowler, M.

1990-03-01

Designers and analysts of radioactive waste repositories must be able to predict the mechanical behavior of the host rock. Sandia National Laboratories elected to conduct a mine-by in welded tuff so that predictive-type information could be obtained regarding the response of the rock to a drill and blast excavation process, where smooth blasting techniques were used. Included in the study were evaluations of and recommendations for various measurement systems that might be used in future mine-by efforts. This report summarizes all of the data obtained in the welded tuff mining experiment. 6 refs., 29 figs., 12 tabs
SegMine workflows for semantic microarray data analysis in Orange4WS

Directory of Open Access Journals (Sweden)

Kulovesi Kimmo

2011-10-01

Background: In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, knowledge discovery from diverse distributed data and knowledge sources (such as GO, KEGG, PubMed, and experimental databases). Specifically, cutting-edge data analysis approaches, such as semantic data mining, link discovery, and visualization, have not yet been made available to researchers investigating complex biological datasets. Results: We present a new methodology, SegMine, for semantic analysis of microarray data by exploiting general biological knowledge, and a new workflow environment, Orange4WS, with integrated support for web services in which the SegMine methodology is implemented. The SegMine methodology consists of two main steps. First, the semantic subgroup discovery algorithm is used to construct elaborate rules that identify enriched gene sets. Then, a link discovery service is used for the creation and visualization of new biological hypotheses. The utility of SegMine, implemented as a set of workflows in Orange4WS, is demonstrated in two microarray data analysis applications. In the analysis of senescence in human stem cells, the use of SegMine resulted in three novel research hypotheses that could improve understanding of the underlying mechanisms of senescence and identification of candidate marker genes. Conclusions: Compared to the available data analysis systems, SegMine offers improved hypothesis generation and data interpretation for bioinformatics in an easy-to-use integrated workflow environment.
Data Mining Gets Traction in Education

Science.gov (United States)

Sparks, Sarah D.

2011-01-01

The new and rapidly growing field of educational data mining is using the chaff from data collected through normal school activities to explore learning in more detail than ever, and researchers say the day when educators can make use of Amazon.com-like feedback on student learning behaviors may be closer than most people think. Educational data…
Clinical diabetes research using data mining: a Canadian perspective.

Science.gov (United States)

Shah, Baiju R; Lipscombe, Lorraine L

2015-06-01

With the advent of the digitization of large amounts of information and the computer power capable of analyzing this volume of information, data mining is increasingly being applied to medical research. Datasets created for administration of the healthcare system provide a wealth of information from different healthcare sectors, and Canadian provinces' single-payer universal healthcare systems mean that data are more comprehensive and complete in this country than in many other jurisdictions. The increasing ability to also link clinical information, such as electronic medical records, laboratory test results and disease registries, has broadened the types of data available for analysis. Data-mining methods have been used in many different areas of diabetes clinical research, including classic epidemiology, effectiveness research, population health and health services research. Although methodologic challenges and privacy concerns remain important barriers to using these techniques, data mining remains a powerful tool for clinical research. Copyright © 2015 Canadian Diabetes Association. Published by Elsevier Inc. All rights reserved.
DECISION SUPPORT SYSTEM TO SUPPORT DECISION PROCESSES WITH DATA MINING

OpenAIRE

Rupnik, Rok; Kukar, Matjaž

2007-01-01

Traditional techniques of data analysis do not enable the solution of all kinds of problems and for that reason they have become insufficient. This caused a new interdisciplinary field of data mining to arise, encompassing both classical statistical and modern machine learning techniques to support data analysis and knowledge discovery from data. Data mining methods are powerful in dealing with large quantities of data, but on the other hand they are difficult to master by business users t...
Separation in Data Mining Based on Fractal Nature of Data

Czech Academy of Sciences Publication Activity Database

Jiřina, Marcel; Jiřina jr., M.

2013-01-01

Vol. 3, No. 1 (2013), pp. 44-60 ISSN 2225-658X Institutional support: RVO:67985807 Keywords: nearest neighbor * fractal set * multifractal * IINC method * correlation dimension Subject RIV: JC - Computer Hardware; Software http://sdiwc.net/digital-library/separation-in-data-mining-based-on-fractal-nature-of-data.html
Towards the generic framework for utility considerations in data mining research

NARCIS (Netherlands)

Puuronen, S.; Pechenizkiy, M.; Soares, C.; Ghani, R.

2010-01-01

Rigorous data mining (DM) research has successfully developed advanced data mining techniques and algorithms, and many organizations have great expectations of benefiting more from their vast data warehouses in decision making. Even when there are some success stories the current status in practice is
Clustering for data mining a data recovery approach

CERN Document Server

Mirkin, Boris

2005-01-01

Often considered more as an art than a science, the field of clustering has been dominated by learning through examples and by techniques chosen almost through trial-and-error. Even the most popular clustering methods--K-Means for partitioning the data set and Ward's method for hierarchical clustering--have lacked the theoretical attention that would establish a firm relationship between the two methods and relevant interpretation aids. Rather than the traditional set of ad hoc techniques, Clustering for Data Mining: A Data Recovery Approach presents a theory that not only closes gaps in K-Mean
International Conference on Computational Intelligence in Data Mining

CERN Document Server

Mohapatra, Durga

2017-01-01

The book presents high-quality papers presented at the International Conference on Computational Intelligence in Data Mining (ICCIDM 2016), organized by the School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India during December 10 – 11, 2016. The book disseminates knowledge about innovative, active research directions in the field of data mining, machine and computational intelligence, along with current issues and applications of related topics. The volume aims to explicate and address the difficulties and challenges of seamless integration of the two core disciplines of computer science.
Large Scale Data Mining to Improve Usability of Data: An Intelligent Archive Testbed

Science.gov (United States)

Ramapriyan, Hampapuram; Isaac, David; Yang, Wenli; Morse, Steve

2005-01-01

Research in certain scientific disciplines - including Earth science, particle physics, and astrophysics - continually faces the challenge that the volume of data needed to perform valid scientific research can at times overwhelm even a sizable research community. The desire to improve utilization of this data gave rise to the Intelligent Archives project, which seeks to make data archives active participants in a knowledge building system capable of discovering events or patterns that represent new information or knowledge. Data mining can automatically discover patterns and events, but it is generally viewed as unsuited for large-scale use in disciplines like Earth science that routinely involve very high data volumes. Dozens of research projects have shown promising uses of data mining in Earth science, but all of these are based on experiments with data subsets of a few gigabytes or less, rather than the terabytes or petabytes typically encountered in operational systems. To bridge this gap, the Intelligent Archives project is establishing a testbed with the goal of demonstrating the use of data mining techniques in an operationally-relevant environment. This paper discusses the goals of the testbed and the design choices surrounding critical issues that arose during testbed implementation.
Collaborative mining and transfer learning for relational data

Science.gov (United States)

Levchuk, Georgiy; Eslami, Mohammed

2015-06-01

Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations that encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to deal with graph data. This gave rise to the relational data mining field of research, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade off the benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss the individual benefits and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than a centralized mining algorithm.
Comparison analysis of data mining models applied to clinical research in traditional Chinese medicine.

Science.gov (United States)

Zhao, Yufeng; Xie, Qi; He, Liyun; Liu, Baoyan; Li, Kun; Zhang, Xiang; Bai, Wenjing; Luo, Lin; Jing, Xianghong; Huo, Ruili

2014-10-01

To help researchers select appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine (TCM) diagnosis and therapy. Clinical issues based on data mining models were comprehensively summarized from four significant elements of the clinical studies: symptoms, symptom patterns, herbs, and efficacy. Existing problems were further generalized to determine the factors relevant to the performance of data mining models, e.g. data type, samples, parameters, variable labels. Combining these relevant factors, the TCM clinical data features were compared with regard to statistical characteristics and informatics properties. Data models were compared simultaneously in terms of their conditions of application and suitable scopes. The main application problems were inconsistent data types and samples that were too small for the data mining models used, which led to inappropriate and even erroneous results. These features, i.e. advantages, disadvantages, suitable data types, data mining tasks, and the TCM issues, were summarized and compared. By considering the special features of different data mining models, clinical doctors can select suitable data mining models to resolve TCM problems.
Application of data mining in performance measures

Science.gov (United States)

Chan, Michael F. S.; Chung, Walter W.; Wong, Tai Sun

2001-10-01

This paper proposes a structured framework for applying data mining to performance measures. The use of the framework is illustrated in the context of an airline company. The framework takes into consideration how knowledge workers interact with performance information at the enterprise level so that they can make informed decisions in managing the effectiveness of operations. A case study of applying data mining technology to performance data in an airline company is presented. Performance measures are applied specifically to assist in the aircraft delay management process. The increasingly dispersed and complex nature of airline operations puts much strain on knowledge workers in searching for, acquiring and analyzing information to manage performance. One major problem faced by knowledge workers is the identification of the root causes of performance deficiency. Analyzing the large number of factors involved in the root causes can be time consuming, and the objective of applying data mining technology is to reduce the time and resources needed for this process. Increasing market competition for better performance management in various industries gives rise to the need for the intelligent use of data. Because of this, the framework proposed here is readily generalizable to industries such as manufacturing. It could assist knowledge workers who are constantly looking for ways to improve operational effectiveness through new initiatives and who must act quickly to gain competitive advantage in the marketplace.
Mining Risk Factors in RFID Baggage Tracking Data

DEFF Research Database (Denmark)

Ahmed, Tanvir; Calders, Toon; Pedersen, Torben Bach

2015-01-01

and frustration to the passengers. To remedy these problems we propose a detailed methodology for mining risk factors from Radio Frequency Identification (RFID) baggage tracking data. The factors should identify potential issues in the baggage management. However, the baggage tracking data are low level...... and not directly accessible for finding such factors. Moreover, baggage tracking data are highly imbalanced, for example, our experimental data, which is a large real-world data set from the Scandinavian countries, contains only 0.8% mishandled bags. This imbalance presents difficulties to most data mining...... techniques. The paper presents detailed steps for pre-processing the unprocessed raw tracking data for higher-level analysis and handling the imbalance problem. We fragment the data set based on a number of relevant factors and find the best classifier for each of them. The paper reports on a comprehensive...
Compass: A hybrid method for clinical and biobank data mining

DEFF Research Database (Denmark)

Krysiak-Baltyn, Konrad; Petersen, Thomas Nordahl; Audouze, Karine Marie Laure

2014-01-01

We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods; Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply...... Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining; it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as “hotspots” for statistically...... significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical type questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we...
Statistically significant relational data mining

Energy Technology Data Exchange (ETDEWEB)

Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

2014-02-01

This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Advances in machine learning and data mining for astronomy

CERN Document Server

Way, Michael J

2012-01-01

Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health
Comparative genomics using data mining tools

Indian Academy of Sciences (India)

We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and ...
Process mining data science in action

CERN Document Server

van der Aalst, Wil

2016-01-01

The first to cover this missing link between data mining and process modeling, this book provides real-world techniques for monitoring and analyzing processes in real time. It is a powerful new tool destined to play a key role in business process management.

Review of Data Mining Techniques for Churn Prediction in Telecom

OpenAIRE

Vishal Mahajan; Richa Misra; Renuka Mahajan

2015-01-01

The telecommunication sector generates a huge amount of data due to the increasing number of subscribers, rapidly renewable technologies, data-based applications and other value-added services. This data can be usefully mined for churn analysis and prediction. Significant research has been undertaken by researchers worldwide to understand the data mining practices that can be used for predicting customer churn. This paper provides a review of around 100 recent journal articles starting from year 2000 ...
Relational XES: Data management for process mining

NARCIS (Netherlands)

Dongen, van B.F.; Shabani, S.; Grabis, J.; Sandkuhl, K.

2015-01-01

Information systems log data during the execution of business processes in so called "event logs". Process mining aims to improve business processes by extracting knowledge from event logs. Currently, the de-facto standard for storing and managing event data, XES, is tailored towards sequential
Relational XES : data management for process mining

NARCIS (Netherlands)

Dongen, van B.F.; Shabani, S.

2015-01-01

Information systems log data during the execution of business processes in so called "event logs". Process mining aims to improve business processes by extracting knowledge from event logs. Currently, the de-facto standard for storing and managing event data, XES, is tailored towards sequential
Characterization of a mine fire using atmospheric monitoring system sensor data.

Science.gov (United States)

Yuan, L; Thomas, R A; Zhou, L

2017-06-01

Atmospheric monitoring systems (AMS) have been widely used in underground coal mines in the United States for the detection of fire in the belt entry and the monitoring of other ventilation-related parameters such as airflow velocity and methane concentration in specific mine locations. In addition to an AMS being able to detect a mine fire, the AMS data have the potential to provide fire characteristic information such as fire growth - in terms of heat release rate - and exact fire location. Such information is critical in making decisions regarding fire-fighting strategies, underground personnel evacuation and optimal escape routes. In this study, a methodology was developed to calculate the fire heat release rate using AMS sensor data for carbon monoxide concentration, carbon dioxide concentration and airflow velocity based on the theory of heat and species transfer in ventilation airflow. Full-scale mine fire experiments were then conducted in the Pittsburgh Mining Research Division's Safety Research Coal Mine using an AMS with different fire sources. Sensor data collected from the experiments were used to calculate the heat release rates of the fires using this methodology. The calculated heat release rate was compared with the value determined from the mass loss rate of the combustible material using a digital load cell. The experimental results show that the heat release rate of a mine fire can be calculated using AMS sensor data with reasonable accuracy.
Data mining application in industrial energy audit for lighting

Energy Technology Data Exchange (ETDEWEB)

Maricar, N.M.; Kim, G.C.; Jamal, N. [Kolej Univ., Melaka (Malaysia). Faculty of Electrical Engineering

2005-07-01

A data mining application for lighting energy audits at industrial sites was presented. Data collection was based on the parameters needed for the analysis part of the audit. Data collection included the activity for which the room was used; its dimensions; light level readings in lux; the number of luminaires; the number of lamps per luminaire; lamp fixtures; and lamp wattage. The lumen method was used to calculate the recommended number of luminaires in the room. The number was then compared with the number of luminaires in the existing system. The installed load efficacy ratio (ILER) was then used to determine proper retrofit action to maximize energy usage. The difference between the calculated lux and the standard lux was used to create data subsets. A data mining algorithm was used to determine that the ILER plays an important role in calculating the efficiency of lighting systems. It was also concluded that the method can be used to minimize the time needed to analyze large amounts of lighting data. The results of case studies were also used to show that the combined data mining algorithm provided accurate assessments using existing calculated data. 7 refs., 8 tabs., 5 figs.
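The two audit calculations named in the abstract above, the lumen method and the installed load efficacy ratio (ILER), can be sketched as follows. The abstract gives no formulas, so this sketch assumes the common textbook forms: the lumen method as N = E·A / (F·UF·MF), and ILER as the achieved lux per W/m² divided by a target lux per W/m². All function names, parameters and numeric values are hypothetical and are not taken from the paper.

```python
import math


def fixtures_required(target_lux, area_m2, lamps_per_fixture,
                      lumens_per_lamp, utilization_factor, maintenance_factor):
    """Lumen method (assumed textbook form): N = E*A / (F*UF*MF), rounded up."""
    lumens_per_fixture = lamps_per_fixture * lumens_per_lamp
    n = (target_lux * area_m2) / (
        lumens_per_fixture * utilization_factor * maintenance_factor)
    return math.ceil(n)


def installed_load_efficacy_ratio(measured_lux, area_m2, installed_watts,
                                  target_lux_per_w_per_m2):
    """ILER (assumed form): achieved lux per W/m2 divided by a target value."""
    achieved = measured_lux / (installed_watts / area_m2)
    return achieved / target_lux_per_w_per_m2


# Hypothetical room: 50 m2, 300 lux recommended, 2 x 3000 lm lamps per fixture.
needed = fixtures_required(300, 50, 2, 3000, 0.5, 0.8)
# Hypothetical audit reading: 250 lux measured with 500 W installed.
ratio = installed_load_efficacy_ratio(250, 50, 500, 40)
print(needed, ratio)
```

Under these assumed inputs the room needs 7 fixtures, and a ratio well below 1 would point towards a retrofit; the threshold at which action is taken is a policy choice that the abstract does not specify.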
Review of Data Mining Techniques for Churn Prediction in Telecom

Directory of Open Access Journals (Sweden)

Vishal Mahajan

2015-12-01

service. This data can be usefully mined for churn analysis and prediction. Significant research has been undertaken by researchers worldwide to understand the data mining practices that can be used for predicting customer churn. This paper provides a review of around 100 recent journal articles, starting from the year 2000, to present the various data mining techniques used in multiple customer-based churn models. It then summarizes the existing telecom literature by highlighting the sample size used, the churn variables employed and the findings of different DM techniques. Finally, we list the most popular techniques for churn prediction in telecom as decision trees, regression analysis and clustering, thereby providing a roadmap for new researchers to build novel churn management models.
Fuzzy C-Means Clustering Model Data Mining For Recognizing Stock Data Sampling Pattern

Directory of Open Access Journals (Sweden)

Sylvia Jane Annatje Sumarauw

2007-06-01

The capital market has been beneficial to companies and investors. For investors, the capital market provides two economic advantages, namely dividends and capital gains, and a non-economic one, a voting share in the Shareholders General Meeting. But it can also penalize share owners. In order to protect themselves from the risk, investors should predict the prospects of their companies. Because shares are an abstract commodity, share quality is determined by the validity of the company profile information. Any information on stock value fluctuation from the Jakarta Stock Exchange can be a useful consideration and a good measurement for data analysis. In the context of protecting shareholders from risk, this research focuses on stock data sample categories, or stock data sample patterns, using the Fuzzy c-Means Clustering Model, which provides useful information for investors. The research analyses stock data such as Individual Index, Volume and Amount for the Property and Real Estate Emitter Group at the Jakarta Stock Exchange from January 1 to December 31, 2004. The mining process follows the Cross Industry Standard Process model for Data Mining (CRISP-DM), a cycle with these steps: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. In the modelling step, the Fuzzy c-Means Clustering Model is applied. The Data Mining Fuzzy c-Means Clustering Model can analyze stock data in a big database with many complex variables, especially for finding the data sample pattern, and then building a Fuzzy Inference System that maps inputs to outputs based on fuzzy logic by recognising the pattern. Keywords: Data Mining, Fuzzy c-Means Clustering Model, Pattern Recognition
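The fuzzy c-means clustering named in the abstract above can be sketched in a few lines. This is a minimal sketch of the standard algorithm (alternating membership and centre updates with fuzzifier m), not the authors' implementation. The initial centres, the fuzzifier m = 2, the fixed iteration count and the toy two-variable data are all assumptions made for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, and each row sums to 1.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def update_memberships(current):
        power = 1.0 / (m - 1.0)
        rows = []
        for p in points:
            d = [dist2(p, c) for c in current]
            if 0.0 in d:
                # Point coincides with a centre: give it crisp membership.
                row = [1.0 if x == 0.0 else 0.0 for x in d]
            else:
                row = [(1.0 / x) ** power for x in d]
            total = sum(row)
            rows.append([v / total for v in row])
        return rows

    dims = len(points[0])
    for _ in range(iterations):
        u = update_memberships(centers)
        new_centers = []
        for i in range(len(centers)):
            # Weights are memberships raised to the fuzzifier m.
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * p[dim] for wk, p in zip(w, points)) / total
                for dim in range(dims)))
        centers = new_centers
    return centers, update_memberships(centers)


# Toy data with two obvious groups (illustrative values, not stock data).
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, memberships = fuzzy_c_means(data, centers=[(0.0, 0.0), (6.0, 6.0)])
print(centers)
```

Unlike k-means, every point keeps a graded membership in every cluster, which is what allows the resulting patterns to feed a fuzzy inference system of the kind the abstract mentions.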
A Framework for Investigating Influence of Organizational Decision Makers on Data Mining Process Achievement

Directory of Open Access Journals (Sweden)

Hanieh Hajisafari

2012-02-01

Full Text Available Currently, few studies deal with the evaluation of data mining plans in the context of solving organizational problems. A successful data miner seeks to solve a fully defined business problem. To make the data mining (DM) results actionable, the data miner must explain them to the business insider. The interaction between business insiders and data miners is in effect a knowledge-sharing process. This study presents a framework for investigating the influence of organizational decision makers on the data mining process and its results. From a review of the research literature, the critical success factors of data mining plans were identified and the role of organizational decision makers in each step of data mining was investigated. Then, a conceptual framework of the influence of organizational decision makers on data mining process achievement was designed. The proposed framework was analyzed using expert opinions, and the final framework was then designed. Analysis of the experts' opinions showed that when data mining results are shared with decision makers, "learning", "action or internalization" and "enforcing/unlearning" become critical success factors. Also, examining the importance of decision makers' feedback on the data mining steps showed that such feedback could have the most influence on the "knowledge extraction and representing model" step and the least on the "data cleaning and preprocessing" step.
Stratified sampling design based on data mining.

Science.gov (United States)

Kim, Yeonkook J; Oh, Yoonhwan; Park, Sunghoon; Cho, Sungzoon; Park, Hayoung

2013-09-01

To explore classification rules based on data mining methodologies which are to be used in defining strata in stratified sampling of healthcare providers with improved sampling efficiency. We performed k-means clustering to group providers with similar characteristics, then, constructed decision trees on cluster labels to generate stratification rules. We assessed the variance explained by the stratification proposed in this study and by conventional stratification to evaluate the performance of the sampling design. We constructed a study database from health insurance claims data and providers' profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and population data from Statistics Korea. From our database, we used the data for single specialty clinics or hospitals in two specialties, general surgery and ophthalmology, for the year 2011 in this study. Data mining resulted in five strata in general surgery with two stratification variables, the number of inpatients per specialist and population density of provider location, and five strata in ophthalmology with two stratification variables, the number of inpatients per specialist and number of beds. The percentages of variance in annual changes in the productivity of specialists explained by the stratification in general surgery and ophthalmology were 22% and 8%, respectively, whereas conventional stratification by the type of provider location and number of beds explained 2% and 0.2% of variance, respectively. This study demonstrated that data mining methods can be used in designing efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea.
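The "variance explained by the stratification" reported here can be read as the between-strata share of the total sum of squares. The sketch below shows that calculation under this assumption; the function name and the toy numbers are hypothetical and do not come from the study.

```python
from statistics import mean


def variance_explained(values, strata):
    """Share of the total variance in `values` explained by the stratum labels.

    Computed as the between-strata sum of squares over the total sum of squares.
    """
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variance to explain
    groups = {}
    for value, label in zip(values, strata):
        groups.setdefault(label, []).append(value)
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss


# Hypothetical toy data: an outcome for six providers placed in two strata.
changes = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["A", "A", "A", "B", "B", "B"]
share = variance_explained(changes, labels)
```

A stratification whose strata have very different means yields a share close to one, while a single stratum yields zero, which is why a higher share indicates a more efficient sampling design.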
Data Mining: A Hybrid Methodology for Complex and Dynamic Research

Science.gov (United States)

Lang, Susan; Baehr, Craig

2012-01-01

This article provides an overview of the ways in which data and text mining have potential as research methodologies in composition studies. It introduces data mining in the context of the field of composition studies and discusses ways in which this methodology can complement and extend our existing research practices by blending the best of what…
An Application of Multithreaded Data Mining in Educational Leadership Research

OpenAIRE

Fikis, David; Wang, Yinying; Bowers, Alex

2015-01-01

This study aims to apply high-performance computing to educational leadership research. Specifically, we applied an array of data acquisition and analytical techniques to the field of educational leadership research, including text data mining, probabilistic topic modeling, and the use of software (CasperJS, GNU utilities, R, etc.) as well as hardware (the VELA batch computer and the multi-threaded data mining environment).
Data mining with SPSS modeler theory, exercises and solutions

CERN Document Server

Wendler, Tilo

2016-01-01

Introducing the IBM SPSS Modeler, this book guides readers through data mining processes and presents relevant statistical methods. There is a special focus on step-by-step tutorials and well-documented examples that help demystify complex mathematical algorithms and computer programs. The variety of exercises and solutions as well as an accompanying website with data sets and SPSS Modeler streams are particularly valuable. While intended for students, the simplicity of the Modeler makes the book useful for anyone wishing to learn about basic and more advanced data mining, and put this knowledge into practice.
Real-Time Clinical Decision Support System with Data Stream Mining

Directory of Open Access Journals (Sweden)

Yang Zhang

2012-01-01

Full Text Available This research describes a new design for a data stream mining system that can analyze medical data streams and make real-time predictions. The research is motivated by growing interest in combining software technology and medical functions to develop software applications for use in medical fields such as chronic disease prognosis and diagnosis, children's healthcare, diabetes diagnosis, and so forth. Most existing software technologies are case-based data mining systems. They can analyze only finite, structured data sets, worked well only in their early years, and can hardly meet today's medical requirements. In this paper, we describe a clinical support system based on data stream mining technology; the design takes into account all the shortcomings of the existing clinical support systems.
Data Mining Relationships Among Urban Socioeconomic, Land Cover, and Remotely Sensed Ecological Data

Science.gov (United States)

Mennis, J.; Wessman, C.; Golubiewski, N.

2003-12-01

This research investigates the relationships among socioeconomic character, land cover, and ecological function in a rapidly urbanizing region, the Front Range of Colorado. We use novel spatial geographic information systems- (GIS-) based data integration and data mining techniques to integrate and analyze diverse spatial data sets. These data include elevation data, transportation data, land cover data derived from aerial photography, block group-level U.S. Census data, and vegetation greenness (NDVI) data derived from Landsat imagery. These data are used to derive a variety of U.S. block group-level variables indicating demographic, geographic, ecological, and land cover characteristics. We employ spatial association rule mining, decision tree induction, and spatial on-line analytical processing (OLAP), in addition to more conventional multivariate statistical techniques, to investigate relationships among these variables.
Building a Classification Model for Enrollment In Higher Educational Courses using Data Mining Techniques

OpenAIRE

Saini, Priyanka

2014-01-01

Data mining is the process of extracting useful patterns from huge amounts of data, and many data mining techniques are used for mining these patterns. Recently, one remarkable fact in higher educational institutes is the rapid growth of data, and this educational data is expanding quickly without any advantage to the educational management. The main aim of the management is to refine the education standard; therefore by applying the various data mining techniques on this data one ca...
An Intelligent Agent based Architecture for Visual Data Mining

OpenAIRE

Hamdi Ellouzi; Hela Ltifi; Mounir Ben Ayed

2016-01-01

The aim of this paper is to present an intelligent architecture for a Decision Support System (DSS) based on visual data mining. This architecture applies multi-agent technology to facilitate the design and development of DSS in complex and dynamic environments. Multi-Agent Systems add a high level of abstraction. To validate the proposed architecture, it is used to develop a distributed visual data mining based DSS to predict the occurrence of nosocomial infections in intensive care units. Th...
Data mining methods for quality assurance in an environmental monitoring network

NARCIS (Netherlands)

Athanasiadis, Ioannis N.; Rizzoli, Andrea Emilio; Beard, Daniel W.

2010-01-01

The paper presents a system architecture that employs data mining techniques for ensuring quality assurance in an environmental monitoring network. We investigate how data mining techniques can be incorporated in the quality assurance decision making process. As prior expert decisions are
A Review of Financial Accounting Fraud Detection based on Data Mining Techniques

Science.gov (United States)

Sharma, Anuj; Kumar Panigrahi, Prabin

2012-02-01

With the upsurge in financial accounting fraud in the current economic scenario, financial accounting fraud detection (FAFD) has become an emerging topic of great importance for academia, research and industry. The failure of organizations' internal auditing systems to identify accounting fraud has led to the use of specialized procedures to detect financial accounting fraud, collectively known as forensic accounting. Data mining techniques are providing great aid in financial accounting fraud detection, since dealing with the large volumes and complexities of financial data is a big challenge for forensic accounting. This paper presents a comprehensive review of the literature on the application of data mining techniques for the detection of financial accounting fraud and proposes a framework for data mining based accounting fraud detection. The systematic and comprehensive literature review of the data mining techniques applicable to financial accounting fraud detection may provide a foundation for future research in this field. The findings of this review show that data mining techniques like logistic models, neural networks, Bayesian belief networks, and decision trees have been applied most extensively to provide primary solutions to the problems inherent in the detection and classification of fraudulent data.
VLSI implementation of flexible architecture for decision tree classification in data mining

Science.gov (United States)

Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

2017-07-01

Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.
2nd International Conference on Soft Computing and Data Mining

CERN Document Server

Ghazali, Rozaida; Nawi, Nazri; Deris, Mustafa

2017-01-01

This book provides a comprehensive introduction and practical look at the concepts and techniques readers need to get the most out of their data in real-world, large-scale data mining projects. It also guides readers through the data-analytic thinking necessary for extracting useful knowledge and business value from the data. The book is based on the Soft Computing and Data Mining (SCDM-16) conference, which was held in Bandung, Indonesia on August 18th–20th 2016 to discuss the state of the art in soft computing techniques, and offer participants sufficient knowledge to tackle a wide range of complex systems. The scope of the conference is reflected in the book, which presents a balance of soft computing techniques and data mining approaches. The two constituents are introduced to the reader systematically and brought together using different combinations of applications and practices. It offers engineers, data analysts, practitioners, scientists and managers the insights into the concepts, tools and techni...

Classification of Internet banking customers using data mining algorithms

Directory of Open Access Journals (Sweden)

Reza Radfar

2014-03-01

Full Text Available Classifying customers using data mining algorithms enables banks to keep the loyalty of old customers while attracting new ones. Using a decision tree as the data mining technique, we can optimize customer classification provided that the appropriate decision tree is selected. In this article we present an appropriate model to classify customers who use internet banking services. The model is developed based on the CRISP-DM standard, and we have used real data from Sina Bank’s Internet bank. Compared with other decision trees, ours is based on both optimization and accuracy factors and recognizes new potential internet banking customers using a three-level classification: low, medium and high. This is practical, documentary-based research. Mining customer rules enables managers to make policies based on the discovered patterns in order to have a better perception of what customers really desire.
Interestingness of association rules in data mining: Issues relevant ...

Indian Academy of Sciences (India)

R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

mental changes in many spheres of our daily life. .... concentrate on association rule mining since it features as one of the main data mining tech- ..... years, a lot of work has been done in defining and quantifying 'interestingness. .... a critical effect on both, selection of interesting events and variation of interestingness thresh-.
Educational Data Mining Application for Estimating Students Performance in Weka Environment

Science.gov (United States)

Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

2017-11-01

Educational data mining (EDM) is a multi-disciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by an educational institution. EDM uses computational approaches to explicate educational information in order to examine educational questions. To make a country stand out among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. Concealed patterns and data can be extracted from various information repositories by adopting data mining techniques. In order to summarize the performance of students with their credentials, we scrutinize the use of data mining in the field of academics. The Apriori algorithm is applied extensively to the student database for a wider classification based on various categories. The k-means procedure is applied to the same set of databases in order to group the records into specific categories. The Apriori algorithm mines rules in order to extract similar patterns, along with their associations, across various sets of records. The records can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly observed when data mining frameworks are used. Thus, the algorithms prove efficient at profiling students in any educational environment. The ultimate objective of the study is to assess whether a student is prone to violence or not.
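The level-wise search behind the Apriori algorithm mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumed inputs, not the procedure used in the study or in Weka: the function name, the item labels and the 0.5 support threshold are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets that meet min_support."""
    n = len(transactions)

    def supp(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    current = [s for s in (frozenset([i]) for i in items) if supp(s) >= min_support]
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = supp(s)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        previous = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(sub) in previous for sub in combinations(c, k - 1))]
        current = [c for c in candidates if supp(c) >= min_support]
    return result


# Hypothetical records, each a set of attribute labels.
records = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
found = frequent_itemsets(records, min_support=0.5)
```

The prune step relies on the fact that an itemset cannot be frequent unless all of its subsets are, which keeps the candidate list small at each level.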
Design database for quantitative trait loci (QTL) data warehouse, data mining, and meta-analysis.

Science.gov (United States)

Hu, Zhi-Liang; Reecy, James M; Wu, Xiao-Lin

2012-01-01

A database can be used to warehouse quantitative trait loci (QTL) data from multiple sources for comparison, genomic data mining, and meta-analysis. A robust database design involves sound data structure logistics, meaningful data transformations, normalization, and proper user interface designs. This chapter starts with a brief review of relational database basics and concentrates on issues associated with curation of QTL data into a relational database, with emphasis on the principles of data normalization and structure optimization. In addition, some simple examples of QTL data mining and meta-analysis are included. These examples are provided to help readers better understand the potential and importance of sound database design.
Data mining and visualization techniques

Science.gov (United States)

Wong, Pak Chung [Richland, WA; Whitney, Paul [Richland, WA; Thomas, Jim [Richland, WA

2004-03-23

Disclosed are association rule identification and visualization methods, systems, and apparatus. An association rule in data mining is an implication of the form X → Y where X is a set of antecedent items and Y is the consequent item. A unique visualization technique that provides multiple antecedent, consequent, confidence, and support information is disclosed to facilitate better presentation of large quantities of complex association rules.
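For a rule of the form X → Y, support and confidence are commonly computed as in the sketch below. It uses the usual textbook definitions, which the record itself does not spell out; the function names and the basket data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent set X, the share that also contain Y."""
    x = frozenset(antecedent)
    x_count = sum(1 for t in transactions if x <= t)
    if x_count == 0:
        return 0.0  # the rule never applies in this data
    xy = x | {consequent}
    xy_count = sum(1 for t in transactions if xy <= t)
    return xy_count / x_count


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, "milk")
```

Here the rule {bread} → milk holds in two of the four baskets and in two of the three baskets that contain bread.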
Data Mining – Innovative Method for Obtaining Information in Marketing and Business Management

Directory of Open Access Journals (Sweden)

Mirela-Cristina Voicu

2011-05-01

Full Text Available The existence of massive amounts of data has raised the question of reorienting their use from retrospective to prospective operation. Data mining offers the promise of an important aid for discovering hidden patterns in data that can be used to predict the behavior of customers, products and processes. Data mining tools must be guided by users who understand the business, the general nature of the data and the analytical methods involved. Data mining discovers information within the data that queries and reports can’t effectively reveal. It is vital to collect and prepare the data properly and to confront the models with reality. Choosing the most appropriate data mining product means finding a tool that has the required capabilities, offers an interface that matches the skills of its users, and can be applied to a specific business problem. In this context, the purpose of this paper is to illustrate some of the problems in company activity that can be solved by using data mining techniques.
Visual data mining for developing competitive strategies in higher education

OpenAIRE

Ertek, Gürdal; Ertek, Gurdal

2009-01-01

Information visualization is the growing field of computer science that aims at visually mining data for knowledge discovery. In this paper, a data mining framework and a novel information visualization scheme is developed and applied to the domain of higher education. The presented framework consists of three main types of visual data analysis: Discovering general insights, carrying out competitive benchmarking, and planning for High School Relationship Management (HSRM). In this paper the f...
Model architecture of intelligent data mining oriented urban transportation information

Science.gov (United States)

Yang, Bogang; Tao, Yingchun; Sui, Jianbo; Zhang, Feizhou

2007-06-01

Aiming to solve practical problems in urban traffic, the paper presents a model architecture for intelligent data mining from a hierarchical view. With artificial intelligence technologies used in the framework, the intelligent data mining technology is improved and becomes more suitable for changing real-time road conditions. It also provides efficient technology support for the distribution, transmission and display of urban transport information.
Maternal vaccination and preterm birth: using data mining as a screening tool

DEFF Research Database (Denmark)

Orozova-Bekkevold, Ivanka; Jensen, Henrik; Stensballe, Lone

2007-01-01

Objective The main purpose of this study was to identify possible associations between medicines used in pregnancy and preterm deliveries using data mining as a screening tool. Settings Prospective cohort study. Methods We used data mining to identify possible correlates between preterm delivery...... measure Preterm birth, a delivery occurring before the 259th day of gestation (i.e., less than 37 full weeks). Results Data mining had indicated that maternal vaccination (among other factors) might be related to preterm birth. The following regression analysis showed that, the women who reported being...... further studies. Data mining, especially with additional refinements, may be a valuable and very efficient tool to screen large databases for relevant information which can be used in clinical and public health research....
Data Mining in Course Management Systems: Moodle Case Study and Tutorial

Science.gov (United States)

Romero, Cristobal; Ventura, Sebastian; Garcia, Enrique

2008-01-01

Educational data mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from the educational context. This work is a survey of the specific application of data mining in learning management systems and a case study tutorial with the Moodle system. Our objective is to introduce it both…
Educational Data Mining Acceptance among Undergraduate Students

Science.gov (United States)

Wook, Muslihah; Yusof, Zawiyah M.; Nazri, Mohd Zakree Ahmad

2017-01-01

The acceptance of Educational Data Mining (EDM) technology is on the rise due to its ability to extract new knowledge from large amounts of students' data. This knowledge is important for educational stakeholders, such as policy makers, educators, and students themselves to enhance efficiency and achievements. However, previous studies on EDM…
Mining on Big Data Using Hadoop MapReduce Model

Science.gov (United States)

Salman Ahmed, G.; Bhattacharya, Sweta

2017-11-01

Traditional parallel algorithms for mining frequent itemsets aim to balance load by distributing data evenly among nodes. The paper reviews this process by analyzing a critical performance problem of common parallel frequent itemset mining algorithms. Given a large dataset, the data partitioning strategies in current solutions suffer high communication and mining overhead caused by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach on Hadoop using the MapReduce programming model. The overall goal is to boost the performance of parallel frequent itemset mining on Hadoop clusters. By incorporating a similarity metric and the locality-sensitive hashing technique, the approach places highly similar transactions into the same data partition to improve locality without creating an excessive number of redundant transactions. We implement the approach on a 34-node Hadoop cluster, driven by a wide range of datasets created by the IBM Quest market-basket synthetic data generator. Experiments show that the approach helps reduce network and computing loads by eliminating redundant transactions on Hadoop nodes, and that it considerably outperforms the other models.
Quality of research results in agro-economy by data mining

Directory of Open Access Journals (Sweden)

Vukelić Gordana

2015-01-01

Full Text Available Data Mining (DM) applied to data in agro-economy is a scientific method that frees researchers from following set research scenarios with predetermined assumptions and hypotheses based on insignificant attributes. On the contrary, data mining makes it possible to detect these attributes and, in general, the hidden facts that enable a hypothesis to be set. The DM method does this iteratively, including key attributes and factors and their influence on the quality of agro-resources. The research was conducted on a random sample by analyzing the quality of eggs. The research subject is the possibility of classifying and predicting the significant variables (attributes) that determine the level of egg quality. The research starts from the use of Data Mining, as an area of machine learning, which significantly helps researchers in optimizing research. The methodology applied includes analytical-synthetic procedures and Data Mining methods, with a special focus on supervised linear discriminant analysis and the decision tree. The results indicate significant possibilities for using DM as an additional analytical procedure in agro-research, and it can be concluded that it contributes to improving the effectiveness and validity of the research process.
Temporal data mining for hospital management

Science.gov (United States)

Tsumoto, Shusaku; Hirano, Shoji

2009-04-01

About twenty years have passed since clinical information began to be stored electronically in hospital information systems in the 1980s. Stored data range from accounting information to laboratory data, and even patient records have now started to be accumulated: in other words, a hospital cannot function without the information system, where almost all the pieces of medical information are stored as multimedia databases. In this paper, we applied temporal data mining and exploratory data analysis techniques to hospital management data. The analysis yields several interesting results, which suggest that the reuse of stored data will provide a powerful tool for hospital management.
A Case Study for Student Performance Analysis based on Educational Data Mining (EDM)

OpenAIRE

Daxa Kundariya; Prof. Vaseem Ghada

2016-01-01

Educational Data Mining (EDM) is a study methodology and an application of data mining techniques to students’ data from academic databases. Like other domains, the educational domain also produces a vast amount of study data. To enhance the quality of the education system, student performance analysis plays an important role in decision support. This paper presents a study of various educational data mining techniques and how they could be applied to the educational system to analyze student perfor...
A systematic review of data mining and machine learning for air pollution epidemiology.

Science.gov (United States)

Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro

2017-11-28

Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geospatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air
Electronic structure prediction via data-mining the empirical pseudopotential method

Energy Technology Data Exchange (ETDEWEB)

Zenasni, H; Aourag, H [LEPM, URMER, Departement of Physics, University Abou Bakr Belkaid, Tlemcen 13000 (Algeria); Broderick, S R; Rajan, K [Department of Materials Science and Engineering, Iowa State University, Ames, Iowa 50011-2230 (United States)

2010-01-15

We introduce a new approach for accelerating the calculation of the electronic structure of new materials by utilizing the empirical pseudopotential method combined with data mining tools. Combining data mining with the empirical pseudopotential method allows us to convert an empirical approach to a predictive approach. Here we consider tetrahedrally bounded III-V Bi semiconductors, and through the prediction of form factors based on basic elemental properties we can model the band structure and charge density for these semi-conductors, for which limited results exist. This work represents a unique approach to modeling the electronic structure of a material which may be used to identify new promising semi-conductors and is one of the few efforts utilizing data mining at an electronic level. (Abstract Copyright [2010], Wiley Periodicals, Inc.)
Data mining goes multidimensional.

Science.gov (United States)

Hettler, M

1997-03-01

The success of a healthcare organization depends on its ability to acquire, store, analyze and compare data across many parts of the enterprise, by many individuals. While relational databases have been around since the 1970s, their two-dimensional structure has limited--or made impossible--the kind of cross-dimensional trend analysis so necessary to healthcare today. Enter online analytical processing (OLAP), in which servers store data in multiple dimensions, opening a world of opportunity for data-mining across the enterprise. In this issue of HEALTHCARE INFORMATICS, we feature our first report from the National Software Testing Laboratories (NSTL) about technologies that will change the way healthcare does business. A division of The McGraw-Hill Companies, NSTL is an independent software and hardware testing lab offering services that include compatibility testing, bug testing, comparison testing, documentation evaluation and usability.
Data Mining SIAM Presentation

Science.gov (United States)

Srivastava, Ashok; McIntosh, Dawn; Castle, Pat; Pontikakis, Manos; Diev, Vesselin; Zane-Ulman, Brett; Turkov, Eugene; Akella, Ram; Xu, Zuobing; Kumaresan, Sakthi Preethi

2006-01-01

This viewgraph document describes the data mining system developed at NASA Ames. Many NASA programs have large numbers (and types) of problem reports. These free text reports are written by a number of different people, thus the emphasis and wording vary considerably. With so much data to sift through, analysts (subject experts) need help identifying any possible safety issues or concerns and confirming that they haven't missed important problems. Unsupervised clustering is the initial step to accomplish this; we think we can go much farther, specifically, identify possible recurring anomalies. Recurring anomalies may be indicators of larger systemic problems. The requirement to identify these anomalies has led to the development of the Recurring Anomaly Discovery System (ReADS).
Data mining for the identification of metabolic syndrome status

Science.gov (United States)

Worachartcheewan, Apilak; Schaduangrat, Nalini; Prachayasittikul, Virapong; Nantasenamat, Chanin

2018-01-01

Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of diabetes mellitus (DM) type 2 and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of quantitative population-health relationship (QPHR) is also presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. The QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) for modeling and construction of predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits on the applications of data mining as a rapid identification tool for classifying MS. PMID:29383020
Explaining and predicting workplace accidents using data-mining techniques

International Nuclear Information System (INIS)

Rivas, T.; Paz, M.; Martin, J.E.; Matias, J.M.; Garcia, J.F.; Taboada, J.

2011-01-01

Current research into workplace risk is mainly conducted using conventional descriptive statistics, which, however, fail to properly identify cause-effect relationships and are unable to construct models that could predict accidents. The authors of the present study modelled incidents and accidents in two companies in the mining and construction sectors in order to identify the most important causes of accidents and develop predictive models. Data-mining techniques (decision rules, Bayesian networks, support vector machines and classification trees) were used to model accident and incident data compiled from the mining and construction sectors and obtained in interviews conducted soon after an incident/accident occurred. The results were compared with those for a classical statistical technique (logistic regression), revealing the superiority of decision rules, classification trees and Bayesian networks in predicting and identifying the factors underlying accidents/incidents.
Explaining and predicting workplace accidents using data-mining techniques

Energy Technology Data Exchange (ETDEWEB)

Rivas, T., E-mail: trivas@uvigo.e [Dpto. Ingenieria de los Recursos Naturales y Medio Ambiente, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain); Paz, M., E-mail: mpaz.minas@gmail.co [Dpto. Ingenieria de los Recursos Naturales y Medio Ambiente, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain); Martin, J.E., E-mail: jmartin@cippinternacional.co [CIPP International, S.L. Parque Tecnologico de Asturias, Parcela 43, Oficina 11, 33428 Llanera (Spain); Matias, J.M., E-mail: jmmatias@uvigo.e [Dpto. Estadistica e Investigacion Operativa, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain); Garcia, J.F., E-mail: jgarcia@cippinternacional.co [CIPP International, S.L. Parque Tecnologico de Asturias, Parcela 43, Oficina 11, 33428 Llanera (Spain); Taboada, J., E-mail: jtaboada@uvigo.e [Dpto. Ingenieria de los Recursos Naturales y Medio Ambiente, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain)

2011-07-15

Current research into workplace risk is mainly conducted using conventional descriptive statistics, which, however, fail to properly identify cause-effect relationships and are unable to construct models that could predict accidents. The authors of the present study modelled incidents and accidents in two companies in the mining and construction sectors in order to identify the most important causes of accidents and develop predictive models. Data-mining techniques (decision rules, Bayesian networks, support vector machines and classification trees) were used to model accident and incident data compiled from the mining and construction sectors and obtained in interviews conducted soon after an incident/accident occurred. The results were compared with those for a classical statistical technique (logistic regression), revealing the superiority of decision rules, classification trees and Bayesian networks in predicting and identifying the factors underlying accidents/incidents.
The Potentials of Educational Data Mining for Researching Metacognition, Motivation and Self-Regulated Learning

Science.gov (United States)

Winne, Philip H.; Baker, Ryan S. J. D.

2013-01-01

Our article introduces the "Journal of Educational Data Mining's" Special Issue on Educational Data Mining on Motivation, Metacognition, and Self-Regulated Learning. We outline general research challenges for data mining researchers who conduct investigations in these areas, the potential of EDM to advance research in this area, and…
A Quantitative Analysis of Organizational Factors That Relate to Data Mining Success

Science.gov (United States)

Huebner, Richard A.

2017-01-01

The ubiquity of data in various forms has fueled the need for advanced data-mining techniques within organizations. The advent of data mining methods used to uncover hidden nuggets of information buried within large data sets has also fueled the need for determining how these unique projects can be successful. There are many challenges associated…
Near-line Archive Data Mining at the Goddard Distributed Active Archive Center

Science.gov (United States)

Pham, L.; Mack, R.; Eng, E.; Lynnes, C.

2002-12-01

NASA's Earth Observing System (EOS) is generating immense volumes of data, in some cases too much to provide to users with data-intensive needs. As an alternative to moving the data to the user and his/her research algorithms, we are providing a means to move the algorithms to the data. The Near-line Archive Data Mining (NADM) system is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web data mining portal to the EOS Data and Information System (EOSDIS) data pool, a 50-TB online disk cache. The NADM web portal enables registered users to submit and execute data mining algorithm codes on the data in the EOSDIS data pool. A web interface allows the user to access the NADM system. Users first develop personalized data mining code on their home platform and then upload it to the NADM system. The C, FORTRAN and IDL languages are currently supported. The user-developed code is automatically audited for potential security problems before it is installed within the NADM system and made available to the user. Once the code has been installed, the user is provided a test environment where he/she can test the execution of the software against data sets of the user's choosing. When the user is satisfied with the results, he/she can promote the code to the "operational" environment. From here the user can interactively run his/her code on the data available in the EOSDIS data pool. The user can also set up a processing subscription, which automatically processes new data as it becomes available in the EOSDIS data pool. The generated mined data products are then made available for FTP pickup. The NADM system uses the GES DAAC-developed Simple Scalable Script-based Science Processor (S4P) to automate tasks and perform the actual data processing. Users will also have the option of selecting a DAAC-provided data mining algorithm and using it to process the data of their choice.
Theoretical discussion and conceptual pattern analysis with data mining

Directory of Open Access Journals (Sweden)

Álvaro Machado Dias

2011-08-01

More than just a theory or a model, Theory of Mind (ToM) represents a field of studies concerned with the ability to prospect someone else's intentions. Aiming to contribute to theoretical discussion and the interpretation of the literature on the matter, this study presents: 1. A conceptual map of the field, based on data mining/text mining techniques; 2. A new and advanced conceptual framework focused on informational ToM studies; 3. A critical discussion of the extensions and limits of the most prominent models, based on the outputs of the data/text mining analysis and on the theoretical perspectives that were previously raised.
An Efficient Association Rule Hiding Algorithm for Privacy Preserving Data Mining

OpenAIRE

Yogendra Kumar Jain; Vinod Kumar Yadav; Geetika S. Panday

2011-01-01

The security of a large database that contains crucial information becomes a serious issue when the data are shared over a network and must be protected against unauthorized access. Privacy preserving data mining is a new research trend in privacy data for data mining and statistical databases. Association analysis is a powerful tool for discovering relationships that are hidden in large databases. Association rule hiding algorithms get strong and efficient performance for protecting confidential and cru...
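To make the idea of association rule hiding concrete, here is a minimal sketch, not the authors' algorithm. It assumes the common textbook definitions of support and confidence and one simple, illustrative hiding step: deleting the consequent item from a supporting transaction so that the confidence of a sensitive rule drops. The transactions and item names are hypothetical.

```python
from typing import FrozenSet, List, Set


def count(transactions: List[Set[str]], items: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(transactions: List[Set[str]], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)


def confidence(
    transactions: List[Set[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Confidence of the rule antecedent -> consequent."""
    both = count(transactions, antecedent | consequent)
    return both / count(transactions, antecedent)


# Hypothetical toy database of five transactions.
original = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
]
bread, milk = frozenset({"bread"}), frozenset({"milk"})
before = confidence(original, bread, milk)  # 3 of 4 bread transactions

# Illustrative hiding step: drop "milk" from one supporting transaction.
sanitized = [set(t) for t in original]
sanitized[0].discard("milk")
after = confidence(sanitized, bread, milk)  # 2 of 4 bread transactions
```

If the mining threshold for confidence sat between the two values, the rule bread -> milk would no longer be reported from the sanitized data, at the cost of distorting one transaction.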
A Knowledge Model Sharing Based Approach to Privacy-Preserving Data Mining

OpenAIRE

Hongwei Tian; Weining Zhang; Shouhuai Xu; Patrick Sharkey

2012-01-01

Privacy-preserving data mining (PPDM) is an important problem and is currently studied in three approaches: the cryptographic approach, data publishing, and model publishing. However, each of these approaches has some problems. The cryptographic approach does not protect the privacy of learned knowledge models and may have performance and scalability issues. Data publishing, although popular, may suffer from too much utility loss for certain types of data mining applications. The m...
Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

OpenAIRE

Lahiru Iddamalgoda; Partha Sarathi Das; Achala Aponso; Vijayaraghava Seshadri Sundararajan; Prashanth Suravajhala; Jayaraman K Valadi

2016-01-01

Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern r...
Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

OpenAIRE

Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

2016-01-01

Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited ...
Developing and Implementing the Data Mining Algorithms in RAVEN

International Nuclear Information System (INIS)

Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian

2015-01-01

The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics code models the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. to recognize patterns in the data. This report focuses on the development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.
Developing and Implementing the Data Mining Algorithms in RAVEN

Energy Technology Data Exchange (ETDEWEB)

Sen, Ramazan Sonat [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

2015-09-01

The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics code models the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. to recognize patterns in the data. This report focuses on the development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.
Applying data mining techniques to improve diagnosis in neonatal jaundice

Directory of Open Access Journals (Sweden)

Ferreira Duarte

2012-12-01

Background: Hyperbilirubinemia is emerging as an increasingly common problem in newborns due to a decreasing hospital length of stay after birth. Jaundice is the most common disease of the newborn and, although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has contributed to improving the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques. Methods: This study followed the different phases of the Cross Industry Standard Process for Data Mining model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa – EPE), from February to March of 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. Also, transcutaneous bilirubin levels were measured from birth to hospital discharge with maximum time intervals of 8 hours between measurements, using a noninvasive bilirubinometer. Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with the traditional methods for prediction of hyperbilirubinemia. Results: The application of different classification algorithms to the collected data allowed predicting subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life of newborns, the accuracy for the prediction of hyperbilirubinemia was 89%. The best results were obtained using the following algorithms: naive Bayes, multilayer perceptron and simple logistic. Conclusions: The findings of our study sustain that new approaches, such as data mining, may support
Improving clinical decision support using data mining techniques

Science.gov (United States)

Burn-Thornton, Kath E.; Thorpe, Simon I.

1999-02-01

Physicians, in their ever-demanding jobs, are looking to decision support systems for aid in clinical diagnosis. However, clinical decision support systems need to be of sufficiently high accuracy that they help, rather than hinder, the physician in his/her diagnosis. Decision support systems that determine patient state with accuracies greater than 80 percent are generally perceived to be sufficiently accurate to fulfill the role of helping the physician. We have previously shown that data mining techniques have the potential to provide the underpinning technology for clinical decision support systems. In this paper, an extension of the work in reference 2, we describe how changes in data mining methodologies, for the analysis of 12-lead ECG data, improve the accuracy with which data mining algorithms determine which patients are suffering from heart disease. We show that the accuracy of patient state prediction, for all the algorithms we investigated, can be increased by up to 6 percent using the combination of appropriate test-training ratios and 5-fold cross-validation. The use of cross-validation greater than 5-fold appears to reduce the improvement in algorithm classification accuracy gained by the use of this validation method. The accuracy of 84 percent in patient state predictions, obtained using the algorithm OCI, suggests that this algorithm will be capable of providing the required accuracy for clinical decision support systems.
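The 5-fold cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic one-feature samples rather than 12-lead ECG records, and `train_threshold` is a hypothetical stand-in for the classifiers they studied.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature, label); hypothetical toy data


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_threshold(train: Sequence[Sample]) -> float:
    """Toy classifier: a threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(data: Sequence[Sample], k: int = 5) -> float:
    """Mean held-out accuracy: each fold is used once as the test set."""
    accuracies = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        threshold = train_threshold(train)
        correct = sum((x > threshold) == (y == 1) for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


# Synthetic, well-separated classes: label 0 near 0-2, label 1 near 10-12.
data = [(float(i % 3), 0) for i in range(25)] + [(10.0 + i % 3, 1) for i in range(25)]
mean_accuracy = cross_validate(data, k=5)
```

Raising `k` makes each training set larger but each test fold smaller, which is one reason the benefit of more folds can level off.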
The First International Conference on Soft Computing and Data Mining

CERN Document Server

Ghazali, Rozaida; Deris, Mustafa

2014-01-01

This book constitutes the refereed proceedings of the First International Conference on Soft Computing and Data Mining, SCDM 2014, held at Universiti Tun Hussein Onn Malaysia on June 16-18, 2014. The 65 revised full papers presented in this book were carefully reviewed and selected from 145 submissions, and organized into two main topical sections: Data Mining and Soft Computing. The goal of this book is to provide both theoretical concepts and, especially, practical techniques in the fields of soft computing and data mining, ready to be applied in real-world applications. The exchange of views on future research directions in this field and the resulting dissemination of the latest research findings make this work of value to all those with an interest in the topics covered.
Data mining for signals in spontaneous reporting databases: proceed with caution.

Science.gov (United States)

Stephenson, Wendy P; Hauben, Manfred

2007-04-01

To provide commentary and points of caution to consider before incorporating data mining as a routine component of any Pharmacovigilance program, and to stimulate further research aimed at better defining the predictive value of these new tools as well as their incremental value as an adjunct to traditional methods of post-marketing surveillance. Commentary includes review of current data mining methodologies employed and their limitations, caveats to consider in the use of spontaneous reporting databases and caution against over-confidence in the results of data mining. Future research should focus on more clearly delineating the limitations of the various quantitative approaches as well as the incremental value that they bring to traditional methods of pharmacovigilance.
TargetMine, an integrated data warehouse for candidate gene prioritisation and target discovery.

Directory of Open Access Journals (Sweden)

Yi-An Chen

Prioritising candidate genes for further experimental characterisation is a non-trivial challenge in drug discovery and biomedical research in general. An integrated approach that combines results from multiple data types is best suited for optimal target selection. We developed TargetMine, a data warehouse for efficient target prioritisation. TargetMine utilises the InterMine framework, with new data models such as protein-DNA interactions integrated in a novel way. It enables complicated searches that are difficult to perform with existing tools and it also offers integration of custom annotations and in-house experimental data. We proposed an objective protocol for target prioritisation using TargetMine and set up a benchmarking procedure to evaluate its performance. The results show that the protocol can identify known disease-associated genes with high precision and coverage. A demonstration version of TargetMine is available at http://targetmine.nibio.go.jp/.
SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES

Directory of Open Access Journals (Sweden)

H Benjamin Fredrick David

2017-04-01

Data mining is the procedure of evaluating and examining large pre-existing databases in order to generate new information that may be essential to the organization. The new information is predicted from the existing datasets. Many approaches to analysis and prediction in data mining have been developed, but very few efforts have been made in the criminology field, and very few have compared the information that these approaches produce. Police stations and other criminal justice agencies hold many large databases of information that can be used to predict or analyze criminal movements and involvement in criminal activity in society. Criminals can also be predicted based on the crime data. The main aim of this work is to survey the supervised and unsupervised learning techniques that have been applied to criminal identification. This paper presents the survey on crime analysis and crime prediction using several data mining techniques.
Predicting Software Projects Cost Estimation Based on Mining Historical Data

OpenAIRE

Najadat, Hassan; Alsmadi, Izzat; Shboul, Yazan

2012-01-01

In this research, a hybrid cost estimation model is proposed to produce a realistic prediction model that takes into consideration software project, product, process, and environmental elements. A cost estimation dataset is built from a large number of open source projects. Those projects are divided into three domains: communication, finance, and game projects. Several data mining techniques are used to classify software projects in terms of their development complexity. Data mining techniqu...

On the Suitability of Genetic-Based Algorithms for Data Mining

NARCIS (Netherlands)

Choenni, R.S.

1998-01-01

The goal of data mining is to extract knowledge from large databases. A database may be considered as a search space consisting of an enormous number of elements, and a mining algorithm as a search strategy. In general, an exhaustive search of the space is infeasible. Therefore, efficient search
Extracting software static defect models using data mining

Directory of Open Access Journals (Sweden)

Ahmed H. Yousef

2015-03-01

Large software projects are subject to quality risks of having defective modules that will cause failures during software execution. Several software repositories contain the source code of large projects that are composed of many modules. These software repositories include data for the software metrics of these modules and the defective state of each module. In this paper, a data mining approach is used to show the attributes that predict the defective state of software modules. A software solution architecture is proposed to convert the extracted knowledge into data mining models that can be integrated with the current software project metrics and bug data in order to enhance the prediction. The results show better prediction capabilities when all the algorithms are combined using weighted votes. When only one individual algorithm is used, the Naïve Bayes algorithm gives the best results, followed by the Neural Network and Decision Tree algorithms.
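Combining classifiers by weighted votes, as this abstract describes, can be sketched as below. The weights and predictions are hypothetical values chosen for illustration (for example, each model's validation accuracy); the abstract does not say how the authors set their weights.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(predictions: Dict[str, int], weights: Dict[str, float]) -> int:
    """Return the label whose supporting classifiers have the largest total weight."""
    totals: Dict[int, float] = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights[name]
    return max(totals, key=totals.get)


# Hypothetical weights, e.g. each model's accuracy on a validation set.
weights = {"naive_bayes": 0.80, "neural_network": 0.75, "decision_tree": 0.70}

# Predictions for one module: 1 = defective, 0 = not defective.
module_predictions = {"naive_bayes": 0, "neural_network": 1, "decision_tree": 1}
combined = weighted_vote(module_predictions, weights)
```

Here the two lower-weighted models together outweigh the single strongest one, so the combined label is 1 (defective). With a sufficiently large weight on one model, that model alone would decide the vote.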
Study and application of data mining and data warehouse in CIMS

Science.gov (United States)

Zhou, Lijuan; Liu, Chi; Liu, Daxin

2003-03-01

The interest in analyzing data has grown tremendously in recent years. To analyze data, a multitude of technologies is needed, namely technologies from the fields of Data Warehouse, Data Mining and On-line Analytical Processing (OLAP). This paper gives a new architecture of data warehouse in CIMS according to CRGC-CIMS application engineering. The data source of this architecture comes from the database of the CRGC-CIMS system. The data is put in a global data set by extracting, filtering and integrating, and then the data is translated to the data warehouse according to information requests. We have addressed two advantages of the new model in the CRGC-CIMS application. In addition, a Data Warehouse contains many materialized views over the data provided by the distributed heterogeneous databases for the purpose of efficiently implementing decision-support, OLAP queries or data mining. It is important to select the right views to materialize so that they answer a given set of queries. In this paper, we have also designed algorithms for selecting a set of views to be materialized in a data warehouse in order to answer the most queries under a given space constraint. First, we give a cost model for selecting materialized views. Then we give algorithms that adopt a gradually recursive method from bottom to top. We give a description and realization of the algorithms. Finally, we discuss the advantages and shortcomings of our approach and future work.
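Selecting views to materialize under a space constraint can be illustrated with a simple greedy heuristic. This is a swapped-in technique for illustration, not the bottom-up recursive method of the paper: it ranks candidates by benefit per unit of storage and assumes, unrealistically, that each view's benefit is independent of the others. The view names, sizes and benefits are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate: (storage size, query cost saved if materialized).
View = Tuple[int, float]


def select_views(candidates: Dict[str, View], space_budget: int) -> List[str]:
    """Greedy heuristic: take views in order of benefit per unit of space while they fit."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: item[1][1] / item[1][0],
        reverse=True,
    )
    chosen: List[str] = []
    used = 0
    for name, (size, _benefit) in ranked:
        if used + size <= space_budget:
            chosen.append(name)
            used += size
    return chosen


candidates = {
    "sales_by_month": (40, 200.0),      # benefit per unit: 5.0
    "sales_by_region": (30, 90.0),      # benefit per unit: 3.0
    "orders_by_customer": (50, 100.0),  # benefit per unit: 2.0
    "stock_by_plant": (20, 80.0),       # benefit per unit: 4.0
}
selected = select_views(candidates, space_budget=100)
```

With a budget of 100 units, the three densest views fit (90 units in total) and the fourth is skipped. A realistic cost model would recompute each view's benefit after every selection, since one materialized view can answer queries that another would otherwise serve.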
InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data.

Science.gov (United States)

Smith, Richard N; Aleksic, Jelena; Butano, Daniela; Carr, Adrian; Contrino, Sergio; Hu, Fengyuan; Lyne, Mike; Lyne, Rachel; Kalderimis, Alex; Rutherford, Kim; Stepan, Radek; Sullivan, Julie; Wakeling, Matthew; Watkins, Xavier; Micklem, Gos

2012-12-01

InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of 'widgets' performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. Availability: freely available from http://www.intermine.org under the LGPL license. Contact: g.micklem@gen.cam.ac.uk. Supplementary data are available at Bioinformatics online.
Statistical and Visualization Data Mining Tools for Foundry Production

Directory of Open Access Journals (Sweden)

M. Perzyk

2007-07-01

Recent years have seen the rapid development of a new, interdisciplinary knowledge area called data mining. Its main task is extracting useful information from large amounts of previously collected data. The main possibilities and potential applications of data mining in manufacturing industry are characterized. The main types of data mining techniques are briefly discussed, including statistical, artificial intelligence, database and visualization tools. The statistical and visualization methods are presented in more detail, showing their general possibilities and advantages as well as characteristic examples of applications in foundry production. Results of the author’s research are presented, aimed at validation of selected statistical tools which can be easily and effectively used in manufacturing industry. A performance analysis has been made of methods based on ANOVA and contingency tables, intended to determine the most significant process parameters and to detect possible interactions among them. Several numerical tests have been performed using simulated data sets with assumed hidden relationships, as well as some real data, related to the strength of ductile cast iron, collected in a foundry. It is concluded that the statistical methods offer relatively easy and fairly reliable tools for extracting that type of knowledge about foundry manufacturing processes. However, further research is needed, aimed at explaining some imperfections of the investigated tools as well as assessing their validity for more complex tasks.
Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

Science.gov (United States)

Stolzer, Alan J.; Halford, Carl

2007-01-01

In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
Virtual Observatories, Data Mining, and Astroinformatics

Science.gov (United States)

Borne, Kirk

The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best of breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential
An application of data mining in district heating substations for improving energy performance

Science.gov (United States)

Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing

2017-11-01

Automatic meter reading systems are capable of collecting and storing a huge amount of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology for discovering potentially interesting knowledge in vast data sets. This paper applies data mining methods to analyse the massive data in order to improve the energy performance of a DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Data from two heating seasons of a substation are used for a case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in a remote flow meter installed in the secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extract potentially useful knowledge to better understand substation operation strategies and improve substation energy performance.
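As a rough illustration of how an association rule such as "secondary pressure difference → secondary flow rate" is scored, the sketch below computes the standard support, confidence and lift measures. The function name `rule_metrics`, the item labels and the five records are all hypothetical, and the paper's actual discretisation and thresholds are not given in the abstract.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `transactions` is a list of sets of discretised readings.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # antecedent holds
    n_c = sum(1 for t in transactions if c <= t)          # consequent holds
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # both hold
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical discretised substation records (one set per time step).
records = [
    {"dp_high", "flow_high"},
    {"dp_high", "flow_high"},
    {"dp_high", "flow_high"},
    {"dp_low", "flow_low"},
    {"dp_high", "flow_low"},  # violates the rule; could hint at a meter fault
]
support, confidence, lift = rule_metrics(records, {"dp_high"}, {"flow_high"})
print(support, confidence, lift)
```

Records that contradict a high-confidence rule, like the last one here, are the kind of observation that a rule-based check could flag for inspection.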
Application of Data Mining for Card Fraud Detection

Directory of Open Access Journals (Sweden)

I.V. Andrianov

2012-03-01

This paper focuses on implementing Data Mining methods for card fraud detection. The approach to classification and prediction tasks for detection of unauthorized transactions is considered.
Data mining approach to model the diagnostic service management.

Science.gov (United States)

Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su

2006-01-01

Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful for improving the performance of diagnostic service programs. Under these circumstances, this study used a data mining approach to develop a model for diagnostic service management that takes into account the characteristics of subjects. This study could be further used to develop an automated CRM system that helps increase the rate at which diagnostic services are received.
Advances in research methods for information systems research: data mining, data envelopment analysis, value focused thinking

CERN Document Server

Osei-Bryson, Kweku-Muata

2013-01-01

Advances in social science research methodologies and data analytic methods are changing the way research in information systems is conducted. New developments in statistical software technologies for data mining (DM) such as regression splines or decision tree induction can be used to assist researchers in systematic post-positivist theory testing and development. Established management science techniques like data envelopment analysis (DEA), and value focused thinking (VFT) can be used in combination with traditional statistical analysis and data mining techniques to more effectively explore
Analyzing Log Files using Data-Mining

Directory of Open Access Journals (Sweden)

Marius Mihut

2008-01-01

Information systems (i.e. servers, applications and communication devices) create a large amount of monitoring data that are saved as log files. A data-mining approach is helpful for analyzing them. This article presents the steps necessary for creating an ‘analyzing instrument’ based on an open-source software package called Waikato Environment for Knowledge Analysis (Weka) [1]. As an example, a system log file created by a Windows-based operating system is used as the input file.
The use of data mining by private health insurance companies and customers' privacy.

Science.gov (United States)

Al-Saggaf, Yeslam

2015-07-01

This article examines privacy threats arising from the use of data mining by private Australian health insurance companies. Qualitative interviews were conducted with key experts, and Australian governmental and nongovernmental websites relevant to private health insurance were searched. Using Rationale, a critical thinking tool, the themes and considerations elicited through this empirical approach were developed into an argument about the use of data mining by private health insurance companies. The argument is followed by an ethical analysis guided by classical philosophical theories: utilitarianism, Mill's harm principle, Kant's deontological theory, and Helen Nissenbaum's contextual integrity framework. Both the argument and the ethical analysis find the use of data mining by private health insurance companies in Australia to be unethical. Although private health insurance companies in Australia cannot use data mining for risk rating to cherry-pick customers and cannot use customers' personal information for unintended purposes, this article nonetheless concludes that the secondary use of customers' personal information and the absence of customers' consent still suggest that the use of data mining by private health insurance companies is wrong.
toxoMine: an integrated omics data warehouse for Toxoplasma gondii systems biology research.

Science.gov (United States)

Rhee, David B; Croken, Matthew McKnight; Shieh, Kevin R; Sullivan, Julie; Micklem, Gos; Kim, Kami; Golden, Aaron

2015-01-01

Toxoplasma gondii (T. gondii) is an obligate intracellular parasite that must monitor for changes in the host environment and respond accordingly; however, it is still not fully known which genetic or epigenetic factors are involved in regulating virulence traits of T. gondii. There are on-going efforts to elucidate the mechanisms regulating the stage transition process via the application of high-throughput epigenomics, genomics and proteomics techniques. Given the range of experimental conditions and the typical yield from such high-throughput techniques, a new challenge arises: how to effectively collect, organize and disseminate the generated data for subsequent data analysis. Here, we describe toxoMine, which provides a powerful interface to support sophisticated integrative exploration of high-throughput experimental data and metadata, providing researchers with a more tractable means toward understanding how genetic and/or epigenetic factors play a coordinated role in determining pathogenicity of T. gondii. As a data warehouse, toxoMine allows integration of high-throughput data sets with public T. gondii data. toxoMine is also able to execute complex queries involving multiple data sets with straightforward user interaction. Furthermore, toxoMine allows users to define their own parameters during the search process that gives users near-limitless search and query capabilities. The interoperability feature also allows users to query and examine data available in other InterMine systems, which would effectively augment the search scope beyond what is available to toxoMine. toxoMine complements the major community database ToxoDB by providing a data warehouse that enables more extensive integrative studies for T. gondii. Given all these factors, we believe it will become an indispensable resource to the greater infectious disease research community. © The Author(s) 2015. Published by Oxford University Press.
Data Mining as a Service (DMaaS)

Science.gov (United States)

Tejedor, E.; Piparo, D.; Mascetti, L.; Moscicki, J.; Lamanna, M.; Mato, P.

2016-10-01

Data Mining as a Service (DMaaS) is a software and computing infrastructure that allows interactive mining of scientific data in the cloud. It allows users to run advanced data analyses by leveraging the widely adopted Jupyter notebook interface. Furthermore, the system makes it easier to share results and scientific code, access scientific software, produce tutorials and demonstrations as well as preserve the analyses of scientists. This paper describes how a first pilot of the DMaaS service is being deployed at CERN, starting from the notebook interface that has been fully integrated with the ROOT analysis framework, in order to provide all the tools for scientists to run their analyses. Additionally, we characterise the service backend, which combines a set of IT services such as user authentication, virtual computing infrastructure, mass storage, file synchronisation, development portals or batch systems. The added value acquired by the combination of the aforementioned categories of services is discussed, focusing on the opportunities offered by the CERNBox synchronisation service and its massive storage backend, EOS.
Mining the Kepler Data using Machine Learning

Science.gov (United States)

Walkowicz, Lucianne; Howe, A. R.; Nayar, R.; Turner, E. L.; Scargle, J.; Meadows, V.; Zee, A.

2014-01-01

Kepler's high cadence and incredible precision have provided an unprecedented view into stars and their planetary companions, revealing both expected and novel phenomena and systems. Due to the large number of Kepler lightcurves, the discovery of novel phenomena in particular has often been serendipitous in the course of searching for known forms of variability (for example, the discovery of the doubly pulsating elliptical binary KOI-54, originally identified by the transiting planet search pipeline). In this talk, we discuss progress on mining the Kepler data through both supervised and unsupervised machine learning, intended both to systematically search the Kepler lightcurves for rare or anomalous variability and to create a variability catalog for community use. Mining the dataset in this way also allows for a quantitative identification of anomalous variability, and so may also be used as a signal-agnostic form of optical SETI. As the Kepler data are exceptionally rich, they provide an interesting counterpoint to machine learning efforts typically performed on sparser and/or noisier survey data, and will inform similar characterization carried out on future survey datasets.
Use of Data Mining Techniques to Detect Medical Fraud in Health Insurance

Directory of Open Access Journals (Sweden)

Kuo-Chung Lin

2012-04-01

The inspection of health insurance claims usually relies on experts’ experience for verification and on experienced personnel for checking. However, because of the heavy workload and the shortage of manpower and experience, the rate of erroneous judgments is high, leading to improper settlement of claims and the waste of social resources. This paper uses data-mining technology to design models that identify cases requiring manual inspection, so as to save time and manpower. Six models are designed in this paper. Based on an analysis of the 20/80 principle and of the coverage and accuracy ratios, a large volume of periodic data (over 2 million records) is fed back to the data-mining models after repeated verification. It is also found that integrating data-mining technology and feeding results back to different business stages, so as to establish an early warning system, will be an important topic for the health insurance system in hospitals’ EMR in the future. Meanwhile, because the information acquired by data mining needs to be stored and traditional database technology has limitations, future work will explore an ontology framework built with semantic network technology to assist the storage of knowledge gained by data mining.
Data Preparation for Web Mining – A survey

OpenAIRE

Amog Rajenderan

2012-01-01

An accepted trend is to categorize web mining into three main areas: web content mining, web structure mining and web usage mining. Web content mining involves extracting details/information from the contents of webpages and performing things like knowledge synthesis. Web structure mining involves the usage of graph theory to understand website structure/hierarchy. Web usage mining involves the mining of useful information from things like server logs, to understand what the user does while on the inte...
4th International conference on Knowledge Discovery and Data Mining

CERN Document Server

Knowledge Discovery and Data Mining

2012-01-01

The volume includes a set of selected papers, extended and revised, from the 4th International conference on Knowledge Discovery and Data Mining, March 1-2, 2011, Macau, China. The volume provides a forum for researchers, educators, engineers, and government officials involved in the general areas of knowledge discovery, data mining and learning to disseminate their latest research results and exchange views on the future research directions of these fields. 108 high-quality papers are included in the volume.
Visual cues for data mining

Science.gov (United States)

Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

1996-04-01

This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

Analysis of data on radon monitoring and dose estimates for uranium mines

International Nuclear Information System (INIS)

Khan, A.H.; Srivastava, G.K.; Jha, Shankar; Sagar, D.V.

1994-01-01

Radon progeny are the major contributors to the radiation dose to uranium miners. Monitoring for radon and gamma radiation is an integral part of radiation protection in such mines. Data for equilibrium equivalent radon and the estimated mean annual doses are presented in this paper for Jaduguda uranium mine from 1986 to 1992. The 1992 data for Jaduguda and Bhatin mines are compared. The average annual effective dose for uranium miners is estimated at around 15.5 mSv. (author). 1 ref., 2 figs
From data mining rules to medical logical modules and medical advices.

Science.gov (United States)

Gomoi, Valentin; Vida, Mihaela; Robu, Raul; Stoicu-Tivadar, Vasile; Bernad, Elena; Lupşe, Oana

2013-01-01

Using data mining in collaboration with Clinical Decision Support Systems adds new knowledge as support for medical diagnosis. The current work presents a tool that translates data mining rules, which support the generation of medical advice, into the Arden Syntax formalism. The developed system was tested with data related to 2326 births that took place in 2010 at the Bega Obstetrics - Gynaecology Hospital, Timişoara. Based on processing these data, 14 medical rules regarding the Apgar score were generated and then translated into the Arden Syntax language.
Use of Recurrent Neural Networks for Strategic Data Mining of Sales

OpenAIRE

Vadhavkar, Sanjeev; Shanmugasundaram, Jayavel; Gupta, Amar; Prasad, M.V. Nagendra

2002-01-01

An increasing number of organizations are involved in the development of strategic information systems for effective linkages with their suppliers, customers, and other channel partners involved in transportation, distribution, warehousing and maintenance activities. An efficient inter-organizational inventory management system based on data mining techniques is a significant step in this direction. This paper discusses the use of neural network based data mining and knowledge discovery techn...
A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications

Science.gov (United States)

Grossman, Robert L.; Northcutt, Dave

1996-01-01

Data mining is the automatic discovery of patterns, associations, and anomalies in data sets. Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support the aforementioned intensive queries, but because of the sizes of data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system which is layered for modularity, but exploits specialized lightweight services to maintain efficiency. Rather than use a full functioned database for example, we use light weight object services specialized for data mining. We propose using information repositories between layers so that components on either side of the layer can access information in the repositories to assist in making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.
Recommending Learning Activities in Social Network Using Data Mining Algorithms

Science.gov (United States)

Mahnane, Lamia

2017-01-01

In this paper, we show how data mining algorithms (e.g. Apriori Algorithm (AP) and Collaborative Filtering (CF)) are useful in a New Social Network (NSN-AP-CF). "NSN-AP-CF" processes the clusters based on different learning styles. Next, it analyzes the habits and the interests of the users through mining the frequent episodes by the…
Meta-mining: a meta-learning framework to support the recommendation, planning and optimization of data mining workflows

OpenAIRE

Nguyen, Phong

2015-01-01

Data mining can be an extremely complex process in which the data miner must assemble a number of data processing and analysis operators into a workflow in order to accomplish a task. To support the data miner in modeling the knowledge discovery process, we propose a new framework that we call meta-mining, or process-oriented meta-learning, which significantly extends the state of the a...
Data mining in bioinformatics using Weka.

Science.gov (United States)

Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

2004-10-12

The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.
Studies of MHD stability using data mining technique in helical plasmas

International Nuclear Information System (INIS)

Yamamoto, Satoshi; Pretty, David; Blackwell, Boyd

2010-01-01

Data mining techniques, which automatically extract useful knowledge from large datasets, are applied to multichannel magnetic probe signals of several helical plasmas in order to identify and classify MHD instabilities. This method is useful for finding new MHD instabilities as well as previously identified ones. Moreover, registering the results obtained from data mining in a database allows us to investigate the characteristics of MHD instabilities with parameter studies. We introduce a data mining technique consisting of pre-processing, clustering and visualization, using results from helical plasmas in H-1 and Heliotron J. We successfully classified the MHD instabilities using the phase differences between magnetic probes as the criterion and identified them as energetic-ion-driven MHD instabilities using a parameter study in Heliotron J plasmas. (author)
Effect of Temporal Relationships in Associative Rule Mining for Web Log Data

Science.gov (United States)

Mohd Khairudin, Nazli; Mustapha, Aida

2014-01-01

The advent of web-based applications and services has created diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper investigates the effect of the temporal attribute in relational rule mining for web log data. We incorporated the characteristics of time in the rule mining process and analysed the effect of various temporal parameters. The rules generated from temporal relational rule mining are then compared against the rules generated from classical rule mining approaches such as the Apriori and FP-Growth algorithms. The results showed that, by incorporating the temporal attribute, the number of rules generated is smaller but comparable in terms of quality. PMID:24587757
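One simple way to picture the role of a temporal attribute is to mine co-occurring pages over the whole log and then again within a time-of-day window. The sketch below does only that. It is an illustrative simplification, not the temporal relational rule mining method of the paper, and the function names, the log format (hour, pages) and the sample sessions are all assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Return page pairs that co-occur in at least `min_support` of sessions."""
    counts = Counter()
    for pages in sessions:
        # Count each unordered pair once per session.
        counts.update(combinations(sorted(set(pages)), 2))
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def frequent_pairs_in_window(log, start_hour, end_hour, min_support):
    """Same as frequent_pairs, restricted to sessions starting in [start, end)."""
    window = [pages for hour, pages in log if start_hour <= hour < end_hour]
    return frequent_pairs(window, min_support) if window else {}


# Hypothetical web log: (hour of day, pages visited in the session).
log = [
    (9, ["home", "news"]),
    (10, ["home", "news"]),
    (21, ["home", "shop"]),
    (22, ["home", "shop"]),
    (23, ["home", "cart", "shop"]),
]
all_day = frequent_pairs([pages for _, pages in log], min_support=0.5)
morning = frequent_pairs_in_window(log, 8, 12, min_support=0.5)
print(all_day, morning)
```

In this toy log the morning window surfaces a pair that falls below the support threshold over the full day, which is the kind of time-dependent pattern a temporal attribute is meant to expose.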
INFRASTRUCTURE FOR INTEGRATED DATA ENVIRONMENTS AND ANALYSIS (IIDEA) FOR MINING AND PROCESSING SYSTEMS

Energy Technology Data Exchange (ETDEWEB)

Dessureault, Sean

2007-06-29

Almost all high-production businesses face the problem of holding terabytes of data while extracting very little information from them. Efforts are continuously being made to bring raw data into a usable format so that meaningful information can be inferred. Once knowledge discovery is done, proper action can be taken accordingly. Data mining and process modeling approaches are used in many business sectors to better understand the process interactions within production chains by analyzing huge data repositories. A decade of intense investment in information technology by mining companies has resulted in vast quantities of underutilized data. Other industries have undergone fundamental changes through the innovative application of IT and business intelligence. This project was to investigate the tools and techniques that would bring such data mining and the requisite business processes to the mining industry. Phase I of this project was to establish the research infrastructure for Phase II and to pilot the tools and techniques through the development of an Energy Consumption Model (ECM) to predict the energy consumption in material handling processes based on key input variables such as distance, elevation and tons hauled. Data mining techniques that can extract meaningful information from raw data are available. The model developed as part of this research is an example of how energy consumption can be estimated from fundamental data.
Data mining utilizando redes neuronales

OpenAIRE

Ale, Juan María; Bot, Romina Laura

2004-01-01

Neural networks are widely used for tasks related to pattern recognition and classification. Although they are very accurate classifiers, they are not commonly used for data mining because they produce learning models that cannot be explained. The TREPAN algorithm extracts explainable hypotheses from a trained neural network. The hypotheses produced by the algorithm are represented as a decision tree that approximates the network. The decision trees extracted by TREPAN cannot...
Drug safety data mining with a tree-based scan statistic.

Science.gov (United States)

Kulldorff, Martin; Dashevsky, Inna; Avery, Taliser R; Chan, Arnold K; Davis, Robert L; Graham, David; Platt, Richard; Andrade, Susan E; Boudreau, Denise; Gunter, Margaret J; Herrinton, Lisa J; Pawloski, Pamala A; Raebel, Marsha A; Roblin, Douglas; Brown, Jeffrey S

2013-05-01

In post-marketing drug safety surveillance, data mining can potentially detect rare but serious adverse events. Assessing an entire collection of drug-event pairs is traditionally performed on a predefined level of granularity. It is unknown a priori whether a drug causes a very specific or a set of related adverse events, such as mitral valve disorders, all valve disorders, or different types of heart disease. This methodological paper evaluates the tree-based scan statistic data mining method to enhance drug safety surveillance. We use a three-million-member electronic health records database from the HMO Research Network. Using the tree-based scan statistic, we assess the safety of selected antifungal and diabetes drugs, simultaneously evaluating overlapping diagnosis groups at different granularity levels, adjusting for multiple testing. Expected and observed adverse event counts were adjusted for age, sex, and health plan, producing a log likelihood ratio test statistic. Out of 732 evaluated disease groupings, 24 were statistically significant, divided among 10 non-overlapping disease categories. Five of the 10 signals are known adverse effects, four are likely due to confounding by indication, while one may warrant further investigation. The tree-based scan statistic can be successfully applied as a data mining tool in drug safety surveillance using observational data. The total number of statistical signals was modest and does not imply a causal relationship. Rather, data mining results should be used to generate candidate drug-event pairs for rigorous epidemiological studies to evaluate the individual and comparative safety profiles of drugs. Copyright © 2013 John Wiley & Sons, Ltd.
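The abstract states that observed and expected adverse event counts produce a log likelihood ratio test statistic but does not give the formula. The sketch below assumes the Poisson-based form commonly used with scan statistics, conditioned on the total number of events; the function name `poisson_llr` and the counts are hypothetical, and the multiple-testing adjustment (typically done by Monte Carlo simulation) is not shown.

```python
import math


def poisson_llr(observed, expected, total):
    """Log likelihood ratio for one node of the tree (assumed Poisson form).

    observed: events seen in the node (a diagnosis group)
    expected: events expected in the node after covariate adjustment
    total:    all events in the tree, so total - observed lie outside the node
    Returns 0.0 when there is no excess, since only excess risk is of interest.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    # The limit of x * log(x) as x -> 0 is 0, so an empty remainder adds nothing.
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside


# Hypothetical node: 10 events observed where 5 were expected, 100 events overall.
print(poisson_llr(10, 5, 100))
```

In a scan of this kind the statistic would be evaluated for every node at every level of granularity, and the largest value compared against its simulated null distribution.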
Parallel object-oriented data mining system

Science.gov (United States)

Kamath, Chandrika; Cantu-Paz, Erick

2004-01-06

A data mining system uncovers patterns, associations, anomalies and other statistically significant structures in data. Data files are read and displayed. Objects in the data files are identified. Relevant features for the objects are extracted. Patterns among the objects are recognized based upon the features. Data from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) sky survey was used to search for bent doubles. This test was conducted on data from the Very Large Array in New Mexico which seeks to locate a special type of quasar (radio-emitting stellar object) called bent doubles. The FIRST survey has generated more than 32,000 images of the sky to date. Each image is 7.1 megabytes, yielding more than 100 gigabytes of image data in the entire data set.
Using data mining techniques to characterize participation in observational studies.

Science.gov (United States)

Linden, Ariel; Yarnold, Paul R

2016-12-01

Data mining techniques are gaining in popularity among health researchers for an array of purposes, such as improving diagnostic accuracy, identifying high-risk patients and extracting concepts from unstructured data. In this paper, we describe how these techniques can be applied to another area in the health research domain: identifying characteristics of individuals who do and do not choose to participate in observational studies. In contrast to randomized studies where individuals have no control over their treatment assignment, participants in observational studies self-select into the treatment arm and therefore have the potential to differ in their characteristics from those who elect not to participate. These differences may explain part, or all, of the difference in the observed outcome, making it crucial to assess whether there is differential participation based on observed characteristics. As compared to traditional approaches to this assessment, data mining offers a more precise understanding of these differences. To describe and illustrate the application of data mining in this domain, we use data from a primary care-based medical home pilot programme and compare the performance of commonly used classification approaches - logistic regression, support vector machines, random forests and classification tree analysis (CTA) - in correctly classifying participants and non-participants. We find that CTA is substantially more accurate than the other models. Moreover, unlike the other models, CTA offers transparency in its computational approach, ease of interpretation via the decision rules produced and provides statistical results familiar to health researchers. Beyond their application to research, data mining techniques could help administrators to identify new candidates for participation who may most benefit from the intervention. © 2016 John Wiley & Sons, Ltd.
Geographical Information System Model for Potential Mines Data Management Presentation in Kabupaten Gorontalo

Science.gov (United States)

Roviana, D.; Tajuddin, A.; Edi, S.

2017-03-01

Indonesia has abundant mining potential, stretching from Sabang to Merauke. Kabupaten Gorontalo is one of many places in Indonesia where different types of minerals and natural resources can be found in every district. This abundant mining potential must be matched by good management and easy access to information for investors. The current issues are: (1) data and information about potential mining areas are still presented manually (maps captured from satellite images are printed and attached to an information board in the office), which makes the information hard to obtain; (2) map printing is costly; and (3) the regency leader (bupati) has difficulty obtaining information for strategic decision making about mining potential. The goal of this research is to build a Geographical Information System model that provides data management for potential mines, so that investors can easily get information according to their needs. To achieve that goal, the Research and Development method is used. The result of this research is a Geographical Information System model implemented in an application that presents mine data management.
Data mining algorithms for land cover change detection: a review

Indian Academy of Sciences (India)

Sangram Panigrahi

2017-11-24

Nov 24, 2017 ... values, poor quality measurement, high resolution and high dimensional data. The land cover .... These data sets also include quality assurance information, ...... 2012 A new data mining framework for forest fire mapping.
Briefly on the GUHA Method of Data Mining

Czech Academy of Sciences Publication Activity Database

Hájek, Petr

-, č. 3 (2003), s. 112-114 ISSN 1509-4553 R&D Projects: GA MŠk OC 274.001 Grant - others:COST(XE) Action 274 TARSKI Institutional research plan: AV0Z1030915 Keywords : GUHA method * data mining * exploratory data analysis Subject RIV: BA - General Mathematics http://www.nit.eu/czasopisma/JTIT/2003/3/112.pdf
Data Mining in Earth System Science (DMESS 2011)

Science.gov (United States)

Forrest M. Hoffman; J. Walter Larson; Richard Tran Mills; Bhorn-Gustaf Brooks; Auroop R. Ganguly; William Hargrove; et al

2011-01-01

From field-scale measurements to global climate simulations and remote sensing, the growing body of very large and long time series Earth science data are increasingly difficult to analyze, visualize, and interpret. Data mining, information theoretic, and machine learning techniques, such as cluster analysis, singular value decomposition, block entropy, Fourier and...
Mining Social Media and DBpedia Data Using Gephi and R

Directory of Open Access Journals (Sweden)

Sadiq HUSSAIN

2018-04-01

Big data plays a major role in machine learning and data mining. Extracting meaningful and interesting information from big data is a challenge. The volume of data on social media and Wikipedia is increasing exponentially, and visualizing such huge data is another aspect of big data. Graphs are becoming important for visualizing and modelling such data. Gephi and R are two important visualization and exploration tools in this field. Using a graph, one may calculate modularity, eccentricity, in-degree, out-degree, betweenness centrality, etc. In this paper, we used DBpedia, Facebook and Twitter datasets. We used Gephi and R to look inside the structure of such data and to compare different graph-based statistics by exploring the graphs.
Data Mining in Institutional Economics Tasks

Science.gov (United States)

Kirilyuk, Igor; Kuznetsova, Anna; Senko, Oleg

2018-02-01

The paper discusses problems associated with the use of data mining tools to study discrepancies between countries with different types of institutional matrices by a variety of potential explanatory variables: climate, economic or infrastructure indicators. An approach is presented which is based on the search for statistically valid regularities describing the dependence of the institutional type on a single variable or a pair of variables. Examples of regularities are given.

Systematic Review of Data Mining Applications in Patient-Centered Mobile-Based Information Systems.

Science.gov (United States)

Fallah, Mina; Niakan Kalhori, Sharareh R

2017-10-01

Smartphones represent a promising technology for patient-centered healthcare. It is claimed that data mining techniques have improved mobile apps to address patients' needs at subgroup and individual levels. This study reviewed the current literature regarding data mining applications in patient-centered mobile-based information systems. We systematically searched PubMed, Scopus, and Web of Science for original studies reported from 2014 to 2016. After screening 226 records at the title/abstract level, the full texts of 92 relevant papers were retrieved and checked against inclusion criteria. Finally, 30 papers were included in this study and reviewed. Data mining techniques have been reported in development of mobile health apps for three main purposes: data analysis for follow-up and monitoring, early diagnosis and detection for screening purpose, classification/prediction of outcomes, and risk calculation (n = 27); data collection (n = 3); and provision of recommendations (n = 2). The most accurate and frequently applied data mining method was support vector machine; however, decision tree has shown superior performance to enhance mobile apps applied for patients' self-management. Embedded data-mining-based feature in mobile apps, such as case detection, prediction/classification, risk estimation, or collection of patient data, particularly during self-management, would save, apply, and analyze patient data during and after care. More intelligent methods, such as artificial neural networks, fuzzy logic, and genetic algorithms, and even the hybrid methods may result in more patients-centered recommendations, providing education, guidance, alerts, and awareness of personalized output.
Radiological data acquisition, investigation and evaluation of mining relics

International Nuclear Information System (INIS)

1992-01-01

Within the scope of a Federal Project, the environmental radioactivity and the radon concentration in buildings caused by mining relics in the new Federal Lands of Germany are investigated. In the first phase of the project, about 8000 relics of former mining were identified by analysing existing documents, categorised, and recorded in a special data bank. Thereby, 'areas of suspicion' of 1500 km² spaciously defined in the beginning could be reduced to 'areas of investigation' of 250 km² now to be examined in close coordination with the land and district authorities by a programme gradually adapted to the radiological significance of the relics. Experience with site-specific measuring programmes have already been gained through three pilot projects at typical sites of former mining activities. Recommendations of the German Radiation Protection Commission serve for the evaluation of the results. By the measuring programme for radon in buildings of mining and geological predestined regions more than 25000 buildings of 210 communities have been investigated. The results confirm the expected prevailing influence of the geologic underground on the radon concentration. Extreme values are observed where direct connections additionally exist to mining relics in the ground. (orig./HP) With 11 figs. in annex
Mining Diagnostic Assessment Data for Concept Similarity

Science.gov (United States)

Madhyastha, Tara; Hunt, Earl

2009-01-01

This paper introduces a method for mining multiple-choice assessment data for similarity of the concepts represented by the multiple choice responses. The resulting similarity matrix can be used to visualize the distance between concepts in a lower-dimensional space. This gives an instructor a visualization of the relative difficulty of concepts…
Data warehousing as a basis for web-based documentation of data mining and analysis.

Science.gov (United States)

Karlsson, J; Eklund, P; Hallgren, C G; Sjödin, J G

1999-01-01

In this paper we present a case study for data warehousing intended to support data mining and analysis. We also describe a prototype for data retrieval. Further we discuss some technical issues related to a particular choice of a patient record environment.
Data Mining and Optimization Tools for Developing Engine Parameters Tools

Science.gov (United States)

Dhawan, Atam P.

1998-01-01

This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. Tricia Erhardt and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to the dataset, which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions led to the training of Tricia Erhardt to develop Genetic Algorithm (GA) based search programs, which were written in C++ and used to demonstrate the capability of the GA in searching for an optimal solution in noisy datasets. From the study and discussion with NASA LeRC personnel, we then prepared a proposal, which is being submitted to NASA for future work on the development of data mining algorithms for engine condition monitoring. The proposed set of algorithms uses wavelet processing to create a multi-resolution pyramid of the data for GA-based multi-resolution optimal search.
Data Mining Tools Make Flights Safer, More Efficient

Science.gov (United States)

2014-01-01

A small data mining team at Ames Research Center developed a set of algorithms ideal for combing through flight data to find anomalies. Dallas-based Southwest Airlines Co. signed a Space Act Agreement with Ames in 2011 to access the tools, helping the company refine its safety practices, improve its safety reviews, and increase flight efficiencies.
Mining top-k frequent closed itemsets in data streams using sliding window

International Nuclear Information System (INIS)

Rehman, Z.; Shahbaz, M.

2013-01-01

Frequent itemset mining has become a popular research area in the data mining community in recent years. There are two main technical hitches in finding frequent itemsets. First, the user must provide an appropriate minimum support value to start with and then tune it by running the algorithm again and again. Second, the generated frequent itemsets are usually numerous, and as a result the number of association rules generated is also very large. Applications dealing with a streaming environment need to process data received at a high rate; therefore, finding frequent itemsets in data streams becomes complex. In this paper, we present an algorithm to mine top-k frequent closed itemsets from streaming data using a sliding window approach. We developed a single-pass algorithm to find frequent closed itemsets of length between user-defined minimum and maximum lengths. To improve the performance of the algorithm and to avoid rescanning the data, we have transformed the data into a bitmap-based tree data structure. (author)
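The idea in the abstract above (top-k closed itemsets over a sliding window, with bitmaps to avoid rescanning) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not describe their bitmap-based tree, so the sketch assumes a flat bitmap per item (one Python integer, one bit per window slot) and brute-force enumeration. The class name `SlidingWindowMiner` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque
from itertools import combinations


class SlidingWindowMiner:
    """Hypothetical helper: keeps the last `window_size` transactions."""

    def __init__(self, window_size):
        # The deque discards the oldest transaction automatically when full.
        self.window = deque(maxlen=window_size)

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def _bitmaps(self):
        # One integer per item; bit i is set if the item occurs in slot i.
        bitmaps = {}
        for slot, transaction in enumerate(self.window):
            for item in transaction:
                bitmaps[item] = bitmaps.get(item, 0) | (1 << slot)
        return bitmaps

    def top_k_closed(self, k, min_len=1, max_len=3):
        """Return up to k (support, itemset) pairs, highest support first."""
        bitmaps = self._bitmaps()
        items = sorted(bitmaps)
        all_slots = (1 << len(self.window)) - 1
        results = []
        for length in range(min_len, max_len + 1):
            for itemset in combinations(items, length):
                # Support = number of slots containing every item (bitwise AND).
                cover = all_slots
                for item in itemset:
                    cover &= bitmaps[item]
                if cover == 0:
                    continue
                # Closed: no other item occurs in every covering transaction,
                # i.e. no superset has the same support.
                closed = all(
                    (cover & bitmaps[other]) != cover
                    for other in items
                    if other not in itemset
                )
                if closed:
                    results.append((bin(cover).count("1"), itemset))
        results.sort(key=lambda pair: (-pair[0], pair[1]))
        return results[:k]


miner = SlidingWindowMiner(window_size=3)
for t in (["a", "b"], ["a", "b"], ["a", "b", "c"], ["b", "c"]):
    miner.add(t)
print(miner.top_k_closed(3))
```

With a window of three, the first transaction has already been evicted when mining runs. Asking for the top k results replaces the minimum support threshold that the abstract describes as hard to tune. The brute-force enumeration is exponential in the number of items, so it only suits small examples.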
Modeling issues & choices in the data mining optimization ontology

CSIR Research Space (South Africa)

Keet, CM

2013-05-01

We describe the Data Mining Optimization Ontology (DMOP), which was developed to support informed decision-making at various choice points of the knowledge discovery (KD) process. It can be used as a reference by data miners, but its primary purpose...
Development of National Health Data Warehouse for Data Mining

Directory of Open Access Journals (Sweden)

Shahidul Islam Khan

2015-07-01

Health informatics is currently one of the top focuses of computer science researchers. Availability of timely and accurate data is essential for medical decision making. Health care organizations face a common problem with the large amount of data they hold in numerous systems. Researchers, health care providers and patients will not be able to utilize the knowledge stored in different repositories unless the information from disparate sources is amalgamated. This problem can be solved by data warehousing. Data warehousing techniques share a common set of tasks, including requirements analysis, data design, architectural design, implementation and deployment. Developing a health data warehouse is complex and time consuming but is also essential to deliver quality health services. This paper depicts the prospects and complexities of health data warehousing and mining and illustrates a data-warehousing model suitable for integrating data from different health care sources to discover effective knowledge.
Data mining to detect clinical mastitis with automatic milking

NARCIS (Netherlands)

Kamphuis, C.; Mollenhorst, H.; Heesterbeek, J.A.P.; Hogeveen, H.

2010-01-01

Our objective was to use data mining to develop and validate a detection model for clinical mastitis (CM) using sensor data collected at nine Dutch dairy herds milking automatically. Sensor data was available for almost 3.5 million quarter milkings (QM) from 1,109 cows; 348 QM with CM were observed
Mining Personal Data Using Smartphones and Wearable Devices: A Survey

Science.gov (United States)

Rehman, Muhammad Habib ur; Liew, Chee Sun; Wah, Teh Ying; Shuja, Junaid; Daghighi, Babak

2015-01-01

The staggering growth in smartphone and wearable device use has led to a massive scale generation of personal (user-specific) data. To explore, analyze, and extract useful information and knowledge from the deluge of personal data, one has to leverage these devices as the data-mining platforms in ubiquitous, pervasive, and big data environments. This study presents the personal ecosystem where all computational resources, communication facilities, storage and knowledge management systems are available in user proximity. An extensive review on recent literature has been conducted and a detailed taxonomy is presented. The performance evaluation metrics and their empirical evidences are sorted out in this paper. Finally, we have highlighted some future research directions and potentially emerging application areas for personal data mining using smartphones and wearable devices. PMID:25688592
Mining Personal Data Using Smartphones and Wearable Devices: A Survey

Directory of Open Access Journals (Sweden)

Muhammad Habib ur Rehman

2015-02-01

The staggering growth in smartphone and wearable device use has led to a massive scale generation of personal (user-specific) data. To explore, analyze, and extract useful information and knowledge from the deluge of personal data, one has to leverage these devices as the data-mining platforms in ubiquitous, pervasive, and big data environments. This study presents the personal ecosystem where all computational resources, communication facilities, storage and knowledge management systems are available in user proximity. An extensive review on recent literature has been conducted and a detailed taxonomy is presented. The performance evaluation metrics and their empirical evidences are sorted out in this paper. Finally, we have highlighted some future research directions and potentially emerging application areas for personal data mining using smartphones and wearable devices.
Supplementary data: Eucalyptus microsatellites mined in silico ...

Indian Academy of Sciences (India)

Supplementary data: Eucalyptus microsatellites mined in silico: survey and evaluation. R. Yasodha, R. Sumathi, P. Chezhian, S. Kavitha and M. Ghosh. J. Genet. 87, XX-XX. Table 1. List of EST-SSR primers developed for E. globulus.
3D Visual Data Mining: goals and experiences

DEFF Research Database (Denmark)

Bøhlen, Michael Hanspeter; Bukauskas, Linas; Eriksen, Poul Svante

2003-01-01

, statistical analyses, perceptual and cognitive psychology, and scientific visualization. At the conceptual level we offer perceptual and cognitive insights to guide the information visualization process. We then choose cluster surfaces to exemplify the data mining process, to discuss the tasks involved...
Data mining of air traffic control operational errors

Science.gov (United States)

2006-01-01

In this paper we present the results of applying data mining techniques to identify patterns and anomalies in air traffic control operational errors (OEs). Reducing the OE rate is of high importance and remains a challenge in the aviation saf...
Asymmetric threat data mining and knowledge discovery

Science.gov (United States)

Gilmore, John F.; Pagels, Michael A.; Palk, Justin

2001-03-01

Asymmetric threats differ from the conventional force-on-force military encounters that the Defense Department has historically been trained to engage. Terrorism by its nature is now an operational activity that is neither easily detected nor countered, as its very existence depends on small covert attacks exploiting the element of surprise. But terrorism does have defined forms, motivations, tactics and organizational structure. Exploiting a terrorism taxonomy provides the opportunity to discover and assess knowledge of terrorist operations. This paper describes the Asymmetric Threat Terrorist Assessment, Countering, and Knowledge (ATTACK) system. ATTACK has been developed to (a) data mine open source intelligence (OSINT) information from web-based newspaper sources, video news web casts, and actual terrorist web sites, (b) evaluate this information against a terrorism taxonomy, (c) exploit country/region specific social, economic, political, and religious knowledge, and (d) discover and predict potential terrorist activities and association links. Details of the asymmetric threat structure and the ATTACK system architecture are presented, with results of an actual terrorist data mining and knowledge discovery test case shown.
Application Of Data Mining Techniques For Student Success And Failure Prediction The Case Of DebreMarkos University

OpenAIRE

Muluken Alemu Yehuala

2015-01-01

This research work investigated the potential applicability of data mining technology to predict student success and failure cases on university student datasets. CRISP-DM (Cross-Industry Standard Process for Data Mining) is the data mining methodology used by the research. Classification and prediction data mining functionalities are used to extract hidden patterns from student data. These patterns can be seen in relation to different variables in the student records. The ...
Data Mining on Distributed Medical Databases: Recent Trends and Future Directions

Science.gov (United States)

Atilgan, Yasemin; Dogan, Firat

As computerization in healthcare services increases, the amount of available digital data is growing at an unprecedented rate, and as a result healthcare organizations are much more able to store data than to extract knowledge from it. Today the major challenge is to transform these data into useful information and knowledge. It is important for healthcare organizations to use stored data to improve quality while reducing cost. This paper first investigates data mining applications on centralized medical databases and how they are used for diagnostics and population health, then introduces distributed databases. The integration needs and issues of distributed medical databases are described. Finally, the paper focuses on data mining studies on distributed medical databases.
PERANCANGAN SISTEM PREDIKSI CHURN PELANGGAN PT. TELEKOMUNIKASI SELULER DENGAN MEMANFAATKAN PROSES DATA MINING

Directory of Open Access Journals (Sweden)

Rajesri Govindaraju

2008-01-01

The purpose of this research is to design a customer churn prediction system using a data mining approach. This system is able to perform data integration, data cleaning, data transformation, sampling and data splitting, prediction model building, and customer churn prediction, and to show the results in certain agreed report formats. Churn prediction variables were identified based on earlier research reports and include customer information, payment method, call pattern, complaint data, telecommunication services usage, and data on changes in telecommunication services usage behavior. The mining technique chosen is classification with a decision tree algorithm. The decision tree produces a visual model that represents the behavior patterns of churning and non-churning customers. The system was tested using Kartu Halo customer data from the Bandung area, and the test result showed 70.94% accuracy of the prediction model.
Data Mining Thesis Topics in Finland

OpenAIRE

Bajo Rouvinen, Ari

2017-01-01

The Theseus open repository contains metadata about more than 100,000 thesis publications from the different universities of applied sciences in Finland. Different data mining techniques were applied to the Theseus dataset to build a web application to explore thesis topics and degree programmes using different libraries in Python and JavaScript. Thesis topics were extracted from manually annotated keywords by the authors and curated subjects by the librarians. During the project, the quality...

A Data Mining Approach to Intelligence Operations

DEFF Research Database (Denmark)

Memon, Nasrullah; Hicks, David; Harkiolakis, Nicholas

2008-01-01

agencies. An emphasis in the paper is placed on Social Network Analysis and Investigative Data Mining, and the use of these technologies in the counterterrorism domain. Tools and techniques from both areas are described, along with the important tasks for which they can be used to assist...... with the investigation and analysis of terrorist organizations. The process of collecting data about these organizations is also considered along with the inherent difficulties that are involved....
Mining Significant Semantic Locations from GPS Data

DEFF Research Database (Denmark)

Cao, Xin; Cong, Gao; Jensen, Christian Søndergaard

2010-01-01

With the increasing deployment and use of GPS-enabled devices, massive amounts of GPS data are becoming available. We propose a general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from such data. We present techniques capable...... of extracting semantic locations from GPS data. We capture the relationships between locations and between locations and users with a graph. Significance is then assigned to locations using random walks over the graph that propagates significance among the locations. In doing so, mutual reinforcement between...
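The random-walk step mentioned in the abstract above can be illustrated with a generic sketch. The record is truncated and does not give the authors' exact formulation, so this assumes a plain PageRank-style power iteration over one directed graph that contains both user nodes and location nodes. The function name `random_walk_scores`, the damping value and the toy graph are all hypothetical.

```python
def random_walk_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores for a graph given as {node: [out-neighbours]}."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline (the random "teleport" step).
        nxt = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges.get(node, [])
            if targets:
                # Split this node's score evenly among its out-neighbours.
                share = damping * score[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling node: spread its score uniformly over all nodes.
                share = damping * score[node] / n
                for target in nodes:
                    nxt[target] += share
        score = nxt
    return score


# Toy graph (made up): users link to locations they visited, locations link
# back to their visitors, and "cafe" also links to "mall" as a frequent
# next stop.
graph = {
    "u1": ["mall", "cafe"],
    "u2": ["mall"],
    "u3": ["mall", "park"],
    "mall": ["u1", "u2", "u3"],
    "cafe": ["u1", "mall"],
    "park": ["u3"],
}
scores = random_walk_scores(graph)
locations = ["mall", "cafe", "park"]
print(max(locations, key=scores.get))
```

In this toy graph the mall is visited by every user, so it accumulates the most significance. Users who visit significant locations in turn pass more score to the other places they visit, which is one way to read the mutual reinforcement the abstract alludes to.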
Mining significant semantic locations from GPS data

DEFF Research Database (Denmark)

Cao, Xin; Cong, Gao; Jensen, Christian S.

2010-01-01

With the increasing deployment and use of GPS-enabled devices, massive amounts of GPS data are becoming available. We propose a general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from such data. We present techniques capable...... of extracting semantic locations from GPS data. We capture the relationships between locations and between locations and users with a graph. Significance is then assigned to locations using random walks over the graph that propagates significance among the locations. In doing so, mutual reinforcement between...
Data mining a functional neuroimaging database for functional segregation in brain regions

DEFF Research Database (Denmark)

Nielsen, Finn Årup; Balslev, Daniela; Hansen, Lars Kai

2006-01-01

We describe a specialized neuroinformatic data mining technique in connection with a meta-analytic functional neuroimaging database: We mine for functional segregation within brain regions by identifying journal articles that report brain activations within the regions and clustering the abstract...
Data mining a functional neuroimaging database for functional segregation in brain regions

DEFF Research Database (Denmark)

Nielsen, Finn Årup

2006-01-01

We describe a specialized neuroinformatic data mining technique in connection with a meta-analytic functional neuroimaging database: We mine for functional segregation within brain regions by identifying journal articles that report brain activations within the regions and clustering the abstract...
Tools for Educational Data Mining: A Review

Science.gov (United States)

Slater, Stefan; Joksimovic, Srecko; Kovanovic, Vitomir; Baker, Ryan S.; Gasevic, Dragan

2017-01-01

In recent years, a wide array of tools have emerged for the purposes of conducting educational data mining (EDM) and/or learning analytics (LA) research. In this article, we hope to highlight some of the most widely used, most accessible, and most powerful tools available for the researcher interested in conducting EDM/LA research. We will…
BAGEL2 : mining for bacteriocins in genomic data

NARCIS (Netherlands)

de Jong, Anne; van Heel, Auke J.; Kok, Jan; Kuipers, Oscar P.

Mining bacterial genomes for bacteriocins is a challenging task due to the substantial structure and sequence diversity, and generally small sizes, of these antimicrobial peptides. Major progress in the research of antimicrobial peptides and the ever-increasing quantities of genomic data, varying
Advances in learning analytics and educational data mining

NARCIS (Netherlands)

Vahdat, Mehrnoosh; Ghio, A; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

2015-01-01

The growing interest in recent years towards Learning Analytics (LA) and Educational Data Mining (EDM) has enabled novel approaches and advancements in educational settings. The wide variety of research and practice in this context has enforced important possibilities and applications from
Archetypal analysis for machine learning and data mining

DEFF Research Database (Denmark)

Mørup, Morten; Hansen, Lars Kai

2012-01-01

of the observed data. We further demonstrate that the aa model is relevant for feature extraction and dimensionality reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, chemistry, text mining and collaborative filtering leading to highly interpretable...
Microarray data and gene expression statistics for Saccharomyces cerevisiae exposed to simulated asbestos mine drainage

Directory of Open Access Journals (Sweden)

Heather E. Driscoll

2017-08-01

Here we describe microarray expression data (raw and normalized), experimental metadata, and gene-level data with expression statistics from Saccharomyces cerevisiae exposed to simulated asbestos mine drainage from the Vermont Asbestos Group (VAG) Mine on Belvidere Mountain in northern Vermont, USA. For nearly 100 years (between the late 1890s and 1993), chrysotile asbestos fibers were extracted from serpentinized ultramafic rock at the VAG Mine for use in construction and manufacturing industries. Studies have shown that water courses and streambeds nearby have become contaminated with asbestos mine tailings runoff, including elevated levels of magnesium, nickel, chromium, and arsenic, elevated pH, and chrysotile asbestos-laden mine tailings, due to leaching and gradual erosion of massive piles of mine waste covering approximately 9 km². We exposed yeast to simulated VAG Mine tailings leachate to help gain insight on how eukaryotic cells exposed to VAG Mine drainage may respond in the mine environment. Affymetrix GeneChip® Yeast Genome 2.0 Arrays were utilized to assess gene expression after 24-h exposure to simulated VAG Mine tailings runoff. The chemistry of mine-tailings leachate, mine-tailings leachate plus yeast extract peptone dextrose media, and control yeast extract peptone dextrose media is also reported. To our knowledge this is the first dataset to assess global gene expression patterns in a eukaryotic model system simulating asbestos mine tailings runoff exposure. Raw and normalized gene expression data are accessible through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) Database, Series GSE89875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89875).
Case study: how to apply data mining techniques in a healthcare data warehouse.

Science.gov (United States)

Silver, M; Sakata, T; Su, H C; Herman, C; Dolins, S B; O'Shea, M J

2001-01-01

Healthcare provider organizations are faced with a rising number of financial pressures. Both administrators and physicians need help analyzing large numbers of clinical and financial data when making decisions. To assist them, Rush-Presbyterian-St. Luke's Medical Center and Hitachi America, Ltd. (HAL), Inc., have partnered to build an enterprise data warehouse and perform a series of case study analyses. This article focuses on one analysis, which was performed by a team of physicians and computer science researchers, using a commercially available on-line analytical processing (OLAP) tool in conjunction with proprietary data mining techniques developed by HAL researchers. The initial objective of the analysis was to discover how to use data mining techniques to make business decisions that can influence cost, revenue, and operational efficiency while maintaining a high level of care. Another objective was to understand how to apply these techniques appropriately and to find a repeatable method for analyzing data and finding business insights. The process used to identify opportunities and effect changes is described.
DATA MINING APPLICATION IN CREDIT CARD FRAUD DETECTION SYSTEM

Directory of Open Access Journals (Sweden)

FRANCISCA NONYELUM OGWUELEKA

2011-06-01

Data mining is popularly used to combat fraud because of its effectiveness. It is a well-defined procedure that takes data as input and produces models or patterns as output. A neural network, a data mining technique, was used in this study. The design of the neural network (NN) architecture for the credit card detection system was based on an unsupervised method, which was applied to the transaction data to generate four clusters: low, high, risky and high-risk. The self-organizing map neural network (SOMNN) technique was used to solve the problem of optimally classifying each transaction into its associated group, since a prior output is unknown. The receiver-operating curve (ROC) for credit card fraud (CCF) detection watch detected over 95% of fraud cases without causing false alarms, unlike other statistical models and the two-stage clusters. This shows that the performance of CCF detection watch is in agreement with other detection software, but performs better.
A Data Mining Approach to Reveal Representative Collaboration Indicators in Open Collaboration Frameworks

Science.gov (United States)

Anaya, Antonio R.; Boticario, Jesus G.

2009-01-01

Data mining methods are successful in educational environments to discover new knowledge or learner skills or features. Unfortunately, they have not been used in depth with collaboration. We have developed a scalable data mining method, whose objective is to infer information on the collaboration during the collaboration process in a…
An overview of data mining algorithms in drug induced toxicity prediction.

Science.gov (United States)

Omer, Ankur; Singh, Poonam; Yadav, N K; Singh, R K

2014-04-01

The growth in chemical diversity has increased the need to assess the toxicity of different chemical compounds, raising the demand for animal testing. Toxicity evaluation is time consuming and expensive, which limits the methods available for screening chemicals and points to the need for more efficient toxicity assessment systems. Computational approaches have reduced the time as well as the cost of evaluating the toxicity and kinetic behavior of any chemical. The accessibility of a large amount of data and the pressing need to turn this data into useful information have drawn attention to data mining. Machine learning, one of the powerful data mining techniques, has evolved into the most effective and potent tool for exploring new insights into combinatorial relationships among the various experimental data generated. The article covers some sophisticated machine learning algorithms, such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), k-means clustering and Self-Organizing Maps (SOM), along with some of the available tools used for classification, sorting and toxicological evaluation of data, clarifying how data mining and machine learning interact cooperatively to facilitate knowledge discovery. Addressing the association of some commonly used expert systems, we briefly outline some real-world applications to consider the crucial role of data set partitioning.
Data mining for ontology development.

Energy Technology Data Exchange (ETDEWEB)

Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin, Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

2010-06-01

A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team, reports on its approaches, the tools it used, and its results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.
An Integrative data mining approach to identifying Adverse ...

Science.gov (United States)

The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP
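The association step described in this record can be illustrated with a minimal Python sketch. It covers only the simplest case of frequent itemset mining (pairs of one gene and one disease), treats each chemical as a transaction, and uses invented toy data; the names `gene_disease_edges`, `chem_genes`, `chem_diseases` and the `min_support` threshold are assumptions for illustration and are not taken from the study, which worked with ToxCast and CTD data.

```python
from collections import Counter
from itertools import product


def gene_disease_edges(chem_genes, chem_diseases, min_support=2):
    """Return (gene, disease) pairs supported by at least min_support chemicals.

    Each chemical acts as one transaction that links its gene targets to its
    associated diseases; the support of a pair is the number of chemicals
    in which both items occur.
    """
    counts = Counter()
    # Only chemicals present in both datasets can link a gene to a disease.
    for chem in chem_genes.keys() & chem_diseases.keys():
        for pair in product(sorted(chem_genes[chem]), sorted(chem_diseases[chem])):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical toy data (not from ToxCast or CTD).
chem_genes = {
    "chem_A": {"AHR", "PPARG"},
    "chem_B": {"AHR"},
    "chem_C": {"PPARG"},
}
chem_diseases = {
    "chem_A": {"fatty liver"},
    "chem_B": {"fatty liver"},
    "chem_C": {"dermatitis"},
}

# Edges of a small gene-disease network: nodes are genes and diseases,
# edges are the frequent pairs.
edges = gene_disease_edges(chem_genes, chem_diseases, min_support=2)
print(edges)
```

In this toy example only the AHR and fatty liver pair is shared by two chemicals, so it is the only edge that survives the support threshold.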
Data-Mining – A Valuable Managerial Tool for Improving Power Plants Efficiency

Directory of Open Access Journals (Sweden)

Danubianu Mirela

2014-05-01

Full Text Available Energy and environment are top priorities for the EU’s Europe 2020 Strategy. Both fields imply complex approaches and consistent investment. The paper presents an alternative to large investments for improving the efficiency of existing (outdated) power installations: namely, the use of data-mining techniques for analysing existing operational data. Data-mining is based upon exhaustive analysis of operational records, inferring high-value information by simply processing records with advanced mathematical / statistical tools. Results can be: assessment of the consistency of measurements, identification of new hardware needed for improving the quality of data, deducing the most efficient level for operation (internal benchmarking), correlation of consumption with power/heat production and of technical parameters with environmental impact, scheduling the optimal maintenance time, fuel stock optimization, simulating scenarios for equipment operation, anticipating periods of maximal stress of equipment, identification of medium and long term trends, planning and decision support for new investment, etc. The paper presents a data mining process carried out at the TERMICA - Suceava power plant. The analysis calls for a multidisciplinary approach and a complex team (experts in power and heat production, mechanics, environmental protection, economists, and last but not least IT experts) and can be carried out with lower expenses than an investment in new equipment. Involvement of top management of the company is essential, being the driving force and motivation source for the data-mining team. The approach presented is self-learning: once established, the data-mining analytical, modelling and simulation procedures and associated parameter databases can adjust themselves by absorbing and processing new relevant information and can be used on a long term basis for monitoring the performance of the installation, certifying the soundness of managerial measures taken
Applying Data-mining techniques to study drought periods in Spain

Science.gov (United States)

Belda, F.; Penades, M. C.

2010-09-01

Data-mining is a technique that can be used to interact with large databases and to help discover relations between parameters by extracting information from massive and multiple data archives. Drought affects many economic and social sectors, from agriculture to transportation, including urban water deficits and the development of modern industries. Given these problems and the geographical and temporal distribution of drought, it is difficult to find a single definition of drought. Improving the understanding of climatic indices is necessary to reduce the impacts of drought and to facilitate quick decisions regarding this problem. The main objective is to analyze drought periods from 1950 to 2009 in Spain. We use several kinds of information, with different formats, sources and transmission modes. We use a satellite-based Vegetation Index and a dryness index for several temporal periods. We use daily and monthly precipitation and temperature data, and soil moisture data from a numerical weather model. We mainly calculate the Standardized Precipitation Index (SPI), which has been widely used in the literature. We use OLAP-Mining techniques to discover association rules between remote-sensing, numerical weather model and climatic index data. Time series Data-Mining techniques organize data as a sequence of events, with each event having a time of recurrence, to cluster the data into groups of records with similar characteristics. Prior climatological classification is necessary if we want to study drought periods over all of Spain.
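As a rough illustration of what a standardized precipitation index expresses, the following Python sketch maps each precipitation total to a standard-normal score. It is a simplification under stated assumptions: the SPI is conventionally computed by fitting a gamma distribution to the precipitation record, whereas this sketch substitutes an empirical, rank-based probability (Gringorten plotting position) so that it needs only the standard library. The function name `empirical_spi` and the sample values are hypothetical and do not come from the study.

```python
from statistics import NormalDist


def empirical_spi(precip):
    """Rank-based stand-in for the SPI (assumption: no gamma fit is performed).

    Each value is assigned an empirical non-exceedance probability from its
    rank, which is then converted to a standard-normal quantile. Negative
    scores indicate drier-than-typical periods. Ties are broken by position.
    """
    n = len(precip)
    order = sorted(range(n), key=lambda i: precip[i])
    scores = [0.0] * n
    standard_normal = NormalDist()
    for rank, idx in enumerate(order, start=1):
        # Gringorten plotting position keeps the probability strictly in (0, 1).
        p = (rank - 0.44) / (n + 0.12)
        scores[idx] = standard_normal.inv_cdf(p)
    return scores


# Hypothetical monthly precipitation totals in millimetres.
monthly_precip_mm = [12.0, 85.0, 40.0, 3.0, 60.0]
spi = empirical_spi(monthly_precip_mm)
for value, score in zip(monthly_precip_mm, spi):
    print(f"{value:6.1f} mm -> {score:+.2f}")
```

The driest month receives the most negative score and the median month a score near zero; a real analysis would fit the distribution per calendar month and accumulation period over a long record.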
Educational Data Mining Applications and Tasks: A Survey of the Last 10 Years

Science.gov (United States)

Bakhshinategh, Behdad; Zaiane, Osmar R.; ElAtia, Samira; Ipperciel, Donald

2018-01-01

Educational Data Mining (EDM) is the field of using data mining techniques in educational environments. There exist various methods and applications in EDM which can follow both applied research objectives such as improving and enhancing learning quality, as well as pure research objectives, which tend to improve our understanding of the learning…
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

Directory of Open Access Journals (Sweden)

Wencheng Sun

2018-01-01

Full Text Available Currently, medical institutes generally use electronic medical records (EMR) to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and the data mining results. Different types of data require different processing technologies. Most structured data needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and analyzes the key techniques in depth. In addition, we make an in-depth study of the applications developed based on text mining, together with the open challenges and research issues for future work.

Contract Mining versus Owner Mining

African Journals Online (AJOL)

Owner

mining companies can concentrate on their core businesses while using specialists for ... 2 Definition of Contract and Owner. Mining ... equipment maintenance, scheduling and budgeting ..... No. Region. Amount Spent on. Contract Mining. ($ billion). Percent of. Total. 1 ... cost and productivity data based on a large range.
DATA MINING IN EDUCATION: CURRENT STATE AND PERSPECTIVES OF DEVELOPMENT

Directory of Open Access Journals (Sweden)

Yurii O. Kovalchuk

2016-01-01

Full Text Available The main tasks (classification and regression, association rules, clustering) and the basic principles of Data Mining algorithms are considered in the context of their use for a variety of research in the field of education, which is the subject of a relatively new independent direction, Educational Data Mining. The findings about the most popular topics of research within this area, as well as the perspectives of its development, are presented. Presentation of the material is illustrated by simple examples. This article is intended for readers who are engaged in research in the field of education at various levels, especially those involved in the use of e-learning systems, but who are not very familiar with this area of data analysis.
Using Advanced Data Mining And Integration In Environmental Prediction Scenarios

Directory of Open Access Journals (Sweden)

Habala Ondrej

2012-01-01

Full Text Available We present one of the meteorological and hydrological experiments performed in the FP7 project ADMIRE. It serves as an experimental platform for hydrologists, and we have used it also as a testing platform for a suite of advanced data integration and data mining (DMI) tools, developed within ADMIRE. The idea of ADMIRE is to develop an advanced DMI platform accessible even to users who are not familiar with data mining techniques. To this end, we have designed a novel DMI architecture, supported by a set of software tools, managed by DMI process descriptions written in a specialized high-level DMI language called DISPEL, and controlled via several different user interfaces, each performing a different set of tasks and targeting a different user group.
An Overview on Data Mining of Nighttime Light Remote Sensing

Directory of Open Access Journals (Sweden)

LI Deren

2015-06-01

Full Text Available When observing the Earth from above at night, it is clear that human settlements and major economic regions emit glorious light. On cloud-free nights, some remote sensing satellites can record visible radiance sources, including city lights, fishing boat lights and fires, and these nighttime cloud-free images are remotely sensed nighttime light images. Unlike daytime remote sensing, nighttime light remote sensing provides a unique perspective on human social activities, and it has therefore been widely used for spatial data mining in socioeconomic domains. Historically, research on nighttime light remote sensing has mostly focused on urban land cover and urban expansion mapping using DMSP/OLS imagery, but nighttime light images are not the only remote sensing source for this work. Through decades of development of nighttime light products, nighttime light remote sensing applications have been extended to numerous interesting scientific study domains such as econometrics, poverty estimation, light pollution, fisheries and armed conflict. Among the application cases, it is surprising to see that Gross Domestic Product (GDP) data can be corrected using nighttime light data, and it is interesting to see that the mechanisms of several diseases can be revealed by nighttime light images; for these applications, nighttime light is the only remote sensing source. As nighttime light remote sensing has numerous applications, it is important to summarize its applications and its data mining fields. This paper first introduces the major satellite platforms and sensors for observing nighttime light. It then summarizes the progress of nighttime light remote sensing data mining in socioeconomic parameter estimation, urbanization monitoring, important event evaluation, environmental and health effects, fishery dynamic mapping, epidemiological research and natural gas flaring monitoring. Finally, future
Usage reporting on recorded lectures using educational data mining

NARCIS (Netherlands)

Gorissen, Pierre; Van Bruggen, Jan; Jochems, Wim

2012-01-01

Gorissen, P., Van Bruggen, J., & Jochems, W. M. G. (2012). Usage reporting on recorded lectures using educational data mining. International Journal of Learning Technology, 7, 23-40. doi:10.1504/IJLT.2012.046864
Evaluation of Documentation Patterns of Trainees and Supervising Physicians Using Data Mining.

Science.gov (United States)

Madhavan, Ramesh; Tang, Chi; Bhattacharya, Pratik; Delly, Fadi; Basha, Maysaa M

2014-09-01

The electronic health record (EHR) includes a rich data set that may offer opportunities for data mining and natural language processing to answer questions about quality of care, key aspects of resident education, or attributes of the residents' learning environment. We used data obtained from the EHR to report on inpatient documentation practices of residents and attending physicians at a large academic medical center. We conducted a retrospective observational study of deidentified patient notes entered over 7 consecutive months by a multispecialty university physician group at an urban hospital. A novel automated data mining technology was used to extract patient note-related variables. A sample of 26 802 consecutive patient notes was analyzed using the data mining and modeling tool Healthcare Smartgrid. Residents entered most of the notes (33%, 8178 of 24 787) between noon and 4 pm and 31% (7718 of 24 787) of notes between 8 am and noon. Attending physicians placed notes about teaching attestations within 24 hours in only 73% (17 843 of 24 443) of the records. Surgical residents were more likely to place notes before noon (P Data related to patient note entry was successfully used to objectively measure current work flow of resident physicians and their supervising faculty, and the findings have implications for physician oversight of residents' clinical work. We were able to demonstrate the utility of a data mining model as an assessment tool in graduate medical education.
Profiling Oman education data using data mining approach

Science.gov (United States)

Alawi, Sultan Juma Sultan; Shaharanee, Izwan Nizal Mohd; Jamil, Jastini Mohd

2017-10-01

Nowadays, the large amount of data generated by many application services in different learning fields has led to new challenges in the education field. An education portal is an important system that supports the development of the education field. This research paper presents an innovative data mining technique to understand and summarize the information in Oman's education data generated from the Ministry of Education Oman "Educational Portal". This research performs student profiling on the Oman student database. The study used the k-means clustering technique to determine the students' profiles. A total of 42,484 student records from the Sultanate of Oman were extracted for this study. The findings show the practicality of the clustering technique for investigating students' profiles, allowing for a better understanding of students' behavior and their academic performance. The Oman Education Portal contains large amounts of user activity and interaction data. Analysis of this data can help educators improve student performance and recognize students who need additional attention.
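The k-means profiling step mentioned in this record can be sketched in a few lines of Python. This is a minimal illustration of Lloyd's algorithm, not the study's implementation: the two features (attendance rate and average grade), the toy records, the choice of k = 2 and the deterministic initialization from the first k points are all assumptions made here for the example.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm with a deterministic start (first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical student records: (attendance rate, average grade).
students = [
    (0.95, 88.0), (0.50, 52.0), (0.90, 91.0),
    (0.55, 48.0), (0.92, 85.0), (0.45, 55.0),
]
labels, centroids = kmeans(students, k=2)
print(labels)
print(centroids)
```

Because the features are on very different scales, the grade dominates the distance here; in practice the columns would usually be standardized first, and k would be chosen with a criterion such as the elbow method or silhouette scores.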
artery disease guidelines with extracted knowledge from data mining

Directory of Open Access Journals (Sweden)

Peyman Rezaei-Hachesu

2017-06-01

Conclusion: Guidelines confirm the achieved results from data mining (DM) techniques and help to rank important risk factors based on national and local information. Evaluation of extracted rules determined new patterns for CAD patients.
Data Mining Methods to Generate Severe Wind Gust Models

Directory of Open Access Journals (Sweden)

Subana Shanmuganathan

2014-01-01

Full Text Available Gaining knowledge on weather patterns, trends and the influence of their extremes on various crop production yields and quality continues to be a quest by scientists, agriculturists, and managers. Precise and timely information aids decision-making, which is widely accepted as intrinsically necessary for increased production and improved quality. Studies in this research domain, especially those related to data mining and interpretation are being carried out by the authors and their colleagues. Some of this work that relates to data definition, description, analysis, and modelling is described in this paper. This includes studies that have evaluated extreme dry/wet weather events against reported yield at different scales in general. They indicate the effects of weather extremes such as prolonged high temperatures, heavy rainfall, and severe wind gusts. Occurrences of these events are among the main weather extremes that impact on many crops worldwide. Wind gusts are difficult to anticipate due to their rapid manifestation and yet can have catastrophic effects on crops and buildings. This paper examines the use of data mining methods to reveal patterns in the weather conditions, such as time of the day, month of the year, wind direction, speed, and severity using a data set from a single location. Case study data is used to provide examples of how the methods used can elicit meaningful information and depict it in a fashion usable for management decision making. Historical weather data acquired between 2008 and 2012 has been used for this study from telemetry devices installed in a vineyard in the north of New Zealand. The results show that using data mining techniques and the local weather conditions, such as relative pressure, temperature, wind direction and speed recorded at irregular intervals, can produce new knowledge relating to wind gust patterns for vineyard management decision making.
Multimedia data mining and analytics disruptive innovation

CERN Document Server

Baughman, Aaron; Pan, Jia-Yu; Petrushin, Valery A

2015-01-01

This authoritative text/reference provides fresh insights into the cutting edge of multimedia data mining, reflecting how the research focus has shifted towards networked social communities, mobile devices and sensors. Presenting a detailed exploration into the progression of the field, the book describes how the history of multimedia data processing can be viewed as a sequence of disruptive innovations. Across the chapters, the discussion covers the practical frameworks, libraries, and open source software that enable the development of ground-breaking research into practical applications.
Problem Areas in Data Warehousing and Data Mining in a Surgical Clinic

Science.gov (United States)

Tusch, Guenter; Mueller, Margarete; Rohwer-Mensching, Katrin; Heiringhoff, Karlheinz; Klempnauer, Juergen

2001-01-01

Hospitals and clinics have taken advantage of information systems to streamline many clinical and administrative processes. However, the potential of health care information technology as a source of data for clinical and administrative decision support has not been fully explored. In response to pressure for timely information, many hospitals are developing clinical data warehouses. This paper attempts to identify problem areas in the process of developing a data warehouse to support data mining in surgery. Based on the experience from a data warehouse in surgery several solutions are discussed.
Using multi-relational data mining to discriminate blended therapy efficiency on patients based on log data

Directory of Open Access Journals (Sweden)

Artur Rocha

2018-06-01

Full Text Available Introduction: Clinical trials of blended Internet-based treatments deliver a wealth of data from various sources, such as self-report questionnaires, diagnostic interviews, treatment platform log files and Ecological Momentary Assessments (EMA). Mining these complex data for clinically relevant patterns is a daunting task for which no definitive best method exists. In this paper, we explore the expressive power of the multi-relational Inductive Logic Programming (ILP) data mining approach, using combined trial data of the EU E-COMPARED depression trial. Methods: We explored the capability of ILP to handle and combine (implicit) multiple relationships in the E-COMPARED data. This data set has the following features that favor ILP analysis: 1) time reasoning is involved; 2) there is a reasonable amount of explicit useful relations to be analyzed; 3) ILP is capable of building comprehensible models that might be perceived as putative explanations by domain experts; 4) both numerical and statistical models may coexist within ILP models if necessary. In our analyses, we focused on scores of the PHQ-8 self-report questionnaire (which taps depressive symptom severity), and on EMA of mood and various other clinically relevant factors. Both measures were administered during treatment, which lasted between 9 and 16 weeks. Results: E-COMPARED trial data revealed different individual improvement patterns: PHQ-8 scores suggested that some individuals improved quickly during the first weeks of the treatment, while others improved at a (much) slower pace, or not at all. Combining self-reported Ecological Momentary Assessments (EMA), PHQ-8 scores and log data about the usage of the ICT4D platform in the context of blended care, we set out to unveil possible causes for these different trajectories. Discussion: This work complements other studies into alternative data mining approaches to E-COMPARED trial data analysis, which are all aimed to identify clinically
Pattern recognition algorithms for data mining scalability, knowledge discovery and soft granular computing

CERN Document Server

Pal, Sankar K

2004-01-01

Pattern Recognition Algorithms for Data Mining addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. This volume presents various theories, methodologies, and algorithms, using both classical approaches and hybrid paradigms. The authors emphasize large datasets with overlapping, intractable, or nonlinear boundary classes, and datasets that demonstrate granular computing in soft frameworks. Organized into eight chapters, the book begins with an introduction to PR, data mining, and knowledge discovery concepts. The authors analyze the tasks of multi-scale data condensation and dimensionality reduction, then explore the problem of learning with support vector machine (SVM). They conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.
A Data Mining and Survey Study on Diseases Associated with Paraesophageal Hernia

OpenAIRE

Yang, Jianji; Logan, Judith

2006-01-01

Paraesophageal hernia is a severe form of hiatal hernia, characterized by the upward dislocation of the gastric fundus into the thoracic cavity. In this study, the 1999 National Inpatient Sample dataset of the Healthcare Cost and Utilization Project was analyzed using data mining techniques to explore disorders associated with paraesophageal hernia. The result of this data mining process was compared with a subsequent expert knowledge survey of 97 gastrointestinal tract surgeons. This two-ste...
Dengue fatality prediction using data mining | Rahim | Journal of ...

African Journals Online (AJOL)

The aim of this research is to study the current implementation of dengue outbreak control in Malaysia and predict dengue fever cases using data mining techniques. Real data on dengue fever and weather are collected from the Ministry of Health in its Perak Tengah district office and Perak Meteorological office respectively ...
Process cubes : slicing, dicing, rolling up and drilling down event data for process mining

NARCIS (Netherlands)

Aalst, van der W.M.P.

2013-01-01

Recent breakthroughs in process mining research make it possible to discover, analyze, and improve business processes based on event data. The growth of event data provides many opportunities but also imposes new challenges. Process mining is typically done for an isolated well-defined process in
DATA MINING IN SPORTS BETTING

Directory of Open Access Journals (Sweden)

Cristian Georgescu

2013-12-01

Full Text Available In this paper, we have made a brief analysis of how to make decisions in betting on European football with the help of data mining techniques. Both betting a few days in advance of the sporting event and live betting have been taken into consideration. By using a clustering algorithm to analyze both the database containing events from football matches and the odds given by bookmakers, we have obtained graphs indicating the probabilities associated with the analyzed events. Given the purely informative aspect of the current paper, we have only analyzed the number of corners in a match.
Managing Multiuser Database Buffers Using Data Mining Techniques

NARCIS (Netherlands)

Feng, L.; Lu, H.J.

2004-01-01

In this paper, we propose a data-mining-based approach to public buffer management for a multiuser database system, where database buffers are organized into two areas – public and private. While the private buffer areas contain pages to be updated by particular users, the public
Mine drivage in hydraulic mines

Energy Technology Data Exchange (ETDEWEB)

Ehkber, B Ya

1983-09-01

From 20 to 25% of labor cost in hydraulic coal mines falls on mine drivage. Range of mine drivage is high due to the large number of shortwalls mined by hydraulic monitors. Reducing mining cost in hydraulic mines depends on lowering drivage cost by use of new drivage systems or by increasing efficiency of drivage systems used at present. The following drivage methods used in hydraulic mines are compared: heading machines with hydraulic haulage of cut rocks and coal, hydraulic monitors with hydraulic haulage, drilling and blasting with hydraulic haulage of blasted rocks. Mining and geologic conditions which influence selection of the optimum mine drivage system are analyzed. Standardized cross sections of mine roadways driven by the 3 methods are shown in schemes. Support systems used in mine roadways are compared: timber supports, roof bolts, roof bolts with steel elements, and roadways driven in rocks without a support system. Heading machines (K-56MG, GPKG, 4PU, PK-3M) and hydraulic monitors (GMDTs-3M, 12GD-2) used for mine drivage are described. Data on mine drivage in hydraulic coal mines in the Kuzbass are discussed. From 40 to 46% of roadways are driven by heading machines with hydraulic haulage and from 12 to 15% by hydraulic monitors with hydraulic haulage.
Clustering-based approaches to SAGE data mining

Directory of Open Access Journals (Sweden)

Wang Haiying

2008-07-01

Full Text Available Abstract Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation.

Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

Science.gov (United States)

Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

2018-05-01

The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by the two features together; that is, where A and B appear together, C often appears. We note that this co-location pattern is different from the traditional co-location pattern. This paper therefore presents a new concept called clustering terms, and this pattern is called a co-location pattern with clustering items. Because traditional algorithms cannot mine this kind of pattern, we introduce the related concepts in detail and propose a novel algorithm, which extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
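The idea that C tends to appear where A and B appear together can be made concrete with a small Python sketch. It is not the algorithm proposed in this record and does not compute the participation index used in join-based co-location mining; it only measures, on invented points, the fraction of neighbouring A-B pairs that also have a C instance near both. The function name `conditional_colocation`, the `radius` neighbourhood threshold and the coordinates are assumptions for illustration.

```python
import math


def conditional_colocation(instances, a, b, c, radius):
    """Fraction of neighbouring (a, b) instance pairs with a c instance near both.

    instances is a list of (feature, x, y) tuples; two instances are
    neighbours when their Euclidean distance is at most radius.
    """
    by_feature = {}
    for feature, x, y in instances:
        by_feature.setdefault(feature, []).append((x, y))

    def near(p, q):
        return math.dist(p, q) <= radius

    ab_pairs = [
        (p, q)
        for p in by_feature.get(a, [])
        for q in by_feature.get(b, [])
        if near(p, q)
    ]
    if not ab_pairs:
        return 0.0
    hits = sum(
        1
        for p, q in ab_pairs
        if any(near(p, r) and near(q, r) for r in by_feature.get(c, []))
    )
    return hits / len(ab_pairs)


# Hypothetical spatial instances: (feature type, x, y).
instances = [
    ("A", 0.0, 0.0), ("B", 1.0, 0.0), ("C", 0.5, 0.5),
    ("A", 10.0, 10.0), ("B", 11.0, 10.0), ("C", 10.5, 10.5),
    ("A", 20.0, 20.0), ("B", 21.0, 20.0),
    ("A", 30.0, 30.0), ("C", 40.0, 40.0),
]
ratio = conditional_colocation(instances, "A", "B", "C", radius=1.5)
print(round(ratio, 3))
```

Here three A-B pairs are neighbours and two of them have a C nearby, so the ratio is two thirds; a mining algorithm would compare such a measure against a prevalence threshold across many candidate feature combinations.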
Gaining Insights on Nasopharyngeal Carcinoma Treatment Outcome Using Clinical Data Mining Techniques.

Science.gov (United States)

Ghaibeh, A Ammar; Kasem, Asem; Ng, Xun Jin; Nair, Hema Latha Krishna; Hirose, Jun; Thiruchelvam, Vinesh

2018-01-01

The analysis of Electronic Health Records (EHRs) is attracting a lot of research attention in the medical informatics domain. Hospitals and medical institutes have started to use data mining techniques to gain new insights from the massive amounts of data that can be made available through EHRs. Researchers in the medical field have often used descriptive statistics and classical statistical methods to prove assumed medical hypotheses. However, discovering new insights from large amounts of data solely based on experts' observations is difficult. Using data mining techniques and visualizations, practitioners can find hidden knowledge, identify interesting patterns, or formulate new hypotheses to be further investigated. This paper describes work in progress on using data mining methods to analyze clinical data of Nasopharyngeal Carcinoma (NPC) patients. NPC is the fifth most common cancer among Malaysians, and the data analyzed in this study was collected from three states in Malaysia (Kuala Lumpur, Sabah and Sarawak), and is considered to be the largest up-to-date dataset of its kind. This research addresses the issue of cancer recurrence after the completion of radiotherapy and chemotherapy treatment. We describe the procedure, problems, and insights gained during the process.
USING ADVANCED DATA MINING AND INTEGRATION IN ENVIRONMENTAL PREDICTION SCENARIOS

Directory of Open Access Journals (Sweden)

Ondrej Habala

2012-01-01

Full Text Available We present one of the meteorological and hydrological experiments performed in the FP7 project ADMIRE. It serves as an experimental platform for hydrologists, and we have used it also as a testing platform for a suite of advanced data integration and data mining (DMI) tools, developed within ADMIRE. The idea of ADMIRE is to develop an advanced DMI platform accessible even to users who are not familiar with data mining techniques. To this end, we have designed a novel DMI architecture, supported by a set of software tools, managed by DMI process descriptions written in a specialized high-level DMI language called DISPEL, and controlled via several different user interfaces, each performing a different set of tasks and targeting a different user group.
Study of the Korean anthracite for utilization and the coal mine data management

Energy Technology Data Exchange (ETDEWEB)

NONE

1995-12-01

This report consists of two articles. (1) Petrographic study of the Korean anthracite for utilization (5): This research was initiated to develop filtering materials that can be used at waste water treatment sites. A small-scale filtration tester was built at the waste water treatment site of Chungjoo electric Co. to use waste water processed by the purifying system for the feasibility study. (2) Study of the closed coal mine data management: Underground maps of about 1700 adits of 100 coal mines, and related graphic data, have been collected in the database. All of these data were entered into the database in vector form, with coordinates obtained from a digitizing tablet. Detailed work is described in the other report, including discussions of the graphic database and the handling of graphical mine data. Comments about GIS are also provided in the volume. (author). 25 refs., 45 figs., 50 tabs., 3 maps.
Redo log process mining in real life : data challenges & opportunities

NARCIS (Netherlands)

González López de Murillas, E.; Hoogendoorn, G.E.; Reijers, H.A.; Teniente, E.; Weidlich, M.

2018-01-01

Data extraction and preparation are the most time-consuming phases of any process mining project. Due to the variability on the sources of event data, it remains a highly manual process in most of the cases. Moreover, it is very difficult to obtain reliable event data in enterprise systems that are
DISEASES: text mining and data integration of disease-gene associations.

Science.gov (United States)

Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

2015-03-01

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Data Mining for Imbalanced Datasets: An Overview

Science.gov (United States)

Chawla, Nitesh V.

A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
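The two ideas in this abstract (balancing a dataset by sampling, and judging a classifier by something other than accuracy) can be illustrated with a small sketch. Everything below is an assumption made for the example rather than material from the chapter: the helper names `random_undersample` and `precision_recall_f1`, the toy labels, and the fixed random seed. Random undersampling is only one of many possible sampling techniques.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data: 95 negatives and 5 positives (made-up numbers).
y_true = [0] * 95 + [1] * 5
rows = list(range(100))

# A classifier that always predicts the majority class looks accurate...
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# ...but it never finds a single positive example.
minority_scores = precision_recall_f1(y_true, y_pred, positive=1)

# Undersampling leaves five rows of each class.
balanced_rows, balanced_labels = random_undersample(rows, y_true)
```

On this toy data the always-negative classifier is correct on 95 of 100 rows while its recall on the minority class is zero, which is the mismatch the abstract warns about.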
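Perancangan Data Mining Untuk Analisis Kriteria Nasabah Kredit Yang Potensial Dan Manfaatnya Untuk Customer Relationship Management Perbankan [Data Mining Design for Analyzing the Criteria of Potential Credit Customers and Its Benefits for Banking Customer Relationship Management]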

OpenAIRE

Kurniawan, Putu Sukma

2015-01-01

Data mining has emerged in response to the explosion of data experienced by many organizations that have accumulated many years of data (purchasing data, sales data, customer data, transaction data, and others). One example of an industry that uses data mining is the banking industry. Many banks still use conventional methods to analyze their customers, which leads to high operating costs for the bank. The concept of data mining can help banks to get a better ana...
A Framework for Investigating Influence of Organizational Decision Makers on Data Mining Process Achievement

OpenAIRE

Hanieh Hajisafari; Shaaban Elahi

2012-01-01

Currently, few studies deal with the evaluation of data mining plans in the context of solving organizational problems. A successful data miner seeks to solve a fully defined business problem. To make the data mining (DM) results actionable, the data miner must explain them to the business insider. The interaction process between the business insiders and data miners is actually a knowledge-sharing process. In this study, through representing a framework, influence of organizational decision mak...
Ensemble Data Mining Methods

Science.gov (United States)

Oza, Nikunj C.

2004-01-01

Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary: any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
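The committee argument can be made concrete with a little arithmetic. The sketch below treats "complementary" as the idealized case in which members make errors independently, which is an assumption of this example and not a claim from the abstract; the function names and the 70% accuracy figure are likewise illustrative.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label chosen by most members.

    Counter.most_common keeps first-seen order among equal counts,
    so a tie goes to the label that appeared first.
    """
    return Counter(predictions).most_common(1)[0][0]


def majority_vote_accuracy(n_members, member_accuracy):
    """Probability that a strict majority of n independent members is right.

    Assumes a two-class problem and statistically independent errors.
    """
    needed = n_members // 2 + 1
    return sum(
        comb(n_members, k)
        * member_accuracy ** k
        * (1 - member_accuracy) ** (n_members - k)
        for k in range(needed, n_members + 1)
    )


single = majority_vote_accuracy(1, 0.7)   # one member alone
three = majority_vote_accuracy(3, 0.7)    # 3 * 0.7^2 * 0.3 + 0.7^3
five = majority_vote_accuracy(5, 0.7)

# Members that always agree add nothing: the vote equals any one member.
redundant = majority_vote(["spam", "spam", "spam"])
# One member errs and the other two outvote it.
corrected = majority_vote(["spam", "ham", "spam"])
```

Under these assumptions three members that are each right 70% of the time reach about 78% as a committee, and five reach about 84%. Real models are rarely fully independent, so the gain in practice is usually smaller.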
Data Mining Approaches for Landslide Susceptibility Mapping in Umyeonsan, Seoul, South Korea

Directory of Open Access Journals (Sweden)

Sunmin Lee

2017-07-01

Full Text Available The application of data mining models has become increasingly popular in recent years in assessments of a variety of natural hazards such as landslides and floods. Data mining techniques are useful for understanding the relationships between events and their influencing variables. Because landslides are influenced by a combination of factors including geomorphological and meteorological factors, data mining techniques are helpful in elucidating the mechanisms by which these complex factors affect landslide events. In this study, spatial data mining approaches based on data on landslide locations in the geographic information system environment were investigated. The topographical factors of slope, aspect, curvature, topographic wetness index, stream power index, slope length factor, standardized height, valley depth, and downslope distance gradient were determined using topographical maps. Additional soil and forest variables using information obtained from national soil and forest maps were also investigated. A total of 17 variables affecting the frequency of landslide occurrence were selected to construct a spatial database, and support vector machine (SVM) and artificial neural network (ANN) models were applied to predict landslide susceptibility from the selected factors. In the SVM model, linear, polynomial, radial basis function, and sigmoid kernels were applied in sequence; the model yielded 72.41%, 72.83%, 77.17% and 72.79% accuracy, respectively. The ANN model yielded a validation accuracy of 78.41%. The results of this study are useful in guiding effective strategies for the prevention and management of landslides in urban areas.
Outcomes of educational interventions in type 2 diabetes: WEKA data-mining analysis.

Science.gov (United States)

Sigurdardottir, Arun K; Jonsdottir, Helga; Benediktsson, Rafn

2007-07-01

To analyze which factors contribute to improvement in glycemic control in educational interventions in type 2 diabetes reported in randomized controlled trials (RCT) published in 2001-2005. Papers were extracted from Medline and Scopus using educational intervention and adults with type 2 diabetes as keywords. Inclusion criteria were RCT design. Data were analyzed with a data-mining program. Of 464 titles extracted, 21 articles reporting 18 studies met the inclusion criteria. Data mining showed that for an initial glycosylated hemoglobin (HbA1c) level below 8.0%, educational intervention achieved only a small change in HbA1c level, from +0.1 to -0.7%. For initial HbA1c ≥ 8.0%, a significant drop in HbA1c level of 0.8-2.5% was found. Data mining indicated that duration, educational content and intensity of education did not predict changes in HbA1c levels. Initial HbA1c level is the single most important factor affecting improvements in glycemic control in response to patient education. Data mining is an appropriate and sufficiently sensitive method to analyze outcomes of educational interventions. Diversity in conceptualization of interventions and diversity of instruments used for outcome measurements could have hampered actual discovery of effective educational practices. Participation in educational interventions generally seems to benefit people with type 2 diabetes. Use of standardized instruments is encouraged as it gives better opportunities to identify conclusive results with consequent development of clinical guidelines.
Model Validation and Verification of Data Mining from the ...

African Journals Online (AJOL)

Michael Horsfall

In this paper, we seek to present a hybrid method for Model Validation and Verification of Data Mining from the ... This model generally states the numerical value of knowledge .... procedures found in the field of software engineering should be ...
Data mining usage in health care management: literature survey and decision tree application

Directory of Open Access Journals (Sweden)

Dijana Ćosić

2008-02-01

Full Text Available Aim: To show the benefits of data mining in health care management. In this example, we are going to show a way to raise awareness of women in terms of contraceptive methods they use (do not use). Methods: Goal of the data mining analysis was to determine if there are common characteristics of the women according to their choice of contraception (typical classification problem). Therefore, we decided to use decision trees. We have generated a CHAID model in "Statistica", based on the database that was formed as a result of an Indonesian research that was conducted in 1987. The sample contains married women who were either not pregnant or did not know if they were pregnant at the time of the interview. The database consists of 1473 cases. Also, an extensive internet search was conducted in order to detect a number of articles cited in scientific databases published on the subject of data mining in health care management. Results: It has shown that the most important variable in case of women's choice of contraceptive methods is the husband's profession. Also we retrieved 221 articles published on the application of data mining in health care. Conclusion: The goal of the paper is achieved in two ways: first, retrieving 221 articles published on the subject we have proved the benefits of data mining in the health care management. Second, the decision tree method is successfully applied in explanation of women's choice of contraceptive methods.
Data Mining for Understanding and Improving Decision-Making Affecting Ground Delay Programs

Science.gov (United States)

Kulkarni, Deepak; Wang, Yao Xun; Sridhar, Banavar

2013-01-01

The continuous growth in the demand for air transportation results in an imbalance between airspace capacity and traffic demand. The airspace capacity of a region depends on the ability of the system to maintain safe separation between aircraft in the region. In addition to growing demand, the airspace capacity is severely limited by convective weather. During such conditions, traffic managers at the FAA's Air Traffic Control System Command Center (ATCSCC) and dispatchers at various Airlines' Operations Center (AOC) collaborate to mitigate the demand-capacity imbalance caused by weather. The end result is the implementation of a set of Traffic Flow Management (TFM) initiatives such as ground delay programs, reroute advisories, flow metering, and ground stops. Data Mining is the automated process of analyzing large sets of data and then extracting patterns in the data. Data mining tools are capable of predicting behaviors and future trends, allowing an organization to benefit from past experience in making knowledge-driven decisions. The work reported in this paper is focused on ground delay programs. Data mining algorithms have the potential to develop associations between weather patterns and the corresponding ground delay program responses. If successful, they can be used to improve and standardize TFM decision resulting in better predictability of traffic flows on days with reliable weather forecasts. The approach here seeks to develop a set of data mining and machine learning models and apply them to historical archives of weather observations and forecasts and TFM initiatives to determine the extent to which the theory can predict and explain the observed traffic flow behaviors.
Data Mining Foundations and Intelligent Paradigms Volume 2 Statistical, Bayesian, Time Series and other Theoretical Aspects

CERN Document Server

Jain, Lakhmi

2012-01-01

Data mining is one of the most rapidly growing research areas in computer science and statistics. In Volume 2 of this three volume series, we have brought together contributions from some of the most prestigious researchers in theoretical data mining. Each of the chapters is self contained. Statisticians and applied scientists/ engineers will find this volume valuable. Additionally, it provides a sourcebook for graduate students interested in the current direction of research in data mining.
Proactive data mining with decision trees

CERN Document Server

Dahan, Haim; Rokach, Lior; Maimon, Oded

2014-01-01

This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite
Patent data mining method and apparatus

Science.gov (United States)

Boyack, Kevin W.; Grafe, V. Gerald; Johnson, David K.; Wylie, Brian N.

2002-01-01

A method of data mining represents related patents in a multidimensional space. Distance between patents in the multidimensional space corresponds to the extent of relationship between the patents. The relationship between pairings of patents can be expressed based on weighted combinations of several predicates. The user can select portions of the space to perceive. The user also can interact with and control the communication of the space, focusing attention on aspects of the space of most interest. The multidimensional spatial representation allows more ready comprehension of the structure of the relationships among the patents.
Data Mining Methods for Recommender Systems

Science.gov (United States)

Amatriain, Xavier; Jaimes, Alejandro; Oliver, Nuria; Pujol, Josep M.

In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
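Perancangan Data Mining Untuk Analisis Kriteria Nasabah Kredit yang Potensial dan Manfaatnya Untuk Customer Relationship Management Perbankan [Data Mining Design for Analyzing the Criteria of Potential Credit Customers and Its Benefits for Banking Customer Relationship Management]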

Directory of Open Access Journals (Sweden)

Putu Sukma Kurniawan

2016-03-01

Full Text Available Data mining has emerged in response to the explosion of data experienced by many organizations that have accumulated many years of data (purchasing data, sales data, customer data, transaction data, and others). One example of an industry that uses data mining is the banking industry. Many banks still use conventional methods to analyze their customers, which leads to high operating costs for the bank. The concept of data mining can help banks to get a better analysis of their customers and also help in shaping their customer relationship management. Data mining can help a bank to create customer profiles. The final outcomes for a bank that can carry out customer relationship management are increased customer loyalty to the bank, increased profitability, and reduced customer acquisition costs.

Recommendation in Higher Education Using Data Mining Techniques

Science.gov (United States)

Vialardi, Cesar; Bravo, Javier; Shafti, Leila; Ortigosa, Alvaro

2009-01-01

One of the main problems faced by university students is to take the right decision in relation to their academic itinerary based on available information (for example courses, schedules, sections, classrooms and professors). In this context, this work proposes the use of a recommendation system based on data mining techniques to help students to…
Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

Science.gov (United States)

Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K

2016-01-01

Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons of these methods, we argue that the gene prioritization methods and the protein-protein interaction (PPI) methods, in conjunction with the K nearest neighbors method, could be used in accurately categorizing the genetic factors in disease causation.
Exploring the potential of data mining techniques for the analysis of accident patterns

DEFF Research Database (Denmark)

Prato, Carlo Giacomo; Bekhor, Shlomo; Galtzur, Ayelet

2010-01-01

Research in road safety faces major challenges: individuation of the most significant determinants of traffic accidents, recognition of the most recurrent accident patterns, and allocation of resources necessary to address the most relevant issues. This paper intends to comprehend which data mining...... and association rules) data mining techniques are implemented for the analysis of traffic accidents occurred in Israel between 2001 and 2004. Results show that descriptive techniques are useful to classify the large amount of analyzed accidents, even though introduce problems with respect to the clear...... importance of input and intermediate neurons, and the relative importance of hundreds of association rules. Further research should investigate whether limiting the analysis to fatal accidents would simplify the task of data mining techniques in recognizing accident patterns without the “noise” probably...
Detection of abandoned mines/caves using airborne LWIR hyperspectral data

Science.gov (United States)

Shen, Sylvia S.; Roettiger, Kurt A.

2012-09-01

The detection of underground structures, both natural and man-made, continues to be an important requirement in both the military/intelligence and civil communities. There are estimates that as many as 70,000 abandoned mines/caves exist across the nation. These mines represent significant hazards to public health and safety, and they are of concern to Government agencies at the local, state, and federal levels. NASA is interested in the detection of caves on Mars and the Moon in anticipation of future manned space missions. The military/intelligence community, in turn, is interested in detecting caves, mines, and other underground structures that terrorists may use to conceal the production of weapons of mass destruction or to harbor insurgents or other persons of interest. Locating these mines/caves scattered over millions of square miles is an enormous task, and limited resources necessitate the development of an efficient and effective broad area search strategy using remote sensing technologies. This paper describes an internally-funded research project of The Aerospace Corporation (Aerospace) to assess the feasibility of using airborne hyperspectral data to detect abandoned cave/mine entrances in a broad-area search application. In this research, we have demonstrated the potential utility of using thermal contrast between the cave/mine entrance and the ambient environment as a discriminatory signature. We have also demonstrated the use of a water vapor absorption line at 12.55 μm and a quartz absorption feature at 9.25 μm as discriminatory signatures. Further work is required to assess the broader applicability of these signatures.
Energy Efficient in-Sensor Data Cleaning for Mining Frequent Itemsets

Directory of Open Access Journals (Sweden)

Jacques M. BAHI

2012-03-01

Full Text Available Limited energy, storage, and computational power are the main constraints of sensor networks. Developing algorithms that take this extremely demanding and constrained environment into account has become a major challenge. Communicating messages over a sensor network consumes far more energy than processing them, and mining sensor data should respect the characteristics of sensor networks in terms of energy and computation constraints, network dynamics, and faults. This led us to consider a data cleaning preprocessing phase that reduces the size of transmitted packets and prepares the data for efficient and scalable data mining. This paper introduces a tree-based bi-level periodic data cleaning approach implemented at both the source node and the aggregator levels. Our contribution in this paper is twofold. First, we examine the measured data on a periodic basis and clean it while taking into consideration the number of occurrences of each captured measure, which we call its weight. Then, data cleaning is performed between groups of nodes at the level of the aggregator, which holds lists of measures along with their weights. The quality of the information should be preserved during in-network transmission through the weight of each measure captured by the sensors. This weight constitutes the key optimization of the frequent pattern tree. The result set constitutes a suitable training set that can be mined without higher CPU consumption, allowing us to send only the useful information to the sink. The experimental results show the effectiveness of this technique in terms of energy efficiency and quality of the information by focusing on periodic data cleaning while taking into consideration the weight of the data captured.
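The role of the weight in this abstract can be illustrated with a deliberately simplified sketch. It only shows the two levels of counting (collapsing repeated readings at the source node, then merging weighted lists at the aggregator); it is not the tree-based algorithm of the paper, and the function names and sensor readings are hypothetical.

```python
from collections import Counter


def clean_period(readings):
    """Source-node step: collapse one period of readings into
    {measure: weight}, where weight is the number of occurrences."""
    return Counter(readings)


def aggregate(node_summaries):
    """Aggregator step: merge per-node summaries by summing the weights
    of identical measures."""
    total = Counter()
    for summary in node_summaries:
        total.update(summary)
    return total


# Hypothetical temperature readings from two nodes during one period.
node_a = clean_period([21, 21, 22, 21])
node_b = clean_period([22, 22, 23])

merged = aggregate([node_a, node_b])

raw_count = sum(merged.values())   # readings originally captured
sent_count = len(merged)           # (measure, weight) pairs forwarded
```

In this toy case seven raw readings shrink to three weighted pairs, and the weights still let a downstream frequent-itemset miner recover how often each measure occurred.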
Process mining on databases: Unearthing historical data from redo logs

NARCIS (Netherlands)

González-López de Murillas, E.; van der Aalst, W.M.P.; Reijers, H.A.

2015-01-01

Process Mining techniques rely on the existence of event data. However, in many cases it is far from trivial to obtain such event data. Considerable efforts may need to be spent on making IT systems record historic data at all. But even if such records are available, it may not be possible to derive
Educational data mining: a sample of review and study case

Directory of Open Access Journals (Sweden)

Alejandro Pena, Rafael Domínguez, Jose de Jesus Medel

2009-12-01

Full Text Available The aim of this work is to encourage research in a novel merged field: educational data mining (EDM). To this end, two subjects are outlined. The first is a review of data mining (DM) methods and EDM applications. The second is an EDM case study. As a result of applying DM in Web-based Education Systems (WBES), stratified groups of students were found during a trial. These groups reveal key attributes of volunteers who deserted or remained during a WBES experiment. This kind of discovered knowledge inspires the statement of correlational hypotheses that relate attributes to behavioral patterns of WBES users. We concluded that when EDM findings are taken into account in designing and managing WBES, the learning objectives are improved.
Feature extraction for classification in the data mining process

NARCIS (Netherlands)

Pechenizkiy, M.; Puuronen, S.; Tsymbal, A.

2003-01-01

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of "the curse of dimensionality". Three different eigenvector-based feature extraction approaches
A framework for query optimization to support data mining

NARCIS (Netherlands)

S.R. Choenni (Sunil); A.P.J.M. Siebes (Arno)

1996-01-01

In order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making them less valuable. In this paper, we describe an optimization
Visualizing data mining results with the Brede tools

DEFF Research Database (Denmark)

Nielsen, Finn Årup

2009-01-01

A few neuroinformatics databases now exist that record results from neuroimaging studies in the form of brain coordinates in stereotaxic space. The Brede Toolbox was originally developed to extract, analyze and visualize data from one of them: the BrainMap database. Since then the Brede Toolbox has expanded and now includes its own database with coordinates along with ontologies for brain regions and functions: The Brede Database. With Brede Toolbox and Database combined we set up automated workflows for extraction of data, mass meta-analytic data mining and visualizations. Most of the Web...
Opinion data mining based on DNA method and ORA software

Science.gov (United States)

Tian, Ru-Ya; Wu, Lei; Liang, Xiao-He; Zhang, Xue-Fu

2018-01-01

Online public opinion in particular is a critical subject for mining, because it can form directly and intensely in a short time and may lead to the outbreak of online group events and to an online public opinion crisis. Such opinion may drive a public crisis event or even have negative social impacts, which poses great challenges for government management. Data from the mass media, which reveal implicit, previously unknown, and potentially valuable information, can effectively help us to understand the evolution law of public opinion and provide a useful reference for rumor intervention. Based on the Dynamic Network Analysis method, this paper uses ORA software to mine the characteristics of public opinion information, opinion topics, and public opinion agents through a series of indicators, and quantitatively analyzes the relationships between them. The results show that analysis of the 8 indexes associated with opinion data mining gives a basic understanding of the public opinion characteristics of an opinion event, such as who is important in the opinion spreading process, the information grasping condition, and the opinion topics release situation.
Combined data mining/NIR spectroscopy for purity assessment of lime juice

Science.gov (United States)

Shafiee, Sahameh; Minaei, Saeid

2018-06-01

This paper reports the data mining study on the NIR spectrum of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA analysis. Different data mining techniques for feature selection (Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier as it achieved the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when selected feature vector by GA search method was applied as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Also, reduced spectra using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensional reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of data mining combination with near-infrared spectroscopy for monitoring lime juice quality in terms of natural or synthetic nature.
IBM SPSS modeler essentials effective techniques for building powerful data mining and predictive analytics solutions

CERN Document Server

McCormick, Keith; Wei, Bowen

2017-01-01

IBM SPSS Modeler allows quick, efficient predictive analytics and insight building from your data, and is a popularly used data mining tool. This book will guide you through the data mining process, and presents relevant statistical methods which are used to build predictive models and conduct other analytic tasks using IBM SPSS Modeler. From ...
Rule-based statistical data mining agents for an e-commerce application

Science.gov (United States)

Qin, Yi; Zhang, Yan-Qing; King, K. N.; Sunderraman, Rajshekhar

2003-03-01

Intelligent data mining techniques have useful e-Business applications. Because an e-Commerce application is related to multiple domains such as statistical analysis, market competition, price comparison, profit improvement and personal preferences, this paper presents a hybrid knowledge-based e-Commerce system fusing intelligent techniques, statistical data mining, and personal information to enhance QoS (Quality of Service) of e-Commerce. A Web-based e-Commerce application software system, eDVD Web Shopping Center, is successfully implemented using Java servlets and an Oracle81 database server. Simulation results have shown that the hybrid intelligent e-Commerce system is able to make smart decisions for different customers.
Data mining application in customer relationship management for hospital inpatients.

Science.gov (United States)

Lee, Eun Whan

2012-09-01

This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. A recency, frequency, monetary (RFM) model was applied to 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and the patterns of the loyal customers' medical services usage were modeled via a decision tree. Patients were divided into two groups according to the variables of the RFM model, and the group with significantly high frequency of medical use and expenses was defined as loyal customers, a target market. According to the decision tree, the predictive factors for loyal customers were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. In particular, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. To discover a hospital's loyal patients and model their medical usage patterns, the application of data-mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data-mining in CRM.
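The RFM step described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical record layout of `(patient_id, discharge_date, charge)`; the function name `rfm_table`, the sample visits, and the reference date are invented for illustration and are not taken from the study.

```python
from datetime import date

def rfm_table(visits, today):
    """Summarize visits into recency/frequency/monetary values per patient.

    visits: iterable of (patient_id, discharge_date, charge) tuples
            (a hypothetical layout, not the study's actual schema).
    """
    table = {}
    for pid, day, charge in visits:
        rec = table.setdefault(
            pid, {"recency": None, "frequency": 0, "monetary": 0.0}
        )
        days_ago = (today - day).days
        # Recency is the number of days since the most recent discharge.
        if rec["recency"] is None or days_ago < rec["recency"]:
            rec["recency"] = days_ago
        rec["frequency"] += 1        # number of discharges
        rec["monetary"] += charge    # total charges
    return table

# Invented example data.
visits = [
    ("p1", date(2011, 1, 10), 1200.0),
    ("p1", date(2011, 6, 2), 800.0),
    ("p2", date(2010, 3, 15), 300.0),
]
table = rfm_table(visits, today=date(2011, 12, 31))
```

The resulting three values per patient are the kind of variables that a clustering step could then segment; the clustering and decision tree stages are not sketched here.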
Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

Directory of Open Access Journals (Sweden)

Lahiru Iddamalgoda

2016-08-01

Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and discuss the need for binary classification and scoring-based prioritization methods for determining causal variants. While we discuss the pros and cons associated with these known methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods, in conjunction with the K-nearest neighbors method, could be used to accurately categorize the genetic factors in disease causation.
Prediction of pork quality parameters by applying fractals and data mining on MRI

DEFF Research Database (Denmark)

Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés

2017-01-01

This work firstly investigates the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal algorithm, CFA; Fractal Texture Algorithm, FTA and One...... Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effect of the sequence acquisition of MRI (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and the predictive technique of data mining (Isotonic regression, IR and Multiple linear...... regression, MLR) were analysed. Both fractal algorithms, FTA and OPFTA, are appropriate to analyse MRI of loins. The sequence acquisition, the fractal algorithm and the data mining technique seem to influence the prediction results. For most physico-chemical parameters, prediction equations with moderate...
Web Mining

Science.gov (United States)

Fürnkranz, Johannes

The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems and web usage mining.
Application of data mining in three-dimensional space time reactor model

International Nuclear Information System (INIS)

Jiang Botao; Zhao Fuyu

2011-01-01

A high-fidelity three-dimensional space time nodal method has been developed to simulate the dynamics of the reactor core for real time simulation. This three-dimensional reactor core mathematical model is composed of six sub-models: the neutron kinetics model, decay heat model, fuel conduction model, thermal hydraulics model, lower plenum model, and core flow distribution model. During simulation of each sub-model, operation data are produced, and much valuable, important information reflecting the reactor core operation status may be hidden in them, so how to discover this information becomes the primary concern. Against this background, data mining (DM) was created and developed to solve this problem, in engineering and business fields alike. Generally speaking, data mining is a process of finding useful and interesting information in a huge data pool. Support Vector Machine (SVM) is a data mining technique that has appeared in recent years, and SVR is a variant of SVM applied to regression cases. This paper presents only two significant sub-models of the three-dimensional reactor core mathematical model, the nodal space time neutron kinetics model and the thermal hydraulics model, based on which the neutron flux and enthalpy distributions of the core are obtained by solving the three-dimensional nodal space time kinetics equations and energy equations for both single and two-phase flows respectively. Moreover, it describes how the three-dimensional reactor core model can also be used to calculate and determine the reactivity effects of the moderator temperature, boron concentration, fuel temperature, coolant void, xenon worth, samarium worth, control element positions (CEAs) and core burnup status. Besides these, the main mathematical theory of SVR is introduced briefly, on the basis of which SVR is applied to the data generated by two sample calculations, rod ejection transient and axial
A way toward analyzing high-content bioimage data by means of semantic annotation and visual data mining

Science.gov (United States)

Herold, Julia; Abouna, Sylvie; Zhou, Luxian; Pelengaris, Stella; Epstein, David B. A.; Khan, Michael; Nattkemper, Tim W.

2009-02-01

In recent years, bioimaging has turned from qualitative measurements towards a high-throughput and high-content modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known of the underlying data, for example in the case of exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to diabetes study and screening of new anti-diabetes treatments. Cells of the Islet of Langerhans and whole pancreas in pancreas tissue samples are annotated and object specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine the cell number and mass. Only a few parameters need to be specified, which makes it usable also for non-computer experts and allows for high-throughput analysis.

Mining Building Metadata by Data Stream Comparison

DEFF Research Database (Denmark)

Holmegaard, Emil; Kjærgaard, Mikkel Baun

2016-01-01

to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore we found that using DTW for mining...... ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata from comparing data streams. Metafier...... enables a semi-automatic labeling of metadata to building instrumentation. Metafier annotates points with metadata by comparing the data from a set of validated points with unvalidated points. Metafier has three different algorithms to compare points with based on their data. The three algorithms...
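The abstract mentions DTW (dynamic time warping) for comparing data streams. The following is a minimal sketch of the textbook DTW distance between two numeric sequences, offered only as an illustration of the general technique; it is not Metafier's implementation, and the function name `dtw_distance` and the sample streams are assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses absolute difference as the local cost and no warping window
    (both are illustrative choices, not taken from the paper).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same sample
                cost[i][j - 1],      # b advances, a stays on the same sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Invented example: the second stream repeats one sample of the first.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because DTW lets one sequence stretch in time relative to the other, two streams with the same shape but different sampling can still score as close, which is why it suits streams with only loosely similar patterns.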
Known or knowing publics? Social media data mining and the question of public agency

Directory of Open Access Journals (Sweden)

Helen Kennedy

2015-10-01

New methods to analyse social media data provide a powerful way to know publics and capture what they say and do. At the same time, access to these methods is uneven, with corporations and governments tending to have best access to relevant data and analytics tools. Critics raise a number of concerns about the implications dominant uses of data mining and analytics may have for the public: they result in less privacy, more surveillance and social discrimination, and they provide new ways of controlling how publics come to be represented and so understood. In this paper, we consider if a different relationship between the public and data mining might be established, one in which publics might be said to have greater agency and reflexivity vis-à-vis data power. Drawing on growing calls for alternative data regimes and practices, we argue that to enable this different relationship, data mining and analytics need to be democratised in three ways: they should be subject to greater public supervision and regulation, available and accessible to all, and used to create not simply known but reflexive, active and knowing publics. We therefore imagine conditions in which data mining is not just used as a way to know publics, but can become a means for publics to know themselves.
Spatio-Temporal Pattern Mining on Trajectory Data Using Arm

Science.gov (United States)

Khoshahval, S.; Farnaghi, M.; Taleai, M.

2017-09-01

Originally, the mobile phone was regarded as a device for making human connections easier. Today its use has evolved into a platform for gaming, web surfing, and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant trajectory data gathering tools. Raw GPS trajectory data is a series of points that contains hidden information, and trajectory data analysis is needed to reveal it. Among the most useful information concealed in trajectory data are user activity patterns. Each pattern contains multiple stops and moves that identify the places users visited and the tasks they performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting users' visited places from the stops and moves of GPS trajectories. To locate stops and moves, we implemented a place recognition algorithm. After the visited points were extracted, an advanced association rule mining algorithm called Apriori was used to extract user activity patterns. This study showed that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques to learn about the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.
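As an illustration of the Apriori step named in this abstract, the sketch below mines frequent sets of visited places from a handful of invented daily visit lists and derives one rule confidence. The place labels, the `min_support` value, and the function name `apriori` are assumptions for the example; the paper's place recognition algorithm and its actual parameters are not reproduced.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for itemsets seen in at least
    min_support transactions, built level by level (Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Invented example: places one user visited on four days.
days = [
    {"home", "work", "gym"},
    {"home", "work"},
    {"home", "work", "shop"},
    {"home", "gym"},
]
patterns = apriori(days, min_support=3)

# Confidence of the rule home -> work = support(home, work) / support(home).
confidence = patterns[frozenset({"home", "work"})] / patterns[frozenset({"home"})]
```

Here only `home`, `work`, and the pair `{home, work}` reach the support threshold, so the single rule `home -> work` is the kind of daily activity pattern such a pipeline could surface.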
Data mining teaching throughout cards game competition

OpenAIRE

Antoñanzas-Torres, Javier; Urraca, Ruben; Sodupe-Ortega, Enrique; Martínez-de-Pison, Francisco; Pernía-Espinoza, Alpha

2015-01-01

[EN] Data-mining techniques and statistical metrics learning can be complicated because of the complexity and overwhelming nature of this field. In this paper a class competition to improve learning of designing Decision Support Systems (DSS) by playing a classic cards game named "Copo" is proposed. The fact that this game is based on a probabilistic problem and that different solutions can be obtained represents a very typical kind of problem in the field of engineering and compu...
Proceedings of the International Conference on Educational Data Mining (EDM) (5th, Chania, Greece, June 19-21, 2012)

Science.gov (United States)

International Educational Data Mining Society, 2012

2012-01-01

The 5th International Conference on Educational Data Mining (EDM 2012) is held in picturesque Chania on the beautiful Crete island in Greece, under the auspices of the International Educational Data Mining Society (IEDMS). The EDM 2012 conference is a leading international forum for high quality research that mines large data sets of educational…
Data mining for water resource management part 2 - methods and approaches to solving contemporary problems

Science.gov (United States)

Roehl, Edwin A.; Conrads, Paul

2010-01-01

This is the second of two papers that describe how data mining can aid natural-resource managers with the difficult problem of controlling the interactions between hydrologic and man-made systems. Data mining is a new science that assists scientists in converting large databases into knowledge, and is uniquely able to leverage the large amounts of real-time, multivariate data now being collected for hydrologic systems. Part 1 gives a high-level overview of data mining, and describes several applications that have addressed major water resource issues in South Carolina. This Part 2 paper describes how various data mining methods are integrated to produce predictive models for controlling surface- and groundwater hydraulics and quality. The methods include: - signal processing to remove noise and decompose complex signals into simpler components; - time series clustering that optimally groups hundreds of signals into "classes" that behave similarly for data reduction and (or) divide-and-conquer problem solving; - classification which optimally matches new data to behavioral classes; - artificial neural networks which optimally fit multivariate data to create predictive models; - model response surface visualization that greatly aids in understanding data and physical processes; and, - decision support systems that integrate data, models, and graphics into a single package that is easy to use.
GROUND DEFORMATION EXTRACTION USING VISIBLE IMAGES AND LIDAR DATA IN MINING AREA

Directory of Open Access Journals (Sweden)

W. Hu

2016-06-01

Recognition and extraction of mining ground deformation can help us understand the deformation process and space distribution, and estimate the deformation laws and trends. This study focuses on the application of ground deformation detection and extraction combining high resolution visible stereo imagery, LiDAR observation point cloud data and historical data. The DEM in a large mining area is generated using high-resolution satellite stereo images, and ground deformation is obtained through time series analysis combined with historical DEM data. Ground deformation caused by mining activities is detected and analyzed to explain the link between the regional ground deformation and local deformation. A district covering 200 km² around the West Open Pit Mine in Fushun, a city in Liaoning province in Northeast China, is chosen as the test area. Regional and local ground deformation from the 2010 to 2015 time series is detected and extracted with DEMs derived from ZY-3 images and LiDAR point DEMs in the case study. Results show that the mean regional deformation is 7.1 m of rising elevation with RMS 9.6 m. Deformation of rising elevation and deformation of declining elevation couple together in local areas. The area of higher elevation variation is 16.3 km² and the mean rising value is 35.8 m with RMS 15.7 m, while the deformation area of lower elevation variation is 6.8 km² and the mean declining value is 17.6 m with RMS 9.3 m. Moreover, local large deformation and regional slow deformation couple together; the deformation from local mining activities has expanded to the surrounding area; and a large ground fracture with declining elevation has been detected and extracted in the south of the West Open Pit Mine, with a mean declining elevation of 23.1 m and covering about 2.3 km² as of 2015. The results in this paper are preliminary currently; we are making efforts to improve more precision results with
Data Mining in Distributed Database of the First Egyptian Thermal Research Reactor (ETRR-1)

International Nuclear Information System (INIS)

Abo Elez, R.H.; Ayad, N.M.A.; Ghuname, A.A.A.

2006-01-01

Distributed database (DDB) technology application systems are growing to cover many fields and domains, at different levels. The aim of this paper is to shed some light on applying distributed database technology to the ETRR-1 operation data logged by the data acquisition system (DACQUS), so that useful knowledge can be extracted. Data mining with scientific methods and specialized tools is used to support the extraction of useful knowledge from rapidly growing volumes of data. Data mining methods take many shapes and forms. Predictive methods furnish models capable of anticipating the future behavior of quantitative or qualitative database variables. When the relationship between the dependent and independent variables is nearly linear, linear regression is the appropriate data mining strategy. Accordingly, multiple linear regression models have been applied to a set of data samples of the ETRR-1 operation data, using the least squares method. The results show an accurate analysis of the multiple linear regression models as applied to the ETRR-1 operation data.
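A multiple linear regression fitted by least squares, as named in this abstract, can be sketched by solving the normal equations $(X^\top X)\beta = X^\top y$. The example below is a minimal pure-Python illustration on synthetic data; the helper names `solve` and `fit_linear` and the sample values are assumptions, and nothing here uses or represents ETRR-1 operation data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_linear(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

# Synthetic data generated from y = 2 + 3*x1 - 1*x2 (invented for illustration).
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [2.0 + 3.0 * a - 1.0 * b for a, b in X]
coef = fit_linear(X, y)
```

On noise-free synthetic data the fit recovers the generating coefficients up to rounding; with real, noisy measurements the same procedure returns the coefficients that minimize the sum of squared residuals.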
Data mining practical machine learning tools and techniques

CERN Document Server

Witten, Ian H

2005-01-01

As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same
A Comparative Study to Predict Student’s Performance Using Educational Data Mining Techniques

Science.gov (United States)

Uswatun Khasanah, Annisa; Harwati

2017-06-01

Predicting student performance is essential for a university to prevent student failure. The number of student dropouts is one parameter that can be used to measure student performance, and an important point evaluated in Indonesian university accreditation. Data mining has been widely used to predict student performance, and data mining applied in this field is usually called Educational Data Mining. This study conducted Feature Selection to select the attributes with high influence on student performance in the Department of Industrial Engineering, Universitas Islam Indonesia. Then, two popular classification algorithms, Bayesian Network and Decision Tree, were implemented and compared to find the best prediction result. The outcome showed that student attendance and first-semester GPA ranked at the top in all Feature Selection methods, and that Bayesian Network outperformed Decision Tree, with a higher accuracy rate.
Geological survey of Maryland using EREP flight data. [mining, mapping, Chesapeake Bay islands, coastal water features

Science.gov (United States)

Weaver, K. N. (Principal Investigator)

1973-01-01

The author has identified the following significant results. Underflight photography has been used in the Baltimore County mined land inventory to determine areas of disturbed land where surface mining of sand and ground clay, or stone has taken place. Both active and abandoned pits and quarries were located. Aircraft data has been used to update cultural features of Calvert, Caroline, St. Mary's, Somerset, Talbot, and Wicomico Counties. Islands have been located and catalogued for comparison with older film and map data for erosion data. Strip mined areas are being mapped to obtain total area disturbed to aid in future mining and reclamation problems. Coastal estuarine and Atlantic Coast features are being studied to determine nearshore bedforms, sedimentary, and erosional patterns, and manmade influence on natural systems.
Mining data from hemodynamic simulations via Bayesian emulation

Directory of Open Access Journals (Sweden)

Nair Prasanth B

2007-12-01

Background: Arterial geometry variability is inevitable both within and across individuals. To ensure realistic prediction of cardiovascular flows, there is a need for efficient numerical methods that can systematically account for geometric uncertainty. Methods and results: A statistical framework based on Bayesian Gaussian process modeling was proposed for mining data generated from computer simulations. The proposed approach was applied to analyze the influence of geometric parameters on hemodynamics in the human carotid artery bifurcation. A parametric model in conjunction with a design of computer experiments strategy was used for generating a set of observational data that contains the maximum wall shear stress values for a range of probable arterial geometries. The dataset was mined via a Bayesian Gaussian process emulator to estimate: (a) the influence of key parameters on the output via sensitivity analysis, (b) uncertainty in output as a function of uncertainty in input, and (c) which settings of the input parameters result in maximum and minimum values of the output. Finally, potential diagnostic indicators were proposed that can be used to aid the assessment of stroke risk for a given patient's geometry.
Genomics Portals: integrative web-platform for mining genomics data.

Science.gov (United States)

Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

2010-01-13

A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Geographical variation in morphometry, craniometry, and diet of a mammalian species (Stone marten, Martes foina) using data mining

OpenAIRE

PAPAKOSTA, MALAMATI; KITIKIDOU, KYRIAKI; BAKALOUDIS, DIMITRIOS; VLACHOS, CHRISTOS; CHATZINIKOS, EVANGELOS; ALEXANDROU, OLGA; SAKOULIS, ANASTASIOS

2018-01-01

Ecologists use various data mining techniques to make predictions and estimations, to identify patterns in datasets and relationships between qualitative and quantitative variables, or to classify variables. The aim of this study was to investigate if the application of data mining could be used to study geographical variation in the morphometry, craniometry, and diet of a mammalian species (Martes foina), and to determine whether data mining can complement genetic analysis to recognize subsp...
DATA MINING WORKSPACE AS AN OPTIMIZATION PREDICTION TECHNIQUE FOR SOLVING TRANSPORT PROBLEMS

Directory of Open Access Journals (Sweden)

Anastasiia KUPTCOVA

2016-09-01

This article addresses the study related to forecasting with an actual high-speed decision making under careful modelling of time series data. The study uses data-mining modelling for algorithmic optimization of transport goals. Our finding brings to the future adequate techniques for the fitting of a prediction model. This model is going to be used for analyses of the future transaction costs in the frontiers of the Czech Republic. Time series prediction methods for the performance of prediction models in the package of Statistics are Exponential, ARIMA and Neural Network approaches. The primary target for a predictive scenario in the data mining workspace is to provide modelling data faster and with more versatility than the other management techniques.
An Enhanced Text-Mining Framework for Extracting Disaster Relevant Data through Social Media and Remote Sensing Data Fusion

Science.gov (United States)

Scheele, C. J.; Huang, Q.

2016-12-01

In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management represents one such application, leveraging the massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, the enhanced text-mining framework is proposed to incorporate location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarth™. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
Data mining for clustering naming of the village at Java Island

Science.gov (United States)

Setiawan Abdullah, Atje; Nurani Ruchjana, Budi; Hidayat, Akik; Akmal; Setiana, Deni

2017-10-01

Query-based data mining was used to cluster and identify the meaning of village names on Java island by exploring a village database with three categories: the prefix in the village name, the syllables contained in the village name, and the full word of the village name as actually used. The syllables contained in village names are classified by the culture and character of each province, describing business, feelings, circumstances, places, nature, respect, plants, fruits, and animals. The data used for clustering village names on Java island was obtained from the Geospatial Information Agency (BIG) as complete village name data with coordinates in six provinces in Java, arranged in a hierarchy of provinces, districts / cities, districts and villages. The research method uses KDD (Knowledge Discovery in Database), proceeding through preprocessing, data mining and postprocessing to obtain knowledge. In this study, the data mining application that facilitates query-based search on village names was built using Java software, while the map contours were processed using ArcGIS software. The results of the research can give recommendations to stakeholders such as the Department of Tourism to describe the meaning of the classification of village names according to the character of each province on Java island.
CANFAR+Skytree: A Cloud Computing and Data Mining System for Astronomy

Science.gov (United States)

Ball, N. M.

2013-10-01

To-date, computing systems have allowed either sophisticated analysis of small datasets, as exemplified by most astronomy software, or simple analysis of large datasets, such as database queries. At the Canadian Astronomy Data Centre, we have combined our cloud computing system, the Canadian Advanced Network for Astronomical Research (CANFAR), with the world's most advanced machine learning software, Skytree, to create the world's first cloud computing system for data mining in astronomy. CANFAR provides a generic environment for the storage and processing of large datasets, removing the requirement for an individual or project to set up and maintain a computing system when implementing an extensive undertaking such as a survey pipeline. 500 processor cores and several hundred terabytes of persistent storage are currently available to users, and both the storage and processing infrastructure are expandable. The storage is implemented via the International Virtual Observatory Alliance's VOSpace protocol, and is available as a mounted filesystem accessible both interactively, and to all processing jobs. The user interacts with CANFAR by utilizing virtual machines, which appear to them as equivalent to a desktop. Each machine is replicated as desired to perform large-scale parallel processing. Such an arrangement enables the user to immediately install and run the same astronomy code that they already utilize, in the same way as on a desktop. In addition, unlike many cloud systems, batch job scheduling is handled for the user on multiple virtual machines by the Condor job queueing system. Skytree is installed and run just as any other software on the system, and thus acts as a library of command line data mining functions that can be integrated into one's wider analysis. Thus we have created a generic environment for large-scale analysis by data mining, in the same way that CANFAR itself has done for storage and processing. Because Skytree scales to large data in
Developing an open source-based spatial data infrastructure for integrated monitoring of mining areas

Science.gov (United States)

Lahn, Florian; Knoth, Christian; Prinz, Torsten; Pebesma, Edzer

2014-05-01

In all phases of mining campaigns, comprehensive spatial information is an essential requirement in order to ensure economically efficient but also safe mining activities as well as to reduce environmental impacts. Earth observation data acquired from various sources like remote sensing or ground measurements is important e.g. for the exploration of mineral deposits, the monitoring of mining induced impacts on vegetation or the detection of ground subsidence. The GMES4Mining project aims at exploring new remote sensing techniques and developing analysis methods on various types of sensor data to provide comprehensive spatial information during mining campaigns (BENECKE et al. 2013). One important task in this project is the integration of the data gathered (e.g. hyperspectral images, spaceborne radar data and ground measurements) as well as results of the developed analysis methods within a web-accessible data source based on open source software. The main challenges here are to provide various types and formats of data from different sensors and to enable access to analysis and processing techniques without particular software or licensing requirements for users. Furthermore the high volume of the involved data (especially hyperspectral remote sensing images) makes data transfer a major issue in this use case. To engage these problems a spatial data infrastructure (SDI) including a web portal as user frontend is being developed which allows users to access not only the data but also several analysis methods. The Geoserver software is used for publishing the data, which is then accessed and visualized in a JavaScript-based web portal. In order to perform descriptive statistics and some straightforward image processing techniques on the raster data (e.g. band arithmetic or principal component analysis) the statistics software R is implemented on a server and connected via Rserve. The analysis is controlled and executed directly by the user through the web portal and
The Evaluation on Data Mining Methods of Horizontal Bar Training Based on BP Neural Network

Directory of Open Access Journals (Sweden)

Zhang Yanhui

2015-01-01

With the rapid development of science and technology, data analysis has become an indispensable part of people’s work and life. Horizontal bar training has multiple categories. Reducing the number of training and match categories is a research focus for workers in the field. The application of data mining methods is discussed based on the problem of reducing categories of horizontal bar training. The BP neural network is applied to the cluster analysis and the principal component analysis, which are used to evaluate horizontal bar training. The two data mining methods are analyzed from two aspects, namely the operational convenience of data mining and the rationality of results. It turns out that the principal component analysis is more suitable for data processing of horizontal bar training.

Groundwater-quality data associated with abandoned underground coal mine aquifers in West Virginia, 1973-2016: Compilation of existing data from multiple sources

Science.gov (United States)

McAdoo, Mitchell A.; Kozar, Mark D.

2017-11-14

This report describes a compilation of existing water-quality data associated with groundwater resources originating from abandoned underground coal mines in West Virginia. Data were compiled from multiple sources for the purpose of understanding the suitability of groundwater from abandoned underground coal mines for public supply, industrial, agricultural, and other uses. This compilation includes data collected for multiple individual studies conducted from July 13, 1973 through September 7, 2016. Analytical methods varied by the time period of data collection and requirements of the independent studies. This project identified 770 water-quality samples from 294 sites that could be attributed to abandoned underground coal mine aquifers originating from multiple coal seams in West Virginia.
Data Mining Application in Customer Relationship Management for Hospital Inpatients

Science.gov (United States)

2012-01-01

Objectives: This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. Methods: A recency, frequency, monetary (RFM) model was applied to 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and the patterns of the loyal customers' medical service usage were modeled via a decision tree. Results: Patients were divided into two groups according to the variables of the RFM model, and the group with significantly higher frequency of medical use and expenses was defined as loyal customers, a target market. According to the decision tree, the predictive factors for loyal clients were length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. In particular, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. Conclusions: To discover a hospital's loyal patients and model their medical usage patterns, the application of data mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data mining in CRM. PMID:23115740
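
As an illustration of the RFM step described in this abstract, the following is a minimal sketch of how recency, frequency, and monetary values might be derived from visit records. The record layout, the function name `rfm_table`, and the sample data are assumptions made for illustration; the study's own variable definitions and clustering procedure are not given here.

```python
from datetime import date

def rfm_table(visits, today):
    """Summarise visits into recency/frequency/monetary values per patient.

    visits: iterable of (patient_id, visit_date, amount) tuples (hypothetical layout).
    today:  reference date used to measure recency in days.
    """
    table = {}
    for pid, day, amount in visits:
        rec = table.setdefault(pid, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)   # most recent visit so far
        rec["frequency"] += 1                 # number of visits
        rec["monetary"] += amount             # total spend
    return {
        pid: {
            "recency_days": (today - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for pid, rec in table.items()
    }

# Hypothetical sample data, not taken from the study.
sample_visits = [
    ("p1", date(2011, 1, 10), 200.0),
    ("p1", date(2011, 6, 1), 300.0),
    ("p2", date(2010, 3, 5), 50.0),
]
rfm = rfm_table(sample_visits, today=date(2011, 12, 31))
```

The resulting per-patient values could then be fed to any clustering routine to segment patients, which is the role cluster analysis plays in the abstract.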
Data Mining: Comparing the Empiric CFS to the Canadian ME/CFS Case Definition

OpenAIRE

Jason, Leonard A.; Skendrovic, Beth; Furst, Jacob; Brown, Abigail; Weng, Angela; Bronikowski, Christine

2011-01-01

This article contrasts two case definitions for Myalgic Encephalomyelitis/chronic fatigue syndrome (ME/CFS). We compared the empiric CFS case definition (Reeves et al., 2005) and the Canadian ME/CFS Clinical case definition (Carruthers et al., 2003) with a sample of individuals with CFS versus those without. Data mining with decision trees was used to identify the best items to identify patients with CFS. Data mining is a statistical technique that was used to help determine which of the surv...
Data Mining for Anomaly Detection

Science.gov (United States)

Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

2013-01-01

The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to current state of art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations, where there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, the detection and analysis is split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in the contexts of supervised learning methods and unsupervised learning. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.
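
The idea of approximating Kolmogorov complexity with a compressor can be sketched as follows. This is not the DZIP/CiDM combination named in the report, whose details are not given here; it substitutes the standard-library `zlib` compressor and the widely used normalized compression distance (NCD) as a stand-in, and the sample byte strings are made up.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough complexity estimate."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical "time series" encoded as bytes: one periodic, one pseudo-random.
periodic = b"0123456789" * 50
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(500))

d_same = ncd(periodic, periodic)   # small: the second copy adds little new information
d_diff = ncd(periodic, noisy)      # large: the noisy series shares no structure
```

A segment whose distance to every baseline segment is large would be a candidate anomaly under this kind of measure.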
The use of Data Mining in the categorization of patients with Azoospermia.

Science.gov (United States)

Mikos, Themistoklis; Maglaveras, Nikolaos; Pantazis, Konstantinos; Goulis, Dimitrios G; Bontis, John N; Papadimas, John

2005-01-01

Data Mining is a relatively new field of Medical Informatics. The aim of this study was to compare Data Mining diagnosis with clinical diagnosis by applying a Data Miner (DM) to a clinical dataset of infertile men with azoospermia. One hundred and forty-seven azoospermic men were clinically classified into four groups: a) obstructive azoospermia (n=63), b) non-obstructive azoospermia (n=71), c) hypergonadotropic hypogonadism (n=2), and d) hypogonadotropic hypogonadism (n=11). The DM (IBM's DB2/Intelligent Miner for Data 6.1) was asked to reproduce a four-cluster model. DM formed four groups of patients: a) eugonadal men with normal testicular volume and normal FSH levels (n=86), b) eugonadal men with significantly reduced testicular volume (median 6.5 cm3) and very high FSH levels (n=29), c) eugonadal men with moderately reduced testicular volume (median 14.5 cm3) and raised FSH levels (n=20), and d) hypogonadal men (n=12). Overall DM concordance rate in hypogonadal men was 92%, in obstructive azoospermia 73%, and in non-obstructive azoospermia 69%. Data Mining produces clinically meaningful results but different from those of the clinical diagnosis. It is possible that the use of large sets of structured and formalised data and continuous evaluation of DM results will generate a useful methodology for the Clinician.
The multiple zeta value data mine

International Nuclear Information System (INIS)

Buemlein, J.; Broadhurst, D.J.

2009-07-01

We provide a data mine of proven results for multiple zeta values (MZVs) of the form $\zeta(s_1, s_2, \ldots, s_k) = \sum_{n_1 > n_2 > \cdots > n_k > 0}^{\infty} 1/(n_1^{s_1} \cdots n_k^{s_k})$ with weight $w = \sum_{i=1}^{k} s_i$ and depth $k$, and for Euler sums of the form $\sum_{n_1 > n_2 > \cdots > n_k > 0}^{\infty} (\epsilon_1^{n_1} \cdots \epsilon_k^{n_k})/(n_1^{s_1} \cdots n_k^{s_k})$ with signs $\epsilon_i = \pm 1$. Notably, we achieve explicit proven reductions of all MZVs with weights $w \le 22$, and all Euler sums with weights $w \le 12$, to bases whose dimensions, bigraded by weight and depth, have sizes in precise agreement with the Broadhurst-Kreimer and Broadhurst conjectures. Moreover, we lend further support to these conjectures by studying even greater weights ($w \le 30$), using modular arithmetic. To obtain these results we derive a new type of relation for Euler sums, the Generalized Doubling Relations. We elucidate the "pushdown" mechanism, whereby the ornate enumeration of primitive MZVs, by weight and depth, is reconciled with the far simpler enumeration of primitive Euler sums. There is some evidence that this pushdown mechanism finds its origin in doubling relations. We hope that our data mine, obtained by exploiting the unique power of the computer algebra language FORM, will enable the study of many more such consequences of the double-shuffle algebra of MZVs, and their Euler cousins, which are already the subject of keen interest, to practitioners of quantum field theory, and to mathematicians alike. (orig.)
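
To make the nested-sum definition above concrete, here is a small numerical sketch that evaluates truncated MZVs and Euler sums with floating-point prefix sums. It is only an illustration of the definition, not the exact algebraic reduction performed with FORM in the paper; the function name `nested_sum` and the cutoff `N` are assumptions.

```python
import math

def nested_sum(s, N, eps=None):
    """Truncated sum over N >= n_1 > n_2 > ... > n_k > 0 of
    prod_j eps_j**n_j / n_j**s_j, with all eps_j = +1 by default (an MZV)."""
    eps = eps or [1] * len(s)
    # inner[m] holds the sum over the already-processed inner indices,
    # all restricted to be strictly smaller than m.
    inner = [1.0] * (N + 2)
    for s_j, e_j in zip(reversed(s), reversed(eps)):
        new = [0.0] * (N + 2)
        running = 0.0
        for m in range(1, N + 1):
            new[m] = running                       # terms with index < m
            running += (e_j ** m) * inner[m] / m ** s_j
        new[N + 1] = running                       # terms with index <= N
        inner = new
    return inner[N + 1]

zeta_2_1 = nested_sum([2, 1], 20000)               # approaches zeta(3) slowly
alt_harmonic = nested_sum([1], 20000, eps=[-1])    # approaches -ln 2
```

The first value illustrates Euler's classical identity $\zeta(2,1) = \zeta(3)$, the simplest example of the kind of reduction the data mine tabulates. The series converges only when $s_1 \ge 2$ (or when signs make it conditionally convergent), and the truncation error here decays roughly like $(\ln N)/N$.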
A case-based reasoning tool for breast cancer knowledge management with data mining concepts and techniques

Science.gov (United States)

Demigha, Souâd.

2016-03-01

The paper presents a Case-Based Reasoning Tool for Breast Cancer Knowledge Management to improve breast cancer screening. To develop this tool, we combine concepts and techniques of Case-Based Reasoning (CBR) and Data Mining (DM). Physicians and radiologists ground their diagnosis on their expertise (past experience) based on clinical cases. Case-Based Reasoning is the process of solving new problems based on the solutions of similar past problems, structured as cases. CBR is suitable for medical use. On the other hand, existing traditional hospital information systems (HIS), Radiological Information Systems (RIS) and Picture Archiving Information Systems (PACS) do not allow medical information to be managed efficiently because of its complexity and heterogeneity. Data Mining is the process of mining information from a data set and transforming it into an understandable structure for further use. Combining CBR with Data Mining techniques will facilitate diagnosis and decision-making by medical experts.
Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov

Directory of Open Access Journals (Sweden)

Eric Wen Su

2017-03-01

Drug repositioning (i.e., drug repurposing) is the process of discovering new uses for marketed drugs. Historically, such discoveries were serendipitous. However, the rapid growth in electronic clinical data and text mining tools makes it feasible to systematically identify drugs with the potential to be repurposed. Described here is a novel method of drug repositioning by mining ClinicalTrials.gov. The text mining tools I2E (Linguamatics) and PolyAnalyst (Megaputer) were utilized. An I2E query extracts “Serious Adverse Events” (SAE) data from randomized trials in ClinicalTrials.gov. Through a statistical algorithm, a PolyAnalyst workflow ranks the drugs where the treatment arm has fewer predefined SAEs than the control arm, indicating that potentially the drug is reducing the level of SAE. Hypotheses could then be generated for the new use of these drugs based on the predefined SAE that is indicative of disease (for example, cancer).
Using Data Mining to Predict Possible Future Depression Cases

OpenAIRE

Daimi, Kevin; Banitaan, Shadi

2014-01-01

Depression is a disorder characterized by misery and gloominess felt over a period of time. Some symptoms of depression overlap with somatic illnesses implying considerable difficulty in diagnosing it. This paper contributes to its diagnosis through the application of data mining, namely classification, to predict patients who will most likely develop depression or are currently suffering from depression. Synthetic data is used for this study. To acquire the results, the popular suite of mach...
SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM

Directory of Open Access Journals (Sweden)

S. Khoshahval

2017-09-01

Initially, the mobile phone was considered a device to make human connections easier. Today, however, its use has evolved into a platform for gaming, web surfing, and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant tools for gathering trajectory data. Raw GPS trajectory data is a series of points that contains hidden information, and trajectory data analysis is needed to reveal it. Among the most useful information concealed in trajectory data are user activity patterns. Each pattern contains multiple stops and moves, which identify the places a user visited and the tasks performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the user’s visited places from the stops and moves of GPS trajectories. In order to locate stops and moves, we have implemented a place recognition algorithm. After extraction of visited points, an association rule mining algorithm called Apriori was used to extract user activity patterns. This study shows that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques in order to learn about multiple users’ behaviour in a system, and that these patterns can be utilized in various location-based applications.
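
The Apriori step mentioned in this abstract can be sketched as below. This is a generic textbook version of frequent-itemset mining, not the authors' implementation; the place labels and the `min_support` threshold are hypothetical, and each "transaction" stands for the set of places visited in one day.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

# Hypothetical daily visit sets for one user.
days = [
    {"home", "work", "gym"},
    {"home", "work"},
    {"home", "work", "shop"},
    {"home", "gym"},
]
patterns = apriori(days, min_support=0.5)
```

An association rule such as work → home can then be scored by confidence, that is, the support of {home, work} divided by the support of {work}.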
Mining Genome-Scale Growth Phenotype Data through Constant-Column Biclustering

KAUST Repository

Alzahrani, Majed A.

2017-01-01

for mining in growth phenotype data. Here, we propose Gracob, a novel, efficient graph-based method that casts and solves the constant-column biclustering problem as a maximal clique finding problem in a multipartite graph. We compared Gracob with a large
Highlights of recent articles on data mining in genomics & proteomics

Science.gov (United States)

This editorial elaborates on investigations consisting of different “OMICS” technologies and their application to biological sciences. In addition, advantages and recent development of the proteomic, genomic and data mining technologies are discussed. This information will be useful to scientists ...
Unsupervised Tensor Mining for Big Data Practitioners.

Science.gov (United States)

Papalexakis, Evangelos E; Faloutsos, Christos

2016-09-01

Multiaspect data are ubiquitous in modern Big Data applications. For instance, different aspects of a social network are the different types of communication between people, the time stamp of each interaction, and the location associated to each individual. How can we jointly model all those aspects and leverage the additional information that they introduce to our analysis? Tensors, which are multidimensional extensions of matrices, are a principled and mathematically sound way of modeling such multiaspect data. In this article, our goal is to popularize tensors and tensor decompositions to Big Data practitioners by demonstrating their effectiveness, outlining challenges that pertain to their application in Big Data scenarios, and presenting our recent work that tackles those challenges. We view this work as a step toward a fully automated, unsupervised tensor mining tool that can be easily and broadly adopted by practitioners in academia and industry.
Data Mining and Knowledge Discovery tools for exploiting big Earth-Observation data

Science.gov (United States)

Espinoza Molina, D.; Datcu, M.

2015-04-01

The continuous increase in the size of the archives and in the variety and complexity of Earth-Observation (EO) sensors requires new methodologies and tools that allow the end-user to access a large image repository, to extract and to infer knowledge about the patterns hidden in the images, to retrieve dynamically a collection of relevant images, and to support the creation of emerging applications (e.g., change detection, global monitoring, disaster and risk management, image time series). In this context, we are concerned with providing a platform for data mining and knowledge discovery content from EO archives. The platform's goal is to implement a communication channel between Payload Ground Segments and the end-user who receives the content of the data coded in an understandable format associated with semantics that is ready for immediate exploitation. It will provide the user with automated tools to explore and understand the content of highly complex image archives. The challenge lies in the extraction of meaningful information and understanding observations of large extended areas, over long periods of time, with a broad variety of EO imaging sensors in synergy with other related measurements and data. The platform is composed of several components such as 1) ingestion of EO images and related data providing basic features for image analysis, 2) query engine based on metadata, semantics and image content, 3) data mining and knowledge discovery tools for supporting the interpretation and understanding of image content, 4) semantic definition of the image content via machine learning methods. All these components are integrated and supported by a relational database management system, ensuring the integrity and consistency of terabytes of Earth Observation data.
Data Mining of the Thermal Performance of Cool-Pipes in Massive Concrete via In Situ Monitoring

Directory of Open Access Journals (Sweden)

Zheng Zuo

2014-01-01

Embedded cool-pipes are very important for massive concrete because their cooling effect can effectively prevent thermal cracks. In this study, a data mining approach to analyzing the thermal performance of cool-pipes via in situ monitoring is proposed. A detailed monitoring program is applied in a high arch dam project, which provides a large, good-quality data source. The factors and relations related to the thermal performance of cool-pipes are obtained from a theoretical thermal model. Support vector machine (SVM) technology is applied to mine the data. The thermal performances of iron pipes and high-density polyethylene (HDPE) pipes are compared. The data mining result shows that iron pipe has better heat removal performance when the flow rate is lower than 50 L/min. It also reveals that a turning flow rate exists for iron pipe, which is 80 L/min. The prediction and classification results obtained from the data mining model agree well with the monitored data, which illustrates the validity of the approach.
A practitioners guide to resampling for data analysis, data mining, and modeling: A cookbook for starters

NARCIS (Netherlands)

van den Broek, Egon

A practitioner’s guide to resampling for data analysis, data mining, and modeling provides a gentle and pragmatic introduction in the proposed topics. Its supporting Web site was offline and, hence, its potentially added value could not be verified. The book refrains from using advanced mathematics
Visual mining of semi-structured data

CERN Multimedia

CERN. Geneva; Posada, Jorge; Quartulli, Marco

2013-01-01

Background: Vicomtech is visiting CERN to present their activities and explore possible lines of collaboration. As part of the programme they will be offering a presentation, staged in three parts: Presentation of Vicomtech – Seán Gaines; Descriptions of technologies and specialities – Dr. Jorge Posada; Details on projects related to the development of visually-based algorithms for intelligent storage, processing, visualization and interaction with Big Data, for massive sources of information – Dr. Marco Quartulli. Abstract: Mining semi-structured data is fundamental for archive monitoring, understanding and exploitation. Typical analysis systems are based on a three-tiered architecture, in which efficient databases feed highly parallelised application servers that in turn feed client user interfaces. Yet the sharing of analysis, content identification and semantic level summarization tasks among the two bot...
Data Mining Activities for Bone Discipline - Current Status

Science.gov (United States)

Sibonga, J. D.; Pietrzyk, R. A.; Johnston, S. L.; Arnaud, S. B.

2008-01-01

The disciplinary goals of the Human Research Program are broadly discussed. There is a critical need to identify gaps in the evidence that would substantiate a skeletal health risk during and after spaceflight missions. As a result, data mining activities will be engaged to gather reviews of medical data and flight analog data and to propose additional measures and specific analyses. Several studies are briefly reviewed which have topics that partially address these gaps in knowledge, including bone strength recovery with recovery of bone mass density, current renal stone formation knowledge, herniated discs, and a review of bed rest studies conducted at Ames Human Research Facility.
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.

Science.gov (United States)

Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki

2010-08-25

This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
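
The idea of breaking an XML file into path-addressed records that map onto relational tables can be sketched as follows. This is an assumption-laden illustration using Python's standard `xml.etree.ElementTree`; the sample document and the function name `flatten_by_path` are hypothetical and do not reproduce the actual PDBMLplus schema or PDBj Mine's loader.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def flatten_by_path(xml_text):
    """Group element text by its absolute path, one 'table' per distinct path."""
    root = ET.fromstring(xml_text)
    tables = defaultdict(list)

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            tables[path].append(text)      # one row per element with content
        for child in node:
            walk(child, path)              # children extend the path

    walk(root, "")
    return dict(tables)

# Hypothetical miniature entry, not real PDBMLplus.
sample_xml = (
    "<entry><title>Lysozyme</title>"
    "<authors><name>A</name><name>B</name></authors></entry>"
)
tables = flatten_by_path(sample_xml)
```

Each key plays the role of a table named after its path, so a query against one path type only touches the rows stored for that path.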
Effects of Epilepsy on Language Functions: Scoping Review and Data Mining Findings.

Science.gov (United States)

Dutta, Manaswita; Murray, Laura; Miller, Wendy; Groves, Doyle

2018-03-01

This study involved a scoping review to identify possible gaps in the empirical description of language functioning in epilepsy in adults. With access to social network data, data mining was used to determine if individuals with epilepsy are expressing language-related concerns. For the scoping review, scientific databases were explored to identify pertinent articles. Findings regarding the nature of epilepsy etiologies, patient characteristics, tested language modalities, and language measures were compiled. Data mining focused on social network databases to obtain a set of relevant language-related posts. The search yielded 66 articles. Epilepsy etiologies except temporal lobe epilepsy and older adults were underrepresented. Most studies utilized aphasia tests and primarily assessed single-word productions; few studies included healthy control groups. Data mining revealed several posts regarding epilepsy-related language problems, including word retrieval, reading, writing, verbal memory difficulties, and negative effects of epilepsy treatment on language. Our findings underscore the need for future specification of the integrity of language in epilepsy, particularly with respect to discourse and high-level language abilities. Increased awareness of epilepsy-related language issues and understanding the patients' perspectives about their language concerns will allow researchers and speech-language pathologists to utilize appropriate assessments and improve quality of care.
