WorldWideScience

Sample records for multiple post-acquisition data-mining

  1. Post-acquisition data mining techniques for LC-MS/MS-acquired data in drug metabolite identification.

    Science.gov (United States)

    Dhurjad, Pooja Sukhdev; Marothu, Vamsi Krishna; Rathod, Rajeshwari

    2017-08-01

    Metabolite identification is a crucial part of the drug discovery process. LC-MS/MS-based metabolite identification has gained widespread use, but the data acquired by the LC-MS/MS instrument is complex, and thus the interpretation of data becomes troublesome. Fortunately, advancements in data mining techniques have simplified the process of data interpretation with improved mass accuracy and provide a potentially selective, sensitive, accurate and comprehensive way for metabolite identification. In this review, we have discussed the targeted (extracted ion chromatogram, mass defect filter, product ion filter, neutral loss filter and isotope pattern filter) and untargeted (control sample comparison, background subtraction and metabolomic approaches) post-acquisition data mining techniques, which facilitate the drug metabolite identification. We have also discussed the importance of integrated data mining strategy.
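
    As an illustration of one of the targeted techniques named above, the mass defect filter, here is a minimal Python sketch. It is not taken from the review: the m/z values, the 50 mDa window, and the function names are assumptions chosen for the example. The idea is that metabolites usually keep a mass defect (exact mass minus nominal mass) close to that of the parent drug, so ions far outside that window can be discarded.

```python
def mass_defect(mz: float) -> float:
    """Mass defect: measured m/z minus its nearest integer (nominal) mass."""
    return mz - round(mz)


def mass_defect_filter(peaks, parent_mz, window_da=0.050):
    """Keep peaks whose mass defect lies within +/- window_da of the parent's.

    window_da is an assumed tolerance (50 mDa), not a value from the review.
    """
    target = mass_defect(parent_mz)
    return [mz for mz in peaks if abs(mass_defect(mz) - target) <= window_da]


# Hypothetical values for illustration only.
parent = 310.1412
peaks = [
    326.1361,  # parent + 15.9949 (oxidation)
    486.1733,  # parent + 176.0321 (glucuronidation)
    300.4021,  # unrelated background ion
    150.9650,  # unrelated background ion
]
kept = mass_defect_filter(peaks, parent)
print(kept)
```

    Only the two metabolite-like ions survive the filter. A real workflow would combine this with the other filters the review lists rather than rely on one window.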

  2. The multiple zeta value data mine

    International Nuclear Information System (INIS)

    Buemlein, J.; Broadhurst, D.J.

    2009-07-01

    We provide a data mine of proven results for multiple zeta values (MZVs) of the form $\zeta(s_1, s_2, \ldots, s_k) = \sum_{n_1 > n_2 > \cdots > n_k > 0}^{\infty} 1/(n_1^{s_1} \cdots n_k^{s_k})$ with weight $w = \sum_{i=1}^{k} s_i$ and depth $k$, and for Euler sums of the form $\sum_{n_1 > n_2 > \cdots > n_k > 0}^{\infty} (\epsilon_1^{n_1} \cdots \epsilon_k^{n_k})/(n_1^{s_1} \cdots n_k^{s_k})$ with signs $\epsilon_i = \pm 1$. Notably, we achieve explicit proven reductions of all MZVs with weights $w \le 22$, and all Euler sums with weights $w \le 12$, to bases whose dimensions, bigraded by weight and depth, have sizes in precise agreement with the Broadhurst-Kreimer and Broadhurst conjectures. Moreover, we lend further support to these conjectures by studying even greater weights ($w \le 30$), using modular arithmetic. To obtain these results we derive a new type of relation for Euler sums, the Generalized Doubling Relations. We elucidate the "pushdown" mechanism, whereby the ornate enumeration of primitive MZVs, by weight and depth, is reconciled with the far simpler enumeration of primitive Euler sums. There is some evidence that this pushdown mechanism finds its origin in doubling relations. We hope that our data mine, obtained by exploiting the unique power of the computer algebra language FORM, will enable the study of many more such consequences of the double-shuffle algebra of MZVs, and their Euler cousins, which are already the subject of keen interest, to practitioners of quantum field theory, and to mathematicians alike. (orig.)
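
    To make the definition concrete, the following Python sketch numerically evaluates the simplest nontrivial MZV, of weight 3 and depth 2, and compares it with Euler's classical identity zeta(2,1) = zeta(3). This is only a truncated floating-point check, not the exact algebraic reduction performed in the paper with FORM; the function names and the truncation point are assumptions for the example.

```python
def zeta_2_1(n_max: int) -> float:
    """Truncated double sum over n1 > n2 > 0 of 1 / (n1**2 * n2)."""
    total = 0.0
    harmonic = 0.0  # running sum of 1/n2 over all n2 < n1
    for n1 in range(1, n_max + 1):
        total += harmonic / (n1 * n1)
        harmonic += 1.0 / n1
    return total


def zeta_3(n_max: int) -> float:
    """Truncated single sum of 1 / n**3."""
    return sum(1.0 / n**3 for n in range(1, n_max + 1))


approx = zeta_2_1(200_000)
reference = zeta_3(200_000)
print(approx, reference)
```

    The two values agree only to a few decimal places because the double sum converges slowly. That slow convergence is one reason proven reductions are obtained algebraically rather than numerically.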

  3. The multiple zeta value data mine

    Energy Technology Data Exchange (ETDEWEB)

    Buemlein, J. [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany)]; Broadhurst, D.J. [Open Univ., Milton Keynes (United Kingdom). Physics and Astronomy Dept.]; Vermaseren, J.A.M. [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany); NIKHEF, Amsterdam (Netherlands)]

    2009-07-15

    We provide a data mine of proven results for multiple zeta values (MZVs) of the form $\zeta(s_1, s_2, \ldots, s_k) = \sum_{n_1 > n_2 > \cdots > n_k > 0}^{\infty} 1/(n_1^{s_1} \cdots n_k^{s_k})$ with weight $w = \sum_{i=1}^{k} s_i$ and depth $k$, and for Euler sums of the form $\sum_{n_1 > n_2 > \cdots > n_k > 0}^{\infty} (\epsilon_1^{n_1} \cdots \epsilon_k^{n_k})/(n_1^{s_1} \cdots n_k^{s_k})$ with signs $\epsilon_i = \pm 1$. Notably, we achieve explicit proven reductions of all MZVs with weights $w \le 22$, and all Euler sums with weights $w \le 12$, to bases whose dimensions, bigraded by weight and depth, have sizes in precise agreement with the Broadhurst-Kreimer and Broadhurst conjectures. Moreover, we lend further support to these conjectures by studying even greater weights ($w \le 30$), using modular arithmetic. To obtain these results we derive a new type of relation for Euler sums, the Generalized Doubling Relations. We elucidate the "pushdown" mechanism, whereby the ornate enumeration of primitive MZVs, by weight and depth, is reconciled with the far simpler enumeration of primitive Euler sums. There is some evidence that this pushdown mechanism finds its origin in doubling relations. We hope that our data mine, obtained by exploiting the unique power of the computer algebra language FORM, will enable the study of many more such consequences of the double-shuffle algebra of MZVs, and their Euler cousins, which are already the subject of keen interest, to practitioners of quantum field theory, and to mathematicians alike. (orig.)

  4. PRIVACY PRESERVING DATA MINING USING MULTIPLE OBJECTIVE OPTIMIZATION

    Directory of Open Access Journals (Sweden)

    V. Shyamala Susan

    2016-10-01

    Privacy preservation is the most targeted issue in information publication, because sensitive data must not be leaked. For this reason, several privacy-preserving data mining algorithms have been proposed. In this work, feature selection using an evolutionary algorithm and data masking coupled with slicing are treated as a multiple objective optimisation to preserve privacy. First, a Genetic Algorithm (GA) is run over the datasets to identify the sensitive attributes and prioritise them for treatment according to their determined sensitivity level. In the next phase, to distort the data, noise is added to the higher-level sensitive values using the Hybrid Data Transformation (HDT) method. In the following phase, the slicing algorithm groups the correlated attributes together and thereby reduces the dimensionality, using the Advanced Clustering Algorithm (ACA). To obtain the optimal bucket dimensions, tuple segregation is performed by the Metaheuristic Firefly Algorithm (MFA). The experimental results imply that the proposed technique can preserve confidentiality while keeping information utility high. The slicing algorithm preserves associations and utility, which results in decreased data dimensionality and information loss. Performance analysis is carried out over OCC 7 and OCC 15, and our optimization method proves its effectiveness over two entirely different datasets by showing 92.98% and 96.92% respectively.

  5. Analyzing clinical symptoms in multiple sclerosis using data mining

    Directory of Open Access Journals (Sweden)

    Zahra Raeisi

    2017-04-01

    Background: One of today's most common and incurable diseases of the central nervous system is multiple sclerosis (MS). MS is a demyelinating disease in which the insulating covers of nerve cells in the brain and spinal cord are damaged. The disease produces a wide spectrum of symptoms, such as loss of muscle control and coordination and visual disturbance. The goal of this research is to address two problems: (1) identifying the clinical symptoms that are most indicative of MS, and (2) assessing the influence of age, sex and education level on MS and the associations between these factors across the categories of the disease. Methods: Data mining in medicine deserves attention, with main applications in diagnosis, therapy and prognosis, given the high volume of collected data. The data used in this article concern patients of Chaharmahal and Bakhtiari Province and were collected by cure assistance. This paper uses classification and association methods from the software engineering field. Classification is a general process related to categorization, the process in which ideas and objects are recognized, differentiated, and understood. Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Results: For the first problem, vision-related clinical symptoms were found to be the most effective symptoms; for the second problem, among 584 records, women were affected four times more often than men. In other words, 70% of MS patients with a high education level are in the relapsing-remitting category and 62.5% of MS patients are 20-40 years old. Conclusion: Some symptoms are quite temporary and transitory and are ignored by people. Awareness of the prevalence of clinical symptoms can serve as a warning for people before starting
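
    The support and confidence criteria mentioned in the Methods can be sketched in a few lines of Python. The patient records below are invented for illustration and are not the study's data; the attribute names and helper functions are assumptions. Support is the fraction of records containing an item set, and confidence of a rule X -> Y is support(X and Y) divided by support(X).

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


# Hypothetical patient records (sets of attributes), for illustration only.
records = [
    {"female", "vision_symptom", "RR"},
    {"female", "vision_symptom", "RR"},
    {"male", "motor_symptom", "SP"},
    {"female", "motor_symptom", "RR"},
    {"male", "vision_symptom", "SP"},
]

rule_support = support(records, {"vision_symptom", "RR"})
rule_confidence = confidence(records, {"vision_symptom"}, {"RR"})
print(rule_support, rule_confidence)
```

    In this toy data the rule "vision_symptom -> RR" holds in 2 of 5 records (support 0.4) and in 2 of the 3 records that contain the antecedent (confidence about 0.67).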

  6. Multiple Additive Regression Trees a Methodology for Predictive Data Mining for Fraud Detection

    National Research Council Canada - National Science Library

    da

    2002-01-01

    ...) is using new and innovative techniques for fraud detection. Their primary techniques for fraud detection are the data mining tools of classification trees and neural networks as well as methods for pooling the results of multiple model fits...

  7. Datafish Multiphase Data Mining Technique to Match Multiple Mutually Inclusive Independent Variables in Large PACS Databases.

    Science.gov (United States)

    Kelley, Brendan P; Klochko, Chad; Halabi, Safwan; Siegal, Daniel

    2016-06-01

    Retrospective data mining has tremendous potential in research but is time and labor intensive. Current data mining software contains many advanced search features but is limited in its ability to identify patients who meet multiple complex independent search criteria. Simple keyword and Boolean search techniques are ineffective when more complex searches are required, or when a search for multiple mutually inclusive variables becomes important. This is particularly true when trying to identify patients with a set of specific radiologic findings or proximity in time across multiple different imaging modalities. Another challenge that arises in retrospective data mining is that much variation still exists in how image findings are described in radiology reports. We present an algorithmic approach to solve this problem and describe a specific use case scenario in which we applied our technique to a real-world data set in order to identify patients who matched several independent variables in our institution's picture archiving and communication systems (PACS) database.
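
    The abstract does not describe the algorithm itself, so the following Python sketch is not the authors' technique. It only illustrates the kind of query the abstract says simple keyword searches handle poorly: finding patients who satisfy several independent criteria across different modalities within a time window. The record layout, keywords, and 30-day window are all assumptions.

```python
from datetime import date
from itertools import product


def matches(report, criterion):
    """True if the report has the given modality and mentions the keyword."""
    modality, keyword = criterion
    return report["modality"] == modality and keyword in report["text"].lower()


def patients_matching_all(reports, criteria, max_days):
    """Patients with one report per criterion, all within max_days of each other."""
    by_patient = {}
    for report in reports:
        by_patient.setdefault(report["patient"], []).append(report)

    hits = []
    for patient, recs in by_patient.items():
        # One list of candidate dates per criterion.
        date_lists = [[r["date"] for r in recs if matches(r, c)] for c in criteria]
        if any(not dates for dates in date_lists):
            continue  # at least one criterion is never met
        if any((max(combo) - min(combo)).days <= max_days
               for combo in product(*date_lists)):
            hits.append(patient)
    return sorted(hits)


# Hypothetical reports, for illustration only.
reports = [
    {"patient": "P1", "modality": "MR", "date": date(2015, 3, 1), "text": "Bone marrow edema in the scaphoid."},
    {"patient": "P1", "modality": "XR", "date": date(2015, 3, 10), "text": "Scaphoid fracture noted."},
    {"patient": "P2", "modality": "MR", "date": date(2015, 1, 5), "text": "Marrow edema present."},
    {"patient": "P2", "modality": "XR", "date": date(2015, 6, 1), "text": "Fracture of distal radius."},
    {"patient": "P3", "modality": "MR", "date": date(2015, 2, 2), "text": "Edema."},
]
criteria = [("MR", "edema"), ("XR", "fracture")]
matched = patients_matching_all(reports, criteria, max_days=30)
print(matched)
```

    Plain substring matching like this is exactly what breaks down when report wording varies, which is the problem the abstract highlights; a practical system would need synonym handling or more robust text processing.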

  8. Collaborative Data Mining

    Science.gov (United States)

    Moyle, Steve

    Collaborative Data Mining is a setting where the Data Mining effort is distributed to multiple collaborating agents - human or software. The objective of the collaborative Data Mining effort is to produce solutions to the tackled Data Mining problem which are considered better by some metric, with respect to those solutions that would have been achieved by individual, non-collaborating agents. The solutions require evaluation, comparison, and approaches for combination. Collaboration requires communication, and implies some form of community. The human form of collaboration is a social task. Organizing communities in an effective manner is non-trivial and often requires well defined roles and processes. Data Mining, too, benefits from a standard process. This chapter explores the standard Data Mining process CRISP-DM utilized in a collaborative setting.

  9. Ensemble Data Mining Methods

    Data.gov (United States)

    National Aeronautics and Space Administration — Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve...

  10. Data mining

    CERN Document Server

    Gorunescu, Florin

    2011-01-01

    The knowledge discovery process is as old as Homo sapiens. Until some time ago, this process was solely based on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be solved based on the development of the Data mining technology, aided by the huge computational power of the 'artificial' computers. Digging intelligently in different large databases, data mining aims to extract implicit, previously unknown and potentially useful information from data, since 'knowledge is power'. The goal of this book is to provide, in a friendly way

  11. Post-Acquisition IT Integration

    DEFF Research Database (Denmark)

    Henningsson, Stefan; Yetton, Philip

    2013-01-01

    The extant research on post-acquisition IT integration analyzes how acquirers realize IT-based value in individual acquisitions. However, serial acquirers make 60% of acquisitions. These acquisitions are not isolated events, but are components in growth-by-acquisition programs. To explain how serial acquirers realize IT-based value, we develop three propositions on the sequential effects on post-acquisition IT integration in acquisition programs. Their combined explanation is that serial acquirers must have a growth-by-acquisition strategy that includes the capability to improve IT integration capabilities, to sustain high alignment across acquisitions and to maintain a scalable IT infrastructure with a flat or decreasing cost structure. We begin the process of validating the three propositions by investigating a longitudinal case study of a growth-by-acquisition program.

  12. Detecting a Weak Association by Testing its Multiple Perturbations: a Data Mining Approach

    Science.gov (United States)

    Lo, Min-Tzu; Lee, Wen-Chung

    2014-05-01

    Many risk factors/interventions in epidemiologic/biomedical studies are of minuscule effects. To detect such weak associations, one needs a study with a very large sample size (the number of subjects, n). The n of a study can be increased but unfortunately only to an extent. Here, we propose a novel method which hinges on increasing sample size in a different direction: the total number of variables (p). We construct a p-based 'multiple perturbation test', and conduct power calculations and computer simulations to show that it can achieve a very high power to detect weak associations when p can be made very large. As a demonstration, we apply the method to analyze a genome-wide association study on age-related macular degeneration and identify two novel genetic variants that are significantly associated with the disease. The p-based method may set a stage for a new paradigm of statistical tests.

  13. A novel data mining system points out hidden relationships between immunological markers in multiple sclerosis

    Directory of Open Access Journals (Sweden)

    Gironi Maira

    2013-01-01

    Abstract Background: Multiple Sclerosis (MS) is a multi-factorial disease, where a single biomarker is unlikely to provide comprehensive information. Moreover, due to the non-linearity of biomarkers, traditional statistics are both unsuitable and underpowered to dissect their relationships. Patients affected with primary (PP=14), secondary (SP=33), benign (BB=26), and relapsing-remitting (RR=30) MS, and 42 sex- and age-matched healthy controls were studied. We performed an in-depth immune-phenotypic and functional analysis of peripheral blood mononuclear cells (PBMCs) by flow cytometry. Semantic connectivity maps (AutoCM) were applied to find the natural associations among immunological markers. AutoCM is a special kind of Artificial Neural Network able to find consistent trends and associations among variables. The matrix of connections, visualized through a minimum spanning tree, keeps non-linear associations among variables and captures connection schemes among clusters. Results: Complex immunological relationships were shown to be related to different disease courses. A low level of CD4IL25+ cells was strongly related (link strength, ls=0.81) to SP MS. This phenotype was also associated with high CD4ROR+ cell levels (ls=0.56). BB MS was related to high CD4+IL13 cell levels (ls=0.90), as well as to a high CD14+IL6 cell percentage (ls=0.80). RR MS was strongly (ls=0.87) related to high CD4+IL25 cell levels, as well as indirectly to high percentages of CD4+IL13 cells. This latter strong (ls=0.92) association could confirm the induction activity of the former cells (CD4+IL25) on the latter (CD4+IL13). Another interesting topographic finding was the isolation of Th9 cells (CD4IL9) from the main part of the immunological network related to MS, suggesting a possible secondary role of this newly described cell phenotype in MS disease. Conclusions: This novel application of non-linear mathematical techniques suggests peculiar immunological signatures for different MS phenotypes. Notably, the

  14. Development of a data mining and imaging informatics display tool for a multiple sclerosis e-folder system

    Science.gov (United States)

    Liu, Margaret; Loo, Jerry; Ma, Kevin; Liu, Brent

    2011-03-01

    Multiple sclerosis (MS) is a debilitating autoimmune disease of the central nervous system that damages axonal pathways through inflammation and demyelination. In order to address the need for a centralized application to manage and study MS patients, the MS e-Folder - a web-based, disease-specific electronic medical record system - was developed. The e-Folder has a PHP and MySQL based graphical user interface (GUI) that can serve as both a tool for clinician decision support and a data mining tool for researchers. This web-based GUI gives the e-Folder a user friendly interface that can be securely accessed through the internet and requires minimal software installation on the client side. The e-Folder GUI displays and queries patient medical records--including demographic data, social history, past medical history, and past MS history. In addition, DICOM format imaging data, and computer aided detection (CAD) results from a lesion load algorithm are also displayed. The GUI interface is dynamic and allows manipulation of the DICOM images, such as zoom, pan, and scrolling, and the ability to rotate 3D images. Given the complexity of clinical management and the need to bolster research in MS, the MS e-Folder system will improve patient care and provide MS researchers with a function-rich patient data hub.

  15. Data Mining for CRM

    Science.gov (United States)

    Thearling, Kurt

    Data Mining technology allows marketing organizations to better understand their customers and respond to their needs. This chapter describes how Data Mining can be combined with customer relationship management to help drive improved interactions with customers. An example showing how to use Data Mining to drive customer acquisition activities is presented.

  16. Data mining in radiology

    International Nuclear Information System (INIS)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining.

  17. Ensemble Data Mining Methods

    Science.gov (United States)

    Oza, Nikunj C.

    2004-01-01

    Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary; any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
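
    The committee argument above can be illustrated with a small majority-vote sketch in Python. The three "models" are just hard-coded prediction lists invented for the example, each wrong on a different item; majority voting is one simple way to combine models and is used here as an assumption, not as the specific method of the record.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model prediction lists into one majority label per example."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, truth):
    """Fraction of examples where the prediction equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical labels and model outputs, for illustration only.
truth = [1, 0, 1, 1, 0, 1]
model_a = [0, 0, 1, 1, 0, 1]  # wrong on example 0
model_b = [1, 1, 1, 1, 0, 1]  # wrong on example 1
model_c = [1, 0, 0, 1, 0, 1]  # wrong on example 2

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(ensemble, truth))
```

    Because the members err on different examples, the other two outvote each mistake and the ensemble is correct everywhere. If all three made the same error, the vote would simply reproduce it, which is the "not complementary" case described above.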

  18. Data mining goes multidimensional.

    Science.gov (United States)

    Hettler, M

    1997-03-01

    The success of a healthcare organization depends on its ability to acquire, store, analyze and compare data across many parts of the enterprise, by many individuals. While relational databases have been around since the 1970s, their two-dimensional structure has limited--or made impossible--the kind of cross-dimensional trend analysis so necessary to healthcare today. Enter online analytical processing (OLAP), in which servers store data in multiple dimensions, opening a world of opportunity for data-mining across the enterprise. In this issue of HEALTHCARE INFORMATICS, we feature our first report from the National Software Testing Laboratories (NSTL) about technologies that will change the way healthcare does business. A division of The McGraw-Hill Companies, NSTL is an independent software and hardware testing lab offering services that include compatibility testing, bug testing, comparison testing, documentation evaluation and usability.

  19. Data mining for service

    CERN Document Server

    2014-01-01

    Virtually all nontrivial and modern service related problems and systems involve data volumes and types that clearly fall into what is presently meant as "big data", that is, are huge, heterogeneous, complex, distributed, etc. Data mining is a series of processes which include collecting and accumulating data, modeling phenomena, and discovering new information, and it is one of the most important steps to scientific analysis of the processes of services.  Data mining application in services requires a thorough understanding of the characteristics of each service and knowledge of the compatibility of data mining technology within each particular service, rather than knowledge only in calculation speed and prediction accuracy. Varied examples of services provided in this book will help readers understand the relation between services and data mining technology. This book is intended to stimulate interest among researchers and practitioners in the relation between data mining technology and its application to ...

  20. The importance of cultural leadership during post-acquisition integration

    OpenAIRE

    Mcconnon, Tom

    2013-01-01

    Mergers and acquisitions (M&A) are not only financial decisions but can also be understood as social processes. Due to the myriad of changes generated by an acquisition, the integration period is characterised by multiple adjustment difficulties. A substantive body of research blames post-acquisition ‘cultural clash’ caused by cultural differences between the two merging organisations as a major cause of disappointing integration outcomes. Yet research into the process of cultural leadership ...

  1. Data mining in agriculture

    CERN Document Server

    Mucherino, Antonio; Pardalos, Panos M

    2009-01-01

    Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environmental related fields. This book presents both theoretical and practical insights with a focus on presenting the context of each data mining technique rather intuitively with ample concrete examples represented graphically and with algorithms written in MATLAB®. Examples and exercises with solutions are provided at the end of each chapter to facilitate the comprehension of the material. For each data mining technique described in the book variants and improvements of the basic algorithm are also given. Also by P.J. Papajorgji and P.M. Pardalos: Advances in Modeling Agricultural Systems, 'Springer Optimization and its Applications' vol. 25, ©2009.

  2. Applied data mining

    CERN Document Server

    Xu, Guandong

    2013-01-01

    Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.

  3. Data mining and visualization techniques

    Science.gov (United States)

    Wong, Pak Chung [Richland, WA]; Whitney, Paul [Richland, WA]; Thomas, Jim [Richland, WA]

    2004-03-23

    Disclosed are association rule identification and visualization methods, systems, and apparatus. An association rule in data mining is an implication of the form X → Y where X is a set of antecedent items and Y is the consequent item. A unique visualization technique that provides multiple antecedent, consequent, confidence, and support information is disclosed to facilitate better presentation of large quantities of complex association rules.

  4. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition discusses both theoretical foundation and practical applications of data mining in a web field including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data type, data category and applications of data mining. The second chapter briefly reviews data visualization technology and importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithm for sample covariants are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithm is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  5. Data mining for dummies

    CERN Document Server

    Brown, Meta S

    2014-01-01

    Delve into your data for the key to success. Data mining is quickly becoming integral to creating value and business momentum. The ability to detect unseen patterns hidden in the numbers exhaustively generated by day-to-day operations allows savvy decision-makers to exploit every tool at their disposal in the pursuit of better business. By creating models and testing whether patterns hold up, it is possible to discover new intelligence that could change your business's entire paradigm for a more successful outcome. Data Mining for Dummies shows you why it doesn't take a data scientist to gain

  6. Data mining mobile devices

    CERN Document Server

    Mena, Jesus

    2013-01-01

    With today's consumers spending more time on their mobiles than on their PCs, new methods of empirical stochastic modeling have emerged that can provide marketers with detailed information about the products, content, and services their customers desire. Data Mining Mobile Devices defines the collection of machine-sensed environmental data pertaining to human social behavior. It explains how the integration of data mining and machine learning can enable the modeling of conversation context, proximity sensing, and geospatial location throughout large communities of mobile users.

  7. Factors Associated With Healthcare-Acquired Catheter-Associated Urinary Tract Infections: Analysis Using Multiple Data Sources and Data Mining Techniques.

    Science.gov (United States)

    Park, Jung In; Bliss, Donna Z; Chi, Chih-Lin; Delaney, Connie W; Westra, Bonnie L

    The purpose of this study was to identify factors associated with healthcare-acquired catheter-associated urinary tract infections (HA-CAUTIs) using multiple data sources and data mining techniques. Three data sets were integrated for analysis: electronic health record data from a university hospital in the Midwestern United States was combined with staffing and environmental data from the hospital's National Database of Nursing Quality Indicators and a list of patients with HA-CAUTIs. Three data mining techniques were used for identification of factors associated with HA-CAUTI: decision trees, logistic regression, and support vector machines. Fewer total nursing hours per patient-day, lower percentage of direct care RNs with specialty nursing certification, higher percentage of direct care RNs with associate's degree in nursing, and higher percentage of direct care RNs with BSN, MSN, or doctoral degree are associated with HA-CAUTI occurrence. The results also support the association of the following factors with HA-CAUTI identified by previous studies: female gender; older age (>50 years); longer length of stay; severe underlying disease; glucose lab results (>200 mg/dL); longer use of the catheter; and RN staffing. Additional findings from this study demonstrated that the presence of more nurses with specialty nursing certifications can reduce HA-CAUTI occurrence. While there may be valid reasons for leaving in a urinary catheter, findings show that having a catheter in for more than 48 hours contributes to HA-CAUTI occurrence. Finally, the findings suggest that more nursing hours per patient-day are related to better patient outcomes.

  8. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  9. Biomedical Data Mining

    NARCIS (Netherlands)

    Peek, N.; Combi, C.; Tucker, A.

    2009-01-01

    Objective: To introduce the special topic of Methods of Information in Medicine on data mining in biomedicine, with selected papers from two workshops on Intelligent Data Analysis in bioMedicine (IDAMAP) held in Verona (2006) and Amsterdam (2007). Methods: Defining the field of biomedical data

  10. Security Measures in Data Mining

    OpenAIRE

    Anish Gupta; Vimal Bibhu; Rashid Hussain

    2012-01-01

    Data mining is a technique to dig data from large databases for analysis and executive decision making. Security is one of the major requirements for data mining applications. In this paper we present security requirement measures for data mining. We summarize the requirements of security for data mining in tabular format. The summarization is performed by the requirements with different aspects of security measure of data mining. The performances and outcomes are determin...

  11. Downscaling 250-m MODIS growing season NDVI based on multiple-date landsat images and data mining approaches

    Science.gov (United States)

    Gu, Yingxin; Wylie, Bruce K.

    2015-01-01

    The satellite-derived growing season time-integrated Normalized Difference Vegetation Index (GSN) has been used as a proxy for vegetation biomass productivity. The 250-m GSN data estimated from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensors have been used for terrestrial ecosystem modeling and monitoring. High temporal resolution with a wide range of wavelengths make the MODIS land surface products robust and reliable. The long-term 30-m Landsat data provide spatial detailed information for characterizing human-scale processes and have been used for land cover and land change studies. The main goal of this study is to combine 250-m MODIS GSN and 30-m Landsat observations to generate a quality-improved high spatial resolution (30-m) GSN database. A rule-based piecewise regression GSN model based on MODIS and Landsat data was developed. Results show a strong correlation between predicted GSN and actual GSN (r = 0.97, average error = 0.026). The most important Landsat variables in the GSN model are Normalized Difference Vegetation Indices (NDVIs) in May and August. The derived MODIS-Landsat-based 30-m GSN map provides biophysical information for moderate-scale ecological features. This multiple sensor study retains the detailed seasonal dynamic information captured by MODIS and leverages the high-resolution information from Landsat, which will be useful for regional ecosystem studies.
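    The NDVI quantity that the abstract above relies on can be illustrated with a minimal Python sketch. The standard definition NDVI = (NIR - Red) / (NIR + Red) is used; the function names, the sample reflectance values, and the simple summation used as a stand-in for "time-integrated" NDVI are assumptions for illustration and are not taken from the study, whose rule-based piecewise regression model is not reproduced here.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel and one date."""
    total = nir + red
    if total == 0:
        # Avoid division by zero for pixels with no signal (assumed convention).
        return 0.0
    return (nir - red) / total


def integrate_season(ndvi_values):
    """Hypothetical stand-in for time integration: sum NDVI over the season."""
    return sum(ndvi_values)


# Illustrative (made-up) NIR and red reflectances for three dates.
observations = [(0.30, 0.10), (0.50, 0.10), (0.45, 0.15)]
season_ndvi = [ndvi(nir, red) for nir, red in observations]
season_total = integrate_season(season_ndvi)
```

    A real workflow would apply the same per-pixel arithmetic to whole raster bands and would follow the integration scheme defined by the data producer.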

  12. Data mining in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Ruxandra-Ştefania PETRE

    2012-10-01

    This paper describes how data mining is used in cloud computing. Data mining is used for extracting potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day people are confronted with targeted advertising, and data mining techniques help businesses to become more efficient by reducing costs. Data mining techniques and applications are very much needed in the cloud computing paradigm. The implementation of data mining techniques through cloud computing will allow users to retrieve meaningful information from a virtually integrated data warehouse, which reduces the costs of infrastructure and storage.

  13. Data Mining and Analysis

    Science.gov (United States)

    Samms, Kevin O.

    2015-01-01

    The Data Mining project seeks to bring the capability of data visualization to NASA anomaly and problem reporting systems for the purpose of improving data trending, evaluations, and analyses. Currently NASA systems are tailored to meet the specific needs of its organizations. This tailoring has led to a variety of nomenclatures and levels of annotation for procedures, parts, and anomalies making difficult the realization of the common causes for anomalies. Making significant observations and realizing the connection between these causes without a common way to view large data sets is difficult to impossible. In the first phase of the Data Mining project a portal was created to present a common visualization of normalized sensitive data to customers with the appropriate security access. The tool of the visualization itself was also developed and fine-tuned. In the second phase of the project we took on the difficult task of searching and analyzing the target data set for common causes between anomalies. In the final part of the second phase we have learned more about how much of the analysis work will be the job of the Data Mining team, how to perform that work, and how that work may be used by different customers in different ways. In this paper I detail how our perspective has changed after gaining more insight into how the customers wish to interact with the output and how that has changed the product.

  14. Organizational Data Mining

    Science.gov (United States)

    Nemati, Hamid R.; Barko, Christopher D.

    Many organizations today possess substantial quantities of business information but have very little real business knowledge. A recent survey of 450 business executives reported that managerial intuition and instinct are more prevalent than hard facts in driving organizational decisions. To reverse this trend, businesses of all sizes would be well advised to adopt Organizational Data Mining (ODM). ODM is defined as leveraging Data Mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage. ODM has helped many organizations optimize internal resource allocations while better understanding and responding to the needs of their customers. The fundamental aspects of ODM can be categorized into Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT), with OT being the key distinction between ODM and Data Mining. In this chapter, we introduce ODM, explain its unique characteristics, and report on the current status of ODM research. Next we illustrate how several leading organizations have adopted ODM and are benefiting from it. Then we examine the evolution of ODM to the present day and conclude our chapter by contemplating ODM's challenging yet opportunistic future.

  15. Data mining and education.

    Science.gov (United States)

    Koedinger, Kenneth R; D'Mello, Sidney; McLaughlin, Elizabeth A; Pardos, Zachary A; Rosé, Carolyn P

    2015-01-01

    An emerging field of educational data mining (EDM) is building on and contributing to a wide variety of disciplines through analysis of data coming from various educational technologies. EDM researchers are addressing questions of cognition, metacognition, motivation, affect, language, social discourse, etc. using data from intelligent tutoring systems, massive open online courses, educational games and simulations, and discussion forums. The data include detailed action and timing logs of student interactions in user interfaces such as graded responses to questions or essays, steps in rich problem solving environments, games or simulations, discussion forum posts, or chat dialogs. They might also include external sensors such as eye tracking, facial expression, body movement, etc. We review how EDM has addressed the research questions that surround the psychology of learning with an emphasis on assessment, transfer of learning and model discovery, the role of affect, motivation and metacognition on learning, and analysis of language data and collaborative learning. For example, we discuss (1) how different statistical assessment methods were used in a data mining competition to improve prediction of student responses to intelligent tutor tasks, (2) how better cognitive models can be discovered from data and used to improve instruction, (3) how data-driven models of student affect can be used to focus discussion in a dialog-based tutoring system, and (4) how machine learning techniques applied to discussion data can be used to produce automated agents that support student learning as they collaborate in a chat room or a discussion board. © 2015 John Wiley & Sons, Ltd.

  16. Data Mining SIAM Presentation

    Science.gov (United States)

    Srivastava, Ashok; McIntosh, Dawn; Castle, Pat; Pontikakis, Manos; Diev, Vesselin; Zane-Ulman, Brett; Turkov, Eugene; Akella, Ram; Xu, Zuobing; Kumaresan, Sakthi Preethi

    2006-01-01

    This viewgraph document describes the data mining system developed at NASA Ames. Many NASA programs have large numbers (and types) of problem reports. These free text reports are written by a number of different people, thus the emphasis and wording vary considerably. With so much data to sift through, analysts (subject experts) need help identifying any possible safety issues or concerns and confirming that they haven't missed important problems. Unsupervised clustering is the initial step to accomplish this; we think we can go much farther, specifically, identify possible recurring anomalies. Recurring anomalies may be indicators of larger systemic problems. The requirement to identify these anomalies has led to the development of the Recurring Anomaly Discovery System (ReADS).

  17. Data mining for bioinformatics applications

    CERN Document Server

    Zengyou, He

    2015-01-01

    Data Mining for Bioinformatics Applications provides valuable information on the data mining methods that have been widely used for solving real bioinformatics problems, including problem definition, data collection, data preprocessing, modeling, and validation. The text uses an example-based method to illustrate how to apply data mining techniques to solve real bioinformatics problems, containing 45 bioinformatics problems that have been investigated in recent research. For each example, the entire data mining process is described, ranging from data preprocessing to modeling and result validation.

  18. Data Mining Aplications in Livestock

    Directory of Open Access Journals (Sweden)

    Feyza ALEV ÇETİN

    2016-03-01

    Data mining enables the discovery of required and applicable knowledge from very large amounts of information collected in one centre. Data mining has been used in the information industry and society. Although many data mining methods have been in use, these techniques have gained notable attention in animal husbandry in recent years. Many methods have been discussed and developed for solving complex problems in animal husbandry. Brief information on data mining techniques such as the k-means approach, the k-nearest neighbor approach, multivariate adaptive regression function (MARS), naive Bayesian classifiers (NBC), artificial neural networks (ANN), support vector machines (SVM), and decision trees is given in the study. Some data mining methods are presented, and examples of the application of data mining in animal husbandry around the world are provided.
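    As an illustration of one technique named in the abstract above, the k-nearest neighbor approach can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name, the two features (body weight and daily milk yield), and the labels are hypothetical and do not come from the study.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up records: (body weight in 100 kg, daily milk yield in 10 L) -> class.
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.9), "low"),
    ((5.0, 5.1), "high"),
    ((4.8, 5.3), "high"),
    ((5.2, 4.9), "high"),
]
prediction = knn_predict(train, (1.1, 1.0), k=3)
```

    In practice, features are usually scaled to comparable ranges first, because Euclidean distance is otherwise dominated by the feature with the largest numeric range.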

  19. Data mining applications in healthcare.

    Science.gov (United States)

    Koh, Hian Chye; Tan, Gerald

    2005-01-01

    Data mining has been used intensively and extensively by many organizations. In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Data mining applications can greatly benefit all parties involved in the healthcare industry. For example, data mining can help healthcare insurers detect fraud and abuse, healthcare organizations make customer relationship management decisions, physicians identify effective treatments and best practices, and patients receive better and more affordable healthcare services. The huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods. Data mining provides the methodology and technology to transform these mounds of data into useful information for decision making. This article explores data mining applications in healthcare. In particular, it discusses data mining and its applications within healthcare in major areas such as the evaluation of treatment effectiveness, management of healthcare, customer relationship management, and the detection of fraud and abuse. It also gives an illustrative example of a healthcare data mining application involving the identification of risk factors associated with the onset of diabetes. Finally, the article highlights the limitations of data mining and discusses some future directions.

  20. Finding Gold in Data Mining

    Science.gov (United States)

    Flaherty, Bill

    2013-01-01

    Data-mining systems provide a variety of opportunities for school district personnel to streamline operations and focus on student achievement. This article describes the value of data mining for school personnel, finance departments, teacher evaluations, and in the classroom. It suggests that much could be learned about district practices if one…

  1. Data Mining for Intrusion Detection

    Science.gov (United States)

    Singhal, Anoop; Jajodia, Sushil

    Data mining techniques have been successfully applied in many different fields including marketing, manufacturing, fraud detection and network management. Over the past years there has been a lot of interest in security technologies such as intrusion detection, cryptography, authentication and firewalls. This chapter discusses the application of data mining techniques to computer security. Conclusions are drawn and directions for future research are suggested.

  2. Real world data mining applications

    CERN Document Server

    Abou-Nasr, Mahmoud; Stahlbock, Robert; Weiss, Gary M

    2014-01-01

    Data mining applications range from commercial to social domains, with novel applications appearing swiftly; for example, within the context of social networks. The expanding application sphere and social reach of advanced data mining raise pertinent issues of privacy and security. Present-day data mining is a progressive multidisciplinary endeavor. This inter- and multidisciplinary approach is well reflected within the field of information systems. The information systems research addresses software and hardware requirements for supporting computationally and data-intensive applications. Furthermore, it encompasses analyzing system and data aspects, and all manual or automated activities. In that respect, research at the interface of information systems and data mining has significant potential to produce actionable knowledge vital for corporate decision-making. The aim of the proposed volume is to provide a balanced treatment of the latest advances and developments in data mining; in particular, exploring s...

  3. Data preprocessing in data mining

    CERN Document Server

    García, Salvador; Herrera, Francisco

    2015-01-01

    Data Preprocessing for Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data directly taken from the source will likely have inconsistencies, errors or most importantly, it is not ready to be considered for a data mining process. Furthermore, the increasing amount of data in recent science, industry and business applications, calls to the requirement of more complex tools to analyze it. Thanks to data preprocessing, it is possible to convert the impossible into possible, adapting the data to fulfill the input demands of each data mining algorithm. Data preprocessing includes the data reduction techniques, which aim at reducing the complexity of the data, detecting or removing irrelevant and noisy elements from the data. This book is intended to review the tasks that fill the gap between the data acquisition from the source and the data mining process. A comprehensive look from a practical point of view, including basic concepts and surveying t...

  4. Privacy Preserving Distributed Data Mining

    Data.gov (United States)

    National Aeronautics and Space Administration — Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring...

  5. The handbook of data mining

    CERN Document Server

    Ye, Nong

    2003-01-01

    This book is the first comprehensive one to feature systematic coverage of the concepts, techniques, examples, issues, software tools and future advancements of data mining. The demand for data mining applications is increasing in industry, government, and academia.

  6. Knowledge-sharing Behavior and Post-acquisition Integration Failure

    DEFF Research Database (Denmark)

    Gammelgaard, Jens; Husted, Kenneth; Michailova, Snejina

    2004-01-01

    Abstract: Not achieving the anticipated synergy effects in the post-acquisition integration context is a serious cause for the high acquisition failure rate. While existing studies on failures of acquisitions exist from economics, finance, strategy, organization theory, and human resources management......, this paper applies insights from the knowledge-sharing literature. The paper establishes a conceptual link between obstacles in the post-acquisition integration processes and individual knowledge-sharing behavior as related to knowledge transmitters and knowledge receivers. We argue that such an angle offers...... important insights to explaining the high failure rate in acquisitions. Descriptors: post-acquisition integration, acquisition failure, individual knowledge-sharing behavior...

  7. Learning data mining with R

    CERN Document Server

    Makhabel, Bater

    2015-01-01

    This book is intended for the budding data scientist or quantitative analyst with only a basic exposure to R and statistics. This book assumes familiarity with only the very basics of R, such as the main data types, simple functions, and how to move data around. No prior experience with data mining packages is necessary; however, you should have a basic understanding of data mining concepts and processes.

  8. Implications of Emerging Data Mining

    Science.gov (United States)

    Kulathuramaiyer, Narayanan; Maurer, Hermann

    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.

  9. Solar Data Mining at Georgia State University

    Science.gov (United States)

    Angryk, R.; Martens, P. C.; Schuh, M.; Aydin, B.; Kempton, D.; Banda, J.; Ma, R.; Naduvil-Vadukootu, S.; Akkineni, V.; Küçük, A.; Filali Boubrahimi, S.; Hamdi, S. M.

    2016-12-01

    In this talk we give an overview of research projects related to solar data analysis that are conducted at Georgia State University. We will provide an update on multiple advances made by our research team on the analysis of image parameters, spatio-temporal pattern mining, temporal data analysis, and our experiences with big, heterogeneous solar data visualization, analysis, processing and storage. We will talk about up-to-date data mining methodologies and their importance for big data-driven solar physics research.

  10. Data mining methods and applications

    CERN Document Server

    Lawrence, Kenneth D; Klimberg, Ronald K

    2007-01-01

    With today's information explosion, many organizations are now able to access a wealth of valuable data. Unfortunately, most of these organizations find they are ill-equipped to organize this information, let alone put it to work for them. Gain a Competitive Advantage Employ data mining in research and forecasting Build models with data management tools and methodology optimization Gain sophisticated breakdowns and complex analysis through multivariate, evolutionary, and neural net methods Learn how to classify data and maintain quality Transform Data into Business Acumen Data Mining Methods and

  11. PROGRAMS WITH DATA MINING CAPABILITIES

    Directory of Open Access Journals (Sweden)

    Ciobanu Dumitru

    2012-03-01

    The fact that the Internet has become a commodity in the world has created a framework for a new economy. Traditional businesses migrate to this new environment, which offers many features and options at relatively low prices. However, competition is fierce, and a successful Internet business is tied to rigorous use of all available information. The information is often hidden in data, and retrieving it requires software capable of applying data mining algorithms and techniques. In this paper we review some of the programs with data mining capabilities currently available in this area. We also propose some classifications of this software to assist those who wish to use such software.

  12. Mastering SQL Server 2014 data mining

    CERN Document Server

    Bassan, Amarpreet Singh

    2014-01-01

    If you are a developer who is working on data mining for large companies and would like to enhance your knowledge of SQL Server Data Mining Suite, this book is for you. Whether you are brand new to data mining or are a seasoned expert, you will be able to master the skills needed to build a data mining solution.

  13. Data Mining Tools in Science Education

    OpenAIRE

    Premysl Zaskodny

    2012-01-01

    The main principle of the paper is Data Mining in Science Education (DMSE) as Problem Solving. The main goal of the paper is the delimitation of the Complex Data Mining Tool and the Partial Data Mining Tool of DMSE. The paper proceeds through Data Preprocessing in Science Education, Data Processing in Science Education, Description of Curricular Process as Complex Data Mining Tool (CP-DMSE), Description of Analytical Synthetic Modeling as Partial Data Mining Tool (ASM-DMSE) and finally...

  14. Data mining concepts and techniques

    CERN Document Server

    Han, Jiawei

    2005-01-01

    Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generate data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge. Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and app...

  15. Data preprocessing for data mining

    OpenAIRE

    Ren, Yifei

    2013-01-01

    People have increasing amounts of data in the current prosperous information age. In order to improve competitive power and work efficiency, discovering knowledge from data is becoming more and more important. Data mining, as an emerging interdisciplinary applications field, plays a significant role in decision making across various trades and industries. However, it is known that original data is always dirty and not suitable for further analysis, which has become a major obstacle of finding knowled...

  16. data mining in distributed database

    International Nuclear Information System (INIS)

    Ghunaim, A.A.A.

    2007-01-01

    As we march into the age of digital information, the collection and storage of large quantities of data have increased, and the problem of data overload looms ominously ahead. It is estimated today that the volume of data stored by a company doubles every year, but the amount of meaningful information is decreasing rapidly. The ability to analyze and understand massive datasets lags far behind the ability to gather and store the data. The unbridled growth of data will inevitably lead to a situation in which it is increasingly difficult to access the desired information; it will always be like looking for a needle in a haystack, where only the amount of hay keeps growing. So a new generation of computational techniques and tools is required to analyze and understand the rapidly growing volumes of data. And because information technology (IT) has become a strategic weapon in modern life, new decision support tools are needed to be a powerful international competitor. Data mining is one of these tools, and its methods make it possible to extract the decisive knowledge needed by an enterprise; it is concerned with inferring models from data, and includes statistical pattern recognition, applied statistics, machine learning, and neural networks. Data mining is a tool for increasing the productivity of people trying to build predictive models. Data mining techniques have been applied successfully to several real-world problem domains, but their application in the nuclear reactor field has received only little attention. One of the main reasons is the difficulty in obtaining the data sets

  17. Data mining utilizando redes neuronales

    OpenAIRE

    Ale, Juan María; Bot, Romina Laura

    2004-01-01

    Neural networks are widely used for tasks related to pattern recognition and classification. Although they are very accurate classifiers, they are not commonly used for data mining because they produce learning models that cannot be explained. The TREPAN algorithm extracts explainable hypotheses from a trained neural network. The hypotheses produced by the algorithm are represented as a decision tree that approximates the network. The decision trees extracted by TREPAN cannot...

  18. DATA MINING IN SPORTS BETTING

    Directory of Open Access Journals (Sweden)

    Cristian Georgescu

    2013-12-01

    In this paper, we have made a brief analysis of how to make decisions in betting on European football with the help of data mining techniques. Both betting a few days in advance of the sporting event and live betting have been taken into consideration. By using a clustering algorithm to analyze both the database containing events from football matches and the odds given by bookmakers, we have obtained graphs indicating the probabilities associated with the analyzed events. Given the purely informative aspect of the current paper, we have only analyzed the number of corners from a match.

  19. Contrast data mining concepts, algorithms, and applications

    CERN Document Server

    Dong, Guozhu

    2012-01-01

    A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains. Learn from Real Case Studies

  20. Statistically significant relational data mining :

    Energy Technology Data Exchange (ETDEWEB)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

    2014-02-01

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  1. Industry Relatedness and Post-Acquisition Innovative Performance

    DEFF Research Database (Denmark)

    Cefis, Elena; Marsili, Orietta; Rigamonti, Damiana

    2015-01-01

    This paper examines how characteristics of acquiring and acquired firms influence the curvilinear (inverted U-shaped) relationship between relatedness and post-acquisition innovative performance. Using a relatedness index based on industry co-occurrence in a sample of 1,736 Dutch acquisitions, we...... find that acquirer's internal R&D and acquisition experience, and the small size of acquired firms, help to reach a balance between exploration of novelty and exploitation of synergies in unrelated acquisitions, and to achieve higher post-acquisition performance. However, while the acquirer's R......&D increases flexibility in the acquisition process in presence of deviations from the optimal level of relatedness, acquisition experience may enhance rigidities....

  2. Data mining: childhood injury control and beyond.

    Science.gov (United States)

    Tepas, Joseph J

    2009-08-01

    Data mining is defined as the automatic extraction of useful, often previously unknown information from large databases or data sets. It has become a major part of modern life and is extensively used in industry, banking, government, and health care delivery. The process requires a data collection system that integrates input from multiple sources containing critical elements that define outcomes of interest. Appropriately designed data mining processes identify and adjust for confounding variables. The statistical modeling used to manipulate accumulated data may involve any number of techniques. As predicted results are periodically analyzed against those observed, the model is consistently refined to optimize precision and accuracy. Whether applying integrated sources of clinical data to inferential probabilistic prediction of risk of ventilator-associated pneumonia or population surveillance for signs of bioterrorism, it is essential that modern health care providers have at least a rudimentary understanding of what the concept means, how it basically works, and what it means to current and future health care.

  3. Visual cues for data mining

    Science.gov (United States)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  4. Data Mining Solutions for the Business Environment

    Directory of Open Access Journals (Sweden)

    Ruxandra-Stefania PETRE

    2014-02-01

    Over the past years, data mining has become a matter of considerable importance due to the large amounts of data available in applications belonging to various domains. Data mining is a dynamic and fast-expanding field that applies advanced data analysis techniques from statistics, machine learning, database systems and artificial intelligence in order to discover relevant patterns, trends and relations contained within the data, information that is impossible to observe using other techniques. The paper focuses on presenting the applications of data mining in the business environment. It contains a general overview of data mining, providing a definition of the concept, enumerating six primary data mining techniques and mentioning the main fields to which data mining can be applied. The paper also presents the main business areas that can benefit from the use of data mining tools, along with their use cases: retail, banking and insurance. The main commercially available data mining tools and their key features are also presented. Besides the analysis of data mining and the business areas that can successfully apply it, the paper presents the main features of a data mining solution for the business environment and the architecture, with its main components, of a solution that would help improve customer experiences and decision-making.

  5. Post-acquisition Integration as Sensemaking: Glimpses of Ambiguity, Confusion, Hypocrisy, and Politicization

    OpenAIRE

    Vaara, Eero

    2003-01-01

    Though many studies have examined post-acquisition integration challenges, they have mainly focused on rationalistic explanations for the difficulties encountered in post-acquisition integration. There remains little knowledge of how the ‘irrational’ features of post-acquisition decision-making may impede organizational integration. This study attempts to bridge that gap by examining post-acquisition decision-making from a sensemaking perspective. The paper presents an in-depth analysis of a ...

  6. Applied data mining for business and industry

    CERN Document Server

    Giudici, Paolo

    2009-01-01

    The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies a...

  7. Applications of Data Mining in Higher Education

    OpenAIRE

    Monika Goyal; Rajan Vohra

    2012-01-01

    Data analysis plays an important role in decision support irrespective of the type of industry, whether a manufacturing unit or an education system. There are many domains in which data mining techniques play an important role. This paper proposes the use of data mining techniques to improve the efficiency of higher education institutions. If data mining techniques such as clustering, decision trees and association are applied to higher education processes, it would help to improve students performa...

  8. Data-Mining Research in Education

    OpenAIRE

    Cheng, Jiechao

    2017-01-01

    As an interdisciplinary field, data mining (DM) is popular in education, especially for examining students' learning performance. It focuses on analyzing education-related data to develop models for improving learners' learning experiences and enhancing institutional effectiveness. Therefore, DM helps education institutions provide high-quality education for their learners. Applying data mining in education is also known as educational data mining (EDM), which enables to better un...

  9. Data mining in pharma sector: benefits.

    Science.gov (United States)

    Ranjan, Jayanthi

    2009-01-01

    The amount of data generated in every sector at present is enormous, and the information flow in the pharma industry is huge. Pharma firms are moving toward increasingly technology-enabled products and services. Data mining, which is knowledge discovery from large sets of data, helps pharma firms discover patterns that improve the quality of drug discovery and delivery methods. The paper aims to present how data mining is useful in the pharma industry, how its techniques can yield good results in the pharma sector, and how data mining can enhance decision making using pharmaceutical data. This conceptual paper is based on secondary study, research and observations from magazines, reports and notes. The author lists the types of patterns that can be discovered using data mining in pharma data. The paper shows how data mining is useful in the pharma industry and how its techniques can yield good results in the pharma sector. Although much work could be done on discovering knowledge in pharma data using data mining, the paper is limited to conceptualizing the ideas and viewpoints at this stage; future work may include applying data mining techniques to pharma data based on primary research using the available, well-known data mining tools. Research papers and conceptual papers related to data mining in the pharma industry are rare; this is the motivation for the paper.

  10. Data Mining for Anomaly Detection

    Science.gov (United States)

    Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

    2013-01-01

    The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to the current state-of-the-art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations in which there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, detection and analysis are split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in the contexts of supervised and unsupervised learning. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.
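
    As a rough illustration of approximating Kolmogorov complexity with a compressor, the sketch below computes a normalized compression distance (NCD) between byte sequences. It is not the method of the report: zlib stands in for DZIP, NCD stands in for the CiDM measure, and the "flight segment" data and all names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical "flight segments": two regular signals and one irregular one.
nominal_a = bytes([10, 20, 30, 20] * 100)
nominal_b = bytes([10, 20, 30, 20] * 90 + [10, 20, 31, 20] * 10)
irregular = bytes((i * i * 7 + i * 13) % 251 for i in range(400))

print("nominal vs nominal:  ", round(ncd(nominal_a, nominal_b), 3))
print("nominal vs irregular:", round(ncd(nominal_a, irregular), 3))
```

    Segments that share structure compress well together and get a small distance, so a clustering step over such distances could, under these assumptions, group "nominal" segments without labels.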

  11. The Hazards of Data Mining in Healthcare.

    Science.gov (United States)

    Househ, Mowafa; Aldosari, Bakheet

    2017-01-01

    From the mid-1990s, data mining methods have been used to explore and find patterns and relationships in healthcare data. During the 1990s and early 2000s, data mining was a topic of great interest to healthcare researchers, as it showed some promise in the use of its predictive techniques to help model the healthcare system and improve the delivery of healthcare services. However, it was soon discovered that mining healthcare data had many challenges relating to the veracity of healthcare data and limitations around predictive modelling, leading to failures of data mining projects. As the Big Data movement has gained momentum over the past few years, there has been a reemergence of interest in the use of data mining techniques and methods to analyze healthcare-generated Big Data. Much has been written on the positive impacts of data mining on healthcare practice relating to issues of best practice, fraud detection, chronic disease management, and general healthcare decision making. Little has been written about the limitations and challenges of data mining use in healthcare. In this review paper, we explore some of the limitations and challenges in the use of data mining techniques in healthcare. Our results show that the limitations of data mining in healthcare include the reliability of medical data, data sharing between healthcare organizations, and inappropriate modelling leading to inaccurate predictions. We conclude that there are many pitfalls in the use of data mining in healthcare. More work is needed to show evidence of its utility in facilitating healthcare decision-making for healthcare providers, managers, and policy makers, and more evidence is needed on data mining's overall impact on healthcare services and patient care.

  12. Virtual Observatories, Data Mining, and Astroinformatics

    Science.gov (United States)

    Borne, Kirk

    The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best of breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential

  13. Collaborative Data Mining Tool for Education

    Science.gov (United States)

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; Gea, Miguel; de Castro, Carlos

    2009-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the continuous improvement of e-learning courses, allowing teachers with similar course profiles to share and score the discovered information. This mining tool is intended for instructors who are not experts in data mining such that, its…

  14. A survey of temporal data mining

    Indian Academy of Sciences (India)

    Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams ...

  15. Educational data mining and learning analytics

    OpenAIRE

    Vera Hernández, Joan Carles

    2017-01-01

    Work based on Educational Data Mining & Learning Analytics: an analysis of student enrollment and its impact on the decision to re-enroll.

  16. Data Mining Tools for Malware Detection

    CERN Document Server

    Masud, Mehedy; Thuraisingham, Bhavani; Andreasson, Kim J

    2011-01-01

    Although the use of data mining for security and malware detection is quickly on the rise, most books on the subject provide high-level theoretical discussions to the near exclusion of the practical aspects. Breaking the mold, Data Mining Tools for Malware Detection provides a step-by-step breakdown of how to develop data mining tools for malware detection. Integrating theory with practical techniques and experimental results, it focuses on malware detection applications for email worms, malicious code, remote exploits, and botnets. The authors describe the systems they have designed and devel

  17. Data Mining and Statistics for Decision Making

    CERN Document Server

    Tufféry, Stéphane

    2011-01-01

    Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized lin

  18. Cognition and Knowledge Sharing in Post-acquisition Integration

    DEFF Research Database (Denmark)

    Jaura, Manya; Michailova, Snejina

    2014-01-01

    conducted with ten respondents in four Indian IT companies that have acquired firms abroad. Findings: The authors find evidence for supporting the negative effect of in- and out-groups differentiation and the positive effect of interpersonal interaction on knowledge sharing among employees of the acquired...... of organisational objectives in a post-acquisition context. Managers should understand that the knowledge their employees possess is a strategic asset, and therefore how they use it is influential in attaining organisational goals in general, and acquisition integration objectives in particular. The creation...... of task- and project-related communities or groups can help in establishing a shared organisational identity, especially after the turbulent event of one company acquiring another one. The creation of communities or groups where socialisation is encouraged can lead to employees interacting with one...

  19. IT Data Mining Tool Uses in Aerospace

    Science.gov (United States)

    Monroe, Gilena A.; Freeman, Kenneth; Jones, Kevin L.

    2012-01-01

    Data mining has a broad spectrum of uses throughout the realms of aerospace and information technology. Each of these areas has useful methods for processing, distributing, and storing its corresponding data. This paper focuses on ways to leverage the data mining tools and resources used in NASA's information technology area to meet the similar data mining needs of aviation and aerospace domains. This paper details the searching, alerting, reporting, and application functionalities of the Splunk system, used by NASA's Security Operations Center (SOC), and their potential shared solutions to address aircraft and spacecraft flight and ground systems data mining requirements. This paper also touches on capacity and security requirements when addressing sizeable amounts of data across a large data infrastructure.

  20. A survey of temporal data mining

    Indian Academy of Sciences (India)

    other subtle relationships in the data using a combination of techniques from ... stamped list of items bought by customers lends itself to data mining analysis that ...... Frequent episode mining can be used here as part of an alarm management.

  1. DATA MINING THE GALAXY ZOO MERGERS

    Data.gov (United States)

    National Aeronautics and Space Administration — DATA MINING THE GALAXY ZOO MERGERS STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER Abstract. Collisions between pairs of galaxies usually end in the...

  2. Data Mining and Homeland Security: An Overview

    National Research Council Canada - National Science Library

    Seifert, Jeffrey W

    2008-01-01

    .... Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets...

  3. Data Mining and Homeland Security: An Overview

    National Research Council Canada - National Science Library

    Seifert, Jeffrey W

    2007-01-01

    .... Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets...

  4. Data Mining and Homeland Security: An Overview

    National Research Council Canada - National Science Library

    Seifert, Jeffrey W

    2006-01-01

    .... Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets...

  5. Open-source tools for data mining.

    Science.gov (United States)

    Zupan, Blaz; Demsar, Janez

    2008-03-01

    With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.

  6. Data mining and business analytics with R

    CERN Document Server

    Ledolter, Johannes

    2013-01-01

    Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets. As a result, readers are provided with the needed guidance to model and interpret complicated data and become adept at building powerful models for prediction and classification. Highlighting both underlying concepts and practical computational skills, Data Mining

  7. Challenges in computational statistics and data mining

    CERN Document Server

    Mielniczuk, Jan

    2016-01-01

    This volume contains nineteen research papers belonging to the areas of computational statistics, data mining, and their applications. Those papers, all written specifically for this volume, are their authors’ contributions to honour and celebrate Professor Jacek Koronacki on the occasion of his 70th birthday. The book’s related and often interconnected topics represent Jacek Koronacki’s research interests and their evolution. They also clearly indicate how close the areas of computational statistics and data mining are.

  8. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal

  9. Data Mining Solutions for the Business Environment

    OpenAIRE

    Ruxandra-Stefania PETRE

    2013-01-01

    Over the past years, data mining has become a matter of considerable importance due to the large amounts of data available in applications belonging to various domains. Data mining is a dynamic and fast-expanding field that applies advanced data analysis techniques from statistics, machine learning, database systems and artificial intelligence in order to discover relevant patterns, trends and relations contained within the data, information that is impossible to observe using other techniques. The p...

  10. Using Data Mining for Wine Quality Assessment

    Science.gov (United States)

    Cortez, Paulo; Teixeira, Juliana; Cerdeira, António; Almeida, Fernando; Matos, Telmo; Reis, José

    Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g. alcohol levels) and sensory (e.g. human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect the sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve the production.
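
    The sensitivity analysis described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the sensitivity score used here is the range of the response (other measures are possible), `toy_model` is a hypothetical stand-in for a fitted regressor, and the input names, baseline and ranges are invented for the example rather than taken from the paper.

```python
from typing import Callable, Sequence


def sensitivity(
    model: Callable[[Sequence[float]], float],
    baseline: Sequence[float],
    index: int,
    low: float,
    high: float,
    levels: int = 7,
) -> float:
    """Range of the model response when one input sweeps [low, high].

    All other inputs stay fixed at their baseline values.
    """
    responses = []
    for k in range(levels):
        point = list(baseline)
        point[index] = low + (high - low) * k / (levels - 1)
        responses.append(model(point))
    return max(responses) - min(responses)


# Hypothetical stand-in for a fitted wine-quality regressor;
# inputs are (alcohol, volatile_acidity, density_offset).
def toy_model(x: Sequence[float]) -> float:
    alcohol, acidity, _density = x
    return 3.0 + 0.4 * alcohol - 2.0 * acidity


baseline = [10.5, 0.3, 0.0]
ranges = [(8.0, 14.0), (0.1, 1.1), (-0.01, 0.01)]
scores = [
    sensitivity(toy_model, baseline, i, lo, hi)
    for i, (lo, hi) in enumerate(ranges)
]
print(scores)
```

    Inputs whose sweep barely moves the response get a score near zero, which is the kind of signal a variable selection step could use to drop them.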

  11. Data Mining and Machine Learning in Astronomy

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  12. Review of Data Mining Techniques for Churn Prediction in Telecom

    Directory of Open Access Journals (Sweden)

    Vishal Mahajan

    2015-12-01

    service. This data can be usefully mined for churn analysis and prediction. Significant research has been undertaken worldwide to understand the data mining practices that can be used for predicting customer churn. This paper provides a review of around 100 recent journal articles, starting from the year 2000, to present the various data mining techniques used in multiple customer-based churn models. It then summarizes the existing telecom literature by highlighting the sample size used, the churn variables employed and the findings of different DM techniques. Finally, we list the most popular techniques for churn prediction in telecom as decision trees, regression analysis and clustering, thereby providing a roadmap for new researchers to build novel churn management models.

  13. Data Mining Techniques for Customer Relationship Management

    Science.gov (United States)

    Guo, Feng; Qin, Huilin

    2017-10-01

    Data mining has made customer relationship management (CRM) a new area where firms can gain a competitive advantage, and it plays a key role in firms’ management decisions. In this paper, we first analyze the value and application fields of data mining techniques for CRM, and further explore how data mining is applied to customer churn analysis. A new business culture is developing today. The conventional production-centered and sales-oriented market strategy is gradually shifting to a customer-centered and service-oriented one. Customers’ value orientation is increasingly affecting that of firms, and customer resources have become one of the most important strategic resources. Therefore, understanding customers’ needs and identifying the customers who contribute the most has become the driving force of most modern business.

  14. On-board Data Mining

    Science.gov (United States)

    Tanner, Steve; Stein, Cara; Graves, Sara J.

    Networks of remote sensors are becoming more common as technology improves and costs decline. In the past, a remote sensor was usually a device that collected data to be retrieved at a later time by some other mechanism. These collected data were usually processed well after the fact at a computer greatly removed from the in situ sensing location. This has begun to change as sensor technology, on-board processing, and network communication capabilities have increased and their prices have dropped. There has been an explosion in the number of sensors and sensing devices, not just around the world, but literally throughout the solar system. These sensors are not only becoming vastly more sophisticated, accurate, and detailed in the data they gather but they are also becoming cheaper, lighter, and smaller. At the same time, engineers have developed improved methods to embed computing systems, memory, storage, and communication capabilities into the platforms that host these sensors. Now, it is not unusual to see large networks of sensors working in cooperation with one another. Nor does it seem strange to see the autonomous operation of sensor-based systems, from space-based satellites to smart vacuum cleaners that keep our homes clean and robotic toys that help to entertain and educate our children. But access to sensor data and computing power is only part of the story. For all the power of these systems, there are still substantial limits to what they can accomplish. These include the well-known limits to current Artificial Intelligence capabilities and our limited ability to program the abstract concepts, goals, and improvisation needed for fully autonomous systems. But they also include much more basic engineering problems such as lack of adequate power, communications bandwidth, and memory, as well as problems with the geolocation and real-time georeferencing required to integrate data from multiple sensors to be used together.

  15. The Top Ten Algorithms in Data Mining

    CERN Document Server

    Wu, Xindong

    2009-01-01

    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc

  16. Supporting Solar Physics Research via Data Mining

    Science.gov (United States)

    Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.

    2012-05-01

    In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.

  17. Data mining for social network data

    CERN Document Server

    Memon, Nasrullah; Hicks, David L; Chen, Hsinchun

    2010-01-01

    Driven by counter-terrorism efforts, marketing analysis and an explosion in online social networking in recent years, data mining has moved to the forefront of information science. This proposed Special Issue on "Data Mining for Social Network Data" will present a broad range of recent studies in social networking analysis. It will focus on emerging trends and needs in discovery and analysis of communities, solitary and social activities, and activities in open fora, and commercial sites as well. It will also look at network modeling, infrastructure construction, dynamic growth and evolution

  18. Quantification of Operational Risk Using A Data Mining

    Science.gov (United States)

    Perera, J. Sebastian

    1999-01-01

    What is Data Mining?
    - Data Mining is the process of finding actionable information hidden in raw data.
    - Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data.
    - Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process.

  19. Tools for Educational Data Mining: A Review

    Science.gov (United States)

    Slater, Stefan; Joksimovic, Srecko; Kovanovic, Vitomir; Baker, Ryan S.; Gasevic, Dragan

    2017-01-01

    In recent years, a wide array of tools have emerged for the purposes of conducting educational data mining (EDM) and/or learning analytics (LA) research. In this article, we hope to highlight some of the most widely used, most accessible, and most powerful tools available for the researcher interested in conducting EDM/LA research. We will…

  20. Educational Data Mining Acceptance among Undergraduate Students

    Science.gov (United States)

    Wook, Muslihah; Yusof, Zawiyah M.; Nazri, Mohd Zakree Ahmad

    2017-01-01

    The acceptance of Educational Data Mining (EDM) technology is on the rise due to its ability to extract new knowledge from large amounts of students' data. This knowledge is important for educational stakeholders, such as policy makers, educators, and students themselves, to enhance efficiency and achievements. However, previous studies on EDM…

  1. Data Mining Gets Traction in Education

    Science.gov (United States)

    Sparks, Sarah D.

    2011-01-01

    The new and rapidly growing field of educational data mining is using the chaff from data collected through normal school activities to explore learning in more detail than ever, and researchers say the day when educators can make use of Amazon.com-like feedback on student learning behaviors may be closer than most people think. Educational data…

  2. Highly Robust Methods in Data Mining

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2013-01-01

    Vol. 8, No. 1 (2013), pp. 9-24. ISSN 1452-4864. Institutional support: RVO:67985807. Keywords: data mining * robust statistics * high-dimensional data * cluster analysis * logistic regression * neural networks. Subject RIV: BB - Applied Statistics, Operational Research

  3. Engaging Business Students with Data Mining

    Science.gov (United States)

    Brandon, Dan

    2016-01-01

    The Economist calls it "a golden vein", and many business experts now say it is the new science of winning. Business and technologists have many names for this new science; "business intelligence" (BI), "data analytics," and "data mining" are among the most common. The job market for people skilled in this…

  4. Traffic Flow Management: Data Mining Update

    Science.gov (United States)

    Grabbe, Shon R.

    2012-01-01

    This presentation provides an update on recent data mining efforts that have been designed to (1) identify like/similar days in the national airspace system, (2) cluster/aggregate national-level rerouting data and (3) apply machine learning techniques to predict when Ground Delay Programs are required at a weather-impacted airport

  5. Comparative genomics using data mining tools

    Indian Academy of Sciences (India)

    We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and ...

  6. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.

    2008-01-01

    We present a system towards the integration of data mining into relational databases. To this end, a relational database model is proposed, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules and decision

  7. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.; Nijssen, S.; De Raedt, L.

    2007-01-01

    We propose a relational database model towards the integration of data mining into relational database systems, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules, decision trees and clusterings, can be

  8. Integrating Data Mining Techniques into Telemedicine Systems

    Directory of Open Access Journals (Sweden)

    Mihaela GHEORGHE

    2014-01-01

    The medical system is facing a wide range of challenges nowadays due to changes that are taking place in global healthcare systems. These challenges stem mostly from economic constraints (spiraling costs, financial issues), but also from the increased emphasis on accountability and transparency, changes made in the education field, the growing complexity of biomedical research studies, etc. The new partnerships formed in medical care systems and the great advances in the IT industry also suggest that a predominant paradigm shift is occurring. This shift requires a focus on interaction, collaboration and increased sharing of information and knowledge, all of which may in turn be leading healthcare organizations to embrace the techniques of data mining in order to create and sustain optimal healthcare outcomes. Data mining is a domain of great importance nowadays, as it provides advanced data analysis techniques for extracting knowledge from the huge volumes of data collected and stored by every system on a daily basis. In healthcare organizations, data mining can provide valuable information for patient diagnosis and treatment planning, customer relationship management, organization resources management or fraud detection. In this article we focus on describing the importance of data mining techniques and systems for healthcare organizations, with a focus on developing and implementing telemedicine solutions in order to improve the healthcare services provided to patients. We provide an architecture for integrating data mining techniques into telemedicine systems and also offer an overview on understanding and improving the implemented solution by using Business Process Management methods.

  9. Open data mining for Taiwan's dengue epidemic.

    Science.gov (United States)

    Wu, ChienHsing; Kao, Shu-Chen; Shih, Chia-Hung; Kan, Meng-Hsuan

    2018-07-01

    By using a quantitative approach, this study examines the applicability of data mining technique to discover knowledge from open data related to Taiwan's dengue epidemic. We compare results when Google trend data are included or excluded. Data sources are government open data, climate data, and Google trend data. Research findings from analysis of 70,914 cases are obtained. Location and time (month) in open data show the highest classification power followed by climate variables (temperature and humidity), whereas gender and age show the lowest values. Both prediction accuracy and simplicity decrease when Google trends are considered (respectively 0.94 and 0.37, compared to 0.96 and 0.46). The article demonstrates the value of open data mining in the context of public health care. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Data Mining Smart Energy Time Series

    Directory of Open Access Journals (Sweden)

    Janina POPEANGA

    2015-07-01

    With the advent of smart metering technology, the amount of energy data will increase significantly and the utilities industry will have to face another big challenge: to find relationships within time-series data and, even more, to analyze such huge numbers of time series to find useful patterns and trends with fast or even real-time response. This study presents a short review of the literature in the field, aiming to demonstrate how essential the application of data mining techniques to time series is for making the best use of this large quantity of data, despite all the difficulties. The most important time series data mining techniques are also presented, highlighting their applicability in the energy domain.

  11. Data Mining Supercomputing with SAS JMP® Genomics

    Directory of Open Access Journals (Sweden)

    Richard S. Segall

    2011-02-01

    JMP® Genomics is statistical discovery software that can uncover meaningful patterns in high-throughput genomics and proteomics data. JMP® Genomics is designed for biologists, biostatisticians, statistical geneticists, and those engaged in analyzing the vast stores of data that are common in genomic research (SAS, 2009). Data mining was performed using JMP® Genomics on the two collections of microarray databases available from the National Center for Biotechnology Information (NCBI) for lung cancer and breast cancer. The Gene Expression Omnibus (GEO) of NCBI serves as a public repository for a wide range of high-throughput experimental data, including the two collections of lung cancer and breast cancer data that were used for this research. The results of applying data mining using JMP® Genomics software are shown in this paper with numerous screenshots.

  12. Data Mining Thesis Topics in Finland

    OpenAIRE

    Bajo Rouvinen, Ari

    2017-01-01

    The Theseus open repository contains metadata about more than 100,000 thesis publications from the different universities of applied sciences in Finland. Different data mining techniques were applied to the Theseus dataset to build a web application to explore thesis topics and degree programmes using different libraries in Python and JavaScript. Thesis topics were extracted from manually annotated keywords by the authors and curated subjects by the librarians. During the project, the quality...

  13. Data mining teaching throughout cards game competition

    OpenAIRE

    Antoñanzas-Torres, Javier; Urraca, Ruben; Sodupe-Ortega, Enrique; Martínez-de-Pison, Francisco; Pernía-Espinoza, Alpha

    2015-01-01

    Learning data-mining techniques and statistical metrics can be complicated because of the complexity and overwhelming nature of this field. This paper proposes a class competition to improve learning of the design of Decision Support Systems (DSS) by playing a classic card game named "Copo". The fact that this game is based on a probabilistic problem and that different solutions can be obtained makes it representative of a very typical kind of problem in the field of engineering and compu...

  14. A Data Mining Approach to Intelligence Operations

    DEFF Research Database (Denmark)

    Memon, Nasrullah; Hicks, David; Harkiolakis, Nicholas

    2008-01-01

    agencies. An emphasis in the paper is placed on Social Network Analysis and Investigative Data Mining, and the use of these technologies in the counterterrorism domain. Tools and techniques from both areas are described, along with the important tasks for which they can be used to assist...... with the investigation and analysis of terrorist organizations. The process of collecting data about these organizations is also considered along with the inherent difficulties that are involved....

  15. Data Mining in Institutional Economics Tasks

    Science.gov (United States)

    Kirilyuk, Igor; Kuznetsova, Anna; Senko, Oleg

    2018-02-01

    The paper discusses problems associated with the use of data mining tools to study discrepancies between countries with different types of institutional matrices by a variety of potential explanatory variables: climate, economic or infrastructure indicators. An approach is presented which is based on the search for statistically valid regularities describing the dependence of the institutional type on a single variable or a pair of variables. Examples of regularities are given.

  16. Educational data mining applications and trends

    CERN Document Server

    2014-01-01

    This book is devoted to the Educational Data Mining arena. It highlights works that show relevant proposals, developments, and achievements that shape trends and inspire future research. After a rigorous revision process, sixteen manuscripts were accepted and organized into four parts as follows:
    · Profile: The first part embraces three chapters oriented to: 1) describe the nature of educational data mining (EDM); 2) describe how to pre-process raw data to facilitate data mining (DM); 3) explain how EDM supports government policies to enhance education.
    · Student modeling: The second part contains five chapters concerned with: 4) explore the factors having an impact on students' academic success; 5) detect students' personality and behaviors in an educational game; 6) predict students' performance to adjust content and strategies; 7) identify students who will most benefit from tutor support; 8) hypothesize the student answer correctness based on eye metrics and mouse click.
    · As...

  17. Spatiotemporal Data Mining: A Computational Perspective

    Directory of Open Access Journals (Sweden)

    Shashi Shekhar

    2015-10-01

    Explosive growth in geospatial and temporal data as well as the emergence of new technologies emphasize the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful patterns from large spatiotemporal databases. It has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. The complexity of spatiotemporal data and intrinsic relationships limits the usefulness of conventional data science techniques for extracting spatiotemporal patterns. In this survey, we review recent computational techniques and tools in spatiotemporal data mining, focusing on several major pattern families: spatiotemporal outlier, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change detection. Compared with other surveys in the literature, this paper emphasizes the statistical foundations of spatiotemporal data mining and provides comprehensive coverage of computational approaches for various pattern families. We also list popular software tools for spatiotemporal data analysis. The survey concludes with a look at future research needs.

  18. A Data Mining Approach for Cardiovascular Diagnosis

    Directory of Open Access Journals (Sweden)

    Pereira Joana

    2017-12-01

    The large amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analysed by traditional methods. Data mining can improve decision-making by discovering patterns and trends in large amounts of complex data. In the healthcare industry specifically, data mining can be used to decrease costs by increasing efficiency, improve patient quality of life, and perhaps most importantly, save the lives of more patients. The main goal of this project is to apply data mining techniques to predict the degree of disability that patients will present when they are discharged from hospital. The clinical data that compose the data set were obtained from a single hospital and contain information about patients who were hospitalized in the Cardio Vascular Disease (CVD) unit in 2016 after suffering a cardiovascular accident. The project will use the Waikato Environment for Knowledge Analysis (WEKA) machine learning workbench, since it allows users to quickly try out and compare different machine learning methods on new data sets.

  19. Application of data mining in performance measures

    Science.gov (United States)

    Chan, Michael F. S.; Chung, Walter W.; Wong, Tai Sun

    2001-10-01

    This paper proposes a structured framework for applying data mining to performance measures. The framework considers how knowledge workers interact with performance information at the enterprise level to make informed decisions in managing the effectiveness of operations. A case study of applying data mining technology to performance data in an airline company illustrates the use of the framework. Performance measures are applied specifically to assist the aircraft delay management process. The increasingly dispersed and complex nature of airline operations puts much strain on knowledge workers in searching for, acquiring and analyzing information to manage performance. One major problem faced by knowledge workers is identifying the root causes of performance deficiencies. Analyzing the large number of factors involved in the root causes can be time consuming, and the objective of applying data mining technology is to reduce the time and resources needed for this process. Increasing market competition for better performance management in various industries gives rise to the need for intelligent use of data. Because of this, the framework proposed here is readily generalizable to industries such as manufacturing. It could assist knowledge workers who are constantly looking for ways to improve operational effectiveness through new initiatives that must be carried out quickly to gain competitive advantage in the marketplace.

  20. Time Dependent Data Mining in RAVEN

    Energy Technology Data Exchange (ETDEWEB)

    Cogliati, Joshua Joseph [Idaho National Lab. (INL), Idaho Falls, ID (United States); Chen, Jun [Idaho National Lab. (INL), Idaho Falls, ID (United States); Patel, Japan Ketan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Mandelli, Diego [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Talbot, Paul William [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2016-09-01

    RAVEN is a generic software framework to perform parametric and probabilistic analysis based on the response of complex system codes. The goal of this type of analysis is to understand the response of such systems, in particular with respect to their probabilistic behavior, and to understand their predictability and drivers, or lack thereof. Data mining capabilities are the cornerstones for performing such deep learning of system responses. For this reason, static data mining capabilities were added last fiscal year (FY 15). In real applications, when dealing with complex multi-scale, multi-physics systems, it seems natural that, during transients, the relevance of the different scales and physics would evolve over time. For these reasons, the data mining capabilities have been extended to allow their application over time. This report describes the new RAVEN capabilities and presents several simple analytical tests that explain their application and demonstrate their proper implementation. The report concludes with the application of those newly implemented capabilities to the analysis of a simulation performed with the Bison code.

  1. Privacy-Preserving Data Mining of Medical Data Using Data Separation-Based Techniques

    Directory of Open Access Journals (Sweden)

    Gang Kou

    2007-08-01

    Data mining is concerned with the extraction of useful knowledge from various types of data. Medical data mining has been a popular data mining topic of late. Compared with other data mining areas, medical data mining has some unique characteristics. Because medical files are related to human subjects, privacy concerns are taken more seriously than in other data mining tasks. This paper applies data separation-based techniques to preserve privacy in the classification of medical data. We take two approaches to protect privacy: one approach is to vertically partition the medical data and mine these partitioned data at multiple sites; the other approach is to horizontally split data across multiple sites. In the vertical partition approach, each site uses a portion of the attributes to compute its results, and the distributed results are assembled at a central trusted party using a majority-vote ensemble method. In the horizontal partition approach, data are distributed among several sites. Each site computes results on its own data, and a central trusted party is responsible for integrating these results. We implement these two approaches using medical datasets from the UCI KDD archive and report the experimental results.
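
    The vertical partition approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the nearest-centroid site model, the toy records, and the names `train_site`, `site_predict` and `majority_vote` are all hypothetical, and class labels are assumed to be available to every site. For brevity each site receives full rows but reads only its own columns; in a real deployment each site would hold only those columns.

```python
from collections import Counter

def train_site(rows, labels, columns):
    """Fit a nearest-centroid model that reads only this site's attribute columns."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(columns))
        for i, c in enumerate(columns):
            acc[i] += row[c]
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return columns, centroids

def site_predict(model, row):
    """Return the label whose centroid is closest on this site's columns."""
    columns, centroids = model

    def dist(centroid):
        return sum((row[c] - m) ** 2 for c, m in zip(columns, centroid))

    return min(centroids, key=lambda lab: dist(centroids[lab]))

def majority_vote(models, row):
    """Central trusted party: combine the per-site predictions by majority vote."""
    votes = Counter(site_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical records with three attributes, one attribute per site.
rows = [(1.0, 10.0, 0.2), (1.2, 11.0, 0.1), (3.0, 20.0, 0.9), (3.2, 21.0, 0.8)]
labels = ["low", "low", "high", "high"]
models = [train_site(rows, labels, cols) for cols in ([0], [1], [2])]

patient = (3.1, 19.0, 0.15)
prediction = majority_vote(models, patient)  # two of the three sites vote "high"
```

    Only the per-site votes reach the central party in this sketch, which is the property the vertical scheme relies on; an odd number of sites avoids ties for two classes.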

  2. Uncertainty modeling for data mining a label semantics approach

    CERN Document Server

    Qin, Zengchang

    2014-01-01

    Outlining a new research direction in fuzzy set theory applied to data mining, this volume proposes a number of new data mining algorithms and includes dozens of figures and illustrations that help the reader grasp the complexities of the concepts.

  3. Data Mining Based on Cloud-Computing Technology

    Directory of Open Access Journals (Sweden)

    Ren Ying

    2016-01-01

    There are performance bottlenecks and scalability problems when a traditional data-mining system is used in cloud computing. In this paper, we present a data-mining platform based on cloud computing. Compared with a traditional data mining system, this platform is highly scalable, has massive data processing capacities, is service-oriented, and has low hardware cost. This platform can support the design and applications of a wide range of distributed data-mining systems.

  4. A Survey of Educational Data-Mining Research

    Science.gov (United States)

    Huebner, Richard A.

    2013-01-01

    Educational data mining (EDM) is an emerging discipline that focuses on applying data mining tools and techniques to educationally related data. The discipline focuses on analyzing educational data to develop models for improving learning experiences and improving institutional effectiveness. A literature review on educational data mining topics…

  5. Using Data Mining to Teach Applied Statistics and Correlation

    Science.gov (United States)

    Hartnett, Jessica L.

    2016-01-01

    This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…

  6. 76 FR 14637 - State Medicaid Fraud Control Units; Data Mining

    Science.gov (United States)

    2011-03-17

    ...] State Medicaid Fraud Control Units; Data Mining AGENCY: Office of Inspector General (OIG), HHS. ACTION... and analyzing State Medicaid claims data, known as data mining. To support and modernize MFCU efforts... (FFP) in the costs of defined data mining activities under specified conditions. In addition, we...

  7. On data mining in context : cases, fusion and evaluation

    NARCIS (Netherlands)

    Putten, Petrus Wilhelmus Henricus van der

    2010-01-01

    Data mining can be seen as a process, with modeling as the core step. However, other steps such as planning, data preparation, evaluation and deployment are of key importance for applications. This thesis studies data mining in the context of these other steps with the goal of improving data mining

  8. Data Mining and Knowledge Management in Higher Education -Potential Applications.

    Science.gov (United States)

    Luan, Jing

    This paper introduces a new decision support tool, data mining, in the context of knowledge management. The most striking features of data mining techniques are clustering and prediction. The clustering aspect of data mining offers comprehensive characteristics analysis of students, while the predicting function estimates the likelihood for a…

  9. Data mining, mining data : energy consumption modelling

    Energy Technology Data Exchange (ETDEWEB)

    Dessureault, S. [Arizona Univ., Tucson, AZ (United States)

    2007-09-15

    Most modern mining operations are accumulating large amounts of data on production and business processes. Data, however, provides value only if it can be translated into information that appropriate users can utilize. This paper emphasized that a new technological focus should emerge, notably how to concentrate data into information; analyze information sufficiently to become knowledge; and, act on that knowledge. Researchers at the Mining Information Systems and Operations Management (MISOM) laboratory at the University of Arizona have created a method to transform data into action. The data-to-action approach was exercised in the development of an energy consumption model (ECM), in partnership with a major US-based copper mining company, 2 software companies, and the MISOM laboratory. The approach begins by integrating several key data sources using data warehousing techniques, and increasing the existing level of integration and data cleaning. An online analytical processing (OLAP) cube was also created to investigate the data and identify a subset of several million records. Data mining algorithms were applied using the information that was isolated by the OLAP cube. The data mining results showed that traditional cost drivers of energy consumption are poor predictors. A comparison was made between traditional methods of predicting energy consumption and the prediction formed using data mining. Traditionally, in the mines for which data were available, monthly averages of tons and distance are used to predict diesel fuel consumption. However, this article showed that new information technology can be used to incorporate many more variables into the budgeting process, resulting in more accurate predictions. The ECM helped mine planners improve the prediction of energy use through more data integration, measure development, and workflow analysis. 5 refs., 11 figs.

  10. Academic Performance: An Approach From Data Mining

    Directory of Open Access Journals (Sweden)

    David L. La Red Martinez

    2012-02-01

    The relatively low percentage of students promoted and regularized in the Operating Systems course of the LSI (Bachelor's Degree in Information Systems) of FaCENA (Faculty of Exact and Natural Sciences and Surveying - Facultad de Ciencias Exactas, Naturales y Agrimensura) of UNNE (academic success) prompted this work, whose objective is to determine the variables that affect academic performance, considering the final status of the student according to Res. 185/03 CD (scheme for evaluation and promotion): promoted, regular or free. The variables considered are: status of the student, educational level of parents, secondary education, socio-economic level, and others. Data warehouse (DW) and data mining (DM) techniques were used to search for student profiles and to identify potential situations of academic success or failure. Classifications were made using clustering techniques according to different criteria. Some of the criteria were: classification mining according to academic program, according to the final status of the student, and according to the importance given to study; demographic clustering mining; and Kohonen clustering according to the final status of the student. The analyses produced partition statistics, partition details, cluster details, field details and field frequencies, the overall quality of each process and detailed quality measures (precision, classification, reliability), confusion matrices, gain/lift charts, trees, node distributions, field importance, field correspondence tables and cluster statistics. Once profiles of students with low academic performance have been determined, actions can be taken to avoid potential academic failures. This work aims to provide a brief description of aspects related to the data warehouse built and some of the data mining processes developed on it.

  11. Stratified sampling design based on data mining.

    Science.gov (United States)

    Kim, Yeonkook J; Oh, Yoonhwan; Park, Sunghoon; Cho, Sungzoon; Park, Hayoung

    2013-09-01

    To explore classification rules based on data mining methodologies which are to be used in defining strata in stratified sampling of healthcare providers with improved sampling efficiency. We performed k-means clustering to group providers with similar characteristics, then, constructed decision trees on cluster labels to generate stratification rules. We assessed the variance explained by the stratification proposed in this study and by conventional stratification to evaluate the performance of the sampling design. We constructed a study database from health insurance claims data and providers' profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and population data from Statistics Korea. From our database, we used the data for single specialty clinics or hospitals in two specialties, general surgery and ophthalmology, for the year 2011 in this study. Data mining resulted in five strata in general surgery with two stratification variables, the number of inpatients per specialist and population density of provider location, and five strata in ophthalmology with two stratification variables, the number of inpatients per specialist and number of beds. The percentages of variance in annual changes in the productivity of specialists explained by the stratification in general surgery and ophthalmology were 22% and 8%, respectively, whereas conventional stratification by the type of provider location and number of beds explained 2% and 0.2% of variance, respectively. This study demonstrated that data mining methods can be used in designing efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea.
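
    The two-stage idea in this abstract (cluster the providers, then learn rules that reproduce the cluster labels) can be sketched on toy data. Everything below is an assumption made for illustration: the provider values, the choice of two clusters, the unscaled features, and the depth-1 split standing in for a full decision tree. The study itself reports five strata per specialty.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: returns one cluster index per point."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def best_split(points, labels):
    """Depth-1 tree: the (errors, feature, threshold) split with the fewest label errors."""
    best = None
    for f in range(len(points[0])):
        values = sorted({pt[f] for pt in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for pt, lab in zip(points, labels) if pt[f] <= t]
            right = [lab for pt, lab in zip(points, labels) if pt[f] > t]
            errors = sum(len(side) - max(side.count(lab) for lab in set(side))
                         for side in (left, right))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best

# Hypothetical providers: (inpatients per specialist, number of beds).
providers = [(5, 10), (7, 30), (6, 20), (40, 15), (45, 25), (42, 12)]
cluster_labels = kmeans(providers, centers=[providers[0], providers[3]])
rule = best_split(providers, cluster_labels)  # (errors, feature index, threshold)
```

    The resulting rule reads as "feature 0 at or below the threshold goes to one stratum, above it to the other", which is the kind of readable stratification rule the tree stage is meant to produce.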

  12. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  13. Patent data mining method and apparatus

    Science.gov (United States)

    Boyack, Kevin W.; Grafe, V. Gerald; Johnson, David K.; Wylie, Brian N.

    2002-01-01

    A method of data mining represents related patents in a multidimensional space. Distance between patents in the multidimensional space corresponds to the extent of relationship between the patents. The relationship between pairings of patents can be expressed based on weighted combinations of several predicates. The user can select portions of the space to perceive. The user also can interact with and control the communication of the space, focusing attention on aspects of the space of most interest. The multidimensional spatial representation allows more ready comprehension of the structure of the relationships among the patents.
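
    The phrase "weighted combinations of several predicates" can be made concrete with a small sketch. The abstract names neither the predicates nor the weights, so the three predicates below (shared class, citation link, term overlap), the weights, the patent records and the `relationship` and `distance` helpers are all hypothetical.

```python
def relationship(a, b, weights):
    """Weighted combination of predicate scores, each in [0, 1], for a pair of patents."""
    union = a["terms"] | b["terms"]
    predicates = {
        "shared_class": 1.0 if a["cls"] == b["cls"] else 0.0,
        "cites": 1.0 if b["id"] in a["cites"] or a["id"] in b["cites"] else 0.0,
        "term_overlap": len(a["terms"] & b["terms"]) / len(union) if union else 0.0,
    }
    return sum(weights[name] * score for name, score in predicates.items())

def distance(a, b, weights):
    """Closer patents are more strongly related; assumes the weights sum to 1."""
    return 1.0 - relationship(a, b, weights)

weights = {"shared_class": 0.2, "cites": 0.5, "term_overlap": 0.3}
p1 = {"id": "A", "cls": "G06", "cites": {"B"}, "terms": {"data", "mining", "patent"}}
p2 = {"id": "B", "cls": "G06", "cites": set(), "terms": {"data", "mining", "graph"}}
p3 = {"id": "C", "cls": "H04", "cites": set(), "terms": {"antenna"}}

near = distance(p1, p2, weights)  # same class, citation link, half the terms shared
far = distance(p1, p3, weights)   # nothing in common
```

    A layout step would then place the patents so that these pairwise distances are approximately preserved; that step is outside this sketch.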

  14. Temporal data mining for hospital management

    Science.gov (United States)

    Tsumoto, Shusaku; Hirano, Shoji

    2009-04-01

    About twenty years have passed since clinical information began to be stored electronically in hospital information systems in the 1980s. Stored data range from accounting information to laboratory data, and even patient records are now being accumulated: in other words, a hospital cannot function without the information system, where almost all the pieces of medical information are stored as multimedia databases. In this paper, we applied temporal data mining and exploratory data analysis techniques to hospital management data. The results show several interesting findings, which suggest that the reuse of stored data will provide a powerful tool for hospital management.

  15. Analyzing Log Files using Data-Mining

    Directory of Open Access Journals (Sweden)

    Marius Mihut

    2008-01-01

    Information systems (i.e. servers, applications and communication devices) create a large amount of monitoring data that are saved as log files. For analyzing them, a data-mining approach is helpful. This article presents the steps which are necessary for creating an ‘analyzing instrument’, based on an open source software called Waikato Environment for Knowledge Analysis (Weka) [1]. For exemplification, a system log file created by a Windows-based operating system is used as the input file.

  16. Data Mining Methods for Recommender Systems

    Science.gov (United States)

    Amatriain, Xavier; Jaimes, Alejandro; Oliver, Nuria; Pujol, Josep M.

    In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.

  17. Multimedia data mining and analytics disruptive innovation

    CERN Document Server

    Baughman, Aaron; Pan, Jia-Yu; Petrushin, Valery A

    2015-01-01

    This authoritative text/reference provides fresh insights into the cutting edge of multimedia data mining, reflecting how the research focus has shifted towards networked social communities, mobile devices and sensors. Presenting a detailed exploration into the progression of the field, the book describes how the history of multimedia data processing can be viewed as a sequence of disruptive innovations. Across the chapters, the discussion covers the practical frameworks, libraries, and open source software that enable the development of ground-breaking research into practical applications.

  18. Data mining in time series databases

    CERN Document Server

    Kandel, Abraham; Bunke, Horst

    2004-01-01

    Adding the time dimension to real-world databases produces Time Series Databases (TSDB) and introduces new aspects and difficulties to data mining and knowledge discovery. This book covers the state-of-the-art methodology for mining time series databases. The novel data mining methods presented in the book include techniques for efficient segmentation, indexing, and classification of noisy and dynamic time series. A graph-based method for anomaly detection in time series is described and the book also studies the implications of a novel and potentially useful representation of time series as strings. The problem of detecting changes in data mining models that are induced from temporal databases is additionally discussed.

  19. Big data mining: In-database Oracle data mining over hadoop

    Science.gov (United States)

    Kovacheva, Zlatinka; Naydenova, Ina; Kaloyanova, Kalinka; Markov, Krasimir

    2017-07-01

    Big data challenges different aspects of storing, processing and managing data, as well as analyzing and using data for business purposes. Applying Data Mining methods over Big Data is another challenge because of huge data volumes, variety of information, and the dynamics of the sources. Various applications have been developed in this area, but their successful usage depends on understanding many specific parameters. In this paper we present several opportunities for using Data Mining techniques provided by the analytical engine of RDBMS Oracle over data stored in Hadoop Distributed File System (HDFS). Some experimental results are presented and discussed.

  20. Optimal sampling strategy for data mining

    International Nuclear Information System (INIS)

    Ghaffar, A.; Shahbaz, M.; Mahmood, W.

    2013-01-01

    The latest technologies, such as the Internet, corporate intranets, data warehouses, ERPs, satellites, digital sensors, embedded systems and mobile networks, are all generating such a massive amount of data that it is becoming very difficult to analyze and understand all these data, even using data mining tools. Huge datasets are becoming a difficult challenge for classification algorithms. With increasing amounts of data, data mining algorithms are getting slower and analysis is getting less interactive. Sampling can be a solution. Using a fraction of the computing resources, sampling can often provide the same level of accuracy. The process of sampling requires much care because there are many factors involved in the determination of the correct sample size. The approach proposed in this paper tries to find a solution to this problem. Based on a statistical formula, after setting some parameters, it returns a sample size called the sufficient sample size, which is then selected through probability sampling. Results indicate the usefulness of this technique in coping with the problem of huge datasets. (author)
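
    The record above does not name the statistical formula behind its sufficient sample size. As an illustration only, the sketch below assumes Cochran's formula with a finite population correction, which is one widely used choice and not necessarily the authors', and then draws the sample by simple random sampling. The function name, parameter defaults and dataset are hypothetical.

```python
import math
import random


def sufficient_sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Cochran's sample size with finite population correction.

    Assumption: the paper's own formula is not given in the abstract;
    Cochran's formula is used here purely as a stand-in.
    z      -- normal quantile for the confidence level (1.96 ~ 95%)
    p      -- assumed proportion (0.5 is the most conservative choice)
    margin -- tolerated margin of error
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return math.ceil(n)


# Hypothetical dataset of one million record identifiers.
records = range(1_000_000)
size = sufficient_sample_size(len(records))

# Probability sampling: every record has the same chance of selection.
rng = random.Random(0)
sample = rng.sample(records, size)
print(size, len(sample))
```

    Under these assumptions the required sample grows very slowly with the population, which is why sampling remains attractive for huge datasets.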

  1. Asymmetric threat data mining and knowledge discovery

    Science.gov (United States)

    Gilmore, John F.; Pagels, Michael A.; Palk, Justin

    2001-03-01

    Asymmetric threats differ from the conventional force-on-force military encounters that the Defense Department has historically been trained to engage. Terrorism by its nature is now an operational activity that is neither easily detected nor countered as its very existence depends on small covert attacks exploiting the element of surprise. But terrorism does have defined forms, motivations, tactics and organizational structure. Exploiting a terrorism taxonomy provides the opportunity to discover and assess knowledge of terrorist operations. This paper describes the Asymmetric Threat Terrorist Assessment, Countering, and Knowledge (ATTACK) system. ATTACK has been developed to (a) data mine open source intelligence (OSINT) information from web-based newspaper sources, video news web casts, and actual terrorist web sites, (b) evaluate this information against a terrorism taxonomy, (c) exploit country/region specific social, economic, political, and religious knowledge, and (d) discover and predict potential terrorist activities and association links. Details of the asymmetric threat structure and the ATTACK system architecture are presented with results of an actual terrorist data mining and knowledge discovery test case shown.

  2. Data Mining and Privacy of Social Network Sites' Users: Implications of the Data Mining Problem.

    Science.gov (United States)

    Al-Saggaf, Yeslam; Islam, Md Zahidul

    2015-08-01

    This paper explores the potential of data mining as a technique that could be used by malicious data miners to threaten the privacy of social network sites (SNS) users. It applies a data mining algorithm to a real dataset to provide empirically-based evidence of the ease with which characteristics about the SNS users can be discovered and used in a way that could invade their privacy. One major contribution of this article is the application of the decision forest data mining algorithm (SysFor) to the context of SNS; SysFor builds not just a single decision tree but a forest, allowing the exploration of more logic rules from a dataset. One logic rule that SysFor built in this study, for example, revealed that anyone having a profile picture showing just the face or a picture showing a family is less likely to be lonely. Another contribution of this article is the discussion of the implications of the data mining problem for governments, businesses, developers and the SNS users themselves.

  3. Technological Similarity, Post-acquisition R&D Reorganization, and Innovation Performance in Horizontal Acquisition

    DEFF Research Database (Denmark)

    Colombo, Massimo G.; Rabbiosi, Larissa

    2014-01-01

    This paper aims to disentangle the mechanisms through which technological similarity between acquiring and acquired firms influences innovation in horizontal acquisitions. We develop a theoretical model that links technological similarity to: (i) two key aspects of post-acquisition reorganization of acquired R&D operations – the rationalization of the R&D operations and the replacement of the R&D top manager, and (ii) two intermediate effects that are closely associated with the post-acquisition innovation performance of the combined firm – improvements in R&D productivity and disruptions in R&D personnel. We rely on PLS techniques to test our theoretical model using detailed information on 31 horizontal acquisitions in high- and medium-tech industries. Our results indicate that in horizontal acquisitions, technological similarity negatively affects post-acquisition innovation performance...

  4. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    Science.gov (United States)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.

  5. Parallel object-oriented data mining system

    Science.gov (United States)

    Kamath, Chandrika; Cantu-Paz, Erick

    2004-01-06

    A data mining system uncovers patterns, associations, anomalies and other statistically significant structures in data. Data files are read and displayed. Objects in the data files are identified. Relevant features for the objects are extracted. Patterns among the objects are recognized based upon the features. Data from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) sky survey was used to search for bent doubles. This test was conducted on data from the Very Large Array in New Mexico which seeks to locate a special type of quasar (radio-emitting stellar object) called bent doubles. The FIRST survey has generated more than 32,000 images of the sky to date. Each image is 7.1 megabytes, yielding more than 100 gigabytes of image data in the entire data set.

  6. EXTRACTING KNOWLEDGE FROM DATA - DATA MINING

    Directory of Open Access Journals (Sweden)

    DIANA ELENA CODREANU

    2011-04-01

    Managers of economic organizations have a large volume of information at their disposal and practically face an avalanche of it, but they cannot work by studying reports that contain large volumes of detailed, uncorrelated data, because the good of an organization may be decided in fractions of time. Thus, to make the best and most effective decisions in real time, managers need correct information presented quickly, in a synthetic but relevant form, to allow for predictions and analysis. This paper highlights solutions for extracting knowledge from data, namely data mining. This technology not only verifies hypotheses but also aims at discovering new knowledge, so that the economic organization can cope with fierce competition in the market.

  7. Detecting Internet Worms Using Data Mining Techniques

    Directory of Open Access Journals (Sweden)

    Muazzam Siddiqui

    2008-12-01

    Internet worms pose a serious threat to computer security. Traditional approaches that use signatures to detect worms are of little use against zero-day attacks. The focus of malware research is shifting from using signature patterns to identifying the malicious behavior displayed by the malware. This paper presents a novel idea of extracting variable length instruction sequences that can distinguish worms from clean programs using data mining techniques. The analysis is facilitated by the program control flow information contained in the instruction sequences. Based upon general statistics gathered from these instruction sequences we formulated the problem as a binary classification problem and built tree based classifiers including decision tree, bagging and random forest. Our approach showed 95.6% detection rate on novel worms whose data was not used in the model building process.

  8. Data mining in bioinformatics using Weka.

    Science.gov (United States)

    Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

    2004-10-12

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.

  9. DATA MINING AND APPLICATION OF IT TO CAPITAL MARKETS

    Directory of Open Access Journals (Sweden)

    Cenk AKKAYA

    2011-07-01

    Nowadays, with the development of technology, the importance given to knowledge is gradually increasing. Data mining makes it possible to build forecasts and models of the future by making use of past data. Any method which helps to discover data can be used as a data mining method. Enterprises gain important competitive advantage by data mining methods. Data mining is used in different fields. In the finance field it is especially used in financial performance applications, predicting enterprise bankruptcies and failures, determining transaction manipulation, determining financial risk management, determining customer profile and depth management. It can be costly, risky and time consuming for enterprises to gain knowledge. Thus today enterprises use data mining as an innovative competitive means. The aim of the study is to determine the importance of data mining applications to capital markets.

  10. The study on privacy preserving data mining for information security

    Science.gov (United States)

    Li, Xiaohui

    2012-04-01

    Privacy preserving data mining has developed rapidly in a short time, but it still faces many challenges in the future. Firstly, the level of privacy has different definitions in different fields. Therefore, the measure by which privacy preserving data mining technology protects private information is not the same. So, it is an urgent issue to present a unified privacy definition and measure. Secondly, most of the research in privacy preserving data mining is presently confined to theoretical study.

  11. Seminal quality prediction using data mining methods.

    Science.gov (United States)

    Sahoo, Anoop J; Kumar, Yugal

    2014-01-01

    Nowadays, some new classes of diseases, known as lifestyle diseases, have come into existence. The main reasons behind these diseases are changes in people's lifestyle, such as alcohol drinking, smoking, food habits etc. A review of the various lifestyle diseases shows that fertility rates (sperm quantity) in men have decreased considerably in the last two decades. Lifestyle factors as well as environmental factors are mainly responsible for the change in semen quality. The objective of this paper is to identify the lifestyle and environmental features that affect seminal quality and fertility rate in men using data mining methods. Five artificial intelligence techniques, namely Multilayer perceptron (MLP), Decision Tree (DT), Naive Bayes (Kernel), Support vector machine+Particle swarm optimization (SVM+PSO) and Support vector machine (SVM), have been applied to a fertility dataset to evaluate seminal quality and to predict whether a person is normal or has an altered fertility rate. Eight feature selection techniques, namely support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), support vector machine plus particle swarm optimization (SVM+PSO), principal component analysis (PCA), chi-square test, correlation and T-test methods, have been used to identify the more relevant features which affect seminal quality. These techniques are applied to a fertility dataset which contains 100 instances with nine attributes and two classes. The experimental results show that SVM+PSO provides higher accuracy and area under curve (AUC) (94% & 0.932) than multi-layer perceptron (MLP) (92% & 0.728), Support Vector Machines (91% & 0.758), Naive Bayes (Kernel) (89% & 0.850) and Decision Tree (89% & 0.735) for some of the seminal parameters. This paper also focuses on the feature selection process, i.e. how to select the features which are more important for prediction of

  12. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  13. Design database for quantitative trait loci (QTL) data warehouse, data mining, and meta-analysis.

    Science.gov (United States)

    Hu, Zhi-Liang; Reecy, James M; Wu, Xiao-Lin

    2012-01-01

    A database can be used to warehouse quantitative trait loci (QTL) data from multiple sources for comparison, genomic data mining, and meta-analysis. A robust database design involves sound data structure logistics, meaningful data transformations, normalization, and proper user interface designs. This chapter starts with a brief review of relational database basics and concentrates on issues associated with curation of QTL data into a relational database, with emphasis on the principles of data normalization and structure optimization. In addition, some simple examples of QTL data mining and meta-analysis are included. These examples are provided to help readers better understand the potential and importance of sound database design.

  14. The Evaluation on Data Mining Methods of Horizontal Bar Training Based on BP Neural Network

    Directory of Open Access Journals (Sweden)

    Zhang Yanhui

    2015-01-01

    With the rapid development of science and technology, data analysis has become an indispensable part of people’s work and life. Horizontal bar training has multiple categories. Reducing the categories of training and matches is a research focus for workers in the field. The application of data mining methods is discussed based on the problem of reducing categories of horizontal bar training. The BP neural network is applied to the cluster analysis and the principal component analysis, which are used to evaluate horizontal bar training. The two data mining methods are analyzed from two aspects, namely the operational convenience of data mining and the rationality of results. It turns out that the principal component analysis is more suitable for data processing of horizontal bar training.

  15. Prediction of pork quality parameters by applying fractals and data mining on MRI

    DEFF Research Database (Denmark)

    Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés

    2017-01-01

    This work firstly investigates the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal algorithm, CFA; Fractal Texture Algorithm, FTA and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effect of the sequence acquisition of MRI (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and the predictive technique of data mining (Isotonic regression, IR and Multiple linear regression, MLR) were analysed. Both fractal algorithms, FTA and OPFTA, are appropriate to analyse MRI of loins. The sequence acquisition, the fractal algorithm and the data mining technique seem to influence the prediction results. For most physico-chemical parameters, prediction equations with moderate...

  16. Application of Data Mining in direct marketing

    Directory of Open Access Journals (Sweden)

    Dejana Pavlović

    2014-04-01

    The key to successful business operations lies in good communication with clients. There are a growing number of brokers in the financial market who collect excess funds from the clients and perform transfers to those who need the funds. However, many external and internal factors influence the decision on disposal of available funds. This paper identifies and investigates clients’ satisfaction in the banking system. By application of the disclosure of data legality we will try to point to the factors that influence the clients' decision to invest their long-term deposits in the parent bank. Upon classification and clustering, we will interpret and identify the strengths and weaknesses of the target results. This analysis provides guidelines through the use of the decision-making tree; the application of data mining and the possibility of using a large set of data increase the value and accuracy of this technique. The problem with this technique is the accuracy of the data submitted by the client.

  17. Information Extraction for Clinical Data Mining: A Mammography Case Study.

    Science.gov (United States)

    Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David

    2009-01-01

    Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
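
    The three figures reported at the end of the record above are mutually consistent, which can be checked with the standard definition of the F1-score as the harmonic mean of precision and recall. The short sketch below does only that arithmetic; the function name is illustrative and nothing here reproduces the authors' extraction algorithm.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.977, 0.955)
print(round(f1, 2))
```

    Because the harmonic mean is pulled towards the smaller of its two inputs, the score sits slightly below the arithmetic mean of 97.7% and 95.5%.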

  18. Introduction to the special section on educational data mining

    NARCIS (Netherlands)

    Calders, T.G.K.; Pechenizkiy, M.

    2012-01-01

    Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science, as well as a rich application area for data mining, due to

  19. A Tools-Based Approach to Teaching Data Mining Methods

    Science.gov (United States)

    Jafar, Musa J.

    2010-01-01

    Data mining is an emerging field of study in Information Systems programs. Although the course content has been streamlined, the underlying technology is still in a state of flux. The purpose of this paper is to describe how we utilized Microsoft Excel's data mining add-ins as a front-end to Microsoft's Cloud Computing and SQL Server 2008 Business…

  20. Exploring the Integration of Data Mining and Data Visualization

    Science.gov (United States)

    Zhang, Yi

    2011-01-01

    Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be…

  1. Application and Exploration of Big Data Mining in Clinical Medicine.

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-03-20

    To review theories and technologies of big data mining and their application in clinical medicine. Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.

  2. Data Mine and Forget It?: A Cautionary Tale

    Science.gov (United States)

    Tada, Yuri; Kraft, Norbert Otto; Orasanu, Judith M.

    2011-01-01

    With the development of new technologies, data mining has become increasingly popular. However, caution should be exercised in choosing the variables to include in data mining. A series of regression trees was created to demonstrate the change in the selection by the program of significant predictors based on the nature of variables.

  3. Model architecture of intelligent data mining oriented urban transportation information

    Science.gov (United States)

    Yang, Bogang; Tao, Yingchun; Sui, Jianbo; Zhang, Feizhou

    2007-06-01

    Aiming at solving practical problems in urban traffic, the paper presents a model architecture for intelligent data mining from a hierarchical view. With artificial intelligence technologies used in the framework, the intelligent data mining technology is improved and becomes more suitable for changes in real-time road conditions. It also provides efficient technical support for the distribution, transmission and display of urban transport information.

  4. Data mining in e-commerce: A survey

    Indian Academy of Sciences (India)

    it is only apposite to seek the services of data mining to make (business) sense out of these data sets. Data mining ..... for the simple reason that for practical purposes, it is sufficient to include snapshots of data taken at say, weekly ..... of the mining environment and the expenses the user is willing to incur). The authors have.

  5. Informatics, Data Mining, Econometrics and Financial Economics: A Connection

    NARCIS (Netherlands)

    C-L. Chang (Chia-Lin); M.J. McAleer (Michael); W.-K. Wong (Wing-Keung)

    2015-01-01

    This short communication reviews some of the literature in econometrics and financial economics that is related to informatics and data mining. We then discuss some of the research on econometrics and financial economics that could be extended to informatics and data mining beyond the

  6. Set-oriented data mining in relational databases

    NARCIS (Netherlands)

    Houtsma, M.A.W.; Swami, Arun

    1995-01-01

    Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are

  7. Expressive power of an algebra for data mining

    NARCIS (Netherlands)

    Calders, T.; Lakshmanan, L.V.S.; Ng, R.T.; Paredaens, J.

    2006-01-01

    The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can

  8. The viability of business data mining in the sports environment ...

    African Journals Online (AJOL)

    Data mining can be viewed as the process of extracting previously unknown information from large databases and utilising this information to make crucial business decisions (Simoudis, 1996: 26). This paper considers the viability of using data mining tools and techniques in sports, particularly with regard to mining the ...

  9. Application and Exploration of Big Data Mining in Clinical Medicine

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-01-01

    Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378

  10. Experienced ethical issues of personalized data-mined media services

    DEFF Research Database (Denmark)

    Sørensen, Jannick Kirk

    2008-01-01

    This tentative PhD project description concerns the ethnographic examination of users’ experience of privacy issues and usability related to personalized data mined (web-) services for media content.

  11. Polyamine Metabolites Profiling for Characterization of Lung and Liver Cancer Using an LC-Tandem MS Method with Multiple Statistical Data Mining Strategies: Discovering Potential Cancer Biomarkers in Human Plasma and Urine

    Directory of Open Access Journals (Sweden)

    Huarong Xu

    2016-08-01

    Polyamines, one of the most important kinds of biomarkers in cancer research, were investigated in order to characterize different cancer types. An integrative approach combining ultra-high performance liquid chromatography-tandem mass spectrometry detection with multiple statistical data processing strategies, including outlier elimination, binary logistic regression analysis and cluster analysis, was developed to discover the characteristic biomarkers of lung and liver cancer. The concentrations of 14 polyamine metabolites in biosamples from lung (n = 50) and liver cancer patients (n = 50) were detected by a validated UHPLC-MS/MS method. The concentrations were then converted into independent variables to characterize lung and liver cancer patients by binary logistic regression analysis. Significant independent variables were regarded as the potential biomarkers. Cluster analysis was used for further verification. As a result, two values were discovered that identify lung and liver cancer: the product of the plasma concentrations of putrescine and spermidine, and the ratio of the urine concentrations of S-adenosyl-l-methionine and N-acetylspermidine. Results indicated that the established method could be successfully applied to characterize lung and liver cancer, and may also enable a new way of discovering cancer biomarkers and characterizing other types of cancer.

  12. Recent advances in environmental data mining

    Science.gov (United States)

    Leuenberger, Michael; Kanevski, Mikhail

    2016-04-01

    Due to the large amount and complexity of data available nowadays in geo- and environmental sciences, we face the need to develop and incorporate more robust and efficient methods for their analysis, modelling and visualization. An important part of these developments deals with an elaboration and application of a contemporary and coherent methodology following the process from data collection to the justification and communication of the results. Recent fundamental progress in machine learning (ML) can considerably contribute to the development of the emerging field - environmental data science. The present research highlights and investigates the different issues that can occur when dealing with environmental data mining using cutting-edge machine learning algorithms. In particular, the main attention is paid to the description of the self-consistent methodology and two efficient algorithms - Random Forest (RF, Breiman, 2001) and Extreme Learning Machines (ELM, Huang et al., 2006), which recently gained great popularity. Despite the fact that they are based on two different concepts, i.e. decision trees vs artificial neural networks, both offer promising results for complex, high dimensional and non-linear data modelling. In addition, the study discusses several important issues of data driven modelling, including feature selection and uncertainties. The approach considered is accompanied by simulated and real data case studies from renewable resources assessment and natural hazards tasks. In conclusion, the current challenges and future developments in statistical environmental data learning are discussed. References - Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5-32. - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press; Lausanne, Switzerland, p.392

  13. Big data mining analysis method based on cloud computing

    Science.gov (United States)

    Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

    2017-08-01

    In the era of information explosion, big data is massive, discrete, and unstructured or semi-structured, with characteristics that go far beyond what traditional data management can handle. With the arrival of the cloud computing era, cloud computing provides a new technical approach to mining massive data, which can effectively solve the problem that traditional data mining methods cannot adapt to massive data. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs an association rule mining algorithm based on the MapReduce parallel processing architecture, and carries out experimental verification. The parallel association rule mining algorithm based on a cloud computing platform can greatly improve the execution speed of data mining.

  14. Kajian Data Mining Customer Relationship Management pada Lembaga Keuangan Mikro [A Study of Data Mining for Customer Relationship Management in Microfinance Institutions]

    Directory of Open Access Journals (Sweden)

    Tikaridha Hardiani

    2016-01-01

    Companies, including microfinance institutions, must be ready to face intense competition from other companies. Increasingly intense competition has led many microfinance institutions to seek profitable strategies that distinguish them from the others. One strategy that can be applied is implementing Customer Relationship Management (CRM) and data mining. Data mining can be used by microfinance institutions that have sufficiently large data sets. Identifying potential customers through customer segmentation can support decisions on the marketing strategy to be implemented. This paper discusses several data mining techniques that can be used for customer segmentation. The proposed data mining technique is fuzzy clustering with the fuzzy C-Means algorithm and fuzzy RFM. Keywords: Customer relationship management; Data mining; Fuzzy clustering; Micro-finance institutions; Fuzzy C-Means; Fuzzy RFM

  15. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems, and to the diagnosis of brain glioma.

  16. A Data Mining Classification Approach for Behavioral Malware Detection

    Directory of Open Access Journals (Sweden)

    Monire Norouzi

    2016-01-01

    Data mining techniques have numerous applications in malware detection. Classification is one of the most popular data mining techniques. In this paper we present a data mining classification approach to detect malware behavior. We propose different classification methods to detect malware based on the features and behavior of each malware sample. A dynamic analysis method is presented for identifying the malware features. A suggested program is presented for converting an XML file of malware behavior execution history into suitable input for the WEKA tool. To illustrate performance efficiency as well as training and test data, we apply the proposed approaches to a real case study data set using the WEKA tool. The evaluation results demonstrated the availability of the proposed data mining approach. Our proposed data mining approach is also more efficient for detecting malware, and behavioral classification of malware can be useful for detecting malware in a behavioral antivirus.

  17. Research on Customer Value Based on Extension Data Mining

    Science.gov (United States)

    Chun-Yan, Yang; Wei-Hua, Li

    Extenics is a new discipline for dealing with contradiction problems using formalized models. Extension data mining (EDM) combines Extenics with data mining. It seeks to acquire knowledge based on extension transformations, called extension knowledge (EK), by taking advantage of extension methods and data mining technology. EK includes extensible classification knowledge, conductive knowledge and so on. Extension data mining technology (EDMT) is a new data mining technology that mines EK in databases or data warehouses. Customer value (CV) can weigh the essentiality of the customer relationship for an enterprise, treating the enterprise as a subject of tasting value and customers as objects of tasting value at the same time. CV varies continually. Mining the changing knowledge of CV in databases using EDMT, including quantitative change knowledge and qualitative change knowledge, can provide a foundation for an enterprise to decide its customer relationship management (CRM) strategy. It can also provide a new idea for studying CV.

  18. An Intelligent Archive Testbed Incorporating Data Mining

    Science.gov (United States)

    Ramapriyan, H.; Isaac, D.; Yang, W.; Bonnlander, B.; Danks, D.

    2009-01-01

    interoperability, and being able to convert data to information and usable knowledge in an efficient, convenient manner, aided significantly by automation (Ramapriyan et al. 2004; NASA 2005). We can look upon the distributed provider environment with capabilities to convert data to information and to knowledge as an Intelligent Archive in the Context of a Knowledge Building system (IA-KBS). Some of the key capabilities of an IA-KBS are: Virtual Product Generation, Significant Event Detection, Automated Data Quality Assessment, Large-Scale Data Mining, Dynamic Feedback Loop, and Data Discovery and Efficient Requesting (Ramapriyan et al. 2004).

  19. Advanced Data Mining of Leukemia Cells Micro-Arrays

    Directory of Open Access Journals (Sweden)

    Richard S. Segall

    2009-12-01

    This paper continues and extends previous research by Segall and Pierce (2009a) that discussed data mining for micro-array databases of Leukemia cells, primarily using self-organized maps (SOM). As in Segall and Pierce (2009a) and Segall and Pierce (2009b), the results of applying data mining are shown and discussed for the data categories of microarray databases of HL60, Jurkat, NB4 and U937 Leukemia cells that are also described in this article. First, a background section is provided on the work of others pertaining to the applications of data mining to micro-array databases of Leukemia cells and micro-array databases in general. As noted in the predecessor article by Segall and Pierce (2009a), micro-array databases are one of the most popular functional genomics tools in use today. The research in this paper is intended to use advanced data mining technologies for better interpretations and knowledge discovery as generated by the patterns of gene expressions of HL60, Jurkat, NB4 and U937 Leukemia cells. The advanced data mining performed entailed using other data mining tools such as cubic clustering criterion, variable importance rankings, decision trees, and more detailed examinations of data mining statistics and study of other self-organized maps (SOM) clustering regions of workspace as generated by SAS Enterprise Miner version 4. Conclusions and future directions of the research are also presented.

  20. Application of data mining techniques for nuclear data and instrumentation

    International Nuclear Information System (INIS)

    Toshniwal, Durga

    2013-01-01

    Data mining is defined as the discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. Patterns in the data can be represented in many different forms, including classification rules, association rules, clusters, etc. Data mining thus deals with the discovery of hidden trends and patterns from large quantities of data. The field of data mining is emerging as a new, fundamental research area with important applications to science, engineering, medicine, business, and education. It is an interdisciplinary research area and draws upon several roots, including database systems, machine learning, information systems, statistics and expert systems. Data mining, when performed on time series data, is known as time series data mining (TSDM). A time series is a sequence of real numbers, each number representing a value at a point of time. During the past few years, there has been an explosion of research in the area of time series data mining. This includes attempts to model time series data, to design languages to query such data, and to develop access structures to efficiently process queries on such data. Time series data arises naturally in many real-world applications. Efficient discovery of knowledge through time series data mining can be helpful in several domains, such as stock market analysis and weather forecasting. An important application area of data mining techniques is in nuclear power plant and related data. Nuclear power plant data can be represented in the form of time sequences. Often it may be of prime importance to analyze such data to find trends and anomalies. The general goals of data mining include feature extraction, similarity search, clustering and classification, association rule mining and anomaly

  1. BOOK REVIEW EDUCATIONAL DATA MINING: APPLICATIONS AND TRENDS

    Directory of Open Access Journals (Sweden)

    Aylin OZTURK

    2016-04-01

    Educational Data Mining (EDM) is a developing field based on data mining techniques. EDM emerged as a combination of areas such as machine learning, statistics, computer science, education, cognitive science, and psychometry. EDM focuses on learner characteristics, behaviors, academic achievements, process of learning, educational functionalities, domain knowledge content, assessments, and applications. Educational data mining is defined by Baker (2010) as “an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in”. EDM is concerned with improving the learning process and environment.

  2. Data Mining at NASA: From Theory to Applications

    Science.gov (United States)

    Srivastava, Ashok N.

    2009-01-01

    This slide presentation for Knowledge Discovery and Data Mining 2009 demonstrates the data mining/machine learning capabilities of NASA Ames and the Intelligent Data Understanding (IDU) group. It encompasses the work done recently in the group by various group members. The IDU group develops novel algorithms to detect, classify, and predict events in large data streams for scientific and engineering systems.

  3. Data mining for the social sciences an introduction

    CERN Document Server

    Attewell, Paul

    2015-01-01

    We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining

  4. Development of an Enhanced Generic Data Mining Life Cycle (DMLC)

    OpenAIRE

    Hofmann, Markus; Tierney, Brendan

    2017-01-01

    Data mining projects are complex and have a high failure rate. In order to improve project management and success rates of such projects a life cycle is vital to the overall success of the project. This paper reports on a research project that was concerned with the life cycle development for large scale data mining projects. The paper provides a detailed view of the design and development of a generic data mining life cycle called DMLC. The life cycle aims to support all members of data mini...

  5. Data mining algorithms for land cover change detection: a review

    Indian Academy of Sciences (India)

    Sangram Panigrahi

    2017-11-24

    Nov 24, 2017 ... values, poor quality measurement, high resolution and high dimensional data. The land cover .... These data sets also include quality assurance information, ...... 2012 A new data mining framework for forest fire mapping.

  6. Warehousing Structured and Unstructured Data for Data Mining.

    Science.gov (United States)

    Miller, L. L.; Honavar, Vasant; Barta, Tom

    1997-01-01

    Describes an extensible object-oriented view system that supports the integration of both structured and unstructured data sources in either the multidatabase or data warehouse environment. Discusses related work and data mining issues. (AEF)

  7. Usage reporting on recorded lectures using educational data mining

    NARCIS (Netherlands)

    Gorissen, Pierre; Van Bruggen, Jan; Jochems, Wim

    2012-01-01

    Gorissen, P., Van Bruggen, J., & Jochems, W. M. G. (2012). Usage reporting on recorded lectures using educational data mining. International Journal of Learning Technology, 7, 23-40. doi:10.1504/IJLT.2012.046864

  8. Data Mining and Complex Problems: Case Study in Composite Materials

    Science.gov (United States)

    Rabelo, Luis; Marin, Mario

    2009-01-01

    Data mining is defined as the discovery of useful, possibly unexpected, patterns and relationships in data using statistical and non-statistical techniques in order to develop schemes for decision and policy making. Data mining can be used to discover the sources and causes of problems in complex systems. In addition, data mining can support simulation strategies by finding the different constants and parameters to be used in the development of simulation models. This paper introduces a framework for data mining and its application to complex problems. To further explain some of the concepts outlined in this paper, the potential application to the NASA Shuttle Reinforced Carbon-Carbon structures and genetic programming is used as an illustration.

  9. Usage of Data Mining at Financial Decision Making

    Directory of Open Access Journals (Sweden)

    Levent BORAN

    2014-06-01

    The knowledge age requires controlling every kind of information. Recognition of patterns in data may provide previously unknown and useful information that can provide competitive advantages. If related techniques are applied to financial statements, it is possible to acquire valuable information about companies’ financial situations. Data mining is considered a possible alternative to common financial analysis techniques such as vertical analysis, horizontal analysis, trend analysis and ratio analysis. Compared with existing financial analysis methods, data mining offers two advantages: the ability to handle very large data sets and the ability to obtain previously unknown information. There are two major constraints on implementing data mining: a lack of experts in both data mining and the related domains, and the cost of the computer software and hardware used.

  10. Visual Data Mining of Robot Performance Data, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to design and develop VDM/RP, a visual data mining system that will enable analysts to acquire, store, query, analyze, and visualize recent and historical...

  11. Pocket data mining big data on small devices

    CERN Document Server

    Gaber, Mohamed Medhat; Gomes, Joao Bartolo

    2014-01-01

    Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations including modern data mining processes onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices to perform data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of techniques used and thorough experimental studies are given. More importantly and exclusive to this book, the authors provide a detailed practical guide on the depl...

  12. 2nd International Conference on Computational Intelligence in Data Mining

    CERN Document Server

    Mohapatra, Durga

    2016-01-01

    The book is a collection of high-quality peer-reviewed research papers presented in the Second International Conference on Computational Intelligence in Data Mining (ICCIDM 2015) held at Bhubaneswar, Odisha, India during 5 – 6 December 2015. The two-volume Proceedings address the difficulties and challenges for the seamless integration of two core disciplines of computer science, i.e., computational intelligence and data mining. The book addresses different methods and techniques of integration for enhancing the overall goal of data mining. The book helps to disseminate the knowledge about some innovative, active research directions in the field of data mining, machine and computational intelligence, along with some current issues and applications of related topics.

  13. Application of Data Mining for Card Fraud Detection

    Directory of Open Access Journals (Sweden)

    I.V. Andrianov

    2012-03-01

    This paper focuses on implementing Data Mining methods for card fraud detection. The approach to classification and prediction tasks for detection of unauthorized transactions is considered.

  14. 1st International Conference on Computational Intelligence in Data Mining

    CERN Document Server

    Behera, Himansu; Mandal, Jyotsna; Mohapatra, Durga

    2015-01-01

    The contributed volume aims to explicate and address the difficulties and challenges for the seamless integration of two core disciplines of computer science, i.e., computational intelligence and data mining. Data Mining aims at the automatic discovery of underlying non-trivial knowledge from datasets by applying intelligent analysis techniques. The interest in this research area has experienced a considerable growth in the last years due to two key factors: (a) knowledge hidden in organizations’ databases can be exploited to improve strategic and managerial decision-making; (b) the large volume of data managed by organizations makes it impossible to carry out a manual analysis. The book addresses different methods and techniques of integration for enhancing the overall goal of data mining. The book helps to disseminate the knowledge about some innovative, active research directions in the field of data mining, machine and computational intelligence, along with some current issues and applications of relate...

  15. Data mining of air traffic control operational errors

    Science.gov (United States)

    2006-01-01

    In this paper we present the results of applying data mining techniques to identify patterns and anomalies in air traffic control operational errors (OEs). Reducing the OE rate is of high importance and remains a challenge in the aviation saf...

  16. Accounting and Financial Data Analysis Data Mining Tools

    Directory of Open Access Journals (Sweden)

    Diana Elena Codreanu

    2011-05-01

    Computerized accounting systems have grown in complexity in recent years because of the competitive economic environment, but data analysis solutions such as OLAP and data mining make it possible to perform multidimensional data analysis, detect fraud, and discover knowledge hidden in data, ensuring that such information is useful for decision making within the organization. The literature offers many definitions of data mining, but all boil down to the same idea: a process of extracting new information from large data collections, information that would be very difficult to obtain without the aid of data mining tools. Information obtained through data mining has the advantage that it not only answers the question of what is happening but also explains why certain things are happening. In this paper we present advanced techniques for the analysis and exploitation of data stored in a multidimensional database.

  17. Transparent data mining for big and small data

    CERN Document Server

    Quercia, Daniele; Pasquale, Frank

    2017-01-01

    This book focuses on new and emerging data mining solutions that offer a greater level of transparency than existing solutions. Transparent data mining solutions with desirable properties (e.g. effective, fully automatic, scalable) are covered in the book. Experimental findings of transparent solutions are tailored to different domain experts, and experimental metrics for evaluating algorithmic transparency are presented. The book also discusses societal effects of black box vs. transparent approaches to data mining, as well as real-world use cases for these approaches. As algorithms increasingly support different aspects of modern life, a greater level of transparency is sorely needed, not least because discrimination and biases have to be avoided. With contributions from domain experts, this book provides an overview of an emerging area of data mining that has profound societal consequences, and provides the technical background for readers to contribute to the field or to put existing approaches to prac...

  18. artery disease guidelines with extracted knowledge from data mining

    Directory of Open Access Journals (Sweden)

    Peyman Rezaei-Hachesu

    2017-06-01

    Conclusion: Guidelines confirm the achieved results from data mining (DM) techniques and help to rank important risk factors based on national and local information. Evaluation of extracted rules determined new patterns for CAD patients.

  19. An Application of Multithreaded Data Mining in Educational Leadership Research

    OpenAIRE

    Fikis, David; Wang, Yinying; Bowers, Alex

    2015-01-01

    This study aims to apply high-performance computing to educational leadership research. Specifically, we applied an array of data acquisition and analytical techniques to the field of educational leadership research, including text data mining, probabilistic topic modeling, and the use of software (CasperJS, GNU utilities, R, etc.) as well as hardware (the VELA batch computer and the multi-threaded data mining environment).

  20. DATA MINING TECHNIQUES FOR EDUCATIONAL DATA: A REVIEW

    OpenAIRE

    Pragati Sharma; Dr. Sanjiv Sharma

    2018-01-01

    Recently, data mining has been gaining popularity among researchers. Data mining provides various techniques and methods for analysing data produced by applications in different domains. Similarly, educational mining provides a way of analyzing educational data sets. Educational mining is concerned with developing methods for discovering knowledge from data that come from the educational field, and it helps to extract hidden patterns and to discover new knowledge from large educational da...

  1. Report from Dagstuhl Seminar 12331 Mobility Data Mining and Privacy

    OpenAIRE

    Clifton, Christopher W.; Kuijpers, Bart; Morik, Katharina; Saygin, Yucel

    2012-01-01

    This report documents the program and the outcomes of Dagstuhl Seminar 12331 “Mobility Data Mining and Privacy”. Mobility data mining aims to extract knowledge from movement behaviour of people, but this data also poses novel privacy risks. This seminar gathered a multidisciplinary team for a conversation on how to balance the value in mining mobility data with privacy issues. The seminar focused on four key issues: Privacy in vehicular data, in cellular data, context-dependent privacy, and ...

  2. [Aspects for data mining implementation in gerontology and geriatrics].

    Science.gov (United States)

    Mikhal'skiĭ, A I

    2014-01-01

    Current challenges facing theory and practice in ageing sciences need new methods of experimental data investigation. This results both from developments in the experimental basis of biological research and from progress in information technology. These achievements make it possible to apply data mining methods, well proven in different fields of science and engineering, to tasks in gerontology and geriatrics. Some examples of data mining methods implementation in gerontology are presented.

  3. An Intelligent Agent based Architecture for Visual Data Mining

    OpenAIRE

    Hamdi Ellouzi; Hela Ltifi; Mounir Ben Ayed

    2016-01-01

    The aim of this paper is to present an intelligent architecture for a Decision Support System (DSS) based on visual data mining. This architecture applies multi-agent technology to facilitate the design and development of DSS in complex and dynamic environments. Multi-Agent Systems add a high level of abstraction. To validate the proposed architecture, it is used to develop a distributed visual data mining based DSS to predict the occurrence of nosocomial infections in intensive care units. Th...

  4. A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING

    OpenAIRE

    Arumugam.S

    2016-01-01

    The main process of data mining is to collect, extract and store valuable information, and nowadays many enterprises do this actively. Predictive analytics is a branch of advanced analytics that is mainly used to make predictions about unknown future events. It uses various techniques from machine learning, statistics, data mining, modeling, and artificial intelligence to analyze current data and to make predictions about futu...

  5. Data mining in e-commerce: A survey

    Indian Academy of Sciences (India)

    Data mining has matured as a field of basic and applied research in computer science in general and e-commerce in particular. In this paper, we survey some of the recent approaches and architectures where data mining has been applied in the fields of e-commerce and e-business. Our intent is not to survey the plethora ...

  6. Predictive models in churn data mining: a review

    OpenAIRE

    García, David L.; Vellido Alcacena, Alfredo; Nebot Castells, M. Àngela

    2007-01-01

    The development of predictive models of customer abandonment plays a central role in any churn management strategy. These models can be developed using either qualitative approaches or can take a data-centred point of view. In the latter case, the use of Data Mining procedures and techniques can provide useful and actionable insights into the processes leading to abandonment. In this report, we provide a brief and structured review of some of the Data Mining approaches that have been put forw...

  7. DECISION SUPPORT SYSTEM TO SUPPORT DECISION PROCESSES WITH DATA MINING

    OpenAIRE

    Rupnik, Rok; Kukar, Matjaž

    2007-01-01

    Traditional techniques of data analysis cannot solve all kinds of problems, and for that reason they have become insufficient. This caused a new interdisciplinary field of data mining to arise, encompassing both classical statistical and modern machine learning techniques to support data analysis and knowledge discovery from data. Data mining methods are powerful in dealing with large quantities of data, but on the other hand they are difficult to master by business users t...

  8. Data mining for isotope discrimination in atom probe tomography

    Energy Technology Data Exchange (ETDEWEB)

    Broderick, Scott R. [Department of Materials Science and Engineering and Institute for Combinatorial Discovery, Iowa State University, Ames, IA 50011-2230 (United States); Bryden, Aaron [Ames National Laboratory, Ames, IA 50011-2230 (United States); Suram, Santosh K. [Department of Materials Science and Engineering and Institute for Combinatorial Discovery, Iowa State University, Ames, IA 50011-2230 (United States); Rajan, Krishna, E-mail: krajan@iastate.edu [Department of Materials Science and Engineering and Institute for Combinatorial Discovery, Iowa State University, Ames, IA 50011-2230 (United States)

    2013-09-15

    Ions with similar times of flight (TOF) can be discriminated by mapping their kinetic energy. While current generation position-sensitive detectors have been considered insufficient for capturing the isotope kinetic energy, we demonstrate in this paper that statistical learning methodologies can be used to capture the kinetic energy from all of the parameters currently measured by mathematically transforming the signal. This approach works because the kinetic energy is sufficiently described by the descriptors on the potential, the material, and the evaporation process within atom probe tomography (APT). We discriminate the isotopes for Mg and Al by capturing the kinetic energy, and then decompose the TOF spectrum into its isotope components and identify the isotope for each individual atom measured. This work demonstrates the value of advanced data mining methods to help enhance the information resolution of the atom probe. - Highlights: ► Atom probe tomography and statistical learning were combined for data enhancement. ► Multiple eigenvalue decompositions decomposed a spectrum with overlapping peaks. ► The isotope of each atom was determined by kinetic energy discrimination. ► Eigenspectra were identified and new chemical information was identified.

  9. Advances in Machine Learning and Data Mining for Astronomy

    Science.gov (United States)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  10. Data mining applications in the context of casemix.

    Science.gov (United States)

    Koh, H C; Leong, S K

    2001-07-01

    In October 1999, the Singapore Government introduced casemix-based funding to public hospitals. The casemix approach to health care funding is expected to yield significant benefits, including equity and rationality in financing health care, the use of comparative casemix data for quality improvement activities, and the provision of information that enables hospitals to understand their cost behaviour and reinforces the drive for more cost-efficient services. However, there is some concern about the "quicker and sicker" syndrome (that is, the rapid discharge of patients with little regard for the quality of outcome). As it is likely that consequences of premature discharges will be reflected in the readmission data, an analysis of possible systematic patterns in readmission data can provide useful insight into the "quicker and sicker" syndrome. This paper explores potential data mining applications in the context of casemix by using readmission data as an illustration. In particular, it illustrates how data mining can be used to better understand readmission data and to detect systematic patterns, if any. From a technical perspective, data mining (which is capable of analysing complex non-linear and interaction relationships) supplements and complements traditional statistical methods in data analysis. From an applications perspective, data mining provides the technology and methodology to analyse mass volume of data to detect hidden patterns in data. Using readmission data as an illustrative data mining application, this paper explores potential data mining applications in the general casemix context.

  11. KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining

    Directory of Open Access Journals (Sweden)

    Isaac Triguero

    2017-01-01

    This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, design of multiple kinds of experiments, statistical analyses, etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms' results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL to deal with more modern data mining problems.

  12. Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining

    OpenAIRE

    Chen, D

    2012-01-01

    Many small online retailers and new entrants to the online retail sector are keen to practice data mining and consumer-centric marketing in their businesses yet technically lack the necessary knowledge and expertise to do so. In this article a case study of using data mining techniques in customer-centric business intelligence for an online retailer is presented. The main purpose of this analysis is to help the business better understand its customers and therefore conduct customer-centric ma...
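
    The RFM model named in the title scores each customer by Recency, Frequency and Monetary value. The following is a minimal sketch of how such a table might be computed before segmentation; the transaction tuples, customer names and the reference date are hypothetical and are not taken from the case study.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer, purchase_date, amount) tuples.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Recency is measured from the most recent purchase.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total amount spent
    return {
        c: ((today - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Hypothetical data, for illustration only.
transactions = [
    ("alice", date(2011, 11, 1), 40.0),
    ("alice", date(2011, 12, 1), 60.0),
    ("bob", date(2011, 6, 15), 15.0),
]
table = rfm_table(transactions, today=date(2011, 12, 9))
```

    The three values per customer could then be passed to a clustering method to form segments; which method the article uses is not stated in the truncated abstract.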

  13. Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes

    OpenAIRE

    Anjewierden , Anjo; Kolloffel , Bas; Hulshof , Casper

    2007-01-01

    In this paper we investigate the application of data mining methods to provide learners with real-time adaptive feedback on the nature and patterns of their on-line communication while learning collaboratively. We derived two models for classifying chat messages using data mining techniques and tested these on an actual data set [16]. The reliability of the classification of chat messages is established by comparing the models' performance to that of humans. Results indicate that the classifica...

  14. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    OpenAIRE

    Dipnall, Joanna F.; Pasco, Julie A.; Berk, Michael; Williams, Lana J.; Dodd, Seetal; Jacka, Felice N.; Meyer, Denny

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted reg...

  15. Event metadata records as a testbed for scalable data mining

    International Nuclear Information System (INIS)

    Gemmeren, P van; Malon, D

    2010-01-01

    At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.

  16. Data mining in healthcare: decision making and precision

    Directory of Open Access Journals (Sweden)

    Ionuţ ŢĂRANU

    2016-05-01

    The application of data mining in healthcare is increasing because the health sector is rich in information, and data mining has become a necessity. Healthcare organizations generate and collect large volumes of information on a daily basis. Information technology enables the automation of data mining and knowledge discovery, which helps reveal interesting patterns; this means eliminating manual tasks, extracting data easily and directly from electronic records, and using electronic transfer systems that secure medical records, save lives and reduce the cost of medical services, as well as enabling early detection of infectious diseases on the basis of advanced data collection. Data mining can enable healthcare organizations to anticipate trends in a patient's medical condition and behaviour by analysing different perspectives and by making connections between seemingly unrelated information. The raw data from healthcare organizations are voluminous and heterogeneous. They need to be collected and stored in an organized form, and their integration allows the formation of a unified medical information system. Data mining in health offers unlimited possibilities for analyzing data patterns that are less visible or hidden to common analysis techniques. These patterns can be used by healthcare practitioners to make forecasts, make diagnoses, and set treatments for patients in healthcare organizations.

  17. Data Mining in Distributed Database of the First Egyptian Thermal Research Reactor (ETRR-1)

    International Nuclear Information System (INIS)

    Abo Elez, R.H.; Ayad, N.M.A.; Ghuname, A.A.A.

    2006-01-01

    Applications of distributed database (DDB) technology are growing to cover many fields and domains, at different levels. The aim of this paper is to shed some light on applying distributed database technology to the ETRR-1 operation data logged by the data acquisition system (DACQUS), from which useful knowledge can be extracted. Data mining with scientific methods and specialized tools is used to support the extraction of useful knowledge from rapidly growing volumes of data. Data mining methods take many shapes and forms. Predictive methods furnish models capable of anticipating the future behavior of quantitative or qualitative database variables. When the relationship between the dependent and independent variables is nearly linear, linear regression is the appropriate data mining strategy. Multiple linear regression models have therefore been applied to a set of data samples of the ETRR-1 operation data, using the least squares method. The results show an accurate analysis of the multiple linear regression models as applied to the ETRR-1 operation data.
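
    As an illustration of the least squares fitting of a multiple linear regression model mentioned above, here is a minimal sketch that solves the normal equations in plain Python. The variable names and the synthetic readings are assumptions made for the example; they are not ETRR-1 data, and this is not the authors' implementation.

```python
def fit_multiple_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, b1, b2, ...].
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefficients, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic, noise-free readings (hypothetical): two predictors, one response
# generated from y = 3 + 2*x1 - 1*x2 so the fit can be checked by eye.
readings = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
responses = [3.0 + 2.0 * x1 - 1.0 * x2 for x1, x2 in readings]
coefficients = fit_multiple_linear_regression(readings, responses)
```

    For larger or ill-conditioned data sets a QR- or SVD-based solver is generally preferable to the normal equations; the direct form is used here only because it mirrors the textbook definition of least squares.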

  18. Review of Recent Development of Dynamic Wind Farm Equivalent Models Based on Big Data Mining

    Science.gov (United States)

    Wang, Chenggen; Zhou, Qian; Han, Mingzhe; Lv, Zhan’ao; Hou, Xiao; Zhao, Haoran; Bu, Jing

    2018-04-01

    Recently, the big data mining method has been applied in dynamic wind farm equivalent modeling. In this paper, recent developments, together with current research both domestic and overseas, are reviewed. Firstly, studies of wind speed prediction, equivalence and its distribution in the wind farm are summarized. Secondly, two typical approaches used in the big data mining method are introduced. For single wind turbine equivalent modeling, the focus is on how to choose and identify equivalent parameters. For multiple wind turbine equivalent modeling, the focus is on three aspects: aggregation of different wind turbine clusters, the parameters within the same cluster, and equivalence of the collector system. Thirdly, an outlook on the future development of dynamic wind farm equivalent models is given.

  19. Rule-based statistical data mining agents for an e-commerce application

    Science.gov (United States)

    Qin, Yi; Zhang, Yan-Qing; King, K. N.; Sunderraman, Rajshekhar

    2003-03-01

    Intelligent data mining techniques have useful e-Business applications. Because an e-Commerce application is related to multiple domains such as statistical analysis, market competition, price comparison, profit improvement and personal preferences, this paper presents a hybrid knowledge-based e-Commerce system fusing intelligent techniques, statistical data mining, and personal information to enhance QoS (Quality of Service) of e-Commerce. A Web-based e-Commerce application software system, eDVD Web Shopping Center, is successfully implemented using Java servlets and an Oracle8i database server. Simulation results have shown that the hybrid intelligent e-Commerce system is able to make smart decisions for different customers.

  20. Comparative analysis of data mining techniques for business data

    Science.gov (United States)

    Jamil, Jastini Mohd; Shaharanee, Izwan Nizal Mohd

    2014-12-01

    Data mining is the process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data contained within a database. Companies are using this tool to further understand their customers, to design targeted sales and marketing campaigns, to predict what product customers will buy and the frequency of purchase, and to spot trends in customer preferences that can lead to new product development. In this paper, we take a systematic approach to exploring several data mining techniques in business applications. The experimental results reveal that all data mining techniques accomplish their goals perfectly, but each technique has its own characteristics and specifications that demonstrate its accuracy, proficiency and preference.

  1. Prediction of thermodynamic properties of refrigerants using data mining

    International Nuclear Information System (INIS)

    Kuecueksille, Ecir Ugur; Selbas, Resat; Sencan, Arzu

    2011-01-01

    The analysis of vapor compression refrigeration systems requires the availability of simple and efficient mathematical formulations for the determination of thermodynamic properties of refrigerants. The aim of this study is to determine thermodynamic properties such as enthalpy, entropy and specific volume of alternative refrigerants using a data mining method. The alternative refrigerants used in the study are R134a, R404a, R407c and R410a. The results obtained from data mining have been compared to actual data from the literature. The study shows that the data mining methodology is successfully applicable to determine enthalpy, entropy and specific volume values for any temperature and pressure of refrigerants. Therefore, computation time is reduced and simulation of vapor compression refrigeration systems is considerably facilitated.

  2. The First International Conference on Soft Computing and Data Mining

    CERN Document Server

    Ghazali, Rozaida; Deris, Mustafa

    2014-01-01

    This book constitutes the refereed proceedings of the First International Conference on Soft Computing and Data Mining, SCDM 2014, held at Universiti Tun Hussein Onn Malaysia on June 16th-18th, 2014. The 65 revised full papers presented in this book were carefully reviewed and selected from 145 submissions, and organized into two main topical sections: Data Mining and Soft Computing. The goal of this book is to provide both theoretical concepts and, especially, practical techniques in these exciting fields of soft computing and data mining, ready to be applied in real-world applications. The exchanges of views pertaining to future research directions to be taken in this field and the resultant dissemination of the latest research findings make this work of immense value to all those having an interest in the topics covered.

  3. Data Mining and Data Fusion for Enhanced Decision Support

    Energy Technology Data Exchange (ETDEWEB)

    Khan, Shiraj [ORNL; Ganguly, Auroop R [ORNL; Gupta, Amar [University of Arizona

    2008-01-01

    The process of Data Mining converts information to knowledge by utilizing tools from the disciplines of computational statistics, database technologies, machine learning, signal processing, nonlinear dynamics, process modeling, simulation, and allied disciplines. Data Mining allows business problems to be analyzed from diverse perspectives, including dimensionality reduction, correlation and co-occurrence, clustering and classification, regression and forecasting, anomaly detection, and change analysis. The predictive insights generated from Data Mining can be further utilized through real-time analysis and decision sciences, as well as through human-driven analysis based on management by exceptions or by objectives, to generate actionable knowledge. The tools that enable the transformation of raw data to actionable predictive insights are collectively referred to as Decision Support tools. This chapter presents a new formalization of the decision process, leading to a new Decision Superiority model, partially motivated by the Joint Directors of Laboratories (JDL) Data Fusion Model. In addition, it examines the growing importance of Data Fusion concepts.

  4. Data mining-aided materials discovery and optimization

    Directory of Open Access Journals (Sweden)

    Wencong Lu

    2017-09-01

    Recent developments in data mining-aided materials discovery and optimization are reviewed in this paper, and an introduction to the materials data mining (MDM) process is provided using case studies. Both qualitative and quantitative methods in machine learning can be adopted in the MDM process to accomplish different tasks in materials discovery, design, and optimization. State-of-the-art techniques in data mining-aided materials discovery and optimization are demonstrated by reviewing the controllable synthesis of dendritic Co3O4 superstructures, materials design of layered double hydroxide, battery materials discovery, and thermoelectric materials design. The results of the case studies indicate that MDM is a powerful approach for use in materials discovery and innovation, and will play an important role in the development of the Materials Genome Initiative and Materials Informatics.

  5. Towards Cooperative Predictive Data Mining in Competitive Environments

    Science.gov (United States)

    Lisý, Viliam; Jakob, Michal; Benda, Petr; Urban, Štěpán; Pěchouček, Michal

    We study the problem of predictive data mining in a competitive multi-agent setting, in which each agent is assumed to have some partial knowledge required for correctly classifying a set of unlabelled examples. The agents are self-interested and therefore need to reason about the trade-offs between increasing their classification accuracy by collaborating with other agents and disclosing their private classification knowledge to other agents through such collaboration. We analyze the problem and propose a set of components which can enable cooperation in this otherwise competitive task. These components include measures for quantifying private knowledge disclosure, data-mining models suitable for multi-agent predictive data mining, and a set of strategies by which agents can improve their classification accuracy through collaboration. The overall framework and its individual components are validated on a synthetic experimental domain.

  6. Data mining in soft computing framework: a survey.

    Science.gov (United States)

    Mitra, S; Pal, S K; Mitra, P

    2002-01-01

    The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

  7. Randomized algorithms in automatic control and data mining

    CERN Document Server

    Granichin, Oleg; Toledano-Kitai, Dvora

    2015-01-01

    In the fields of data mining and control, the huge amount of unstructured data and the presence of uncertainty in system descriptions have always been critical issues. The book Randomized Algorithms in Automatic Control and Data Mining introduces the readers to the fundamentals of randomized algorithm applications in data mining (especially clustering) and in automatic control synthesis. The methods proposed in this book guarantee that the computational complexity of classical algorithms and the conservativeness of standard robust control techniques will be reduced. It is shown that when a problem requires "brute force" in selecting among options, algorithms based on random selection of alternatives offer good results with certain probability for a restricted time and significantly reduce the volume of operations.
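
    The trade-off described above, random selection giving a good result with a certain probability at far lower cost than exhaustive search, can be sketched with a standard sampling argument: if k options are drawn uniformly at random with replacement, the chance that at least one lies in the best fraction e of all options is 1 - (1 - e)^k. The function names below are hypothetical and the sketch is not taken from the book.

```python
import math
import random


def samples_needed(top_fraction, confidence):
    """Smallest k such that 1 - (1 - top_fraction)**k >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


def random_search(options, score, n_samples, rng):
    """Score only n_samples randomly drawn options and return the best one."""
    picks = [rng.choice(options) for _ in range(n_samples)]
    return max(picks, key=score)


# Example: one million candidate settings, but only k of them are evaluated.
options = range(1_000_000)
k = samples_needed(top_fraction=0.05, confidence=0.99)
best = random_search(options, score=lambda x: -abs(x - 400_000),
                     n_samples=k, rng=random.Random(0))
```

    The required sample count depends only on the target fraction and confidence, not on the number of options, which is where the reduction in the volume of operations comes from.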

  8. Teaching Financial Data Mining using Stocks and Futures Contracts

    Directory of Open Access Journals (Sweden)

    Gary Boetticher

    2005-06-01

    Building financial data mining models is considered to be "the hardest way to make easy money." Data miners are certainly motivated by the prospect of discovering a financial "Holy Grail." However, designing and implementing a successful model poses many intellectual challenges. These include securing and cleaning data; acquiring a sufficient amount of financial domain knowledge; bounding the complexity of the problem; and properly validating results. Teaching financial data mining is especially difficult due to the student's limited financial domain knowledge and the relatively short period (one semester) for building financial models. This paper describes an application of a financial data mining term project based on Stock and E-Mini futures contracts and discusses "lessons learned" from assigning similar term projects over six different semesters. Results of each case study are presented and discussed.

  9. Clinical diabetes research using data mining: a Canadian perspective.

    Science.gov (United States)

    Shah, Baiju R; Lipscombe, Lorraine L

    2015-06-01

    With the advent of the digitization of large amounts of information and the computer power capable of analyzing this volume of information, data mining is increasingly being applied to medical research. Datasets created for administration of the healthcare system provide a wealth of information from different healthcare sectors, and Canadian provinces' single-payer universal healthcare systems mean that data are more comprehensive and complete in this country than in many other jurisdictions. The increasing ability to also link clinical information, such as electronic medical records, laboratory test results and disease registries, has broadened the types of data available for analysis. Data-mining methods have been used in many different areas of diabetes clinical research, including classic epidemiology, effectiveness research, population health and health services research. Although methodologic challenges and privacy concerns remain important barriers to using these techniques, data mining remains a powerful tool for clinical research. Copyright © 2015 Canadian Diabetes Association. Published by Elsevier Inc. All rights reserved.

  10. Combining complex networks and data mining: Why and how

    Science.gov (United States)

    Zanin, M.; Papo, D.; Sousa, P. A.; Menasalvas, E.; Nicchi, A.; Kubik, E.; Boccaletti, S.

    2016-05-01

    The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.

  11. Possibility of Integrated Data Mining of Clinical Data

    Directory of Open Access Journals (Sweden)

    Akinori Abe

    2007-03-01

    In this paper, we introduce integrated data mining. Because of recent rapid progress in medical science as well as clinical diagnosis and treatment, integrated and cooperative research among medical researchers, biology, engineering, cultural science, and sociology is required. Therefore, we propose a framework called Cyber Integrated Medical Infrastructure (CIMI). Within this framework, we can deal with various types of data and consequently need to integrate those data prior to analysis. In this study, for medical science, we analyze the features and relationships among various types of data and show the possibility of integrated data mining.

  12. Data mining approach to model the diagnostic service management.

    Science.gov (United States)

    Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su

    2006-01-01

    Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful to improve the performance of diagnostic service programs. Under these circumstances, this study developed a model for diagnostic service management that takes into account the characteristics of subjects using a data mining approach. This study could be further used to develop an automated CRM system that contributes to increasing the rate at which diagnostic services are received.

  13. International Conference on Computational Intelligence in Data Mining

    CERN Document Server

    Mohapatra, Durga

    2017-01-01

    The book presents high-quality papers presented at the International Conference on Computational Intelligence in Data Mining (ICCIDM 2016), organized by the School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India, during December 10 – 11, 2016. The book disseminates knowledge about innovative, active research directions in the field of data mining, machine and computational intelligence, along with current issues and applications of related topics. The volume aims to explicate and address the difficulties and challenges of seamless integration of the two core disciplines of computer science.

  14. Advances in machine learning and data mining for astronomy

    CERN Document Server

    Way, Michael J

    2012-01-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health

  15. Advanced Data Mining of Leukemia Cells Micro-Arrays

    OpenAIRE

    Richard S. Segall; Ryan M. Pierce

    2009-01-01

    This paper provides a continuation and extension of previous research by Segall and Pierce (2009a) that discussed data mining for micro-array databases of Leukemia cells, primarily with self-organized maps (SOM). As in Segall and Pierce (2009a) and Segall and Pierce (2009b), the results of applying data mining are shown and discussed for the data categories of microarray databases of HL60, Jurkat, NB4 and U937 Leukemia cells that are also described in this article. First, a background section is pro...

  16. 4th International conference on Knowledge Discovery and Data Mining

    CERN Document Server

    Knowledge Discovery and Data Mining

    2012-01-01

    The volume includes a set of selected papers extended and revised from the 4th International conference on Knowledge Discovery and Data Mining, March 1-2, 2011, Macau, China. This volume provides a forum for researchers, educators, engineers, and government officials involved in the general areas of knowledge discovery and data mining and learning to disseminate their latest research results and exchange views on the future research directions of these fields. 108 high-quality papers are included in the volume.

  17. Data mining with SPSS modeler theory, exercises and solutions

    CERN Document Server

    Wendler, Tilo

    2016-01-01

    Introducing the IBM SPSS Modeler, this book guides readers through data mining processes and presents relevant statistical methods. There is a special focus on step-by-step tutorials and well-documented examples that help demystify complex mathematical algorithms and computer programs. The variety of exercises and solutions as well as an accompanying website with data sets and SPSS Modeler streams are particularly valuable. While intended for students, the simplicity of the Modeler makes the book useful for anyone wishing to learn about basic and more advanced data mining, and put this knowledge into practice.

  18. Spatio-Temporal Data Mining for Location-Based Services

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo

    Location-Based Services (LBS) are continuously gaining popularity. Innovative LBSes integrate knowledge about the users into the service. Such knowledge can be derived by analyzing the location data of users. Such data contain two unique dimensions, space and time, which need to be analyzed... The objectives of the presented thesis are three-fold. First, to extend popular data mining methods to the spatio-temporal domain. Second, to demonstrate the usefulness of the extended methods and the derived knowledge in promising LBS examples. Finally, to eliminate privacy concerns in connection with spatio-temporal data mining by devising systems for privacy-preserving location data collection and mining.

  19. Data Mining Process Optimization in Computational Multi-agent Systems

    OpenAIRE

    Kazík, O.; Neruda, R. (Roman)

    2015-01-01

    In this paper, we present an agent-based solution of metalearning problem which focuses on optimization of data mining processes. We exploit the framework of computational multi-agent systems in which various meta-learning problems have been already studied, e.g. parameter-space search or simple method recommendation. In this paper, we examine the effect of data preprocessing for machine learning problems. We perform the set of experiments in the search-space of data mining processes which is...

  20. The Impact of the Dimensions of Transformational Leadership on the Post-acquisition Performance of the Acquired Company

    Directory of Open Access Journals (Sweden)

    Sladjana Savovic

    2017-08-01

    Mergers and acquisitions (M&A) are important mechanisms through which companies can achieve growth, gain access to new markets and diversify their activities. Although companies engage in M&As with optimism, empirical evidence shows that many M&A transactions are not successful. Therefore, research is often focused on identifying ways to improve post-acquisition performance. One of the key success factors of M&A is to provide adequate transformational leadership during the process of change, especially in the critical phase of post-acquisition integration. A transformational leader should provide incentives and support to the employees in order for them to accept changes and focus on achieving challenging goals. This paper explores the impact of the different dimensions of transformational leadership on post-acquisition performance based on the example of a company operating in the Republic of Serbia’s retail sector, which was the subject of a cross-border acquisition. In order to ensure the adequate representativeness of the sample, a questionnaire was distributed in all parts of the company throughout the Republic of Serbia. The results of this study show that all the dimensions of transformational leadership positively impact post-acquisition performance. The “individual consideration” dimension of transformational leadership has the strongest impact on post-acquisition performance, whereas the “intellectual stimulation” dimension has the weakest.

  1. High-performance secure multi-party computation for data mining applications

    DEFF Research Database (Denmark)

    Bogdanov, Dan; Niitsoo, Margus; Toft, Tomas

    2012-01-01

    Secure multi-party computation (MPC) is a technique well suited for privacy-preserving data mining. Even with the recent progress in two-party computation techniques such as fully homomorphic encryption, general MPC remains relevant as it has shown promising performance metrics in real...... operations such as multiplication and comparison. Secondly, the confidential processing of financial data requires the use of more complex primitives, including a secure division operation. This paper describes new protocols in the Sharemind model for secure multiplication, share conversion, equality, bit...

  2. Optimizing hippocampal segmentation in infants utilizing MRI post-acquisition processing.

    Science.gov (United States)

    Thompson, Deanne K; Ahmadzai, Zohra M; Wood, Stephen J; Inder, Terrie E; Warfield, Simon K; Doyle, Lex W; Egan, Gary F

    2012-04-01

    This study aims to determine the most reliable method for infant hippocampal segmentation by comparing magnetic resonance (MR) imaging post-acquisition processing techniques: contrast to noise ratio (CNR) enhancement, or reformatting to standard orientation. MR scans were performed with a 1.5 T GE scanner to obtain dual echo T2 and proton density (PD) images at term equivalent (38-42 weeks' gestational age). 15 hippocampi were manually traced four times on ten infant images by 2 independent raters on the original T2 image, as well as images processed by: a) combining T2 and PD images (T2-PD) to enhance CNR; then b) reformatting T2-PD images perpendicular to the long axis of the left hippocampus. CNRs and intraclass correlation coefficients (ICC) were calculated. T2-PD images had 17% higher CNR (15.2) than T2 images (12.6). Original T2 volumes' ICC was 0.87 for rater 1 and 0.84 for rater 2, whereas T2-PD images' ICC was 0.95 for rater 1 and 0.87 for rater 2. Reliability of hippocampal segmentation on T2-PD images was not improved by reformatting images (rater 1 ICC = 0.88, rater 2 ICC = 0.66). Post-acquisition processing can improve CNR and hence reliability of hippocampal segmentation in neonate MR scans when tissue contrast is poor. These findings may be applied to enhance boundary definition in infant segmentation for various brain structures or in any volumetric study where image contrast is sub-optimal, enabling hippocampal structure-function relationships to be explored.

  3. Advances in learning analytics and educational data mining

    NARCIS (Netherlands)

    Vahdat, Mehrnoosh; Ghio, A; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

    2015-01-01

    The growing interest in recent years towards Learning Analytics (LA) and Educational Data Mining (EDM) has enabled novel approaches and advancements in educational settings. The wide variety of research and practice in this context has enforced important possibilities and applications from

  4. Data Mining for Education Decision Support: A Review

    Directory of Open Access Journals (Sweden)

    Suhirman Suhirman

    2014-12-01

    The management of higher education institutions must be evaluated on an ongoing basis in order to improve institutional quality. This evaluation draws on a variety of data, information, and knowledge from both inside and outside the institution. Institutions plan to use the collected data more efficiently and to develop tools for collecting and directing management information in order to support managerial decision making. The collected data can be used to evaluate quality, perform analyses and diagnoses, evaluate adherence to the standards and practices of curricula and syllabi, and suggest alternatives in decision processes. Data mining methods are well suited to providing decision support in educational environments, by generating and presenting relevant information and knowledge for improving the quality of education processes. In the educational domain, this information is very useful since it can serve as a basis for investigating and enhancing current educational standards and management. This paper presents a review of data mining for academic decision support in the education field. It reviews recent data mining work in education and outlines future research in educational data mining.

  5. Student Privacy and Educational Data Mining: Perspectives from Industry

    Science.gov (United States)

    Sabourin, Jennifer; Kosturko, Lucy; FitzGerald, Clare; McQuiggan, Scott

    2015-01-01

    While the field of educational data mining (EDM) has generated many innovations for improving educational software and student learning, the mining of student data has recently come under a great deal of scrutiny. Many stakeholder groups, including public officials, media outlets, and parents, have voiced concern over the privacy of student data…

  6. A Demonstration of Regression False Positive Selection in Data Mining

    Science.gov (United States)

    Pinder, Jonathan P.

    2014-01-01

    Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
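
    The false positive effect described in this record can be illustrated with a small simulation. The sketch below is not taken from the article; it assumes a pure-noise response and pure-noise predictors, fits one simple regression per predictor (the slope t-test is equivalent to a test on the Pearson correlation), and counts how many predictors look "significant" at the 5% level. The function names, the sample sizes and the critical value 2.011 (approximately the two-sided 5% point of a t distribution with 48 degrees of freedom) are choices made for this illustration.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_false_positives(n_obs=50, n_predictors=200, t_crit=2.011, seed=0):
    """Count noise predictors whose slope looks significant by chance.

    Every predictor is independent of y, so each 'significant' result
    is a type I error. t_crit is an assumed critical value for
    n_obs - 2 = 48 degrees of freedom at the two-sided 5% level.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(x, y)
        # t statistic of the slope in a simple linear regression of y on x
        t = r * math.sqrt((n_obs - 2) / (1 - r * r))
        if abs(t) > t_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    print("spurious 'significant' predictors:", count_false_positives())
```

    With 200 unrelated predictors, roughly 5% (about 10) are expected to pass the test purely by chance, which is why selecting variables from many candidate regressions calls for some form of multiple-testing control.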

  7. Data Mining Tools Make Flights Safer, More Efficient

    Science.gov (United States)

    2014-01-01

    A small data mining team at Ames Research Center developed a set of algorithms ideal for combing through flight data to find anomalies. Dallas-based Southwest Airlines Co. signed a Space Act Agreement with Ames in 2011 to access the tools, helping the company refine its safety practices, improve its safety reviews, and increase flight efficiencies.

  8. Data mining for the identification of metabolic syndrome status.

    Science.gov (United States)

    Worachartcheewan, Apilak; Schaduangrat, Nalini; Prachayasittikul, Virapong; Nantasenamat, Chanin

    2018-01-01

    Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of diabetes mellitus (DM) type 2 and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of quantitative population-health relationship (QPHR) is also presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. The QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) for modeling and construction of predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits on the applications of data mining as a rapid identification tool for classifying MS.

  9. Data Mining: A Hybrid Methodology for Complex and Dynamic Research

    Science.gov (United States)

    Lang, Susan; Baehr, Craig

    2012-01-01

    This article provides an overview of the ways in which data and text mining have potential as research methodologies in composition studies. It introduces data mining in the context of the field of composition studies and discusses ways in which this methodology can complement and extend our existing research practices by blending the best of what…

  10. An Application of Data Mining Algorithms for Shipbuilding Cost Estimation

    NARCIS (Netherlands)

    Kaluzny, B.L.; Barbici, S.; Berg, G.; Chiomento, R.; Derpanis,D.; Jonsson, U.; Shaw, R.H.A.D.; Smit, M.C.; Ramaroson, F.

    2011-01-01

    This article presents a novel application of known data mining algorithms to the problem of estimating the cost of ship development and construction. The work is a product of North Atlantic Treaty Organization Research and Technology Organization Systems Analysis and Studies 076 Task Group “NATO

  11. Data mining to detect clinical mastitis with automatic milking

    NARCIS (Netherlands)

    Kamphuis, C.; Mollenhorst, H.; Heesterbeek, J.A.P.; Hogeveen, H.

    2010-01-01

    Our objective was to use data mining to develop and validate a detection model for clinical mastitis (CM) using sensor data collected at nine Dutch dairy herds milking automatically. Sensor data was available for almost 3.5 million quarter milkings (QM) from 1,109 cows; 348 QM with CM were observed

  12. Briefly on the GUHA Method of Data Mining

    Czech Academy of Sciences Publication Activity Database

    Hájek, Petr

    -, No. 3 (2003), pp. 112-114 ISSN 1509-4553 R&D Projects: GA MŠk OC 274.001 Grant - others:COST(XE) Action 274 TARSKI Institutional research plan: AV0Z1030915 Keywords : GUHA method * data mining * exploratory data analysis Subject RIV: BA - General Mathematics http://www.nit.eu/czasopisma/JTIT/2003/3/112.pdf

  13. Data mining, knowledge discovery and data-driven modelling

    NARCIS (Netherlands)

    Solomatine, D.P.; Velickov, S.; Bhattacharya, B.; Van der Wal, B.

    2003-01-01

    The project was aimed at exploring the possibilities of a new paradigm in modelling - data-driven modelling, often referred to as "data mining". Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration

  14. Interestingness of association rules in data mining: Issues relevant ...

    Indian Academy of Sciences (India)

    R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

    mental changes in many spheres of our daily life. ... concentrate on association rule mining since it features as one of the main data mining tech- ... years, a lot of work has been done in defining and quantifying 'interestingness'. ... a critical effect on both selection of interesting events and variation of interestingness thresh-
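
    To make the notion of rule interestingness concrete, the sketch below computes three widely used objective measures for an association rule A -> C: support, confidence and lift. It is an illustration only and is not drawn from the article; the function name `rule_measures` and the toy basket data are assumptions made for this example.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: iterable of sets of items.
    support    = P(A and C)
    confidence = P(C | A)
    lift       = confidence / P(C); values above 1 suggest a positive association.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    transactions = list(transactions)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data: five shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

if __name__ == "__main__":
    print(rule_measures(baskets, {"bread"}, {"milk"}))
```

    In this toy data the rule bread -> milk has high confidence (0.75) but a lift below 1, a reminder that a rule can look strong on confidence alone and still be uninteresting once the base rate of the consequent is taken into account.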

  15. DATA MINING. CONCEPTS AND APPLICATIONS IN BANKING SECTOR

    Directory of Open Access Journals (Sweden)

    ADRIAN IONUT PASCU

    2018-02-01

    The concept of banking refers to the multitude of services and products that commercial banks offer to clients, which include, besides transactional accounts, both passive and active products. Due to increased competitiveness in banking, the relationship between the bank and the client has become an essential factor in strategies to increase customer satisfaction. Currently the banking system is able to store impressive amounts of data collected daily, from customer data and transaction details to data on customers' transactional or risk profiles. The process through which large amounts of data are analyzed and extracted, and the information obtained using mathematical and statistical models is identified and interpreted, is known as data mining. The discovery of knowledge from data involves identifying models and patterns with which certain events or possible risks can be anticipated. This process helps banks to develop strategies in areas such as customer retention and loyalty, customer satisfaction, fraud detection and prevention, risk management, and money laundering prevention. The aim of this paper is to present the concept of data mining and the concept of knowledge discovery in databases (KDD), as well as the impact and important uses of data mining techniques in the banking sector. This paper explores and reviews various data mining techniques that are applied in the banking sector and also provides insight into how these techniques are used in different areas to make decision-making easier and more efficient.

  16. 78 FR 29055 - State Medicaid Fraud Control Units; Data Mining

    Science.gov (United States)

    2013-05-17

    ... pursue Medicaid provider fraud, we finalize proposals to permit Federal financial participation (FFP) in... general approach to data mining by MFCUs is to give each MFCU the autonomy to choose how to operate its...) to read as follows: Sec. 1007.19 Federal financial participation (FFP). * * * * * (e) * * * (2...

  17. 3D Visual Data Mining: goals and experiences

    DEFF Research Database (Denmark)

    Bøhlen, Michael Hanspeter; Bukauskas, Linas; Eriksen, Poul Svante

    2003-01-01

    , statistical analyses, perceptual and cognitive psychology, and scientific visualization. At the conceptual level we offer perceptual and cognitive insights to guide the information visualization process. We then choose cluster surfaces to exemplify the data mining process, to discuss the tasks involved...

  18. A Data Mining Approach to Modelling of Water Supply Assets

    DEFF Research Database (Denmark)

    Babovic, V.; Drecourt, J.; Keijzer, M.

    2002-01-01

    supply assets are mainly situated underground, and therefore not visible and under the influence of various highly unpredictable forces. This paper proposes the use of advanced data mining methods in order to determine the risks of pipe bursts. For example, analysis of the database of already occurred...

  19. Managing Multiuser Database Buffers Using Data Mining Techniques

    NARCIS (Netherlands)

    Feng, L.; Lu, H.J.

    2004-01-01

    In this paper, we propose a data-mining-based approach to public buffer management for a multiuser database system, where database buffers are organized into two areas – public and private. While the private buffer areas contain pages to be updated by particular users, the public

  20. Modeling issues & choices in the data mining optimization ontology

    CSIR Research Space (South Africa)

    Keet, CM

    2013-05-01

    We describe the Data Mining Optimization Ontology (DMOP), which was developed to support informed decision-making at various choice points of the knowledge discovery (KD) process. It can be used as a reference by data miners, but its primary purpose...

  1. Dengue fatality prediction using data mining | Rahim | Journal of ...

    African Journals Online (AJOL)

    The aim of this research is to study the current implementation of dengue outbreak control in Malaysia and predict dengue fever cases using data mining techniques. Real data on dengue fever and weather are collected from the Ministry of Health in its Perak Tengah district office and Perak Meteorological office respectively ...

  2. Separation in Data Mining Based on Fractal Nature of Data

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2013-01-01

    Vol. 3, No. 1 (2013), pp. 44-60 ISSN 2225-658X Institutional support: RVO:67985807 Keywords : nearest neighbor * fractal set * multifractal * IINC method * correlation dimension Subject RIV: JC - Computer Hardware ; Software http://sdiwc.net/digital-library/separation-in-data-mining-based-on-fractal-nature-of-data.html

  3. A Data Mining Application for Displaying Student Graduation Rate Information

    Directory of Open Access Journals (Sweden)

    Yuli Asriningtias

    2014-01-01

    Universities are required to gain a competitive advantage by making use of their resources, including human resources, which in this case means students. Not all students complete their studies on time, and their grade point averages (GPA) vary. The length of time students take to complete their studies and their GPA are among the factors that determine a university's level of excellence. This potential value can be uncovered using data mining techniques. Data mining is the activity of finding interesting patterns in large amounts of data; the data can be stored in a database, a data warehouse, or another information store. A data warehouse is a data store that is object-oriented, integrated, time-variant, and nonvolatile, and that supports management in the decision-making process. This research was developed by scanning the data in the database directly in order to produce the required information. The data mining application was built using the Borland Delphi 7 programming language, with a SQL Server 2000 database as the data store. The results show that students' on-time completion and graduation grades can be determined in relation to the attributes of their admission data. Keywords: data mining, data warehouse, student graduation.

  4. Recommending Learning Activities in Social Network Using Data Mining Algorithms

    Science.gov (United States)

    Mahnane, Lamia

    2017-01-01

    In this paper, we show how data mining algorithms (e.g. Apriori Algorithm (AP) and Collaborative Filtering (CF)) are useful in a New Social Network (NSN-AP-CF). "NSN-AP-CF" processes the clusters based on different learning styles. Next, it analyzes the habits and the interests of the users through mining the frequent episodes by the…

  5. Highlights of recent articles on data mining in genomics & proteomics

    Science.gov (United States)

    This editorial elaborates on investigations consisting of different “OMICS” technologies and their application to biological sciences. In addition, advantages and recent development of the proteomic, genomic and data mining technologies are discussed. This information will be useful to scientists ...

  6. Recommendation in Higher Education Using Data Mining Techniques

    Science.gov (United States)

    Vialardi, Cesar; Bravo, Javier; Shafti, Leila; Ortigosa, Alvaro

    2009-01-01

    One of the main problems faced by university students is to take the right decision in relation to their academic itinerary based on available information (for example courses, schedules, sections, classrooms and professors). In this context, this work proposes the use of a recommendation system based on data mining techniques to help students to…

  7. Data Mining in Earth System Science (DMESS 2011)

    Science.gov (United States)

    Forrest M. Hoffman; J. Walter Larson; Richard Tran Mills; Bhorn-Gustaf Brooks; Auroop R. Ganguly; William Hargrove; et al

    2011-01-01

    From field-scale measurements to global climate simulations and remote sensing, the growing body of very large and long time series Earth science data are increasingly difficult to analyze, visualize, and interpret. Data mining, information theoretic, and machine learning techniques—such as cluster analysis, singular value decomposition, block entropy, Fourier and...

  8. Feature extraction for classification in the data mining process

    NARCIS (Netherlands)

    Pechenizkiy, M.; Puuronen, S.; Tsymbal, A.

    2003-01-01

    Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of "the curse of dimensionality". Three different eigenvector-based feature extraction approaches
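
    As an illustration of what eigenvector-based feature extraction means in general, the sketch below performs a minimal principal component analysis: it builds the sample covariance matrix and finds its leading eigenvector by power iteration. This is plain PCA chosen for illustration; the record does not say which three approaches the paper compares, and the function names and toy data here are assumptions.

```python
import math


def covariance_matrix(rows):
    """Return (sample covariance matrix, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return cov, centered


def leading_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix.

    Uses power iteration from a uniform start vector; assumes the
    matrix is non-zero and the start vector is not orthogonal to the
    dominant eigenvector.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def extract_feature(rows):
    """Project each row onto the first principal component."""
    cov, centered = covariance_matrix(rows)
    v = leading_eigenvector(cov)
    return [sum(c[j] * v[j] for j in range(len(v))) for c in centered], v


if __name__ == "__main__":
    # Hypothetical data lying exactly on the line y = 2x.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    scores, direction = extract_feature(data)
    print(direction, scores)
```

    The single extracted feature replaces the two original coordinates, which is the sense in which such methods reduce dimensionality before a classifier is trained.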

  9. Model Validation and Verification of Data Mining from the ...

    African Journals Online (AJOL)

    Michael Horsfall

    In this paper, we seek to present a hybrid method for Model Validation and Verification of Data Mining from the ... This model generally states the numerical value of knowledge .... procedures found in the field of software engineering should be ...

  10. Data mining for the identification of metabolic syndrome status

    Science.gov (United States)

    Worachartcheewan, Apilak; Schaduangrat, Nalini; Prachayasittikul, Virapong; Nantasenamat, Chanin

    2018-01-01

    Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of diabetes mellitus (DM) type 2 and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of quantitative population-health relationship (QPHR) is also presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. The QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) for modeling and construction of predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits on the applications of data mining as a rapid identification tool for classifying MS. PMID:29383020

  11. A framework for query optimization to support data mining

    NARCIS (Netherlands)

    S.R. Choenni (Sunil); A.P.J.M. Siebes (Arno)

    1996-01-01

    In order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making them less valuable. In this paper, we describe an optimization

  12. A Proposed Data Fusion Architecture for Micro-Zone Analysis and Data Mining

    Energy Technology Data Exchange (ETDEWEB)

    Kevin McCarthy; Milos Manic

    2012-08-01

    Data Fusion requires the ability to combine or “fuse” data from multiple data sources. Time Series Analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, Time Series places special emphasis on periodicity and how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture.
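
    The emphasis on periodicity mentioned in this record can be sketched with a very simple seasonal forecaster. The code below is an illustration under stated assumptions, not the architecture or method of the paper: it assumes a known, fixed period, averages past values at each position in the cycle, and repeats that seasonal profile into the future. The function names and the toy series are hypothetical.

```python
def seasonal_means(series, period):
    """Average the observations that fall at each position in the cycle."""
    if period <= 0 or len(series) < period:
        raise ValueError("need at least one full period of data")
    buckets = [[] for _ in range(period)]
    for i, value in enumerate(series):
        buckets[i % period].append(value)
    return [sum(b) / len(b) for b in buckets]


def seasonal_forecast(series, period, steps):
    """Predict the next `steps` values by repeating the seasonal profile."""
    means = seasonal_means(series, period)
    start = len(series)
    return [means[(start + k) % period] for k in range(steps)]


if __name__ == "__main__":
    # Hypothetical readings with a cycle of length 3.
    history = [10, 20, 30, 12, 22, 32]
    print(seasonal_forecast(history, period=3, steps=4))
```

    A forecaster this simple ignores trend and noise structure, but it shows why time series methods need to know where each observation sits within its cycle.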

  13. Drug safety data mining with a tree-based scan statistic.

    Science.gov (United States)

    Kulldorff, Martin; Dashevsky, Inna; Avery, Taliser R; Chan, Arnold K; Davis, Robert L; Graham, David; Platt, Richard; Andrade, Susan E; Boudreau, Denise; Gunter, Margaret J; Herrinton, Lisa J; Pawloski, Pamala A; Raebel, Marsha A; Roblin, Douglas; Brown, Jeffrey S

    2013-05-01

    In post-marketing drug safety surveillance, data mining can potentially detect rare but serious adverse events. Assessing an entire collection of drug-event pairs is traditionally performed on a predefined level of granularity. It is unknown a priori whether a drug causes a very specific or a set of related adverse events, such as mitral valve disorders, all valve disorders, or different types of heart disease. This methodological paper evaluates the tree-based scan statistic data mining method to enhance drug safety surveillance. We use a three-million-member electronic health records database from the HMO Research Network. Using the tree-based scan statistic, we assess the safety of selected antifungal and diabetes drugs, simultaneously evaluating overlapping diagnosis groups at different granularity levels, adjusting for multiple testing. Expected and observed adverse event counts were adjusted for age, sex, and health plan, producing a log likelihood ratio test statistic. Out of 732 evaluated disease groupings, 24 were statistically significant, divided among 10 non-overlapping disease categories. Five of the 10 signals are known adverse effects, four are likely due to confounding by indication, while one may warrant further investigation. The tree-based scan statistic can be successfully applied as a data mining tool in drug safety surveillance using observational data. The total number of statistical signals was modest and does not imply a causal relationship. Rather, data mining results should be used to generate candidate drug-event pairs for rigorous epidemiological studies to evaluate the individual and comparative safety profiles of drugs. Copyright © 2013 John Wiley & Sons, Ltd.
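
    The log likelihood ratio mentioned in this record can be sketched for a single node of a diagnosis tree. The code below assumes the commonly used Poisson form of the scan statistic, conditioned on the total number of events, and scores only excess risk. It is an illustration, not the authors' implementation: the covariate adjustment, the construction of the tree and the Monte Carlo step used to adjust for multiple testing are all omitted, and the function names and node counts are hypothetical.

```python
import math


def poisson_llr(observed, expected, total):
    """Log likelihood ratio for excess events in one node (Poisson model).

    observed: events seen in the node
    expected: events expected in the node under the null hypothesis
    total:    all events in the data set (expected counts sum to total)
    Returns 0.0 when there is no excess, since only elevated risk is scored.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest_obs = total - observed
    rest_exp = total - expected
    outside = rest_obs * math.log(rest_obs / rest_exp) if rest_obs > 0 else 0.0
    return inside + outside


def most_unusual_node(nodes, total):
    """Return (name, llr) of the node with the largest log likelihood ratio."""
    scored = {name: poisson_llr(obs, exp, total) for name, (obs, exp) in nodes.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    # Hypothetical (observed, expected) counts for nodes at two granularity levels.
    nodes = {
        "mitral valve disorders": (8, 3.0),
        "all valve disorders": (20, 10.0),
        "heart disease": (45, 40.0),
    }
    print(most_unusual_node(nodes, total=100))
```

    In a full analysis, the maximum statistic over all nodes would be compared with its distribution in data sets simulated under the null hypothesis, which is how the method adjusts for evaluating many overlapping groupings at once.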

  14. Applying data mining techniques to improve diagnosis in neonatal jaundice

    Directory of Open Access Journals (Sweden)

    Ferreira Duarte

    2012-12-01

    Full Text Available Abstract Background Hyperbilirubinemia is emerging as an increasingly common problem in newborns due to a decreasing hospital length of stay after birth. Jaundice is the most common disease of the newborn and, although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has contributed to improve the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques. Methods This study followed the different phases of the Cross Industry Standard Process for Data Mining model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa – EPE) from February to March of 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. Also, transcutaneous bilirubin levels were measured from birth to hospital discharge with maximum time intervals of 8 hours between measurements, using a noninvasive bilirubinometer. Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with the traditional methods for prediction of hyperbilirubinemia. Results The application of different classification algorithms to the collected data allowed predicting subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life of newborns, the accuracy for the prediction of hyperbilirubinemia was 89%. The best results were obtained using the following algorithms: naive Bayes, multilayer perceptron and simple logistic. Conclusions The findings of our study sustain that new approaches, such as data mining, may support

  15. Data Mining Web Services for Science Data Repositories

    Science.gov (United States)

    Graves, S.; Ramachandran, R.; Keiser, K.; Maskey, M.; Lynnes, C.; Pham, L.

    2006-12-01

    The maturation of web services standards and technologies sets the stage for a distributed "Service-Oriented Architecture" (SOA) for NASA's next generation science data processing. This architecture will allow members of the scientific community to create and combine persistent distributed data processing services and make them available to other users over the Internet. NASA has initiated a project to create a suite of specialized data mining web services designed specifically for science data. The project leverages the Algorithm Development and Mining (ADaM) toolkit as its basis. The ADaM toolkit is a robust, mature and freely available science data mining toolkit that is being used by several research organizations and educational institutions worldwide. These mining services will give the scientific community a powerful and versatile data mining capability that can be used to create higher order products such as thematic maps from current and future NASA satellite data records with methods that are not currently available. The package of mining and related services is being developed using Web Services standards so that community-based measurement processing systems can access and interoperate with them. These standards-based services allow users different options for utilizing them, from direct remote invocation by a client application to deployment of a Business Process Execution Language (BPEL) solutions package where a complex data mining workflow is exposed to others as a single service. The ability to deploy and operate these services at a data archive allows the data mining algorithms to be run where the data are stored, a more efficient scenario than moving large amounts of data over the network. This will be demonstrated in a scenario in which a user uses a remote Web-Service-enabled clustering algorithm to create cloud masks from satellite imagery at the Goddard Earth Sciences Data and Information Services Center (GES DISC).

  16. Is Europe Falling Behind in Data Mining? Copyright’s Impact on Data Mining in Academic Research

    NARCIS (Netherlands)

    Handke, C.; Guibault, L.; Vallbé, J.J.; Schmidt, B.; Dobreva, M.

    2015-01-01

    With the diffusion of digital information technology, data mining (DM) is widely expected to increase the productivity of all kinds of research activities. Based on bibliometric data, we demonstrate that the share of DM-related research articles in all published academic papers has increased

  17. Identification of quality markers of Yuanhu Zhitong tablets based on integrative pharmacology and data mining.

    Science.gov (United States)

    Li, Ke; Li, Junfang; Su, Jin; Xiao, Xuefeng; Peng, Xiujuan; Liu, Feng; Li, Defeng; Zhang, Yi; Chong, Tao; Xu, Haiyu; Liu, Changxiao; Yang, Hongjun

    2018-03-07

    The quality evaluation of traditional Chinese medicine (TCM) formulations is needed to guarantee their safety and efficacy. In our laboratory, we established interaction rules between chemical quality control and biological activity evaluations to study Yuanhu Zhitong tablets (YZTs). Moreover, a quality marker (Q-marker) has recently been proposed as a new concept in the quality control of TCM. However, no appropriate methods are available for the identification of Q-markers from complex TCM systems. We aimed to use an integrative pharmacological (IP) approach to further identify Q-markers from YZTs through the integration of multidisciplinary knowledge. In addition, data mining was used to determine the correlation between multiple constituents of this TCM and its bioactivity to improve quality control. The IP approach was used to identify the active constituents of YZTs and elucidate the molecular mechanisms by integrating chemical and biosynthetic analyses, drug metabolism, and network pharmacology. Data mining methods, including grey relational analysis (GRA) and least squares support vector machine (LS-SVM) regression techniques, were used to establish the correlations among the constituents and efficacy, and dose efficacy in multiple dimensions. Seven constituents (tetrahydropalmatine, α-allocryptopine, protopine, corydaline, imperatorin, isoimperatorin, and byakangelicin) were identified as Q-markers of YZT using IP based on their high abundance, specific presence in the individual herbal constituents and the product, appropriate drug-like properties, and critical contribution to the bioactivity of the mixture of YZT constituents. Moreover, three Q-markers (protopine, α-allocryptopine, and corydaline) were highly correlated with the multiple bioactivities of the YZTs, as found using data mining. Finally, three constituents (tetrahydropalmatine, corydaline, and imperatorin) were chosen as minimum combinations that both distinguished the authentic
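
    Grey relational analysis, one of the data mining methods named above, can be sketched in a few lines. This is a minimal illustration of the textbook form of GRA (min-max normalisation, distinguishing coefficient 0.5), not the authors' actual pipeline; the function name, the constituent labels and all numbers are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Rank candidate series by how closely they track a reference series.

    reference:  list of measurements (e.g. a bioactivity readout per batch)
    candidates: dict mapping a name to a list of the same length
    rho:        distinguishing coefficient, conventionally 0.5
    """
    def normalise(seq):
        lo, hi = min(seq), max(seq)
        if hi == lo:
            return [0.0 for _ in seq]
        return [(x - lo) / (hi - lo) for x in seq]

    ref = normalise(reference)
    # Absolute differences between each normalised candidate and the reference.
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, normalise(seq))]
        for name, seq in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        return {name: 1.0 for name in deltas}
    # Grade = mean grey relational coefficient over all points of the series.
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }

# Hypothetical data: one efficacy readout and two constituent contents per batch.
efficacy = [1.0, 2.0, 3.0, 4.0]
contents = {
    "constituent_A": [10.0, 20.0, 30.0, 40.0],  # rises with efficacy
    "constituent_B": [4.0, 1.0, 3.0, 2.0],      # unrelated to efficacy
}
grades = grey_relational_grades(efficacy, contents)
print(sorted(grades, key=grades.get, reverse=True))
```

    A higher grade means the constituent's content follows the efficacy readout more closely across batches, which is the kind of constituent-to-bioactivity correlation the abstract reports.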

  18. Post-Acquisition Release of Glutamate and Norepinephrine in the Amygdala Is Involved in Taste-Aversion Memory Consolidation

    Science.gov (United States)

    Guzman-Ramos, Kioko; Osorio-Gomez, Daniel; Moreno-Castilla, Perla; Bermudez-Rattoni, Federico

    2012-01-01

    Amygdala activity mediates the acquisition and consolidation of emotional experiences; we have recently shown that post-acquisition reactivation of this structure is necessary for the long-term storage of conditioned taste aversion (CTA). However, the specific neurotransmitters involved in such reactivation are not known. The aim of the present…

  19. Using Advanced Data Mining And Integration In Environmental Prediction Scenarios

    Directory of Open Access Journals (Sweden)

    Habala Ondrej

    2012-01-01

    Full Text Available We present one of the meteorological and hydrological experiments performed in the FP7 project ADMIRE. It serves as an experimental platform for hydrologists, and we have also used it as a testing platform for a suite of advanced data integration and data mining (DMI) tools developed within ADMIRE. The idea of ADMIRE is to develop an advanced DMI platform accessible even to users who are not familiar with data mining techniques. To this end, we have designed a novel DMI architecture, supported by a set of software tools, managed by DMI process descriptions written in a specialized high-level DMI language called DISPEL, and controlled via several different user interfaces, each performing a different set of tasks and targeting a different user group.

  20. USING ADVANCED DATA MINING AND INTEGRATION IN ENVIRONMENTAL PREDICTION SCENARIOS

    Directory of Open Access Journals (Sweden)

    Ondrej Habala

    2012-01-01

    Full Text Available We present one of the meteorological and hydrological experiments performed in the FP7 project ADMIRE. It serves as an experimental platform for hydrologists, and we have also used it as a testing platform for a suite of advanced data integration and data mining (DMI) tools developed within ADMIRE. The idea of ADMIRE is to develop an advanced DMI platform accessible even to users who are not familiar with data mining techniques. To this end, we have designed a novel DMI architecture, supported by a set of software tools, managed by DMI process descriptions written in a specialized high-level DMI language called DISPEL, and controlled via several different user interfaces, each performing a different set of tasks and targeting a different user group.

  1. Data warehousing and data mining: A case study

    Directory of Open Access Journals (Sweden)

    Suknović Milija

    2005-01-01

    Full Text Available This paper shows the design and implementation of a data warehouse as well as the use of data mining algorithms for the purpose of knowledge discovery as the basic resource of an adequate business decision-making process. The project is realized for the needs of the Student's Service Department of the Faculty of Organizational Sciences (FOS), University of Belgrade, Serbia and Montenegro. This system represents a good base for analysis and predictions in the following time period for the purpose of quality business decision-making by top management. Thus, the first part of the paper shows the steps in the design and development of the data warehouse of the mentioned business system. The second part of the paper shows the implementation of data mining algorithms for the purpose of deducing rules, patterns and knowledge as a resource for support in the process of decision making.

  2. Educational data mining: a sample of review and study case

    Directory of Open Access Journals (Sweden)

    Alejandro Pena, Rafael Domínguez, Jose de Jesus Medel

    2009-12-01

    Full Text Available The aim of this work is to encourage research in a novel merged field: educational data mining (EDM). To that end, two subjects are outlined: the first is a review of data mining (DM) methods and EDM applications; the second is an EDM case study. As a result of the application of DM in Web-based Education Systems (WBES), stratified groups of students were found during a trial. Such groups reveal key attributes of volunteers who deserted or remained during a WBES experiment. This kind of discovered knowledge inspires the statement of correlational hypotheses to set relations between attributes and behavioral patterns of WBES users. We concluded that when EDM findings are taken into account for designing and managing WBES, the learning objectives are improved.

  3. DATA MINING IN EDUCATION: CURRENT STATE AND PERSPECTIVES OF DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    Yurii O. Kovalchuk

    2016-01-01

    Full Text Available This article considers the main tasks (classification and regression, association rules, clustering) and the basic principles of Data Mining algorithms in the context of their use for a variety of research in the field of education, which is the subject of a relatively new independent direction, Educational Data Mining. The findings about the most popular topics of research within this area, as well as the perspectives of its development, are presented. The presentation of the material is illustrated by simple examples. This article is intended for readers who are engaged in research in the field of education at various levels, especially those involved in the use of e-learning systems, but who are not very familiar with this area of data analysis.

  4. Extracting software static defect models using data mining

    Directory of Open Access Journals (Sweden)

    Ahmed H. Yousef

    2015-03-01

    Full Text Available Large software projects are subject to quality risks of having defective modules that will cause failures during the software execution. Several software repositories contain source code of large projects that are composed of many modules. These software repositories include data for the software metrics of these modules and the defective state of each module. In this paper, a data mining approach is used to show the attributes that predict the defective state of software modules. Software solution architecture is proposed to convert the extracted knowledge into data mining models that can be integrated with the current software project metrics and bugs data in order to enhance the prediction. The results show better prediction capabilities when all the algorithms are combined using weighted votes. When only one individual algorithm is used, Naïve Bayes algorithm has the best results, then the Neural Network and the Decision Trees algorithms.
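
    Combining classifiers "using weighted votes" can be illustrated with a short sketch. The abstract does not say how the weights were chosen, so the weights below (imagined as validation accuracies), the model names and the function `weighted_vote` are assumptions made for illustration only.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across models.

    predictions: dict mapping model name -> predicted label for one module
    weights:     dict mapping model name -> non-negative vote weight
    """
    totals = {}
    for model, label in predictions.items():
        totals[label] = totals.get(label, 0.0) + weights[model]
    return max(totals, key=totals.get)

# Hypothetical per-model predictions for a single software module.
predictions = {
    "naive_bayes": "defective",
    "neural_network": "clean",
    "decision_tree": "clean",
}

# Two hypothetical weightings: in the first the two weaker models jointly
# outvote the strongest one, in the second the strongest model dominates.
balanced = {"naive_bayes": 0.5, "neural_network": 0.3, "decision_tree": 0.25}
dominant = {"naive_bayes": 0.6, "neural_network": 0.2, "decision_tree": 0.15}

print(weighted_vote(predictions, balanced))
print(weighted_vote(predictions, dominant))
```

    The design point is that a weighted vote lets a stronger individual model count for more without letting it overrule the others unconditionally.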

  5. Classification of Internet banking customers using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Reza Radfar

    2014-03-01

    Full Text Available Classifying customers using data mining algorithms enables banks to retain the loyalty of old customers while attracting new ones. Using a decision tree as a data mining technique, we can optimize customer classification provided that the appropriate decision tree is selected. In this article we present an appropriate model to classify customers who use internet banking services. The model is developed based on the CRISP-DM standard, and we have used real data from Sina bank’s Internet bank. Compared to other decision trees, ours is based on both optimization and accuracy factors and recognizes new potential internet banking customers using a three-level classification: low, medium and high. This is practical, documentary-based research. Mining customer rules enables managers to make policies based on the discovered patterns in order to have a better perception of what customers really desire.

  6. High Performance Data mining by Genetic Neural Network

    Directory of Open Access Journals (Sweden)

    Dadmehr Rahbari

    2013-10-01

    Full Text Available Data mining in computer science is the process of discovering interesting and useful patterns and relationships in large volumes of data. Most methods for mining problems are based on artificial intelligence algorithms. Neural network optimization based on three basic parameters (topology, weights and the learning rate) is a powerful method. We introduce an optimal method for solving this problem. In this paper, a genetic algorithm with mutation and crossover operators changes the network structure and optimizes it. The dataset used for our work is a stroke disease dataset with twenty features, the number of which is optimized by the new hybrid algorithm. The results of this work compare very well with other similar methods. The low percentage of error shows that our method is an efficient, high-performance approach to data mining problems.

  7. Identifying Drug–Drug Interactions by Data Mining

    DEFF Research Database (Denmark)

    Hansen, Peter Wæde; Clemmensen, Line Katrine Harder; Sehested, Thomas S.G.

    2016-01-01

    Background: Knowledge about drug–drug interactions commonly arises from preclinical trials, from adverse drug reports, or based on knowledge of mechanisms of action. Our aim was to investigate whether drug–drug interactions were discoverable without prior hypotheses using data mining. We focused on warfarin–drug interactions as the prototype. Methods and Results: We analyzed altered prothrombin time (measured as international normalized ratio [INR]) after initiation of a novel prescription in previously INR-stable warfarin-treated patients with nonvalvular atrial fibrillation. Data sets were retrieved... registries. Additionally, we discovered a few potentially novel interactions. This opens up for the use of data mining to discover unknown drug–drug interactions in cardiovascular medicine.

  8. CANFAR+Skytree: A Cloud Computing and Data Mining System for Astronomy

    Science.gov (United States)

    Ball, N. M.

    2013-10-01

    To date, computing systems have allowed either sophisticated analysis of small datasets, as exemplified by most astronomy software, or simple analysis of large datasets, such as database queries. At the Canadian Astronomy Data Centre, we have combined our cloud computing system, the Canadian Advanced Network for Astronomical Research (CANFAR), with the world's most advanced machine learning software, Skytree, to create the world's first cloud computing system for data mining in astronomy. CANFAR provides a generic environment for the storage and processing of large datasets, removing the requirement for an individual or project to set up and maintain a computing system when implementing an extensive undertaking such as a survey pipeline. 500 processor cores and several hundred terabytes of persistent storage are currently available to users, and both the storage and processing infrastructure are expandable. The storage is implemented via the International Virtual Observatory Alliance's VOSpace protocol, and is available as a mounted filesystem accessible both interactively and to all processing jobs. The user interacts with CANFAR by utilizing virtual machines, which appear to them as equivalent to a desktop. Each machine is replicated as desired to perform large-scale parallel processing. Such an arrangement enables the user to immediately install and run the same astronomy code that they already utilize, in the same way as on a desktop. In addition, unlike many cloud systems, batch job scheduling is handled for the user on multiple virtual machines by the Condor job queueing system. Skytree is installed and run just as any other software on the system, and thus acts as a library of command line data mining functions that can be integrated into one's wider analysis. Thus we have created a generic environment for large-scale analysis by data mining, in the same way that CANFAR itself has done for storage and processing. Because Skytree scales to large data in

  9. Review of Data Mining Techniques for Churn Prediction in Telecom

    OpenAIRE

    Vishal Mahajan; Richa Misra; Renuka Mahajan

    2015-01-01

    The telecommunication sector generates a huge amount of data due to an increasing number of subscribers, rapidly renewable technologies, data-based applications and other value-added services. This data can be usefully mined for churn analysis and prediction. Significant research has been undertaken by researchers worldwide to understand the data mining practices that can be used for predicting customer churn. This paper provides a review of around 100 recent journal articles starting from year 2000 ...

  10. Data Mining for ISHM of Liquid Rocket Propulsion Status Update

    Science.gov (United States)

    Srivastava, Ashok; Schwabacher, Mark; Oza, Nijunj; Martin, Rodney; Watson, Richard; Matthews, Bryan

    2006-01-01

    This document consists of presentation slides that review the current status of data mining to support the work with the Integrated Systems Health Management (ISHM) for the systems associated with Liquid Rocket Propulsion. The aim of this project is to have test stand data from Rocketdyne to design algorithms that will aid in the early detection of impending failures during operation. These methods will be extended and improved for future platforms (i.e., CEV/CLV).

  11. Visual data mining for developing competitive strategies in higher education

    OpenAIRE

    Ertek, Gürdal; Ertek, Gurdal

    2009-01-01

    Information visualization is the growing field of computer science that aims at visually mining data for knowledge discovery. In this paper, a data mining framework and a novel information visualization scheme are developed and applied to the domain of higher education. The presented framework consists of three main types of visual data analysis: Discovering general insights, carrying out competitive benchmarking, and planning for High School Relationship Management (HSRM). In this paper the f...

  12. Using data mining techniques to characterize participation in observational studies.

    Science.gov (United States)

    Linden, Ariel; Yarnold, Paul R

    2016-12-01

    Data mining techniques are gaining in popularity among health researchers for an array of purposes, such as improving diagnostic accuracy, identifying high-risk patients and extracting concepts from unstructured data. In this paper, we describe how these techniques can be applied to another area in the health research domain: identifying characteristics of individuals who do and do not choose to participate in observational studies. In contrast to randomized studies where individuals have no control over their treatment assignment, participants in observational studies self-select into the treatment arm and therefore have the potential to differ in their characteristics from those who elect not to participate. These differences may explain part, or all, of the difference in the observed outcome, making it crucial to assess whether there is differential participation based on observed characteristics. As compared to traditional approaches to this assessment, data mining offers a more precise understanding of these differences. To describe and illustrate the application of data mining in this domain, we use data from a primary care-based medical home pilot programme and compare the performance of commonly used classification approaches - logistic regression, support vector machines, random forests and classification tree analysis (CTA) - in correctly classifying participants and non-participants. We find that CTA is substantially more accurate than the other models. Moreover, unlike the other models, CTA offers transparency in its computational approach, ease of interpretation via the decision rules produced and provides statistical results familiar to health researchers. Beyond their application to research, data mining techniques could help administrators to identify new candidates for participation who may most benefit from the intervention. © 2016 John Wiley & Sons, Ltd.

  13. Evolutionary Data Mining Approach to Creating Digital Logic

    Science.gov (United States)

    2010-01-01

    To deal with this problem a genetic program (GP) based data mining (DM) procedure has been invented (Smith 2005). A genetic program is an algorithm...that can operate on the variables. When a GP was used as a DM function in the past to automatically create fuzzy decision trees, the Report...rules represents an approach to determining the effect of linguistic imprecision, i.e., the inability of experts to provide crisp rules. The

  14. Survey of Insurance Fraud Detection Using Data Mining Techniques

    OpenAIRE

    Sithic, H. Lookman; Balasubramanian, T.

    2013-01-01

    With the increase in financial accounting fraud experienced in the current economic scenario, financial accounting fraud detection has become an emerging topic of great importance for academia, research and industry. Financial fraud is a deliberate act that is contrary to law, rule or policy, with intent to obtain unauthorized financial benefit, and includes intentional misstatements or omission of amounts by deceiving users of financial statements, especially investors and creditors. Data mining tec...

  15. DATA MINING IN HIGHER EDUCATION : UNIVERSITY STUDENT DROPOUT CASE STUDY

    OpenAIRE

    Ghadeer S. Abu-Oda; Alaa M. El-Halees

    2015-01-01

    In this paper, we apply different data mining approaches for the purpose of examining and predicting students’ dropouts through their university programs. For the subject of the study we select a total of 1290 records of computer science students who graduated from ALAQSA University between 2005 and 2011. The collected data included student study history and transcript for courses taught in the first two years of the computer science major, in addition to student GPA, high school average ...

  16. WEKA-G: Parallel data mining on computational grids

    Directory of Open Access Journals (Sweden)

    PIMENTA, A.

    2009-12-01

    Full Text Available Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires high computational power. To resolve this problem, this paper presents a tool (Weka-G) that runs the algorithms used in the data mining process in parallel. As the environment for doing so, we use a computational grid by adding several features within a WAN.

  17. An Overview on Data Mining of Nighttime Light Remote Sensing

    Directory of Open Access Journals (Sweden)

    LI Deren

    2015-06-01

    Full Text Available When observing the Earth from above at night, it is clear that human settlements and major economic regions emit glorious light. On cloud-free nights, some remote sensing satellites can record visible radiance sources, including city lights, fishing boat lights and fires, and these nighttime cloud-free images are remotely sensed nighttime light images. Different from daytime remote sensing, nighttime light remote sensing provides a unique perspective on human social activities, and thus it has been widely used for spatial data mining of socioeconomic domains. Historically, research on nighttime light remote sensing has mostly focused on urban land cover and urban expansion mapping using DMSP/OLS imagery, but nighttime light images are not the only remote sensing source for these tasks. Through decades of development of nighttime light products, nighttime light remote sensing applications have been extended to numerous interesting scientific study domains such as econometrics, poverty estimation, light pollution, fishery and armed conflict. Among the application cases, it is surprising to see that Gross Domestic Product (GDP) data can be corrected using nighttime light data, and it is interesting to see that the mechanisms of several diseases can be revealed by nighttime light images, while nighttime light is the only remote sensing source for these tasks. As nighttime light remote sensing has numerous applications, it is important to summarize the applications of nighttime light remote sensing and its data mining fields. This paper first introduces the major satellite platforms and sensors for observing nighttime light. Subsequently, the paper summarizes the progress of nighttime light remote sensing data mining in socioeconomic parameter estimation, urbanization monitoring, important event evaluation, environmental and health effects, fishery dynamic mapping, epidemiological research and natural gas flaring monitoring. Finally, future

  18. Using Copulas in Data Mining Based on the Observational Calculus

    Czech Academy of Sciences Publication Activity Database

    Holeňa, Martin; Bajer, L.; Ščavnický, M.

    2015-01-01

    Vol. 27, No. 10 (2015), pp. 2851-2864 ISSN 1041-4347 R&D Projects: GA ČR GA13-17187S Grant - others: SLU(CZ) SGS/21/2014 Institutional support: RVO:67985807 Keywords: data mining * observational calculus * generalized quantifiers * joint probability distribution * copulas * hierarchical Archimedean copulas Subject RIV: IN - Informatics, Computer Science Impact factor: 2.476, year: 2015

  19. Data Analysis and Data Mining: Current Issues in Biomedical Informatics

    Science.gov (United States)

    Bellazzi, Riccardo; Diomidous, Marianna; Sarkar, Indra Neil; Takabayashi, Katsuhiko; Ziegler, Andreas; McCray, Alexa T.

    2011-01-01

    Summary Background Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. Objectives To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. Methods On the occasion of the 50th year of Methods of Information in Medicine a symposium was organized, that reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. Results The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology. Conclusions Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers. PMID:22146916

  20. Data Mining Application in Customer Relationship Management for Hospital Inpatients

    OpenAIRE

    Lee, Eun Whan

    2012-01-01

    Objectives This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. Methods A recency, frequency, monetary (RFM) model has been applied toward 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and it modeled the patterns of the loyal customers' medical services us...
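
    The recency, frequency, monetary (RFM) attributes mentioned above can be derived from discharge records as in the sketch below. This is a generic illustration of RFM, assuming one record per discharge with a date and a charge; the record layout, the function name `rfm_table` and all values are hypothetical, and the clustering step that the study applies afterwards is not shown.

```python
from datetime import date

def rfm_table(visits, as_of):
    """Compute (recency_days, frequency, monetary) per patient.

    visits: iterable of (patient_id, discharge_date, charge) tuples
    as_of:  reference date from which recency is measured
    """
    table = {}
    for patient, discharged, charge in visits:
        recency, frequency, monetary = table.get(patient, (None, 0, 0.0))
        days = (as_of - discharged).days
        # Recency is the time since the most recent discharge.
        recency = days if recency is None else min(recency, days)
        table[patient] = (recency, frequency + 1, monetary + charge)
    return table

# Hypothetical discharge records, for illustration only.
visits = [
    ("p1", date(2011, 1, 10), 1000.0),
    ("p1", date(2011, 6, 1), 500.0),
    ("p2", date(2010, 3, 5), 200.0),
]
table = rfm_table(visits, as_of=date(2011, 7, 1))
for patient, (r, f, m) in sorted(table.items()):
    print(patient, r, f, m)
```

    Patients with low recency, high frequency and high monetary values are the natural candidates for a "loyal" segment; how the three attributes are scaled and clustered is a separate modelling choice.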

  1. Using Data Mining to Predict Possible Future Depression Cases

    OpenAIRE

    Daimi, Kevin; Banitaan, Shadi

    2014-01-01

    Depression is a disorder characterized by misery and gloominess felt over a period of time. Some symptoms of depression overlap with somatic illnesses implying considerable difficulty in diagnosing it. This paper contributes to its diagnosis through the application of data mining, namely classification, to predict patients who will most likely develop depression or are currently suffering from depression. Synthetic data is used for this study. To acquire the results, the popular suite of mach...

  2. 2nd International Conference on Soft Computing and Data Mining

    CERN Document Server

    Ghazali, Rozaida; Nawi, Nazri; Deris, Mustafa

    2017-01-01

    This book provides a comprehensive introduction and practical look at the concepts and techniques readers need to get the most out of their data in real-world, large-scale data mining projects. It also guides readers through the data-analytic thinking necessary for extracting useful knowledge and business value from the data. The book is based on the Soft Computing and Data Mining (SCDM-16) conference, which was held in Bandung, Indonesia on August 18th–20th 2016 to discuss the state of the art in soft computing techniques, and offer participants sufficient knowledge to tackle a wide range of complex systems. The scope of the conference is reflected in the book, which presents a balance of soft computing techniques and data mining approaches. The two constituents are introduced to the reader systematically and brought together using different combinations of applications and practices. It offers engineers, data analysts, practitioners, scientists and managers the insights into the concepts, tools and techni...

  3. Data Mining and Optimization Tools for Developing Engine Parameters Tools

    Science.gov (United States)

    Dhawan, Atam P.

    1998-01-01

    This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. Tricia Erhardt and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to the dataset, which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions led to the training of Tricia Erhardt to develop Genetic Algorithm (GA) based search programs, which were written in C++ and used to demonstrate the capability of the GA in searching for an optimal solution in noisy datasets. From the study and discussion with NASA LeRC personnel, we then prepared a proposal, which is being submitted to NASA for future work on the development of data mining algorithms for engine condition monitoring. The proposed set of algorithms uses wavelet processing to create a multi-resolution pyramid of the data for GA-based multi-resolution optimal search.
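
    The abstract above mentions genetic-algorithm search for an optimum in noisy data but gives no details. The following is only a minimal sketch of that general idea, written in Python rather than the C++ of the original programs. The objective function, its optimum at x = 3, the population size, and the mutation width are all assumptions made for illustration and do not come from the report.

```python
import random


def noisy_fitness(x, rng, noise_sd=0.1):
    """Hypothetical objective: a peak at x = 3 observed through Gaussian noise."""
    return -(x - 3.0) ** 2 + rng.gauss(0.0, noise_sd)


def genetic_search(rng, pop_size=40, generations=60, lo=0.0, hi=10.0, mutation_sd=0.2):
    """Return the mean of the final population as the estimate of the optimum."""
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # Each individual is evaluated once per generation, noise included.
        scored = [(noisy_fitness(x, rng), x) for x in population]
        next_gen = []
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of three random individuals.
            parent_a = max(rng.sample(scored, 3))[1]
            parent_b = max(rng.sample(scored, 3))[1]
            # Arithmetic crossover followed by Gaussian mutation.
            child = 0.5 * (parent_a + parent_b) + rng.gauss(0.0, mutation_sd)
            next_gen.append(min(hi, max(lo, child)))
        population = next_gen
    return sum(population) / len(population)


estimate = genetic_search(random.Random(0))
print(f"estimated optimum: {estimate:.2f}")
```

    Averaging the final population rather than returning the single best-scoring individual is a deliberate choice here, since one lucky noisy evaluation can make a poor candidate look best.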

  4. SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    H Benjamin Fredrick David

    2017-04-01

    Data mining is the process of evaluating and examining large pre-existing databases in order to generate new information that may be essential to the organization. The new information is predicted from the existing datasets. Many approaches to analysis and prediction in data mining have been developed, but very few efforts have been made in the criminology field, and very few have compared the information that these approaches produce. Police stations and similar criminal justice agencies hold many large databases of information that can be used to predict or analyze criminal movements and involvement in criminal activity in society. Criminals can also be predicted from the crime data. The main aim of this work is to survey the supervised and unsupervised learning techniques that have been applied to criminal identification. This paper presents a survey of crime analysis and crime prediction using several data mining techniques.

  5. Data mining application in customer relationship management for hospital inpatients.

    Science.gov (United States)

    Lee, Eun Whan

    2012-09-01

    This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. A recency, frequency, monetary (RFM) model has been applied to 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and the patterns of the loyal customers' medical services usage were modeled via a decision tree. Patients were divided into two groups according to the variables of the RFM model, and the group which had significantly high frequency of medical use and expenses was defined as loyal customers, a target market. As a result of the decision tree, the predictive factors of the loyal clients were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. Particularly, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. To discover a hospital's loyal patients and model their medical usage patterns, the application of data mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data mining in CRM.
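
    To make the RFM variables concrete, here is a rough sketch of how recency, frequency and monetary values could be derived from discharge records. The record layout, the sample values, and the mean-based two-group split are assumptions for illustration only. The study itself used cluster analysis for segmentation, which this sketch replaces with a simple threshold.

```python
from datetime import date


def rfm_table(visits, today):
    """Build RFM values from (patient_id, discharge_date, charge) tuples."""
    table = {}
    for pid, day, charge in visits:
        rec = table.setdefault(pid, {"recency": None, "frequency": 0, "monetary": 0.0})
        days_ago = (today - day).days
        # Recency is the number of days since the most recent discharge.
        if rec["recency"] is None or days_ago < rec["recency"]:
            rec["recency"] = days_ago
        rec["frequency"] += 1
        rec["monetary"] += charge
    return table


def split_loyal(table):
    """Stand-in for clustering: above-mean frequency and above-mean spend."""
    n = len(table)
    mean_f = sum(r["frequency"] for r in table.values()) / n
    mean_m = sum(r["monetary"] for r in table.values()) / n
    return {pid for pid, r in table.items()
            if r["frequency"] > mean_f and r["monetary"] > mean_m}


# Hypothetical records, not taken from the study.
visits = [
    ("p1", date(2011, 1, 10), 1200.0),
    ("p1", date(2011, 6, 2), 900.0),
    ("p1", date(2011, 11, 20), 1500.0),
    ("p2", date(2011, 3, 5), 400.0),
    ("p3", date(2011, 9, 14), 700.0),
]
table = rfm_table(visits, today=date(2011, 12, 31))
loyal = split_loyal(table)
print(table["p1"], loyal)
```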

  6. Developing and Implementing the Data Mining Algorithms in RAVEN

    International Nuclear Information System (INIS)

    Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian

    2015-01-01

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics code models the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. to recognize patterns in the data. This report focuses on the development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  7. Statistical and Visualization Data Mining Tools for Foundry Production

    Directory of Open Access Journals (Sweden)

    M. Perzyk

    2007-07-01

    In recent years, rapid development of a new interdisciplinary field called data mining has been observed. Its main task is extracting useful information from large amounts of previously collected data. The main possibilities and potential applications of data mining in the manufacturing industry are characterized. The main types of data mining techniques are briefly discussed, including statistical, artificial intelligence, database and visualization tools. The statistical and visualization methods are presented in more detail, showing their general possibilities and advantages as well as characteristic examples of applications in foundry production. Results of the author’s research are presented, aimed at validating selected statistical tools which can be easily and effectively used in the manufacturing industry. A performance analysis has been made of methods based on ANOVA and contingency tables, used to determine the most significant process parameters and to detect possible interactions among them. Several numerical tests have been performed using simulated data sets with assumed hidden relationships, as well as some real data, related to the strength of ductile cast iron, collected in a foundry. It is concluded that the statistical methods offer relatively easy and fairly reliable tools for extracting that type of knowledge about foundry manufacturing processes. However, further research is needed to explain some imperfections of the investigated tools and to assess their validity for more complex tasks.
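
    As an illustration of how ANOVA can rank a process parameter by significance, the sketch below computes the one-way ANOVA F statistic from scratch. The three groups of readings are invented for the example and are not the foundry data used in the paper. The grouping by a single hypothetical process setting is likewise an assumption.

```python
def one_way_anova_f(groups):
    """Return the F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical strength readings at three levels of one process parameter.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

    A larger F value means the parameter separates the groups more strongly relative to the scatter within them. Turning F into a p-value requires the F distribution, which is left out of this sketch.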

  8. Developing and Implementing the Data Mining Algorithms in RAVEN

    Energy Technology Data Exchange (ETDEWEB)

    Sen, Ramazan Sonat [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation, while the system/physics code models the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. to recognize patterns in the data. This report focuses on the development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  9. Data mining application in industrial energy audit for lighting

    Energy Technology Data Exchange (ETDEWEB)

    Maricar, N.M.; Kim, G.C.; Jamal, N. [Kolej Univ., Melaka (Malaysia). Faculty of Electrical Engineering

    2005-07-01

    A data mining application for lighting energy audits at industrial sites was presented. Data collection was based on the parameters needed for the analysis part of the audit. Data collection included the activity for which the room was used; its dimensions; light level readings in lux; the number of luminaires; the number of lamps per luminaire; lamp fixtures; and lamp wattage. The lumen method was used to calculate the recommended number of luminaires in the room. The number was then compared with the number of luminaires in the existing system. The installed load efficacy ratio (ILER) was then used to determine proper retrofit action to maximize energy usage. The difference between the calculated lux and the standard lux was used to create data subsets. A data mining algorithm was used to determine that the ILER plays an important role in calculating the efficiency of lighting systems. It was also concluded that the method can be used to minimize the time needed to analyze large amounts of lighting data. The results of case studies were also used to show that the combined data mining algorithm provided accurate assessments using existing calculated data. 7 refs., 8 tabs., 5 figs.
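
    The two calculations named in the abstract can be sketched as follows, using the commonly cited textbook forms of the lumen method and the ILER. The function names, the room values, and the target efficacy of 40 lux per W/m² are assumptions for illustration. The paper's own inputs and thresholds are not given in the abstract.

```python
import math


def luminaires_required(target_lux, area_m2, lamps_per_luminaire,
                        lumens_per_lamp, utilization_factor, maintenance_factor):
    """Lumen method: luminaires needed to reach the target illuminance."""
    lumens_needed = target_lux * area_m2
    useful_lumens_per_luminaire = (lamps_per_luminaire * lumens_per_lamp
                                   * utilization_factor * maintenance_factor)
    return math.ceil(lumens_needed / useful_lumens_per_luminaire)


def iler(measured_lux, area_m2, installed_watts, target_lux_per_w_m2):
    """Installed load efficacy ratio: actual efficacy over target efficacy."""
    # Installed load efficacy is illuminance per unit power density, in lux/(W/m^2).
    actual = measured_lux * area_m2 / installed_watts
    return actual / target_lux_per_w_m2


# Hypothetical room: 60 m^2, 300 lux target, twin-lamp fittings of 3000 lm each.
needed = luminaires_required(300, 60.0, 2, 3000, 0.5, 0.8)
ratio = iler(measured_lux=250, area_m2=60.0, installed_watts=720,
             target_lux_per_w_m2=40.0)
print(needed, round(ratio, 3))
```

    A ratio well below 1 indicates that the installation delivers less light per watt than the target, which is the kind of signal an audit would use to prioritize retrofits.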

  10. Improving clinical decision support using data mining techniques

    Science.gov (United States)

    Burn-Thornton, Kath E.; Thorpe, Simon I.

    1999-02-01

    Physicians, in their ever-demanding jobs, are looking to decision support systems for aid in clinical diagnosis. However, clinical decision support systems need to be of sufficiently high accuracy that they help, rather than hinder, the physician in his/her diagnosis. Decision support systems that determine patient state with an accuracy greater than 80 percent are generally perceived to be sufficiently accurate to fulfill the role of helping the physician. We have previously shown that data mining techniques have the potential to provide the underpinning technology for clinical decision support systems. In this paper, an extension of the work in reference 2, we describe how changes in data mining methodologies for the analysis of 12-lead ECG data improve the accuracy with which data mining algorithms determine which patients are suffering from heart disease. We show that the accuracy of patient state prediction, for all the algorithms we investigated, can be increased by up to 6 percent using the combination of appropriate test/training ratios and 5-fold cross-validation. The use of cross-validation greater than 5-fold appears to reduce the improvement in algorithm classification accuracy gained by the use of this validation method. The accuracy of 84 percent in patient state predictions, obtained using the algorithm OCI, suggests that this algorithm will be capable of providing the required accuracy for clinical decision support systems.
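
    For readers unfamiliar with the validation scheme mentioned above, the following sketch shows k-fold cross-validation with k = 5. The majority-class "classifier" and the ten labels are placeholders invented for the example. They stand in for the ECG data and the algorithms of the paper, which are not described in enough detail to reproduce.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validated_accuracy(X, y, fit, predict, k=5):
    """Average the accuracy obtained on each held-out fold."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X_train, y_train):
    """Placeholder model: remember the most common training label."""
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))                  # hypothetical feature values
y = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # hypothetical patient-state labels
accuracy = cross_validated_accuracy(X, y, fit_majority, predict_majority, k=5)
print(accuracy)
```

    In practice the samples would normally be shuffled, or stratified by class, before being split into folds. That step is omitted here to keep the example deterministic.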

  11. Data Mining Application in Customer Relationship Management for Hospital Inpatients

    Science.gov (United States)

    2012-01-01

    Objectives: This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. Methods: A recency, frequency, monetary (RFM) model has been applied to 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and the patterns of the loyal customers' medical services usage were modeled via a decision tree. Results: Patients were divided into two groups according to the variables of the RFM model, and the group which had significantly high frequency of medical use and expenses was defined as loyal customers, a target market. As a result of the decision tree, the predictive factors of the loyal clients were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. Particularly, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. Conclusions: To discover a hospital's loyal patients and model their medical usage patterns, the application of data mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data mining in CRM. PMID:23115740

  12. Knowledge Discovery and Data Mining in Iran's Climatic Researches

    Science.gov (United States)

    Karimi, Mostafa

    2013-04-01

    Advances in measurement technology and data collection have made databases ever larger, and large databases require powerful analysis tools. The iterative process of acquiring knowledge from information obtained through data processing takes various forms in all scientific fields. However, when the data volume is large, traditional methods cannot cope with many of the problems. In recent years, the use of databases in various scientific fields, especially atmospheric databases in climatology, has expanded. In addition, the growing amount of data generated by climate models makes it a challenge to analyze them and extract hidden patterns and knowledge. The approach taken to this problem in recent years uses the process of knowledge discovery and data mining techniques, drawing on concepts from machine learning, artificial intelligence and expert systems. Data mining is an analytical process for exploring massive volumes of data. The ultimate goal of data mining is access to information and, finally, knowledge. Climatology is a branch of science that uses varied and massive volumes of data. The goal of climate data mining is to obtain information from varied and massive atmospheric and non-atmospheric data. In fact, knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, a descriptive content analysis was carried out and the studies were classified by method and by issue. The results show that, in climatic research in Iran, clustering, k-means and Ward's method were applied most often and that, in terms of issues, precipitation and atmospheric circulation patterns were addressed most often. Although several studies on geography and climate issues have been done with statistical techniques such as clustering and pattern extraction, Due to the nature of statistics and data mining, but cannot say for

  13. A knowledge discovery approach to urban analysis: Beyoglu Preservation Area as a data mine

    Directory of Open Access Journals (Sweden)

    Ahu Sokmenoglu Sohtorik

    2017-11-01

    defined by Fayyad, Piatetsky-Shapiro, and Smyth (1996b). The model describes a semi-automated process of database formulation, analysis and evaluation for extracting information patterns and relationships from raw data by combining both GIS and data mining functionalities in a complementary way. The KDPM for urban analysis suggests that GIS functionalities can be used to formulate a database, and GIS and data mining can complement each other in analyzing the database and evaluating the outcomes. The model illustrates that the output of a GIS platform can become the input for a data mining platform and vice versa, resulting in an interlinked analytical process which allows for a more sophisticated analysis of urban data. To investigate the second and third research questions, firstly the KDPM for urban analysis was further developed to construct a GIS database of the Beyoğlu Preservation Area from the thematic maps. Then, three implementations were performed using this GIS database; the Beyoğlu Preservation Area Building Features Database consisting of multiple features attributed to the buildings. In Implementation (1), the KDPM for urban analysis was used to investigate a variety of patterns and relationships that can be extracted from the database using three different data mining methods. In Implementations (2) and (3), the KDPM for urban analysis was implemented to test how the knowledge discovery approach through data mining proposed in this thesis can assist in developing draft plans for the regeneration of a run-down neighbourhood in the Beyoğlu Preservation Area (Tarlabaşı). In Implementation (2), the KDPM for urban analysis is implemented in combination with an evolutionary process to apply a regeneration approach developed by the author; a computational process which generates draft plans for ground floor use, user-profile and tenure-type allocation was developed. In Implementation (3), students applied the KDPM for urban analysis during the course of an

  14. A Novel Visual Data Mining Module for the Geographical Information System gvSIG

    Directory of Open Access Journals (Sweden)

    Romel Vázquez-Rodríguez

    2013-01-01

    The exploration of large GIS models containing spatio-temporal information is a challenge. In this paper we propose the integration of scientific visualization (ScVis) techniques into geographic information systems (GIS) as an alternative for the visual analysis of data. Providing GIS with such tools improves the analysis and understanding of datasets with very low spatial density and makes it possible to find correlations between variables in time and space. In this regard, we present a new visual data mining tool for the GIS gvSIG. This tool has been implemented as a gvSIG module and contains several ScVis techniques for multiparameter data with a wide range of possibilities to explore the data interactively. The developed module is a powerful visual data mining and data visualization tool to obtain knowledge from multiple datasets in time and space. A real case study with meteorological data from Villa Clara province (Cuba) is presented, where the implemented visualization techniques were used to analyze the available datasets. Although it is tested with meteorological data, the developed module is of general application in the sense that it can be used in multiple application fields related to Earth Sciences.

  15. The Potentials of Educational Data Mining for Researching Metacognition, Motivation and Self-Regulated Learning

    Science.gov (United States)

    Winne, Philip H.; Baker, Ryan S. J. D.

    2013-01-01

    Our article introduces the "Journal of Educational Data Mining's" Special Issue on Educational Data Mining on Motivation, Metacognition, and Self-Regulated Learning. We outline general research challenges for data mining researchers who conduct investigations in these areas, the potential of EDM to advance research in this area, and…

  16. Data Mining Methods to Generate Severe Wind Gust Models

    Directory of Open Access Journals (Sweden)

    Subana Shanmuganathan

    2014-01-01

    Gaining knowledge on weather patterns, trends and the influence of their extremes on various crop production yields and quality continues to be a quest by scientists, agriculturists, and managers. Precise and timely information aids decision-making, which is widely accepted as intrinsically necessary for increased production and improved quality. Studies in this research domain, especially those related to data mining and interpretation are being carried out by the authors and their colleagues. Some of this work that relates to data definition, description, analysis, and modelling is described in this paper. This includes studies that have evaluated extreme dry/wet weather events against reported yield at different scales in general. They indicate the effects of weather extremes such as prolonged high temperatures, heavy rainfall, and severe wind gusts. Occurrences of these events are among the main weather extremes that impact on many crops worldwide. Wind gusts are difficult to anticipate due to their rapid manifestation and yet can have catastrophic effects on crops and buildings. This paper examines the use of data mining methods to reveal patterns in the weather conditions, such as time of the day, month of the year, wind direction, speed, and severity using a data set from a single location. Case study data is used to provide examples of how the methods used can elicit meaningful information and depict it in a fashion usable for management decision making. Historical weather data acquired between 2008 and 2012 has been used for this study from telemetry devices installed in a vineyard in the north of New Zealand. The results show that using data mining techniques and the local weather conditions, such as relative pressure, temperature, wind direction and speed recorded at irregular intervals, can produce new knowledge relating to wind gust patterns for vineyard management decision making.

  17. Applying Fuzzy Data Mining to Telecom Churn Management

    Science.gov (United States)

    Liao, Kuo-Hsiung; Chueh, Hao-En

    Customers tend to change telecommunications service providers in pursuit of more favorable telecommunication rates. Therefore, how to avoid customer churn is an extremely critical topic for the intensely competitive telecommunications industry. To assist telecommunications service providers in effectively reducing the rate of customer churn, this study used fuzzy data mining to determine effective marketing strategies by analyzing the responses of customers to various marketing activities. These techniques can help telecommunications service providers determine the most appropriate marketing opportunities and methods for different customer groups, to reduce effectively the rate of customer turnover.

  18. Data Mining Activities for Bone Discipline - Current Status

    Science.gov (United States)

    Sibonga, J. D.; Pietrzyk, R. A.; Johnston, S. L.; Arnaud, S. B.

    2008-01-01

    The disciplinary goals of the Human Research Program are broadly discussed. There is a critical need to identify gaps in the evidence that would substantiate a skeletal health risk during and after spaceflight missions. As a result, data mining activities will be engaged to gather reviews of medical data and flight analog data and to propose additional measures and specific analyses. Several studies are briefly reviewed which have topics that partially address these gaps in knowledge, including bone strength recovery with recovery of bone mass density, current renal stone formation knowledge, herniated discs, and a review of bed rest studies conducted at Ames Human Research Facility.

  19. Mathematical tools for data mining set theory, partial orders, combinatorics

    CERN Document Server

    Simovici, Dan A

    2014-01-01

    Data mining essentially relies on several mathematical disciplines, many of which are presented in this second edition of this book. Topics include partially ordered sets, combinatorics, general topology, metric spaces, linear spaces, graph theory. To motivate the reader a significant number of applications of these mathematical tools are included ranging from association rules, clustering algorithms, classification, data constraints, logical data analysis, etc. The book is intended as a reference for researchers and graduate students. The current edition is a significant expansion of the firs

  20. Data Mining and Machine Learning Methods for Dementia Research.

    Science.gov (United States)

    Li, Rui

    2018-01-01

    Patient data in clinical research often includes large amounts of structured information, such as neuroimaging data, neuropsychological test results, and demographic variables. Given the various sources of information, we can develop computerized methods that can be a great help to clinicians to discover hidden patterns in the data. The computerized methods often employ data mining and machine learning algorithms, lending themselves as the computer-aided diagnosis (CAD) tool that assists clinicians in making diagnostic decisions. In this chapter, we review state-of-the-art methods used in dementia research, and briefly introduce some recently proposed algorithms subsequently.

  1. Astroinformatics, data mining and the future of astronomical research

    Energy Technology Data Exchange (ETDEWEB)

    Brescia, Massimo, E-mail: longo@na.infn.it [INAF, Astronomical Obs. of Capodimonte, Via Moiariello 16, I-80131 Napoli (Italy); Longo, Giuseppe [Department of Physics, University Federico II, Via Cintia 6, 80126 Napoli (Italy); Department of Astronomy, Caltech, Pasadena (United States)

    2013-08-21

    Astronomy, like many other scientific disciplines, is facing a true data deluge which is bound to change both the praxis and the methodology of everyday research work. The emerging field of astroinformatics, while on the one hand appearing crucial to face the technological challenges, on the other is opening exciting new perspectives for astronomical discoveries through the implementation of advanced data mining procedures. The complexity of astronomical data and the variety of scientific problems, however, call for innovative algorithms and methods as well as for an extreme usage of ICT technologies.

  2. Ensemble Methods in Data Mining Improving Accuracy Through Combining Predictions

    CERN Document Server

    Seni, Giovanni

    2010-01-01

    This book is aimed at novice and advanced analytic researchers and practitioners -- especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques. The authors are industry experts in data mining and machine learning who are also adjunct professors and popular speakers. Although e

  3. Data mining practical machine learning tools and techniques

    CERN Document Server

    Witten, Ian H

    2005-01-01

    As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same

  4. Astroinformatics, data mining and the future of astronomical research

    International Nuclear Information System (INIS)

    Brescia, Massimo; Longo, Giuseppe

    2013-01-01

    Astronomy, like many other scientific disciplines, is facing a true data deluge which is bound to change both the praxis and the methodology of everyday research work. The emerging field of astroinformatics, while on the one hand appearing crucial to face the technological challenges, on the other is opening exciting new perspectives for astronomical discoveries through the implementation of advanced data mining procedures. The complexity of astronomical data and the variety of scientific problems, however, call for innovative algorithms and methods as well as for an extreme usage of ICT technologies.

  5. Aplicaciones de data mining al estudio de la biodiversidad

    OpenAIRE

    Santa María, Cristóbal; Soria, Marcelo A.

    2011-01-01

    This work proposes the joint use of data mining and simulation techniques to assess the richness and diversity of microbial communities. It starts from a sample made up of different DNA sequences, which are aligned and then grouped into clusters according to their similarity. Each of these clusters is a species, and the aim is to estimate their number and distribution in the community based on the information provided by the sample. The rarefaction technique, based on the procedure...

  6. Clustering-based approaches to SAGE data mining

    Directory of Open Access Journals (Sweden)

    Wang Haiying

    2008-07-01

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation.

  7. Visualizing data mining results with the Brede tools

    DEFF Research Database (Denmark)

    Nielsen, Finn Årup

    2009-01-01

    A few neuroinformatics databases now exist that record results from neuroimaging studies in the form of brain coordinates in stereotaxic space. The Brede Toolbox was originally developed to extract, analyze and visualize data from one of them, the BrainMap database. Since then the Brede Toolbox has expanded and now includes its own database with coordinates along with ontologies for brain regions and functions: The Brede Database. With Brede Toolbox and Database combined we set up automated workflows for extraction of data, mass meta-analytic data mining and visualizations. Most of the Web...

  8. Data Mining and Knowledge Discovery via Logic-Based Methods

    CERN Document Server

    Triantaphyllou, Evangelos

    2010-01-01

    There are many approaches to data mining and knowledge discovery (DM&KD), including neural networks, closest neighbor methods, and various statistical methods. This monograph, however, focuses on the development and use of a novel approach, based on mathematical logic, that the author and his research associates have worked on over the last 20 years. The methods presented in the book deal with key DM&KD issues in an intuitive manner and in a natural sequence. Compared to other DM&KD methods, those based on mathematical logic offer a direct and often intuitive approach for extracting easily int

  9. Clustering for data mining a data recovery approach

    CERN Document Server

    Mirkin, Boris

    2005-01-01

    Often considered more as an art than a science, the field of clustering has been dominated by learning through examples and by techniques chosen almost through trial-and-error. Even the most popular clustering methods--K-Means for partitioning the data set and Ward's method for hierarchical clustering--have lacked the theoretical attention that would establish a firm relationship between the two methods and relevant interpretation aids. Rather than the traditional set of ad hoc techniques, Clustering for Data Mining: A Data Recovery Approach presents a theory that not only closes gaps in K-Mean

  10. Utility Independent Privacy Preserving Data Mining - Horizontally Partitioned Data

    Directory of Open Access Journals (Sweden)

    E Poovammal

    2010-06-01

    Full Text Available Micro data is a valuable source of information for research. However, publishing data about individuals for research purposes, without revealing sensitive information, is an important problem. The main objective of privacy preserving data mining algorithms is to obtain accurate results/rules by analyzing the maximum possible amount of data without unintended information disclosure. Data sets for analysis may be in a centralized server or in a distributed environment. In a distributed environment, the data may be horizontally or vertically partitioned. We have developed a simple technique by which horizontally partitioned data can be used for any type of mining task without information loss. The partitioned sensitive data at 'm' different sites are transformed using a mapping table or graded grouping technique, depending on the data type. This transformed data set is given to a third party for analysis. This may not be a trusted party, but it is still allowed to perform mining operations on the data set and to release the results to all the 'm' parties. The results are interpreted among the 'm' parties involved in the data sharing. The experiments conducted on real data sets prove that our proposed simple transformation procedure preserves one hundred percent of the performance of any data mining algorithm as compared to the original data set while preserving privacy.

  11. DATA MINING APPLICATION IN CREDIT CARD FRAUD DETECTION SYSTEM

    Directory of Open Access Journals (Sweden)

    FRANCISCA NONYELUM OGWUELEKA

    2011-06-01

    Full Text Available Data mining is popularly used to combat fraud because of its effectiveness. It is a well-defined procedure that takes data as input and produces models or patterns as output. A neural network, a data mining technique, was used in this study. The design of the neural network (NN) architecture for the credit card detection system was based on an unsupervised method, which was applied to the transaction data to generate four clusters: low, high, risky and high-risk. The self-organizing map neural network (SOMNN) technique was used to solve the problem of optimally classifying each transaction into its associated group, since a prior output is unknown. The receiver-operating curve (ROC) for the credit card fraud (CCF) detection watch detected over 95% of fraud cases without causing false alarms, unlike other statistical models and the two-stage clusters. This shows that the performance of the CCF detection watch is in agreement with other detection software, but performs better.

  12. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and d) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  13. Marine data users clustering using data mining technique

    Directory of Open Access Journals (Sweden)

    Farnaz Ghiasi

    2015-09-01

    Full Text Available The objective of this research is to cluster marine data users using a data mining technique. Achieving this objective will enable marine organizations to know their data and their users' requirements. The CRISP-DM standard model was used to implement the data mining technique. The required data were extracted from the profile database of 500 marine data users of the Iranian National Institute for Oceanography and Atmospheric Sciences (INIOAS) from 1386 to 1393. The TwoStep algorithm was used for clustering. For the first time, clustering was used to discover patterns between marine data users (such as student, organization and scientist) and their data requests (data source, data type, data set, parameter and geographic area). The most important clusters are: Student with International data source, Chemistry data type, “World Ocean Database” dataset, Persian Gulf geographic area and Organization with Nitrate parameter. The results will enable senior managers of marine organizations to make correct decisions concerning their existing data and will direct them toward planning better data collection in the future. Data users will also be guided with respect to their requests. Finally, suggestions are offered to improve the performance of marine organizations.

  14. DATA MINING UNTUK KLASIFIKASI PELANGGAN DENGAN ANT COLONY OPTIMIZATION

    Directory of Open Access Journals (Sweden)

    Maulani Kapiudin

    2007-01-01

    Full Text Available In this research, a system for classifying potential customers is designed by extracting classification rules from raw data according to certain criteria. The search process uses a customer database from a bank and applies a data mining technique based on ant colony optimization. Tests that varied min_case_per_rule and the pheromone updating were carried out over a certain period of time. The result is a group of customer classes based on rules built by the ants; with the modified pheromone updating, the problem area covered becomes wider. The software prototype is coded in C++ version 6. The customer master database is created using Microsoft Access. This paper gives information about the bank's potential customers, who can be classified by the software prototype. Keywords: ant colony optimization, classification, min_case_per_rule, term, pheromone updating

  15. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  16. Unrecorded Accidents Detection on Highways Based on Temporal Data Mining

    Directory of Open Access Journals (Sweden)

    Shi An

    2014-01-01

    Full Text Available Automatic detection of traffic accidents, especially those not recorded by traffic police, is crucial to the identification of accident black spots and to traffic safety. A new method of detecting traffic accidents is proposed based on temporal data mining, which can identify accidents that are unknown to and unrecorded by traffic police. A time series model was constructed using ternary numbers to reflect the state of traffic flow based on the cell transmission model. In order to deal with the aftereffects of linear drift between time series and to reduce the computational cost, the discrete Fourier transform was applied to turn the time series from the time domain to the frequency domain. The pattern of the time series when an accident happened could be recognized using the historical crash data. Then, taking Euclidean distance as the similarity evaluation function, similarity data mining of the transformed time series was carried out. If the result was less than the given threshold, the two time series were considered similar and an accident had probably happened. A numerical example was carried out and the results verified the effectiveness of the proposed method.
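
    The matching step described in this abstract (transform each series to the frequency domain, then compare by Euclidean distance against a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to keep only the first few coefficient magnitudes, the example series and the threshold value are all assumptions made for illustration.

```python
import cmath
import math


def dft_magnitudes(series, keep=4):
    """Magnitudes of the first `keep` DFT coefficients, scaled by series length.

    Keeping only a few low-frequency coefficients is an assumption made here
    to reduce the cost of comparison; the abstract does not give the number.
    """
    n = len(series)
    mags = []
    for k in range(keep):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        mags.append(abs(coeff) / n)
    return mags


def is_similar(candidate, reference, threshold, keep=4):
    """True if the Euclidean distance between the DFT features is below threshold."""
    distance = math.dist(dft_magnitudes(candidate, keep),
                         dft_magnitudes(reference, keep))
    return distance < threshold


# Hypothetical ternary traffic-state series (0, 1, 2); not data from the paper.
accident_pattern = [0, 0, 1, 2, 2, 2, 1, 0]
shifted_pattern = accident_pattern[-2:] + accident_pattern[:-2]  # same shape, later
free_flow = [0] * 8

print(is_similar(shifted_pattern, accident_pattern, threshold=0.1))
print(is_similar(free_flow, accident_pattern, threshold=0.1))
```

    Comparing coefficient magnitudes makes the distance insensitive to a circular time shift of the pattern, which is one plausible reading of why a frequency-domain representation helps here; the paper itself may handle drift differently.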

  17. Connecting traditional sciences with the OLAP and data mining paradigms

    Science.gov (United States)

    Guergachi, Aziz A.

    2003-03-01

    The paradigms of OLAP, multidimensional modeling and data mining first emerged in the areas of market analysis and finance to address various needs of people working in these areas. Does this mean that they are useful and applicable in these areas only? Or can they also be applied in other, more traditional areas of science and engineering? What characterizes the systems for which these paradigms are suitable? What are the goals of these paradigms? How do they relate to the traditional body of knowledge that has been developed throughout the centuries in the areas of mathematics, statistics, systems science and engineering? Where, how and to what extent can we leverage the conventional wisdom that has been accumulated in the aforementioned disciplines to develop a foundational basis for the above paradigms? The goal of this paper is to address these questions at the foundational level. We argue that the paradigms of OLAP, multidimensional modeling and data mining can also be applied successfully to complex engineering systems, such as membrane-based water/wastewater treatment plants. We develop a mathematically based axiomatic definition of the concepts of 'dimension,' 'dimension level,' 'dimension hierarchy' and 'measure' using set theory and equivalence relations.
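
    One way to picture dimension levels in terms of equivalence relations is sketched below. This is an illustrative reading, not the paper's axiomatic definition: each level is modeled as a function that labels members, two members are equivalent when they share a label, and a level sits below another in a hierarchy when its partition refines the other's. The names `partition`, `refines` and the date example are assumptions.

```python
from collections import defaultdict


def partition(members, level):
    """Group members into equivalence classes: same label under `level`."""
    blocks = defaultdict(set)
    for member in members:
        blocks[level(member)].add(member)
    return {frozenset(block) for block in blocks.values()}


def refines(finer, coarser):
    """True if every block of `finer` lies inside some block of `coarser`."""
    return all(any(block <= big for big in coarser) for block in finer)


# Hypothetical time dimension: members are (year, month, day) tuples.
days = [(2003, 1, 5), (2003, 1, 20), (2003, 3, 1), (2004, 3, 1)]
by_month = partition(days, lambda d: d[:2])   # level "month"
by_year = partition(days, lambda d: d[0])     # level "year"

print(refines(by_month, by_year))  # month rolls up into year
print(refines(by_year, by_month))  # year does not roll up into month
```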

  18. Data Mining and Machine Learning Tools for Combinatorial Material Science of All-Oxide Photovoltaic Cells.

    Science.gov (United States)

    Yosipof, Abraham; Nahum, Oren E; Anderson, Assaf Y; Barad, Hannah-Noa; Zaban, Arie; Senderowitz, Hanoch

    2015-06-01

    Growth in energy demands, coupled with the need for clean energy, is likely to make solar cells an important part of future energy resources. In particular, cells entirely made of metal oxides (MOs) have the potential to provide clean and affordable energy if their power conversion efficiencies are improved. Such improvements require the development of new MOs, which could benefit from combining combinatorial material sciences for producing solar cell libraries with data mining tools to direct synthesis efforts. In this work we developed a data mining workflow and applied it to the analysis of two recently reported solar cell libraries based on Titanium and Copper oxides. Our results demonstrate that QSAR models with good prediction statistics for multiple solar cell properties could be developed and that these models highlight important factors affecting these properties in accord with experimental findings. The resulting models are therefore suitable for designing better solar cells. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Applying Data-mining techniques to study drought periods in Spain

    Science.gov (United States)

    Belda, F.; Penades, M. C.

    2010-09-01

    Data mining is a technique that can be used to interact with large databases and to help discover relations between parameters by extracting information from massive and multiple data archives. Drought affects many economic and social sectors, from agriculture to transportation, including urban water deficit and the development of modern industries. Given these problems and the geographical and temporal distribution of drought, it is difficult to find a single definition of drought. Improving the understanding of climatic indices is necessary to reduce the impacts of drought and to facilitate quick decisions regarding this problem. The main objective is to analyze drought periods from 1950 to 2009 in Spain. We use several kinds of information, with different formats, sources and transmission modes. We use a satellite-based vegetation index and a dryness index for several temporal periods. We use daily and monthly precipitation and temperature data and soil moisture data from a numerical weather model. We mainly calculate the Standardized Precipitation Index (SPI), which has been widely used in the literature. We use OLAP-mining techniques to discover association rules between remote-sensing data, numerical weather model output and climatic indices. Time series data-mining techniques organize data as a sequence of events, with each event having a time of recurrence, to cluster the data into groups of records with similar characteristics. A prior climatological classification is necessary if we want to study drought periods over all of Spain.

  20. Explaining and predicting workplace accidents using data-mining techniques

    International Nuclear Information System (INIS)

    Rivas, T.; Paz, M.; Martin, J.E.; Matias, J.M.; Garcia, J.F.; Taboada, J.

    2011-01-01

    Current research into workplace risk is mainly conducted using conventional descriptive statistics, which, however, fail to properly identify cause-effect relationships and are unable to construct models that could predict accidents. The authors of the present study modelled incidents and accidents in two companies in the mining and construction sectors in order to identify the most important causes of accidents and develop predictive models. Data-mining techniques (decision rules, Bayesian networks, support vector machines and classification trees) were used to model accident and incident data compiled from the mining and construction sectors and obtained in interviews conducted soon after an incident/accident occurred. The results were compared with those for a classical statistical technique (logistic regression), revealing the superiority of decision rules, classification trees and Bayesian networks in predicting and identifying the factors underlying accidents/incidents.

  1. Healthcare Scheduling by Data Mining: Literature Review and Future Directions

    Directory of Open Access Journals (Sweden)

    Maria M. Rinder

    2012-01-01

    Full Text Available This article presents a systematic literature review of the application of industrial engineering methods in healthcare scheduling, with a focus on the role of patient behavior in scheduling. Nine articles that used mathematical programming, data mining, genetic algorithms, and local searches for optimum schedules were obtained from an extensive search of the literature. These methods are new approaches to solving the problems in healthcare scheduling. Some are adapted from areas such as manufacturing and transportation. Key findings from these studies include reduced time for scheduling, the capability of solving more complex problems, and the incorporation of more variables and constraints simultaneously than traditional scheduling methods. However, none of these methods modeled no-show and walk-in patient behavior. Future research should include more variables related to the patient and/or environment.

  2. Data mining techniques for thermophysical properties of refrigerants

    International Nuclear Information System (INIS)

    Kuecueksille, Ecir Ugur; Selbas, Resat; Sencan, Arzu

    2009-01-01

    This study presents ten modeling techniques within the data mining process for the prediction of thermophysical properties of refrigerants (R134a, R404a, R407c and R410a). These are linear regression (LR), multilayer perceptron (MLP), pace regression (PR), simple linear regression (SLR), sequential minimal optimization (SMO), KStar, additive regression (AR), M5 model tree, decision table (DT) and M5'Rules models. Relations depending on temperature and pressure were derived for the determination of thermophysical properties such as the specific heat capacity, viscosity, heat conduction coefficient and density of the refrigerants. The model results obtained for every refrigerant were compared and the best model was identified. Results indicate that the use of formulations derived from these techniques will facilitate the design and optimization of heat exchangers, which are components of vapor compression refrigeration systems in particular.

  3. Prediksi Pendapatan Sewa Dengan Data Mining Pada Perusahaan XYZ

    Directory of Open Access Journals (Sweden)

    May Liana

    2010-12-01

    Full Text Available XYZ Company has a program for predicting leasing income that only predicts under constant conditions, in which every tenant is assumed to renew the lease. This research aims to build an accurate income prediction system that supports the company in making strategic decisions. Primary data were collected through direct interviews with the company management. The analysis uses training data from previous years to build a neural network model. The analysis shows that this model produces a total error value smaller than the total error values of previous years. Therefore, it can be concluded that data mining with the neural network technique produces a more accurate leasing income prediction, which can help the company make decisions based on the hidden information in the database.

  4. Explaining and predicting workplace accidents using data-mining techniques

    Energy Technology Data Exchange (ETDEWEB)

    Rivas, T., E-mail: trivas@uvigo.e [Dpto. Ingenieria de los Recursos Naturales y Medio Ambiente, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain); Paz, M., E-mail: mpaz.minas@gmail.co [Dpto. Ingenieria de los Recursos Naturales y Medio Ambiente, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain); Martin, J.E., E-mail: jmartin@cippinternacional.co [CIPP International, S.L. Parque Tecnologico de Asturias, Parcela 43, Oficina 11, 33428 Llanera (Spain); Matias, J.M., E-mail: jmmatias@uvigo.e [Dpto. Estadistica e Investigacion Operativa, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain); Garcia, J.F., E-mail: jgarcia@cippinternacional.co [CIPP International, S.L. Parque Tecnologico de Asturias, Parcela 43, Oficina 11, 33428 Llanera (Spain); Taboada, J., E-mail: jtaboada@uvigo.e [Dpto. Ingenieria de los Recursos Naturales y Medio Ambiente, E.T.S.I. Minas, University of Vigo, Campus Lagoas, 36310 Vigo (Spain)

    2011-07-15

    Current research into workplace risk is mainly conducted using conventional descriptive statistics, which, however, fail to properly identify cause-effect relationships and are unable to construct models that could predict accidents. The authors of the present study modelled incidents and accidents in two companies in the mining and construction sectors in order to identify the most important causes of accidents and develop predictive models. Data-mining techniques (decision rules, Bayesian networks, support vector machines and classification trees) were used to model accident and incident data compiled from the mining and construction sectors and obtained in interviews conducted soon after an incident/accident occurred. The results were compared with those for a classical statistical technique (logistic regression), revealing the superiority of decision rules, classification trees and Bayesian networks in predicting and identifying the factors underlying accidents/incidents.

  5. CrossRef text and data mining services

    Directory of Open Access Journals (Sweden)

    Rachael Lammey

    2015-02-01

    Full Text Available CrossRef is an association of scholarly publishers that develops shared infrastructure to support more effective scholarly communications. It is a registration agency for the digital object identifier (DOI), and has built additional services for CrossRef members around the DOI and the bibliographic metadata that publishers deposit in order to register DOIs for their publications. Among these services are CrossCheck, powered by iThenticate, which helps publishers screen for plagiarism in submitted manuscripts, and FundRef, which gives publishers a standard way to report funding sources for published scholarly research. To add to these services, CrossRef launched CrossRef text and data mining services in May 2014. This article explains the thinking behind CrossRef launching this new service, what it offers to publishers and researchers alike, how publishers can participate in it, and the uptake of the service so far.

  6. Base Oils Biodegradability Prediction with Data Mining Techniques

    Directory of Open Access Journals (Sweden)

    Malika Trabelsi

    2010-02-01

    Full Text Available In this paper, we apply various data mining techniques including continuous numeric and discrete classification prediction models of base oils biodegradability, with emphasis on improving prediction accuracy. The results show that highly biodegradable oils can be better predicted through numeric models. In contrast, classification models did not uncover a similar dichotomy. With the exception of Memory Based Reasoning and Decision Trees, tested classification techniques achieved high classification prediction. However, the technique of Decision Trees helped uncover the most significant predictors. A simple classification rule derived based on this predictor resulted in good classification accuracy. The application of this rule enables efficient classification of base oils into either low or high biodegradability classes with high accuracy. For the latter, a higher precision biodegradability prediction can be obtained using continuous modeling techniques.

  7. Data mining learning bootstrap through semantic thumbnail analysis

    Science.gov (United States)

    Battiato, Sebastiano; Farinella, Giovanni Maria; Giuffrida, Giovanni; Tribulato, Giuseppe

    2007-01-01

    The rapid increase of technological innovations in the mobile phone industry induces the research community to develop new and advanced systems to optimize services offered by mobile phone operators (telcos) to maximize their effectiveness and improve their business. Data mining algorithms can run over data produced by mobile phone usage (e.g. image, video, text and log files) to discover users' preferences and predict the offer most likely to be purchased by each individual customer. One of the main challenges is the reduction of the learning time and cost of these automatic tasks. In this paper we discuss an experiment where a commercial offer is composed of a small picture augmented with a short text describing the offer itself. Each customer's purchase is properly logged with all relevant information. Upon arrival of new items we need to learn who the best customers (prospects) for each item are, that is, the ones most likely to be interested in purchasing that specific item. Such learning activity is time consuming and, in our specific case, is not applicable given the large number of new items arriving every day. Basically, given the current customer base we are not able to learn on all new items. Thus, we need somehow to select among those new items to identify the best candidates. We do so by using a joint analysis of visual features and text to estimate how good each new item could be, that is, whether or not it is worth learning on it. Preliminary results show the effectiveness of the proposed approach in improving classical data mining techniques.

  8. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    Directory of Open Access Journals (Sweden)

    Joanna F Dipnall

    Full Text Available Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling
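
    For readers unfamiliar with how odds ratios such as those quoted above relate to a logistic regression fit, the standard conversion is OR = exp(beta), with a 95% Wald interval of exp(beta ± 1.96·SE). The sketch below illustrates that arithmetic only. The coefficient is back-calculated from the reported OR of 1.15, and the standard error is a hypothetical value chosen for illustration; neither is taken from the study's model output, and the study's survey-weighted, multiply imputed estimates would not be reproduced this simply.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


beta = math.log(1.15)   # coefficient implied by an odds ratio of 1.15
se = 0.0644             # hypothetical standard error, for illustration only

odds_ratio, lower, upper = odds_ratio_ci(beta, se)
print(f"OR {odds_ratio:.2f}; 95% CI {lower:.2f}, {upper:.2f}")
```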

  9. High Performance EVA Glove Collaboration: Glove Injury Data Mining Effort

    Science.gov (United States)

    Reid, C. R.; Benson, E.; England, S.; Charvat, J.; Norcross, J. R.; McFarland, S. M.; Rajulu, S.

    2015-01-01

    Human hands play a significant role during Extravehicular Activity (EVA) missions and Neutral Buoyancy Lab (NBL) training events, as they are needed for translating and performing tasks in the weightless environment. Because of this high frequency usage, hand and arm related injuries are known to occur during EVA and EVA training in the NBL. The primary objectives of this investigation were to: 1) document all known EVA glove related injuries and circumstances of these incidents, 2) determine likely risk factors, and 3) recommend interventions where possible that could be implemented in the current and future glove designs. METHODS: The investigation focused on the discomforts and injuries of U.S. crewmembers who had worn the pressurized Extravehicular Mobility Unit (EMU) spacesuit and experienced 4000 Series or Phase VI glove related incidents during 1981 to 2010 for either EVA ground training or in-orbit flight. We conducted an observational retrospective case-control investigation using 1) a literature review of known injuries, 2) data mining of crew injury, glove sizing, and hand anthropometry databases, 3) descriptive statistical analyses, and finally 4) statistical risk correlation and predictor analyses to better understand injury prevalence and potential causation. Specific predictor statistical analyses included use of principal component analyses (PCA), multiple logistic regression, and survival analyses (Cox proportional hazards regression). Results of these analyses were computed risk variables in the forms of odds ratios (likelihood of an injury occurring given the magnitude of a risk variable) and hazard ratios (likelihood of time to injury occurrence). Due to the exploratory nature of this investigation, we selected predictor variables significant at p=0.15. RESULTS: Through 2010, there have been a total of 330 NASA crewmembers, from which 96 crewmembers performed 322 EVAs during 1981-2010, resulting in 50 crewmembers being injured inflight and 44

  10. Data Mining Techniques to Estimate Plutonium, Initial Enrichment, Burnup, and Cooling Time in Spent Fuel Assemblies

    Energy Technology Data Exchange (ETDEWEB)

    Trellue, Holly Renee [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Fugate, Michael Lynn [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Tobin, Stephen Joesph [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-03-19

    The Next Generation Safeguards Initiative (NGSI), Office of Nonproliferation and Arms Control (NPAC), National Nuclear Security Administration (NNSA) of the U.S. Department of Energy (DOE) has sponsored a multi-laboratory, university, international partner collaboration to (1) detect replaced or missing pins from spent fuel assemblies (SFA) to confirm item integrity and deter diversion, (2) determine plutonium mass and related plutonium and uranium fissile mass parameters in SFAs, and (3) verify initial enrichment (IE), burnup (BU), and cooling time (CT) of facility declaration for SFAs. A wide variety of nondestructive assay (NDA) techniques were researched to achieve these goals [Veal, 2010 and Humphrey, 2012]. In addition, the project includes two related activities with facility-specific benefits: (1) determination of heat content and (2) determination of reactivity (multiplication). In this research, a subset of 11 integrated NDA techniques was researched using data mining solutions at Los Alamos National Laboratory (LANL) for their ability to achieve the above goals.

  11. A New Approach of Multi-robot Cooperative Pursuit Based on Association Rule Data Mining

    Directory of Open Access Journals (Sweden)

    Jun Li

    2010-02-01

    Full Text Available An approach to cooperative hunting of multiple mobile targets by multiple robots is presented, which divides the pursuit process into forming the pursuit teams and capturing the targets. A data set of attribute relationships is built by considering all of the factors involved in capturing evaders; interesting rules are then found by mining this data set and are used to build the pursuit teams. By working out the positions of the targets, the pursuit game can be transformed into a multi-robot path planning problem. Reinforcement learning is used to find the best path. The simulation results show that the mobile evaders can be captured effectively and efficiently, and demonstrate the feasibility and validity of the given algorithm in a dynamic environment.

  12. A New Approach of Multi-Robot Cooperative Pursuit Based on Association Rule Data Mining

    Directory of Open Access Journals (Sweden)

    Jun Li

    2009-12-01

    Full Text Available An approach to cooperative hunting of multiple mobile targets by multiple robots is presented, which divides the pursuit process into forming the pursuit teams and capturing the targets. A data set of attribute relationships is built by considering all of the factors involved in capturing evaders; interesting rules are then found by mining this data set and are used to build the pursuit teams. By working out the positions of the targets, the pursuit game can be transformed into a multi-robot path planning problem. Reinforcement learning is used to find the best path. The simulation results show that the mobile evaders can be captured effectively and efficiently, and demonstrate the feasibility and validity of the given algorithm in a dynamic environment.

  13. Automation of route identification and optimisation based on data-mining and chemical intuition.

    Science.gov (United States)

    Lapkin, A A; Heer, P K; Jacob, P-M; Hutchby, M; Cunningham, W; Bull, S D; Davidson, M G

    2017-09-21

    Data-mining of Reaxys and network analysis of the combined literature and in-house reactions set were used to generate multiple possible reaction routes to convert a bio-waste feedstock, limonene, into a pharmaceutical API, paracetamol. The network analysis of data provides a rich knowledge-base for generation of the initial reaction screening and development programme. Based on the literature and the in-house data, an overall flowsheet for the conversion of limonene to paracetamol was proposed. Each individual reaction-separation step in the sequence was simulated as a combination of the continuous flow and batch steps. The linear model generation methodology allowed us to identify the reaction steps requiring further chemical optimisation. The generated model can be used for global optimisation and generation of environmental and other performance indicators, such as cost indicators. However, the identified further challenge is to automate model generation to evolve optimal multi-step chemical routes and optimal process configurations.

  14. Effective search for stable segregation configurations at grain boundaries with data-mining techniques

    Science.gov (United States)

    Kiyohara, Shin; Mizoguchi, Teruyasu

    2018-03-01

    Grain boundary segregation of dopants plays a crucial role in materials properties. To investigate the dopant segregation behavior at the grain boundary, an enormous number of combinations have to be considered in the segregation of multiple dopants at the complex grain boundary structures. Here, two data mining techniques, the random-forests regression and the genetic algorithm, were applied to determine stable segregation sites at grain boundaries efficiently. Using the random-forests method, a predictive model was constructed from 2% of the segregation configurations and it has been shown that this model could determine the stable segregation configurations. Furthermore, the genetic algorithm also successfully determined the most stable segregation configuration with great efficiency. We demonstrate that these approaches are quite effective to investigate the dopant segregation behaviors at grain boundaries.

  15. Automatic detection of referral patients due to retinal pathologies through data mining.

    Science.gov (United States)

    Quellec, Gwenolé; Lamard, Mathieu; Erginay, Ali; Chabouis, Agnès; Massin, Pascale; Cochener, Béatrice; Cazuguel, Guy

    2016-04-01

    With the increased prevalence of retinal pathologies, automating the detection of these pathologies is becoming more and more relevant. In the past few years, many algorithms have been developed for the automated detection of a specific pathology, typically diabetic retinopathy, using eye fundus photography. No matter how good these algorithms are, we believe many clinicians would not use automatic detection tools focusing on a single pathology and ignoring any other pathology present in the patient's retinas. To solve this issue, an algorithm for characterizing the appearance of abnormal retinas, as well as the appearance of the normal ones, is presented. This algorithm does not focus on individual images: it considers examination records consisting of multiple photographs of each retina, together with contextual information about the patient. Specifically, it relies on data mining in order to learn diagnosis rules from characterizations of fundus examination records. The main novelty is that the content of examination records (images and context) is characterized at multiple levels of spatial and lexical granularity: 1) spatial flexibility is ensured by an adaptive decomposition of composite retinal images into a cascade of regions, 2) lexical granularity is ensured by an adaptive decomposition of the feature space into a cascade of visual words. This multigranular representation allows for great flexibility in automatically characterizing normality and abnormality: it is possible to generate diagnosis rules whose precision and generalization ability can be traded off depending on data availability. A variation on usual data mining algorithms, originally designed to mine static data, is proposed so that contextual and visual data at adaptive granularity levels can be mined. This framework was evaluated in e-ophtha, a dataset of 25,702 examination records from the OPHDIAT screening network, as well as in the publicly-available Messidor dataset. 
It was successfully

  16. Building a Classification Model for Enrollment In Higher Educational Courses using Data Mining Techniques

    OpenAIRE

    Saini, Priyanka

    2014-01-01

    Data mining is the process of extracting useful patterns from large databases, and many data mining techniques are used for mining these patterns. Recently, one of the remarkable facts in higher educational institutes is the rapid growth of data, and this educational data is expanding quickly without any advantage to the educational management. The main aim of the management is to refine the education standard; therefore, by applying the various data mining techniques on this data one ca...

  17. A Case Study for Student Performance Analysis based on Educational Data Mining (EDM)

    OpenAIRE

    Daxa Kundariya; Prof. Vaseem Ghada

    2016-01-01

    Educational Data Mining (EDM) is a study methodology and an application of data mining techniques to students' data from academic databases. Like other domains, the educational domain also produces a vast amount of study data. To enhance the quality of the education system, student performance analysis plays an important role in decision support. This paper presents a study of various educational data mining techniques and how they could be applied in an educational system to analyze student perfor...

  18. Data Mining in Education : A Review on the Knowledge Discovery Perspective

    OpenAIRE

    Pratiyush Guleria; Manu Sood

    2014-01-01

    Knowledge Discovery in Databases is the process of finding knowledge in massive amount of data where data mining is the core of this process. Data mining can be used to mine understandable meaningful patterns from large databases and these patterns may then be converted into knowledge. Data mining is the process of extracting the information and patterns derived by the KDD process which helps in crucial decision-making. Data mining works with data warehouse and...

  19. Some remarks on parallel data mining using a persistent object manager

    International Nuclear Information System (INIS)

    Araujo, Neil; Grossman, Robert; Hanley, David

    1996-01-01

    Our underlying assumption is that high performance data management will be as important as high performance computing by the beginning of the next millennium. Given this, data mining will take on increasing importance. In this paper, we discuss our experience with parallel data mining on an IBM SP-2, focusing on four issues which we feel are emerging as critical for data mining applications in general. (author)

  20. Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

    OpenAIRE

    Lahiru Iddamalgoda; Partha Sarathi Das; Achala Aponso; Vijayaraghava Seshadri Sundararajan; Prashanth Suravajhala; Jayaraman K Valadi

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern r...

  1. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

    OpenAIRE

    Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited ...

  2. Data Mining Foundations and Intelligent Paradigms Volume 2 Statistical, Bayesian, Time Series and other Theoretical Aspects

    CERN Document Server

    Jain, Lakhmi

    2012-01-01

    Data mining is one of the most rapidly growing research areas in computer science and statistics. In Volume 2 of this three volume series, we have brought together contributions from some of the most prestigious researchers in theoretical data mining. Each of the chapters is self contained. Statisticians and applied scientists/ engineers will find this volume valuable. Additionally, it provides a sourcebook for graduate students interested in the current direction of research in data mining.

  3. Data mining concepts, methods and applications in management and engineering design

    CERN Document Server

    Yin, Yong; Tang, Jiafu; Zhu, JianMing

    2011-01-01

    Data Mining introduces in clear and simple ways how to use existing data mining methods to obtain effective solutions for a variety of management and engineering design problems. Data Mining is organised into two parts: the first provides a focused introduction to data mining and the second goes into greater depth on subjects such as customer analysis. It covers almost all managerial activities of a company, including supply chain design, product development, manufacturing system design, product quality control, and preservation of privacy. Incorporating recent developments of data

  4. Advanced Query and Data Mining Capabilities for MaROS

    Science.gov (United States)

    Wang, Paul; Wallick, Michael N.; Allard, Daniel A.; Gladden, Roy E.; Hy, Franklin H.

    2013-01-01

    The Mars Relay Operational Service (MaROS) comprises a number of tools to coordinate, plan, and visualize various aspects of the Mars Relay network. These levels include a Web-based user interface, a back-end "ReSTlet" built in Java, and databases that store the data as it is received from the network. As part of MaROS, the innovators have developed and implemented a feature set that operates on several levels of the software architecture. This new feature is an advanced querying capability through either the Web-based user interface, or through a back-end REST interface to access all of the data gathered from the network. This software is not meant to replace the REST interface, but to augment and expand the range of available data. The current REST interface provides specific data that is used by the MaROS Web application to display and visualize the information; however, the returned information from the REST interface has typically been pre-processed to return only a subset of the entire information within the repository, particularly only the information that is of interest to the GUI (graphical user interface). The new, advanced query and data mining capabilities allow users to retrieve the raw data and/or to perform their own data processing. The query language used to access the repository is a restricted subset of the structured query language (SQL) that can be built safely from the Web user interface, or entered as freeform SQL by a user. The results are returned in a CSV (Comma Separated Values) format for easy exporting to third party tools and applications that can be used for data mining or user-defined visualization and interpretation. This is the first time that a service is capable of providing access to all cross-project relay data from a single Web resource. 
Because MaROS contains the data for a variety of missions from the Mars network, which span both NASA and ESA, the software also establishes an access control list (ACL) on each data record

  5. A way toward analyzing high-content bioimage data by means of semantic annotation and visual data mining

    Science.gov (United States)

    Herold, Julia; Abouna, Sylvie; Zhou, Luxian; Pelengaris, Stella; Epstein, David B. A.; Khan, Michael; Nattkemper, Tim W.

    2009-02-01

    In recent years, bioimaging has turned from qualitative measurements towards a high-throughput and high-content modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known of the underlying data, for example in the case of exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to diabetes study and screening of new anti-diabetes treatments. Cells of the Islet of Langerhans and whole pancreas in pancreas tissue samples are annotated and object specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine the cell number and mass. Only a few parameters need to be specified, which makes the system usable by non-experts in computing and allows for high-throughput analysis.

  6. Prediction of Thyroid Disease Using Data Mining Techniques

    Directory of Open Access Journals (Sweden)

    Irina Ioniţă

    2016-08-01

    Full Text Available Thyroid diseases have recently become more and more widespread worldwide. In Romania, for example, one in eight women suffers from hypothyroidism, hyperthyroidism or thyroid cancer. Various research studies estimate that about 30% of Romanians are diagnosed with endemic goiter. The factors that affect thyroid function include stress, infection, trauma, toxins, low-calorie diet, and certain medications. It is very important to prevent such diseases rather than cure them, because the majority of treatments consist of long-term medication or surgical intervention. The current study addresses the classification of thyroid disease into two of the most common thyroid dysfunctions among the population (hyperthyroidism and hypothyroidism). The authors analyzed and compared four classification models: Naive Bayes, Decision Tree, Multilayer Perceptron and Radial Basis Function Network. The results indicate a significant accuracy for all the classification models mentioned above, the best classification rate being that of the Decision Tree model. The data set used to build and to validate the classifier was provided by the UCI machine learning repository and by a website with Romanian data. The classification models were built and tested with KNIME Analytics Platform and Weka, two data mining software packages.

  7. A novel Neuro-fuzzy classification technique for data mining

    Directory of Open Access Journals (Sweden)

    Soumadip Ghosh

    2014-11-01

    Full Text Available In our study, we proposed a novel Neuro-fuzzy classification technique for data mining. The inputs to the Neuro-fuzzy classification system were fuzzified by applying a generalized bell-shaped membership function. The proposed method utilized a fuzzification matrix in which the input patterns were associated with a degree of membership to different classes. Based on the value of the degree of membership, a pattern would be attributed to a specific category or class. We applied our method to ten benchmark data sets from the UCI machine learning repository for classification. Our objective was to analyze the proposed method and then compare its performance with two powerful supervised classification algorithms, Radial Basis Function Neural Network (RBFNN) and Adaptive Neuro-fuzzy Inference System (ANFIS). We assessed the performance of these classification methods in terms of different performance measures such as accuracy, root-mean-square error, kappa statistic, true positive rate, false positive rate, precision, recall, and f-measure. In every aspect the proposed method proved to be superior to the RBFNN and ANFIS algorithms.
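
    The fuzzification step described above can be sketched as follows. This is a minimal illustration, assuming the standard generalized bell form 1 / (1 + |(x - c) / a|^(2b)); the function names, the parameter layout and the sample values are hypothetical and are not taken from the paper.

```python
def generalized_bell(x: float, a: float, b: float, c: float) -> float:
    """Generalized bell membership: 1 / (1 + |(x - c) / a| ** (2 * b)).

    a controls the width, b the steepness of the shoulders, c the center.
    """
    if a == 0:
        raise ValueError("width parameter a must be non-zero")
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def fuzzify(pattern, class_params):
    """Build a fuzzification matrix for one input pattern.

    class_params[j][k] is a hypothetical (a, b, c) triple for feature j and
    class k; the result has one row per feature and one column per class.
    """
    return [
        [generalized_bell(x, a, b, c) for (a, b, c) in per_class]
        for x, per_class in zip(pattern, class_params)
    ]


# Illustrative values only: two features, two classes.
PARAMS = [
    [(1.0, 2.0, 0.0), (1.0, 2.0, 3.0)],
    [(2.0, 1.0, 5.0), (2.0, 1.0, 9.0)],
]
MATRIX = fuzzify([0.0, 9.0], PARAMS)
```

    Each entry of the matrix lies in (0, 1]; it equals 1 at the center c and 0.5 at a distance a from the center, whatever the value of b.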

  8. A genetic algorithm approach to recognition and data mining

    Energy Technology Data Exchange (ETDEWEB)

    Punch, W.F.; Goodman, E.D.; Min, Pei [Michigan State Univ., East Lansing, MI (United States)] [and others

    1996-12-31

    We review here our use of genetic algorithm (GA) and genetic programming (GP) techniques to perform "data mining," the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. Our first experiments concentrated on the use of a K-nearest neighbor algorithm in combination with a GA. The GA selected weights for each feature so as to optimize knn classification based on a linear combination of features. This combined GA-knn approach was successfully applied to both generated and real-world data. We later extended this work by substituting a GP for the GA. The GP-knn could not only optimize data classification via linear combinations of features but also determine functional relationships among the features. This allowed for improved performance and new information on important relationships among features. We review the effectiveness of the overall approach on examples from biology and compare the effectiveness of the GA and GP.
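
    The idea of scoring a set of feature weights by k-nearest-neighbor accuracy can be sketched as below. This is only an illustration under stated assumptions: the weighted Euclidean distance, the leave-one-out fitness and the toy data are hypothetical choices, and the genetic search over weights is not shown.

```python
import math
from collections import Counter


def weighted_distance(x, y, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def knn_predict(train, query, weights, k=3):
    """Majority vote among the k training items closest to the query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, weights, k=3):
    """Leave-one-out accuracy: a possible fitness value for a weight vector."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += int(knn_predict(rest, features, weights, k) == label)
    return hits / len(data)


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
DATA = [
    ((0.0, 5.0), "a"), ((0.1, -4.0), "a"), ((0.2, 9.0), "a"),
    ((1.0, 5.2), "b"), ((1.1, -4.1), "b"), ((1.2, 9.1), "b"),
]
```

    On this toy data, `loo_accuracy(DATA, (1.0, 0.0), k=1)` scores the weight vector that keeps only the informative feature, while `(0.0, 1.0)` keeps only the noisy one; an optimizer such as a GA would prefer the former.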

  9. Project X: competitive intelligence data mining and analysis

    Science.gov (United States)

    Gilmore, John F.; Pagels, Michael A.; Palk, Justin

    2001-03-01

    Competitive Intelligence (CI) is a systematic and ethical program for gathering and analyzing information about your competitors' activities and general business trends to further your own company's goals. CI allows companies to gather extensive information on their competitors and to analyze what the competition is doing in order to maintain or gain a competitive edge. In commercial business this potentially translates into millions of dollars in annual savings or losses. The Internet provides an overwhelming portal of information for CI analysis. The problem is how a company can automate the translation of voluminous information into valuable and actionable knowledge. This paper describes Project X, an agent-based data mining system specifically developed for extracting and analyzing competitive information from the Internet. Project X gathers CI information from a variety of sources including online newspapers, corporate websites, industry sector reporting sites, speech archiving sites, video news casts, stock news sites, weather sites, and rumor sites. It uses individual industry specific (e.g., pharmaceutical, financial, aerospace, etc.) commercial sector ontologies to form the knowledge filtering and discovery structures/content required to filter and identify valuable competitive knowledge. Project X is described in detail and an example competitive intelligence case is shown demonstrating the system's performance and utility for business intelligence.

  10. Data mining in the study of nuclear fuel cells

    International Nuclear Information System (INIS)

    Medina P, J. A.; Ortiz S, J. J.; Castillo, A.; Montes T, J. L.; Perusquia, R.

    2015-09-01

    This paper presents a study of the application of data mining to the analysis of fuel cells and their performance within a boiling water nuclear reactor. A decision tree was used to answer questions of the type If (condition), Then (conclusion) in order to classify whether the fuel cells will perform well. Performance is measured by compliance or non-compliance with the cold shutdown margin, the linear heat generation rate and the average heat generation in a plane of the reactor. It is assumed that the fuel cells are simulated in the reactor under predesigned fuel reload and control rod patterns. A total of 18125 fuel cells were simulated according to a steady-state calculation. The decision tree works on a target variable, which is one of the three mentioned above. To analyze this target, the decision tree works with a set of attribute variables. In this case, the attributes are characteristics of the cell such as the number of gadolinium rods, the number of rods with a certain uranium enrichment mixed with a concentration of gadolinium, etc. The resulting model was able to predict compliance or non-compliance with the shutdown margin with a precision of around 95%. However, the other two variables showed lower percentages because the model had few learning cases in which these variables were or were not met. Even with this drawback, the model is quite reliable and can be coupled to fuel cell optimization systems. (Author)

  11. Effective approach toward Intrusion Detection System using data mining techniques

    Directory of Open Access Journals (Sweden)

    G.V. Nadiammai

    2014-03-01

    Full Text Available The tremendous growth in the use of computers over networks and the development of applications running on various platforms have drawn attention toward network security. This paradigm exploits security vulnerabilities on all computer systems that are technically difficult and expensive to solve. Hence intrusion is used as a key to compromise the integrity, availability and confidentiality of a computer resource. The Intrusion Detection System (IDS) plays a vital role in detecting anomalies and attacks in the network. In this work, the data mining concept is integrated with an IDS to identify the relevant, hidden data of interest for the user effectively and with less execution time. Four issues, namely Classification of Data, High Level of Human Interaction, Lack of Labeled Data, and Effectiveness of Distributed Denial of Service Attack, are addressed using the proposed EDADT algorithm, Hybrid IDS model, Semi-Supervised Approach and Varying HOPERAA Algorithm, respectively. The proposed algorithms have been tested using the KDD Cup dataset. All the proposed algorithms show better accuracy and a reduced false alarm rate when compared with existing algorithms.

  12. Application of Data Mining Algorithm to Recipient of Motorcycle Installment

    Directory of Open Access Journals (Sweden)

    Harry Dhika

    2015-12-01

    Full Text Available The study was conducted in subsidiaries that provide financing services for the purchase of motorcycles on credit. When applying, consumers enter their personal data. Based on the personal data, it is determined whether the consumer's credit application is approved or rejected. From 224 consumer data obtained, it is known that the number of consumers whose applications are approved is 87% or about 217 consumers and consumers whose application is rejected is 16% or as much as 6 consumers. The acceptance of motorcycle financing on credit is modeled by applying the algorithm within CRISP-DM, the industry standard process for data mining. The algorithm used in the decision making is C4.5. The level of accuracy of the results obtained is measured with the Confusion Matrix and the Receiver Operating Characteristic (ROC). Evaluation with the Confusion Matrix is intended to find the accuracy, precision, and recall values, while the Receiver Operating Characteristic (ROC) is used to find data tables and to compare the Area Under Curve (AUC).
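
    The confusion-matrix measures named above (accuracy, precision, recall) can be computed as in the following sketch. The label strings and the five-record example are made up for illustration and are not the study's data.

```python
def confusion_counts(actual, predicted, positive="approved"):
    """Count true/false positives and negatives for a binary decision."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall; a ratio with an empty denominator is 0."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for five applications.
ACTUAL = ["approved", "approved", "rejected", "rejected", "approved"]
PREDICTED = ["approved", "rejected", "rejected", "approved", "approved"]
COUNTS = confusion_counts(ACTUAL, PREDICTED)
METRICS = classification_metrics(*COUNTS)
```

    With strongly imbalanced classes, as in credit approval data where most applications are approved, accuracy alone can look high even when the minority class is poorly predicted, which is one reason precision and recall are reported alongside it.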

  13. An Integrative data mining approach to identifying Adverse ...

    Science.gov (United States)

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP
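
    The use of chemicals as common aggregators between gene targets and diseases can be sketched as a simple co-occurrence count. This is a simplified stand-in for frequent itemset mining, restricted to gene-disease pairs; the function name, the support threshold and all identifiers in the toy data are hypothetical placeholders, not ToxCast or CTD records.

```python
from collections import Counter
from itertools import product


def gene_disease_associations(chem_genes, chem_diseases, min_support=2):
    """Count, for each (gene, disease) pair, the chemicals linked to both.

    chem_genes maps chemical -> set of gene targets; chem_diseases maps
    chemical -> set of diseases. Pairs supported by fewer than min_support
    chemicals are dropped.
    """
    pair_counts = Counter()
    # Only chemicals present in both datasets can connect a gene to a disease.
    for chem in chem_genes.keys() & chem_diseases.keys():
        for pair in product(chem_genes[chem], chem_diseases[chem]):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Placeholder identifiers only.
CHEM_GENES = {"chem1": {"geneA", "geneB"}, "chem2": {"geneA"}, "chem3": {"geneB"}}
CHEM_DISEASES = {"chem1": {"disease1"}, "chem2": {"disease1"}, "chem4": {"disease2"}}
EDGES = gene_disease_associations(CHEM_GENES, CHEM_DISEASES)
```

    Each surviving pair can be read as an edge between a gene node and a disease node, weighted by the number of supporting chemicals.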

  14. Utilization of Selected Data Mining Methods for Communication Network Analysis

    Directory of Open Access Journals (Sweden)

    V. Ondryhal

    2011-06-01

    Full Text Available The aim of the project was to analyze the behavior of military communication networks based on work with real data collected continuously since 2005. With regard to the nature and amount of the data, data mining methods were selected for the purpose of analyses and experiments. The quality of real data is often insufficient for an immediate analysis. The article presents the data cleaning operations which have been carried out with the aim to improve the input data sample to obtain reliable models. Gradually, by means of properly chosen software, network models were developed to verify generally valid patterns of network behavior as a bulk service. Furthermore, unlike the commercially available communication networks simulators, the models designed allowed us to capture nonstandard models of network behavior under an increased load, verify the correct sizing of the network to the increased load, and thus test its reliability. Finally, based on previous experience, the models enabled us to predict emergency situations with a reasonable accuracy.

  15. Knowledge Discovery and Data Mining (KDDM) survey report.

    Energy Technology Data Exchange (ETDEWEB)

    Phillips, Laurence R.; Jordan, Danyelle N.; Bauer, Travis L.; Elmore, Mark T. (Oak Ridge National Laboratory, Oak Ridge, TN); Treadwell, Jim N. (Oak Ridge National Laboratory, Oak Ridge, TN); Homan, Rossitza A.; Chapman, Leon Darrel; Spires, Shannon V.

    2005-02-01

    The large number of government and industry activities supporting the Unit of Action (UA), with attendant documents, reports and briefings, can overwhelm decision-makers with an overabundance of information that hampers the ability to make quick decisions often resulting in a form of gridlock. In particular, the large and rapidly increasing amounts of data and data formats stored on UA Advanced Collaborative Environment (ACE) servers has led to the realization that it has become impractical and even impossible to perform manual analysis leading to timely decisions. UA Program Management (PM UA) has recognized the need to implement a Decision Support System (DSS) on UA ACE. The objective of this document is to research the commercial Knowledge Discovery and Data Mining (KDDM) market and publish the results in a survey. Furthermore, a ranking mechanism based on UA ACE-specific criteria has been developed and applied to a representative set of commercially available KDDM solutions. In addition, an overview of four R&D areas identified as critical to the implementation of DSS on ACE is provided. Finally, a comprehensive database containing detailed information on surveyed KDDM tools has been developed and is available upon customer request.

  16. Apply data mining to analyze the rainfall of landslide

    Directory of Open Access Journals (Sweden)

    Lee Chou-Yuan

    2018-01-01

    Taiwan is listed as an extremely dangerous country that suffers from many disasters. Landslide disasters cause losses of agricultural production, life, and property. Many researchers are concerned with landslide disasters, but the rainfall threshold for landslides has received little discussion. In this paper, data mining is applied to establish rules and a rainfall threshold for landslides at Huafan University, Taiwan. The variables used are rainfall, insolation, insolation rate, average humidity, average temperature, wind speed, and inclinometer tilt. The inclinometer is an important instrument for measuring the tilt, elevation, or depression of an object with respect to gravity. There are 26 inclinometers in the Talun mountain area of Huafan University. The data used in this research were collected from January 2008 to July 2014. In the proposed approach, regression analysis is first used to predict rainfall. Then, a decision tree is used to obtain decision rules and set the rainfall threshold for landslides. The output of this approach provides more information for understanding changes in rainfall. The rainfall threshold can also provide useful information for maintaining safety at Huafan University.
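
    The threshold-setting step described above can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single rainfall variable and a yes/no label for inclinometer movement, and it fits only one split (a decision stump, the building block of a decision tree). The function name and the sample numbers are hypothetical.

```python
def best_rainfall_threshold(rainfall_mm, moved):
    """Return the rainfall cut that misclassifies the fewest observations."""
    values = sorted(set(rainfall_mm))
    if len(values) < 2:
        raise ValueError("need at least two distinct rainfall values")
    # Candidate cuts lie midway between neighbouring observed values.
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]

    def errors(cut):
        # Predict movement when rainfall exceeds the cut; count disagreements.
        return sum((r > cut) != m for r, m in zip(rainfall_mm, moved))

    return min(candidates, key=errors)


# Hypothetical observations (illustrative only, not the Huafan University data).
rainfall_mm = [5, 20, 35, 60, 80, 120]
moved = [False, False, False, True, True, True]
threshold = best_rainfall_threshold(rainfall_mm, moved)
```

    A full decision tree would repeat this search recursively over all the variables listed in the abstract; the stump only shows how a single threshold can be read off labelled data.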

  17. Visualizing data mining results with the Brede tools

    Directory of Open Access Journals (Sweden)

    Finn A Nielsen

    2009-07-01

    A few neuroinformatics databases now exist that record results from neuroimaging studies in the form of brain coordinates in stereotaxic space. The Brede Toolbox was originally developed to extract, analyze and visualize data from one of them --- the BrainMap database. Since then the Brede Toolbox has expanded and now includes its own database with coordinates along with ontologies for brain regions and functions: The Brede Database. With the Brede Toolbox and Database combined we set up automated workflows for extraction of data, mass meta-analytic data mining and visualizations. Most of the Web presence of the Brede Database is established by a single script executing a workflow involving these steps together with a final generation of Web pages with embedded visualizations and links to interactive three-dimensional models in the Virtual Reality Modeling Language. Apart from the Brede tools I briefly review alternative visualization tools and methods for Internet-based visualization and information visualization as well as portals for visualization tools.

  18. Simulation of California's Major Reservoirs Outflow Using Data Mining Technique

    Science.gov (United States)

    Yang, T.; Gao, X.; Sorooshian, S.

    2014-12-01

    A reservoir's outflow is controlled by reservoir operators, which distinguishes it from the upstream inflow. For downstream water users, the outflow is more important than the inflow. In order to simulate the complicated reservoir operation and extract the outflow decision-making patterns for California's 12 major reservoirs, we build a data-driven, computer-based ("artificially intelligent") reservoir decision-making tool using a regression and classification decision tree approach. This is a well-developed statistical and graphical modeling methodology in the field of data mining. A shuffled cross-validation approach is also employed to extract the outflow decision-making patterns and rules based on the selected decision variables (inflow amount, precipitation, timing, water type year, etc.). To show the accuracy of the model, a verification study is carried out comparing the model-generated outflow decisions ("artificially intelligent" decisions) with those made by reservoir operators (human decisions). The simulation results show that the machine-generated outflow decisions are very similar to the real reservoir operators' decisions. This conclusion is based on statistical evaluations using the Nash-Sutcliffe test. The proposed model is able to detect the most influential variables and their weights when the reservoir operators make an outflow decision. While the proposed approach was first applied and tested on California's 12 major reservoirs, the method is universally adaptable to other reservoir systems.
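
    The Nash-Sutcliffe evaluation mentioned above compares simulated values with observed ones. A minimal sketch of the standard Nash-Sutcliffe efficiency follows; the function name and the outflow numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - (squared error / spread of observations)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series is constant; efficiency is undefined")
    return 1.0 - sse / spread


# Hypothetical daily outflows (illustrative numbers only).
operator_outflow = [10.0, 12.0, 15.0, 11.0, 9.0]
model_outflow = [10.5, 11.5, 14.0, 11.5, 9.5]
efficiency = nash_sutcliffe(operator_outflow, model_outflow)
```

    A value of 1 means the simulation reproduces the observations exactly, and a value of 0 means it does no better than always predicting the observed mean.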

  19. Profiling Oman education data using data mining approach

    Science.gov (United States)

    Alawi, Sultan Juma Sultan; Shaharanee, Izwan Nizal Mohd; Jamil, Jastini Mohd

    2017-10-01

    The large amount of data generated by many application services in different learning fields has led to new challenges in the education field. An education portal is an important system that supports the development of the education field. This research paper presents an innovative data mining technique to understand and summarize the Oman education data generated from the Ministry of Education Oman "Educational Portal". The research performs student profiling on the Oman student database. The study used the k-means clustering technique to determine the students' profiles. A total of 42,484 student records from the Sultanate of Oman were extracted for this study. The findings show the practicality of the clustering technique for investigating students' profiles, allowing a better understanding of students' behavior and academic performance. The Oman Education Portal contains a large amount of user activity and interaction data. Analysis of this large data set can help educators improve student performance and recognize students who need additional attention.
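
    The k-means profiling step can be sketched as follows. This is a plain version of Lloyd's algorithm under stated assumptions: the two features per student are hypothetical, and the initial centroids are simply the first k records, which is a simplification chosen to keep the example deterministic.

```python
from math import dist


def k_means(points, k, iterations=100):
    """Plain Lloyd's algorithm; initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical student records: (average grade, portal logins per week).
students = [(55.0, 1.0), (90.0, 9.0), (58.0, 2.0), (88.0, 8.0), (60.0, 1.5), (92.0, 10.0)]
labels, centroids = k_means(students, k=2)
```

    Each resulting centroid can be read as a student profile; in practice the features would need scaling, and the number of clusters would have to be chosen and justified.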

  20. Data mining approach to web application intrusions detection

    Science.gov (United States)

    Kalicki, Arkadiusz

    2011-10-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application scripting languages and frameworks, together with careless development, result in a high number of web application vulnerabilities and a high number of attacks. Several types of attack are possible because of improper input validation: SQL injection, cross-site scripting, Cross-Site Request Forgery (CSRF), web spam in blogs, and others. In order to secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems are divided into two groups: misuse detection (traditional IDS) and anomaly detection. This paper presents a data mining based algorithm for anomaly detection. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with the GSP algorithm. A previously presented detection method was rewritten and improved. Some tests show that the software catches malicious requests, especially long attack sequences; results are quite good for medium-length sequences, while for short sequences the method must be complemented with other methods.
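
    The profile-comparison principle can be sketched in a much simplified form. The example below does not implement GSP: as a stand-in it counts contiguous pairs of requests that recur across normal sessions, then scores a new session by the share of its pairs missing from that profile. All function names, request labels, and the support threshold are hypothetical.

```python
from collections import Counter


def ngrams(session, n):
    """Contiguous length-n windows of a session (a list of request labels)."""
    return [tuple(session[i:i + n]) for i in range(len(session) - n + 1)]


def build_profile(normal_sessions, n=2, min_support=2):
    """Keep n-grams that occur in at least min_support normal sessions."""
    support = Counter()
    for session in normal_sessions:
        support.update(set(ngrams(session, n)))  # count each session once
    return {gram for gram, count in support.items() if count >= min_support}


def anomaly_score(session, profile, n=2):
    """Fraction of the session's n-grams that are absent from the profile."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0  # too short to judge with this method
    return sum(gram not in profile for gram in grams) / len(grams)


# Hypothetical request labels, for illustration only.
normal = [
    ["GET /login", "POST /login", "GET /home", "GET /logout"],
    ["GET /login", "POST /login", "GET /home", "GET /profile"],
    ["GET /login", "POST /login", "GET /home", "GET /logout"],
]
profile = build_profile(normal)
suspicious = ["GET /login", "GET /admin", "GET /admin?id=1", "GET /home"]
score = anomaly_score(suspicious, profile)
```

    The empty-window case mirrors the abstract's remark that short sequences need other detection methods: a session shorter than the window gives this score nothing to compare.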

  1. Ultrabroadband photonic Internet: data mining approach to security aspects

    Science.gov (United States)

    Kalicki, Arkadiusz

    2009-06-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application frameworks, together with careless development, result in a high number of vulnerabilities and attacks. Several types of attack are possible because of improper input validation. SQL injection is the ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is a vulnerability that allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains a malicious request. Web spam in blogs is a further example. In order to secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems are divided into two groups: misuse detection (traditional IDS) and anomaly detection. Misuse detection systems are signature based and have high accuracy in detecting many kinds of known attacks, but cannot detect unknown and emerging attacks. They can be complemented with anomaly based intrusion detection and prevention systems. This paper presents an anomaly-driven proxy as an IPS and a data mining based algorithm that was used to detect anomalies. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with the GSP algorithm. Some basic tests show that the software catches malicious requests.

  2. Data Mining as a Service (DMaaS)

    Science.gov (United States)

    Tejedor, E.; Piparo, D.; Mascetti, L.; Moscicki, J.; Lamanna, M.; Mato, P.

    2016-10-01

    Data Mining as a Service (DMaaS) is a software and computing infrastructure that allows interactive mining of scientific data in the cloud. It allows users to run advanced data analyses by leveraging the widely adopted Jupyter notebook interface. Furthermore, the system makes it easier to share results and scientific code, access scientific software, produce tutorials and demonstrations as well as preserve the analyses of scientists. This paper describes how a first pilot of the DMaaS service is being deployed at CERN, starting from the notebook interface that has been fully integrated with the ROOT analysis framework, in order to provide all the tools for scientists to run their analyses. Additionally, we characterise the service backend, which combines a set of IT services such as user authentication, virtual computing infrastructure, mass storage, file synchronisation, development portals or batch systems. The added value acquired by the combination of the aforementioned categories of services is discussed, focusing on the opportunities offered by the CERNBox synchronisation service and its massive storage backend, EOS.

  3. Opinion data mining based on DNA method and ORA software

    Science.gov (United States)

    Tian, Ru-Ya; Wu, Lei; Liang, Xiao-He; Zhang, Xue-Fu

    2018-01-01

    Public opinion, especially the online public opinion is a critical issue when it comes to mining its characteristics. Because it can be formed directly and intensely in a short time, and may lead to the outbreak of online group events, and the formation of online public opinion crisis. This may become the pushing hand of a public crisis event, or even have negative social impacts, which brings great challenges to the government management. Data from the mass media which reveal implicit, previously unknown, and potentially valuable information, can effectively help us to understand the evolution law of public opinion, and provide a useful reference for rumor intervention. Based on the Dynamic Network Analysis method, this paper uses ORA software to mine characteristics of public opinion information, opinion topics, and public opinion agents through a series of indicators, and quantitatively analyzed the relationships between them. The results show that through the analysis of the 8 indexes associating with opinion data mining, we can have a basic understanding of the public opinion characteristics of an opinion event, such as who is important in the opinion spreading process, the information grasping condition, and the opinion topics release situation.

  4. Data Mining Student Answers with Moodle to Investigate Learning Pathways in an Introductory Geohazards Course

    Science.gov (United States)

    Sit, S. M.; Brudzinski, M. R.; Colella, H. V.

    2012-12-01

    The recent growth of online learning in higher education is primarily motivated by a desire to (a) increase the availability of learning experiences for learners who cannot, or choose not, to attend traditional face-to-face offerings, (b) assemble and disseminate instructional content more cost-efficiently, or (c) enable instructors to handle more students while maintaining a learning outcome quality that is equivalent to that of comparable face-to-face instruction. However, a less recognized incentive is that online learning also provides an opportunity for data mining, or efficient discovery of non-obvious valuable patterns from a large collection of data, that can be used to investigate learning pathways as opposed to focusing solely on assessing student outcomes. Course management systems that enable online courses provide a means to collect a vast amount of information to analyze students' behavior and the learning process in general. One of the most commonly used is Moodle (modular object-oriented developmental learning environment), a free learning management system that enables creation of powerful, flexible, and engaging online courses and experiences. In order to examine student learning pathways, the online learning modules we are constructing take advantage of Moodle capabilities to provide immediate formative feedback, verifying answers as correct or incorrect and elaborating on knowledge components to guide students towards the correct answer. By permitting multiple attempts in which credit is diminished for each incorrect answer, we provide opportunities to use data mining strategies to assess thousands of students' actions for evidence of problem solving strategies and mastery of concepts. We will show preliminary results from application of this approach to a ~90 student introductory geohazard course that is migrating toward online instruction. We hope more continuous assessment of students' performances will help generate cognitive models that can

  5. Data Mining in Course Management Systems: Moodle Case Study and Tutorial

    Science.gov (United States)

    Romero, Cristobal; Ventura, Sebastian; Garcia, Enrique

    2008-01-01

    Educational data mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from the educational context. This work is a survey of the specific application of data mining in learning management systems and a case study tutorial with the Moodle system. Our objective is to introduce it both…

  6. A Data Mining Approach to Reveal Representative Collaboration Indicators in Open Collaboration Frameworks

    Science.gov (United States)

    Anaya, Antonio R.; Boticario, Jesus G.

    2009-01-01

    Data mining methods are successful in educational environments to discover new knowledge or learner skills or features. Unfortunately, they have not been used in depth with collaboration. We have developed a scalable data mining method, whose objective is to infer information on the collaboration during the collaboration process in a…

  7. Educational Data Mining Applications and Tasks: A Survey of the Last 10 Years

    Science.gov (United States)

    Bakhshinategh, Behdad; Zaiane, Osmar R.; ElAtia, Samira; Ipperciel, Donald

    2018-01-01

    Educational Data Mining (EDM) is the field of using data mining techniques in educational environments. There exist various methods and applications in EDM which can follow both applied research objectives such as improving and enhancing learning quality, as well as pure research objectives, which tend to improve our understanding of the learning…

  8. Workshop on Educational Data Mining @ ICALT07 (EDM@ICALT07)

    NARCIS (Netherlands)

    Beck, J.E.; Calders, T.; Pechenizkiy, M.; Viola, S.R.; Spector, J.M.; Sampson, D.G.; Okamoto, T.; Cerri, S.A.; Ueno, M.; Kashihara, A.

    2007-01-01

    The educational data mining workshop was held in conjunction with the 7th IEEE International Conference on Advanced Learning Technologies (ICALT) in Niigata, Japan, on July 18-20, 2007. EDM@ICALT07 continues the series of workshops organized by the International Working Group on Educational Data Mining

  9. A Framework for Investigating Influence of Organizational Decision Makers on Data Mining Process Achievement

    Directory of Open Access Journals (Sweden)

    Hanieh Hajisafari

    2012-02-01

    Currently, few studies deal with the evaluation of data mining plans in the context of solving organizational problems. A successful data miner seeks to solve a fully defined business problem. To make the data mining (DM) results actionable, the data miner must explain them to the business insider. The interaction between business insiders and data miners is actually a knowledge-sharing process. In this study, a framework is presented to investigate the influence of organizational decision makers on the data mining process and its results. By reviewing the research literature, the critical success factors of data mining plans were identified and the role of organizational decision makers in each step of data mining was investigated. Then, a conceptual framework of the influence of organizational decision makers on data mining process achievement was designed. The proposed framework was analyzed using expert opinions, and the final framework of the influence of organizational decision makers on data mining process achievement was eventually designed. Analysis of the experts' opinions showed that when data mining results are shared with decision makers, "learning", "action or internalization" and "enforcing/unlearning" become critical success factors. Also, examining the importance of decision makers' feedback on the data mining steps showed that feedback from decision makers could have the most influence on the "knowledge extraction and representing model" step and the least on the "data cleaning and preprocessing" step.

  10. Towards the generic framework for utility considerations in data mining research

    NARCIS (Netherlands)

    Puuronen, S.; Pechenizkiy, M.; Soares, C.; Ghani, R.

    2010-01-01

    Rigorous data mining (DM) research has successfully developed advanced data mining techniques and algorithms, and many organizations have great expectations of benefiting more from their vast data warehouses in decision making. Even though there are some success stories, the current status in practice is

  11. Data mining methods for quality assurance in an environmental monitoring network

    NARCIS (Netherlands)

    Athanasiadis, Ioannis N.; Rizzoli, Andrea Emilio; Beard, Daniel W.

    2010-01-01

    The paper presents a system architecture that employs data mining techniques for ensuring quality assurance in an environmental monitoring network. We investigate how data mining techniques can be incorporated in the quality assurance decision making process. As prior expert decisions are

  12. An XML-Enabled Data Mining Query Language XML-DMQL

    NARCIS (Netherlands)

    Feng, L.; Dillon, T.

    2005-01-01

    Inspired by the good work of Han et al. (1996) and Elfeky et al. (2001) on the design of data mining query languages for relational and object-oriented databases, in this paper, we develop an expressive XML-enabled data mining query language by extension of XQuery. We first describe some

  13. IBM SPSS modeler essentials effective techniques for building powerful data mining and predictive analytics solutions

    CERN Document Server

    McCormick, Keith; Wei, Bowen

    2017-01-01

    IBM SPSS Modeler allows quick, efficient predictive analytics and insight building from your data, and is a popularly used data mining tool. This book will guide you through the data mining process, and presents relevant statistical methods which are used to build predictive models and conduct other analytic tasks using IBM SPSS Modeler. From ...

  14. Web based parallel/distributed medical data mining using software agents

    Energy Technology Data Exchange (ETDEWEB)

    Kargupta, H.; Stafford, B.; Hamzaoglu, I.

    1997-12-31

    This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.

  15. A Quantitative Analysis of Organizational Factors That Relate to Data Mining Success

    Science.gov (United States)

    Huebner, Richard A.

    2017-01-01

    The ubiquity of data in various forms has fueled the need for advanced data-mining techniques within organizations. The advent of data mining methods used to uncover hidden nuggets of information buried within large data sets has also fueled the need for determining how these unique projects can be successful. There are many challenges associated…

  16. A look at aerosol formation using data mining techniques

    Directory of Open Access Journals (Sweden)

    S. Hyvönen

    2005-01-01

    Atmospheric aerosol particle formation is frequently observed throughout the atmosphere, but despite various attempts at explanation, the processes behind it remain unclear. In this study data mining techniques were used to find the key parameters needed for atmospheric aerosol particle formation to occur. A dataset of 8 years of 80 variables collected at the boreal forest station (SMEAR II) in Southern Finland was used, incorporating variables such as radiation, humidity, SO2, ozone and present aerosol surface area. This data was analyzed using clustering and classification methods. The aim of this approach was to gain new parameters independent of any subjective interpretation. This resulted in two key parameters, relative humidity and preexisting aerosol particle surface (condensation sink), capable of explaining 88% of the nucleation events. The inclusion of any further parameters did not improve the results notably. Using these two variables it was possible to derive a nucleation probability function. Interestingly, the two most important variables are related to mechanisms that prevent the nucleation from starting and particles from growing, while parameters related to initiation of particle formation seemed to be less important. Nucleation occurs only with low relative humidity and condensation sink values. One possible explanation for the effect of high water content is that it prevents biogenic hydrocarbon ozonolysis reactions from producing sufficient amounts of low volatility compounds, which might be able to nucleate. Unfortunately the most important biogenic hydrocarbon compound emissions were not available for this study. Another effect of water vapour may be due to its linkage to cloudiness which may prevent the formation of nucleating and/or condensing vapours. A high number of preexisting particles will act as a sink for condensable vapours that otherwise would have been able to form sufficient supersaturation and initiate the

  17. Data-mining of medication records to improve asthma management.

    Science.gov (United States)

    Bereznicki, Bonnie J; Peterson, Gregory M; Jackson, Shane L; Walters, E Haydn; Fitzmaurice, Kimbra D; Gee, Peter R

    2008-07-07

    The objective was to use community pharmacy medication records to identify patients whose asthma may not be well managed and then implement and evaluate a multidisciplinary educational intervention to improve asthma management. We used a multisite controlled study design. Forty-two pharmacies throughout Tasmania ran a software application that "data-mined" medication records, generating a list of patients who had received three or more canisters of inhaled short-acting beta(2)-agonists in the preceding 6 months. The patients identified were allocated to an intervention or control group. Pre-intervention data were collected for the period May to November 2006 and post-intervention data for the period December 2006 to May 2007. Intervention patients were contacted by the community pharmacist via mail, and were sent educational material and a letter encouraging them to see their general practitioner for an asthma management review. Pharmacists were blinded to the control patients' identities until the end of the post-intervention period. The outcome measure was the dispensing ratio of preventer medication (inhaled corticosteroids [ICSs]) to reliever medication (inhaled short-acting beta(2)-agonists). Thirty-five pharmacies completed the study, providing 702 intervention and 849 control patients. The intervention resulted in a threefold increase in the preventer-to-reliever ratio in the intervention group compared with the control group (P < 0.01) and a higher proportion of patients in the intervention group using ICS therapy than in the control group (P < 0.01). Community pharmacy medication records can be effectively used to identify patients with suboptimal asthma management, who can then be referred to their GP for review. The intervention should be trialled on a national scale to determine the effects on clinical, social, emotional and economic outcomes for people in the Australian community, with a longer follow-up to determine sustainability of the improvements noted.
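
    The screening rule and the outcome ratio described above can be sketched in a few lines. The record layout, the function names, and the 183-day approximation of "6 months" are assumptions made for illustration; the abstract does not describe the actual software.

```python
from collections import Counter
from datetime import date, timedelta


def flag_patients(records, as_of, min_canisters=3, window_days=183):
    """Patient IDs with at least min_canisters reliever canisters in the window."""
    start = as_of - timedelta(days=window_days)
    counts = Counter()
    for patient_id, supplied_on, drug_class, canisters in records:
        if drug_class == "reliever" and start <= supplied_on <= as_of:
            counts[patient_id] += canisters
    return sorted(pid for pid, total in counts.items() if total >= min_canisters)


def preventer_to_reliever_ratio(records):
    """Dispensed preventer canisters divided by reliever canisters (None if no relievers)."""
    totals = Counter()
    for _, _, drug_class, canisters in records:
        totals[drug_class] += canisters
    if totals["reliever"] == 0:
        return None
    return totals["preventer"] / totals["reliever"]


# Hypothetical dispensing records: (patient, date, class, canisters).
records = [
    ("P1", date(2006, 6, 1), "reliever", 1),
    ("P1", date(2006, 8, 15), "reliever", 2),
    ("P1", date(2006, 9, 1), "preventer", 1),
    ("P2", date(2006, 7, 10), "reliever", 1),
    ("P2", date(2005, 12, 1), "reliever", 3),
    ("P3", date(2006, 10, 5), "preventer", 1),
]
flagged = flag_patients(records, as_of=date(2006, 11, 1))
ratio = preventer_to_reliever_ratio(records)
```

    In this toy data only P1 meets the rule, because P2's larger supply falls outside the window.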

  18. Sistem Informasi Pemetaan Pendidikan Menggunakan Algoritma Data Mining

    Directory of Open Access Journals (Sweden)

    Olha Musa

    2016-04-01

    This study aims to identify improvements in educational services, using the quality of non-formal education as an indicator. Alongside tiered formal education, non-formal education (training) is one of the prerequisites for multiplying the potential for self-development. The data mining algorithm used is basic k-means clustering, which assigns each object to the cluster with the nearest mean. The aim is to design an education mapping information system with k-means clustering. K-means clustering is a non-hierarchical method. Tests of the education mapping system on 171 data samples from the Islamic Students Association (HMI) showed that the non-hierarchical method (k-means clustering) has a good degree of accuracy because the number of clusters is specified in advance. The education mapping information system is used to cluster the data by the level of formal education and the training that has been completed. The test results show that the spread of the data in each cluster is similar. During the iteration process, no difference was visible in the results of the mapping using k-means clustering. The resulting cluster centroids are as follows: members educated to S1 or S2 level who have completed basic training fall in cluster 0; members educated to S1, S2, or S3 level who have completed basic training fall in cluster 1; members educated to S1 level who have completed basic and intermediate training fall in cluster 2; and members educated to S1 level who have completed basic training fall in cluster 3. Based on the k-means clustering test results, tiered formal education is seen in cluster 1 and tiered non-formal education (training) is seen in cluster 2.   Keywords: Information Systems; Educational Mapping; Cluster; K–means

  19. Development of turbine cycle performance analyzer using intelligent data mining

    Energy Technology Data Exchange (ETDEWEB)

    Heo, Gyun Young

    2004-02-15

    In recent years, performance enhancement of the turbine cycle in nuclear power plants has been highlighted because of the worldwide deregulation environment. In particular, the first target of operating plants has become the reduction of operating cost in order to compete with other power plants. It is known that the overhaul interval is closely related to operating cost. Through field investigation, the author identified that rapid and reliable performance tests, analysis, and diagnosis play an important role in the control of the overhaul interval. First, a technical road map was proposed to clearly set out the objectives. The controversial issues were summarized as data gathering, analysis tool, and diagnosis method. The author proposed an integrated solution on the basis of intelligent data mining techniques. For reliable data gathering, a state analyzer composed of statistical regression, wavelet analysis, and a neural network was developed. The role of the state analyzer is to estimate unmeasured data and to increase the reliability of the collected data. For advanced performance analysis, a performance analysis toolbox was developed. The purpose of this tool is to make the analysis process easier and more accurate by providing three novel heat balance diagrams. This tool includes the state analyzer and a turbine cycle simulation code. In the diagnosis module, a probabilistic technique based on a Bayesian network model and a deterministic technique based on an algebraic model are provided together. This balances the uncertainty in the diagnosis process against the pin-point capability. All the modules were validated with simulated data as well as actual test data, and some modules are used in industrial applications. Much remains to be improved in the turbine cycle in order to increase plant availability. This study was carried out to draw attention to the importance of the turbine cycle and to propose solutions on the basis of academic as well as industrial needs.

  20. Visual management of large scale data mining projects.

    Science.gov (United States)

    Shah, I; Hunter, L

    2000-01-01

    This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided significant help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms which suggest better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. In order to test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among various possible functions of similar sequences. We developed and tested our visualization techniques on this application.

  1. Development of turbine cycle performance analyzer using intelligent data mining

    International Nuclear Information System (INIS)

    Heo, Gyun Young

    2004-02-01

    In recent years, performance enhancement of the turbine cycle in nuclear power plants has been highlighted because of the worldwide deregulation environment. In particular, the first target of operating plants has become the reduction of operating cost in order to compete with other power plants. It is known that the overhaul interval is closely related to operating cost. Through field investigation, the author identified that rapid and reliable performance tests, analysis, and diagnosis play an important role in the control of the overhaul interval. First, a technical road map was proposed to clearly set out the objectives. The controversial issues were summarized as data gathering, analysis tool, and diagnosis method. The author proposed an integrated solution on the basis of intelligent data mining techniques. For reliable data gathering, a state analyzer composed of statistical regression, wavelet analysis, and a neural network was developed. The role of the state analyzer is to estimate unmeasured data and to increase the reliability of the collected data. For advanced performance analysis, a performance analysis toolbox was developed. The purpose of this tool is to make the analysis process easier and more accurate by providing three novel heat balance diagrams. This tool includes the state analyzer and a turbine cycle simulation code. In the diagnosis module, a probabilistic technique based on a Bayesian network model and a deterministic technique based on an algebraic model are provided together. This balances the uncertainty in the diagnosis process against the pin-point capability. All the modules were validated with simulated data as well as actual test data, and some modules are used in industrial applications. Much remains to be improved in the turbine cycle in order to increase plant availability. This study was carried out to draw attention to the importance of the turbine cycle and to propose solutions on the basis of academic as well as industrial needs.

  2. A Financial Data Mining Model for Extracting Customer Behavior

    Directory of Open Access Journals (Sweden)

    Mark K.Y. Mak

    2011-08-01

    Facing the problem of variation and chaotic behavior of customers, the lack of sufficient information is a challenge to many business organizations. Human analysts who lack an understanding of the hidden patterns in business data can thus miss corporate business opportunities. To embrace all business opportunities and enhance competitiveness, the discovery of hidden knowledge, unexpected patterns and useful rules from large databases has provided a feasible solution for several decades. While a wide range of financial analysis products exists in the financial market, how to customize the investment portfolio for the customer is still a challenge to many financial institutions. This paper aims at developing an intelligent Financial Data Mining Model (FDMM) for extracting customer behavior in the financial industry, so as to increase the availability of decision support data and hence increase customer satisfaction. The proposed financial model first clusters the customers into several sectors, and then finds the correlation among these sectors. It is noted that better customer segmentation can increase the ability to identify targeted customers; therefore, extracting useful rules for specific clusters can provide an insight into customers' buying behavior and marketing implications. To validate the feasibility of the proposed model, a simple dataset is collected from a financial company in Hong Kong. The simulation experiments show that the proposed method can not only improve the workflow of a financial company but also deepen understanding of investment behavior. Thus, a corporation is able to customize the most suitable products and services for customers on the basis of the rules extracted.

  3. Application of integrated data mining techniques in stock market forecasting

    Directory of Open Access Journals (Sweden)

    Chin-Yin Huang

    2014-12-01

    The stock market is considered too uncertain to be predictable. Many individuals have developed methodologies or models to increase the probability of making a profit in their stock investment. The overall hit rates of these methodologies and models are generally too low to be practical for real-world application. One of the major reasons is the huge fluctuation of the market. Therefore, the current research focus in the stock forecasting area is to improve the accuracy of stock trading forecasts. This paper introduces a system that addresses this particular need. The system integrates various data mining techniques and supports decision-making for stock trades. The proposed system embeds the top-down trading theory, artificial neural network theory, technical analysis, dynamic time series theory, and Bayesian probability theory. To experimentally examine the trading return of the presented system, two examples are studied. The first uses the Taiwan Semiconductor Manufacturing Company (TSMC) data-set that covers an investment horizon of 240 trading days from 16 February 2011 to 23 January 2013. Eighty-four transactions were made using the proposed approach and the investment return of the portfolio was 54% with an 80.4% hit rate during a 12-month period in which the TSMC stock price increased by 25% (from $NT 78.5 to $NT 101.5). The second example examines the stock data of Evergreen Marine Corporation, an international marine shipping company. Sixty-four transactions were made and the investment return of the portfolio was 128% in 12 months. Given the remarkable investment returns in trading the example TSMC and Evergreen stocks, the proposed system demonstrates promising potential as a viable tool for stock market forecasting.

  4. High Performance EVA Glove Collaboration: Glove Injury Data Mining Effort

    Science.gov (United States)

    Reid, C. R.; Benson, E.; England, S.; Norcross, J. R.; McFarland, S. M.; Rajulu, S.

    2014-01-01

    Human hands play a significant role during extravehicular activity (EVA) missions and Neutral Buoyancy Lab (NBL) training events, as they are needed for translating and performing tasks in the weightless environment. It is because of this high frequency usage that hand- and arm-related injuries and discomfort are known to occur during training in the NBL and while conducting EVAs. Hand-related injuries and discomforts have been occurring to crewmembers since the days of Apollo. While there have been numerous engineering changes to the glove design, hand-related issues still persist. The primary objectives of this study are therefore to: 1) document all known EVA glove-related injuries and the circumstances of these incidents, 2) determine likely risk factors, and 3) recommend ergonomic mitigations or design strategies that can be implemented in the current and future glove designs. METHODS: The investigator team conducted an initial set of literature reviews, data mining of Lifetime Surveillance of Astronaut Health (LSAH) databases, and data distribution analyses to understand the ergonomic issues related to glove-related injuries and discomforts. The investigation focused on the injuries and discomforts of U.S. crewmembers who had worn pressurized suits and experienced glove-related incidents during the 1980 to 2010 time frame, either during training or on-orbit EVA. In addition to data mining of the LSAH database, the other objective of the study was to find complementary sources of information such as training experience, EVA experience, suit-related sizing data, and hand-arm anthropometric data to be tied to the injury data from LSAH. RESULTS: Past studies indicated that the hand was the most frequently injured part of the body during both EVA and NBL training. This study effort thus focused primarily on crew training data in the NBL between 2002 and 2010. Of the 87 recorded training incidents, 19 occurred to women and 68 to men. While crew ages ranged from

  5. Detection Model for Seepage Behavior of Earth Dams Based on Data Mining

    Directory of Open Access Journals (Sweden)

    Zhenxiang Jiang

    2018-01-01

    Seepage behavior detection is an important tool for ensuring the safety of earth dams. However, traditional seepage behavior detection methods have used insufficient monitoring data and have mainly focused on single-point measures and local seepage behavior. The seepage behavior of dams has not been quantitatively detected based on monitoring data from multiple measuring points. Therefore, this study uses data mining techniques to analyze the monitoring data and overcome the above-mentioned shortcomings. Massive seepage monitoring data from multiple points are used as the research object. The key information on seepage behavior is extracted using principal component analysis. The correlation between seepage behavior and upstream water level is described by mutual information. A detection model for overall seepage behavior is established. Results show that the model can fully exploit the seepage monitoring data from multiple points and quantitatively detect the overall seepage behavior of earth dams. The proposed method can provide a new and reasonable means of quantitatively detecting the overall seepage behavior of earth dams.
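
    As an illustration of the mutual-information step named in this abstract, the following is a minimal sketch, not the authors' model. It assumes the two monitored series are first discretized into equal-width bins; the helper names `discretize` and `mutual_information` and the sample values are hypothetical.

```python
from collections import Counter
from math import log2


def discretize(series, bins):
    """Map each value to an equal-width bin index in 0..bins-1 (assumed binning scheme)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in series]


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical readings: upstream water level and a seepage indicator.
water_level = [1.0, 2.0, 3.0, 4.0]
seepage = [0.1, 0.2, 0.8, 0.9]
mi = mutual_information(discretize(water_level, 2), discretize(seepage, 2))
```

    A value near zero suggests the binned series are close to independent, while larger values indicate stronger dependence; the number of bins is a free choice in this sketch and affects the estimate.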

  6. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    Science.gov (United States)

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and

  7. Data mining with iPlant: a meeting report from the 2013 GARNet workshop, Data mining with iPlant.

    Science.gov (United States)

    Martin, Lisa; Cook, Charis; Matasci, Naim; Williams, Jason; Bastow, Ruth

    2015-01-01

    High-throughput sequencing technologies have rapidly moved from large international sequencing centres to individual laboratory benchtops. These changes have driven the 'data deluge' of modern biology. Submissions of nucleotide sequences to GenBank, for example, have doubled in size every year since 1982, and individual data sets now frequently reach terabytes in size. While 'big data' present exciting opportunities for scientific discovery, data analysis skills are not part of the typical wet bench biologist's experience. Knowing what to do with data, how to visualize and analyse them, make predictions, and test hypotheses are important barriers to success. Many researchers also lack adequate capacity to store and share these data, creating further bottlenecks to effective collaboration between groups and institutes. The US National Science Foundation-funded iPlant Collaborative was established in 2008 to form part of the data collection and analysis pipeline and help alleviate the bottlenecks associated with the big data challenge in plant science. Leveraging the power of high-performance computing facilities, iPlant provides free-to-use cyberinfrastructure to enable terabytes of data storage, improve analysis, and facilitate collaborations. To help train UK plant science researchers to use the iPlant platform and understand how it can be exploited to further research, GARNet organized a four-day Data mining with iPlant workshop at Warwick University in September 2013. This report provides an overview of the workshop, and highlights the power of the iPlant environment for lowering barriers to using complex bioinformatics resources, furthering discoveries in plant science research and providing a platform for education and outreach programmes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  8. Post-acquisition repetitive thought in fear conditioning: an experimental investigation of the effect of CS-US-rehearsal.

    Science.gov (United States)

    Joos, Els; Vansteenwegen, Debora; Hermans, Dirk

    2012-06-01

    Although repetitive thought (e.g., worry) is generally assumed to be a risk factor for psychopathological disorders such as anxiety disorders, the repetitive thought processes occurring after a conditioning event have not yet received much theoretical attention. However, as repetitive thought can be mimicked by (mental) rehearsal, which is well-known to enhance memory performance, it seems worthwhile to explore the role of rehearsal in conditioning. Therefore, the current study investigates the impact of rehearsing an acquired CS-US-contingency on subsequent conditioned fear responding. After acquiring two CS-US-contingencies with either a human scream or a white noise as US, participants were instructed to rehearse one of these CS-US-pairings during an experimental session as well as during the following week. Fear responding to the CS which was previously paired with the scream persisted in the participants who rehearsed the CS-US(scream)-contingency, but decreased in those participants who rehearsed the CS-US(noise)-contingency. The same pattern emerged in the US-expectancy ratings, but the effect failed to reach significance. For the CS which was paired with the noise-US, no rehearsal effect emerged. As acquisition to the noise-US was less pronounced and less robust as compared to the scream-US, claims regarding the rehearsal effect might be hampered for the CS-US(noise)-contingency. Repetitive post-acquisition activation of a CS-US-contingency impacts CR retention. As the USs were not rated as more intense, aversive or startling after rehearsal compared to post-acquisition, US-inflation is discarded as a possible explanation of this effect. Copyright © 2011 Elsevier Ltd. All rights reserved.

  9. Data Mining in the Context of Monitoring Mt Etna, Italy

    Science.gov (United States)

    Aliotta, Marco; Cassisi, Carmelo; D'Agostino, Marcello; Falsaperla, Susanna; Ferrari, Ferruccio; Langer, Horst; Messina, Alfio; Montalto, Placido; Reitano, Danilo; Spampinato, Salvatore

    2015-04-01

    The persistent volcanic activity of Mt Etna makes the continuous monitoring of multidisciplinary data a first-class issue. Indeed, the monitoring systems rapidly accumulate huge quantities of data, raising specific problems of handling and interpretation. In order to respond to these problems, the INGV staff has developed a number of software tools for data mining. These tools aim to identify structures in the data that can be related to volcanic activity, furnishing criteria for the identification of precursory scenarios. In particular, we use methods of clustering and classification in which data are divided into groups according to a-priori-defined measures of similarity or distance. Data groups may assume various shapes, such as convex clouds or complex concave bodies. The "KKAnalysis" software package is a basket of clustering methods. Currently, it is one of the key techniques of the tremor-based automatic alarm systems of INGV Osservatorio Etneo. It exploits both Self-Organizing Maps and Fuzzy Clustering. Besides seismic data, the software has been applied to the geochemical composition of eruptive products as well as a combined analysis of gas-emission (radon) and seismic data. The "DBSCAN" package exploits a concept based on density-based clustering. This method allows discovering clusters with arbitrary shape. Clusters are defined as dense regions of objects in the data space separated by regions of low density. In DBSCAN a cluster grows as long as the density within a group of objects exceeds some threshold. In the context of volcano monitoring, the method is particularly promising in the recognition of ash particles as they have a rather irregular shape. The "MOTIF" software allows us to identify typical waveforms in time series, outperforming methods like cross-correlation that entail a high computational effort. MOTIF can recognize the non-similarity of two patterns on a small number of data points without going through the whole length of
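
    The density-based clustering idea described in this abstract can be sketched as follows. This is a minimal textbook-style DBSCAN in plain Python, not the INGV "DBSCAN" package; the function name `dbscan`, the parameters `eps` and `min_pts`, and the sample points are assumptions made for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks low-density points."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force region query: all points within eps of point i (itself included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: the cluster keeps growing
        cluster += 1
    return labels


# Hypothetical 2-D data: two dense groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

    Because a cluster grows only through points whose neighborhood is dense enough, the resulting groups can take arbitrary shapes, which is the property the abstract highlights.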

  10. Overview on How Data Mining Tools May Support Cardiovascular Disease Prediction

    Directory of Open Access Journals (Sweden)

    Dan-Andrei Sitar-Taut

    2010-01-01

    Terms such as knowledge discovery or Knowledge Discovery from Databases (KDD), Data Mining (DM), Artificial Intelligence (AI), Machine Learning (ML), Artificial Neural Networks (ANN), and decision tables and trees gain, from day to day, an increasing significance in medical data analysis. They permit the identification, evaluation, and quantification of some less visible, intuitively unpredictable relationships by using generally large sets of data. Cardiology represents an extremely vast and important domain, having multiple and complex social and human implications. These are reasons enough to promote research in this area, which is quickly becoming not just a national or European priority but also a world-level one. The profound and multiple interwoven relationships among cardiovascular risk factors and cardiovascular diseases, still far from being completely discovered or understood, represent a niche for applying modern and multidisciplinary IT&C tools in order to close the existing knowledge gaps. This paper's aim is to present, by emphasizing their absolute or relative pros and cons, several opportunities for applying DM tools in cardiology, more precisely in the diagnosis of endothelial dysfunction and in quantifying the relationships between it and the so-called "classical" cardiovascular risk factors.

  11. Improving diagnostic accuracy using agent-based distributed data mining system.

    Science.gov (United States)

    Sridhar, S

    2013-09-01

    The use of data mining techniques to improve the diagnostic system accuracy is investigated in this paper. The data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Generally, the expert systems are constructed for automating diagnostic procedures. The learning component uses the data mining algorithms to extract the expert system rules from the database automatically. Learning algorithms can assist the clinicians in extracting knowledge automatically. As the number and variety of data sources is dramatically increasing, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from data. As data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge and data mining knowledge improves the performance of the diagnostic system. This work suggests a framework of combining the human knowledge and knowledge gained by better data mining algorithms on a renal and gallstone data set.

  12. Comparsion analysis of data mining models applied to clinical research in traditional Chinese medicine.

    Science.gov (United States)

    Zhao, Yufeng; Xie, Qi; He, Liyun; Liu, Baoyan; Li, Kun; Zhang, Xiang; Bai, Wenjing; Luo, Lin; Jing, Xianghong; Huo, Ruili

    2014-10-01

    To help researchers select appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine (TCM) diagnosis and therapy. Clinical issues based on data mining models were comprehensively summarized from four significant elements of the clinical studies: symptoms, symptom patterns, herbs, and efficacy. Existing problems were further generalized to determine the factors relevant to the performance of data mining models, e.g. data type, samples, parameters, variable labels. Combining these relevant factors, the TCM clinical data features were compared with regard to statistical characteristics and informatics properties. Data models were compared simultaneously from the view of applied conditions and suitable scopes. The main application problems were inconsistent data types and small samples for the data mining models used, which led to inappropriate or even erroneous results. These features, i.e. advantages, disadvantages, suitable data types, data mining tasks, and the TCM issues, were summarized and compared. By considering the special features of different data mining models, clinicians can select suitable data mining models to resolve the TCM problem.

  13. Vlsi implementation of flexible architecture for decision tree classification in data mining

    Science.gov (United States)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.
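
    C4.5 chooses split attributes using the gain ratio, that is, information gain normalized by the split information. The sketch below shows that criterion in software for a categorical attribute only; it says nothing about the hardware architecture of the paper, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy in bits of a sequence of hashable items."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalizes attributes with many distinct values
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example: the attribute separates the two classes perfectly.
ratio = gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"])
```

    A tree builder would evaluate this ratio for every candidate attribute at a node and split on the best one; continuous attributes and pruning, which C4.5 also handles, are left out of this sketch.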

  14. Methodologies of Knowledge Discovery from Data and Data Mining Methods in Mechanical Engineering

    Directory of Open Access Journals (Sweden)

    Rogalewicz Michał

    2016-12-01

    The paper contains a review of methodologies of the process of knowledge discovery from data and of the methods of data exploration (Data Mining) that are most frequently used in mechanical engineering. The methodologies contain various scenarios of data exploration, while DM methods are used within their scope. The paper shows premises for the use of DM methods in industry, as well as their advantages and disadvantages. The development of methodologies of knowledge discovery from data is also presented, along with a classification of the most widespread Data Mining methods, divided by type of realized tasks. The paper concludes with a presentation of selected Data Mining applications in mechanical engineering.

  15. Application Of Data Mining Techniques For Student Success And Failure Prediction The Case Of DebreMarkos University

    OpenAIRE

    Muluken Alemu Yehuala

    2015-01-01

    Abstract This research work has investigated the potential applicability of data mining technology to predict student success and failure cases on university students' datasets. CRISP-DM (Cross Industry Standard Process for Data Mining) is the data mining methodology used by the research. Classification and prediction data mining functionalities are used to extract hidden patterns from students' data. These patterns can be seen in relation to different variables in the students' records. The ...

  16. Proton Pump Inhibitors and the Risk for Fracture at Specific Sites: Data Mining of the FDA Adverse Event Reporting System.

    Science.gov (United States)

    Wang, Liwei; Li, Mei; Cao, Yuying; Han, Zhengqi; Wang, Xueju; Atkinson, Elizabeth J; Liu, Hongfang; Amin, Shreyasee

    2017-07-17

    Proton pump inhibitors (PPIs) are widely used to treat gastric acid-related disorders. Concerns have been raised about potential fracture risk, especially at the hip, spine and wrist. However, fracture risk at other bone sites has not been as well studied. We investigated the association between PPIs and specific fracture sites using an aggregated knowledge-enhanced database, the Food and Drug Administration Adverse Event Reporting System Data Mining Set (AERS-DM). Proportional reporting ratio (PRR) was used to detect statistically significant associations (signals) between PPIs and fractures. We analyzed both high level terms (HLT) and preferred terms (PT) for fracture sites, defined by MedDRA (Medical Dictionary for Regulatory Activities). Of PPI users reporting fractures, the mean age was 65.3 years and the female to male ratio was 3.4:1. Results revealed signals at multiple HLT and PT fracture sites, consistent for both sexes. These included fracture sites with predominant trabecular bone, not previously reported as being associated with PPIs, such as 'rib fractures', where signals were detected for overall PPIs as well as for each of 5 generic ingredients (insufficient data for dexlansoprazole). Based on data mining from AERS-DM, PPI use appears to be associated with an increased risk for fractures at multiple sites.
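
    The proportional reporting ratio mentioned in this abstract is computed from a 2x2 table of report counts. The sketch below uses the standard definition and the usual log-scale confidence interval; the counts are invented for illustration and are not taken from AERS-DM, and the abstract does not state which signal threshold the authors applied.

```python
from math import exp, log, sqrt


def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous report counts.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


def prr_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval for the PRR, computed on the log scale."""
    prr = proportional_reporting_ratio(a, b, c, d)
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return exp(log(prr) - z * se), exp(log(prr) + z * se)


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(20, 80, 10, 90)
lower, upper = prr_confidence_interval(20, 80, 10, 90)
```

    A PRR above 1 means the event makes up a larger share of reports for the drug than for the comparison drugs; whether that counts as a signal depends on the threshold and minimum case count chosen by the analyst.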

  17. Computing Infrastructure and Remote, Parallel Data Mining Engine for Virtual Observatories, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — SciberQuest, Inc. proposes to develop a state-of-the-art data mining engine that extends the functionality of Virtual Observatories (VO) from data portal to science...

  18. Data Mining – Innovative Method for Obtaining Information in Marketing and Business Management

    Directory of Open Access Journals (Sweden)

    Mirela-Cristina Voicu

    2011-05-01

    The existence of massive amounts of data has raised the question of reorienting their use from a retrospective to a prospective operation. Data mining offers the promise of an important aid for discovering hidden patterns in data that can be used to predict the behavior of customers, products and processes. Data mining tools must be guided by users who understand the business, the general nature of the data and the analytical methods involved. Data mining discovers information within the data that queries and reports cannot effectively reveal. It is vital to collect and prepare the data properly and to check models against reality. Choosing the most appropriate data mining product means finding a tool with the required capabilities and an interface that matches the skills of its users, and that can be applied to a specific business problem. In this context, the purpose of this paper is to illustrate some of the problems of company activity that can be solved by using data mining techniques.

  19. Data Mining and Information Technology: Its Impact on Intelligence Collection and Privacy Rights

    National Research Council Canada - National Science Library

    Soderberg, Eric; Glenney, William

    2007-01-01

    .... At a time when the threat environment has shifted in emphasis to COIN, terrorism, and cyber war, IT-enhanced data mining capabilities could provide some of the critical intelligence demanded by these types of threats...

  20. A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications

    Science.gov (United States)

    Grossman, Robert L.; Northcutt, Dave

    1996-01-01

    Data mining is the automatic discovery of patterns, associations, and anomalies in data sets. Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support the aforementioned intensive queries, but because of the sizes of data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system which is layered for modularity, but exploits specialized lightweight services to maintain efficiency. Rather than use a full functioned database for example, we use light weight object services specialized for data mining. We propose using information repositories between layers so that components on either side of the layer can access information in the repositories to assist in making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.

  1. Detecting Structural Damage of Nuclear Power Plant by Interactive Data Mining Approach

    International Nuclear Information System (INIS)

    Yufei Shu

    2006-01-01

    This paper presents a nonlinear structural damage identification technique, based on an interactive data mining approach, which integrates a human cognitive model in a data mining loop. A mining control agent emulating human analysts is developed, which directly interacts with the data miner, analyzing and verifying the output of the data miner and controlling the data mining process. Additionally, an artificial neural network method, which is adopted as a core component of the proposed interactive data mining method, is evolved by adding a novelty detecting and retraining function for handling complicated nuclear power plant quake-proof data. Plant quake-proof testing data has been applied to the system to show the validation of the proposed method. (author)

  2. Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds

    KAUST Repository

    Soufan, Othman

    2016-01-01

    compounds as candidate drugs for the treatment. Computational resources have been playing a significant role in this part through a step known as virtual screening. From a data mining perspective, availability of rich data resources is key in training

  3. A Case Investigation of Product Structure Complexity in Mass Customization Using a Data Mining Approach

    DEFF Research Database (Denmark)

    Nielsen, Peter; Brunø, Thomas Ditlev; Nielsen, Kjeld

    2014-01-01

    This paper presents a data mining method for analyzing historical configuration data providing a number of opportunities for improving mass customization capabilities. The overall objective of this paper is to investigate how specific quantitative analyses, more specifically the association rule...

  4. Maternal vaccination and preterm birth: using data mining as a screening tool

    DEFF Research Database (Denmark)

    Orozova-Bekkevold, Ivanka; Jensen, Henrik; Stensballe, Lone

    2007-01-01

    Objective The main purpose of this study was to identify possible associations between medicines used in pregnancy and preterm deliveries using data mining as a screening tool. Settings Prospective cohort study. Methods We used data mining to identify possible correlates between preterm delivery...... measure Preterm birth, a delivery occurring before the 259th day of gestation (i.e., less than 37 full weeks). Results Data mining had indicated that maternal vaccination (among other factors) might be related to preterm birth. The following regression analysis showed that, the women who reported being...... further studies. Data mining, especially with additional refinements, may be a valuable and very efficient tool to screen large databases for relevant information which can be used in clinical and public health research....

  5. Granular-relational data mining: how to mine relational data in the paradigm of granular computing?

    CERN Document Server

    Hońko, Piotr

    2017-01-01

    This book provides two general granular computing approaches to mining relational data, the first of which uses abstract descriptions of relational objects to build their granular representation, while the second extends existing granular data mining solutions to a relational case. Both approaches make it possible to perform and improve popular data mining tasks such as classification, clustering, and association discovery. How can different relational data mining tasks best be unified? How can the construction process of relational patterns be simplified? How can richer knowledge from relational data be discovered? All these questions can be answered in the same way: by mining relational data in the paradigm of granular computing! This book will allow readers with previous experience in the field of relational data mining to discover the many benefits of its granular perspective. In turn, those readers familiar with the paradigm of granular computing will find valuable insights on its application to mining r...

  6. Computing Infrastructure and Remote, Parallel Data Mining Engine for Virtual Observatories, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to develop a state-of-the-art data mining engine that extends the functionality of Virtual Observatories (VO) from data portal to science analysis...

  7. An Efficient Association Rule Hiding Algorithm for Privacy Preserving Data Mining

    OpenAIRE

    Yogendra Kumar Jain; Vinod Kumar Yadav; Geetika S. Panday

    2011-01-01

    The security of a large database that contains crucial information becomes a serious issue when the data are shared over a network, where they must be protected against unauthorized access. Privacy-preserving data mining is a recent research direction concerned with protecting private data in data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and cru...

  8. A Knowledge Model Sharing Based Approach to Privacy-Preserving Data Mining

    OpenAIRE

    Hongwei Tian; Weining Zhang; Shouhuai Xu; Patrick Sharkey

    2012-01-01

    Privacy-preserving data mining (PPDM) is an important problem and is currently studied through three approaches: the cryptographic approach, data publishing, and model publishing. However, each of these approaches has some problems. The cryptographic approach does not protect the privacy of learned knowledge models and may have performance and scalability issues. Data publishing, although popular, may suffer from too much utility loss for certain types of data mining applications. The m...

  9. Use of Recurrent Neural Networks for Strategic Data Mining of Sales

    OpenAIRE

    Vadhavkar, Sanjeev; Shanmugasundaram, Jayavel; Gupta, Amar; Prasad, M.V. Nagendra

    2002-01-01

    An increasing number of organizations are involved in the development of strategic information systems for effective linkages with their suppliers, customers, and other channel partners involved in transportation, distribution, warehousing and maintenance activities. An efficient inter-organizational inventory management system based on data mining techniques is a significant step in this direction. This paper discusses the use of neural network based data mining and knowledge discovery techn...

  10. Data Mining: Comparing the Empiric CFS to the Canadian ME/CFS Case Definition

    OpenAIRE

    Jason, Leonard A.; Skendrovic, Beth; Furst, Jacob; Brown, Abigail; Weng, Angela; Bronikowski, Christine

    2011-01-01

    This article contrasts two case definitions for Myalgic Encephalomyelitis/chronic fatigue syndrome (ME/CFS). We compared the empiric CFS case definition (Reeves et al., 2005) and the Canadian ME/CFS Clinical case definition (Carruthers et al., 2003) with a sample of individuals with CFS versus those without. Data mining with decision trees was used to identify the best items to identify patients with CFS. Data mining is a statistical technique that was used to help determine which of the surv...

  11. A Framework for Investigating Influence of Organizational Decision Makers on Data Mining Process Achievement

    OpenAIRE

    Hanieh Hajisafari; Shaaban Elahi

    2012-01-01

    Currently, few studies deal with the evaluation of data mining plans in the context of solving organizational problems. A successful data miner seeks to solve a fully defined business problem. To make the data mining (DM) results actionable, the data miner must explain them to the business insider. The interaction process between the business insiders and data miners is actually a knowledge-sharing process. In this study, through presenting a framework, the influence of organizational decision mak...

  12. A Review of Machine Learning and Data Mining Approaches for Business Applications in Social Networks

    OpenAIRE

    Evis Trandafili; Marenglen Biba

    2013-01-01

    Social networks have outstanding marketing value, and developing data mining methods for viral marketing is a hot topic in the research community. However, most social networks remain impossible to fully analyze and understand due to their prohibitive size and the inability of traditional machine learning and data mining approaches to deal with the new dimension in the learning process related to the large-scale environment where the data are produced. On one hand, the birth and evolution...

  13. A Data Mining and Survey Study on Diseases Associated with Paraesophageal Hernia

    OpenAIRE

    Yang, Jianji; Logan, Judith

    2006-01-01

    Paraesophageal hernia is a severe form of hiatal hernia, characterized by the upward dislocation of the gastric fundus into the thoracic cavity. In this study, the 1999 National Inpatient Sample dataset of the Healthcare Cost and Utilization Project was analyzed using data mining techniques to explore disorders associated with paraesophageal hernia. The result of this data mining process was compared with a subsequent expert knowledge survey of 97 gastrointestinal tract surgeons. This two-ste...

  14. Assessing the effectiveness of sustainable land management policies for combating desertification: A data mining approach.

    Science.gov (United States)

    Salvati, L; Kosmas, C; Kairis, O; Karavitis, C; Acikalin, S; Belgacem, A; Solé-Benet, A; Chaker, M; Fassouli, V; Gokceoglu, C; Gungor, H; Hessel, R; Khatteli, H; Kounalaki, A; Laouina, A; Ocakoglu, F; Ouessar, M; Ritsema, C; Sghaier, M; Sonmez, H; Taamallah, H; Tezcan, L; de Vente, J; Kelly, C; Colantoni, A; Carlucci, M

    2016-12-01

    This study investigates the relationship between fine resolution, local-scale biophysical and socioeconomic contexts within which land degradation occurs, and the human responses to it. The research draws on experimental data collected under different territorial and socioeconomic conditions at 586 field sites in five Mediterranean countries (Spain, Greece, Turkey, Tunisia and Morocco). We assess the level of desertification risk under various land management practices (terracing, grazing control, prevention of wildland fires, soil erosion control measures, soil water conservation measures, sustainable farming practices, land protection measures and financial subsidies) taken as possible responses to land degradation. A data mining approach, incorporating principal component analysis, non-parametric correlations, multiple regression and canonical analysis, was developed to identify the spatial relationship between land management conditions, the socioeconomic and environmental context (described using 40 biophysical and socioeconomic indicators) and desertification risk. Our analysis identified a number of distinct relationships between the level of desertification experienced and the underlying socioeconomic context, suggesting that the effectiveness of responses to land degradation is strictly dependent on the local biophysical and socioeconomic context. Assessing the latent relationship between land management practices and the biophysical/socioeconomic attributes characterizing areas exposed to different levels of desertification risk proved to be an indirect measure of the effectiveness of field actions contrasting land degradation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Data Exploration and Analysis of Alternative Learning System Accreditation and Equivalency Test Result Using Data Mining

    Science.gov (United States)

    Talingdan, J. A.; Trinidad, J. T., Jr.; Palaoag, T. D.

    2018-03-01

    The Alternative Learning System (ALS) is a subsystem of the Department of Education (DepEd) that serves as an option for learners who cannot afford formal education. The research focuses on the data exploration and analysis of ALS accreditation and equivalency (A & E) test results using data mining. The ALS 2014 to 2016 A & E test results at the secondary level were used as data sets in the study. The A & E test results revealed that the passing rate doubled each year. The results were clustered using the k-means clustering algorithm and grouped into good, medium, and low standard learners to identify students who need extra support for enhancement. From the clustered data, it was found that the strand in which learners are weakest is strand 4, the Development of Self and a Sense of Community, with a general average of 84.23. It also revealed that the essay type of exam got the lowest score, with a general average of 2.14, compared to the multiple-choice type of exam that covers the five learning strands. Furthermore, decision tree and naive Bayes classifiers were also employed to predict the performance of the learners in the A & E test and to determine which is better to use for prediction. It was concluded that naive Bayes performs better because its accuracy rate is higher than that of the decision tree algorithm.
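
    The grouping step described in this abstract can be illustrated with a minimal sketch of one-dimensional k-means. The scores, the centroid initialization, and the mapping of clusters to the "low", "medium", and "good" labels are assumptions made for illustration; the abstract does not give the authors' features, data, or implementation.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster scalar scores into k groups with plain Lloyd-style k-means."""
    ordered = sorted(values)
    # Assumed deterministic start: spread initial centroids across the sorted scores.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each score to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical test scores, not data from the study.
scores = [52.0, 55.5, 58.0, 70.0, 72.5, 74.0, 88.0, 90.5, 93.0]
centroids, clusters = kmeans_1d(scores)

# Order clusters by centroid so the lowest-scoring group is labeled "low".
labels = ["low", "medium", "good"]
groups = {label: sorted(members)
          for label, (_, members) in zip(labels, sorted(zip(centroids, clusters)))}
print(groups)
```

    A real analysis would likely cluster on several strand scores at once rather than a single total, which needs a multi-dimensional distance in place of the absolute difference used here.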

  16. [Establishment of data warehouse of needling and moxibustion literature based on data mining].

    Science.gov (United States)

    Wang, Jian-Ling; Li, Ren-Ling; Jia, Chun-Sheng

    2012-02-01

    In order to explore the efficacy specificity and valuable rules of clinical application of needling and moxibustion methods within the large quantity of information in the literature, a data warehouse needs to be established. On the basis of the original databases of red-hot needle therapy and hydro-acupuncture therapy, and the newly established databases of acupoint catgut embedding therapy, acupoint application therapy, etc., and in accordance with the characteristics of different types of needling-moxibustion literature information, databases on different subjects were established first. These subject databases constitute a general "literature data warehouse on needling moxibustion methods" composed of multiple subjects and multiple dimensions, so that useful regularities about clinical treatment and trials collected in the literature can be discovered using data mining techniques. In the present paper, the authors introduce the design of the data warehouse, determination of subjects, establishment of subject relations, application of the administration platform, and application of data. This data warehouse will provide a standard data representation mode, enlarge data attributes and create extensive data links among literature information in the network, and may bring considerable convenience and benefits to clinical application decision making and scientific research on needling-moxibustion techniques.

  17. Entrepreneurial subjects in forestry and data mining from accounting data in the Czech Republic

    Directory of Open Access Journals (Sweden)

    Zbyněk Šmída

    2004-01-01

    Forests owned by the state in the Czech Republic are managed by Forests of the Czech Republic, a state enterprise with its headquarters in Hradec Králové. Private companies (established during the economic reform in 1992 and privatization in 1994) carry out silvicultural and logging activities in state forests on the basis of contracts. This study focuses on forest enterprises (contractors) and on the current business environment in the Czech Republic. In the first part of the article, 38 236 forestry entrepreneurs were identified in the Czech Republic and divided into groups by legal form, by number of employees (size), and by the availability of their accounting data. The second part deals with data mining from accounting data by a process known as financial statement analysis, which supports informed decisions by owners or managers of the enterprise. Ratio analysis is regarded as the basic methodical instrument of financial analysis. It effectively summarizes multiple financial statement categories into a few relative indices of performance and financial position, and it is a powerful method for managing the complexity and volume of data presented in financial statements. The relative indices convert financial statement categories into measures, which helps control for differences across companies and across time. The article covers selected forestry contractors, describes the most useful economic indicators (ratios), and considers their possible use in the sector generally.
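
    As a rough illustration of how ratio analysis condenses statement categories into comparable indices, the sketch below computes three common ratios. The choice of ratios, the field names, and all figures are assumptions for illustration; the abstract does not list which indicators the article actually uses.

```python
def financial_ratios(statement):
    """Turn absolute statement figures into size-independent ratios."""
    return {
        # Liquidity: ability to cover short-term obligations.
        "current_ratio": statement["current_assets"] / statement["current_liabilities"],
        # Leverage: share of assets financed by liabilities.
        "debt_ratio": statement["total_liabilities"] / statement["total_assets"],
        # Profitability: income generated per unit of assets.
        "return_on_assets": statement["net_income"] / statement["total_assets"],
    }


# Hypothetical contractors with made-up figures (same currency unit throughout).
contractors = {
    "Contractor A": {"current_assets": 500, "current_liabilities": 250,
                     "total_liabilities": 400, "total_assets": 1000,
                     "net_income": 50},
    "Contractor B": {"current_assets": 300, "current_liabilities": 400,
                     "total_liabilities": 900, "total_assets": 1200,
                     "net_income": 24},
}

table = {name: financial_ratios(s) for name, s in contractors.items()}
for name, ratios in table.items():
    print(name, {key: round(value, 3) for key, value in ratios.items()})
```

    Because each index is a quotient, the two hypothetical firms can be compared directly even though their balance sheets differ in size, which is the control "across companies and across time" that the abstract refers to.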

  18. Using data mining to segment healthcare markets from patients' preference perspectives.

    Science.gov (United States)

    Liu, Sandra S; Chen, Jie

    2009-01-01

    This paper aims to provide an example of how to use data mining techniques to identify patient segments regarding preferences for healthcare attributes and their demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with average linkage and Pearson correlation procedures are employed and compared to show how each procedure best determines segmentation variables. Data mining tools identified three differentiable segments by means of cluster analysis. These three clusters have significantly different demographic profiles. The study reveals, when compared with traditional statistical methods, that data mining provides an efficient and effective tool for market segmentation. When there are numerous cluster variables involved, researchers and practitioners need to incorporate factor analysis for reducing variables to clearly and meaningfully understand clusters. Interests and applications in data mining are increasing in many businesses. However, this technology is seldom applied to healthcare customer experience management. The paper shows that efficient and effective application of data mining methods can aid the understanding of patient healthcare preferences.
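
    The conventional procedure named in this abstract, hierarchical clustering with average linkage on Pearson correlation, can be sketched as follows. The rating profiles and the choice of two clusters are hypothetical, and using 1 - r as the distance is an assumption; the paper's data and its three segments are not reproduced here.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with 1 - Pearson r as the assumed distance."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(a, b):
        # Average linkage: mean pairwise distance between the two clusters.
        return mean(1 - pearson(profiles[i], profiles[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; combinations yields a < b.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical patients rating four healthcare attributes on a 1-5 scale.
ratings = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
]
segments = average_linkage(ratings, n_clusters=2)
print(segments)
```

    Correlation-based distance groups patients by the shape of their preference profile rather than by how high they rate overall, which is one reason it is sometimes preferred for segmentation.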

  19. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    Science.gov (United States)

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (pmachine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future

  20. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    CERN Document Server

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  1. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    Science.gov (United States)

    Winkler, Robert

    2015-01-01

    In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to come to the final results. These operations are often difficult to reproduce, because of too specific computing platforms. This effect, known as 'workflow decay', can be diminished by using a standardized informatic infrastructure. Thus, we compiled an integrated platform, which contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes e.g., the OpenMS/TOPPAS framework, the Trans-Proteomic-Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analyses, such as XCMS/metaXCMS and MetabR. The R package Rattle provides a user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) taverna. We explain the useful combination of the tools by practical examples: (1) A workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) Cluster analysis and Data Mining in targeted Metabolomics, and (3) Raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present its application for finding co-occurring peptides, which can be used for target proteomics, the discovery of alternative biomarkers and protein-protein interactions. Data Mining derived models

  2. Big Data Mining of Energy Time Series for Behavioral Analytics and Energy Consumption Forecasting

    Directory of Open Access Journals (Sweden)

    Shailendra Singh

    2018-02-01

    Responsible, efficient and environmentally aware energy consumption behavior is becoming a necessity for the reliable modern electricity grid. In this paper, we present an intelligent data mining model to analyze, forecast and visualize energy time series to uncover various temporal energy consumption patterns. These patterns define the appliance usage in terms of association with time such as hour of the day, period of the day, weekday, week, month and season of the year as well as appliance-appliance associations in a household, which are key factors to infer and analyze the impact of consumers’ energy consumption behavior and energy forecasting trend. This is challenging since it is not trivial to determine the multiple relationships among different appliances usage from concurrent streams of data. Also, it is difficult to derive accurate relationships between interval-based events where multiple appliance usages persist for some duration. To overcome these challenges, we propose unsupervised data clustering and frequent pattern mining analysis on energy time series, and Bayesian network prediction for energy usage forecasting. We perform extensive experiments using real-world context-rich smart meter datasets. The accuracy results of identifying appliance usage patterns using the proposed model outperformed Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) at each stage while attaining a combined accuracy of 81.82%, 85.90%, 89.58% for 25%, 50% and 75% of the training data size respectively. Moreover, we achieved energy consumption forecast accuracies of 81.89% for short-term (hourly) and 75.88%, 79.23%, 74.74%, and 72.81% for the long-term; i.e., day, week, month, and season respectively.
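
    The appliance-appliance associations mentioned in this abstract rest on support and confidence counts over usage windows. The sketch below shows only that counting step for pairs of appliances; the windows, appliance names, and threshold are hypothetical, and it is not the authors' clustering, pattern mining, or Bayesian network pipeline.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(windows, min_support):
    """Return appliance pairs whose co-usage support meets the threshold."""
    counts = Counter()
    for active in windows:
        # Count each unordered pair once per time window.
        for pair in combinations(sorted(set(active)), 2):
            counts[pair] += 1
    n = len(windows)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(windows, antecedent, consequent):
    """Estimate P(consequent is on | antecedent is on) from the windows."""
    with_antecedent = [w for w in windows if antecedent in w]
    return sum(consequent in w for w in with_antecedent) / len(with_antecedent)


# Hypothetical hourly windows listing the appliances that were active.
windows = [
    {"kettle", "toaster"},
    {"kettle", "toaster", "tv"},
    {"tv", "laptop"},
    {"kettle", "toaster"},
    {"tv"},
]

pairs = frequent_pairs(windows, min_support=0.5)
rule = confidence(windows, "kettle", "toaster")
print(pairs, rule)
```

    Tagging each window with its hour or weekday and treating that tag as one more item would give appliance-time associations from the same counting scheme.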

  3. Visualization and Integrated Data Mining of Disparate Information

    Energy Technology Data Exchange (ETDEWEB)

    Saffer, Jeffrey D.(OMNIVIZ, INC); Albright, Cory L.(BATTELLE (PACIFIC NW LAB)); Calapristi, Augustin J.(BATTELLE (PACIFIC NW LAB)); Chen, Guang (OMNIVIZ, INC); Crow, Vernon L.(BATTELLE (PACIFIC NW LAB)); Decker, Scott D.(BATTELLE (PACIFIC NW LAB)); Groch, Kevin M.(BATTELLE (PACIFIC NW LAB)); Havre, Susan L.(BATTELLE (PACIFIC NW LAB)); Malard, Joel (BATTELLE (PACIFIC NW LAB)); Martin, Tonya J.(BATTELLE (PACIFIC NW LAB)); Miller, Nancy E.(BATTELLE (PACIFIC NW LAB)); Monroe, Philip J.(OMNIVIZ, INC); Nowell, Lucy T.(BATTELLE (PACIFIC NW LAB)); Payne, Deborah A.(BATTELLE (PACIFIC NW LAB)); Reyes Spindola, Jorge F.(BATTELLE (PACIFIC NW LAB)); Scarberry, Randall E.(OMNIVIZ, INC); Sofia, Heidi J.(BATTELLE (PACIFIC NW LAB)); Stillwell, Lisa C.(OMNIVIZ, INC); Thomas, Gregory S.(BATTELLE (PACIFIC NW LAB)); Thurston, Sarah J.(OMNIVIZ, INC); Williams, Leigh K.(BATTELLE (PACIFIC NW LAB)); Zabriskie, Sean J.(OMNIVIZ, INC); MG Hicks

    2001-05-11

    The volumes and diversity of information in the discovery, development, and business processes within the chemical and life sciences industries require new approaches for analysis. Traditional list- or spreadsheet-based methods are easily overwhelmed by large amounts of data. Furthermore, generating strong hypotheses and, just as importantly, ruling out weak ones, requires integration across different experimental and informational sources. We have developed a framework for this integration, including common conceptual data models for multiple data types and linked visualizations that provide an overview of the entire data set, a measure of how each data record is related to every other record, and an assessment of the associations within the data set.

  4. Improve Data Mining and Knowledge Discovery through the use of MatLab

    Science.gov (United States)

    Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern-based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data, including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods, used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; they lack exploratory data mining and discovery of latent information. This presentation introduces MatLab™ (MATrix LABoratory), an engineering and scientific data analysis tool, as a means to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes. MatLab's ease of data processing, visualization and

  5. A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases.

    Science.gov (United States)

    Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva

    2015-11-01

    It is known that the data preparation phase is the most time consuming in the data mining process, taking from 50% up to 70% of the total project time. Currently, data mining methodologies are general purpose, and one of their limitations is that they do not provide guidance on which particular tasks to carry out in a specific domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and the entire process was applied to real mortality databases. The results were encouraging because it was observed that the use of the methodology reduced some of the time-consuming tasks and the data mining system revealed unknown and potentially useful patterns for the public health services in Mexico.

  6. The use of data mining by private health insurance companies and customers' privacy.

    Science.gov (United States)

    Al-Saggaf, Yeslam

    2015-07-01

    This article examines privacy threats arising from the use of data mining by private Australian health insurance companies. Qualitative interviews were conducted with key experts, and Australian governmental and nongovernmental websites relevant to private health insurance were searched. Using Rationale, a critical thinking tool, the themes and considerations elicited through this empirical approach were developed into an argument about the use of data mining by private health insurance companies. The argument is followed by an ethical analysis guided by classical philosophical theories: utilitarianism, Mill's harm principle, Kant's deontological theory, and Helen Nissenbaum's contextual integrity framework. Both the argument and the ethical analysis find the use of data mining by private health insurance companies in Australia to be unethical. Although private health insurance companies in Australia cannot use data mining for risk rating to cherry-pick customers and cannot use customers' personal information for unintended purposes, this article nonetheless concludes that the secondary use of customers' personal information and the absence of customers' consent still suggest that the use of data mining by private health insurance companies is wrong.

  7. Use of Data Mining Techniques to Detect Medical Fraud in Health Insurance

    Directory of Open Access Journals (Sweden)

    Kuo-Chung Lin

    2012-04-01

    The inspection of health insurance claim applications usually relies on experts’ experience for verification and on experienced personnel for checking. However, due to heavy workloads and insufficient manpower and experience, the rate of misjudgment is high, leading to improper settlement of claims and the waste of social resources. This paper takes advantage of data mining technology to design models that identify cases requiring manual inspection, so as to save time and manpower. Six models are designed in this paper. Through analysis based on the 20/80 principle and on coverage and accuracy ratios, a great number of periodic data (over 2 million records) are fed back to the data mining models after repeated verification. It is also found that integrating data mining technology and feeding its results back to different business stages to establish an early warning system will be an important topic for the health insurance system in hospitals’ EMR in the future. Meanwhile, the information acquired by data mining needs to be stored, and traditional database technology has limitations. As future work, this paper explores an ontology framework built with semantic network technology to assist the storage of knowledge gained by data mining.

  8. A Review of Financial Accounting Fraud Detection based on Data Mining Techniques

    Science.gov (United States)

    Sharma, Anuj; Kumar Panigrahi, Prabin

    2012-02-01

    With the upsurge in financial accounting fraud in the current economic scenario, financial accounting fraud detection (FAFD) has become an emerging topic of great importance for academia, research and industry. The failure of organizations' internal auditing systems to identify accounting fraud has led to the use of specialized procedures to detect financial accounting fraud, collectively known as forensic accounting. Data mining techniques are providing great aid in financial accounting fraud detection, since dealing with the large data volumes and complexities of financial data is a big challenge for forensic accounting. This paper presents a comprehensive review of the literature on the application of data mining techniques for the detection of financial accounting fraud and proposes a framework for data mining techniques based accounting fraud detection. The systematic and comprehensive literature review of the data mining techniques applicable to financial accounting fraud detection may provide a foundation for future research in this field. The findings of this review show that data mining techniques like logistic models, neural networks, Bayesian belief networks, and decision trees have been applied most extensively to provide primary solutions to the problems inherent in the detection and classification of fraudulent data.

  9. A Survey on Distributed Mobile Database and Data Mining

    Science.gov (United States)

    Goel, Ajay Mohan; Mangla, Neeraj; Patel, R. B.

    2010-11-01

    The anticipated increase in popular use of the Internet has created more opportunity in information dissemination, Ecommerce, and multimedia communication. It has also created more challenges in organizing information and facilitating its efficient retrieval. In response to this, new techniques have evolved which facilitate the creation of such applications. Certainly the most promising among the new paradigms is the use of mobile agents. In this paper, mobile agent and distributed database technologies are applied in the banking system. Many approaches have been proposed to schedule data items for broadcasting in a mobile environment. In this paper, an efficient strategy for accessing multiple data items in mobile environments and the bottleneck of current banking will be proposed.

  10. Implementasi Data Warehouse dan Data Mining: Studi Kasus Analisis Peminatan Studi Siswa

    Directory of Open Access Journals (Sweden)

    Eka Miranda

    2011-06-01

    This paper discusses the implementation of data mining and its role in supporting decision-making related to students’ choice of specialization program. Currently, the university uses a database to store transaction records, which cannot be used directly to support analysis and decision-making. To address this, a data warehouse was designed to store large amounts of data; it also offers new perspectives on the data distribution and makes it possible to answer ad hoc questions and perform data analysis. The method consists of analysis of records related to students’ academic achievement, data warehouse design and data mining. The results are a data warehouse and data mining design and its implementation with classification techniques and association rules. These results reveal students’ tendencies and background patterns in choosing a specialization, which can help them make decisions.

  11. Studies of MHD stability using data mining technique in helical plasmas

    International Nuclear Information System (INIS)

    Yamamoto, Satoshi; Pretty, David; Blackwell, Boyd

    2010-01-01

    Data mining techniques, which automatically extract useful knowledge from large datasets, are applied to multichannel magnetic probe signals of several helical plasmas in order to identify and classify MHD instabilities in helical plasmas. This method is useful for finding new MHD instabilities as well as previously identified ones. Moreover, registering the results obtained from data mining in a database allows us to investigate the characteristics of MHD instabilities with parameter studies. We introduce a data mining technique consisting of pre-processing, clustering and visualization, using results from helical plasmas in H-1 and Heliotron J. We successfully classified the MHD instabilities using the criterion of the phase differences of the magnetic probes and identified them as energetic-ion-driven MHD instabilities through a parameter study in Heliotron J plasmas. (author)

  12. A Comparative Study to Predict Student’s Performance Using Educational Data Mining Techniques

    Science.gov (United States)

    Uswatun Khasanah, Annisa; Harwati

    2017-06-01

    Predicting student performance is essential for a university seeking to prevent student failure. The number of student dropouts is one parameter that can be used to measure student performance and an important point evaluated in Indonesian university accreditation. Data mining has been widely used to predict student performance, and data mining applied in this field is usually called Educational Data Mining. This study conducted feature selection to identify the attributes with the greatest influence on student performance in the Department of Industrial Engineering, Universitas Islam Indonesia. Then, two popular classification algorithms, Bayesian Network and Decision Tree, were implemented and compared to find the best prediction result. The results showed that student attendance and GPA in the first semester ranked highest across all feature selection methods, and that Bayesian Network outperformed Decision Tree, achieving a higher accuracy rate.

  13. Conformational dynamics of ATP/Mg:ATP in motor proteins via data mining and molecular simulation

    Science.gov (United States)

    Bojovschi, A.; Liu, Ming S.; Sadus, Richard J.

    2012-08-01

    The conformational diversity of ATP/Mg:ATP in motor proteins was investigated using molecular dynamics and data mining. Adenosine triphosphate (ATP) conformations were found to be constrained mostly by inter cavity motifs in the motor proteins. It is demonstrated that ATP favors extended conformations in the tight pockets of motor proteins such as F1-ATPase and actin, whereas compact structures are favored in motor proteins such as RNA polymerase and DNA helicase. The incorporation of Mg2+ leads to increased flexibility of ATP molecules. The differences in the conformational dynamics of ATP/Mg:ATP in various motor proteins were quantified by the radius of gyration. The relationship between the simulation results and those obtained by data mining of motor proteins available in the Protein Data Bank is analyzed. The data mining analysis of motor proteins supports the conformational diversity of the phosphate group of ATP obtained computationally.

  14. An application of data mining in district heating substations for improving energy performance

    Science.gov (United States)

    Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing

    2017-11-01

    Automatic meter reading systems are capable of collecting and storing huge amounts of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology for discovering potentially interesting knowledge in vast amounts of data. This paper applies data mining methods to analyse these massive data in order to improve the energy performance of a DH substation. The technical approach comprises three steps: data selection, cluster analysis and association rule mining (ARM). Data from two heating seasons of a substation are used as a case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate are strongly correlated. Using the discovered rules, a fault in a remote flow meter installed on the secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extract potentially useful knowledge to better understand substation operation strategies and improve substation energy performance.
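
    The association rule mining step described above can be illustrated with a minimal sketch. It computes support, confidence and lift for a single rule over discretized readings. The item labels (`dp_high`, `flow_high`, and so on), the tiny dataset and the function name `rule_metrics` are hypothetical and are not taken from the paper, whose actual discretization and rule thresholds are not given in the abstract.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a = frozenset(antecedent)
    c = frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # rows containing the antecedent
    count_c = sum(1 for t in transactions if c <= t)         # rows containing the consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # rows containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical discretized hourly readings: secondary pressure difference (dp)
# and secondary flow rate (flow), each binned into "high" or "low".
transactions = [
    frozenset({"dp_high", "flow_high"}),
    frozenset({"dp_high", "flow_high"}),
    frozenset({"dp_low", "flow_low"}),
    frozenset({"dp_high", "flow_low"}),
    frozenset({"dp_low", "flow_low"}),
]

support, confidence, lift = rule_metrics(transactions, {"dp_high"}, {"flow_high"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    Under this reading, a record that violates a rule with high confidence (here the fourth record) would be a candidate for closer inspection, which is one plausible way a rule could point to a faulty meter.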

  15. Electronic structure prediction via data-mining the empirical pseudopotential method

    Energy Technology Data Exchange (ETDEWEB)

    Zenasni, H; Aourag, H [LEPM, URMER, Departement of Physics, University Abou Bakr Belkaid, Tlemcen 13000 (Algeria); Broderick, S R; Rajan, K [Department of Materials Science and Engineering, Iowa State University, Ames, Iowa 50011-2230 (United States)

    2010-01-15

    We introduce a new approach for accelerating the calculation of the electronic structure of new materials by utilizing the empirical pseudopotential method combined with data mining tools. Combining data mining with the empirical pseudopotential method allows us to convert an empirical approach into a predictive approach. Here we consider tetrahedrally bonded III-V Bi semiconductors, and through the prediction of form factors based on basic elemental properties we can model the band structure and charge density for these semiconductors, for which limited results exist. This work represents a unique approach to modeling the electronic structure of a material which may be used to identify promising new semiconductors, and is one of the few efforts utilizing data mining at an electronic level. (Abstract Copyright [2010], Wiley Periodicals, Inc.)

  16. Data mining for signals in spontaneous reporting databases: proceed with caution.

    Science.gov (United States)

    Stephenson, Wendy P; Hauben, Manfred

    2007-04-01

    To provide commentary and points of caution to consider before incorporating data mining as a routine component of any Pharmacovigilance program, and to stimulate further research aimed at better defining the predictive value of these new tools as well as their incremental value as an adjunct to traditional methods of post-marketing surveillance. Commentary includes review of current data mining methodologies employed and their limitations, caveats to consider in the use of spontaneous reporting databases and caution against over-confidence in the results of data mining. Future research should focus on more clearly delineating the limitations of the various quantitative approaches as well as the incremental value that they bring to traditional methods of pharmacovigilance.

  17. Exploring the potential of data mining techniques for the analysis of accident patterns

    DEFF Research Database (Denmark)

    Prato, Carlo Giacomo; Bekhor, Shlomo; Galtzur, Ayelet

    2010-01-01

    Research in road safety faces major challenges: individuation of the most significant determinants of traffic accidents, recognition of the most recurrent accident patterns, and allocation of resources necessary to address the most relevant issues. This paper intends to comprehend which data mining...... and association rules) data mining techniques are implemented for the analysis of traffic accidents occurred in Israel between 2001 and 2004. Results show that descriptive techniques are useful to classify the large amount of analyzed accidents, even though introduce problems with respect to the clear...... importance of input and intermediate neurons, and the relative importance of hundreds of association rules. Further research should investigate whether limiting the analysis to fatal accidents would simplify the task of data mining techniques in recognizing accident patterns without the “noise” probably...

  18. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

    Science.gov (United States)

    Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and discuss the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons of these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods, in conjunction with K nearest neighbors, could be used to accurately categorize the genetic factors in disease causation.

  19. Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

    Directory of Open Access Journals (Sweden)

    Lahiru Iddamalgoda

    2016-08-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and discuss the need for binary classification and scoring-based prioritization methods for determining causal variants. While we discuss the known pros and cons of these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods, in conjunction with K nearest neighbors, could be used to accurately categorize the genetic factors in disease causation.

  20. Data mining and knowledge discovery for big data methodologies, challenge and opportunities

    CERN Document Server

    2014-01-01

    The field of data mining has made significant and far-reaching advances over the past three decades. Because of its potential power for solving complex problems, data mining has been successfully applied to diverse areas such as business, engineering, social media, and biological science. Many of these applications search for patterns in complex structural information. In biomedicine for example, modeling complex biological systems requires linking knowledge across many levels of science, from genes to disease. Further, the data characteristics of the problems have also grown from static to dynamic and spatiotemporal, complete to incomplete, and centralized to distributed, and grow in their scope and size (this is known as big data). The effective integration of big data for decision-making also requires privacy preservation. The contributions to this monograph summarize the advances of data mining in the respective fields. This volume consists of nine chapters that address subjects ranging from mining da...

  1. Data Mining CMMSs: How to Convert Data into Knowledge.

    Science.gov (United States)

    Fennigkoh, Larry; Nanney, D Courtney

    2018-01-01

    Although the healthcare technology management (HTM) community has decades of accumulated medical device-related maintenance data, little knowledge has been gleaned from these data. Finding and extracting such knowledge requires the use of the well-established, but admittedly somewhat foreign to HTM, application of inferential statistics. This article sought to provide a basic background on inferential statistics and describe a case study of their application, limitations, and proper interpretation. The research question associated with this case study involved examining the effects of ventilator preventive maintenance (PM) labor hours, age, and manufacturer on needed unscheduled corrective maintenance (CM) labor hours. The study sample included more than 21,000 combined PM inspections and CM work orders on 2,045 ventilators from 26 manufacturers during a five-year period (2012-16). A multiple regression analysis revealed that device age, manufacturer, and accumulated PM inspection labor hours all influenced the amount of CM labor significantly (P < 0.001). In essence, CM labor hours increased with increasing PM labor. However, and despite the statistical significance of these predictors, the regression analysis also indicated that ventilator age, manufacturer, and PM labor hours only explained approximately 16% of all variability in CM labor, with the remainder (84%) caused by other factors that were not included in the study. As such, the regression model obtained here is not suitable for predicting ventilator CM labor hours.

  2. Smart-card-based automatic meal record system intervention tool for analysis using data mining approach.

    Science.gov (United States)

    Zenitani, Satoko; Nishiuchi, Hiromu; Kiuchi, Takahiro

    2010-04-01

    The Smart-card-based Automatic Meal Record system for company cafeterias (AutoMealRecord system) was recently developed and used to monitor employee eating habits. The system could be a unique nutrition assessment tool for automatically monitoring the meal purchases of all employees, although it only focuses on company cafeterias and has never been validated. Before starting an interventional study, we tested the reliability of the data collected by the system using the data mining approach. The AutoMealRecord data were examined to determine if it could predict current obesity. All data used in this study (n = 899) were collected by a major electric company based in Tokyo, which has been operating the AutoMealRecord system for several years. We analyzed dietary patterns by principal component analysis using data from the system and extracted 5 major dietary patterns: healthy, traditional Japanese, Chinese, Japanese noodles, and pasta. The ability to predict current body mass index (BMI) with dietary preference was assessed with multiple linear regression analyses, and in the current study, BMI was positively correlated with male gender, preference for "Japanese noodles," mean energy intake, protein content, and frequency of body measurement at a body measurement booth in the cafeteria. There was a negative correlation with age, dietary fiber, and lunchtime cafeteria use (R² = 0.22). This regression model predicted "would-be obese" participants (BMI ≥ 23) with 68.8% accuracy by leave-one-out cross validation. This shows that there was sufficient predictability of BMI based on data from the AutoMealRecord System. We conclude that the AutoMealRecord system is valuable for further consideration as a health care intervention tool. Copyright 2010 Elsevier Inc. All rights reserved.
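
    The leave-one-out cross validation mentioned above can be sketched as follows. This is a simplified illustration, not the authors' model: it uses a single predictor instead of multiple linear regression, and the intake and BMI values, along with the function names `fit_line` and `loocv_accuracy`, are hypothetical. Only the BMI threshold of 23 comes from the abstract.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # fall back to a flat line if x is constant
    return slope, mean_y - slope * mean_x


def loocv_accuracy(xs, ys, threshold):
    """Hold out each sample once, fit on the rest, and score the thresholded prediction."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        if (predicted >= threshold) == (ys[i] >= threshold):
            hits += 1
    return hits / len(xs)


# Hypothetical data: mean lunchtime energy intake (kcal) and measured BMI.
intake = [620.0, 710.0, 540.0, 830.0, 900.0, 760.0, 580.0, 950.0]
bmi = [21.4, 22.8, 20.9, 23.6, 24.9, 22.5, 21.7, 25.3]

accuracy = loocv_accuracy(intake, bmi, threshold=23.0)
print(f"leave-one-out accuracy: {accuracy:.3f}")
```

    Because each prediction is made by a model that never saw the held-out participant, the resulting accuracy is a less optimistic estimate than one computed on the training data itself.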

  3. A systematic review of data mining and machine learning for air pollution epidemiology.

    Science.gov (United States)

    Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro

    2017-11-28

    Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geospatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air

  4. From data mining rules to medical logical modules and medical advices.

    Science.gov (United States)

    Gomoi, Valentin; Vida, Mihaela; Robu, Raul; Stoicu-Tivadar, Vasile; Bernad, Elena; Lupşe, Oana

    2013-01-01

    Using data mining in combination with Clinical Decision Support Systems adds new knowledge to support medical diagnosis. The current work presents a tool that translates data mining rules, which support the generation of medical advice, into the Arden Syntax formalism. The developed system was tested with data related to 2326 births that took place in 2010 at the Bega Obstetrics - Gynaecology Hospital, Timişoara. Based on processing these data, 14 medical rules regarding the Apgar score were generated and then translated into the Arden Syntax language.

  5. Survey of Analysis of Crime Detection Techniques Using Data Mining and Machine Learning

    Science.gov (United States)

    Prabakaran, S.; Mitra, Shilpa

    2018-04-01

    Data mining is the field comprising procedures for finding patterns in huge datasets; it includes methods at the intersection of machine learning and database systems. It can be applied to various fields such as future healthcare, market basket analysis, education, manufacturing engineering and crime investigation. Among these, crime investigation is an interesting application in which crime characteristics are processed to help society live better. This paper surveys various data mining techniques used in this domain. The study may be helpful in designing new strategies for crime prediction and analysis.

  6. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.

    Science.gov (United States)

    Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S

    2016-01-01

    Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.

  7. Advances in research methods for information systems research data mining, data envelopment analysis, value focused thinking

    CERN Document Server

    Osei-Bryson, Kweku-Muata

    2013-01-01

    Advances in social science research methodologies and data analytic methods are changing the way research in information systems is conducted. New developments in statistical software technologies for data mining (DM) such as regression splines or decision tree induction can be used to assist researchers in systematic post-positivist theory testing and development. Established management science techniques like data envelopment analysis (DEA), and value focused thinking (VFT) can be used in combination with traditional statistical analysis and data mining techniques to more effectively explore

  8. A Survey on Accessing Data over Cloud Environment using Data mining Algorithms

    OpenAIRE

    B.Prasanalakshmi; A.Selvaraj

    2015-01-01

    In today's world, accessing large sets of data is increasingly complex because the data may be structured or unstructured, in the form of text, images, videos, etc., and cannot be controlled by internet users; this is known as Big data. Useful data can be extracted from big data with the help of data mining algorithms. Data mining is a technique for determining patterns, classifying data and clustering within large sets of data. In this paper we will discuss how large s...

  9. Analisis Data Lulusan dengan Data Mining untuk Mendukung Strategi Promosi Universitas Lancang Kuning

    Directory of Open Access Journals (Sweden)

    Elvira Asril

    2015-11-01

    Each company or organization that wants to survive needs to determine an appropriate promotional strategy. An appropriate promotional strategy can reduce promotion costs and reach the right promotional targets. One way to determine a promotional strategy is to use data mining techniques. The data mining technique used in this case is the K-Means clustering algorithm. Clustering is the grouping of records, observations, or cases into classes of similar objects. K-Means is a non-hierarchical data clustering method that attempts to partition the data into one or more clusters. The study was conducted by observing several variables that are often considered by universities in determining their promotional targets, namely school of origin, region, and department. The results of this study are interesting patterns obtained from data mining, which provide important information to support an appropriate promotional strategy for recruiting prospective new students. Keywords: Data Mining, Clustering, K-Means
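
    The K-Means procedure described above can be sketched in a few lines. The study's variables (school of origin, region, department) are categorical and would need a numeric encoding that the abstract does not describe, so this illustration uses two hypothetical numeric features instead. The data, the initial centroids and the function names `nearest` and `kmeans` are assumptions made for the example.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, centroids, max_iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[j]  # keep an empty cluster's centroid in place
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical graduates described by two numeric features, for example
# (distance of home region from campus, entrance score), both rescaled to 0-10.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
print(centroids)
```

    The initial centroids are passed in explicitly so that the run is repeatable; in practice they are often chosen at random, and the result can depend on that choice.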

  10. Combination of complementary data mining methods for geographical characterization of extra virgin olive oils based on mineral composition.

    Science.gov (United States)

    Sayago, Ana; González-Domínguez, Raúl; Beltrán, Rafael; Fernández-Recamales, Ángeles

    2018-09-30

    This work explores the potential of multi-element fingerprinting in combination with advanced data mining strategies to assess the geographical origin of extra virgin olive oil samples. For this purpose, the concentrations of 55 elements were determined in 125 oil samples from multiple Spanish geographic areas. Several unsupervised and supervised multivariate statistical techniques were used to build classification models and investigate the relationship between mineral composition of olive oils and their provenance. Results showed that Spanish extra virgin olive oils exhibit characteristic element profiles, which can be differentiated on the basis of their origin in accordance with three geographical areas: Atlantic coast (Huelva province), Mediterranean coast and inland regions. Furthermore, statistical modelling yielded high sensitivity and specificity, principally when random forest and support vector machines were employed, thus demonstrating the utility of these techniques in food traceability and authenticity research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Multiple comparisons permutation test for image based data mining in radiotherapy

    NARCIS (Netherlands)

    Chen, Chun; Witte, Marnix; Heemsbergen, Wilma; van Herk, Marcel

    2013-01-01

    Comparing incidental dose distributions (i.e. images) of patients with different outcomes is a straightforward way to explore dose-response hypotheses in radiotherapy. In this paper, we introduced a permutation test that compares images, such as dose distributions from radiotherapy, while tackling
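
    Since the abstract is truncated, the authors' exact test statistic is not known here. As a general illustration of how a permutation test can compare two groups of images while controlling for multiple comparisons across voxels, the sketch below uses the common maximum-statistic approach: the largest absolute per-voxel mean difference is compared against its distribution under random relabelling of patients. The three-voxel "images", the group sizes and the function names are hypothetical.

```python
import random


def mean_diff_map(group_a, group_b):
    """Per-voxel difference between the mean images of the two groups."""
    n_voxels = len(group_a[0])
    return [
        sum(img[v] for img in group_a) / len(group_a)
        - sum(img[v] for img in group_b) / len(group_b)
        for v in range(n_voxels)
    ]


def max_stat_permutation_test(group_a, group_b, n_permutations=1000, seed=0):
    """Return (observed max |difference|, permutation p-value)."""
    rng = random.Random(seed)
    observed = max(abs(d) for d in mean_diff_map(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # random reassignment of outcome labels
        stat = max(abs(d) for d in mean_diff_map(shuffled[:n_a], shuffled[n_a:]))
        if stat >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)


# Hypothetical dose "images" with three voxels each; only the last voxel differs.
with_event = [[1.0, 2.0, 9.0], [1.2, 2.1, 9.5], [0.9, 1.9, 9.2], [1.1, 2.2, 9.4]]
without_event = [[1.0, 2.0, 5.0], [1.1, 2.1, 5.3], [0.9, 2.0, 5.1], [1.0, 1.9, 5.2]]

observed, p_value = max_stat_permutation_test(with_event, without_event)
print(f"max |mean difference| = {observed:.3f}, p = {p_value:.3f}")
```

    Taking the maximum over voxels in every permutation yields a single p-value for the whole image, so no separate per-voxel correction is needed.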

  12. Geographical variation in morphometry, craniometry, and diet of a mammalian species (Stone marten, Martes foina) using data mining

    OpenAIRE

    PAPAKOSTA, MALAMATI; KITIKIDOU, KYRIAKI; BAKALOUDIS, DIMITRIOS; VLACHOS, CHRISTOS; CHATZINIKOS, EVANGELOS; ALEXANDROU, OLGA; SAKOULIS, ANASTASIOS

    2018-01-01

    Ecologists use various data mining techniques to make predictions and estimations, to identify patterns in datasets and relationships between qualitative and quantitative variables, or to classify variables. The aim of this study was to investigate if the application of data mining could be used to study geographical variation in the morphometry, craniometry, and diet of a mammalian species (Martes foina), and to determine whether data mining can complement genetic analysis to recognize subsp...

  13. Perancangan Data Mining Untuk Analisis Kriteria Nasabah Kredit Yang Potensial Dan Manfaatnya Untuk Customer Relationship Management Perbankan

    OpenAIRE

    Kurniawan, Putu Sukma

    2015-01-01

    Data mining emerged in response to the explosion of data experienced by many organizations that have accumulated years of data (purchasing data, sales data, customer data, transaction data, and others). One industry that uses data mining is banking. Many banks still use conventional methods to analyze their customers, which leads to high operating costs for the bank. The concept of data mining can help banks to get a better ana...

  14. Knowledge discovery from models of soil properties developed through data mining

    NARCIS (Netherlands)

    Bui, E.N.; Henderson, B.L.; Viergever, K.

    2006-01-01

    We modelled the distribution of soil properties across the agricultural zone on the Australian continent using data mining and knowledge discovery from databases (DM&KDD) tools. Piecewise linear tree models were built choosing from 19 climate variables, digital elevation model (DEM) and derived

  15. Short Term Prediction of Highway Travel Time using GUHA Data Mining Method

    Czech Academy of Sciences Publication Activity Database

    Coufal, David; Turunen, E.

    2004-01-01

    Roč. 14, - (2004), s. 221-231 ISSN 1210-0552 R&D Projects: GA MŠk OC 274.001 Grant - others:COST(XE) Action 274 TARSKI Institutional research plan: CEZ:AV0Z1030915 Keywords : data mining * many-valued logic Subject RIV: BA - General Mathematics

  16. Data warehousing as a basis for web-based documentation of data mining and analysis.

    Science.gov (United States)

    Karlsson, J; Eklund, P; Hallgren, C G; Sjödin, J G

    1999-01-01

    In this paper we present a case study for data warehousing intended to support data mining and analysis. We also describe a prototype for data retrieval. Further we discuss some technical issues related to a particular choice of a patient record environment.

  17. Development of a Workbench to Address the Educational Data Mining Bottleneck

    Science.gov (United States)

    Rodrigo, Ma. Mercedes T.; Baker, Ryan S. J. d.; McLaren, Bruce M.; Jayme, Alejandra; Dy, Thomas T.

    2012-01-01

    In recent years, machine-learning software packages have made it easier for educational data mining researchers to create real-time detectors of cognitive skill as well as of metacognitive and motivational behavior that can be used to improve student learning. However, there remain challenges to overcome for these methods to become available to…

  18. Outcomes of educational interventions in type 2 diabetes: WEKA data-mining analysis.

    Science.gov (United States)

    Sigurdardottir, Arun K; Jonsdottir, Helga; Benediktsson, Rafn

    2007-07-01

    To analyze which factors contribute to improvement in glycemic control in educational interventions in type 2 diabetes reported in randomized controlled trials (RCT) published in 2001-2005. Papers were extracted from Medline and Scopus using educational intervention and adults with type 2 diabetes as keywords. Inclusion criteria were RCT design. Data were analyzed with a data-mining program. Of 464 titles extracted, 21 articles reporting 18 studies met the inclusion criteria. Data mining showed that for an initial glycosylated hemoglobin (HbA1c) level below 8.0%, educational intervention achieved a small change in HbA1c level, from +0.1 to -0.7%. For initial HbA1c ≥ 8.0%, a significant drop in HbA1c level of 0.8-2.5% was found. Data mining indicated that duration, educational content and intensity of education did not predict changes in HbA1c levels. Initial HbA1c level is the single most important factor affecting improvements in glycemic control in response to patient education. Data mining is an appropriate and sufficiently sensitive method to analyze outcomes of educational interventions. Diversity in conceptualization of interventions and diversity of instruments used for outcome measurements could have hampered actual discovery of effective educational practices. Participation in educational interventions generally seems to benefit people with type 2 diabetes. Use of standardized instruments is encouraged as it gives better opportunities to identify conclusive results with consequent development of clinical guidelines.

  19. Population Validity for Educational Data Mining Models: A Case Study in Affect Detection

    Science.gov (United States)

    Ocumpaugh, Jaclyn; Baker, Ryan; Gowda, Sujith; Heffernan, Neil; Heffernan, Cristina

    2014-01-01

    Information and communication technology (ICT)-enhanced research methods such as educational data mining (EDM) have allowed researchers to effectively model a broad range of constructs pertaining to the student, moving from traditional assessments of knowledge to assessment of engagement, meta-cognition, strategy and affect. The automated…

  20. A Meta-Analysis of Educational Data Mining on Improvements in Learning Outcomes

    Science.gov (United States)

    AlShammari, Iqbal A.; Aldhafiri, Mohammed D.; Al-Shammari, Zaid

    2013-01-01

    A meta-synthesis study was conducted of 60 research studies on educational data mining (EDM) and their impacts on and outcomes for improving learning outcomes. After an overview, an examination of these outcomes is provided (Romero, Ventura, Espejo, & Hervas, 2008; Romero et al., 2011). Then, a review of other EDM-related research…

  1. Application of Learning Analytics Using Clustering Data Mining for Students' Disposition Analysis

    Science.gov (United States)

    Bharara, Sanyam; Sabitha, Sai; Bansal, Abhay

    2018-01-01

    Learning Analytics (LA) is an emerging field in which sophisticated analytic tools are used to improve learning and education. It draws from, and is closely tied to, a series of other fields of study like business intelligence, web analytics, academic analytics, educational data mining, and action analytics. The main objective of this research…

  2. The State of Educational Data Mining in 2009: A Review and Future Visions

    Science.gov (United States)

    Baker, Ryan S. J. D.; Yacef, Kalina

    2009-01-01

    We review the history and current trends in the field of Educational Data Mining (EDM). We consider the methodological profile of research in the early years of EDM, compared to in 2008 and 2009, and discuss trends and shifts in the research conducted by this community. In particular, we discuss the increased emphasis on prediction, the emergence…

  3. Learning Analytics and Educational Data Mining in Practice: A Systematic Literature Review of Empirical Evidence

    Science.gov (United States)

    Papamitsiou, Zacharoula; Economides, Anastasios A.

    2014-01-01

    This paper aims to provide the reader with a comprehensive background for understanding current knowledge on Learning Analytics (LA) and Educational Data Mining (EDM) and its impact on adaptive learning. It constitutes an overview of empirical evidence behind key objectives of the potential adoption of LA/EDM in generic educational strategic…

  4. From Log Files to Assessment Metrics: Measuring Students' Science Inquiry Skills Using Educational Data Mining

    Science.gov (United States)

    Gobert, Janice D.; Sao Pedro, Michael; Raziuddin, Juelaila; Baker, Ryan S.

    2013-01-01

    We present a method for assessing science inquiry performance, specifically for the inquiry skill of designing and conducting experiments, using educational data mining on students' log data from online microworlds in the Inq-ITS system (Inquiry Intelligent Tutoring System; www.inq-its.org). In our approach, we use a 2-step process: First we use…

  5. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    Science.gov (United States)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multidisciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by educational institutions. EDM uses computational approaches to analyze educational data in order to answer educational questions. To make a country stand out among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. The concealed patterns and data in various information repositories can be extracted by adopting the techniques of data mining. In order to summarize the performance of students with their credentials, we examine the use of data mining in the field of academics. The Apriori algorithm is applied extensively to the student database for a broad classification based on various categories. The K-means procedure is applied to the same databases in order to group the records into specific categories. The Apriori algorithm mines rules in order to extract similar patterns, along with their associations, across various sets of records. The records can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly observed with data mining frameworks. Thus, the algorithms prove efficient at profiling students in any educational environment. The ultimate objective of the study is to determine whether a student is likely to be prone to violence.
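
    The rule-mining step named in this abstract can be illustrated with a minimal sketch of Apriori frequent-itemset mining. This is not the authors' implementation: the record attributes, the `apriori` function name and the 0.4 support threshold are hypothetical choices made only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for all frequent itemsets."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in sets) / n

    # Level 1: frequent single items.
    items = {item for t in sets for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical student records, invented for this example.
records = [
    {"low_attendance", "late_submissions", "low_grade"},
    {"low_attendance", "low_grade"},
    {"late_submissions", "low_grade"},
    {"low_attendance", "late_submissions"},
    {"high_attendance", "high_grade"},
]
frequent = apriori(records, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

    Association rules would then be derived from the frequent itemsets by comparing the support of an itemset with that of its subsets; that step is omitted here to keep the sketch short.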

  6. An NCME Instructional Module on Data Mining Methods for Classification and Regression

    Science.gov (United States)

    Sinharay, Sandip

    2016-01-01

    Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…

  7. Evaluation of Documentation Patterns of Trainees and Supervising Physicians Using Data Mining.

    Science.gov (United States)

    Madhavan, Ramesh; Tang, Chi; Bhattacharya, Pratik; Delly, Fadi; Basha, Maysaa M

    2014-09-01

    The electronic health record (EHR) includes a rich data set that may offer opportunities for data mining and natural language processing to answer questions about quality of care, key aspects of resident education, or attributes of the residents' learning environment. We used data obtained from the EHR to report on inpatient documentation practices of residents and attending physicians at a large academic medical center. We conducted a retrospective observational study of deidentified patient notes entered over 7 consecutive months by a multispecialty university physician group at an urban hospital. A novel automated data mining technology was used to extract patient note-related variables. A sample of 26 802 consecutive patient notes was analyzed using the data mining and modeling tool Healthcare Smartgrid. Residents entered most of the notes (33%, 8178 of 24 787) between noon and 4 pm and 31% (7718 of 24 787) of notes between 8 am and noon. Attending physicians placed notes about teaching attestations within 24 hours in only 73% (17 843 of 24 443) of the records. Surgical residents were more likely to place notes before noon (P …). Data related to patient note entry were successfully used to objectively measure the current workflow of resident physicians and their supervising faculty, and the findings have implications for physician oversight of residents' clinical work. We were able to demonstrate the utility of a data mining model as an assessment tool in graduate medical education.

  8. Early Prediction of Students' Grade Point Averages at Graduation: A Data Mining Approach

    Science.gov (United States)

    Tekin, Ahmet

    2014-01-01

    Problem Statement: There has recently been interest in educational databases containing a variety of valuable but sometimes hidden data that can be used to help less successful students to improve their academic performance. The extraction of hidden information from these databases often implements aspects of the educational data mining (EDM)…

  9. Predicting Dropout Student: An Application of Data Mining Methods in an Online Education Program

    Science.gov (United States)

    Yukselturk, Erman; Ozekes, Serhat; Turel, Yalin Kilic

    2014-01-01

    This study examined the prediction of dropouts through data mining approaches in an online program. The subject of the study was selected from a total of 189 students who registered to the online Information Technologies Certificate Program in 2007-2009. The data was collected through online questionnaires (Demographic Survey, Online Technologies…

  10. Identifying Engineering Students' English Sentence Reading Comprehension Errors: Applying a Data Mining Technique

    Science.gov (United States)

    Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon

    2016-01-01

    The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…

  11. Myth Busting: Using Data Mining to Refute Link between Transfer Students and Retention Risk

    Science.gov (United States)

    McAleer, Brenda; Szakas, Joseph S.

    2010-01-01

    In the past few years, universities have become much more involved in outcomes assessment. Outside of the classroom analysis of learning outcomes, an investigation is performed into the use of current data mining tools to assess the issue of student retention within the Computer Information Systems (CIS) department. Utilizing both a historical…

  12. Building a Bridge or Digging a Pipeline? Clinical Data Mining in Evidence-Informed Knowledge Building

    Science.gov (United States)

    Epstein, Irwin

    2015-01-01

    Challenging the "bridge metaphor" theme of this conference, this article contends that current practice-research integration strategies are more like research-to-practice "pipelines." The purpose of this article is to demonstrate the potential of clinical data-mining studies conducted by practitioners, practitioner-oriented PhD…

  13. A Text Matching Method to Facilitate the Validation of Frequent Order Sets Obtained Through Data Mining

    OpenAIRE

    Che, Chengjian; Rocha, Roberto A.

    2006-01-01

    In order to compare order sets discovered using a data mining algorithm with existing order sets, we developed an order matching tool based on Oracle Text. The tool includes both automated searching and manual review processes. The comparison between the automated process and the manual review process indicates that the sensitivity of the automated matching is 81% and the specificity is 84%.
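
    For readers unfamiliar with the two measures reported above, the following sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The abstract does not give the underlying counts; the numbers below are hypothetical and were chosen only so that they reproduce the reported 81% and 84%.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute (sensitivity, specificity) from confusion-matrix counts.

    tp: true matches found by the automated process
    fn: true matches the automated process missed
    tn: non-matches correctly rejected
    fp: non-matches incorrectly reported as matches
    """
    sensitivity = tp / (tp + fn)  # share of real matches that were detected
    specificity = tn / (tn + fp)  # share of non-matches that were rejected
    return sensitivity, specificity


# Hypothetical counts, with manual review treated as the reference standard.
sens, spec = sensitivity_specificity(tp=81, fn=19, tn=84, fp=16)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```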

  14. Data-Mining Techniques in Detecting Factors Linked to Academic Achievement

    Science.gov (United States)

    Martínez Abad, Fernando; Chaparro Caso López, Alicia A.

    2017-01-01

    In light of the emergence of statistical analysis techniques based on data mining in education sciences, and the potential they offer to detect non-trivial information in large databases, this paper presents a procedure used to detect factors linked to academic achievement in large-scale assessments. The study is based on a non-experimental,…

  15. Data mining a functional neuroimaging database for functional segregation in brain regions

    DEFF Research Database (Denmark)

    Nielsen, Finn Årup; Balslev, Daniela; Hansen, Lars Kai

    2006-01-01

    We describe a specialized neuroinformatic data mining technique in connection with a meta-analytic functional neuroimaging database: We mine for functional segregation within brain regions by identifying journal articles that report brain activations within the regions and clustering the abstract...

  16. Comparison analysis for classification algorithm in data mining and the study of model use

    Science.gov (United States)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, classification algorithms have received extensive attention. Through experiments with classification algorithms on UCI data sets, we give a comparative analysis method for the different algorithms, supported by a statistical test. Then, an adaptive diagnosis model for preventing electricity theft and leakage is given as a specific case in the paper.

  17. Data mining scenarios for the discovery of subtypes and the comparison of algorithms

    NARCIS (Netherlands)

    Colas, Fabrice Pierre Robert

    2009-01-01

    A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug

  18. Role of Knowledge Management and Analytical CRM in Business: Data Mining Based Framework

    Science.gov (United States)

    Ranjan, Jayanthi; Bhatnagar, Vishal

    2011-01-01

    Purpose: The purpose of the paper is to provide a thorough analysis of the concepts of business intelligence (BI), knowledge management (KM) and analytical CRM (aCRM) and to establish a framework for integrating all the three to each other. The paper also seeks to establish a KM and aCRM based framework using data mining (DM) techniques, which…

  19. Data mining for water resource management part 2 - methods and approaches to solving contemporary problems

    Science.gov (United States)

    Roehl, Edwin A.; Conrads, Paul

    2010-01-01

    This is the second of two papers that describe how data mining can aid natural-resource managers with the difficult problem of controlling the interactions between hydrologic and man-made systems. Data mining is a new science that assists scientists in converting large databases into knowledge, and is uniquely able to leverage the large amounts of real-time, multivariate data now being collected for hydrologic systems. Part 1 gives a high-level overview of data mining, and describes several applications that have addressed major water resource issues in South Carolina. This Part 2 paper describes how various data mining methods are integrated to produce predictive models for controlling surface- and groundwater hydraulics and quality. The methods include: signal processing to remove noise and decompose complex signals into simpler components; time series clustering that optimally groups hundreds of signals into "classes" that behave similarly for data reduction and (or) divide-and-conquer problem solving; classification, which optimally matches new data to behavioral classes; artificial neural networks, which optimally fit multivariate data to create predictive models; model response surface visualization that greatly aids in understanding data and physical processes; and decision support systems that integrate data, models, and graphics into a single package that is easy to use.
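
    As a rough illustration of the first method in the list above (removing noise and splitting a signal into simpler components), the sketch below separates a series into a smoothed trend and a residual using a centered moving average. The abstract does not say which filters the authors use; the moving average, the function names and the sample values are assumptions made for this example.

```python
def moving_average(signal, window):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def decompose(signal, window=5):
    """Split a signal into a slow trend and a fast residual component."""
    trend = moving_average(signal, window)
    residual = [x - t for x, t in zip(signal, trend)]
    return trend, residual


# Hypothetical water-level readings: a slow rise plus alternating noise.
readings = [10.0, 10.6, 10.1, 10.9, 10.4, 11.2, 10.7, 11.5, 11.0, 11.8]
trend, residual = decompose(readings, window=5)
print([round(t, 2) for t in trend])
```

    By construction the two components add back to the original series, so no information is lost by the split; the trend can be passed to later modeling steps while the residual is inspected separately.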

  20. A practitioners guide to resampling for data analysis, data mining, and modeling: A cookbook for starters

    NARCIS (Netherlands)

    van den Broek, Egon

    A practitioner’s guide to resampling for data analysis, data mining, and modeling provides a gentle and pragmatic introduction to the proposed topics. Its supporting Web site was offline and, hence, its potential added value could not be verified. The book refrains from using advanced mathematics