WorldWideScience

Sample records for based decision-tree models

  1. MRI-based decision tree model for diagnosis of biliary atresia.

    Science.gov (United States)

    Kim, Yong Hee; Kim, Myung-Joon; Shin, Hyun Joo; Yoon, Haesung; Han, Seok Joo; Koh, Hong; Roh, Yun Ho; Lee, Mi-Jung

    2018-02-23

    To evaluate MRI findings and to generate a decision tree model for diagnosis of biliary atresia (BA) in infants with jaundice. We retrospectively reviewed features of MRI and ultrasonography (US) performed in infants with jaundice between January 2009 and June 2016 under approval of the institutional review board, including the maximum diameter of periportal signal change on MRI (MR triangular cord thickness, MR-TCT) or US (US-TCT), visibility of common bile duct (CBD) and abnormality of gallbladder (GB). Hepatic subcapsular flow was reviewed on Doppler US. We performed conditional inference tree analysis using MRI findings to generate a decision tree model. A total of 208 infants were included, 112 in the BA group and 96 in the non-BA group. Mean age at the time of MRI was 58.7 ± 36.6 days. Visibility of CBD, abnormality of GB and MR-TCT were good discriminators for the diagnosis of BA and the MRI-based decision tree using these findings with MR-TCT cut-off 5.1 mm showed 97.3 % sensitivity, 94.8 % specificity and 96.2 % accuracy. MRI-based decision tree model reliably differentiates BA in infants with jaundice. MRI can be an objective imaging modality for the diagnosis of BA. • MRI-based decision tree model reliably differentiates biliary atresia in neonatal cholestasis. • Common bile duct, gallbladder and periportal signal changes are the discriminators. • MRI has comparable performance to ultrasonography for diagnosis of biliary atresia.
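
The reported sensitivity, specificity and accuracy can be related to the group sizes with a short worked example. This is only a sketch: the abstract gives the group sizes (112 BA, 96 non-BA) and the three percentages, while the confusion-matrix counts below (109 and 91 correct) are assumptions chosen because they reproduce those percentages; the function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # share of diseased cases detected
    specificity = tn / (tn + fp)          # share of non-diseased cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (not stated in the abstract): 109 of 112 BA infants and
# 91 of 96 non-BA infants classified correctly.
sens, spec, acc = diagnostic_metrics(tp=109, fn=3, tn=91, fp=5)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints "97.3% 94.8% 96.2%"
```

Under these assumed counts, 200 of 208 infants are classified correctly, which matches the reported 96.2 % accuracy.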

  2. Decision tree based knowledge acquisition and failure diagnosis using a PWR loop vibration model

    International Nuclear Information System (INIS)

    Bauernfeind, V.; Ding, Y.

    1993-01-01

    An analytical vibration model of the primary system of a 1300 MW PWR was used for simulating mechanical faults. Deviations in the calculated power density spectra and coherence functions are determined and classified. The decision tree technique is then used for a personal computer supported knowledge presentation and for optimizing the logical relationships between the simulated faults and the observed symptoms. The optimized decision tree forms the knowledge base and can be used to diagnose known cases as well as to include new data into the knowledge base if new faults occur. (author)

  3. Decision tree modeling using R.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from random sampling and from the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
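
Recursive binary partitioning, as described above, can be sketched in a few lines. Note the substitution: the article works in R with conditional inference trees, which choose splits by association tests, whereas this Python sketch uses the simpler Gini impurity criterion (CART-style). All function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Split recursively until no split helps or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
labels = ["a", "a", "b", "b"]
tree = grow(rows, labels)
print(tree["feature"], tree["threshold"], predict(tree, [3.5, 0.0]))
```

Here the stopping criteria are a pure node (no split reduces impurity) or a depth limit; other criteria, such as a minimum node size, would slot into `grow` in the same place.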

  4. Data-Mining-Based Coronary Heart Disease Risk Prediction Model Using Fuzzy Logic and Decision Tree.

    Science.gov (United States)

    Kim, Jaekwon; Lee, Jongsik; Lee, Youngho

    2015-07-01

    The importance of the prediction of coronary heart disease (CHD) has been recognized in Korea; however, few studies have been conducted in this area. Therefore, it is necessary to develop a method for the prediction and classification of CHD in Koreans. A model for CHD prediction must be designed according to rule-based guidelines. In this study, a fuzzy logic and decision tree (classification and regression tree [CART])-driven CHD prediction model was developed for Koreans. Datasets derived from the Korean National Health and Nutrition Examination Survey VI (KNHANES-VI) were utilized to generate the proposed model. The rules were generated using a decision tree technique, and fuzzy logic was applied to overcome problems associated with uncertainty in CHD prediction. The accuracy and receiver operating characteristic (ROC) curve values of the proposed system were 69.51% and 0.594, proving that the proposed methods were more efficient than other models.

  5. Comprehensive decision tree models in bioinformatics.

    Science.gov (United States)

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly

  6. Comprehensive decision tree models in bioinformatics.

    Directory of Open Access Journals (Sweden)

    Gregor Stiglic

    PURPOSE: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. METHODS: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. RESULTS: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. CONCLUSIONS: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets

  7. Classifiability-based omnivariate decision trees.

    Science.gov (United States)

    Li, Yuanhong; Dong, Ming; Kothari, Ravi

    2005-11-01

    Top-down induction of decision trees is a simple and powerful method of pattern classification. In a decision tree, each node partitions the available patterns into two or more sets. New nodes are created to handle each of the resulting partitions and the process continues. A node is considered terminal if it satisfies some stopping criteria (for example, purity, i.e., all patterns at the node are from a single class). Decision trees may be univariate, linear multivariate, or nonlinear multivariate depending on whether a single attribute, a linear function of all the attributes, or a nonlinear function of all the attributes is used for the partitioning at each node of the decision tree. Though nonlinear multivariate decision trees are the most powerful, they are more susceptible to the risks of overfitting. In this paper, we propose to perform model selection at each decision node to build omnivariate decision trees. The model selection is done using a novel classifiability measure that captures the possible sources of misclassification with relative ease and is able to accurately reflect the complexity of the subproblem at each node. The proposed approach is fast and does not suffer from as high a computational burden as that incurred by typical model selection algorithms. Empirical results over 26 data sets indicate that our approach is faster and achieves better classification accuracy compared to statistical model selection algorithms.

  8. Combined prediction model for supply risk in nuclear power equipment manufacturing industry based on support vector machine and decision tree

    International Nuclear Information System (INIS)

    Shi Chunsheng; Meng Dapeng

    2011-01-01

    The prediction index for supply risk is developed based on the factor identifying of nuclear equipment manufacturing industry. The supply risk prediction model is established with the method of support vector machine and decision tree, based on the investigation on 3 important nuclear power equipment manufacturing enterprises and 60 suppliers. Final case study demonstrates that the combination model is better than the single prediction model, and demonstrates the feasibility and reliability of this model, which provides a method to evaluate the suppliers and measure the supply risk. (authors)

  9. A decision-tree-based model for evaluating the thermal comfort of horses

    Directory of Open Access Journals (Sweden)

    Ana Paula de Assis Maia

    2013-12-01

    Thermal comfort is of great importance in preserving body temperature homeostasis during thermal stress conditions. Although the thermal comfort of horses has been widely studied, there is no report of its relationship with surface temperature (T_S). This study aimed to assess the potential of data mining techniques as a tool to associate surface temperature with thermal comfort of horses. T_S was obtained using infrared thermography image processing. Physiological and environmental variables were used to define the predicted class, which classified thermal comfort as "comfort" and "discomfort". The variables of armpit, croup, breast and groin T_S of horses and the predicted classes were then subjected to a machine learning process. All variables in the dataset were considered relevant for the classification problem and the decision-tree model yielded an accuracy rate of 74 %. The feature selection methods used to reduce computational cost and simplify predictive learning decreased model accuracy to 70 %; however, the model became simpler with easily interpretable rules. For both these selection methods and for the classification using all attributes, armpit and breast T_S had a higher power rating for predicting thermal comfort. Data mining techniques show promise in the discovery of new variables associated with the thermal comfort of horses.

  10. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    The decision tree is one of the best-known classification methods in data mining. Many studies have focused on improving decision tree performance. However, those algorithms are developed and run on traditional distributed systems, so latency cannot be improved when processing the huge data generated by ubiquitous sensing nodes without the help of new technology. To improve data processing latency in large-scale data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), which is a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and 18 times faster than SPRINT for large data sets.

  11. Detecting Structural Metadata with Decision Trees and Transformation-Based Learning

    National Research Council Canada - National Science Library

    Kim, Joungbum; Schwarm, Sarah E; Ostendorf, Mari

    2004-01-01

    .... Specifically, combinations of decision trees and language models are used to predict sentence ends and interruption points and given these events transformation based learning is used to detect edit...

  12. EEG feature selection method based on decision tree.

    Science.gov (United States)

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve the automated feature selection problem in brain computer interface (BCI). In order to automate the feature selection process, we propose a novel EEG feature selection method based on decision tree (DT). During the electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II datasets Ia, and the experiment showed encouraging results.

  13. Ethnographic Decision Tree Modeling: A Research Method for Counseling Psychology.

    Science.gov (United States)

    Beck, Kirk A.

    2005-01-01

    This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…

  14. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Bøcher, Peder Klith; Greve, Mette Balslev

    2010-01-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic... distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index... measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding...

  15. River flow modelling using fuzzy decision trees

    NARCIS (Netherlands)

    Han, D.; Cluckie, I. D.; Karbassioun, D.; Lawry, J.; Krauskopf, B.

    2002-01-01

    A modern real time flood forecasting system requires its mathematical model(s) to handle highly complex rainfall runoff processes. Uncertainty in real time flood forecasting will involve a variety of components such as measurement noise from telemetry systems, inadequacy of the models, insufficiency

  16. A decision-tree model to detect post-calving diseases based on rumination, activity, milk yield, BW and voluntary visits to the milking robot.

    Science.gov (United States)

    Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I

    2016-09-01

    Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value.

  17. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    Science.gov (United States)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    Decision tree (DT) machine learning algorithm was used to map the flood susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome weak points of the LR. Combined method of FR and LR was used to map the susceptible areas in Kelantan, Malaysia. Results of both methods were compared and their efficiency was assessed. Most influencing conditioning factors on flooding were recognized.

  18. PCA based feature reduction to improve the accuracy of decision tree c4.5 classification

    Science.gov (United States)

    Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.

    2018-03-01

    Attribute splitting is a major process in Decision Tree C4.5 classification. However, this process does not have a significant impact on the construction of the decision tree in terms of removing irrelevant features. Over-fitting, which results from noisy data and irrelevant features, is a major problem in the decision tree classification process. In turn, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is one of the important issues in classification modelling and is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high dimensional data to low dimensional data with non-correlated attributes. In this research, we propose a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for the classification. In experiments conducted using the cervical cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework robustly enhances classification accuracy, with a 90.70% accuracy rate.

  19. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

    This paper proposes a decision tree model for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factors importance and are usually represented by an easy to interpret tree like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%) compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as most important factors which are slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
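
The 10-fold cross-validation mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: the fold assignment, the seed, and the stand-in majority-class model are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average held-out accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Stand-in model (hypothetical): always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


X = list(range(50))
y = [1 if x < 35 else 0 for x in X]  # 70% of the toy labels are 1
acc = cross_val_accuracy(fit_majority, predict_majority, X, y)
print(round(acc, 3))  # the majority-class share of the toy data
```

Any tree learner exposing a fit/predict pair could replace the stand-in model; the point of the sketch is only that every sample is held out exactly once and never used to train the model that scores it.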

  20. Decision tree models for data mining in hit discovery.

    Science.gov (United States)

    Hammann, Felix; Drewe, Juergen

    2012-04-01

    Decision tree induction (DTI) is a powerful means of modeling data without much prior preparation. Models are readable by humans, robust and easily applied in real-world applications, features that are mutually exclusive in other commonly used machine learning paradigms. While DTI is widely used in disciplines ranging from economics to medicine, it is an intriguing option in pharmaceutical research, especially when dealing with large data stores. This review covers the automated technologies available for creating decision trees and other rules efficiently, even from large datasets such as chemical libraries. The authors discuss the need for properly documented and validated models. Lastly, the authors cover several case studies in hit discovery, drug metabolism and toxicology, and drug surveillance, and compare them with other established techniques. DTI is a competitive and easy-to-use tool in basic research as well as in hit and drug discovery. Its strengths lie in its ability to handle all sorts of different data formats, the visual nature of the models, and the small computational effort needed for implementation in real-world systems. Limitations include lack of robustness and over-fitted models for certain types of data. As with any modeling technique, proper validation and quality measures are of utmost importance. © 2012 Informa UK, Ltd.

  1. Spatial soil zinc content distribution from terrain parameters: a GIS-based decision-tree model in Lebanon.

    Science.gov (United States)

    Bou Kheir, Rania; Greve, Mogens H; Abdallah, Chadi; Dalgaard, Tommy

    2010-02-01

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. Copyright (c) 2009 Elsevier Ltd. All rights reserved.

  2. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    Energy Technology Data Exchange (ETDEWEB)

    Bou Kheir, Rania, E-mail: rania.boukheir@agrsci.d [Lebanese University, Faculty of Letters and Human Sciences, Department of Geography, GIS Research Laboratory, P.O. Box 90-1065, Fanar (Lebanon); Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Greve, Mogens H. [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Abdallah, Chadi [National Council for Scientific Research, Remote Sensing Center, P.O. Box 11-8281, Beirut (Lebanon); Dalgaard, Tommy [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark)

    2010-02-15

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.

  3. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    Science.gov (United States)

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
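
The abstract reports mean fold error without defining it. The sketch below assumes the definition commonly used in pharmacokinetics, the antilog of the mean absolute log10 ratio of predicted to observed values; whether the paper uses exactly this form is an assumption, and the function name and sample values are hypothetical.

```python
import math


def mean_fold_error(predicted, observed):
    """Antilog of the mean absolute log10(predicted / observed).

    Assumed definition: a value of 1.0 means perfect prediction, and 2.0
    means predictions are off by a factor of two on average.
    """
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


# Hypothetical Vss values (L/kg): one twofold over-prediction and one
# twofold under-prediction.
mfe = mean_fold_error(predicted=[2.0, 0.5], observed=[1.0, 1.0])
print(round(mfe, 2))
```

Because the error is averaged on the log scale, over- and under-predictions by the same factor count equally, which is why the metric is popular for quantities spanning several orders of magnitude.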

  4. Case Study on High Dimensional Data Analysis Using Decision Tree Model

    OpenAIRE

    Smitha.T; V.Sundaram

    2012-01-01

    The major aim of this paper is to build a model to predict the chances of occurrence of disease in an area. The paper concentrates mainly on a data mining technique, the decision tree model, to identify the significant parameters for the prediction process. The decision tree model was created with the help of the ID3 algorithm.
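
ID3 chooses the attribute to split on by information gain, the reduction in label entropy after partitioning on that attribute. The sketch below shows only that selection step; the attribute names and records are invented for illustration and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of labels minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3_root_attribute(rows, labels):
    """Attribute with the highest information gain (the ID3 root choice)."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical records: 'sanitation' predicts the outcome, 'rainfall' does not.
rows = [
    {"rainfall": "high", "sanitation": "poor"},
    {"rainfall": "low", "sanitation": "poor"},
    {"rainfall": "high", "sanitation": "good"},
    {"rainfall": "low", "sanitation": "good"},
]
labels = ["disease", "disease", "none", "none"]
root = id3_root_attribute(rows, labels)
print(root, information_gain(rows, labels, "sanitation"))
```

The full algorithm would recurse on each value of the chosen attribute with that attribute removed, stopping when a subset is pure or no attributes remain.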

  5. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    Science.gov (United States)

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time.

  6. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    Science.gov (United States)

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  7. Cost-effectiveness of rabies post-exposure prophylaxis in the context of very low rabies risk: A decision-tree model based on the experience of France.

    Science.gov (United States)

    Ribadeau Dumas, Florence; N'Diaye, Dieynaba S; Paireau, Juliette; Gautret, Philippe; Bourhy, Hervé; Le Pen, Claude; Yazdanpanah, Yazdan

    2015-05-11

    Benefit-risk of different anti-rabies post-exposure prophylaxis (PEP) strategies after scratches or bites from dogs with unknown rabies status is unknown in very low rabies risk settings. A cost-effectiveness analysis in metropolitan France using a decision-tree model and input data from 2001 to 2011. A cohort of 2807 patients, based on the mean annual number of patients exposed to category CII (minor scratches) or CIII (transdermal bite) dog attacks in metropolitan France between 2001 and 2011. Five PEP strategies: (A) no PEP for CII and CIII; (B) vaccine only for CIII; (C) vaccine for CII and CIII; (D) vaccine + rabies immunoglobulin (RIG) only for CIII; and (E) vaccine for CII and vaccine + RIG for CIII. The number of deaths related to rabies and to traffic accidents on the way to anti-rabies centers (ARC), effectiveness in terms of years of life gained by reducing rabies cases and avoiding traffic accidents, costs, and incremental cost-effectiveness ratios (ICER) associated with each strategy. Strategy E led to the fewest rabies cases (3.6 × 10⁻⁸) and the highest costs (€ 1,606,000) but also to 1.7 × 10⁻³ lethal traffic accidents. Strategy A was associated with the most rabies cases (4.8 × 10⁻⁶), but the risk of traffic accidents and costs were null; therefore, strategy A was the most effective and the least costly. The sensitivity analysis showed that, when the probability that a given dog is rabid a given day (PA) was > 1.4 × 10⁻⁶, strategy D was more effective than strategy A; strategy B became cost-effective (i.e. ICER vs strategy A […] 1.4 × 10⁻⁴). In metropolitan France's very low rabies prevalence context, PEP with rabies vaccine, administered alone or with RIG, is associated with significant and unnecessary costs and unfavourable benefit-risk ratios regardless of exposure category. Copyright © 2015 Elsevier Ltd. All rights reserved.
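
    A decision-tree cost-effectiveness model of this kind rolls expected costs and effects back through chance nodes and then compares strategies by their ICER. The sketch below illustrates the arithmetic under simple assumptions; the function names and all numbers are hypothetical and are not the study's inputs.

```python
import math


def expected_value(branches):
    """Expected payoff of a chance node given (probability, payoff) branches."""
    total_probability = sum(p for p, _ in branches)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * payoff for p, payoff in branches)


def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost-effectiveness ratio of a strategy versus a reference."""
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("equal effectiveness: ICER is undefined")
    return (cost_new - cost_ref) / delta_effect


# Hypothetical strategies: expected cost (EUR) and effect (life-years).
cost_ref = expected_value([(0.9, 0.0), (0.1, 5000.0)])
cost_new = expected_value([(0.9, 1000.0), (0.1, 6000.0)])
print(icer(cost_new, 2.5, cost_ref, 2.0))
```

    A strategy that costs less and is at least as effective as its comparator dominates it, which is the situation the abstract describes for strategy A, so no ICER needs to be reported in that case.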

  8. Simple Prediction of Type 2 Diabetes Mellitus via Decision Tree Modeling

    Directory of Open Access Journals (Sweden)

    Mehrab Sayadi

    2017-06-01

    Background: Type 2 Diabetes Mellitus (T2DM) is one of the most important risk factors in cardiovascular disorders considered as a common clinical and public health problem. Early diagnosis can reduce the burden of the disease. Decision tree, as an advanced data mining method, can be used as a reliable tool to predict T2DM. Objectives: This study aimed to present a simple model for predicting T2DM using decision tree modeling. Materials and Methods: This analytical model-based study used a part of the cohort data obtained from a database in Healthy Heart House of Shiraz, Iran. The data included routine information, such as age, gender, Body Mass Index (BMI), family history of diabetes, and systolic and diastolic blood pressure, which were obtained from the individuals referred for gathering baseline data in Shiraz cohort study from 2014 to 2015. Diabetes diagnosis was used as binary datum. Decision tree technique and J48 algorithm were applied using the WEKA software (version 3.7.5, New Zealand). Additionally, Receiver Operator Characteristic (ROC) curve and Area Under Curve (AUC) were used for checking the goodness of fit. Results: The age of the 11302 cases obtained after data preparation ranged from 18 to 89 years with the mean age of 48.1 ± 11.4 years. Additionally, 51.1% of the cases were male. In the tree structure, blood pressure and age were placed where most information was gained. In our model, however, gender was not important and was placed on the final branch of the tree. Total precision and AUC were 87% and 89%, respectively. This indicated that the model had good accuracy for distinguishing patients from normal individuals. Conclusions: The results showed that T2DM could be predicted via decision tree model without laboratory tests. Thus, this model can be used in pre-clinical and public health screening programs.
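
    The AUC used here as a goodness-of-fit check can be read as the probability that a randomly chosen patient receives a higher predicted score than a randomly chosen non-patient. A minimal sketch of that pairwise definition follows; it is not the WEKA routine, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical diabetes labels (1 = T2DM) and model scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

    The pairwise loop is quadratic in the number of cases; for a cohort of this size a rank-based formulation would be the practical choice.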

  9. Application of Decision-Tree Model to Groundwater Productivity-Potential Mapping

    Directory of Open Access Journals (Sweden)

    Saro Lee

    2015-09-01

    For the sustainable use of groundwater, this study analyzed groundwater productivity-potential using a decision-tree approach in a geographic information system (GIS) in Boryeong and Pohang cities, Korea. The model was based on the relationship between groundwater-productivity data, including specific capacity (SPC), and its related hydrogeological factors. SPC data, which are measured and calculated for groundwater productivity, and data about related factors, including topography, lineament, geology, forest and soil data, were collected and input into a spatial database. A decision-tree model was applied and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The resulting groundwater-productivity-potential (GPP) maps were validated using area-under-the-curve (AUC) analysis with the well data that had not been used for training the model. In Boryeong city, the CHAID and QUEST algorithms had accuracies of 83.31% and 79.47%, and in Pohang city, the CHAID and QUEST algorithms had accuracies of 86.18% and 80.00%. As another validation, the GPP maps were compared with the actual SPC data. As a result, in Boryeong city, the CHAID and QUEST algorithms had accuracies of 96.55% and 94.92%, and in Pohang city, the CHAID and QUEST algorithms had accuracies of 87.88% and 87.50%. These results indicate that decision-tree models can be useful for development of groundwater resources.
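
    CHAID chooses splits with chi-squared tests of independence between each candidate factor and the outcome. The sketch below computes only the Pearson chi-squared statistic for one contingency table; real CHAID also merges categories and compares adjusted p-values, which is omitted here. The table values are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row and column total is positive, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = geology class, columns = low/high productivity.
print(chi_squared([[30, 10], [12, 28]]))
```

    A larger statistic indicates a stronger association between the factor and the outcome, making that factor a better candidate for the next split.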

  10. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open-loop mode selection module in AMR-WB+.

  11. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models.

    Science.gov (United States)

    Wang, Jing; Li, Man; Hu, Yun-tao; Zhu, Yu

    2009-09-14

    In recent years, artificial neural network is advocated in modeling complex multivariable relationships due to its ability of fault tolerance; while decision tree of data mining technique was recommended because of its richness of classification arithmetic rules and appeal of visibility. The aim of our research was to compare the performance of ANN and decision tree models in predicting hospital charges on gastric cancer patients. Data about hospital charges on 1008 gastric cancer patients and related demographic information were collected from the First Affiliated Hospital of Anhui Medical University from 2005 to 2007 and preprocessed firstly to select pertinent input variables. Then artificial neural network (ANN) and decision tree models, using same hospital charge output variable and same input variables, were applied to compare the predictive abilities in terms of mean absolute errors and linear correlation coefficients for the training and test datasets. The transfer function in ANN model was sigmoid with 1 hidden layer and three hidden nodes. After preprocess of the data, 12 variables were selected and used as input variables in two types of models. For both the training dataset and the test dataset, mean absolute errors of ANN model were lower than those of decision tree model (1819.197 vs. 2782.423, 1162.279 vs. 3424.608) and linear correlation coefficients of the former model were higher than those of the latter (0.955 vs. 0.866, 0.987 vs. 0.806). The predictive ability and adaptive capacity of ANN model were better than those of decision tree model. ANN model performed better in predicting hospital charges of gastric cancer patients of China than did decision tree model.
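
    The two comparison measures named above, mean absolute error and the linear (Pearson) correlation coefficient, can be computed as in the following sketch. The charge values are hypothetical and unrelated to the study data.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical hospital charges and model predictions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(mean_absolute_error(actual, predicted), pearson_r(actual, predicted))
```

    Reporting both is useful because a model can track the ordering of charges closely (high correlation) while still being off by a large absolute amount.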

  12. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models

    Directory of Open Access Journals (Sweden)

    Hu Yun-tao

    2009-09-01

    Background: In recent years, artificial neural network is advocated in modeling complex multivariable relationships due to its ability of fault tolerance; while decision tree of data mining technique was recommended because of its richness of classification arithmetic rules and appeal of visibility. The aim of our research was to compare the performance of ANN and decision tree models in predicting hospital charges on gastric cancer patients. Methods: Data about hospital charges on 1008 gastric cancer patients and related demographic information were collected from the First Affiliated Hospital of Anhui Medical University from 2005 to 2007 and preprocessed firstly to select pertinent input variables. Then artificial neural network (ANN) and decision tree models, using same hospital charge output variable and same input variables, were applied to compare the predictive abilities in terms of mean absolute errors and linear correlation coefficients for the training and test datasets. The transfer function in ANN model was sigmoid with 1 hidden layer and three hidden nodes. Results: After preprocess of the data, 12 variables were selected and used as input variables in two types of models. For both the training dataset and the test dataset, mean absolute errors of ANN model were lower than those of decision tree model (1819.197 vs. 2782.423, 1162.279 vs. 3424.608) and linear correlation coefficients of the former model were higher than those of the latter (0.955 vs. 0.866, 0.987 vs. 0.806). The predictive ability and adaptive capacity of ANN model were better than those of decision tree model. Conclusion: ANN model performed better in predicting hospital charges of gastric cancer patients of China than did decision tree model.

  13. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study.

    Science.gov (United States)

    Ramezankhani, Azra; Hadavandi, Esmaeil; Pournik, Omid; Shahrabi, Jamal; Azizi, Fereidoun; Hadaegh, Farzad

    2016-12-01

    The current study used the decision tree (DT) method to develop different prediction models for the incidence of type 2 diabetes (T2D) and to explore interactions between predictor variables in those models. Prospective cohort study. Tehran Lipid and Glucose Study (TLGS). A total of 6647 participants (43.4% men) aged >20 years, without T2D at baselines ((1999-2001) and (2002-2005)), were followed until 2012. Two series of models (with and without 2-hour postchallenge plasma glucose (2h-PCPG)) were developed using 3 types of DT algorithms. The performances of the models were assessed using sensitivity, specificity, area under the ROC curve (AUC), geometric mean (G-Mean) and F-Measure. T2D, the primary outcome, was defined as fasting plasma glucose (FPG) ≥7 mmol/L, 2h-PCPG ≥11.1 mmol/L, or use of antidiabetic medication. During a median follow-up of 9.5 years, 729 new cases of T2D were identified. The Quick Unbiased Efficient Statistical Tree (QUEST) algorithm had the highest sensitivity and G-Mean among all the models for men and women. The models that included 2h-PCPG had sensitivity and G-Mean of (78% and 0.75%) and (78% and 0.78%) for men and women, respectively. Both models achieved good discrimination power with AUC above 0.78. FPG, 2h-PCPG, waist-to-height ratio (WHtR) and mean arterial blood pressure (MAP) were the most important factors to incidence of T2D in both genders. Among men, those with an FPG≤4.9 mmol/L and 2h-PCPG≤7.7 mmol/L had the lowest risk, and those with an FPG>5.3 mmol/L and 2h-PCPG>4.4 mmol/L had the highest risk for T2D incidence. In women, those with an FPG≤5.2 mmol/L and WHtR≤0.55 had the lowest risk, and those with an FPG>5.2 mmol/L and WHtR>0.56 had the highest risk for T2D incidence. Our study emphasises the utility of DT for exploring interactions between predictor variables. Published by the BMJ Publishing Group Limited.
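
    Sensitivity, specificity, G-Mean and F-Measure all derive from the four cells of a confusion matrix. The sketch below uses the standard definitions (G-Mean as the geometric mean of sensitivity and specificity, F-Measure as the harmonic mean of precision and sensitivity); the counts are hypothetical and not taken from the TLGS cohort.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for an incident-T2D classifier.
print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
```

    G-Mean is often preferred for imbalanced outcomes such as incident diabetes because it drops sharply when either class is poorly recognised.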

  14. Minimizing the cost of translocation failure with decision-tree models that predict species' behavioral response in translocation sites.

    Science.gov (United States)

    Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael

    2015-08-01

    The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances. © 2015 Society for Conservation Biology.
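
    Of the split criteria listed above, the Gini index is the simplest to illustrate: a split is scored by how much it lowers the weighted impurity of the resulting groups. This is a generic sketch rather than the software used in the study; the labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_gain(parent, children):
    """Impurity reduction from splitting parent labels into child groups."""
    weighted = sum(len(child) / len(parent) * gini_impurity(child) for child in children)
    return gini_impurity(parent) - weighted


# Hypothetical lizard outcomes, split by whether a fight was observed.
parent = ["disperse", "disperse", "stay", "stay"]
fought, did_not_fight = ["disperse", "disperse"], ["stay", "stay"]
print(gini_gain(parent, [fought, did_not_fight]))
```

    Information gain and gain ratio follow the same pattern with entropy in place of Gini impurity, the latter additionally dividing by the entropy of the split itself.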

  15. The risk factors of laryngeal pathology in Korean adults using a decision tree model.

    Science.gov (United States)

    Byeon, Haewon

    2015-01-01

    The purpose of this study was to identify risk factors affecting laryngeal pathology in the Korean population and to evaluate the derived prediction model. Cross-sectional study. Data were drawn from the 2008 Korea National Health and Nutritional Examination Survey. The subjects were 3135 persons (1508 male and 2114 female) aged 19 years and older living in the community. The independent variables were age, sex, occupation, smoking, alcohol drinking, and self-reported voice problems. A decision tree analysis was done to identify risk factors for predicting a model of laryngeal pathology. The significant risk factors of laryngeal pathology were age, gender, occupation, smoking, and self-reported voice problem in decision tree model. Four significant paths were identified in the decision tree model for the prediction of laryngeal pathology. Those identified as high risk groups for laryngeal pathology included those who self-reported a voice problem, those who were males in their 50s who did not recognize a voice problem, those who were not economically active males in their 40s, and male workers aged 19 and over and under 50 or 60 and over who currently smoked. The results of this study suggest that individual risk factors, such as age, sex, occupation, health behavior, and self-reported voice problem, affect the onset of laryngeal pathology in a complex manner. Based on the results of this study, early management of the high-risk groups is needed for the prevention of laryngeal pathology. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  16. Decision-tree model of treatment-seeking behaviors after detecting symptoms by Korean stroke patients.

    Science.gov (United States)

    Oh, Hyo-Sook; Park, Hyeoun-Ae

    2006-06-01

    This study was performed to develop and test a decision-tree model of treatment-seeking behaviors about when Korean patients visit a doctor after experiencing stroke symptoms. The study used methodological triangulation. The model was developed based on qualitative data collected from in-depth interviews with 18 stroke patients. The model was tested using quantitative data collected from interviews and a structured questionnaire involving 150 stroke patients. The predictability of the decision-tree model was quantified as the proportion of participants who followed the pathway predicted by the model. Decision outcomes of the model were categorized into immediate and delayed treatment-seeking behavior. The model was influenced by lowered consciousness, social-group influences, perceived seriousness of symptoms, past history of hypertension or stroke, and barriers to hospital visits. The predictability of the model was found to be 90.7%. The results from this study can help healthcare personnel understand the education needs of stroke patients regarding treatment-seeking behaviors, and hence aid in the development of educational strategies for stroke patients.

  17. Improvement of adequate use of warfarin for the elderly using decision tree-based approaches.

    Science.gov (United States)

    Liu, K E; Lo, C-L; Hu, Y-H

    2014-01-01

    Due to the narrow therapeutic range and high drug-to-drug interactions (DDIs), improving the adequate use of warfarin for the elderly is crucial in clinical practice. This study examines whether the effectiveness of using warfarin among elderly inpatients can be improved when machine learning techniques and data from the laboratory information system are incorporated. Having employed 288 validated clinical cases in the DDI group and 89 cases in the non-DDI group, we evaluate the prediction performance of seven classification techniques, with and without an Adaptive Boosting (AdaBoost) algorithm. Measures including accuracy, sensitivity, specificity and area under the curve are used to evaluate model performance. Decision tree-based classifiers outperform other investigated classifiers in all evaluation measures. The classifiers supplemented with AdaBoost can generally improve the performance. In addition, weight, congestive heart failure, and gender are among the top three critical variables affecting prediction accuracy for the non-DDI group, while age, ALT, and warfarin doses are the most influential factors for the DDI group. Medical decision support systems incorporating decision tree-based approaches improve predicting performance and thus may serve as a supplementary tool in clinical practice. Information from laboratory tests and inpatients' history should not be ignored because related variables are shown to be decisive in our prediction models, especially when the DDIs exist.

  18. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    Science.gov (United States)

    Lee, Saro; Park, Inhye

    2013-09-30

    Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.

  19. Diagnosis of Constant Faults in Read-Once Contact Networks over Finite Bases using Decision Trees

    KAUST Repository

    Busbait, Monther I.

    2014-05-01

    We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. Also, we obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant faults depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients for the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we create, to obtain the sharp coefficients.

  20. Reanalysis and External Validation of a Decision Tree Model for Detecting Unrecognized Diabetes in Rural Chinese Individuals

    OpenAIRE

    Xin, Zhong; Hua, Lin; Wang, Xu-Hong; Zhao, Dong; Yu, Cai-Guo; Ma, Ya-Hong; Zhao, Lei; Cao, Xi; Yang, Jin-Kui

    2017-01-01

    We reanalyzed previous data to develop a more simplified decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. Then, the model was validated in another rural town. Only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) with fewer branches were used in the new model. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detect...

  1. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    Science.gov (United States)

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education which is involving subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  2. Decision trees in epidemiological research

    Directory of Open Access Journals (Sweden)

    Ashwini Venkatasubramaniam

    2017-09-01

    Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.

  3. Decision trees in epidemiological research.

    Science.gov (United States)

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.

  4. Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data

    Directory of Open Access Journals (Sweden)

    Esther I. Metting

    2016-01-01

    The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease were derived from an asthma/chronic obstructive pulmonary disease (COPD) service where patients were assessed using spirometry, the Asthma Control Questionnaire, the Clinical COPD Questionnaire, history data and medication use. All patients were diagnosed through the Internet by a pulmonologist. The Chi-squared Automatic Interaction Detection method was used to build the decision tree. The tree was externally validated in another real-life primary care population (n=3215). Our tree correctly diagnosed 79% of the asthma patients, 85% of the COPD patients and 32% of the asthma–COPD overlap syndrome (ACOS) patients. External validation showed a comparable pattern (correct: asthma 78%, COPD 83%, ACOS 24%). Our decision tree is considered to be promising because it was based on real-life primary care patients with a specialist's diagnosis. In most patients the diagnosis could be correctly predicted. Predicting ACOS, however, remained a challenge. The total decision tree can be implemented in computer-assisted diagnostic systems for individual patients. A simplified version of this tree can be used in daily clinical practice as a desk tool.

  5. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

    Science.gov (United States)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-07

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
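
    The selection idea described above can be sketched in two steps: score a contour against the truth with the Dice similarity coefficient, then pick the method whose predicted DSC is highest. The code below is an illustration under simple assumptions (contours as sets of voxel indices, per-method predictions already available); the method names and values are hypothetical and this is not the ATLAAS implementation.

```python
def dice_coefficient(mask_a, mask_b):
    """DSC = 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def select_method(predicted_dsc):
    """Return the method name with the highest predicted DSC."""
    return max(predicted_dsc, key=predicted_dsc.get)


# Hypothetical voxel sets and per-method DSC predictions.
true_contour = {1, 2, 3}
segmented = {2, 3, 4}
print(dice_coefficient(true_contour, segmented))
print(select_method({"method_a": 0.71, "method_b": 0.84, "method_c": 0.79}))
```

    In the approach described in the abstract, each predicted DSC would come from a per-method decision tree fed with tumour volume, peak-to-background ratio and a texture metric.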

  6. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    Science.gov (United States)

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of the toxicity of chemicals is costly and time-consuming. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for the training R² and test R², respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned into several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for the training R² and test R², respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (training R² and test R² were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.
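
The two statistics quoted in the abstract above, R² and mean absolute error, can be computed as in the following sketch. It assumes R² is the coefficient of determination (1 minus residual over total sum of squares); the abstract does not say which definition the authors used, and the toxicity values below are invented for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical observed vs. predicted toxicity values (not from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))            # 0.98
print(round(mean_absolute_error(observed, predicted), 3))  # 0.15
```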

  7. A ROUGH SET DECISION TREE BASED MLP-CNN FOR VERY HIGH RESOLUTION REMOTELY SENSED IMAGE CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2017-09-01

    Recent advances in remote sensing have produced a great number of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges in processing, analysing and classifying them effectively due to their high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel-level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.

  8. A Rough Set Decision Tree Based MLP-CNN for Very High Resolution Remotely Sensed Image Classification

    Science.gov (United States)

    Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.

    2017-09-01

    Recent advances in remote sensing have produced a great number of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges in processing, analysing and classifying them effectively due to their high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel-level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.

  9. OmniGA: Optimized Omnivariate Decision Trees for Generalizable Classification Models

    KAUST Repository

    Magana-Mora, Arturo

    2017-06-14

    Classification problems from different domains vary in complexity, size, and imbalance of the number of samples from different classes. Although several classification models have been proposed, selecting the right model and parameters for a given classification task to achieve good performance is not trivial. Therefore, there is a constant interest in developing novel robust and efficient models suitable for a great variety of data. Here, we propose OmniGA, a framework for the optimization of omnivariate decision trees based on a parallel genetic algorithm, coupled with deep learning structure and ensemble learning methods. The performance of the OmniGA framework is evaluated on 12 different datasets taken mainly from biomedical problems and compared with the results obtained by several robust and commonly used machine-learning models with optimized parameters. The results show that OmniGA systematically outperformed these models for all the considered datasets, reducing the F score error in the range from 100% to 2.25%, compared to the best performing model. This demonstrates that OmniGA produces robust models with improved performance. OmniGA code and datasets are available at www.cbrc.kaust.edu.sa/omniga/.

  10. Modeling flash floods in ungauged mountain catchments of China: A decision tree learning approach for parameter regionalization

    Science.gov (United States)

    Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.

    2017-12-01

    Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. Since sub-daily streamflow information is unavailable for most small basins in China, one of the main challenges is finding appropriate parameter values for simulating flash floods in ungauged catchments. In this study, we use decision tree learning to explore parameter set transferability between different catchments. For this purpose, the physically-based, semi-distributed rainfall-runoff model PRMS-OMS is set up for 35 catchments in ten Chinese provinces. Hourly data from more than 800 storm runoff events are used to calibrate the model and evaluate the performance of parameter set transfers between catchments. For each catchment, 58 catchment attributes are extracted from several data sets available for the whole of China. We then use a data mining technique (decision tree learning) to identify catchment similarities that can be related to good transfer performance. Finally, we use the splitting rules of decision trees for finding suitable donor catchments for ungauged target catchments. We show that decision tree learning makes optimal use of the information content of available catchment descriptors and outperforms regionalization based on a conventional measure of physiographic-climatic similarity by 15%-20%. Similar performance can be achieved with a regionalization method based on spatial proximity, but decision trees offer flexible rules for selecting suitable donor catchments, not relying on the vicinity of gauged catchments. This flexibility makes the method particularly suitable for implementation in sparsely gauged environments. We evaluate the probability to detect flood events exceeding a given return period, considering measured discharge and PRMS-OMS simulated flows with regionalized parameters

  11. Applied Swarm-based medicine: collecting decision trees for patterns of algorithms analysis.

    Science.gov (United States)

    Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin

    2017-08-16

    The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.

  12. An Assessment for A Filtered Containment Venting Strategy Using Decision Tree Models

    International Nuclear Information System (INIS)

    Shin, Hoyoung; Jae, Moosung

    2016-01-01

    In this study, a probabilistic assessment of the severe accident management strategy through a filtered containment venting system was performed by using decision tree models. In Korea, the filtered containment venting system has been installed for the first time in Wolsong unit 1 as part of the Fukushima follow-up steps, and it is planned to be applied gradually to all the remaining reactors. The filtered containment venting system, one of the severe accident countermeasures, prevents a gradual pressurization of the containment building by exhausting noncondensable gas and vapor to the outside of the containment building. In this study, a probabilistic assessment of the filtered containment venting strategy, one of the severe accident management strategies, was performed by using decision tree models. Containment failure frequencies of each decision were evaluated by the developed decision tree model. The optimum accident management strategies were evaluated by comparing the results. Various strategies in severe accident management guidelines (SAMG) could be improved by utilizing the methodology in this study and the offsite risk analysis methodology.

  13. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

    Science.gov (United States)

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions which can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incompleteness of the data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm by which some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail.

  14. Predicting Lung Radiotherapy-Induced Pneumonitis Using a Model Combining Parametric Lyman Probit With Nonparametric Decision Trees

    International Nuclear Information System (INIS)

    Das, Shiva K.; Zhou Sumin; Zhang, Junan; Yin, F.-F.; Dewhirst, Mark W.; Marks, Lawrence B.

    2007-01-01

    Purpose: To develop and test a model to predict for lung radiation-induced Grade 2+ pneumonitis. Methods and Materials: The model was built from a database of 234 lung cancer patients treated with radiotherapy (RT), of whom 43 were diagnosed with pneumonitis. The model augmented the predictive capability of the parametric dose-based Lyman normal tissue complication probability (LNTCP) metric by combining it with weighted nonparametric decision trees that use dose and nondose inputs. The decision trees were sequentially added to the model using a 'boosting' process that enhances the accuracy of prediction. The model's predictive capability was estimated by 10-fold cross-validation. To facilitate dissemination, the cross-validation result was used to extract a simplified approximation to the complicated model architecture created by boosting. Application of the simplified model is demonstrated in two example cases. Results: The area under the model receiver operating characteristics curve for cross-validation was 0.72, a significant improvement over the LNTCP area of 0.63 (p = 0.005). The simplified model used the following variables to output a measure of injury: LNTCP, gender, histologic type, chemotherapy schedule, and treatment schedule. For a given patient RT plan, injury prediction was highest for the combination of pre-RT chemotherapy, once-daily treatment, female gender and lowest for the combination of no pre-RT chemotherapy and nonsquamous cell histologic type. Application of the simplified model to the example cases revealed that injury prediction for a given treatment plan can range from very low to very high, depending on the settings of the nondose variables. Conclusions: Radiation pneumonitis prediction was significantly enhanced by decision trees that added the influence of nondose factors to the LNTCP formulation

  15. Predicting lung radiotherapy-induced pneumonitis using a model combining parametric Lyman probit with nonparametric decision trees.

    Science.gov (United States)

    Das, Shiva K; Zhou, Sumin; Zhang, Junan; Yin, Fang-Fang; Dewhirst, Mark W; Marks, Lawrence B

    2007-07-15

    To develop and test a model to predict for lung radiation-induced Grade 2+ pneumonitis. The model was built from a database of 234 lung cancer patients treated with radiotherapy (RT), of whom 43 were diagnosed with pneumonitis. The model augmented the predictive capability of the parametric dose-based Lyman normal tissue complication probability (LNTCP) metric by combining it with weighted nonparametric decision trees that use dose and nondose inputs. The decision trees were sequentially added to the model using a "boosting" process that enhances the accuracy of prediction. The model's predictive capability was estimated by 10-fold cross-validation. To facilitate dissemination, the cross-validation result was used to extract a simplified approximation to the complicated model architecture created by boosting. Application of the simplified model is demonstrated in two example cases. The area under the model receiver operating characteristics curve for cross-validation was 0.72, a significant improvement over the LNTCP area of 0.63 (p = 0.005). The simplified model used the following variables to output a measure of injury: LNTCP, gender, histologic type, chemotherapy schedule, and treatment schedule. For a given patient RT plan, injury prediction was highest for the combination of pre-RT chemotherapy, once-daily treatment, female gender and lowest for the combination of no pre-RT chemotherapy and nonsquamous cell histologic type. Application of the simplified model to the example cases revealed that injury prediction for a given treatment plan can range from very low to very high, depending on the settings of the nondose variables. Radiation pneumonitis prediction was significantly enhanced by decision trees that added the influence of nondose factors to the LNTCP formulation.
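
The area under the ROC curve reported above (0.72 versus 0.63) can be computed from predicted scores and observed outcomes as in this minimal sketch. It uses the rank interpretation of the AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores and labels are hypothetical, not patient data from the study.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative cases (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical injury scores; label 1 marks a case with pneumonitis.
scores = [0.9, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 5 of 6 pairs ranked correctly: 0.833
```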

  16. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe

    2013-01-01

    Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time... and adopts a methodology of importance sampling to maximize the information contained in the database so as to increase the accuracy of DT. Further, this paper also studies the effectiveness of DT by implementing its corresponding preventive control schemes. These approaches are tested on the detailed model...

  17. SITUATIONAL CONTROL OF HOT BLAST STOVES GROUP BASED ON DECISION TREE

    Directory of Open Access Journals (Sweden)

    E. I. Kobysh

    2016-09-01

    In this paper, a control system for a group of hot blast stoves was developed. It operates on the basis of the packing heating control subsystem and the subsystem for forecasting mode durations in the hot blast stove APCS for iron smelting in a blast furnace. Using multi-criteria optimization methods, the behaviour of the control system was adjusted to take into account the current production situation arising in the course of heating the packing of each hot blast stove in the group. A situation recognition algorithm and a procedure for choosing control scenarios based on a decision tree were developed.

  18. A New Architecture for Making Moral Agents Based on C4.5 Decision Tree Algorithm

    OpenAIRE

    Meisam Azad-Manjiri

    2014-01-01

    Given the influence of robots in various fields of life, the issue of trusting them is important, especially when a robot deals with people directly. One possible way to gain this confidence is to add a moral dimension to robots. Therefore, we present a new architecture for building moral agents that learn from demonstrations. This agent is based on Beauchamp and Childress’s principles of biomedical ethics (a type of deontological theory) and uses decision tree a...

  19. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    DEFF Research Database (Denmark)

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, phasing out of central power plants and cross border power exchange on dynamic security of Danish Power System. Contingency based decision tree (DT) approach is used to assess the dynamic security of present and future... Danish Power System. Results from offline time domain simulation for large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of present and future power system. The mentioned approach is implemented... significant impact on dynamic security of Danish power system in future, if alternative measures are not considered seriously...

  20. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to the students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is administrative filtering based on the records of prospective students at high school, without a written test. Currently, this kind of selection does not yet have a standard model and criteria. Selection is done only by comparing candidates' application files, so subjective assessment is very likely because of the lack of standard criteria that can differentiate the quality of students from one another. By applying data mining classification techniques, a selection model for new students can be built which includes standard criteria such as the area of origin, the status of the school, the average value and so on. These criteria are determined using rules derived from the classification of the academic achievement (GPA) of students in previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that priority for admission is given to students who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average value above 75, and have at least one achievement during their study in high school.
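
C4.5, the algorithm named in the abstract above, chooses splits using the gain ratio: information gain divided by the split information of the attribute. The sketch below shows that calculation for a single categorical attribute. The attribute values and GPA classes are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(attribute_values, class_labels):
    """C4.5-style gain ratio of one categorical attribute with respect to the classes."""
    total = len(class_labels)
    groups = {}
    for value, label in zip(attribute_values, class_labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(class_labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical records: school status and a coarse GPA class.
gpa_class = ["high", "high", "low", "low"]
school_status = ["public", "public", "private", "private"]  # separates the classes fully
home_region = ["java", "other", "java", "other"]            # carries no information here

print(gain_ratio(school_status, gpa_class))  # 1.0
print(gain_ratio(home_region, gpa_class))    # 0.0
```

An attribute with a higher gain ratio would be preferred as the split at that node.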

  1. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Directory of Open Access Journals (Sweden)

    Loukis Euripides N

    2004-06-01

    Background: New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore, the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore, efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods: For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next, a total of 100 features were determined for every heart sound signal and their relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and pruned decision tree classifiers was studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. Results: Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that

  2. Application of a hybrid association rules/decision tree model for drought monitoring

    Science.gov (United States)

    Nourani, Vahid; Molajou, Amir

    2017-12-01

    Previous research has shown that the incorporation of oceanic-atmospheric climate phenomena such as Sea Surface Temperature (SST) into hydro-climatic models could provide important predictive information about hydro-climatic variability. In this paper, the hybrid application of two data mining techniques (decision tree and association rules) was used to discover the relationship between drought at the Tabriz and Kermanshah synoptic stations (located in Iran) and the de-trended SSTs of the Black, Mediterranean and Red Seas. The two major steps of the proposed model were (1) the classification of the de-trended SST data and selection of the most effective groups and (2) the extraction of hidden information contained in the data. Decision tree techniques, which can identify the good traits in a data set for classification purposes, were used for the classification and for selecting the most effective groups, and association rules were employed to extract the hidden predictive information from the large observed data. To examine the accuracy of the rules, confidence and Heidke Skill Score (HSS) measures were calculated and compared for the different lag times considered. The computed measures confirm reliable performance of the proposed hybrid data mining method in forecasting drought, and the results show a relative correlation between the Mediterranean, Black and Red Sea de-trended SSTs and drought at the Tabriz and Kermanshah synoptic stations, so that the confidence between the monthly Standardized Precipitation Index (SPI) values and the de-trended SST of the seas is higher than 70 and 80% respectively for the Tabriz and Kermanshah synoptic stations.
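
The two rule-quality measures named in the abstract above can be sketched for a simple yes/no forecast as follows. The code assumes the common 2x2 form of the Heidke Skill Score and the usual definition of rule confidence (joint count divided by antecedent count); the paper may use a multi-category variant, and the counts below are invented for illustration.

```python
def rule_confidence(count_antecedent_and_consequent, count_antecedent):
    """Confidence of a rule A -> B: support(A and B) / support(A)."""
    if count_antecedent == 0:
        raise ValueError("the antecedent never occurs")
    return count_antecedent_and_consequent / count_antecedent


def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """Heidke Skill Score for a 2x2 contingency table of forecast vs. observation."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    if denominator == 0:
        raise ValueError("degenerate contingency table")
    return 2.0 * (a * d - b * c) / denominator


# Hypothetical counts: months where the rule predicted drought vs. observed drought.
hits, false_alarms, misses, correct_negatives = 30, 10, 5, 55
print(rule_confidence(hits, hits + false_alarms))  # 30 / 40 = 0.75
print(round(heidke_skill_score(hits, false_alarms, misses, correct_negatives), 3))  # 0.681
```

An HSS of 1 indicates a perfect forecast and 0 indicates no skill beyond chance.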

  3. Data Fusion Research of Triaxial Human Body Motion Gesture based on Decision Tree

    Directory of Open Access Journals (Sweden)

    Feihong Zhou

    2014-05-01

    The state of development of human body motion gesture data fusion, both domestic and overseas, has been analyzed. A triaxial accelerometer is adopted to develop a wearable human body motion gesture monitoring system aimed at healthcare for old people. On the basis of a brief introduction to the decision tree algorithm, the WEKA workbench is adopted to generate a human body motion gesture decision tree. Finally, the classification quality of the decision tree has been validated through experiments. The experimental results show that the decision tree algorithm could reach an average prediction accuracy of 97.5 % with lower time cost.

  4. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran.

    Science.gov (United States)

    Khosravi, Khabat; Pham, Binh Thai; Chapi, Kamran; Shirzadi, Ataollah; Shahabi, Himan; Revhaug, Inge; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    Floods are one of the most damaging natural hazards causing huge loss of property, infrastructure and lives. Prediction of occurrence of flash flood locations is very difficult due to sudden change in climatic condition and manmade factors. However, prior identification of flood susceptible areas can be done with the help of machine learning techniques for proper timely management of flood hazards. In this study, we tested four decision tree-based machine learning models, namely Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT), for flash flood susceptibility mapping at the Haraz Watershed in the northern part of Iran. For this, a spatial database was constructed with 201 present and past flood locations and eleven flood-influencing factors, namely ground slope, altitude, curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), land use, rainfall, river density, distance from river, lithology, and Normalized Difference Vegetation Index (NDVI). Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and Friedman and Wilcoxon signed-rank tests were used to validate and compare the prediction capability of the models. Results show that the ADT model has the highest prediction capability for flash flood susceptibility assessment, followed by the NBT, the LMT, and the REPT, respectively. These techniques have proven successful in quickly determining flood susceptible areas. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Mapping mangrove forests using multi-tidal remotely-sensed data and a decision-tree-based procedure

    Science.gov (United States)

    Zhang, Xuehong; Treitz, Paul M.; Chen, Dongmei; Quan, Chang; Shi, Lixin; Li, Xinhui

    2017-10-01

    Mangrove forests grow in intertidal zones in tropical and subtropical regions and have suffered a dramatic decline globally over the past few decades. Remote sensing data, collected at various spatial resolutions, provide an effective way to map the spatial distribution of mangrove forests over time. However, the spectral signatures of mangrove forests are significantly affected by tide levels. Therefore, mangrove forests may not be accurately mapped with remote sensing data collected during a single-tidal event, especially if not acquired at low tide. This research reports how a decision-tree-based procedure was developed to map mangrove forests using multi-tidal Landsat 5 Thematic Mapper (TM) data and a Digital Elevation Model (DEM). Three indices, including the Normalized Difference Moisture Index (NDMI), the Normalized Difference Vegetation Index (NDVI) and NDVIL·NDMIH (the multiplication of NDVIL by NDMIH, L: low tide level, H: high tide level) were used in this algorithm to differentiate mangrove forests from other land-cover and land-use types in Fangchenggang City, China. Additionally, the recent Landsat 8 OLI (Operational Land Imager) data were selected to validate the results and assess whether the methodology is reliable. The results demonstrate that short-term multi-tidal remotely-sensed data better represent the unique nearshore coastal wetland habitats of mangrove forests than single-tidal data. Furthermore, multi-tidal remotely-sensed data has led to improved accuracies using two classification approaches: i.e. decision trees and the maximum likelihood classification (MLC). Since mangrove forests are typically found at low elevations, the inclusion of elevation data in the two classification procedures was tested. Given the decision-tree method does not assume strict data distribution parameters, it was able to optimize the application of multi-tidal and elevation data, resulting in higher classification accuracies of mangrove forests. When using multi
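
The indices named in the abstract above follow the standard normalized-difference pattern, sketched below for a single pixel. The band assignments (near-infrared, red and shortwave-infrared reflectance, which for Landsat 5 TM are commonly bands 4, 3 and 5) and the reflectance values are assumptions for illustration; the abstract does not give the exact formulas or thresholds the authors used.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    denominator = band_a + band_b
    return 0.0 if denominator == 0 else (band_a - band_b) / denominator


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index."""
    return normalized_difference(nir, swir1)


def multi_tidal_index(nir_low, red_low, nir_high, swir1_high):
    """NDVI at low tide multiplied by NDMI at high tide (NDVIL * NDMIH)."""
    return ndvi(nir_low, red_low) * ndmi(nir_high, swir1_high)


# Hypothetical surface reflectances for one pixel at low and high tide.
print(round(ndvi(0.40, 0.08), 3))                           # 0.32 / 0.48 = 0.667
print(round(ndmi(0.30, 0.10), 3))                           # 0.20 / 0.40 = 0.5
print(round(multi_tidal_index(0.40, 0.08, 0.30, 0.10), 3))  # 0.333
```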

  6. Mapping potential carbon and timber losses from hurricanes using a decision tree and ecosystem services driver model.

    Science.gov (United States)

    Delphin, S; Escobedo, F J; Abd-Elrahman, A; Cropper, W

    2013-11-15

    Information on the effect of direct drivers such as hurricanes on ecosystem services is relevant to landowners and policy makers due to predicted effects from climate change. We identified forest damage risk zones due to hurricanes and estimated the potential loss of 2 key ecosystem services: aboveground carbon storage and timber volume. Using land cover, plot-level forest inventory data, the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, and a decision tree-based framework, we determined potential damage to subtropical forests from hurricanes in the Lower Suwannee River (LS) and Pensacola Bay (PB) watersheds in Florida, US. We used biophysical factors identified in previous studies as being influential in forest damage in our decision tree and hurricane wind risk maps. Results show that 31% and 0.5% of the total aboveground carbon storage in the LS and PB, respectively, was located in high forest damage risk (HR) zones. Overall 15% and 0.7% of the total timber net volume in the LS and PB, respectively, was in HR zones. This model can also be used for identifying timber salvage areas, developing ecosystem service provision and management scenarios, and assessing the effect of other drivers on ecosystem services and goods. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    Science.gov (United States)

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average daily land surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.

  8. Cost-effectiveness of a new rotavirus vaccination program in Pakistan: a decision tree model.

    Science.gov (United States)

    Patel, Hiten D; Roberts, Eric T; Constenla, Dagna O

    2013-12-09

    Rotavirus gastroenteritis places a significant health and economic burden on Pakistan. To determine the public health impact of a national rotavirus vaccination program, we performed a cost-effectiveness study from the perspective of the health care system. A decision tree model was developed to assess the cost-effectiveness of a national vaccination program in Pakistan. Disease and cost burden with the program were compared to the current state. Disease parameters, vaccine-related costs, and medical treatment costs were based on published epidemiological and economic data, which were specific to Pakistan when possible. An annual birth cohort of children was followed for 5 years to model the public health impact of vaccination on health-related events and costs. The cost-effectiveness was assessed and quantified in cost (2012 US$) per disability-adjusted life-year (DALY) averted and cost per death averted. Sensitivity analyses were performed to assess the robustness of the incremental cost-effectiveness ratios (ICERs). The base case results showed vaccination prevented 1.2 million cases of rotavirus gastroenteritis, 93,000 outpatient visits, 43,000 hospitalizations, and 6700 deaths by 5 years of age for an annual birth cohort scaled from 6% current coverage to DPT3 levels (85%). The medical cost savings would be US$1.4 million from hospitalizations and US$200,000 from outpatient visit costs. The vaccination program would cost US$35 million at a vaccine price of US$5.00. The ICER was US$149.50 per DALY averted or US$4972 per death averted. Sensitivity analyses showed changes in case-fatality ratio, vaccine efficacy, and vaccine cost exerted the greatest influence on the ICER. Across a range of sensitivity analyses, a national rotavirus vaccination program was predicted to decrease health and economic burden due to rotavirus gastroenteritis in Pakistan by ~40%. Vaccination was highly cost-effective in this context. As discussions of implementing the intervention
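
    The cost-per-death figure in the abstract above can be roughly reproduced from its own rounded numbers. The sketch below assumes the usual incremental cost-effectiveness ratio (net programme cost divided by health effect averted); the function name is hypothetical, and because the inputs are rounded the result only approximates the reported US$4972 per death averted.

```python
def icer(program_cost, cost_savings, effect_averted):
    """Incremental cost-effectiveness ratio: net cost per unit of effect averted."""
    return (program_cost - cost_savings) / effect_averted


# Rounded figures quoted in the abstract (US$, one annual birth cohort).
program_cost = 35_000_000           # vaccination programme at US$5.00 per dose
savings = 1_400_000 + 200_000       # hospitalization plus outpatient savings
deaths_averted = 6_700

cost_per_death = icer(program_cost, savings, deaths_averted)
print(round(cost_per_death))
```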

  9. A hybrid model using decision tree and neural network for credit scoring problem

    Directory of Open Access Journals (Sweden)

    Amir Arzy Soltan

    2012-08-01

    Nowadays credit scoring is an important issue for financial and monetary organizations and has a substantial impact on reducing customer attraction risks. Identifying high-risk customers can reduce finished cost. Accurate classification of customers and low type 1 and type 2 errors have been investigated in many studies. The primary objective of this paper is to develop a new method, which chooses the best neural network architecture based on a one-column hidden layer MLP, a multiple-column hidden layer MLP, an RBFN and decision trees, and ensembles them with voting methods. The proposed method is run on an Australian credit data set and on data from a private bank in Iran called Export Development Bank of Iran, and the results are used to find solutions with low customer attraction risk.

  10. Intrusion Detection System Based on Decision Tree over Big Data in Fog Environment

    Directory of Open Access Journals (Sweden)

    Kai Peng

    2018-01-01

    Fog computing, as a supplement to cloud computing, can provide low-latency services between mobile users and the cloud. However, fog devices may encounter security challenges as a result of the fog nodes being close to the end users and having limited computing ability. Traditional network attacks may destroy the system of fog nodes. An intrusion detection system (IDS) is a proactive security protection technology and can be used in the fog environment. Although IDS in traditional networks has been well investigated, directly using it in the fog environment may be inappropriate. Fog nodes produce massive amounts of data at all times, and, thus, enabling an IDS over big data in the fog environment is of paramount importance. In this study, we propose an IDS based on a decision tree. Firstly, we propose a preprocessing algorithm to digitize the strings in the given dataset and then normalize the whole data, to ensure the quality of the input data so as to improve the efficiency of detection. Secondly, we use the decision tree method for our IDS, and then we compare this method with the Naïve Bayesian method as well as the KNN method. Both the 10% dataset and the full dataset are tested. Our proposed method not only completely detects four kinds of attacks but also enables the detection of twenty-two kinds of attacks. The experimental results show that our IDS is effective and precise. Above all, our IDS can be used in the fog computing environment over big data.

  11. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Science.gov (United States)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily falls in the trade-off of coverage of errors and the increase of perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.

  12. Modeling flash floods in ungauged mountain catchments of China: A decision tree learning approach for parameter regionalization

    Science.gov (United States)

    Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.; Guo, L.

    2017-12-01

    Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. One of the main challenges of setting up such a system is finding appropriate model parameter values for ungauged catchments. Previous studies have shown that the transfer of parameter sets from hydrologically similar gauged catchments is one of the best performing regionalization methods. However, a remaining key issue is the identification of suitable descriptors of similarity. In this study, we use decision tree learning to explore parameter set transferability in the full space of catchment descriptors. For this purpose, a semi-distributed rainfall-runoff model is set up for 35 catchments in ten Chinese provinces. Hourly runoff data from a total of 858 storm events are used to calibrate the model and to evaluate the performance of parameter set transfers between catchments. We then present a novel technique that uses the splitting rules of classification and regression trees (CART) for finding suitable donor catchments for ungauged target catchments. The ability of the model to detect flood events in assumed ungauged catchments is evaluated in a series of leave-one-out tests. We show that CART analysis increases the probability of detection of 10-year flood events in comparison to a conventional measure of physiographic-climatic similarity by up to 20%. Decision tree learning can outperform other regionalization approaches because it generates rules that optimally consider spatial proximity and physical similarity. Spatial proximity can be used as a selection criterion but is skipped in the case where no similar gauged catchments are in the vicinity. We conclude that the CART regionalization concept is particularly suitable for implementation in sparsely gauged and topographically complex environments where a proximity-based

  13. A composition theorem for decision tree complexity

    OpenAIRE

    Montanaro, Ashley

    2013-01-01

    We completely characterise the complexity in the decision tree model of computing composite relations of the form h = g(f^1,...,f^n), where each relation f^i is boolean-valued. Immediate corollaries include a direct sum theorem for decision tree complexity and a tight characterisation of the decision tree complexity of iterated boolean functions.
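
    For readers unfamiliar with the decision tree (query) model, the sketch below computes the deterministic decision tree complexity of a small total boolean function by brute force and applies it to a composed function. It is only an illustration of the quantity being studied: the paper treats more general relations, and the function names and the choice of AND and OR as building blocks are assumptions made here.

```python
from functools import lru_cache
from itertools import product


def decision_tree_depth(f, n):
    """Minimum depth of a deterministic decision tree computing f on n bits."""

    @lru_cache(maxsize=None)
    def depth(fixed):
        # fixed holds 0, 1 or None (not yet queried) for each input bit.
        free = [i for i, v in enumerate(fixed) if v is None]
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(fixed)
            for i, b in zip(free, bits):
                x[i] = b
            values.add(f(tuple(x)))
        if len(values) == 1:
            return 0  # f is constant on this subcube, no query needed
        best = n
        for i in free:
            # Query bit i and take the worse of the two answers.
            worst = max(
                depth(fixed[:i] + (b,) + fixed[i + 1:]) for b in (0, 1)
            )
            best = min(best, 1 + worst)
        return best

    return depth((None,) * n)


def and2(x):
    return x[0] & x[1]


def or2(x):
    return x[0] | x[1]


def and_of_ors(x):
    # h = g(f^1, f^2) with g = AND and f^1 = f^2 = OR on disjoint inputs.
    return and2((or2(x[0:2]), or2(x[2:4])))


print(decision_tree_depth(and2, 2),
      decision_tree_depth(or2, 2),
      decision_tree_depth(and_of_ors, 4))
```

    In this small case the depth of the composed function equals the product of the depths of its parts, which is the kind of relationship a composition theorem makes precise.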

  14. Risk stratification for prognosis in intracerebral hemorrhage: A decision tree model and logistic regression

    Directory of Open Access Journals (Sweden)

    Gang WU

    2016-01-01

    Objective  To analyze the risk factors for prognosis in intracerebral hemorrhage using a decision tree (classification and regression tree, CART) model and a logistic regression model. Methods  CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results  Logistic regression analyses showed that hematoma volume (OR-value 0.953), initial Glasgow Coma Scale (GCS) score (OR-value 1.210), pulmonary infection (OR-value 0.295), and basal ganglia hemorrhage (OR-value 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688). Conclusions  The CART model has a similar value to that of the logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13

  15. [A prediction model for internet game addiction in adolescents: using a decision tree analysis].

    Science.gov (United States)

    Kim, Ki Sook; Kim, Kyung Hee

    2010-06-01

    This study was designed to build a theoretical frame to provide practical help to prevent and manage adolescent internet game addiction by developing a prediction model through a comprehensive analysis of related factors. The participants were 1,318 students studying in elementary, middle, and high schools in Seoul and Gyeonggi Province, Korea. Collected data were analyzed using the SPSS program. Decision Tree Analysis using the Clementine program was applied to build an optimum and significant prediction model to predict internet game addiction related to various factors, especially parent related factors. From the data analyses, the prediction model for factors related to internet game addiction presented with 5 pathways. Causative factors included gender, type of school, siblings, economic status, religion, time spent alone, gaming place, payment to Internet café, frequency, duration, parent's ability to use internet, occupation (mother), trust (father), expectations regarding adolescent's study (mother), supervising (both parents), rearing attitude (both parents). The results suggest preventive and managerial nursing programs for specific groups by path. Use of this predictive model can expand the role of school nurses, not only in counseling addicted adolescents but also, in developing and carrying out programs with parents and approaching adolescents individually through databases and computer programming.

  16. Termination of pregnancy for fetal abnormalities: main arguments and a decision-tree model.

    Science.gov (United States)

    Kose, Semir; Altunyurt, Sabahattin; Yıldırım, Nuri; Keskinoğlu, Pembe; Çankaya, Tufan; Bora, Elçin; Erçal, Derya; Özer, Erdener

    2015-11-01

    By looking through our ethical committee cases, we demonstrate the main arguments we use for making a judgment in the face of fetal abnormalities. Our decision making model is a simplified algorithm of the arguments and concepts we use in scientific-ethic discussion. A retrospective analysis was conducted at a single tertiary referral center of patients evaluated for fetal abnormalities from 2004 to 2014. We hypothesized that all our judgments would fit into a decision-tree model. 553 fetal abnormality cases were discussed, and 348 (63%) were given a termination of pregnancy (TOP) proposal. Cases with detected genetic disorders (n:100) and with mental retardation risk (n:93) ended up with a TOP proposal. For incompatibility with life cases (n:111) and the multimorbidity cases (n:44) the committee suggested TOP, regardless of gestational age. The highest family approval ratios were in the chromosomal abnormalities/genetic disorders group (93%), and the lowest figures were in the mental retardation risk group (80%). Continuously changing literature on prenatal and postnatal therapy options and the long term outcome of various fetal abnormalities influence committee decisions. Theoretical high success rates and inconsistent data on long term prognosis of some anomaly groups resulted in heterogeneous decisions and various approval ratios. © 2015 John Wiley & Sons, Ltd.

  17. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should first classify the mobile users. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attribute for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm: we can classify the mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also get some rules about the mobile user. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm we propose in this paper has higher accuracy and more simplicity.

  18. An application based on the decision tree to classify the marbling of beef by hyperspectral imaging.

    Science.gov (United States)

    Velásquez, Lía; Cruz-Tirado, J P; Siche, Raúl; Quevedo, Roberto

    2017-11-01

    The aim of this study was to develop a system to classify the marbling of beef using hyperspectral imaging technology. The Japanese standard classification of the degree of marbling of beef was used as reference and twelve standards were digitized to obtain the parameters of shape and spatial distribution of marbling of each class. A total of 35 samples of M. longissimus dorsi muscle were scanned by the hyperspectral imaging system at 400-1000 nm in reflectance mode. The wavelength of 528 nm was selected to segment the sample and the background, and 440 nm was used to classify the samples. Image processing algorithms, based on the decision tree method, were used in the region of interest, obtaining a classification error of 0.08% in the building stage. The results showed that the proposed technique has great potential as a non-destructive and fast technique that can be used to classify beef with respect to the degree of marbling. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization.

    Science.gov (United States)

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma; Alonso-González, Itziar

    2015-06-23

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfying solutions have not been found with the considerations of both accuracy and system complexity. From the perspective of lightweight mobile devices, they are extremely important characteristics, because both the processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points and device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system leads to substantial improvements in computational complexity over the widely-used traditional fingerprinting methods, and it has better accuracy than they have.

  20. Beef Quality Identification Using Thresholding Method and Decision Tree Classification Based on Android Smartphone

    Directory of Open Access Journals (Sweden)

    Kusworo Adi

    2017-01-01

    Beef is one of the animal food products that have high nutrition because it contains carbohydrates, proteins, fats, vitamins, and minerals. Therefore, the quality of beef should be maintained so that consumers get good beef quality. Determination of beef quality is commonly conducted visually by comparing the actual beef and reference pictures of each beef class. This process presents weaknesses, as it is subjective in nature and takes a considerable amount of time. Therefore, an automated system based on image processing that is capable of determining beef quality is required. This research aims to develop an image segmentation method by processing digital images. The system designed consists of image acquisition processes with varied distance, resolution, and angle. Image segmentation is done to separate the images of fat and meat using the Otsu thresholding method. Classification was carried out using the decision tree algorithm and the best accuracies were obtained at 90% for training and 84% for testing. Once developed, this system is then embedded into an Android application. Results show that the image processing technique is capable of proper marbling score identification.

  1. Classification and Progression Based on CFS-GA and C5.0 Boost Decision Tree of TCM Zheng in Chronic Hepatitis B.

    Science.gov (United States)

    Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang

    2013-01-01

    Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.

  2. Objective consensus from decision trees.

    Science.gov (United States)

    Putora, Paul Martin; Panje, Cedric M; Papachristofilou, Alexandros; Dal Pra, Alan; Hundsberger, Thomas; Plasswilm, Ludwig

    2014-12-05

    Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information, objective consensus based on recommendations in decision tree format from multiple sources. Based on nine sample recommendations in decision tree format a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format and analyzed in order to determine the objective consensus. Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage) resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.
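
    The mode-based consensus described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' software: each centre's decision tree is modelled as a function of three parameters with three levels each (two cut-offs per parameter, hence 27 combinations), and the centre rules and the option labels "A" and "B" are hypothetical.

```python
from collections import Counter
from itertools import product

LEVELS = ("low", "mid", "high")            # two cut-offs give three levels
PARAMETERS = ("gleason", "psa", "t_stage")


def objective_consensus(trees):
    """Return the mode recommendation for every parameter combination.

    trees: callables mapping (gleason, psa, t_stage) levels to a recommendation.
    A combination with a tie for first place maps to None (no consensus).
    """
    consensus = {}
    for combo in product(LEVELS, repeat=len(PARAMETERS)):
        votes = Counter(tree(*combo) for tree in trees).most_common()
        top, top_count = votes[0]
        tied = len(votes) > 1 and votes[1][1] == top_count
        consensus[combo] = None if tied else top
    return consensus


# Three hypothetical centres, each recommending option "A" or "B".
def centre_1(g, p, t):
    return "B" if "high" in (g, p, t) else "A"


def centre_2(g, p, t):
    return "B" if g != "low" else "A"


def centre_3(g, p, t):
    return "B" if (g, p, t).count("high") >= 2 else "A"


result = objective_consensus([centre_1, centre_2, centre_3])
print(len(result), result[("low", "low", "low")], result[("high", "high", "high")])
```

    Because every combination is evaluated independently, no centre has to change its own tree for a consensus to emerge, which matches the "without compromise or confrontation" point in the abstract.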

  3. Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data

    NARCIS (Netherlands)

    Metting, Esther I; In 't Veen, Johannes C C M; Dekhuijzen, P N Richard; van Heijst, Ellen; Kocks, Janwillem W H; Muilwijk-Kroes, Jacqueline B; Chavannes, Niels H; van der Molen, Thys

    2016-01-01

    The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an

  4. Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data

    NARCIS (Netherlands)

    Metting, E.I.; Veen, J.C. In 't; Dekhuijzen, P.N.R.; Heijst, E. van; Kocks, J.W.; Muilwijk-Kroes, J.B.; Chavannes, N.H.; Molen, T. van der

    2016-01-01

    The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an

  5. Dynamic Security Assessment of Western Danish Power System Based on Ensemble Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Bak, Claus Leth; Chen, Zhe

    2014-01-01

    With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...

  6. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Rohulla Kosari Langari

    2014-02-01

    Changes in the world brought about by information technology and Internet development have created competitive knowledge in the field of electronic commerce and increased the competitive potential among organizations. Under these conditions, the growing rate of commercial transactions, carried out with speed and quality, depends on a dynamic electronic banking system that uses modern technology to facilitate electronic business processes. Internet banking is counted as a potential opportunity and one of the fundamental pillars and determinants of e-banking, yet in cyberspace it faces various obstacles and threats. One of these challenges is the uncertainty in guaranteeing the security of financial transactions, together with the existence of suspicious and unusual behavior, such as mail fraud, for financial abuse. Various systems based on machine intelligence methods and data mining techniques have been designed for fraud detection in users’ behaviors and applied in industries such as insurance, medicine and banking. The main aim of this article is to recognize unusual user behaviors in e-banking systems, that is, to detect user behavior and categorize the emerging patterns in order to prepare the conditions for predicting unauthorized penetration and detecting suspicious behavior. Because the detection of user behavior in Internet systems is uncertain and transaction records can be useful for understanding these movements, and because among machine learning methods the decision tree technique is a common tool for classification and prediction, this research first determines the effective banking variables and the weight of each in producing Internet behaviors, and then, by combining various behavior patterns, extracts a model of inductive rules that provides the ability to recognize different behaviors. Finally, four algorithms, Chaid, ex_Chaid, C4.5 and C5.0, are compared and evaluated for classification and detection of exist

  7. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    Science.gov (United States)

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust, as it is able to find results that are discriminatory from a statistical perspective with logical interactions, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have already been identified in large genome-wide association studies as related to type II diabetes, lending additional confidence to the results.

  8. The risk of disabling, surgery and reoperation in Crohn's disease - A decision tree-based approach to prognosis.

    Science.gov (United States)

    Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula; Magro, Fernando

    2017-01-01

    Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and many times requiring surgical interventions. This article describes a decision-tree based approach that defines the CD patients' risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curves (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50-4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09-0.25] and 0.50 [0.24-1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 month after initial surgery) had a higher proportion of reoperation. The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation.

  9. Reanalysis and External Validation of a Decision Tree Model for Detecting Unrecognized Diabetes in Rural Chinese Individuals

    Directory of Open Access Journals (Sweden)

    Zhong Xin

    2017-01-01

    We reanalyzed previous data to develop a more simplified decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. Then, the model was validated in another rural town. Only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) with fewer branches were used in the new model. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detecting diabetes were calculated. The AUC values in internal and external validation groups were 0.708 and 0.629, respectively. Subjects with high risk of diabetes had significantly higher HOMA-IR, but no significant difference in HOMA-B was observed. This simple tool will help general practitioners and residents assess the risk of diabetes quickly and easily. This study also validates the strong associations of insulin resistance and early stage of diabetes, suggesting that more attention should be paid to the current model in rural Chinese adult populations.

  10. Reanalysis and External Validation of a Decision Tree Model for Detecting Unrecognized Diabetes in Rural Chinese Individuals.

    Science.gov (United States)

    Xin, Zhong; Hua, Lin; Wang, Xu-Hong; Zhao, Dong; Yu, Cai-Guo; Ma, Ya-Hong; Zhao, Lei; Cao, Xi; Yang, Jin-Kui

    2017-01-01

    We reanalyzed previous data to develop a more simplified decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. Then, the model was validated in another rural town. Only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) with fewer branches were used in the new model. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detecting diabetes were calculated. The AUC values in internal and external validation groups were 0.708 and 0.629, respectively. Subjects with high risk of diabetes had significantly higher HOMA-IR, but no significant difference in HOMA-B was observed. This simple tool will help general practitioners and residents assess the risk of diabetes quickly and easily. This study also validates the strong associations of insulin resistance and early stage of diabetes, suggesting that more attention should be paid to the current model in rural Chinese adult populations.

  11. How to differentiate acute pelvic inflammatory disease from acute appendicitis? A decision tree based on CT findings.

    Science.gov (United States)

    El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Molinari, Nicolas; Taourel, Patrice

    2018-02-01

    To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. • Appendiceal diameter and marked left tubal thickening allow differentiating PID from AA. • PID should be considered if appendiceal diameter is < 7 mm. • Marked left tubal diameter indicates PID rather than AA when enlarged appendix. • No pathological CT findings were identified in 5 % of PID patients.

  12. VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction.

    Science.gov (United States)

    Golzari, Fahimeh; Jalili, Saeed

    2015-07-21

    In the protein function prediction (PFP) problem, the goal is to predict the function of numerous well-sequenced known proteins whose function is still not known precisely. PFP is one of the special and complex problems in the machine learning domain, in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or directed acyclic graph. One of the common learning methods proposed for solving this problem is decision trees. Because these partition the data into sets with sharp boundaries, small changes in the attribute values of a new instance may cause an incorrect change in the predicted label of the instance and, finally, misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict functions of the proteins. This algorithm just fuzzifies the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It has the ability to assign multiple functions to each protein simultaneously and preserves the hierarchy consistency between functional classes. It uses the label variance reduction as the splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. How to differentiate acute pelvic inflammatory disease from acute appendicitis? A decision tree based on CT findings

    International Nuclear Information System (INIS)

    El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Taourel, Patrice; Molinari, Nicolas

    2018-01-01

    To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. (orig.)

  14. Skin autofluorescence based decision tree in detection of impaired glucose tolerance and diabetes.

    Directory of Open Access Journals (Sweden)

    Andries J Smit

    Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE) which are considered to be a carrier of glycometabolic memory. We compared SAF and a SAF-based decision tree (SAF-DM) with fasting plasma glucose (FPG) and HbA1c, and additionally with the Finnish Diabetes Risk Score (FINDRISC) questionnaire±FPG for detection of oral glucose tolerance test (OGTT)- or HbA1c-defined IGT and diabetes in intermediate risk persons. Participants had ≥1 metabolic syndrome criteria. They underwent an OGTT, HbA1c, SAF and FINDRISC, in addition to SAF-DM which includes SAF, age, BMI, and conditional questions on DM family history, antihypertensives, renal or cardiovascular disease events (CVE). 218 persons, age 56 yr, 128M/90F, 97 with previous CVE, participated. With OGTT 28 had DM, 46 IGT, 41 impaired fasting glucose, 103 normal glucose tolerance. SAF alone revealed 23 false positives (FP), 34 false negatives (FN) (sensitivity (S) 68%; specificity (SP) 86%). With SAF-DM, FP were reduced to 18, FN to 16 (5 with DM) (S 82%; SP 89%). HbA1c scored 48 FP, 18 FN (S 80%; SP 75%). Using HbA1c-defined DM-IGT/suspicion ≥6%/42 mmol/mol, SAF-DM scored 33 FP, 24 FN (4 DM) (S 76%; SP 72%), FPG 29 FP, 41 FN (S 71%; SP 80%). FINDRISC≥10 points as detection of HbA1c-based diabetes/suspicion scored 79 FP, 23 FN (S 69%; SP 45%). SAF-DM is superior to FPG and non-inferior to HbA1c to detect diabetes/IGT in intermediate-risk persons. SAF-DM's value for diabetes/IGT screening is further supported by its established performance in predicting diabetic complications.

  15. An On-Line Decision Tree-Based Predictive System Architecture

    Directory of Open Access Journals (Sweden)

    馬芳資、林我聰 Fang-Tz Ma、Woo-Tsong Lin

    2003-10-01

    This paper presents an on-line decision tree-based predictive system architecture. The architecture contains nine components, including a database of the examples, a learning system of the decision trees, a knowledge base, a historical knowledge base, a maintaining interface of the decision trees, an interface to upload training and testing examples, a PMML (Predictive Model Markup Language) translator, an on-line predictive system, and a merging optional decision trees system. There are three channels to import knowledge in the architecture; the developers can upload the examples to the learning system to induce the decision tree, directly input the information of decision trees through the user interface, or import the decision trees in PMML format. In order to integrate the knowledge of the decision trees, we added the merging optional decision trees system into this architecture. The merging optional decision trees system can combine multiple decision trees into a single decision tree to integrate the knowledge of the trees. In future research, we will implement this architecture as a real system on a web-based platform to do some empirical analyses. In order to improve the performance of the merged decision trees, we will also develop some pruning strategies in the merging optional decision trees system.

  16. Decision Tree Phytoremediation

    Science.gov (United States)

    1999-12-01

    8 2.4 Irrigation, Agronomic Inputs, and...documents will provide the reader in-depth background on the science and engineering mechanisms of phytoremediation. Using the decision tree and the...ITRC – Phytoremediation Decision Tree December 1999 8 • Contaminant levels • Plant selection • Treatability • Irrigation, agronomic

  17. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions.

    Science.gov (United States)

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops to protect against NO3(-) contamination in groundwater, dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3(-) pollution activities via an unsupervised learning algorithm based on δ(15)N- and δ(18)O-NO3(-) and physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3(-) contamination via a decision tree model. When a combination of δ(15)N-, δ(18)O-NO3(-) and physico-chemical properties of groundwater was used as an input for the k-means clustering algorithm, it allowed for a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well irrigated agriculture (28 sampling locations), sewage irrigated agriculture (16 sampling locations), a combination of sewage irrigated agriculture, farm and industry (5 sampling locations) and a combination of well irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on SO4(2-) and Cl(-) variables. The NO3(-) and the δ(15)N- and δ(18)O-NO3(-) variables demonstrated limitation in developing a decision tree model as multiple N sources and fractionation processes both resulted in difficulties of discriminating NO3(-) concentrations and isotopic values. Although only the SO4(2-) and Cl(-) were selected as important discriminating variables, concentration data alone could not identify the specific NO3(-) sources responsible for groundwater contamination. This is a result of comprehensive analysis. To further reduce NO3(-) contamination, an integrated approach should be set-up by combining N and O isotopes of NO3(-) with land-uses and physico-chemical properties, especially in areas with complex agricultural activities. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    International Nuclear Information System (INIS)

    Althuwaynee, Omar F; Pradhan, Biswajeet; Ahmad, Noordin

    2014-01-01

    This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, and then to combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), and was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data with 14 landslide conditioning factors such as topographic derived parameters, lithology, NDVI, land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations, respectively. This study proved the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provided a valuable scientific basis for spatial decision making in planning and urban management studies.
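
    The use of a Pearson chi-squared value to pick the best classification fit can be illustrated with a small sketch. It is not the authors' implementation: the factor names and counts are hypothetical, and a full CHAID procedure also merges categories and compares adjusted p-values rather than raw statistics.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def best_factor(tables):
    """Return the factor whose table has the largest chi-squared statistic."""
    return max(tables, key=lambda name: pearson_chi_squared(tables[name]))


# Hypothetical counts: rows are factor classes, columns are (landslide, no landslide).
tables = {
    "slope_class": [[30, 10], [10, 30]],
    "land_use": [[20, 20], [20, 20]],
}
chosen = best_factor(tables)
```

    Comparing raw statistics is only meaningful here because both hypothetical tables have the same shape; tables with different degrees of freedom would need p-values.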

  19. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain

    2014-01-01

    This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost...

  20. Blood oxygen level dependent magnetic resonance imaging for detecting pathological patterns in lupus nephritis patients: a preliminary study using a decision tree model.

    Science.gov (United States)

    Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng

    2018-02-09

    Precise renal histopathological diagnosis will guide therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been an applicable noninvasive technique in renal disease. This study was performed to explore whether BOLD MRI could contribute to diagnosing the renal pathological pattern. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. BOLD MRI was used to obtain the functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five classes III (including 3 as class III + V) and seven classes IV (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological pattern of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P patterns.

  1. Comparison of hospital charge prediction models for colorectal cancer patients: neural network vs. decision tree models.

    Science.gov (United States)

    Lee, Seung-Mi; Kang, Jin-Oh; Suh, Yong-Moo

    2004-10-01

    Analysis and prediction of the care charges related to colorectal cancer in Korea are important for the allocation of medical resources and the establishment of medical policies because the incidence of and the hospital charges for colorectal cancer are rapidly increasing. However, previous studies based on statistical analysis to predict the hospital charges for patients did not show satisfactory results. Recently, data mining has emerged as a new technique to extract knowledge from huge and diverse medical data. Thus, we built models using data mining techniques to predict hospital charges for the patients. A total of 1,022 admission records with 154 variables, from 492 patients who had been treated from 1999 to 2002 in the Kyung Hee University Hospital, were used to build prediction models. We built an artificial neural network (ANN) model and a classification and regression tree (CART) model, and compared their prediction accuracy. Linear correlation coefficients were high in both models and the mean absolute errors were similar. However, the ANN model showed a better linear correlation than the CART model (0.813 vs. 0.713 for the hospital charge paid by insurance and 0.746 vs. 0.720 for the hospital charge paid by patients). We suggest that the ANN model performs better in predicting charges of colorectal cancer patients.

  2. Totally optimal decision trees for Boolean functions

    KAUST Repository

    Chikalov, Igor

    2016-07-28

    We study decision trees which are totally optimal relative to different sets of complexity parameters for Boolean functions. A totally optimal tree is an optimal tree relative to each parameter from the set simultaneously. We consider the parameters characterizing both time (in the worst- and average-case) and space complexity of decision trees, i.e., depth, total path length (average depth), and number of nodes. We have created tools based on extensions of dynamic programming to study totally optimal trees. These tools are applicable to both exact and approximate decision trees, and allow us to make multi-stage optimization of decision trees relative to different parameters and to count the number of optimal trees. Based on the experimental results we have formulated the following hypotheses (and subsequently proved): for almost all Boolean functions there exist totally optimal decision trees (i) relative to the depth and number of nodes, and (ii) relative to the depth and average depth.

  3. Design of a new hybrid artificial neural network method based on decision trees for calculating the Froude number in rigid rectangular channels

    Directory of Open Access Journals (Sweden)

    Ebtehaj Isa

    2016-09-01

    A vital topic regarding the optimum and economical design of rigid boundary open channels such as sewers and drainage systems is determining the movement of sediment particles. In this study, the incipient motion of sediment is estimated using three datasets from literature, including a wide range of hydraulic parameters. Because existing equations do not consider the effect of sediment bed thickness on incipient motion estimation, this parameter is applied in this study along with the multilayer perceptron (MLP), a hybrid method based on decision trees (DT) (MLP-DT), to estimate incipient motion. According to a comparison with the observed experimental outcome, the proposed method performs well (MARE = 0.048, RMSE = 0.134, SI = 0.06, BIAS = -0.036). The performance of MLP and MLP-DT is compared with that of existing regression-based equations, and significantly higher performance over existing models is observed. Finally, an explicit expression for practical engineering is also provided.

  4. Decision tree methods: applications for classification and prediction.

    Science.gov (United States)

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
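
    How a population is divided into branch-like segments can be illustrated with a single split. The sketch below searches for one threshold on one numeric covariate using Gini impurity, the criterion commonly associated with CART; the data are hypothetical, and a real tree would repeat this search recursively over all covariates and then prune against a validation set.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value < threshold'.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # cannot place a threshold between equal values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical covariate values and class labels.
values = [2.0, 3.0, 4.0, 8.0, 9.0, 10.0]
labels = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_threshold(values, labels)
```

    On this toy data the two classes are perfectly separable, so the chosen threshold falls midway between 4.0 and 8.0 and both child nodes are pure.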

  5. Algorithms for Decision Tree Construction

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms were based on the separation heuristic [13, 31], which at each step tries to divide the set of objects as evenly as possible. Later, Garey and Graham [28] showed that such an algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest [35] proved the NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and a uniform probability distribution. Cox et al. [22] showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.

  6. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India.

    Science.gov (United States)

    Senthil Kumar, A R; Goyal, Manish Kumar; Ojha, C S P; Singh, R D; Swamee, P K

    2013-01-01

    The prediction of streamflow is required in many activities associated with the planning and operation of the components of a water resources system. Soft computing techniques have proven to be an efficient alternative to traditional methods for modelling qualitative and quantitative water resource variables such as streamflow. The focus of this paper is to present the development of models using multiple linear regression (MLR), artificial neural network (ANN), fuzzy logic and decision tree algorithms such as M5 and REPTree for predicting the streamflow at Kasol, located upstream of the Bhakra reservoir in the Sutlej basin in northern India. The input vector to the various models using different algorithms was derived considering statistical properties such as the auto-correlation function, partial auto-correlation and cross-correlation function of the time series. It was found that the REPTree model performed well compared to the other techniques investigated in this study (MLR, ANN, fuzzy logic, and M5P), and the results of the REPTree model indicate that the entire range of streamflow values was simulated fairly well. The performance of the naïve persistence model was compared with the other models, and the requirement of the development of the naïve persistence model was also analysed by the persistence index.
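
    The comparison against a naïve persistence model can be made concrete with the persistence index. The sketch below uses the common definition, one minus the ratio of the model's squared error to that of a forecast that simply repeats the value observed `lag` steps earlier; the abstract does not state which variant the authors used, and the flow values are hypothetical.

```python
def persistence_index(observed, simulated, lag=1):
    """Persistence index: 1 - SSE(model) / SSE(naive 'same as lag steps ago').

    Values above 0 mean the model beats the naive persistence forecast.
    observed and simulated must have equal length; the first `lag` steps are skipped.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    sse_model = 0.0
    sse_naive = 0.0
    for t in range(lag, len(observed)):
        sse_model += (observed[t] - simulated[t]) ** 2
        sse_naive += (observed[t] - observed[t - lag]) ** 2
    if sse_naive == 0:
        raise ValueError("naive forecast is perfect; index is undefined")
    return 1.0 - sse_model / sse_naive


# Hypothetical daily streamflow values and model output.
observed = [10.0, 12.0, 15.0, 14.0, 13.0]
simulated = [10.0, 11.5, 14.5, 14.5, 13.0]
pi = persistence_index(observed, simulated)
```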

  7. Safety validation of decision trees for hepatocellular carcinoma.

    Science.gov (United States)

    Wang, Xian-Qiang; Liu, Zhe; Lv, Wen-Ping; Luo, Ying; Yang, Guang-Yun; Li, Chong-Hui; Meng, Xiang-Fei; Liu, Yang; Xu, Ke-Sen; Dong, Jia-Hong

    2015-08-21

    To evaluate a different decision tree for safe liver resection and verify its efficiency. A total of 2457 patients underwent hepatic resection between January 2004 and December 2010 at the Chinese PLA General Hospital, and 634 hepatocellular carcinoma (HCC) patients were eligible for the final analyses. Post-hepatectomy liver failure (PHLF) was identified by the association of prothrombin time 50 μmol/L (the "50-50" criteria), which were assessed at day 5 postoperatively or later. The Swiss-Clavien decision tree, Tokyo University-Makuuchi decision tree, and Chinese consensus decision tree were adopted to divide patients into two groups based on those decision trees in sequence, and the PHLF rates were recorded. The overall mortality and PHLF rate were 0.16% and 3.0%. A total of 19 patients experienced PHLF. The numbers of patients to whom the Swiss-Clavien, Tokyo University-Makuuchi, and Chinese consensus decision trees were applied were 581, 573, and 622, and the PHLF rates were 2.75%, 2.62%, and 2.73%, respectively. Significantly more cases satisfied the Chinese consensus decision tree than the Swiss-Clavien decision tree and Tokyo University-Makuuchi decision tree (P decision trees. The Chinese consensus decision tree expands the indications for hepatic resection for HCC patients and does not increase the PHLF rate compared to the Swiss-Clavien and Tokyo University-Makuuchi decision trees. It would be a safe and effective algorithm for hepatectomy in patients with hepatocellular carcinoma.

  8. Parallel object-oriented decision tree system

    Science.gov (United States)

    Kamath, Chandrika [Dublin, CA]; Cantu-Paz, Erick [Oakland, CA]

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.

  9. A decision tree-based approach for determining low bone mineral density in inflammatory bowel disease using WEKA software.

    Science.gov (United States)

    Firouzi, Farzad; Rashidi, Marjan; Hashemi, Sattar; Kangavari, Mohammadreza; Bahari, Ali; Daryani, Naser Ebrahimi; Emam, Mohammad Mehdi; Naderi, Nosratollah; Shalmani, Hamid Mohaghegh; Farnood, Alma; Zali, Mohammadreza

    2007-12-01

    Decision tree classification is a standard machine learning technique that has been used for a wide range of applications. Patients with inflammatory bowel disease (IBD) are at increased risk of developing low bone mineral density (BMD). This study aimed at developing a new approach to select truly affected IBD patients who are indicated for densitometry, hence subjecting fewer patients to bone densitometry and reducing expenses. Simple decision trees have been developed by means of the WEKA (Waikato Environment for Knowledge Analysis) package of machine learning algorithms to predict factors influencing the bone density among IBD patients. The BMD status was the outcome variable, whereas age, sex, duration of disease, smoking status, corticosteroid use, oral contraceptive use, calcium or vitamin D supplementation, menstruation, milk abstinence, BMI, and levels of calcium, phosphorous, alkaline phosphatase, and 25-OH vitamin D were all attributes. Testing showed the decision trees to have sensitivities of 65.7-82.8%, specificities of 95.2-96.3%, accuracies of 86.2-89.8%, and Matthews correlation coefficients of 0.68-0.79. Smoking status was the most significant node (root) for ulcerative colitis and IBD-associated trees, whereas calcium status was the root of the Crohn's disease patients' decision tree. IBD specialists could use such decision trees to reduce substantially the number of patients referred for bone densitometry and potentially save resources.
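
    The four performance measures reported above all derive from one confusion matrix. A minimal sketch follows; the counts are hypothetical and are not taken from the study.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 50 patients with low BMD, 100 without.
metrics = diagnostic_metrics(tp=40, fp=5, tn=95, fn=10)
```

    Unlike accuracy, the Matthews coefficient stays informative when the two classes are of very different sizes, which is one reason to report it alongside sensitivity and specificity.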

  10. An Applied Research of Decision Tree Algorithm in Track and Field Equipment Training

    Directory of Open Access Journals (Sweden)

    Liu Shaoqing

    2015-01-01

    This paper studies the application of the ID3 decision tree algorithm to track and field equipment training. For the selection of the elements used by the decision tree, the equipment is divided into track training equipment, field events training equipment and auxiliary training equipment according to its properties. A decision tree that takes track training equipment as the root node was obtained while lowering the computation cost through the selection of data and the application and optimization of the ID3 algorithm model.
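
    ID3 chooses the root node as the attribute with the highest information gain. The sketch below shows that selection step only; the attribute names and records are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after partitioning rows by `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def id3_root(rows, attributes, target):
    """Pick the attribute ID3 would place at the root: the one with maximal gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical equipment records.
rows = [
    {"category": "track", "indoor": "yes", "suitable": "yes"},
    {"category": "track", "indoor": "no", "suitable": "yes"},
    {"category": "field", "indoor": "yes", "suitable": "no"},
    {"category": "field", "indoor": "no", "suitable": "no"},
]
root = id3_root(rows, ["category", "indoor"], "suitable")
```

    Here `category` separates the target perfectly (gain of 1 bit) while `indoor` carries no information, so `category` becomes the root. A full ID3 implementation would recurse on each partition with the remaining attributes.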

  11. Detecting surface coal mining areas from remote sensing imagery: an approach based on object-oriented decision trees

    Science.gov (United States)

    Zeng, Xiaoji; Liu, Zhifeng; He, Chunyang; Ma, Qun; Wu, Jianguo

    2017-01-01

    Detecting surface coal mining areas (SCMAs) using remote sensing data in a timely and an accurate manner is necessary for coal industry management and environmental assessment. We developed an approach to effectively extract SCMAs from remote sensing imagery based on object-oriented decision trees (OODT). This OODT approach involves three main steps: object-oriented segmentation, calculation of spectral characteristics, and extraction of SCMAs. The advantage of this approach lies in its effective integration of the spectral and spatial characteristics of SCMAs so as to distinguish the mining areas (i.e., the extracting areas, stripped areas, and dumping areas) from other areas that exhibit similar spectral features (e.g., bare soils and built-up areas). We implemented this method to extract SCMAs in the eastern part of Ordos City in Inner Mongolia, China. Our results had an overall accuracy of 97.07% and a kappa coefficient of 0.80. As compared with three other spectral information-based methods, our OODT approach is more accurate in quantifying the amount and spatial pattern of SCMAs in dryland regions.

  12. Genetic program based data mining of fuzzy decision trees and methods of improving convergence and reducing bloat

    Science.gov (United States)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user. These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons between trees created using a genetic program and those constructed solely by interviewing experts are made. A new co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have been used to validate the data mining algorithm are referenced.

  13. Accident diagnosis system based on real-time decision tree expert system

    Science.gov (United States)

    Nicolau, Andressa dos S.; Augusto, João P. da S. C.; Schirru, Roberto

    2017-06-01

    Safety is one of the most studied topics in relation to power stations. For that reason, sensors and alarms play an important role in environmental and human protection. When an abnormal event happens, it triggers a chain of alarms that must be checked by the control room operators. In this case, a diagnosis support system can help operators to accurately identify the possible root cause of the problem in a short time. In this article, we present a computational model of a generic diagnosis support system based on artificial intelligence that was applied to the datasets of two real power stations: Angra1 Nuclear Power Plant and Santo Antônio Hydroelectric Plant. The proposed system processes all the information logged in the sequence of events before a shutdown signal, using expert knowledge entered into an expert system, and indicates the chain of events from the shutdown signal to its root cause. The results of both applications showed that the support system is a potential tool to help control room operators identify abnormal events, such as accidents, and consequently increase safety.

  14. Estimating Surface Downward Shortwave Radiation over China Based on the Gradient Boosting Decision Tree Method

    Directory of Open Access Journals (Sweden)

    Lu Yang

    2018-01-01

    Downward shortwave radiation (DSR) is an essential parameter in the terrestrial radiation budget and a necessary input for models of land-surface processes. Although several radiation products using satellite observations have been released, coarse spatial resolution and low accuracy limited their application. It is important to develop robust and accurate retrieval methods with higher spatial resolution. Machine learning methods may be powerful candidates for estimating the DSR from remotely sensed data because of their ability to perform adaptive, nonlinear data fitting. In this study, the gradient boosting regression tree (GBRT) was employed to retrieve DSR measurements with the ground observation data in China collected from the China Meteorological Administration (CMA) Meteorological Information Center and the satellite observations from the Advanced Very High Resolution Radiometer (AVHRR) at a spatial resolution of 5 km. The validation results of the DSR estimates based on the GBRT method in China at a daily time scale for clear sky conditions show an R² value of 0.82 and a root mean square error (RMSE) value of 27.71 W·m⁻² (38.38%). These values are 0.64 and 42.97 W·m⁻² (34.57%), respectively, for cloudy sky conditions. The monthly DSR estimates were also evaluated using ground measurements. The monthly DSR estimates have an overall R² value of 0.92 and an RMSE of 15.40 W·m⁻² (12.93%). Comparison of the DSR estimates with the reanalyzed and retrieved DSR measurements from satellite observations showed that the estimated DSR is reasonably accurate but has a higher spatial resolution. Moreover, the proposed GBRT method has good scalability and is easy to apply to other parameter inversion problems by changing the parameters and training data.

  15. Application of decision tree algorithms for discriminating among woody plant taxa based on the pollen season characteristics

    Directory of Open Access Journals (Sweden)

    Kubik-Komar Agnieszka

    2015-01-01

    The aim of this study was to verify whether and which parameters of the atmospheric pollen season can distinguish between pollen types, the ranges of parameter values that delineate classes of taxa, and finally which taxa are similar to others within the domain of these parameter ranges. Decision tree algorithms were applied and the best tree was chosen to describe the rules of pollen classification. The study material consisted of airborne pollen grains of the following eight taxa: Alnus, Betula, Carpinus, Corylus, Cupressaceae, Fraxinus, Populus and Ulmus. Research was conducted in Lublin in eastern Poland during 2001-2013. The following six atmospheric pollen season parameters were analyzed: season start and end, duration, maximum daily pollen concentration, date of maximum pollen concentration, and the Seasonal Pollen Index (SPI). Four algorithms were used in data analysis and the J4.8 algorithm was chosen as the best for taxa classification; the date of the end of the season and the SPI value were among the characteristics that served most to discriminate between pollen types. Based on the classification tree, the following four groups of taxa were identified: (i) Ulmus; (ii) Corylus, Alnus, Populus; (iii) Betula; and (iv) Carpinus, Fraxinus, Cupressaceae.

  16. Method for Walking Gait Identification in a Lower Extremity Exoskeleton Based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Qing Guo

    2015-04-01

    A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors' information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person's motion.
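
    C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows that criterion for a categorical attribute only; the sensor names and the six toy samples are hypothetical and do not come from the paper, which also fuses continuous sensor readings.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of a categorical attribute with respect to labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical samples: right-leg phase and two binary sensor flags.
phase = ["stance", "stance", "swing", "swing", "stance", "stance"]
right_foot_contact = [1, 1, 0, 0, 1, 1]   # separates the phases perfectly
unrelated_flag = [0, 1, 0, 1, 0, 1]       # carries no information here

ratio_contact = gain_ratio(right_foot_contact, phase)
ratio_unrelated = gain_ratio(unrelated_flag, phase)
print(ratio_contact, ratio_unrelated)
```

    A tree builder would split on the attribute with the larger ratio, here the foot-contact flag.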

  17. Skin Autofluorescence Based Decision Tree in Detection of Impaired Glucose Tolerance and Diabetes

    NARCIS (Netherlands)

    Smit, Andries J.; Smit, Jitske M.; Botterblom, Gijs J.; Mulder, Douwe

    2013-01-01

    Aim: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE) which are considered to be a carrier of glycometabolic memory. We

  18. Predictability of the future development of aggressive behavior of cranial dural arteriovenous fistulas based on decision tree analysis.

    Science.gov (United States)

    Satomi, Junichiro; Ghaibeh, A Ammar; Moriguchi, Hiroki; Nagahiro, Shinji

    2015-07-01

    The severity of clinical signs and symptoms of cranial dural arteriovenous fistulas (DAVFs) is well correlated with their pattern of venous drainage. Although the presence of cortical venous drainage can be considered a potential predictor of aggressive DAVF behaviors, such as intracranial hemorrhage or progressive neurological deficits due to venous congestion, accurate statistical analyses are currently not available. Using a decision tree data mining method, the authors aimed at clarifying the predictability of the future development of aggressive behaviors of DAVF and at identifying the main causative factors. Of 266 DAVF patients, 89 were eligible for analysis. Under observational management, 51 patients presented with intracranial hemorrhage/infarction during the follow-up period. The authors created a decision tree able to assess the risk for the development of aggressive DAVF behavior. Evaluated by 10-fold cross-validation, the decision tree's accuracy, sensitivity, and specificity were 85.28%, 88.33%, and 80.83%, respectively. The tree shows that the main factor in symptomatic patients was the presence of cortical venous drainage. In its absence, the lesion location determined the risk of a DAVF developing aggressive behavior. Decision tree analysis accurately predicts the future development of aggressive DAVF behavior.

  19. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music, showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
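
    The smoothing step, averaging each segment's decision with past segment decisions, can be illustrated roughly as follows. This is a sketch under assumptions: the abstract does not give the window length, the threshold, or whether raw or already smoothed past decisions are averaged, so the three-segment history, the 0.5 threshold, and the use of raw decisions are all hypothetical choices.

```python
def smooth_decisions(raw, history=3):
    """Average each raw decision (1 = music, 0 = speech) with up to
    `history` preceding raw decisions, then re-threshold at 0.5."""
    smoothed = []
    for i in range(len(raw)):
        window = raw[max(0, i - history): i + 1]
        mean = sum(window) / len(window)
        smoothed.append(1 if mean > 0.5 else 0)
    return smoothed


# Hypothetical per-segment decisions: one spurious "music" flag at index 3,
# then a genuine change to music from index 6 onwards.
raw = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
smoothed = smooth_decisions(raw)
print(smoothed)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

    The isolated flag is suppressed, at the price of a two-segment delay before the real change is reported; a longer history trades more stability for slower adjustment.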

  20. [Use the Markov-decision tree model to optimize vaccination strategies of hepatitis E among women aged 15 to 49].

    Science.gov (United States)

    Chen, Z M; Ji, S B; Shi, X L; Zhao, Y Y; Zhang, X F; Jin, H

    2017-02-10

    Objective: To evaluate the cost-utility of different hepatitis E vaccination strategies in women aged 15 to 49. Methods: A Markov-decision tree model was constructed to evaluate the cost-utility of three hepatitis E virus vaccination strategies. Parameters of the model were estimated on the basis of published studies and expert experience. Sensitivity and threshold analyses were used to evaluate the uncertainties of the model. Results: Compared with the non-vaccination group, post-screening vaccination with a rate of 100% could save 0.10 quality-adjusted life years per capita among these women from the societal perspective. For vaccination after implementation of a screening program and for the 100% vaccination rate strategy, the incremental cost-utility ratio (ICUR) of vaccination was 5 651.89 and 6 385.33 Yuan/QALY, respectively. Vaccination after implementation of a screening program showed a better benefit than the 100% vaccination rate strategy. Results from the sensitivity analysis showed that both the cost of the hepatitis E vaccine and the inoculation compliance rate had significant effects. If the cost was lower than 191.56 Yuan (RMB) or the inoculation compliance rate was lower than 0.23, the 100% vaccination rate strategy was better than the post-screening vaccination strategy; otherwise the post-screening vaccination strategy was optimal. Conclusion: From the societal perspective, post-screening vaccination for women aged 15 to 49 appeared to be the optimal strategy, but this depends on the vaccine cost and the inoculation compliance rate.

  1. Integrating individual trip planning in energy efficiency – Building decision tree models for Danish fisheries

    DEFF Research Database (Denmark)

    Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard

    2013-01-01

    integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy......-intensive but efficient vessels conducting pelagic or industrial fishing are more inclined to base their decision on fish price only, while numerous smaller and less efficient vessels conducting demersal mixed or crustacean fishery usually consider other flexible factors, e.g., the potential for a large catch, weather...... the adaptations of individual fishermen to resource availability dynamics, increasing fuel prices, changes in regulations, and the consequences of socioeconomic external pressures on harvested stocks. A new methodology is described here to obtain quantitative information on the fishermen’s micro-scale decisions...

  2. Modelling alcohol consumption during adolescence using zero inflated negative binomial and decision trees

    Directory of Open Access Journals (Sweden)

    Alfonso Palmer

    2010-07-01

    Alcohol is currently the most consumed substance among the Spanish adolescent population. Some of the variables that bear an influence on this consumption include ease of access, use of alcohol by friends and some personality factors. The aim of this study was to analyze and quantify the predictive value of these variables specifically on alcohol consumption in the adolescent population. The useful sample was made up of 6,145 adolescents (49.8% boys and 50.2% girls) with a mean age of 15.4 years (SE = 1.2). The data were analyzed using the statistical model for a count variable and Data Mining techniques. The results show the influence of ease of access, alcohol consumption by the group of friends, and certain personality factors on alcohol intake, allowing us to quantify the intensity of this influence according to age and gender. Knowing these factors is the starting point in elaborating specific preventive actions against alcohol consumption.

  3. Maximal standard dose of parenteral iron for hemodialysis patients: an MRI-based decision tree learning analysis.

    Directory of Open Access Journals (Sweden)

    Guy Rostoker

    Iron overload used to be considered rare among hemodialysis patients after the advent of erythropoiesis-stimulating agents, but recent MRI studies have challenged this view. The aim of this study, based on decision-tree learning and on MRI determination of hepatic iron content, was to identify a noxious pattern of parenteral iron administration in hemodialysis patients. We performed a prospective cross-sectional study from 31 January 2005 to 31 August 2013 in the dialysis centre of a French community-based private hospital. A cohort of 199 fit hemodialysis patients free of overt inflammation and malnutrition were treated for anemia with parenteral iron-sucrose and an erythropoiesis-stimulating agent (darbepoetin), in keeping with current clinical guidelines. Patients had blinded measurements of hepatic iron stores by means of T1 and T2* contrast MRI, without gadolinium, together with Chi-squared Automatic Interaction Detection (CHAID) analysis. The CHAID algorithm first split the patients according to their monthly infused iron dose, with a single cutoff of 250 mg/month. In the node comprising the 88 hemodialysis patients who received more than 250 mg/month of IV iron, 78 patients had iron overload on MRI (88.6%, 95% CI: 80% to 93%). The odds ratio for hepatic iron overload on MRI was 3.9 (95% CI: 1.81 to 8.4) with >250 mg/month of IV iron as compared to <250 mg/month. Age, gender (female sex) and the hepcidin level also influenced liver iron content on MRI. The standard maximal amount of iron infused per month should be lowered to 250 mg in order to lessen the risk of dialysis iron overload and to allow safer use of parenteral iron products.
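
    The proportion reported for the high-dose node (78 of 88 patients) and an approximate confidence interval can be recomputed from the counts in the abstract. The abstract does not say which interval method the authors used; the Wilson score interval below is an assumption chosen for illustration, and it gives bounds close to, but not necessarily identical with, the published ones.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


proportion = 78 / 88                      # counts taken from the abstract
low, high = wilson_interval(78, 88)
print(f"{proportion:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```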

  4. A protocol for developing early warning score models from vital signs data in hospitals using ensembles of decision trees.

    Science.gov (United States)

    Xu, Michael; Tam, Benjamin; Thabane, Lehana; Fox-Robichaud, Alison

    2015-09-09

    Multiple early warning scores (EWS) have been developed and implemented to reduce cardiac arrests on hospital wards. Case-control observational studies that generate an area under the receiver operator curve (AUROC) are the usual validation method, but investigators have also generated EWS with algorithms with no prior clinical knowledge. We present a protocol for the validation and comparison of our local Hamilton Early Warning Score (HEWS) with that generated using decision tree (DT) methods. A database of electronically recorded vital signs from 4 medical and 4 surgical wards will be used to generate DT EWS (DT-HEWS). A third EWS will be generated using ensemble-based methods. Missing data will be multiply imputed. For a relative risk reduction of 50% in our composite outcome (cardiac or respiratory arrest, unanticipated intensive care unit (ICU) admission or hospital death) with a power of 80%, we calculated a sample size of 17,151 patient days based on our cardiac arrest rates in 2012. The performance of the National EWS, DT-HEWS and the ensemble EWS will be compared using AUROC. Ethics approval was received from the Hamilton Integrated Research Ethics Board (#13-724-C). The vital signs and associated outcomes are stored in a database on our secure hospital server. Preliminary dissemination of this protocol was presented in abstract form at an international critical care meeting. Final results of this analysis will be used to improve on the existing HEWS and will be shared through publication and presentation at critical care meetings.

  5. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. I. Khader

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water.
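
    The value-of-information calculation described above, an expected cost per alternative followed by a difference against the cheapest uninformed alternative, can be sketched as below. Every probability and cost here is a made-up placeholder rather than a figure from the paper, and the sign convention (uninformed cost minus monitoring cost, so that a positive value favours monitoring) is an assumption.

```python
def expected_cost(outcomes):
    """outcomes: (probability, cost) pairs for one decision alternative."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * cost for p, cost in outcomes)


# Hypothetical numbers (arbitrary currency units per household per year).
alternatives = {
    "ignore": [(0.7, 0.0), (0.3, 1000.0)],    # 30% chance of illness costs
    "bottled_water": [(1.0, 400.0)],          # certain purchase cost
    "monitor": [(0.7, 50.0), (0.3, 450.0)],   # network cost, plus bottled
                                              # water only when flagged
}

costs = {name: expected_cost(o) for name, o in alternatives.items()}
best_uninformed = min(costs["ignore"], costs["bottled_water"])
voi = best_uninformed - costs["monitor"]
print(costs, voi)
```

    In the full model each branch would be expanded further, for example by whether the monitoring report is correct and whether people follow the recommendation, but the roll-up is the same weighted sum.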

  6. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs

  7. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A.; Rosenberg, D.; McKee, M.

    2012-12-01

    Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system. At current

  8. The risk of disabling, surgery and reoperation in Crohn’s disease – A decision tree-based approach to prognosis

    Science.gov (United States)

    Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula

    2017-01-01

    Introduction: Crohn’s disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and often requires surgical interventions. This article describes a decision-tree based approach that defines the CD patients’ risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. Materials and methods: This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Results: Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curves (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50–4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09–0.25] and 0.50 [0.24–1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 months after initial surgery) had a higher proportion of reoperation. Conclusions: The decision-tree based approach used in this study, with demographic and clinical variables, has shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation. PMID:28225800

  9. A decision-tree-based method for reconstructing disturbance history in the Russia boreal forests over 30 years

    Science.gov (United States)

    Chen, D.; Loboda, T. V.

    2012-12-01

    The boreal forest is one of the largest biomes on Earth and carries crucial significance in numerous aspects. Located in the high latitude region of the Northern Hemisphere, the boreal forest is predicted to be subject to the highest level of influence under the changing climate, which may impose profound impacts on the global carbon and energy budget. Of the entire boreal biome, approximately two thirds consists of the Russian boreal forest, which is also the largest forested zone in the world. Fire and logging have been the predominant disturbance types in the Russian boreal forest, which accelerate the speed of carbon release into the atmosphere. To better understand these processes, records of past disturbance are in great need. However, there has been no comprehensive and unbiased multi-decadal record of forest disturbance in this region. This paper illustrates a method for reconstructing disturbance history in the Russian boreal forests over 30 years. This method takes advantage of data from both Landsat, which has a long data record but limited spatial coverage, and the Moderate Resolution Imaging Spectroradiometer (MODIS), which has wall-to-wall spatial coverage but a limited period of observations. We developed a standardized and semi-automated approach to extract training and validation data samples from Landsat imagery. Landsat data, dating back to 1984, were used to generate maps of forest disturbance using temporal shifts in Disturbance Index through the multi-temporal stack of imagery in selected locations. The disturbed forests are attributed to logging or burning causes by means of visual examination. The Landsat-based disturbance maps are then used as reference data to train a decision tree classifier on 2003 MODIS data. This classifier utilizes multiple direct MODIS products including the BRDF-adjusted surface reflectance, a suite of vegetation indices, and land surface temperature. The algorithm also capitalizes on seasonal variability in class

  10. Decision trees with minimum average depth for sorting eight elements

    KAUST Repository

    AbouEisha, Hassan M.

    2015-11-19

    We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We show also that each decision tree for sorting 8 elements, which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365), has also minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to make sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.
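
    To put the stated value 620160/8! in context, it can be compared with two elementary lower bounds for any comparison tree with 8! leaves. The sketch below only does arithmetic on the number quoted in the abstract; it does not reproduce the dynamic programming used in the paper.

```python
from fractions import Fraction
from math import ceil, factorial, log2

n = 8
leaves = factorial(n)                       # 40320 possible orderings

# Minimum average depth as stated in the abstract.
min_avg_depth = Fraction(620160, leaves)

# Information-theoretic bound: average depth >= log2(number of leaves).
entropy_bound = log2(leaves)

# Any binary tree with this many leaves has worst-case depth >= ceil(log2).
worst_case_bound = ceil(entropy_bound)

# Smallest external path length of ANY binary tree with `leaves` leaves
# (all leaves on levels q and q+1), ignoring whether it can actually sort.
q = leaves.bit_length() - 1                 # floor(log2(leaves))
balanced_path_length = q * leaves + 2 * (leaves - 2 ** q)
gap = 620160 - balanced_path_length

print(min_avg_depth, float(min_avg_depth))  # 323/21, about 15.381
print(entropy_bound, worst_case_bound)      # about 15.299, and 16
print(balanced_path_length, gap)            # 619904 and 256
```

    The optimal sorting tree therefore has a total path length only 256 above what a perfectly balanced tree with 8! leaves would give, if such a tree were realisable by comparisons.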

  11. A novel decision tree approach based on transcranial Doppler sonography to screen for blunt cervical vascular injuries.

    Science.gov (United States)

    Purvis, Dianna; Aldaghlas, Tayseer; Trickey, Amber W; Rizzo, Anne; Sikdar, Siddhartha

    2013-06-01

    Early detection and treatment of blunt cervical vascular injuries prevent adverse neurologic sequelae. Current screening criteria can miss up to 22% of these injuries. The study objective was to investigate bedside transcranial Doppler sonography for detecting blunt cervical vascular injuries in trauma patients using a novel decision tree approach. This prospective pilot study was conducted at a level I trauma center. Patients undergoing computed tomographic angiography for suspected blunt cervical vascular injuries were studied with transcranial Doppler sonography. Extracranial and intracranial vasculatures were examined with a portable power M-mode transcranial Doppler unit. The middle cerebral artery mean flow velocity, pulsatility index, and their asymmetries were used to quantify flow patterns and develop an injury decision tree screening protocol. Student t tests validated associations between injuries and transcranial Doppler predictive measures. We evaluated 27 trauma patients with 13 injuries. Single vertebral artery injuries were most common (38.5%), followed by single internal carotid artery injuries (30%). Compared to patients without injuries, mean flow velocity asymmetry was higher for single internal carotid artery (P = .003) and single vertebral artery (P = .004) injuries. Similarly, pulsatility index asymmetry was higher in single internal carotid artery (P = .015) and single vertebral artery (P = .042) injuries, whereas the lowest pulsatility index was elevated for bilateral vertebral artery injuries (P = .006). The decision tree yielded 92% specificity, 93% sensitivity, and 93% correct classifications. In this pilot feasibility study, transcranial Doppler measures were significantly associated with the blunt cervical vascular injury status, suggesting that transcranial Doppler sonography might be a viable bedside screening tool for trauma. Patient-specific hemodynamic information from transcranial Doppler assessment has the potential to alter

  12. Tips for teachers of evidence-based medicine: making sense of decision analysis using a decision tree.

    Science.gov (United States)

    Lee, Anna; Joynt, Gavin M; Ho, Anthony M H; Keitz, Sheri; McGinn, Thomas; Wyer, Peter C

    2009-05-01

    Decision analysis is a tool that clinicians can use to choose an option that maximizes the overall net benefit to a patient. It is an explicit, quantitative, and systematic approach to decision making under conditions of uncertainty. In this article, we present two teaching tips aimed at helping clinical learners understand the use and relevance of decision analysis. The first tip demonstrates the structure of a decision tree. With this tree, a clinician may identify the optimal choice among complicated options by calculating probabilities of events and incorporating patient valuations of possible outcomes. The second tip demonstrates how to address uncertainty regarding the estimates used in a decision tree. We field tested the tips twice with interns and senior residents. Teacher preparatory time was approximately 90 minutes. The field test utilized a board and a calculator. Two handouts were prepared. Learners identified the importance of incorporating values into the decision-making process as well as the role of uncertainty. The educational objectives appeared to be reached. These teaching tips introduce clinical learners to decision analysis in a fashion aimed to illustrate principles of clinical reasoning and how patient values can be actively incorporated into complex decision making.
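
    As a minimal sketch of the two tips described above (rolling back a decision tree and then varying an uncertain estimate), the following assumes a hypothetical two-option choice; every probability, utility, and option name is illustrative and not taken from the article.

```python
# Hypothetical decision tree rollback; all numbers are made up for illustration.

def outcome(utility):
    """Leaf node holding the patient's valuation (utility) of an outcome."""
    return {"type": "outcome", "utility": utility}

def chance(*branches):
    """Chance node: branches are (probability, child) pairs."""
    return {"type": "chance", "branches": list(branches)}

def expected_value(node):
    """Average over chance nodes, take the best option at decision nodes."""
    if node["type"] == "outcome":
        return node["utility"]
    if node["type"] == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if node["type"] == "decision":
        return max(expected_value(child) for _, child in node["options"])
    raise ValueError("unknown node type")

def best_option(decision_node):
    """Name of the option with the highest expected value."""
    return max(decision_node["options"], key=lambda opt: expected_value(opt[1]))[0]

def build_tree(p_good_if_waiting):
    """Build the tree with one uncertain estimate exposed as a parameter."""
    return {
        "type": "decision",
        "options": [
            ("treat", chance((0.9, outcome(0.95)), (0.1, outcome(0.40)))),
            ("watchful waiting", chance((p_good_if_waiting, outcome(1.00)),
                                        (1 - p_good_if_waiting, outcome(0.50)))),
        ],
    }

# One-way sensitivity analysis: vary the uncertain probability and watch the choice.
for p in (0.60, 0.70, 0.80, 0.85):
    tree = build_tree(p)
    print(p, best_option(tree), round(expected_value(tree), 3))
```

    Varying a single estimate while holding the others fixed is only the simplest form of sensitivity analysis; the article's second tip may cover more than this sketch does.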

  13. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    Science.gov (United States)

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. The first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and a support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.

  14. Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem

    Directory of Open Access Journals (Sweden)

    Bryant Stephen H

    2008-09-01

    Background: Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Due to the amount of information produced by HTS assays, it is a very challenging task to mine the HTS data for potential interest in drug development research. Computational approaches for the analysis of HTS results face great challenges due to the large quantity of information and significant amounts of erroneous data produced. Results: In this study, Decision Tree (DT) based models were developed to discriminate compound bioactivities by using their chemical structure fingerprints provided in the PubChem system http://pubchem.ncbi.nlm.nih.gov. The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold Cross Validation (CV) sensitivity, specificity and Matthews Correlation Coefficient (MCC) for the models are 57.2 to 80.5%, 97.3 to 99.0%, and 0.4 to 0.5, respectively. A further evaluation was also performed for DT models built for two independent bioassays, where inhibitors for the same HIV RNase target were screened using different compound libraries; this experiment yielded enrichment factors of 4.4 and 9.7. Conclusion: Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hits selection.

  15. Developing and validating predictive decision tree models from mining chemical structural fingerprints and high-throughput screening data in PubChem.

    Science.gov (United States)

    Han, Lianyi; Wang, Yanli; Bryant, Stephen H

    2008-09-25

    Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Due to the amount of information produced by HTS assays, it is a very challenging task to mine the HTS data for potential interest in drug development research. Computational approaches for the analysis of HTS results face great challenges due to the large quantity of information and significant amounts of erroneous data produced. In this study, Decision Tree (DT) based models were developed to discriminate compound bioactivities by using their chemical structure fingerprints provided in the PubChem system http://pubchem.ncbi.nlm.nih.gov. The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold Cross Validation (CV) sensitivity, specificity and Matthews Correlation Coefficient (MCC) for the models are 57.2 to 80.5%, 97.3 to 99.0%, and 0.4 to 0.5, respectively. A further evaluation was also performed for DT models built for two independent bioassays, where inhibitors for the same HIV RNase target were screened using different compound libraries; this experiment yielded enrichment factors of 4.4 and 9.7. Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hits selection.
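
    To make the three reported metrics concrete, here is a small sketch computing sensitivity, specificity and MCC from a confusion matrix. The counts are hypothetical, chosen only to resemble an imbalanced screening set, and are not the study's data.

```python
import math

def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true actives that were recovered
    specificity = tn / (tn + fp)   # fraction of true inactives that were rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts: 100 actives among 10,100 compounds.
sens, spec, mcc = screening_metrics(tp=60, fp=200, tn=9800, fn=40)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

    With few actives among many inactives, specificity can be very high while MCC stays moderate, which is the pattern in the ranges quoted above.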

  16. CorRECTreatment: a web-based decision support tool for rectal cancer treatment that uses the analytic hierarchy process and decision tree.

    Science.gov (United States)

    Suner, A; Karakülah, G; Dicle, O; Sökmen, S; Çelikoğlu, C C

    2015-01-01

    The selection of appropriate rectal cancer treatment is a complex multi-criteria decision making process, in which clinical decision support systems might be used to assist and enrich physicians' decision making. The objective of the study was to develop a web-based clinical decision support tool for physicians in the selection of potentially beneficial treatment options for patients with rectal cancer. The decision support model, developed in our previous study by combining the Analytic Hierarchy Process (AHP) method, which determines the priority of criteria, with a decision tree formed using these priorities, was updated and applied to retrospectively collected data from 388 patients. Later, a web-based decision support tool named corRECTreatment was developed. The treatment recommendations of the experts and of the decision support tool were examined for consistency. Two surgeons were asked to recommend a treatment and an overall survival value for each of 20 different cases, which we selected from the most common and the rare treatment options in the patient data set and turned into scenarios. The updated decision model contained 8 and 10 criteria in the first and second steps respectively. In the AHP analyses of the criteria, it was found that the matrices, generated for both decision steps, were consistent (consistency ratiodecisions of experts, the consistency value for the most frequent cases was found to be 80% for the first decision step and 100% for the second decision step. Similarly, for rare cases consistency was 50% for the first decision step and 80% for the second decision step. The decision model and corRECTreatment, developed by applying these on real patient data, are expected to provide potential users with decision support in rectal cancer treatment processes and facilitate them in making projections about treatment options.
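
    As an illustration of how AHP derives criterion priorities and checks a pairwise comparison matrix for consistency, the sketch below uses power iteration on a hypothetical 3-criterion matrix. The matrix is invented, and the 0.1 acceptance threshold is Saaty's usual convention, assumed here rather than taken from the abstract.

```python
# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_priorities(matrix, iterations=100):
    """Return (priority vector, consistency ratio) for a pairwise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration converges to the principal eigenvector.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return w, consistency_index / RANDOM_INDEX[n]

# Hypothetical judgments: criterion A is 2x as important as B and 6x as important as C.
comparisons = [
    [1.0,     2.0,     6.0],
    [1 / 2.0, 1.0,     3.0],
    [1 / 6.0, 1 / 3.0, 1.0],
]
weights, cr = ahp_priorities(comparisons)
print([round(x, 3) for x in weights], "acceptable" if cr < 0.1 else "revise judgments")
```

    This example matrix is perfectly consistent, so its ratio is essentially zero; real expert judgments usually give a small positive value.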

  17. [The application of decision tree in the research of anemia among rural children under 3-year-old].

    Science.gov (United States)

    Ma, Yu-gang; Bi, Yu-xue; Yan, Hong; Deng, Li-na; Liang, Wei-feng; Wang, Bei; Zhang, Xue-li

    2009-05-01

    To study the application of decision trees in research on anemia among rural children. In the Enterprise Miner module of SAS 8.2, 3000 observations were sampled from the database and a decision tree model was built. The model used a CART decision tree based on the Gini impurity index. The misclassification rate of the decision tree model was 21.2% on the training set and 21.9% on the validation set. The root ASE of the decision tree model was 0.399 on the training set and 0.404 on the validation set. The area under the ROC curve was larger than that under the reference line. The diagnostic chart showed that the corresponding percentage was higher than the other. The decision tree model selected 9 important factors and ranked them by importance, among which anemia in the mother (1.00) was the most important factor. The others were child's age (0.75), time of ablactation (0.53), mother's age (0.32), time of egg supplementation (0.26), category of the project county (0.26), time of milk supplementation (0.16), number of people in the family (0.13) and education status of the mother (0.12). The decision tree produced simple rules that might be used for classification and prediction in similar research. The decision tree could screen out the important factors of anemia and identify cut-off points for the factors. With wider application, decision trees could prove valuable in research on rural children's health care.
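
    The Gini impurity criterion that CART uses to score candidate splits can be sketched as follows. The labels and the split are hypothetical and only illustrate the arithmetic; they are not the study's data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical node with 5 anemic and 5 non-anemic children.
parent = ["anemic"] * 5 + ["healthy"] * 5
# Hypothetical split on one factor (for example, whether the mother is anemic).
left = ["anemic"] * 4 + ["healthy"] * 1
right = ["anemic"] * 1 + ["healthy"] * 4

print(gini(parent), split_gini(left, right))
```

    CART prefers the split with the largest drop from the parent impurity to the weighted child impurity.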

  18. A decision tree-based approach for identifying urban-rural differences in metabolic syndrome risk factors in the adult Korean population.

    Science.gov (United States)

    Kim, T N; Kim, J M; Won, J C; Park, M S; Lee, S K; Yoon, S H; Kim, H-R; Ko, K S; Rhee, B D

    2012-10-01

    The purpose of this study was to explore the difference in the pattern of metabolic syndrome (MetS) in urban and rural populations in Korea using data mining techniques. In total, 1013 adults >30 yr of age from urban (184 males and 313 females) and rural districts (211 males and 305 females) were recruited from Gyeongsangnam-do, Korea. Modified National Cholesterol Education Program Adult Treatment Panel III criteria were used to identify individuals with MetS. We applied a decision tree analysis to elucidate the differences in the clustering of MetS components between the urban and rural populations. The prevalence of MetS was 33.2% and 35.2% in urban and rural districts, respectively (p=0.598). The decision-tree approach revealed that the combination of high serum triglycerides (TG) + high systolic blood pressure (SBP), high TG + low HDL cholesterol, and high waist circumference (WC) + high SBP + high fasting plasma glucose (FPG) were strong predictors of MetS in the urban population, whereas the combination of TG + SBP + WC and SBP + WC + FPG showed high positive predictive value for the presence of MetS in the rural population. Although no significant difference was found for the prevalence of MetS between the two populations, the differences in the clustering pattern of MetS components in urban and rural districts in Korea were identified by decision tree analysis. Our findings may serve as a basis to design necessary population-based intervention programs for prevention and progression of MetS and its complications in Korea.

  19. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
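
    The "integral image" transform mentioned above lets any rectangular pixel sum be read with four lookups using integer arithmetic only. The sketch below shows the general technique on a tiny hypothetical image; it is not the specific feature set used in the described method.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column for easy indexing."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

# Hypothetical 3x3 integer image.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)

# A simple box-difference feature: left column minus right column.
feature = box_sum(table, 0, 0, 3, 1) - box_sum(table, 0, 2, 3, 3)
print(box_sum(table, 1, 1, 3, 3), feature)
```

    Each box sum costs the same regardless of box size, so features like this can feed a decision tree at a fixed per-pixel cost.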

  20. The application of artificial neural networks and decision tree model in predicting post-operative complication for gastric cancer patients.

    Science.gov (United States)

    Chien, Ching-Wen; Lee, Yi-Chih; Ma, Tsochiang; Lee, Tian-Shyug; Lin, Yang-Chu; Wang, Weu; Lee, Wei-Jei

    2008-01-01

    Gastric cancer remains a leading cause of death worldwide. Post-operative complication is one important factor which causes mortality of gastric cancer patients after gastrectomy. Better prediction of post-operative complication before gastrectomy can significantly reduce post-operative mortality and morbidity. Therefore, 3 data mining techniques were applied in this study to improve prediction of post-operative complication. A retrospective study was performed on 521 patients from 3 medical centers in Taiwan, each with over 2,000 acute beds, between February 2002 and October 2004. Pre- and post-operative clinical data were collected and analyzed by applying 3 data mining techniques, including Artificial Neural Networks (ANN), Decision Tree (DT) and Logistic Regression (LR). Results of this study indicated that ANN was a better technique than DT and LR in predicting post-operative complication. Nutritional status, pathological characteristics and operational characteristics were important predictors of post-operative complication. Further study on predicting post-operative complication in gastric cancer patients is still important. However, how to combine different data mining techniques to improve accuracies of prediction will be another important issue for clinicians and researchers.

  1. Interpreting CNNs via Decision Trees

    OpenAIRE

    Zhang, Quanshi; Yang, Yu; Wu, Ying Nian; Zhu, Song-Chun

    2018-01-01

    This paper presents a method to learn a decision tree to quantitatively explain the logic of each prediction of a pre-trained convolutional neural network (CNN). Our method boosts the following two aspects of network interpretability. 1) In the CNN, each filter in a high conv-layer must represent a specific object part, instead of describing mixed patterns without clear meanings. 2) People can explain each specific prediction made by the CNN at the semantic level using a decision tree, i.e....

  2. A comparison of the decision tree approach and the neural-networks-based heuristic dynamic programming approach for subcircuit extraction problem

    Science.gov (United States)

    Zhang, Nian; Wunsch, Donald C., II

    2003-08-01

    The applications of non-standard logic devices are increasing rapidly in industry. Many of these applications require high speed, low power, functionality and flexibility, which cannot be obtained with standard logic devices. These special logic cells can be constructed by the topology design strategy automatically or manually. However, the need arises for topology design verification. The layout versus schematic (LVS) analysis is an essential part of topology design verification, and subcircuit extraction is one of the operations in LVS testing. In this paper, we first provide an efficient decision tree approach to the graph isomorphism problem, and then apply it to the subcircuit extraction problem based on the solution to the graph isomorphism problem. To evaluate its performance, we compare it with the neural-networks-based heuristic dynamic programming algorithm (SubHDP), which is by far one of the fastest algorithms for the subcircuit extraction problem.

  3. Comparison of greedy algorithms for α-decision tree construction

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    A comparison among different heuristics used by greedy algorithms that construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from the UCI Machine Learning Repository [2]. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.

  4. Applied Research of Decision Tree Method on Football Training

    Directory of Open Access Journals (Sweden)

    Liu Jinhui

    2015-01-01

    This paper first analyzes decision trees, and then offers a further analysis of CLS based on them. As CLS contains the most substantial and most primitive decision-making idea, it can provide the basis for decision tree construction. Because CLS is limited in its details, the ID3 decision tree algorithm is introduced to offer more details. It applies information gain as the attribute selection metric to provide a reference for seeking the optimal segmentation point. Finally, the ID3 algorithm is applied to football training. The algorithm is verified and proves effective and reasonable.
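
    A minimal sketch of the information gain criterion that ID3 uses to pick the splitting attribute follows. The training records, attribute names, and target are hypothetical and are not drawn from the paper's football data.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical player records.
players = [
    {"stamina": "high", "position": "forward",  "selected": "yes"},
    {"stamina": "high", "position": "defender", "selected": "yes"},
    {"stamina": "low",  "position": "forward",  "selected": "no"},
    {"stamina": "low",  "position": "defender", "selected": "no"},
]

gains = {a: information_gain(players, a, "selected") for a in ("stamina", "position")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

    ID3 places the attribute with the highest gain at the current node and repeats the calculation on each resulting subset.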

  5. Decision tree analysis in subarachnoid hemorrhage: prediction of outcome parameters during the course of aneurysmal subarachnoid hemorrhage using decision tree analysis.

    Science.gov (United States)

    Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela

    2018-01-19

    OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.

  6. Measuring performance in health care: case-mix adjustment by boosted decision trees.

    Science.gov (United States)

    Neumann, Anke; Holstein, Josiane; Le Gall, Jean-Roger; Lepage, Eric

    2004-10-01

    The purpose of this paper is to investigate the suitability of boosted decision trees for the case-mix adjustment involved in comparing the performance of various health care entities. First, we present logistic regression, decision trees, and boosted decision trees in a unified framework. Second, we study in detail their application for two common performance indicators, the mortality rate in intensive care and the rate of potentially avoidable hospital readmissions. For both examples the technique of boosting decision trees outperformed standard prognostic models, in particular linear logistic regression models, with regard to predictive power. On the other hand, boosting decision trees was computationally demanding and the resulting models were rather complex and needed additional tools for interpretation. Boosting decision trees represents a powerful tool for case-mix adjustment in health care performance measurement. Depending on the specific priorities set in each context, the gain in predictive power might compensate for the inconvenience in the use of boosted decision trees.

  7. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    Science.gov (United States)

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. DTMACC: Decision Trees with Multiple Attributes Concept Clustering

    Science.gov (United States)

    Kushi, Yusuke; Inazumi, Hiroshige

    A decision tree is one of the machine learning techniques and also one of the major knowledge representations of data mining results. This is because its meaning is easy for human analysts to understand. Even ID3, the representative algorithm, is known to exhibit remarkable performance deterioration under certain circumstances, particularly due to strong correlation between attributes representing the class of examples. One of the approaches to obtaining more preferable decision trees is pre-processing the training data to extend its description, such as attribute generation and attribute selection. There is also the idea of decision trees with a region rule. In this paper, we consider two approaches, i.e., decision trees with a region rule allowing multiple attributes, and a pre-processing method for a region rule enabling any suitable number of attributes to correspond to branch nodes, where an optimal division condition with arbitrarily multiple attributes is acquired. By using this method, we propose a new decision tree generation algorithm guaranteed to select effective compound attributes at each branch node, where a new MDL-based evaluation criterion is also defined for determining the optimal number of compound attributes specified for each node. This algorithm is applied to datasets containing only nominal values. It consists of three processes: compound attribute selection, parent node integration, and pruning. We call these new decision trees DTMACC (Decision Trees with Multiple Attributes Concept Clustering). The effectiveness and comprehensiveness of the proposed algorithm are confirmed through experiments comparing it with ordinary decision trees and an effective pre-processing method.

  9. Relationships for Cost and Uncertainty of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on a dynamic programming approach and need the consideration of subtables of the initial decision table. So this approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and a cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions such as: depth and average depth of a decision tree and number of nodes (and number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions. The uncertainty of an approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for such tools we also present experimental results applied to various datasets acquired from UCI ML Repository [4]. © Springer-Verlag Berlin Heidelberg 2013.
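
    The two uncertainty definitions stated above translate directly into code. In this sketch a decision table is represented only by its list of row decisions, and the example tables are hypothetical.

```python
from collections import Counter

def uncertainty(decisions):
    """Number of unordered pairs of rows whose decisions differ."""
    n = len(decisions)
    all_pairs = n * (n - 1) // 2
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(decisions).values())
    return all_pairs - same_pairs

def tree_uncertainty(terminal_subtables):
    """Maximum uncertainty over the subtables reaching the terminal nodes."""
    return max(uncertainty(rows) for rows in terminal_subtables)

# Hypothetical table with four rows and three distinct decisions.
table_decisions = ["a", "a", "b", "c"]
# Hypothetical tree whose two terminal nodes receive these row subsets.
leaves = [["a", "a"], ["b", "c"]]

print(uncertainty(table_decisions), tree_uncertainty(leaves))
```

    An exact decision tree is the special case in which every terminal subtable has uncertainty zero.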

  10. Meta-learning in decision tree induction

    CERN Document Server

    Grąbczewski, Krzysztof

    2014-01-01

    The book focuses on different variants of decision tree induction but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. The book discusses different variants of decision tree induction and represents a useful source of information to readers wishing to review some of the techniques used in decision tree learning, as well as different ensemble methods that involve decision trees. It is shown that the knowledge of different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms with the aim of identifying the top-most performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...

  11. MALDI-TOF MS combined with magnetic beads for detecting serum protein biomarkers and establishment of boosting decision tree model for diagnosis of systemic lupus erythematosus.

    Science.gov (United States)

    Huang, Zhuochun; Shi, Yunying; Cai, Bei; Wang, Lanlan; Wu, Yongkang; Ying, Binwu; Qin, Li; Hu, Chaojun; Li, Yongzhe

    2009-06-01

    To discover novel potential biomarkers and establish a diagnostic pattern for SLE by using proteomic technology. Serum proteomic spectra were generated by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) combined with weak cationic exchange magnetic beads. A training set of spectra, derived from analysing sera from 32 patients with SLE, 43 patients with other autoimmune diseases and 43 age- and sex-matched healthy volunteers, was used to train and develop a decision tree model with a machine learning algorithm called decision boosting. A blinded testing set, including 32 patients with SLE, 42 patients with other autoimmune diseases and 40 healthy people, was used to determine the accuracy of the model. The diagnostic pattern with a panel of four potential protein biomarkers of mass-to-charge (m/z) ratio 4070.09, 7770.45, 28 045.1 and 3376.02 could accurately recognize 25 of 32 patients with SLE, 36 of 42 patients with other autoimmune diseases and 36 of 40 healthy people. The preliminary data suggested a potential application of MALDI-TOF MS combined with magnetic beads as an effective technology to profile serum proteome, and with pattern analysis, a diagnostic model comprising four potential biomarkers was indicated to differentiate individuals with SLE from RA, SS, SSc and healthy controls rapidly and precisely.

  12. A methodology for the automated creation of fuzzy expert systems for ischaemic and arrhythmic beat classification based on a set of rules obtained by a decision tree.

    Science.gov (United States)

    Exarchos, Themis P; Tsipouras, Markos G; Exarchos, Costas P; Papaloukas, Costas; Fotiadis, Dimitrios I; Michalis, Lampros K

    2007-07-01

    In the current work we propose a methodology for the automated creation of fuzzy expert systems, applied in ischaemic and arrhythmic beat classification. The proposed methodology automatically creates a fuzzy expert system from an initial training dataset. The approach consists of three stages: (a) extraction of a crisp set of rules from a decision tree induced from the training dataset, (b) transformation of the crisp set of rules into a fuzzy model and (c) optimization of the fuzzy model's parameters using global optimization. The above methodology is employed in order to create fuzzy expert systems for ischaemic and arrhythmic beat classification in ECG recordings. The fuzzy expert system for ischaemic beat detection is evaluated in a cardiac beat dataset that was constructed using recordings from the European Society of Cardiology ST-T database. The arrhythmic beat classification fuzzy expert system is evaluated using the MIT-BIH arrhythmia database. The fuzzy expert system for ischaemic beat classification reported 91% sensitivity and 92% specificity. The arrhythmic beat classification fuzzy expert system reported 96% average sensitivity and 99% average specificity for all categories. The proposed methodology provides high accuracy and the ability to interpret the decisions made. The fuzzy expert systems for ischaemic and arrhythmic beat classification compare well with previously reported results, indicating that they could be part of an overall clinical system for ECG analysis and diagnosis.

  13. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  14. TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.

    Science.gov (United States)

    Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald

    2018-01-01

    Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
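
    The idea of restricting attention to Pareto-optimal candidates can be sketched as follows. This is not TreePOD's implementation; the two objectives (higher accuracy, fewer leaves) and the candidate values are hypothetical stand-ins for whatever trade-off a user selects.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one is at least as good on both
    objectives (accuracy up, leaves down) and strictly better on one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["accuracy"] >= c["accuracy"]
            and o["leaves"] <= c["leaves"]
            and (o["accuracy"] > c["accuracy"] or o["leaves"] < c["leaves"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical candidate trees, e.g. obtained by sampling construction parameters.
candidates = [
    {"name": "A", "accuracy": 0.80, "leaves": 4},
    {"name": "B", "accuracy": 0.85, "leaves": 9},
    {"name": "C", "accuracy": 0.84, "leaves": 12},  # dominated by B
    {"name": "D", "accuracy": 0.90, "leaves": 30},
    {"name": "E", "accuracy": 0.78, "leaves": 6},   # dominated by A
]

front_names = [c["name"] for c in pareto_front(candidates)]
print(front_names)
```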

  15. Decision-Tree, Rule-Based, and Random Forest Classification of High-Resolution Multispectral Imagery for Wetland Mapping and Inventory

    Science.gov (United States)

    Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...

  16. Influence diagrams and decision trees for severe accident management

    International Nuclear Information System (INIS)

    Goetz, W.W.J.; Seebregts, A.J.; Bedford, T.J.

    1996-08-01

    A review of relevant methodologies based on Influence Diagrams (IDs), Decision Trees (DTs), and Containment Event Trees (CETs) was conducted to assess the practicality of these methods for the selection of effective strategies for Severe Accident Management (SAM). The review included an evaluation of some software packages for these methods. The emphasis was on possible pitfalls of using IDs and on practical aspects, the latter by performance of a case study that was based on an existing Level 2 Probabilistic Safety Assessment (PSA). The study showed that the use of a combined ID/DT model has advantages over CET models, in particular when conservatisms in the Level 2 PSA have been identified and replaced by fair assessments of the uncertainties involved. It is recommended to use ID/DT models as complementary to CET models. (orig.)

  17. Exploratory Decision-Tree Modeling of Data from the Randomized REACTT Trial of Tadalafil Versus Placebo to Predict Recovery of Erectile Function After Bilateral Nerve-Sparing Radical Prostatectomy.

    Science.gov (United States)

    Montorsi, Francesco; Oelke, Matthias; Henneges, Carsten; Brock, Gerald; Salonia, Andrea; d'Anzeo, Gianluca; Rossi, Andrea; Mulhall, John P; Büttner, Hartwig

    2016-09-01

    Understanding predictors for the recovery of erectile function (EF) after nerve-sparing radical prostatectomy (nsRP) might help clinicians and patients in preoperative counseling and expectation management of EF rehabilitation strategies. To describe the effect of potential predictors on EF recovery after nsRP by post hoc decision-tree modeling of data from A Study of Tadalafil After Radical Prostatectomy (REACTT). Randomized double-blind double-dummy placebo-controlled trial in 423 men aged decision-tree models, using the International Index of Erectile Function-Erectile Function (IIEF-EF) domain score at the end of double-blind treatment, washout, and open-label treatment as response variable. Each model evaluated the association between potential predictors: presurgery IIEF domain and IIEF single-item scores, surgical approach, nerve-sparing score (NSS), and postsurgery randomized treatment group. The first decision-tree model (n=422, intention-to-treat population) identified high presurgery sexual desire (IIEF item 12: ≥3.5 and decision-tree analyses identified high presurgery sexual desire, confidence, and intercourse satisfaction as key predictors for EF recovery. Patients meeting these criteria might benefit the most from conserving surgery and early postsurgery EF rehabilitation. Strategies for improving EF after surgery should be discussed preoperatively with all patients; this information may support expectation management for functional recovery on an individual patient level. Understanding how patient characteristics and different treatment options affect the recovery of erectile function (EF) after radical surgery for prostate cancer might help physicians select the optimal treatment for their patients. This analysis of data from a clinical trial suggested that high presurgery sexual desire, sexual confidence, and intercourse satisfaction are key factors predicting EF recovery. Patients meeting these criteria might benefit the most from conserving

  18. Decision tree induction in the diagnosis of otoneurological diseases.

    Science.gov (United States)

    Viikki, K; Kentala, E; Juhola, M; Pyykkö, I

    1999-01-01

    Expert systems have been applied in medicine as diagnostic aids and education tools. The construction of a knowledge base for an expert system may be a difficult task; to automate this task several machine learning methods have been developed. These methods can be also used in the refinement of knowledge bases for removing inconsistencies and redundancies, and for simplifying decision rules. In this study, decision tree induction was employed to acquire diagnostic knowledge for otoneurological diseases and to extract relevant parameters from the database of an otoneurological expert system ONE. The records of patients with benign positional vertigo, Meniere's disease, sudden deafness, traumatic vertigo, vestibular neuritis and vestibular schwannoma were retrieved from the database of ONE, and for each disease, decision trees were constructed. The study shows that decision tree induction is a useful technique for acquiring diagnostic knowledge for otoneurological diseases and for extracting relevant parameters from a large set of parameters.

  19. Representing Boolean Functions by Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram or branching program is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than values table or even the formula [44]. Representing a function in the form of decision tree allows applying graph algorithms for various transformations [10]. Decision trees and branching programs are used for effective hardware [15] and software [5] implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of decision tree characterizes the expected computing time, and the number of nodes in branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal on both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.
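
    As a small illustration of the two time measures mentioned above, the sketch below encodes one decision tree for the three-variable majority function and computes its depth and its average depth. The nested-tuple encoding and the choice of a uniform distribution over inputs for the average are assumptions made for this example.

```python
from itertools import product

# A leaf is 0 or 1; an internal node is (variable index, subtree if 0, subtree if 1).
# This tree computes the majority of three bits.
MAJORITY_TREE = (0,
                 (1, 0, (2, 0, 1)),
                 (1, (2, 0, 1), 1))

def evaluate(tree, x):
    """Return (function value, number of variables queried) for input x."""
    queries = 0
    while isinstance(tree, tuple):
        var, if_zero, if_one = tree
        tree = if_one if x[var] else if_zero
        queries += 1
    return tree, queries

inputs = list(product((0, 1), repeat=3))
lengths = [evaluate(MAJORITY_TREE, x)[1] for x in inputs]

depth = max(lengths)                         # worst-case number of queries
average_depth = sum(lengths) / len(lengths)  # mean over uniformly random inputs
print(depth, average_depth)                  # 3 2.5
```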

  20. Comparison of Greedy Algorithms for Decision Tree Optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.
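
    The role of dynamic programming as a source of reference minima can be sketched for a single parameter, depth. The code below is a minimal illustration on a hypothetical four-row binary decision table, not the authors' algorithms: it memoizes the minimum depth for each subtable, identified by its set of row indices.

```python
from functools import lru_cache

# Hypothetical binary decision table: (attribute values, decision).
# The decision is x0 XOR x1; attribute x2 duplicates x0.
TABLE = (
    ((0, 0, 0), 0),
    ((0, 1, 0), 1),
    ((1, 0, 1), 1),
    ((1, 1, 1), 0),
)
N_ATTRS = 3

@lru_cache(maxsize=None)
def min_depth(rows):
    """Minimum depth of a decision tree for the subtable given by `rows`."""
    if len({TABLE[i][1] for i in rows}) <= 1:
        return 0  # all rows share one decision: a single leaf suffices
    best = None
    for a in range(N_ATTRS):
        zeros = frozenset(i for i in rows if TABLE[i][0][a] == 0)
        ones = rows - zeros
        if not zeros or not ones:
            continue  # attribute a does not split this subtable
        d = 1 + max(min_depth(zeros), min_depth(ones))
        if best is None or d < best:
            best = d
    if best is None:
        raise ValueError("identical rows with different decisions")
    return best

optimal_depth = min_depth(frozenset(range(len(TABLE))))
print(optimal_depth)  # 2
```

    A greedy heuristic would be judged by how close the depth of its tree comes to this minimum.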

  1. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper describes the software system Dagger, created at KAUST. The system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, and derivation of relationships between two cost functions (in particular, between the number of misclassifications and the depth of decision trees) and between the cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  2. Shopping intention prediction using decision trees

    Directory of Open Access Journals (Sweden)

    Dario Šebalj

    2017-09-01

    Introduction: Price is considered a neglected element of the marketing mix because of the complexity of price management and the sensitivity of customers to price changes. Price changes draw the fastest customer reactions. Accordingly, making shopping decisions can be very challenging for customers. Objective: The aim of this paper is to create a model that is able to predict shopping intention and classify respondents into one of two categories, depending on whether they intend to shop or not. Methods: The data sample consists of 305 respondents, who are persons older than 18 years involved in buying groceries for their household. The research was conducted in February 2017. To create the model, the decision tree method was used with several of its classification algorithms. Results: All models, except the one that used the RandomTree algorithm, achieved a relatively high classification rate (over 80%). The J48 and RandomForest algorithms gave the highest classification accuracy, 84.75%. Since there is no statistically significant difference between those two algorithms, the authors chose the J48 algorithm to build a decision tree. Conclusions: Value for money and the price level in the store were the most significant variables for classifying shopping intention. A future study is planned to compare this model with other data mining techniques, such as neural networks or support vector machines, since these techniques achieved very good accuracy in previous research in this field.

  3. Multimedia medical case retrieval using decision trees.

    Science.gov (United States)

    Quellec, Gwénolé; Lamard, Mathieu; Bekri, Lynda; Cazuguel, Guy; Cochener, Béatrice; Roux, Christian

    2007-01-01

    In this paper, we present a Case Based Reasoning (CBR) system for the retrieval of medical cases made up of a series of images with contextual information (such as the patient age, sex and medical history). Indeed, medical experts generally need varied sources of information (which might be incomplete) to diagnose a pathology. Consequently, we derive a retrieval framework from decision trees, which are well suited to process heterogeneous and incomplete information. To be integrated in the system, images are indexed by their digital content. The method is evaluated on a classified diabetic retinopathy database. On this database, results are promising: the retrieval sensitivity reaches 79.5% for a window of 5 cases, which is almost twice as good as the retrieval of single images alone. As a comparison, the retrieval sensitivity is 52.3% for a standard multimodal case retrieval using a linear combination of heterogeneous distances.

  4. Multimedia medical case retrieval using decision trees

    Science.gov (United States)

    Quellec, Gwénolé; Lamard, Mathieu; Bekri, Lynda; Cazuguel, Guy; Cochener, Béatrice; Roux, Christian

    2007-01-01

    In this paper, we present a Case Based Reasoning (CBR) system for the retrieval of medical cases made up of a series of images with contextual information (such as the patient age, sex and medical history). Indeed, medical experts generally need varied sources of information (which might be incomplete) to diagnose a pathology. Consequently, we derive a retrieval framework from decision trees, which are well suited to process heterogeneous and incomplete information. To be integrated in the system, images are indexed by their digital content. The method is evaluated on a classified diabetic retinopathy database. On this database, results are promising: the retrieval sensitivity reaches 79.5% for a window of 5 cases, which is almost twice as good as the retrieval of single images alone. As a comparison, the retrieval sensitivity is 52.3% for a standard multimodal case retrieval using a linear combination of heterogeneous distances. PMID:18003014

  5. The decision tree approach to classification

    Science.gov (United States)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  6. New Splitting Criteria for Decision Trees in Stationary Data Streams.

    Science.gov (United States)

    Jaworski, Maciej; Duda, Piotr; Rutkowski, Leszek

    2017-05-10

    The most popular tools for stream data mining are based on decision trees. In the previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relied on Hoeffding's inequality, and hundreds of researchers followed this scheme. Recently, we have demonstrated that although the Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms, which are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. The general division of splitting criteria into two types is proposed. Attributes chosen based on type-$I$ splitting criteria guarantee, with high probability, the highest expected value of split measure. Type-$II$ criteria ensure that the chosen attribute is the same, with high probability, as would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are the combinations of single criteria based on the misclassification error and Gini index.
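
    For orientation, the two impurity measures named above and the classical Hoeffding bound can be written down in a few lines. This is a sketch of standard textbook definitions only; the new splitting criteria proposed in the paper are not given in the abstract and are not reproduced here. The function names and the example counts are hypothetical.

```python
import math

def gini(counts):
    """Gini index of a node with the given per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def misclassification_error(counts):
    """Fraction of samples outside the majority class of the node."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def hoeffding_bound(value_range, delta, n):
    """Classical bound used by Hoeffding trees: with probability at least
    1 - delta, the mean of n samples of a variable with the given range
    lies within this epsilon of its expectation."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

print(gini([5, 5]), misclassification_error([8, 2]))
print(hoeffding_bound(1.0, 1e-7, 1000), hoeffding_bound(1.0, 1e-7, 10000))
```

    The bound shrinks as more stream elements arrive, which is what lets a stream learner commit to a split after a finite number of samples; the abstract's point is that applying it to impurity-based split measures is not mathematically justified.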

  7. Prediction of prospective new students using the decision tree classification method (PREDIKSI CALON MAHASISWA BARU MENGUNAKAN METODE KLASIFIKASI DECISION TREE)

    Directory of Open Access Journals (Sweden)

    Mambang

    2015-02-01

    Before a health education institution begins the new school year, the first step is the selection of new admissions from general and vocational secondary education graduates. This study predicts new students using multiple data attributes. The model is a decision tree classification method, which builds a tree consisting of a root node, internal nodes and terminal nodes. While the root node and internal nodes are variables/features, the terminal node. Based on the experimental results and evaluations, it can be concluded that the C4.5 algorithm with Uncertainty obtained 80.39% accuracy, 94.44% precision and 75.00% recall, while the C4.5 algorithm with Information Gain Ratio obtained 88.24% accuracy, 98.28% precision and 83.82% recall.

  8. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    Science.gov (United States)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).

  9. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    Science.gov (United States)

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.

  10. VC-dimension of univariate decision trees.

    Science.gov (United States)

    Yildiz, Olcay Taner

    2015-02-01

    In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate as those pruned using cross validation.

  11. PRIA 3 Fee Determination Decision Tree

    Science.gov (United States)

    The PRIA 3 decision tree will help applicants requesting a pesticide registration or certain tolerance action to accurately identify the category of their application and the amount of the required fee before they submit the application.

  12. RE-Powering’s Electronic Decision Tree

    Science.gov (United States)

    Developed by US EPA's RE-Powering America's Land Initiative, the RE-Powering Decision Trees tool guides interested parties through a process to screen sites for their suitability for solar photovoltaics or wind installations

  13. Speech Recognition Using Randomized Relational Decision Trees

    National Research Council Canada - National Science Library

    Amit, Yali

    1999-01-01

    .... This implies that we recognize words as units, without recognizing their subcomponents. Multiple randomized decision trees are used to access the large pool of acoustic events in a systematic manner and are aggregated to produce the classifier.

  14. Solar and Wind Site Screening Decision Trees

    Science.gov (United States)

    EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.

  15. A Decision Tree for Nonmetric Sex Assessment from the Skull.

    Science.gov (United States)

    Langley, Natalie R; Dudzik, Beatrix; Cloutier, Alesia

    2018-01-01

    This study uses five well-documented cranial nonmetric traits (glabella, mastoid process, mental eminence, supraorbital margin, and nuchal crest) and one additional trait (zygomatic extension) to develop a validated decision tree for sex assessment. The decision tree was built and cross-validated on a sample of 293 U.S. White individuals from the William M. Bass Donated Skeletal Collection. Ordinal scores from the six traits were analyzed using the partition modeling option in JMP Pro 12. A holdout sample of 50 skulls was used to test the model. The most accurate decision tree includes three variables: glabella, zygomatic extension, and mastoid process. This decision tree yielded 93.5% accuracy on the training sample, 94% on the cross-validated sample, and 96% on a holdout validation sample. Linear weighted kappa statistics indicate acceptable agreement among observers for these variables. Mental eminence should be avoided, and definitions and figures should be referenced carefully to score nonmetric traits. © 2017 American Academy of Forensic Sciences.

  16. Multi-pruning of decision trees for knowledge representation and classification

    KAUST Repository

    Azad, Mohammad

    2016-06-09

    We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increment in the number of misclassifications. Based on the created approach we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.

  17. Statistical clustering of parametric maps from dynamic contrast enhanced MRI and an associated decision tree model for non-invasive tumour grading of T1b solid clear cell renal cell carcinoma

    International Nuclear Information System (INIS)

    Xi, Yin; Yuan, Qing; Zhang, Yue; Fulkerson, Michael; Madhuranthakam, Ananth J.; Margulis, Vitaly; Cadeddu, Jeffrey A.; Brugarolas, James; Kapur, Payal; Pedrosa, Ivan

    2018-01-01

    To apply a statistical clustering algorithm to combine information from dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) into a single tumour map to distinguish high-grade from low-grade T1b clear cell renal cell carcinoma (ccRCC). This prospective, Institutional Review Board-approved, Health Insurance Portability and Accountability Act-compliant study included 18 patients with solid T1b ccRCC who underwent pre-surgical DCE MRI. After statistical clustering of the parametric maps of the transfer constant between the intravascular and extravascular space ($K^{\mathrm{trans}}$), rate constant ($K_{\mathrm{ep}}$) and initial area under the concentration curve (iAUC) with a fuzzy c-means (FCM) algorithm, each tumour was segmented into three regions (low/medium/high active areas). Percentages of each region and tumour size were compared to tumour grade at histopathology. A decision-tree model was constructed to select the best parameter(s) to predict high-grade ccRCC. Seven high-grade and 11 low-grade T1b ccRCCs were included. High-grade histology was associated with higher percent high active areas (p = 0.0154) and this was the only feature selected by the decision tree model, which had a diagnostic performance of 78% accuracy, 86% sensitivity, 73% specificity, 67% positive predictive value and 89% negative predictive value. The FCM integrates multiple DCE-derived parameter maps and identifies tumour regions with unique pharmacokinetic characteristics. Using this approach, a decision tree model using criteria beyond size to predict tumour grade in T1b ccRCCs is proposed. (orig.)
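
    The five reported performance figures can be checked against the stated group sizes (7 high-grade, 11 low-grade). The confusion-matrix counts in the sketch below are not given in the abstract; they are an inferred assumption, chosen because they reproduce all five rounded percentages.

```python
# Assumed confusion matrix, with high-grade as the positive class.
tp, fn = 6, 1   # assumed split of the 7 high-grade tumours
tn, fp = 8, 3   # assumed split of the 11 low-grade tumours

metrics = {
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "positive predictive value": tp / (tp + fp),
    "negative predictive value": tn / (tn + fn),
}

# Percentages as reported in the abstract.
reported = {
    "accuracy": 78,
    "sensitivity": 86,
    "specificity": 73,
    "positive predictive value": 67,
    "negative predictive value": 89,
}

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}% (reported {reported[name]}%)")
```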

  18. Statistical clustering of parametric maps from dynamic contrast enhanced MRI and an associated decision tree model for non-invasive tumour grading of T1b solid clear cell renal cell carcinoma

    Energy Technology Data Exchange (ETDEWEB)

    Xi, Yin; Yuan, Qing; Zhang, Yue; Fulkerson, Michael [UT Southwestern Medical Center, Department of Radiology, Dallas, TX (United States); Madhuranthakam, Ananth J. [UT Southwestern Medical Center, Department of Radiology, Dallas, TX (United States); UT Southwestern Medical Center, Advanced Imaging Research Center, Dallas, TX (United States); Margulis, Vitaly; Cadeddu, Jeffrey A. [UT Southwestern Medical Center, Department of Urology, Dallas, TX (United States); UT Southwestern Medical Center, Kidney Cancer Program, Simmons Comprehensive Cancer Center, Dallas, TX (United States); Brugarolas, James [UT Southwestern Medical Center, Kidney Cancer Program, Simmons Comprehensive Cancer Center, Dallas, TX (United States); UT Southwestern Medical Center, Department of Internal Medicine, Dallas, TX (United States); Kapur, Payal [UT Southwestern Medical Center, Department of Urology, Dallas, TX (United States); UT Southwestern Medical Center, Kidney Cancer Program, Simmons Comprehensive Cancer Center, Dallas, TX (United States); UT Southwestern Medical Center, Department of Pathology, Dallas, Texas (United States); Pedrosa, Ivan [UT Southwestern Medical Center, Department of Radiology, Dallas, TX (United States); UT Southwestern Medical Center, Advanced Imaging Research Center, Dallas, TX (United States); UT Southwestern Medical Center, Kidney Cancer Program, Simmons Comprehensive Cancer Center, Dallas, TX (United States)

    2018-01-15

    To apply a statistical clustering algorithm to combine information from dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) into a single tumour map to distinguish high-grade from low-grade T1b clear cell renal cell carcinoma (ccRCC). This prospective, Institutional Review Board-approved, Health Insurance Portability and Accountability Act-compliant study included 18 patients with solid T1b ccRCC who underwent pre-surgical DCE MRI. After statistical clustering of the parametric maps of the transfer constant between the intravascular and extravascular space ($K^{\mathrm{trans}}$), rate constant ($K_{\mathrm{ep}}$) and initial area under the concentration curve (iAUC) with a fuzzy c-means (FCM) algorithm, each tumour was segmented into three regions (low/medium/high active areas). Percentages of each region and tumour size were compared to tumour grade at histopathology. A decision-tree model was constructed to select the best parameter(s) to predict high-grade ccRCC. Seven high-grade and 11 low-grade T1b ccRCCs were included. High-grade histology was associated with higher percent high active areas (p = 0.0154) and this was the only feature selected by the decision tree model, which had a diagnostic performance of 78% accuracy, 86% sensitivity, 73% specificity, 67% positive predictive value and 89% negative predictive value. The FCM integrates multiple DCE-derived parameter maps and identifies tumour regions with unique pharmacokinetic characteristics. Using this approach, a decision tree model using criteria beyond size to predict tumour grade in T1b ccRCCs is proposed. (orig.)

  19. Using T3, an improved decision tree classifier, for mining stroke-related medical data.

    Science.gov (United States)

    Tjortjis, C; Saraee, M; Theodoulidis, B; Keane, J A

    2007-01-01

    Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered by using data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well established data mining methods and identify strengths and weaknesses. We propose T3, a decision tree classifier which builds predictive models based on known classes, by allowing for a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets, in order to identify strengths and weaknesses. We also compare performance with a very successful and well established decision tree classifier. T3 demonstrated impressive performance when predicting unseen cases of stroke resulting in as little as 0.4% classification error while the state of the art decision tree classifier resulted in 33.6% classification error respectively. This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three, and results in high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 based on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produces significantly more accurate and readable classifiers.

  20. Binary Decision Trees for Preoperative Periapical Cyst Screening Using Cone-beam Computed Tomography.

    Science.gov (United States)

    Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa

    2017-03-01

    Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.

  1. Decision-tree approach to evaluating inactive uranium-processing sites for liner requirements

    International Nuclear Information System (INIS)

    Relyea, J.F.

    1983-03-01

    Recently, concern has been expressed about potential toxic effects of both radon emission and release of toxic elements in leachate from inactive uranium mill tailings piles. Remedial action may be required to meet disposal standards set by the states and the US Environmental Protection Agency (EPA). In some cases, a possible disposal option is the exhumation and reburial (either on site or at a new location) of tailings and reliance on engineered barriers to satisfy the objectives established for remedial actions. Liners under disposal pits are the major engineered barrier for preventing contaminant release to ground and surface water. The purpose of this report is to provide a logical sequence of action, in the form of a decision tree, which could be followed to show whether a selected tailings disposal design meets the objectives for subsurface contaminant release without a liner. This information can be used to determine the need and type of liner for sites exhibiting a potential groundwater problem. The decision tree is based on the capability of hydrologic and mass transport models to predict the movement of water and contaminants with time. The types of modeling capabilities and data needed for those models are described, and the steps required to predict water and contaminant movement are discussed. A demonstration of the decision tree procedure is given to aid the reader in evaluating the need for and the adequacy of a liner.

  2. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score

    Directory of Open Access Journals (Sweden)

    Andrej Kastrin

    2008-12-01

    Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of the knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of the decision trees on the prediction model of sensation seeking. Prediction of the Zuckerman's Sensation Seeking Scale (SSS-V) score was based on the bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of Eysenck Personality Questionnaire (EPQ) and Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision trees methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, the decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.

  3. Classification of Parkinsonian syndromes from FDG-PET brain data using decision trees with SSM/PCA features.

    Science.gov (United States)

    Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M

    2015-01-01

    Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.

  4. The Bayesian Decision Tree Technique with a Sweeping Strategy

    OpenAIRE

    Schetinin, V.; Fieldsend, J. E.; Partridge, D.; Krzanowski, W. J.; Everson, R. M.; Bailey, T. C.; Hernandez, A.

    2005-01-01

    The uncertainty of classification outcomes is of crucial importance for many safety critical applications including, for example, medical diagnostics. In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows the use of prior information. Decision Tree (DT) classification models used within such a technique give experts additional information by making this classification scheme observable. The use of the Markov C...

  5. Application of decision-tree technique to assess herd specific risk factors for coliform mastitis in sows

    Directory of Open Access Journals (Sweden)

    Imke Gerjets

    2011-06-01

    The aim of the study was to investigate factors associated with coliform mastitis in sows, determined at herd level, by applying the decision-tree technique. Coliform mastitis represents an economically important disease in sows after farrowing that also affects the health, welfare and performance of the piglets. The decision-tree technique, a data mining method, may be an effective tool for making large datasets accessible and different sow herd information comparable. It is based on the C4.5 algorithm, which generates trees in a top-down recursive strategy. The technique can be used to detect weak points in farm management. Two datasets of two farms in Germany, consisting of sow-related parameters, were analysed and compared by decision-tree algorithms. Data were collected over the period of April 2007 to August 2010 from 987 sows (499 CM-positive sows and 488 CM-negative sows) and 596 sows (322 CM-positive sows and 274 CM-negative sows), respectively. Depending on the dataset, different graphical trees were built showing relevant factors at the herd level which may lead to coliform mastitis. To our understanding, this is the first time decision-tree modeling was used to assess risk factors for coliform mastitis. Herd-specific risk factors for the disease were illustrated, which could prove beneficial in disease and herd management.

  6. Evaluation with Decision Trees of Efficacy and Safety of Semirigid Ureteroscopy in the Treatment of Proximal Ureteral Calculi.

    Science.gov (United States)

    Sancak, Eyup Burak; Kılınç, Muhammet Fatih; Yücebaş, Sait Can

    2017-01-01

    The decision on the choice of proximal ureteral stone therapy depends on many factors, and sometimes urologists have difficulty in choosing the treatment option. This study is aimed at evaluating the factors affecting the success of semirigid ureterorenoscopy (URS) using the "decision tree" method. From January 2005 to November 2015, the data of consecutive patients treated for proximal ureteral stone were retrospectively analyzed. A total of 920 patients with proximal ureteral stone treated with semirigid URS were included in the study. All statistically significant attributes were tested using the decision tree method. The model created using decision tree had a sensitivity of 0.993 and an accuracy of 0.857. While URS treatment was successful in 752 patients (81.7%), it was unsuccessful in 168 patients (18.3%). According to the decision tree method, the most important factor affecting the success of URS is whether the stone is impacted to the ureteral wall. The second most important factor affecting treatment was intramural stricture requiring dilatation if the stone is impacted, and the size of the stone if not impacted. Our study suggests that the impacted stone, intramural stricture requiring dilatation and stone size may have a significant effect on the success rate of semirigid URS for proximal ureteral stone. Further studies with population-based and longitudinal design should be conducted to confirm this finding. © 2017 S. Karger AG, Basel.

  7. Multi-test decision tree and its application to microarray data classification.

    Science.gov (United States)

    Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek

    2014-05-01

    The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment.

    Science.gov (United States)

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-12-28

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.
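
    The abstract notes that errors accumulate in plain PDR. A minimal sketch of the basic PDR position update shows why: each new position is the previous one plus a step vector, so any error in step length or heading carries into every later position. This is the textbook update only, not the paper's map-aided FDT algorithm; the function names, the 0.7 m step length and the headings are hypothetical, and heading is assumed to be measured clockwise from north.

```python
import math


def pdr_step(east, north, step_length_m, heading_deg):
    """Advance one step; heading is in degrees clockwise from north."""
    heading = math.radians(heading_deg)
    return (east + step_length_m * math.sin(heading),
            north + step_length_m * math.cos(heading))


def pdr_track(start, steps):
    """Accumulate (step_length_m, heading_deg) pairs into a list of positions."""
    east, north = start
    track = [(east, north)]
    for step_length_m, heading_deg in steps:
        east, north = pdr_step(east, north, step_length_m, heading_deg)
        track.append((east, north))
    return track


# Hypothetical walk: one step north, then one step east, 0.7 m each.
track = pdr_track((0.0, 0.0), [(0.7, 0.0), (0.7, 90.0)])
print(track[-1])
```

    Because the update is purely additive, an external correction such as a map constraint or a beacon fix is the usual way to bound the drift.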

  9. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    Directory of Open Access Journals (Sweden)

    Kai-Wei Chiang

    2015-12-01

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  10. A decision tree for differentiating multiple system atrophy from Parkinson's disease using 3-T MR imaging.

    Science.gov (United States)

    Nair, Shalini Rajandran; Tan, Li Kuo; Mohd Ramli, Norlisah; Lim, Shen Yang; Rahmat, Kartini; Mohd Nor, Hazman

    2013-06-01

    To develop a decision tree based on standard magnetic resonance imaging (MRI) and diffusion tensor imaging to differentiate multiple system atrophy (MSA) from Parkinson's disease (PD). 3-T brain MRI and DTI (diffusion tensor imaging) were performed on 26 PD and 13 MSA patients. Regions of interest (ROIs) were the putamen, substantia nigra, pons, middle cerebellar peduncles (MCP) and cerebellum. Linear, volumetry and DTI (fractional anisotropy and mean diffusivity) were measured. A three-node decision tree was formulated, with design goals being 100 % specificity at node 1, 100 % sensitivity at node 2 and highest combined sensitivity and specificity at node 3. Nine parameters (mean width, fractional anisotropy (FA) and mean diffusivity (MD) of MCP; anteroposterior diameter of pons; cerebellar FA and volume; pons and mean putamen volume; mean FA substantia nigra compacta-rostral) showed statistically significant (P decision tree. Threshold values were 14.6 mm, 21.8 mm and 0.55, respectively. Overall performance of the decision tree was 92 % sensitivity, 96 % specificity, 92 % PPV and 96 % NPV. Twelve out of 13 MSA patients were accurately classified. Formation of the decision tree using these parameters was both descriptive and predictive in differentiating between MSA and PD. • Parkinson's disease and multiple system atrophy can be distinguished on MR imaging. • Combined conventional MRI and diffusion tensor imaging improves the accuracy of diagnosis. • A decision tree is descriptive and predictive in differentiating between clinical entities. • A decision tree can reliably differentiate Parkinson's disease from multiple system atrophy.

  11. Applying of Decision Tree Analysis to Risk Factors Associated with Pressure Ulcers in Long-Term Care Facilities.

    Science.gov (United States)

    Moon, Mikyung; Lee, Soo-Kyoung

    2017-01-01

    The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. The data were extracted from the 2014 National Inpatient Sample (NIS)-data of Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89 * ). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, "injuries to the hip and thigh" was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.

  12. Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

    Science.gov (United States)

    Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

    2009-01-01

    The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
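
    C4.5 builds its tree greedily, choosing at each node the attribute with the best gain ratio (information gain divided by split information). The sketch below computes that criterion for discrete attributes only; C4.5's threshold search for continuous sensor values and its pruning are omitted. The function names, attribute names and four-row toy data are hypothetical and are not taken from the J-2X simulations.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute with respect to class labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalises many-valued attributes
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one informative and one uninformative attribute.
labels = ["leak", "leak", "nominal", "nominal"]
pressure_band = ["high", "high", "low", "low"]
sensor_bank = ["a", "b", "a", "b"]
print(gain_ratio(pressure_band, labels), gain_ratio(sensor_bank, labels))
```

    In this toy case the first attribute separates the classes perfectly and would be chosen as the root test, while the second carries no information about the label.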

  13. [RS estimation of inventory parameters and carbon storage of moso bamboo forest based on synergistic use of object-based image analysis and decision tree].

    Science.gov (United States)

    Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie

    2017-10-01

    By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast height, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale image segmentation in OBIA technique and CART, which connected the image objects at various scales, with a pretty good producer's accuracy of 89.1%. The investigation of indexes estimated by regression tree model that was constructed based on the features extracted from the image objects reached normal or better accuracy, in which the crown closure model achieved the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast height and tree height was relatively low, which was consistent with the conclusion that estimating diameter at breast height and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy of the region of high value achieved above 80%.

  14. Improving Land Use/Land Cover Classification by Integrating Pixel Unmixing and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chao Yang

    2017-11-01

    Decision tree classification is one of the most efficient methods for obtaining land use/land cover (LULC) information from remotely sensed imageries. However, traditional decision tree classification methods cannot effectively eliminate the influence of mixed pixels. This study aimed to integrate pixel unmixing and decision tree to improve LULC classification by removing mixed pixel influence. The abundance and minimum noise fraction (MNF) results that were obtained from mixed pixel decomposition were added to decision tree multi-features using a three-dimensional (3D) terrain model, which was created using an image fusion digital elevation model (DEM), to select training samples (ROIs), and improve ROI separability. A Landsat-8 OLI image of the Yunlong Reservoir Basin in Kunming was used to test this proposed method. Study results showed that the Kappa coefficient and the overall accuracy of integrated pixel unmixing and decision tree method increased by 0.093% and 10%, respectively, as compared with the original decision tree method. This proposed method could effectively eliminate the influence of mixed pixels and improve the accuracy in complex LULC classifications.
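
    Overall accuracy and the Kappa coefficient are both derived from a classification confusion matrix. A minimal sketch using the standard Cohen's kappa definition, kappa = (p_o - p_e) / (1 - p_e), follows; the 2x2 matrix is a hypothetical toy example and has no connection to the Landsat-8 results, and the function name is an assumption.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i assigned to class j.
    """
    n = sum(sum(row) for row in matrix)
    size = len(matrix)
    observed = sum(matrix[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (100 validation pixels).
toy_matrix = [[45, 5],
              [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(toy_matrix)
print(accuracy, kappa)
```

    Kappa is lower than overall accuracy here because it discounts the agreement that the class proportions alone would produce by chance.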

  15. Landslide susceptibility mapping of a landslide-prone area from Turkey by decision tree analysis

    Science.gov (United States)

    Gorum, Tolga; Celal Tunusluoglu, M.; Sezer, Ebru; Nefeslioglu, Hakan A.; Bozkir, A. Selman; Gokceoglu, Candan

    2010-05-01

    Landslides are accepted as one of the important natural hazards throughout the world. Regional landslide susceptibility assessment is one of the first stages of landslide hazard mitigation efforts. For this purpose, various methods have been applied to produce landslide susceptibility maps for many years. However, application of the decision tree, one of the data mining methods, to landslide susceptibility mapping is not common. Considering this gap in the landslide literature, an application of the decision tree method to landslide susceptibility mapping is the main purpose of the present study. As the study area, the Inegol region (Northwestern Turkey) is selected. In the first stage of the study, a landslide inventory is produced by aerial-photo interpretations and field studies. Employing 16 topographic and lithologic variables, the landslide susceptibility analyses are performed by the decision tree method. The AUC (Area Under Curve) value for the ROC (Receiver-Operating Characteristics) curve is calculated as 0.942 for the landslide susceptibility model obtained from the decision tree analysis. According to the AUC value, the decision tree analysis presents a considerable performance. As a result of the present study, it may be concluded that the decision tree method presents promising results for regional landslide susceptibility assessment. However, the technique should be studied for different landslide-prone areas and compared with other prediction techniques such as logistic regression, artificial neural networks, fuzzy approaches, etc.
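
    An AUC value can be read as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen stable cell. The sketch below computes AUC directly from that pairwise definition (ties count as half); the scores are hypothetical toy values, not data from the Inegol study, and the function name is an assumption.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical susceptibility scores for landslide and stable cells.
landslide_cells = [0.9, 0.8, 0.4]
stable_cells = [0.7, 0.3, 0.2]
print(auc_from_scores(landslide_cells, stable_cells))
```

    The pairwise form is quadratic in the number of cells, so for a full raster a rank-based computation would be the practical choice; the result is the same.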

  16. Development of a New Decision Tree to Rapidly Screen Chemical Estrogenic Activities of Xenopus laevis.

    Science.gov (United States)

    Wang, Ting; Li, Weiying; Zheng, Xiaofeng; Lin, Zhifen; Kong, Deyang

    2014-02-01

    Over the past decades, there has been an increasing number of studies about estrogenic activities of environmental pollutants on amphibians, and many determination methods have been proposed. However, these determination methods are time-consuming and expensive, and a rapid and simple method to screen and test chemicals for estrogenic activities to amphibians is therefore imperative. Herein is proposed a new decision tree formulated not only with physicochemical parameters but also a biological parameter that was successfully used to screen estrogenic activities of chemicals on amphibians. The biological parameter, CDOCKER interaction energy (Ebinding) between chemicals and the target proteins, was calculated based on the method of molecular docking, and it was used to revise the decision tree formulated by Hong only with physicochemical parameters for screening estrogenic activity of chemicals in rat. According to the correlation between Ebinding of rat and Xenopus laevis, a new decision tree for estrogenic activities in Xenopus laevis is finally proposed. It was then validated using 8 randomly selected chemicals to which Xenopus laevis can be frequently exposed, and the agreement between the results from the new decision tree and the ones from experiments is generally satisfactory. Consequently, the new decision tree can be used to screen the estrogenic activities of the chemicals, and combined use of the Ebinding and classical physicochemical parameters can greatly improve Hong's decision tree. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Quasar Identification and Classification with Decision Trees

    Science.gov (United States)

    Spinka, T.; Carpenter, T.; Brunner, R. J.; Aydt, R.; Auvil, L.; Redman, T.; Tcheng, D.

    2003-12-01

    The massive amounts of data flooding into the astronomy field hold many answers to important problems in contemporary astrophysics. The biggest problem is sifting through massive amounts of data to uncover these secrets. In this presentation, we identify an approach in which we apply data-mining techniques to the problem of photometric quasar identification. We employ decision trees to quickly and robustly identify potential quasars to a high degree of accuracy. We emphasize computational scalability due to the high volume of data and complexity of the data-mining algorithms.

  18. Algorithms for optimal dyadic decision trees

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  19. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  20. Constructing an optimal decision tree for FAST corner point detection

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    In this paper, we consider a problem that originated in computer vision: determining an optimal testing strategy for the corner point detection problem that is part of the FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.
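
    The formulation above, a decision tree of minimum average depth for a decision table with discrete attributes, can be sketched as a dynamic program over subtables. This is a minimal illustration under stated assumptions, not the paper's algorithm: rows are assumed equally probable, the table is assumed consistent (no two identical rows with different decisions), and the function name and toy tables are hypothetical. The number of subtables can grow exponentially, so the sketch suits small tables only.

```python
from functools import lru_cache


def min_average_depth(rows, decisions):
    """Minimum average depth of a decision tree for a consistent decision table.

    rows: list of tuples of discrete attribute values.
    decisions: list of decisions, one per row.
    """
    n_attributes = len(rows[0])

    @lru_cache(maxsize=None)
    def total_depth(subset):
        # subset is a frozenset of row indices (a subtable).
        if len({decisions[i] for i in subset}) == 1:
            return 0  # all rows agree: a leaf, no further tests needed
        best = float("inf")
        for a in range(n_attributes):
            parts = {}
            for i in subset:
                parts.setdefault(rows[i][a], []).append(i)
            if len(parts) < 2:
                continue  # attribute is constant on this subtable
            # Every row in the subtable pays one test, plus its subtree cost.
            cost = len(subset) + sum(total_depth(frozenset(p))
                                     for p in parts.values())
            best = min(best, cost)
        return best

    return total_depth(frozenset(range(len(rows)))) / len(rows)


# Hypothetical toy table: decision is the logical AND of two binary attributes.
and_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_decisions = [0, 0, 0, 1]
print(min_average_depth(and_rows, and_decisions))
```

    For the AND table, testing either attribute first resolves two rows after one test and the other two after a second test, giving an average depth of 1.5. A greedy algorithm would pick one attribute per node by a local criterion instead of comparing all subtrees.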

  1. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China.

    Science.gov (United States)

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-05-20

    In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
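
    CHAID chooses splits using chi-squared tests of association between each candidate predictor and the outcome. The sketch below computes the Pearson chi-squared statistic for a contingency table and ranks two predictors by it. It is a simplification: full CHAID also merges predictor categories and compares adjusted p-values rather than raw statistics, which matters when tables differ in size. The counts and predictor names are hypothetical toy values and are not taken from the Beijing survey.

```python
def chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 tables: rows are predictor levels, columns are
# (anemic, not anemic) counts. Both tables have one degree of freedom,
# so comparing raw statistics is fair in this toy case.
candidate_tables = {
    "predictor_a": [[10, 20], [30, 40]],
    "predictor_b": [[30, 10], [10, 50]],
}
scores = {name: chi_square(t) for name, t in candidate_tables.items()}
best_predictor = max(scores, key=scores.get)
print(scores, best_predictor)
```

    The predictor with the strongest association becomes the first split, and the procedure repeats within each resulting subgroup, which is how the stepwise pathways described in the abstract arise.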

  2. Classification of soil respiration in areas of sugarcane renewal using decision tree

    Directory of Open Access Journals (Sweden)

    Camila Viana Vieira Farhate

    Full Text Available ABSTRACT: The use of data mining is a promising alternative to predict soil respiration from correlated variables. Our objective was to build a model using variable selection and decision tree induction to predict different levels of soil respiration, taking into account physical, chemical and microbiological variables of soil as well as precipitation in renewal of sugarcane areas. The original dataset was composed of 19 variables (18 independent variables and one dependent, or response, variable). The variable-target refers to soil respiration as the target classification. Due to a large number of variables, a procedure for variable selection was conducted to remove those with low correlation with the variable-target. For that purpose, four approaches of variable selection were evaluated: no variable selection, correlation-based feature selection (CFS), chi-square method (χ2) and Wrapper. To classify soil respiration, we used the decision tree induction technique available in the Weka software package. Our results showed that data mining techniques allow the development of a model for soil respiration classification with accuracy of 81 %, resulting in a knowledge base composed of 27 rules for prediction of soil respiration. In particular, the wrapper method for variable selection identified a subset of only five variables out of 18 available in the original dataset, and they had the following order of influence in determining soil respiration: soil temperature > precipitation > macroporosity > soil moisture > potential acidity.

  3. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  4. Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.

    Science.gov (United States)

    Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto

    2012-11-21

    This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.

  5. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers of RUVBL1 and CNIH were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression possibly affected by the diverse major risk factors for ESCC across different areas.

  6. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    Science.gov (United States)

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.

  7. A framework for sensitivity analysis of decision trees.

    Science.gov (United States)

    Kamiński, Bogumił; Jakubczyk, Michał; Szufel, Przemysław

    2018-01-01

    In the paper, we consider sequential decision problems with uncertainty, represented as decision trees. Sensitivity analysis is always a crucial element of decision making and in decision trees it often focuses on probabilities. In the stochastic model considered, the user often has only limited information about the true values of probabilities. We develop a framework for performing sensitivity analysis of optimal strategies accounting for this distributional uncertainty. We design this robust optimization approach in an intuitive and not overly technical way, to make it simple to apply in daily managerial practice. The proposed framework allows for (1) analysis of the stability of the expected-value-maximizing strategy and (2) identification of strategies which are robust with respect to pessimistic/optimistic/mode-favoring perturbations of probabilities. We verify the properties of our approach in two cases: (a) probabilities in a tree are the primitives of the model and can be modified independently; (b) probabilities in a tree reflect some underlying, structural probabilities, and are interrelated. We provide a free software tool implementing the methods described.
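    A one-way probability sweep gives a feel for what "stability of the expected-value-maximizing strategy" means. The sketch below is only an illustration under stated assumptions: the two strategies, their payoffs and the probability grid are hypothetical, and the paper's framework covers more general perturbations than this single-parameter sweep.

```python
def expected_value(outcomes):
    """Expected payoff of a chance node given (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


def best_strategy(p_success):
    """Return the expected-value-maximizing strategy for a given success probability."""
    # Hypothetical two-strategy tree: a risky option and a safe option.
    strategies = {
        "launch": [(p_success, 100.0), (1.0 - p_success, -50.0)],
        "wait": [(1.0, 10.0)],
    }
    values = {name: expected_value(o) for name, o in strategies.items()}
    return max(values, key=values.get), values


# Sweep the uncertain probability to see where the optimal strategy switches.
for pct in range(0, 101, 10):
    choice, values = best_strategy(pct / 100)
    print(f"p = {pct / 100:.1f}: {choice:6s} {values}")
```

    With these payoffs the two strategies break even at p = 0.4, so an analyst whose estimate is close to that value should treat the recommended strategy as fragile.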

  8. Cost-effectiveness analysis of antimuscarinics in the treatment of patients with overactive bladder in Spain: A decision-tree model

    Directory of Open Access Journals (Sweden)

    Trocio Jeffrey

    2011-05-01

    Full Text Available Abstract Background Fesoterodine, a new once daily antimuscarinic, has proven to be an effective, safe, and well-tolerated treatment in patients with overactive bladder (OAB). To date, no analysis has evaluated the economic costs and benefits associated with fesoterodine, compared to antimuscarinics in Spain. The purpose of this analysis was to assess the economic value of OAB treatment with fesoterodine relative to extended release tolterodine and solifenacin, from the societal perspective. Methods The economic model was based on data from two 12-week, randomized, double-blind, and multicenter trials comparing fesoterodine and tolterodine extended release (ER). Treatment response rates for solifenacin were extracted from the published literature. Discontinuation and efficacy were based on the results of a 12-week multinational randomized clinical trial extrapolated to 52 weeks. Changes in health related quality of life were assessed with the King's Health Questionnaire, which was transformed into preference-based utility values. Medical costs included (expressed in € 2010) were antimuscarinics, physician visits, laboratory tests, incontinence pads and the costs of OAB-related comorbidities, fractures, skin infections, urinary tract infections, depression, and nursing home admissions associated with incontinence. Time lost from work was also considered. Univariate sensitivity analyses were also performed. Results At week 12, continents accounted for 50.6%, 40.6% and 47.2% of patients in the fesoterodine, tolterodine, and solifenacin groups, respectively. By week 52, the projected proportions of patients remaining on therapy were 33.1%, 26.5% and 30.8%, respectively. The projected quality-adjusted life years (QALY) gains (compared to baseline) over the 52-week simulation period were 0.01014, 0.00846 and 0.00957, respectively. The overall treatment cost was estimated at €1,937, €2,089 and €1,960 for fesoterodine, tolterodine and solifenacin

  9. On algorithm for building of optimal α-decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relative to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in [4] to constructing approximate decision trees. An adjustable approximation rate allows controlling algorithm complexity. The algorithm is applied to build optimal α-decision trees for two data sets from UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.
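    The dynamic-programming idea can be illustrated for the simplest setting: exact trees (no α-approximation) optimized for depth. The sketch below is an assumption-laden stand-in rather than the paper's algorithm; the function name, the table encoding and the XOR example are hypothetical, and the table is assumed to be consistent (no two rows share all attribute values but differ in decision).

```python
from functools import lru_cache


def min_depth(table):
    """Minimum depth of an exact decision tree for a consistent decision table.

    table: list of (attribute_tuple, decision) rows.
    """
    n_attrs = len(table[0][0])

    @lru_cache(maxsize=None)
    def solve(rows):
        # rows is a frozenset of row indices, i.e. one subtable.
        if len({table[i][1] for i in rows}) <= 1:
            return 0  # all rows agree: a leaf suffices
        best = float("inf")
        for a in range(n_attrs):
            groups = {}
            for i in rows:
                groups.setdefault(table[i][0][a], []).append(i)
            if len(groups) < 2:
                continue  # attribute does not split this subtable
            depth = 1 + max(solve(frozenset(g)) for g in groups.values())
            best = min(best, depth)
        return best

    return solve(frozenset(range(len(table))))


# Hypothetical decision table: the decision is the XOR of two binary attributes.
xor_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(min_depth(xor_table))
```

    Memoizing on the set of row indices means each subtable is solved once, which is the source of the savings over naive enumeration of trees; the number of distinct subtables can still grow quickly, which is why such tools target relatively small tables.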

  10. Automatic design of decision-tree induction algorithms

    CERN Document Server

    Barros, Rodrigo C; Freitas, Alex A

    2015-01-01

    Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue on the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o

  11. Decision analysis using decision trees for a simple clinical decision.

    Science.gov (United States)

    Blakley, Brian

    2012-10-01

    To illustrate the use of decision trees with a utility index in clinical decision making. A decision tree was created related to whether or not to perform a tonsillectomy. Data from the literature were applied to a common hypothetical clinical scenario. A decision tree graphically represents the typical decision-making process that many clinicians use. The addition of utility functions permitted consideration of the adverse or beneficial effects of outcomes, altering the treatment decision. Quantitative tools such as decision trees may quantify outcome preferences and aid in clinical decision making, but the proper tool and background data are essential.

  12. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process.

    Science.gov (United States)

    Masías, Víctor H; Krause, Mariane; Valdés, Nelson; Pérez, J C; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  13. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

    Science.gov (United States)

    Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657

  14. Using Decision Trees to Characterize Verbal Communication During Change and Stuck Episodes in the Therapeutic Process

    Directory of Open Access Journals (Sweden)

    Víctor Hugo Masías

    2015-04-01

    Full Text Available Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBtree, and REPtree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1,760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  15. Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

    Science.gov (United States)

    Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

    2017-10-11

    Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared to existing methods. We also demonstrate an application of our method in a real clinical trial.

  16. Empirically Derived Dehydration Scoring and Decision Tree Models for Children With Diarrhea: Assessment and Internal Validation in a Prospective Cohort Study in Dhaka, Bangladesh.

    Science.gov (United States)

    Levine, Adam C; Glavis-Bloom, Justin; Modi, Payal; Nasrin, Sabiha; Rege, Soham; Chu, Chieh; Schmid, Christopher H; Alam, Nur H

    2015-08-18

    Diarrhea remains one of the most common and most deadly conditions affecting children worldwide. Accurately assessing dehydration status is critical to determining treatment course, yet no clinical diagnostic models for dehydration have been empirically derived and validated for use in resource-limited settings. In the Dehydration: Assessing Kids Accurately (DHAKA) prospective cohort study, a random sample of children under 5 with acute diarrhea was enrolled between February and June 2014 in Bangladesh. Local nurses assessed children for clinical signs of dehydration on arrival, and then serial weights were obtained as subjects were rehydrated. For each child, the percent weight change with rehydration was used to classify subjects with severe dehydration (>9% weight change), some dehydration (3-9%), or no dehydration (Dehydration Score and DHAKA Dehydration Tree, respectively. Models were assessed for their accuracy using the area under their receiver operating characteristic curve (AUC) and for their reliability through repeat clinical exams. Bootstrapping was used to internally validate the models. A total of 850 children were enrolled, with 771 included in the final analysis. Of the 771 children included in the analysis, 11% were classified with severe dehydration, 45% with some dehydration, and 44% with no dehydration. Both the DHAKA Dehydration Score and DHAKA Dehydration Tree had significant AUCs of 0.79 (95% CI = 0.74, 0.84) and 0.76 (95% CI = 0.71, 0.80), respectively, for the diagnosis of severe dehydration. Additionally, the DHAKA Dehydration Score and DHAKA Dehydration Tree had significant positive likelihood ratios of 2.0 (95% CI = 1.8, 2.3) and 2.5 (95% CI = 2.1, 2.8), respectively, and significant negative likelihood ratios of 0.23 (95% CI = 0.13, 0.40) and 0.28 (95% CI = 0.18, 0.44), respectively, for the diagnosis of severe dehydration. Both models demonstrated 90% agreement between independent raters and good

  17. The Studies of Decision Tree in Estimation of Breast Cancer Risk by Using Polymorphism Nucleotide

    Directory of Open Access Journals (Sweden)

    Frida Seyedmir

    2017-07-01

    Full Text Available Abstract Introduction: Decision trees are data mining tools for collecting, accurately predicting and sifting information from massive amounts of data, and are widely used in computational biology and bioinformatics. In bioinformatics they can be used to predict diseases, including breast cancer. Genomic data, including single nucleotide polymorphisms (SNPs), are a very important factor in predicting the risk of disease. Seven important SNPs among hundreds of thousands of genetic markers have been identified as factors associated with breast cancer. The objective of this study is to evaluate the effect of the training data on the prediction error of a decision tree for breast cancer risk using single nucleotide polymorphism genotypes. Methods: The risk of breast cancer was calculated using the SNP formula: xj = fo * In human, the decision tree can be used to predict the probability of disease using single nucleotide polymorphisms. Seven SNPs with different odds ratios associated with breast cancer were considered, and a C4.5 decision tree model was coded and designed in the Csharp2013 programming language. In the decision tree created by coding, the four most important associated SNPs were considered. The decision tree error was assessed in two cases, coding and WEKA, and the percentage accuracy of the decision tree in predicting breast cancer was calculated. The number of training samples was obtained by systematic sampling. Two scenarios were evaluated with the coded tree and three scenarios with the WEKA software, using different data sets and different numbers of training and test records. Results: In both coding scenarios, increasing the training percentage from 66.66 to 86.42 reduced the error from 55.56 to 9.09. Also, running WEKA on three scenarios with different data sets and different numbers of training and test records, increasing the number of records from 81 to 2187 decreased the error rate from 48.15 to 13

  18. Using decision trees to understand structure in missing data.

    Science.gov (United States)

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-06-29

    Demonstrate the application of decision trees--classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)--to understand structure in missing data. Data taken from employees at 3 different industrial sites in Australia. 7915 observations were included. The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the 'rpart' and 'gbm' packages for CART and BRT analyses, respectively, from the statistical software 'R'. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Researchers are encouraged to use CART and BRT models to explore and understand missing data.
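    The core move, turning "is this value missing?" into a classification target and asking which variable best separates missing from observed rows, can be sketched without any tree library. The study itself used the R packages 'rpart' and 'gbm'; the Python below is a single-split stand-in based on Gini impurity, and the field names (site, shift, fev1) and records are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(records, target, predictors):
    """Find the categorical predictor whose split best separates rows where
    `target` is missing (None) from rows where it is observed.

    Returns (predictor_name, weighted_gini_after_split); lower is better.
    """
    missing = [int(r[target] is None) for r in records]
    best = None
    for var in predictors:
        groups = {}
        for r, m in zip(records, missing):
            groups.setdefault(r[var], []).append(m)
        score = sum(len(g) / len(records) * gini(g) for g in groups.values())
        if best is None or score < best[1]:
            best = (var, score)
    return best


# Hypothetical records: the lung-function value is missing only at site B.
records = [
    {"site": "A", "shift": "day", "fev1": 3.1},
    {"site": "A", "shift": "night", "fev1": 2.9},
    {"site": "B", "shift": "day", "fev1": None},
    {"site": "B", "shift": "night", "fev1": None},
]
print(best_split(records, "fev1", ["site", "shift"]))
```

    A full CART fit repeats this search recursively within each branch, which is how a missingness structure involving several variables at once would surface.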

  19. Greedy algorithm with weights for decision tree construction

    KAUST Repository

    Moshkov, Mikhail

    2010-12-01

    An approximate algorithm for minimization of weighted depth of decision trees is considered. A bound on accuracy of this algorithm is obtained which is unimprovable in general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to best polynomial approximate algorithms for minimization of weighted depth of decision trees.

  20. A tool for study of optimal decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes a tool which allows us for relatively small decision tables to make consecutive optimization of decision trees relative to various complexity measures such as number of nodes, average depth, and depth, and to find parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.

  1. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 ...

  2. The Decision Tree: A Tool for Achieving Behavioral Change.

    Science.gov (United States)

    Saren, Dru

    1999-01-01

    Presents a "Decision Tree" process for structuring team decision making and problem solving about specific student behavioral goals. The Decision Tree involves a sequence of questions/decisions that can be answered in "yes/no" terms. Questions address reasonableness of the goal, time factors, importance of the goal, responsibilities, safety,…

  3. Ensemble of randomized soft decision trees for robust classification

    Indian Academy of Sciences (India)

    It is found that an ensemble of randomized soft decision trees has outperformed the related existing soft decision tree. Robustness against the presence of noise is shown by injecting various levels of noise into the training set and a comparison is drawn with other related methods which favors the proposed method.

  4. Comparison of Taxi Time Prediction Performance Using Different Taxi Speed Decision Trees

    Science.gov (United States)

    Lee, Hanbong

    2017-01-01

    In the STBO modeler and tactical surface scheduler for ATD-2 project, taxi speed decision trees are used to calculate the unimpeded taxi times of flights taxiing on the airport surface. The initial taxi speed values in these decision trees did not show good prediction accuracy of taxi times. Using the more recent, reliable surveillance data, new taxi speed values in ramp area and movement area were computed. Before integrating these values into the STBO system, we performed test runs using live data from Charlotte airport, with different taxi speed settings: 1) initial taxi speed values and 2) new ones. Taxi time prediction performance was evaluated by comparing various metrics. The results show that the new taxi speed decision trees can calculate the unimpeded taxi-out times more accurately.

  5. Minimizing size of decision trees for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-09-29

    We used decision trees as a model to discover knowledge from multi-label decision tables, where each row has a set of decisions attached to it and our goal is to find one arbitrary decision from the set of decisions attached to a row. The size of the decision tree can be small as well as very large. We study here different greedy as well as dynamic programming algorithms to minimize the size of the decision trees. When we compared against the optimal result from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of number of nodes (at most 18.92% difference), number of nonterminal nodes (at most 20.76% difference), and number of terminal nodes (at most 18.71% difference).

  6. Computational study of developing high-quality decision trees

    Science.gov (United States)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in dealing with data mining problems to find out valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in the implementation. In this paper, we propose an innovative machine learning approach (we call our approach GAIT), combining genetic algorithm, statistical sampling, and decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against standard decision tree algorithm, neural network classifier, and statistical discriminant technique, respectively. The computational results show that our approach outperforms standard decision tree algorithm profoundly at lower sampling levels, and achieves significantly better results with less effort than both neural network and discriminant classifiers.

  7. IND - THE IND DECISION TREE PACKAGE

    Science.gov (United States)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. 
The
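
    The recursive partitioning idea mentioned in this abstract can be illustrated with a minimal sketch. It is not IND, CART or C4: it assumes purely symbolic attributes with no missing values, uses information gain as the split criterion, and all names (`entropy`, `best_split`, `grow`, `predict`) and the toy records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_split(rows, target):
    """Return the attribute with the highest information gain, or None."""
    base = entropy([r[target] for r in rows])
    best_attr, best_gain = None, 0.0
    for attr in rows[0]:
        if attr == target:
            continue
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        gain = base - remainder
        if gain > best_gain + 1e-12:  # ignore floating-point noise
            best_attr, best_gain = attr, gain
    return best_attr

def grow(rows, target):
    """Recursively partition rows; a leaf is a dict of class counts."""
    attr = best_split(rows, target)
    if attr is None:  # pure node, or no attribute is informative
        return dict(Counter(r[target] for r in rows))
    children = {}
    for value in {r[attr] for r in rows}:
        children[value] = grow([r for r in rows if r[attr] == value], target)
    return (attr, children)

def predict(tree, row):
    """Follow branches to a leaf and return its majority class.

    Raises KeyError for attribute values not seen during growing.
    """
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return max(tree, key=tree.get)

# Hypothetical toy data: "fault" is the designated target attribute.
rows = [
    {"vibration": "high", "temp": "hot", "fault": "yes"},
    {"vibration": "high", "temp": "cold", "fault": "yes"},
    {"vibration": "low", "temp": "hot", "fault": "no"},
    {"vibration": "low", "temp": "cold", "fault": "no"},
]
tree = grow(rows, "fault")
```

    Keeping class counts at the leaves, rather than a single label, is what makes class probability estimates possible; the Bayesian and MML smoothing the abstract refers to is not attempted here.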

  8. An Efficient Method of Vibration Diagnostics For Rotating Machinery Using a Decision Tree

    Directory of Open Access Journals (Sweden)

    Bo Suk Yang

    2000-01-01

    This paper describes an efficient method to automate vibration diagnosis for rotating machinery using a decision tree, which is applicable to vibration diagnosis expert systems. The decision tree is a widely known formalism for expressing classification knowledge and has been used successfully in many diverse areas such as character recognition, medical diagnosis, and expert systems. In order to build a decision tree for vibration diagnosis, we have to define classes and attributes. A set of cases based on past experience is also needed. This training set is induced using a result-cause matrix newly developed in the present work instead of the conventionally used cause-result matrix. This method was applied to diagnostics for various cases taken from published work. It is found that the present method predicts causes of the abnormal vibration for test cases with high reliability.

  9. Boundary expansion algorithm of a decision tree induction for an imbalanced dataset

    Directory of Open Access Journals (Sweden)

    Kesinee Boonchuay

    2017-10-01

    A decision tree is one of the best-known classifiers based on a recursive partitioning algorithm. This paper introduces the Boundary Expansion Algorithm (BEA) to improve decision tree induction on an imbalanced dataset. BEA utilizes all attributes to define non-splittable ranges. The computed means of all attributes for minority instances are used to find the nearest minority instance, which is then expanded along all attributes to cover a minority region. As a result, BEA copes successfully with an imbalanced dataset compared with C4.5, Gini, asymmetric entropy, top-down tree, and Hellinger distance decision tree on 25 imbalanced datasets from the UCI Repository.

  10. Using decision trees and their ensembles for analysis of NIR spectroscopic data

    DEFF Research Database (Denmark)

    Kucheryavskiy, Sergey V.

    Advanced machine learning methods, like convolutional neural networks and decision trees, became extremely popular in the last decade. This, first of all, is directly related to the current boom in Big data analysis, where traditional statistical methods are not efficient. According to kaggle.com, the most popular online resource for Big data problems and solutions, methods based on decision trees and their ensembles are most widely used for solving the problems. It can be noted that the decision trees and convolutional neural networks are not very popular in Chemometrics. One of the reasons for that is the landscape of the data matrix: the modern machine learning methods need a number of measurements much larger than the number of variables to avoid overfitting, which is opposite to the layout of the data we usually deal with. Another drawback is a lack of interactive instruments for exploring...

  11. Decision tree modeling with relational views

    OpenAIRE

    Bentayeb, Fadila; Darmont, Jérôme

    2007-01-01

    International audience; Data mining is a useful decision support technique that can be used to discover production rules in warehouses or corporate data. Data mining research has made much effort to apply various mining algorithms efficiently on large databases. However, a serious problem in their practical application is the long processing time of such algorithms. Nowadays, one of the key challenges is to integrate data mining methods within the framework of traditional database systems. In...

  12. MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees

    Directory of Open Access Journals (Sweden)

    Vasile PURDILĂ

    2014-03-01

    Learning decision trees from very large amounts of data is not practical on single-node computers because of the huge amount of computation the process requires. Apache Hadoop is a large-scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks on very large datasets. This work presents a parallel decision tree learning algorithm, expressed in the MapReduce programming model, that runs on the Apache Hadoop platform and scales very well with dataset size.
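
    To make the MapReduce framing concrete, the following single-process sketch shows one way split statistics could be gathered in map and reduce phases. It is not the MR-Tree algorithm and does not use Hadoop; the function names and the key layout (attribute, value, class) are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_record(record, target):
    """Map phase: emit ((attribute, value, class), 1) for each attribute."""
    label = record[target]
    for attr, value in record.items():
        if attr != target:
            yield (attr, value, label), 1

def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def count_statistics(partitions, target):
    """Map over every record of every partition, then reduce the result."""
    emitted = (
        pair
        for partition in partitions
        for record in partition
        for pair in map_record(record, target)
    )
    return reduce_counts(emitted)

# Hypothetical data already divided into two partitions (one per "node").
partitions = [
    [{"colour": "red", "label": "1"}],
    [{"colour": "red", "label": "1"}, {"colour": "blue", "label": "0"}],
]
counts = count_statistics(partitions, "label")
```

    The resulting class counts per attribute value are the sufficient statistics for criteria such as information gain or Gini impurity, so a driver could choose a split without any single machine holding the full dataset.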

  13. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    Directory of Open Access Journals (Sweden)

    Tran Hoai Linh

    2014-09-01

    The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree that provides one more processing stage to give the final recognition result. As the base classifiers, three classical neural models, i.e., the MLP (Multi Layer Perceptron), the modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to-peak periods of the ECG signals will be used as features for the classifiers. Numerical experiments will be performed for the recognition of different types of arrhythmia in the ECG signals taken from the MIT-BIH (Massachusetts Institute of Technology and Boston's Beth Israel Hospital) Arrhythmia Database. The results will be compared with the individual base classifiers' performances and with other integration methods to show the high quality of the proposed solution.

  14. Ship Engine Room Casualty Analysis by Using Decision Tree Method

    Directory of Open Access Journals (Sweden)

    Ömür Yaşar SAATÇİOĞLU

    2017-03-01

    Ships may encounter undesirable conditions during operations. As a consequence of a casualty, fire, explosion, flooding, grounding, injury, or even death may occur. However, these outcomes can be avoided with precautions and preventive operating processes. In maritime transportation, casualties depend on various factors. These are listed as misuse of the engine equipment and tools, defective machinery or equipment, inadequacy of operational procedures and safety measures, and force majeure effects. Casualty reports published in Australia, New Zealand, the United Kingdom, Canada and the United States until 2015 were examined, and the probable causes and consequences of casualties were determined with their occurrence percentages. In this study, 89 marine investigation reports regarding engine room casualties were analyzed. Casualty factors were analyzed with their frequency percentages, and their main causes were identified. This study aims to investigate engine-room-based casualties, the frequency of each casualty type and the main causes by using the decision tree method.

  15. Medical case retrieval from a committee of decision trees.

    Science.gov (United States)

    Quellec, Gwénolé; Lamard, Mathieu; Bekri, Lynda; Cazuguel, Guy; Roux, Christian; Cochener, Béatrice

    2010-09-01

    A novel content-based information retrieval framework, designed to cover several medical applications, is presented in this paper. The presented framework allows the retrieval of possibly incomplete medical cases consisting of several images together with semantic information. It relies on a committee of decision trees, decision support tools well suited to process this type of information. In our proposed framework, images are characterized by their digital content. It was applied to two heterogeneous medical datasets for computer-aided diagnoses: a diabetic retinopathy follow-up dataset (DRD) and a mammography-screening dataset (DDSM). Measures of precision among the top five retrieved results of 0.788 ± 0.137 and 0.869 ± 0.161 were obtained on DRD and DDSM, respectively. On DRD, for instance, it increases by half the retrieval of single images.

  16. Toward the Decision Tree for Inferring Requirements Maturation Types

    Science.gov (United States)

    Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi

    Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely only after the scheduled requirements elicitation process is finished. Such a situation is regarded as problematic. In our study, the difficulty of eliciting various kinds of requirements is observed by component. We refer to the components as observation targets (OTs) and introduce the term "requirements maturation," which means when and how requirements are elicited completely in the project. Requirements maturation is discussed for physical and logical OTs. OTs viewed from a logical viewpoint are called logical OTs, e.g., quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., include functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. According to the observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed the actual cases with their requirements elicitation process and extracted essential factors that influence the requirements maturation. The results of interviews with project managers were analyzed by WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree through real projects and discuss its ability to infer the requirements maturation types.

  17. Relationships among various parameters for decision tree optimization

    KAUST Repository

    Hussain, Shahid

    2014-01-14

    In this chapter, we study, in detail, the relationships between various pairs of cost functions and between uncertainty measure and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as provide experimental results on decision tables acquired from UCI ML Repository. The algorithms presented in this chapter have already been implemented and are now a part of Dagger, which is a software system for construction/optimization of decision trees and decision rules. The main results presented in this chapter deal with two types of algorithms for computing relationships. First, we discuss the case where we construct approximate decision trees and are interested in relationships between a certain cost function, such as the depth or number of nodes of a decision tree, and an uncertainty measure, such as the misclassification error (accuracy) of a decision tree. Second, relationships between two different cost functions are discussed, for example, the number of misclassifications of a decision tree versus the number of nodes in a decision tree. The results of experiments, presented in the chapter, provide further insight. © 2014 Springer International Publishing Switzerland.

  18. Decision-tree induction to detect clinical mastitis with automatic milking

    NARCIS (Netherlands)

    Kamphuis, C.; Mollenhorst, H.; Feelders, A.; Pietersma, D.; Hogeveen, H.

    2010-01-01

    This study explored the potential of using decision-tree induction to develop models for the detection of clinical mastitis with automatic milking. Sensor data (including electrical conductivity and colour) of over 711,000 quarter milkings were collected from December 2006 till

  19. A multivariate decision tree analysis of biophysical factors in tropical forest fire occurrence

    Science.gov (United States)

    Rey S. Ofren; Edward Harvey

    2000-01-01

    A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...

  20. Assessing School Readiness for a Practice Arrangement Using Decision Tree Methodology.

    Science.gov (United States)

    Barger, Sara E.

    1998-01-01

    Questions in a decision-tree address mission, faculty interest, administrative support, and practice plan as a way of assessing arrangements for nursing faculty's clinical practice. Decisions should be based on congruence between the human resource allocation and the reward systems. (SK)

  1. A decision tree approach using silvics to guide planning for forest restoration

    Science.gov (United States)

    Sharon M. Hermann; John S. Kush; John C. Gilbert

    2013-01-01

    We created a decision tree based on silvics of longleaf pine (Pinus palustris) and historical descriptions to develop approaches for restoration management at Horseshoe Bend National Military Park located in central Alabama. A National Park Service goal is to promote structure and composition of a forest that likely surrounded the 1814 battlefield....

  2. Which Types of Leadership Styles Do Followers Prefer? A Decision Tree Approach

    Science.gov (United States)

    Salehzadeh, Reza

    2017-01-01

    Purpose: The purpose of this paper is to propose a new method to find the appropriate leadership styles based on the followers' preferences using the decision tree technique. Design/methodology/approach: Statistical population includes the students of the University of Isfahan. In total, 750 questionnaires were distributed; out of which, 680…

  3. The application of a decision tree to establish the parameters associated with hypertension.

    Science.gov (United States)

    Tayefi, Maryam; Esmaeili, Habibollah; Saberi Karimian, Maryam; Amirabadi Zadeh, Alireza; Ebrahimi, Mahmoud; Safarian, Mohammad; Nematy, Mohsen; Parizadeh, Seyed Mohammad Reza; Ferns, Gordon A; Ghayour-Mobarhan, Majid

    2017-02-01

    Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to establish the factors associated with hypertension by using a decision-tree algorithm as a supervised classification method of data mining. Data from a cross-sectional study were used in this study. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset for constructing the decision tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of the decision tree. Two models were evaluated in this study. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables, and in model II, age, gender, WBC, RBC, HGB, HCT, MCV, MCH, PLT, RDW and PDW were considered as input variables. The validation of the model was assessed by constructing a receiver operating characteristic (ROC) curve. The prevalence rate of hypertension was 32% in our population. For decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. We have developed a decision tree model to identify the risk factors associated with hypertension that may be used to develop programs for hypertension management. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
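
    For readers less familiar with the evaluation measures reported here, the sketch below shows how accuracy, sensitivity and specificity follow from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard measures from binary confusion-matrix counts.

    tp/fn: diseased cases classified correctly/incorrectly.
    tn/fp: healthy cases classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
        "accuracy": (tp + tn) / total,  # share of all cases classified correctly
    }

# Hypothetical test-set counts, for illustration only.
metrics = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
```

    Because accuracy weights both classes by their frequency, it can sit between sensitivity and specificity, as it does in the values quoted in the abstract.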

  4. [Analysis of the characteristics of the older adults with depression using data mining decision tree analysis].

    Science.gov (United States)

    Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi

    2013-02-01

    The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.

  5. Cloud Detection from Satellite Imagery: A Comparison of Expert-Generated and Automatically-Generated Decision Trees

    Science.gov (United States)

    Shiffman, Smadar

    2004-01-01

    Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board Earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically-learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8km-daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth's Radiant Energy System S'COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.

  6. Automated Sleep Stage Scoring by Decision Tree Learning

    National Research Council Canada - National Science Library

    Hanaoka, Masaaki

    2001-01-01

    In this paper we describe a waveform recognition method that extracts characteristic parameters from waveforms and a method of automated sleep stage scoring using decision tree learning that is in...

  7. An automated approach to the design of decision tree classifiers

    Science.gov (United States)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.

  8. Decision tree approach for classification of remotely sensed satellite ...

    Indian Academy of Sciences (India)

    DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using. WEKA, open source ...

  9. Decision tree approach for classification of remotely sensed satellite

    Indian Academy of Sciences (India)

    DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source ...

  10. Decision-Tree Formulation With Order-1 Lateral Execution

    Science.gov (United States)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that executing them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
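
    The general idea of trading a node-by-node walk for a single table lookup can be sketched as follows. This is not the formulation described in the record: it simply enumerates every assignment of the Boolean decision variables and stores the outcome, so the table grows as 2^n in the number of variables. The tree encoding and all names are hypothetical.

```python
from itertools import product

# Hypothetical Boolean tree: (variable, subtree_if_false, subtree_if_true);
# a leaf is a plain string naming the conclusion.
TREE = ("a", ("b", "reject", "review"), ("b", "review", "accept"))

def walk(tree, assignment):
    """Conventional evaluation: one test per level of the tree."""
    while isinstance(tree, tuple):
        var, if_false, if_true = tree
        tree = if_true if assignment[var] else if_false
    return tree

def variables(tree):
    """Extract the decision variables in first-seen order."""
    seen = []

    def visit(node):
        if isinstance(node, tuple):
            if node[0] not in seen:
                seen.append(node[0])
            visit(node[1])
            visit(node[2])

    visit(tree)
    return seen

def compile_table(tree):
    """Precompute the conclusion for every combination of variable values."""
    names = variables(tree)
    table = {
        bits: walk(tree, dict(zip(names, bits)))
        for bits in product((False, True), repeat=len(names))
    }
    return names, table

def lookup(names, table, assignment):
    """Single keyed lookup; no tree nodes are visited at run time."""
    return table[tuple(bool(assignment[n]) for n in names)]

names, table = compile_table(TREE)
result = lookup(names, table, {"a": True, "b": True})
```

    The lookup cost no longer depends on tree depth, at the price of memory exponential in the number of variables; reducing that cost is presumably what the symbolic optimization step in the record addresses.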

  11. On the relationship between the prices of oil and the precious metals: Revisiting with a multivariate regime-switching decision tree

    International Nuclear Information System (INIS)

    Charlot, Philippe; Marimoutou, Vêlayoudom

    2014-01-01

    This study examines the volatility and correlation and their relationships among the euro/US dollar exchange rates, the S&P 500 equity indices, and the prices of WTI crude oil and the precious metals (gold, silver, and platinum) over the period 2005 to 2012. Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. The ensuing Hidden Markov Decision Tree (HMDT) model is in fact an extension of the Hidden Markov Model (HMM) introduced by Jordan et al. (1997). The architecture of this model is the opposite of that of the classical deterministic approach based on a binary decision tree, and it allows a probabilistic vision of the relationship between univariate volatility and correlation. Our results are categorized into three groups, namely (1) exchange rates and oil, (2) S&P 500 indices, and (3) precious metals. A switching dynamics is seen to characterize the volatilities, while, in the case of the correlations, the series switch from one regime to another, this movement touching a peak during the period of the Subprime crisis in the US, and again during the days following the Tohoku earthquake in Japan. Our findings show that the relationships between volatility and correlation are dependent upon the nature of the series considered, sometimes corresponding to those found in econometric studies, according to which correlation increases in bear markets, at other times differing from them. - Highlights: • This study examines the volatility and correlation and their relationships of precious metals and crude oil. • Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. • This model allows a probabilistic point of view of the relationship between univariate volatility and correlation. • Results show the relationships between volatility and correlation are dependent upon the nature of the series considered

  12. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    Directory of Open Access Journals (Sweden)

    Christensen Helen

    2009-11-01

    Abstract Background Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.

  13. Use of CHAID Decision Trees to Formulate Pathways for the Early Detection of Metabolic Syndrome in Young Adults

    Directory of Open Access Journals (Sweden)

    Brian Miller

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n=745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with its ability to detect MetS at 71.8%, which outperformed the comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS.

  14. Use of CHAID decision trees to formulate pathways for the early detection of metabolic syndrome in young adults.

    Science.gov (United States)

    Miller, Brian; Fridline, Mark; Liu, Pei-Yang; Marino, Deborah

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20-39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with its ability to detect MetS at 71.8%, which outperformed the comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS.
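
    CHAID chooses splits by testing the association between each candidate predictor and the outcome with a chi-squared statistic. The sketch below computes the Pearson statistic for a contingency table; it omits CHAID's category merging and p-value adjustment, and the counts are hypothetical rather than NHANES data.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table.

    `table` is a list of rows of observed counts. Every row and column
    total must be positive, otherwise an expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts. Rows: waist circumference above / below a cut-off.
# Columns: MetS present / absent.
waist_table = [[30, 20], [10, 40]]
waist_statistic = chi_squared(waist_table)
```

    A larger statistic (for the same table shape) indicates a stronger association, which is why a predictor such as waist circumference would be a natural first-level split when its statistic dominates.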

  15. Predicting the probability of mortality of gastric cancer patients using decision tree.

    Science.gov (United States)

    Mohammadzadeh, F; Noorkojuri, H; Pourhoseingholi, M A; Saadat, S; Baghestani, A R

    2015-06-01

    Gastric cancer is the fourth most common cancer worldwide. This motivated us to investigate and introduce gastric cancer risk factors utilizing statistical methods. The aim of this study was to identify the most important factors influencing the mortality of patients who suffer from gastric cancer and to introduce a classification approach based on a decision tree model for predicting the probability of mortality from this disease. Data on 216 patients with gastric cancer, who were registered in Taleghani hospital in Tehran, Iran, were analyzed. At first, patients were divided into two groups: dead and alive. Then, to fit the decision tree model to our data, we randomly selected 20% of the dataset as the test sample and used the remainder as the training sample. Finally, the validity of the model was examined with sensitivity, specificity, diagnostic accuracy and the area under the receiver operating characteristic curve. CART version 6.0 and SPSS version 19.0 software were used for the analysis of the data. Diabetes, ethnicity, tobacco, tumor size, surgery, pathologic stage, age at diagnosis, exposure to chemical weapons and alcohol consumption were determined as effective factors on mortality of gastric cancer. The sensitivity, specificity and accuracy of the decision tree were 0.72, 0.75 and 0.74, respectively. These indices indicated that the decision tree model has acceptable accuracy for predicting the probability of mortality in gastric cancer patients. A simple decision tree consisting of factors affecting mortality from gastric cancer may therefore help clinicians as a reliable and practical tool to predict the probability of mortality in these patients.

  16. Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles

    Directory of Open Access Journals (Sweden)

    Wong G William

    2008-06-01

    Full Text Available Background: Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology where many of the standard methods cannot be applied directly. Here, we propose a novel methodological framework based on machine learning, in which decision tree based classifier ensembles coupled with feature selection methods are applied to proteomics data generated from premalignant pancreatic cancer. Results: This study explores the utility of three different feature selection schemas (Student t test, Wilcoxon rank sum test and genetic algorithm) to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected from each method, we compared the prediction performances of a single decision tree algorithm, C4.5, with six different decision-tree based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that ensemble classifiers always outperform the single decision tree classifier in having greater accuracies and smaller prediction errors when applied to a pancreatic cancer proteomics dataset. Conclusion: In our cross validation framework, classifier ensembles generally have better classification accuracies compared to that of a single decision tree when applied to a pancreatic cancer proteomic dataset, thus suggesting their utility in future proteomics data analysis. 
Additionally, the use of feature selection

  17. Bounds on Average Time Complexity of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory [1], coding theory [77], design and analysis of algorithms (e.g., sorting) [38]. For any diagnostic problem, the minimum average depth of decision tree is bounded from below by the entropy of probability distribution (with a multiplier $1/\log_2 k$ for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building optimal prefix code [1] and a blood test study in assumption that exactly one patient is ill [23]). For such problems, the minimum average depth of decision tree exceeds the lower bound by at most one. The minimum average depth reaches the maximum on the problems in which each attribute is "indispensable" [44] (e.g., a diagnostic problem with n attributes and $k^n$ pairwise different rows in the decision table and the problem of implementing the modulo 2 summation function). These problems have the minimum average depth of decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
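
    The entropy bound mentioned in this abstract can be checked numerically for the binary prefix-code case (k = 2). The sketch below is an illustration, not the chapter's method: it builds a Huffman tree, which is an optimal binary prefix code, and compares its average leaf depth with the Shannon entropy. The probability distribution and the function names are hypothetical.

```python
import heapq
import math


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_depth(probs):
    """Average leaf depth of an optimal binary prefix code (Huffman tree)."""
    # Heap entries are (probability mass of a subtree, unique tie-breaker).
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    total = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        # Merging two subtrees pushes all of their leaves one level deeper,
        # which adds their combined probability to the average depth.
        total += p1 + p2
        heapq.heappush(heap, (p1 + p2, next_id))
        next_id += 1
    return total


probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical outcome distribution
h = entropy(probs)
d = huffman_average_depth(probs)
print(h, d)
```

    For this distribution the average depth lies between the entropy and the entropy plus one, which is the behaviour the abstract describes for problems with a complete set of attributes.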

  18. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    Science.gov (United States)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses feature selection using genetic algorithms (GA) on a dataset for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem when the dataset's features are selected using GA. The performance of both models is then compared to determine whether accuracy increases. The results show an increase in accuracy when feature selection using GA is applied. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.

  19. Intracranial hypertension prediction using extremely randomized decision trees.

    Science.gov (United States)

    Scalzo, Fabien; Hamilton, Robert; Asgari, Shadnaz; Kim, Sunghan; Hu, Xiao

    2012-10-01

    Intracranial pressure (ICP) elevation (intracranial hypertension, IH) in neurocritical care is typically treated in a reactive fashion; it is only delivered after bedside clinicians notice prolonged ICP elevation. A proactive solution is desirable to improve the treatment of intracranial hypertension. Several studies have shown that the waveform morphology of the intracranial pressure pulse holds predictors about future intracranial hypertension and could therefore be used to alert the bedside clinician of a likely occurrence of the elevation in the immediate future. In this paper, a computational framework is proposed to predict prolonged intracranial hypertension based on morphological waveform features computed from the ICP. A key contribution of this work is to exploit an ensemble classifier method based on extremely randomized decision trees (Extra-Trees). Experiments on a representative set of 30 patients admitted for various intracranial pressure related conditions demonstrate the effectiveness of the predicting framework on ICP pulses acquired under clinical conditions and the superior results of the proposed approach in comparison to linear and AdaBoost classifiers. Copyright © 2011 IPEM. Published by Elsevier Ltd. All rights reserved.

  20. Application of alternating decision trees in selecting sparse linear solvers

    KAUST Repository

    Bhowmick, Sanjukta

    2010-01-01

    The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in course of the same PDE-based simulation, thereby making solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stages of simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can resolve the problem of over-fitting, which occurs when limited amount of data is available. © 2010 Springer Science+Business Media LLC.

  1. Defender-Attacker Decision Tree Analysis to Combat Terrorism.

    Science.gov (United States)

    Garcia, Ryan J B; von Winterfeldt, Detlof

    2016-12-01

    We propose a methodology, called defender-attacker decision tree analysis, to evaluate defensive actions against terrorist attacks in a dynamic and hostile environment. Like most game-theoretic formulations of this problem, we assume that the defenders act rationally by maximizing their expected utility or minimizing their expected costs. However, we do not assume that attackers maximize their expected utilities. Instead, we encode the defender's limited knowledge about the attacker's motivations and capabilities as a conditional probability distribution over the attacker's decisions. We apply this methodology to the problem of defending against possible terrorist attacks on commercial airplanes, using one of three weapons: infrared-guided MANPADS (man-portable air defense systems), laser-guided MANPADS, or visually targeted RPGs (rocket propelled grenades). We also evaluate three countermeasures against these weapons: DIRCMs (directional infrared countermeasures), perimeter control around the airport, and hardening airplanes. The model includes deterrence effects, the effectiveness of the countermeasures, and the substitution of weapons and targets once a specific countermeasure is selected. It also includes a second stage of defensive decisions after an attack occurs. Key findings are: (1) due to the high cost of the countermeasures, not implementing countermeasures is the preferred defensive alternative for a large range of parameters; (2) if the probability of an attack and the associated consequences are large, a combination of DIRCMs and ground perimeter control are preferred over any single countermeasure. © 2016 Society for Risk Analysis.
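
    The core calculation described in this abstract, a defender minimizing expected cost while the attacker's choice is modeled as a conditional probability distribution, can be sketched in a few lines of Python. This is a single-stage simplification under stated assumptions: every number, the dictionary names and the reduced sets of weapons and countermeasures are hypothetical and do not come from the paper.

```python
# Hypothetical cost of deploying each defensive alternative.
COST = {"none": 0.0, "DIRCM": 40.0, "perimeter": 15.0}

# Hypothetical P(attacker's choice | defender's countermeasure); the shifts
# between rows stand in for deterrence and weapon substitution effects.
ATTACK_PROB = {
    "none":      {"no_attack": 0.90, "MANPADS": 0.07, "RPG": 0.03},
    "DIRCM":     {"no_attack": 0.93, "MANPADS": 0.02, "RPG": 0.05},
    "perimeter": {"no_attack": 0.94, "MANPADS": 0.05, "RPG": 0.01},
}

# Hypothetical expected loss if the attacker uses each weapon.
LOSS = {"no_attack": 0.0, "MANPADS": 300.0, "RPG": 200.0}


def expected_cost(countermeasure):
    """Deployment cost plus probability-weighted attack losses."""
    probs = ATTACK_PROB[countermeasure]
    return COST[countermeasure] + sum(p * LOSS[w] for w, p in probs.items())


expected = {c: expected_cost(c) for c in COST}
best = min(expected, key=expected.get)
print(best, expected)
```

    With these made-up inputs the cheapest alternative is to deploy nothing, which shows how a high countermeasure cost can outweigh a small reduction in attack probability; different inputs would change the ranking.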

  2. Multivariate analysis of flow cytometric data using decision trees.

    Science.gov (United States)

    Simon, Svenja; Guthke, Reinhard; Kamradt, Thomas; Frey, Oliver

    2012-01-01

    Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents allow now the simultaneous assessment of multiple markers on a single cell level generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. In order to examine whether the production of a certain cytokine is depended on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
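
    The stratified fivefold cross validation mentioned in this abstract keeps the class proportions roughly equal across folds. A minimal Python sketch of one way to build such folds follows; the function name `stratified_folds`, the round-robin assignment and the toy labels are assumptions of this sketch, not details from the paper.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical imbalanced labels: 10 positive and 40 negative samples.
labels = ["pos"] * 10 + ["neg"] * 40
folds = stratified_folds(labels)
for fold in folds:
    print(len(fold), sum(labels[i] == "pos" for i in fold))
```

    Each fold then serves once as the held-out set while a tree is fitted on the other four, so every quality estimate is based on data the tree has not seen.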

  3. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    Science.gov (United States)

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.

  4. USING PRECEDENTS FOR REDUCTION OF DECISION TREE BY GRAPH SEARCH

    Directory of Open Access Journals (Sweden)

    I. A. Bessmertny

    2015-01-01

    Full Text Available The paper considers the problem of mutual payment organization between business entities by means of clearing that is solved by search of graph paths. To reduce the decision tree complexity a method of precedents is proposed that consists in saving the intermediate solution during the moving along decision tree. An algorithm and example are presented demonstrating solution complexity coming close to a linear one. The tests carried out in civil aviation settlement system demonstrate approximately 30 percent shortage of real money transfer. The proposed algorithm is planned to be implemented also in other clearing organizations of the Russian Federation.

  5. Predicting metabolic syndrome using decision tree and support vector machine methods.

    Science.gov (United States)

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-05-01

    Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study aims to employ machine learning techniques, namely decision tree and support vector machine (SVM), to predict the 7-year incidence of metabolic syndrome. This is an applied study that uses data from 2107 participants of the Isfahan Cohort Study. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the decision tree and SVM methods were selected to predict it. Sensitivity, specificity and accuracy were used as validation criteria. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) in the SVM (decision tree) method. The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome. 
According

  6. Bringing Science and Pragmatism together - a Tiered Approach for Modelling Toxicological Impacts in LCA

    DEFF Research Database (Denmark)

    Guinée, J; De Koning, A; Pennington, David W.

    2004-01-01

    for as broad a range of chemicals as possible: 1) A base model representing a state-of-the-art multimedia model and 2) a simple model derived from the base model using statistical tools. Discussion. A preliminary decision tree for using the OMNIITOX information system (IS) is presented. The decision tree aims...

  7. External validation of a decision tree early warning score using only laboratory data

    DEFF Research Database (Denmark)

    Holm Atkins, Tara E; Öhman, Malin C; Brabrand, Mikkel

    2018-01-01

    INTRODUCTION: Early warning scores (EWS) have been developed to identify the degree of illness severity among acutely ill patients. One system, The Laboratory Decision Tree Early Warning Score (LDT-EWS), is wholly laboratory data based. Laboratory data was used in the development of a rare...... computerized method, developing a decision tree analysis. This article externally validates LDT-EWS, which is obligatory for an EWS before clinical use. METHOD: We conducted a retrospective review of prospectively collected data based on a time limited sample of all patients admitted through the medical...... with a goodness-of-fit test of χ² = 5.37 (7 degrees of freedom), p=0.62. CONCLUSION: LDT-EWS has acceptable ability to identify patients at high risk of dying during hospitalization with good precision. Further studies performing impact analysis are required before this score should be implemented in clinical...

  8. Exploratory Use of Decision Tree Analysis in Classification of Outcome in Hypoxic–Ischemic Brain Injury

    Directory of Open Access Journals (Sweden)

    Thanh G. Phan

    2018-03-01

    Full Text Available Background: Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluate a clinical model and then the added value of MRI data. Method: The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003–2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC (auROC) to determine the accuracy of the model. There were 41 (63.7% males) patients having MRI imaging with the average age 51.5 ± 18.9 years old. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82–1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86–1.00). Conclusion: Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.

  9. Efficient, reliable and fast high-level triggering using a bonsai boosted decision tree

    CERN Document Server

    Gligorov, V.V.

    2013-01-01

    High-level triggering is a vital component in many modern particle physics experiments. This paper describes a modification to the standard boosted decision tree (BDT) classifier, the so-called "bonsai" BDT, that has the following important properties: it is more efficient than traditional cut-based approaches; it is robust against detector instabilities, and it is very fast. Thus, it is fit-for-purpose for the online running conditions faced by any large-scale data acquisition system.

  10. Efficient, reliable and fast high-level triggering using a bonsai boosted decision tree

    Science.gov (United States)

    Gligorov, V. V.; Williams, M.

    2013-02-01

    High-level triggering is a vital component of many modern particle physics experiments. This paper describes a modification to the standard boosted decision tree (BDT) classifier, the so-called bonsai BDT, that has the following important properties: it is more efficient than traditional cut-based approaches; it is robust against detector instabilities, and it is very fast. Thus, it is fit-for-purpose for the online running conditions faced by any large-scale data acquisition system.

  11. Efficient, reliable and fast high-level triggering using a bonsai boosted decision tree

    International Nuclear Information System (INIS)

    Gligorov, V V; Williams, M

    2013-01-01

    High-level triggering is a vital component of many modern particle physics experiments. This paper describes a modification to the standard boosted decision tree (BDT) classifier, the so-called bonsai BDT, that has the following important properties: it is more efficient than traditional cut-based approaches; it is robust against detector instabilities, and it is very fast. Thus, it is fit-for-purpose for the online running conditions faced by any large-scale data acquisition system.

  12. Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees.

    Science.gov (United States)

    Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica

    2012-05-30

    The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablets types (hydrophilic/lipid) using formulation composition, compression force used for tableting and tablets porosity and tensile strength as input data. Potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively, whereas the selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by direct compression method and tested for in vitro dissolution profiles. Optimization of static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or genetic algorithms optimizer. Decision trees were constructed following discretization of data. Calculated difference (f(1)) and similarity (f(2)) factors for predicted and experimentally obtained dissolution profiles of test matrix tablets formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablets dissolution profiles. Elman neural networks were compared to the most frequently used static network, the multi-layered perceptron, and the superiority of Elman networks has been demonstrated. The developed methods allow a simple, yet very precise way of predicting drug release for both hydrophilic and lipid matrix tablets having controlled drug release. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Imitation learning of car driving skills with decision trees and random forests

    Directory of Open Access Journals (Sweden)

    Cichosz Paweł

    2014-09-01

    Full Text Available Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human-readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can be hopefully applied to solve real-world control tasks, as well as to develop video game bots

  14. Alternative measures of risk of extreme events in decision trees

    International Nuclear Information System (INIS)

    Frohwein, H.I.; Lambert, J.H.; Haimes, Y.Y.

    1999-01-01

    A need for a methodology to control the extreme events, defined as low-probability, high-consequence incidents, in sequential decisions is identified. A variety of alternative and complementary measures of the risk of extreme events are examined for their usability as objective functions in sequential decisions, represented as single- or multiple-objective decision trees. Earlier work had addressed difficulties, related to non-separability, with the minimization of some measures of the risk of extreme events in sequential decisions. In an extension of these results, it is shown how some non-separable measures of the risk of extreme events can be interpreted in terms of separable constituents of risk, thereby enabling a wider class of measures of the risk of extreme events to be handled in a straightforward manner in a decision tree. Also for extreme events, results are given to enable minimax- and Hurwicz-criterion analyses in decision trees. An example demonstrates the incorporation of different measures of the risk of extreme events in a multi-objective decision tree. Conceptual formulations for optimizing non-separable measures of the risk of extreme events are identified as an important area for future investigation
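
    The minimax and Hurwicz criteria mentioned in this abstract can be illustrated on a single decision node. The Python sketch below is a simplification, not the paper's multi-stage formulation: the loss table is hypothetical, and `alpha` is taken here as the weight placed on the worst outcome, which is one of several conventions in use.

```python
def minimax_choice(losses):
    """Pick the action whose worst-case loss is smallest."""
    return min(losses, key=lambda action: max(losses[action]))


def hurwicz_choice(losses, alpha):
    """Pick the action minimizing alpha * worst loss + (1 - alpha) * best loss."""
    def score(action):
        outcomes = losses[action]
        return alpha * max(outcomes) + (1 - alpha) * min(outcomes)
    return min(losses, key=score)


# Hypothetical losses of three actions under three possible outcomes.
losses = {"A": [10, 20, 90], "B": [30, 40, 50], "C": [5, 60, 70]}

print(minimax_choice(losses))
print(hurwicz_choice(losses, alpha=0.5))
```

    With `alpha = 1` the Hurwicz rule reduces to minimax, and with `alpha = 0` it looks only at the best outcome, so the parameter expresses how much weight the decision maker gives to extreme events.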

  15. The Decision Tree for Teaching Management of Uncertainty

    Science.gov (United States)

    Knaggs, Sara J.; And Others

    1974-01-01

    A 'decision tree' consists of an outline of the patient's symptoms and a logic for decision and action. It is felt that this approach to the decisionmaking process better facilitates each learner's application of his own level of knowledge and skills. (Author)

  16. Practical secure decision tree learning in a teletreatment application

    NARCIS (Netherlands)

    de Hoogh, Sebastiaan; Schoenmakers, Berry; Chen, Ping; op den Akker, Harm

    In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our

  17. Decision tree approach for classification of remotely sensed satellite ...

    Indian Academy of Sciences (India)

    The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with the image classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. Classification result ...

  18. Relationships between depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm the paper also presents the results of experiments with three datasets from UCI Machine Learning Repository [3]. © 2011 Springer-Verlag.

  19. Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees

    Directory of Open Access Journals (Sweden)

    Chuan Ding

    2016-10-01

    Full Text Available Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort is made to investigate the short-term subway ridership prediction considering bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as the cases, the short-term subway ridership and alighting passengers from its adjacent bus stops are obtained based on transit smart card data. To optimize the model performance with different combinations of regularization parameters, a series of GBDT models are built with various learning rates and tree complexities by fitting a maximum of trees. The optimal model performance confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle the multicollinearity effect with high accuracy. In contrast to other machine learning methods, or “black-box” procedures, the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.
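
    The roles of the learning rate and the number of trees in gradient boosting can be seen in a deliberately small Python sketch. It is not the authors' model: it uses one input variable, depth-1 regression trees (stumps), squared-error loss and made-up data, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (a single split) on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps on the residuals, shrinking each step by learning_rate."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For squared loss the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up nonlinear data, used only to exercise the code.
xs = list(range(10))
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)

baseline_error = mse(lambda x: mean_y, xs, ys)
boosted_error = mse(fit_gbdt(xs, ys), xs, ys)
print(baseline_error, boosted_error)
```

    A smaller learning rate needs more trees to reach the same training error, which is why the two settings are usually tuned together, as the abstract describes.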

  20. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

    Decision trees are a widely used technique for discovering patterns in consistent data sets. If the data set is inconsistent, that is, it contains groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge from the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we compared them based on the optimization and classification results. It was found that some greedy algorithms, Mult_ws_entSort and Mult_ws_entML, are good for both optimization and classification.
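
    Two of the approaches named in this abstract can be illustrated by how they rewrite an inconsistent table before a tree is built. The Python sketch below is an assumption-laden illustration: the function names, the toy rows and the tie-breaking rule (smallest decision value wins) are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def to_many_valued(rows):
    """Map each distinct attribute tuple to the set of all decisions seen for it."""
    table = defaultdict(set)
    for attrs, decision in rows:
        table[tuple(attrs)].add(decision)
    return dict(table)


def to_most_common(rows):
    """Map each distinct attribute tuple to its most frequent decision."""
    counts = defaultdict(Counter)
    for attrs, decision in rows:
        counts[tuple(attrs)][decision] += 1
    # Ties are broken by the smallest decision value (an assumption of this sketch).
    return {attrs: min(c, key=lambda d: (-c[d], d)) for attrs, c in counts.items()}


# Toy inconsistent table: (conditional attribute values, decision).
rows = [
    ((0, 1), 1), ((0, 1), 2), ((0, 1), 2),
    ((1, 1), 3),
    ((1, 0), 1), ((1, 0), 3),
]

many_valued = to_many_valued(rows)
most_common = to_most_common(rows)
print(many_valued)
print(most_common)
```

    In the many-valued form a tree is considered correct for a row if it returns any decision from that row's set, whereas the most common form commits to a single decision per row before training.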

  1. Prognostic Factors and Decision Tree for Long-term Survival in Metastatic Uveal Melanoma.

    Science.gov (United States)

    Lorenzo, Daniel; Ochoa, María; Piulats, Josep Maria; Gutiérrez, Cristina; Arias, Luis; Català, Jaum; Grau, María; Peñafiel, Judith; Cobos, Estefanía; Garcia-Bru, Pere; Rubio, Marcos Javier; Padrón-Pérez, Noel; Dias, Bruno; Pera, Joan; Caminal, Josep Maria

    2017-12-04

    The purpose of this study was to demonstrate the existence of a bimodal survival pattern in metastatic uveal melanoma. Secondary aims were to identify the characteristics and prognostic factors associated with long-term survival and to develop a clinical decision tree. The medical records of 99 metastatic uveal melanoma patients were retrospectively reviewed. Patients were classified as either short (≤ 12 months) or long-term survivors (> 12 months) based on a graphical interpretation of the survival curve after diagnosis of the first metastatic lesion. Ophthalmic and oncological characteristics were assessed in both groups. Of the 99 patients, 62 (62.6%) were classified as short-term survivors, and 37 (37.4%) as long-term survivors. The multivariate analysis identified the following predictors of long-term survival: age ≤ 65 years (p=0.012) and unaltered serum lactate dehydrogenase levels (p=0.018); additionally, the size (smaller vs. larger) of the largest liver metastasis showed a trend towards significance (p=0.063). Based on the variables significantly associated with long-term survival, we developed a decision tree to facilitate clinical decision-making. The findings of this study demonstrate the existence of a bimodal survival pattern in patients with metastatic uveal melanoma. The presence of certain clinical characteristics at diagnosis of distant disease is associated with long-term survival. A decision tree was developed to facilitate clinical decision-making and to counsel patients about the expected course of disease.

  2. Development and acceptability testing of decision trees for self-management of prosthetic socket fit in adults with lower limb amputation.

    Science.gov (United States)

    Lee, Daniel Joseph; Veneri, Diana A

    2018-05-01

    The most common complaint lower limb prosthesis users report is inadequacy of a proper socket fit. Adjustments to the residual limb-socket interface can be made by the prosthesis user without consultation of a clinician in many scenarios through skilled self-management. Decision trees guide prosthesis wearers through the self-management process, empowering them to rectify fit issues, or referring them to a clinician when necessary. This study examines the development and acceptability testing of patient-centered decision trees for lower limb prosthesis users. Decision trees underwent a four-stage process: literature review and expert consultation, designing, two rounds of expert panel review and revisions, and target audience testing. Fifteen lower limb prosthesis users (average age 61 years) reviewed the decision trees and completed an acceptability questionnaire. Participants reported agreement of 80% or above in five of the eight questions related to acceptability of the decision trees. Disagreement was related to the level of experience of the respondent. Decision trees were found to be easy to use, illustrate correct solutions to common issues, and have terminology consistent with that of a new prosthesis user. Some users with greater than 1.5 years of experience would not use the decision trees based on their own self-management skills. Implications for Rehabilitation: Discomfort of the residual limb-prosthetic socket interface is the most common reason for clinician visits. Prosthesis users can use decision trees to guide them through the process of obtaining a proper socket fit independently. Newer users may benefit from using the decision trees more than experienced users.

  3. Bayesian Decision Trees for predicting survival of patients: a study on the US National Trauma Data Bank.

    Science.gov (United States)

    Schetinin, Vitaly; Jakaite, Livia; Jakaitis, Janis; Krzanowski, Wojtek

    2013-09-01

    Trauma and Injury Severity Score (TRISS) models have been developed for predicting the survival probability of injured patients the majority of which obtain up to three injuries in six body regions. Practitioners have noted that the accuracy of TRISS predictions is unacceptable for patients with a larger number of injuries. Moreover, the TRISS method is incapable of providing accurate estimates of predictive density of survival, that are required for calculating confidence intervals. In this paper we propose Bayesian inference for estimating the desired predictive density. The inference is based on decision tree models which split data along explanatory variables, that makes these models interpretable. The proposed method has outperformed the TRISS method in terms of accuracy of prediction on the cases recorded in the US National Trauma Data Bank. The developed method has been made available for evaluation purposes as a stand-alone application. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  4. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    Science.gov (United States)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky or snow under cloud, giving users flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus in accuracy, in some scenes.

  5. Using decision tree to predict serum ferritin level in women with anemia

    Directory of Open Access Journals (Sweden)

    Parisa Safaee

    2016-04-01

    Background: Data mining is known as a process of discovering and analysing large amounts of data in order to find meaningful rules and trends. In healthcare, data mining offers numerous opportunities to study the unknown patterns in a data set. These patterns can be used by physicians for the diagnosis, prognosis and treatment of patients. The main objective of this study was to predict the level of serum ferritin in women with anemia and to specify the basic predictive factors of iron deficiency anemia using data mining techniques. Methods: In this research, 690 patients and 22 variables were studied in a population of women with anemia. The data include 11 laboratory and 11 clinical variables for patients referred to the laboratories of Imam Hossein and Shohada-E-Haft Tir hospitals from April 2013 to April 2014. The decision tree technique was used to build the model. Results: The accuracy of the decision tree with all the variables is 75%. Different combinations of variables were examined in order to determine the best predictive model. In the optimum decision tree model, RBC, MCH, MCHC, gastrointestinal cancer and gastrointestinal ulcer were identified as the most important predictive factors. The results indicate that if the values of the MCV, MCHC and MCH variables are normal and the value of the RBC variable is below the normal limit, the patient is diagnosed as 90% likely to have iron deficiency anemia. Conclusion: Given the simplicity and low cost of the complete blood count examination, the decision tree model was considered for diagnosing iron deficiency anemia in patients. The impact of new factors such as gastrointestinal hemorrhoids, gastrointestinal surgeries, different gastrointestinal diseases and gastrointestinal ulcers is also considered in this paper, whereas previous studies were limited to assessing laboratory variables. The rules of the

  6. Shopping intention prediction using decision trees

    OpenAIRE

    Šebalj, Dario; Franjković, Jelena; Hodak, Kristina

    2017-01-01

    Introduction: Price is considered to be a neglected marketing mix element due to the complexity of price management and the sensitivity of customers to price changes. Price changes draw the fastest customer reactions. Accordingly, the process of making shopping decisions can be very challenging for customers. Objective: The aim of this paper is to create a model that is able to predict shopping intention and classify respondents into one of the two categories, depending on whether they inten...

  7. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  8. Identifying Risk Factors for Drug Use in an Iranian Treatment Sample: A Prediction Approach Using Decision Trees.

    Science.gov (United States)

    Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid

    2018-05-12

    Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
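
    The evaluation design described above (a 70/30 split, then sensitivity and specificity on the holdout set) can be sketched in a few lines. This is an illustrative sketch only: the function names, the seed and the toy label vectors are hypothetical, and the predictions are made up, not taken from the study.

```python
import random

def holdout_split(items, train_percent=70, seed=0):
    """Shuffle a copy of items and split it into training and holdout parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100  # integer arithmetic
    return shuffled[:n_train], shuffled[n_train:]

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# 678 subjects split 70/30, matching the sample sizes in the abstract.
train, holdout = holdout_split(range(678))

# Hypothetical true labels and predictions for eight holdout subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(len(train), len(holdout), sens, spec)  # 474 204 0.75 0.75
```

    Sensitivity is the share of true positives among all actual positives and specificity the share of true negatives among all actual negatives, so the two figures reported for each model are computed from the same holdout confusion matrix.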

  9. DTreeSim: A new approach to compute decision tree similarity using re-mining

    OpenAIRE

    BAKIRLI, GÖZDE; BİRANT, DERYA

    2017-01-01

    A number of recent studies have used a decision tree approach as a data mining technique; some of them needed to evaluate the similarity of decision trees to compare the knowledge reflected in different trees or datasets. There have been multiple perspectives and multiple calculation techniques to measure the similarity of two decision trees, such as using a simple formula or an entropy measure. The main objective of this study is to compute the similarity of decision trees using ...

  10. The Representation of Discrete Functions by Decision Trees.

    Science.gov (United States)

    1982-02-28

    complexity theory, is then reviewed. The various findings are regrouped in a short summary of the "state-of-the-art" knowledge about decision trees. 3.2...tables, and tables incorporating calls to subtables in place of actions (each of which is beyond the reach of published analyses). The extension to...I, 135-143. Knuth, D. E. (1973). The Art of Computer Programming. Volume 1: Fundamental Algorithms. Addison-Wesley, Reading, Mass. (2nd ed.). 122

  11. The Utility of Decision Trees in Oncofertility Care in Japan.

    Science.gov (United States)

    Ito, Yuki; Shiraishi, Eriko; Kato, Atsuko; Haino, Takayuki; Sugimoto, Kouhei; Okamoto, Aikou; Suzuki, Nao

    2017-03-01

    To identify the utility and issues associated with the use of decision trees in oncofertility patient care in Japan. A total of 35 women who had been diagnosed with cancer, but had not begun anticancer treatment, were enrolled. We applied the oncofertility decision tree for women published by Gardino et al. to counsel a consecutive series of women on fertility preservation (FP) options following cancer diagnosis. Percentage of women who decided to undergo oocyte retrieval for embryo cryopreservation and the expected live-birth rate for these patients were calculated using the following equation: expected live-birth rate = pregnancy rate at each age per embryo transfer × (1 - miscarriage rate) × No. of cryopreserved embryos. Oocyte retrieval was performed for 17 patients (48.6%; mean ± standard deviation [SD] age, 36.35 ± 3.82 years). The mean ± SD number of cryopreserved embryos was 5.29 ± 4.63. The expected live-birth rate was 0.66. The expected live-birth rate with FP indicated that one in three oncofertility patients would not expect to have a live birth following oocyte retrieval and embryo cryopreservation. While the decision trees were useful as decision-making tools for women contemplating FP, in the context of the current restrictions on oocyte donation and the extremely small number of adoptions in Japan, the remaining options for fertility after cancer are limited. In order for cancer survivors to feel secure in their decisions, the decision tree may need to be adapted simultaneously with improvements to the social environment, such as greater support for adoption.
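
    The equation quoted in the abstract translates directly into code. The sketch below only restates that product; the function name and the input values are hypothetical and are not the study's data.

```python
def expected_live_birth_rate(pregnancy_rate_per_transfer, miscarriage_rate,
                             n_cryopreserved_embryos):
    """Expected live-birth rate as defined in the abstract.

    pregnancy rate per embryo transfer x (1 - miscarriage rate)
    x number of cryopreserved embryos
    """
    return (pregnancy_rate_per_transfer
            * (1.0 - miscarriage_rate)
            * n_cryopreserved_embryos)

# Hypothetical inputs: 20% pregnancy rate per transfer, 25% miscarriage
# rate, 4 cryopreserved embryos.
rate = expected_live_birth_rate(0.20, 0.25, 4)
print(round(rate, 2))  # 0.6
```

    As written, the product is not capped at 1, so with many embryos it behaves like an expected number of live births and not like a probability.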

  12. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    Science.gov (United States)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from household survey data collected within Izmir Transportation Master Plan. From this perspective transport mode choice problem is solved on a case in district of Buca-Izmir, Turkey with CRISP-DM knowledge process model.

  13. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    Directory of Open Access Journals (Sweden)

    L. O. Oral

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from household survey data collected within Izmir Transportation Master Plan. From this perspective transport mode choice problem is solved on a case in district of Buca-Izmir, Turkey with CRISP-DM knowledge process model.

  14. ARABIC TEXT CLASSIFICATION USING NEW STEMMER FOR FEATURE SELECTION AND DECISION TREES

    Directory of Open Access Journals (Sweden)

    SAID BAHASSINE

    2017-06-01

    Text classification is the process of assigning unclassified text to appropriate classes based on its content. The most prevalent representation for text classification is the bag-of-words vector. In this representation, the words that appear in documents often have multiple morphological structures and grammatical forms. In most cases, these morphological variants of a word belong to the same category. In the first part of this paper, a new stemming algorithm was developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms, namely Khoja’s stemmer and our new stemmer (referred to hereafter as origin-stemmer), on Arabic text classification. This investigation was carried out using chi-square as a feature selection method to reduce the dimensionality of the feature space, and a decision tree classifier. In order to evaluate the performance of the classifier, this study used a corpus of 5070 documents independently classified into six categories (sport, entertainment, business, Middle East, switch and world) on the WEKA toolkit. Recall, f-measure and precision are used to compare the performance of the obtained models. The experimental results show that text classification using rout stemmer outperforms classification using Khoja’s stemmer. The f-measure was 92.9% in the sport category and 89.1% in the business category.

  15. Using boosted decision trees for tau identification in the ATLAS experiment

    CERN Document Server

    Godfrey, Jennifer

    The ATLAS detector will begin taking data from p-p collisions in 2009. This experiment will allow for many different physics measurements and searches. The production of tau leptons at the LHC is a key signature of the decay of both the standard model Higgs (via H → ττ) and SUSY particles. Taus have a short lifetime (cτ = 87 μm) and decay hadronically 65% of the time. Many QCD interactions produce similar hadronic showers and have cross-sections about 1 billion times larger than tau production. Multivariate techniques are therefore often used to distinguish taus from this background. Boosted Decision Trees (BDTs) are a machine-learning technique for developing cut-based discriminants which can significantly aid in extracting small signal samples from overwhelming backgrounds. In this study, BDTs are used for tau identification for the ATLAS experiment. They are a fast, flexible alternative to existing discriminants with comparable or better performance.

  16. Discovering Decision Knowledge from Web Log Portfolio for Managing Classroom Processes by Applying Decision Tree and Data Cube Technology.

    Science.gov (United States)

    Chen, Gwo-Dong; Liu, Chen-Chung; Ou, Kuo-Liang; Liu, Baw-Jhiune

    2000-01-01

    Discusses the use of Web logs to record student behavior that can assist teachers in assessing performance and making curriculum decisions for distance learning students who are using Web-based learning systems. Adopts decision tree and data cube information processing methodologies for developing more effective pedagogical strategies. (LRW)

  17. Extensions of dynamic programming as a new tool for decision tree optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of tree) and decision trees (which allow arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.

  18. Establishing Decision Trees for Predicting Successful Postpyloric Nasoenteric Tube Placement in Critically Ill Patients.

    Science.gov (United States)

    Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo

    2016-08-31

    Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.

  19. Establishing Decision Trees for Predicting Successful Postpyloric Nasoenteric Tube Placement in Critically Ill Patients.

    Science.gov (United States)

    Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo

    2018-01-01

    Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.

  20. Decision Tree and Texture Analysis for Mapping Debris-Covered Glaciers in the Kangchenjunga Area, Eastern Himalaya

    Directory of Open Access Journals (Sweden)

    Adina Racoviteanu

    2012-10-01

    In this study we use visible, short-wave infrared and thermal Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data validated with high-resolution Quickbird (QB) and Worldview2 (WV2) for mapping debris cover in the eastern Himalaya using two independent approaches: (a) a decision tree algorithm, and (b) texture analysis. The decision tree algorithm was based on multi-spectral and topographic variables, such as band ratios, surface reflectance, kinetic temperature from ASTER bands 10 and 12, slope angle, and elevation. The decision tree algorithm resulted in 64 km² classified as debris-covered ice, which represents 11% of the glacierized area. Overall, for ten glacier tongues in the Kangchenjunga area, there was an area difference of 16.2 km² (25%) between the ASTER and the QB areas, with mapping errors mainly due to clouds and shadows. Texture analysis techniques included co-occurrence measures, geostatistics and filtering in spatial/frequency domain. Debris cover had the highest variance of all terrain classes, highest entropy and lowest homogeneity compared to the other classes, for example a mean variance of 15.27 compared to 0 for clouds and 0.06 for clean ice. Results of the texture image for debris-covered areas were comparable with those from the decision tree algorithm, with 8% area difference between the two techniques.

  1. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement is carried out by the open source software “R”; the generation of the dense and accurate digital surface model by the “Match-T DSM” program of the Trimble Company. A practical...... like buildings, roads, grassland, trees, hedges, and walls from such an ‘intelligent’ point cloud. The decision tree is derived from training areas which borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using...

  2. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia.

    Science.gov (United States)

    Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B

    2014-12-01

    To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publicly available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (or AUC) of 0.612, 0.583, and 0.650, respectively, but achieve a lift of at least 1.5 for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency and interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions that are appropriate to the risk levels observed, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions.
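
    The lift figure mentioned above can be read as the readmission rate inside a high-risk leaf of the tree divided by the overall readmission rate. The sketch below assumes that standard definition; the abstract does not state how lift was computed, and the function name and counts are hypothetical.

```python
def lift(node_events, node_size, total_events, total_size):
    """Event rate in a subgroup divided by the event rate in the whole sample."""
    node_rate = node_events / node_size
    base_rate = total_events / total_size
    return node_rate / base_rate

# Hypothetical counts: 20 readmissions among 100 discharges overall,
# 6 readmissions among the 20 patients that fall into one high-risk leaf.
value = lift(node_events=6, node_size=20, total_events=20, total_size=100)
print(round(value, 2))  # 1.5
```

    A lift of 1.5 means patients in that leaf are readmitted at one and a half times the overall rate, which is why a model with a modest AUC can still be useful for targeting interventions.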

  3. Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining.

    Science.gov (United States)

    Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh

    2015-03-18

    The aim of this study was to examine a predictive model using features related to the diabetes type 2 risk factors. The data were obtained from a database in a diabetes control system in Tabriz, Iran. The data included all people referred for diabetes screening between 2009 and 2011. The features considered as "Inputs" were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). Moreover, we used diagnosis as "Class". We applied the "Decision Tree" technique and "J48" algorithm in the WEKA (3.6.10 version) software to develop the model. After data preprocessing and preparation, we used 22,398 records for data mining. The model precision to identify patients was 0.717. The age factor was placed in the root node of the tree as a result of higher information gain. The ROC curve indicates the model function in identification of patients and those individuals who are healthy. The curve indicates high capability of the model, especially in identification of the healthy persons. We developed a model using the decision tree for screening T2DM which did not require laboratory tests for T2DM diagnosis.
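
    The statement that age sits at the root because of its higher information gain can be made concrete with a small sketch. It computes plain information gain on hypothetical toy records; it is not the WEKA implementation, and C4.5-style learners such as J48 typically rank splits by gain ratio, a normalized variant.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Reduction in entropy of target after splitting rows on feature."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Hypothetical screening records (not data from the study).
rows = [
    {"age_group": "older", "sex": "f", "diagnosis": "diabetic"},
    {"age_group": "older", "sex": "m", "diagnosis": "diabetic"},
    {"age_group": "younger", "sex": "f", "diagnosis": "healthy"},
    {"age_group": "younger", "sex": "m", "diagnosis": "healthy"},
]
gain_age = information_gain(rows, "age_group", "diagnosis")
gain_sex = information_gain(rows, "sex", "diagnosis")
print(gain_age, gain_sex)
```

    In this toy table the age split separates the classes perfectly while the sex split leaves them evenly mixed, so a greedy learner would place age at the root.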

  4. Socioeconomic determinants of menarche in rural Polish girls using the decision trees method.

    Science.gov (United States)

    Matusik, Stanisław; Laska-Mierzejewska, Teresa; Chrzanowska, Maria

    2011-05-01

    The aim of this study was to assess the usefulness of the decision trees method as a research method of multidimensional associations between menarche and socioeconomic variables. The article is based on data collected from the rural area of Choszczno in the West Pomerania district of Poland between 1987 and 2001. Girls were asked about the appearance of first menstruation (a yes/no method). The average menarchal age was estimated by the probit analysis method, using second grade polynomials. The socioeconomic status of the girls' families was determined using five qualitative variables: fathers' and mothers' educational level, source of income, household appliances and the number of children in a family. For classification based on five socioeconomic variables, one of the most effective algorithms CART (Classification and Regression Trees) was used. In 2001 the menarchal age in 66% of examined girls was properly classified, while a higher efficiency of 70% was obtained for girls examined in 1987. The decision trees method enabled the definition of the hierarchy of socioeconomic variables influencing girls' biological development level. The strongest discriminatory power was attributed to the number of children in a family, and the mother's and then father's educational level. Using this method it is possible to detect differences in strength of socioeconomic variables associated with girls' pubescence before 1987 and after 2001 during the transformation of the economic and political systems in Poland. However, the decision trees method is infrequently applied in social sciences and constitutes a novelty; this article proves its usefulness in examining relations between biological processes and a population's living conditions.

  5. Using Boosted Decision Trees to look for displaced Jets in the ATLAS Calorimeter

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    A boosted decision tree is used to identify unique jets in a recently released conference note describing a search for long lived particles decaying to hadrons in the ATLAS Calorimeter. Neutral Long lived particles decaying to hadrons are “typical” signatures in a lot of models including Hidden Valley models, Higgs Portal Models, Baryogenesis, Stealth SUSY, etc. Long lived neutral particles that decay in the calorimeter leave behind an object that looks like a regular Standard Model jet, with subtle differences. For example, the later in the calorimeter it decays, the less energy will be deposited in the early layers of the calorimeter. Because the jet does not originate at the interaction point, it will likely be more narrow as reconstructed by the standard Anti-kT jet reconstruction algorithm used by ATLAS. To separate the jets due to neutral long lived decays from the standard model jets we used a boosted decision tree with thirteen variables as inputs. We used the information from the boosted decision...

  6. Determination of Component Failure Modes for a Fire PSA by Using Decision Trees

    International Nuclear Information System (INIS)

    Kang, Dae Il; Han, Sang Hoon; Lim, Jae Won

    2007-01-01

    KAERI developed a method, called a mapping technique, for the quantification of external events PSA models with one top model for an internal events PSA. The mapping technique can be implemented by the construction of mapping tables. The mapping tables include initiating events and transfer events of fire, and internal PSA basic events affected by a fire. This year, KAERI is making mapping tables for the one top model for the Ulchin Unit 3 and 4 fire PSA using previously obtained fire PSA results for Ulchin Unit 3 and 4. A fire PSA requires a PSA analyst to determine the component failure modes affected by a fire. The component failure modes caused by a fire depend on several factors: whether components are located in fire initiation and propagation areas, fire effects on control and power cables for components, designed failure modes of components, success criteria in a PSA model, etc. Thus, it is not easy to manually determine component failure modes caused by a fire. In this paper, we propose the use of decision trees for the determination of component failure modes affected by a fire and the selection of internal PSA basic events. Section 2 presents the procedure of the previously performed Ulchin Unit 3 and 4 fire PSA and the mapping technique. Section 3 presents the process for identification of basic events and decision trees. Section 4 presents the concluding remarks.

  7. Peripheral Exophytic Oral Lesions: A Clinical Decision Tree

    Directory of Open Access Journals (Sweden)

    Hamed Mortazavi

    2017-01-01

    Diagnosis of peripheral oral exophytic lesions might be quite challenging. This review article aimed to introduce a decision tree for oral exophytic lesions according to their clinical features. General search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of keywords such as “oral soft tissue lesion,” “oral tumor like lesion,” “oral mucosal enlargement,” and “oral exophytic lesion.” Related English-language articles published from 1988 to 2016 in both medical and dental journals were appraised. Upon compilation of data, peripheral oral exophytic lesions were categorized into two major groups according to their surface texture: smooth (mesenchymal or nonsquamous epithelium-originated) and rough (squamous epithelium-originated). Lesions with smooth surface were also categorized into three subgroups according to their general frequency: reactive hyperplastic lesions/inflammatory hyperplasia, salivary gland lesions (nonneoplastic and neoplastic), and mesenchymal lesions (benign and malignant neoplasms). In addition, lesions with rough surface were summarized in six more common lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by a stepwise progression method.

  8. Hedonic price models and indices based on boosting applied to the Dutch housing market

    NARCIS (Netherlands)

    M. Kagie (Martijn); M.C. van Wezel (Michiel)

    2006-01-01

    We create a hedonic price model for house prices for six geographical submarkets in the Netherlands. Our model is based on a recent data mining technique called boosting. Boosting is an ensemble technique that combines multiple models, in our case decision trees, into a combined

  9. Totally Optimal Decision Trees for Monotone Boolean Functions with at Most Five Variables

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.

  10. Quantifying human and organizational factors in accident management using decision trees: the HORAAM method

    Energy Technology Data Exchange (ETDEWEB)

    Baumont, G.; Menage, F.; Schneiter, J.R.; Spurgin, A.; Vogel, A

    2000-11-01

    In the framework of the level 2 Probabilistic Safety Study (PSA 2) project, the Institute for Nuclear Safety and Protection (IPSN) has developed a method for taking into account Human and Organizational Reliability Aspects during accident management. Actions are taken during very degraded installation operations by teams of experts in the French framework of Crisis Organization (ONC). After describing the background of the framework of the Level 2 PSA, the French specific Crisis Organization and the characteristics of human actions in the Accident Progression Event Tree, this paper describes the method developed to introduce in PSA the Human and Organizational Reliability Analysis in Accident Management (HORAAM). This method is based on the Decision Tree method and has gone through a number of steps in its development. The first one was the observation of crisis center exercises, in order to identify the main influence factors (IFs) which affect human and organizational reliability. These IFs were used as headings in the Decision Tree method. Expert judgment was used in order to verify the IFs, to rank them, and to estimate the value of the aggregated factors to simplify the quantification of the tree. A tool based on Mathematica was developed to increase the flexibility and the efficiency of the study.

  11. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    As fraudulent financial statements by enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance, at 85.71%.

  12. Using histograms to introduce randomization in the generation of ensembles of decision trees

    Science.gov (United States)

    Kamath, Chandrika; Cantu-Paz, Erick; Littau, David

    2005-02-22

    A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.
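
    The steps listed above (build a histogram, score candidate splits from it, then pick a split point at random near the best one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented system: the Gini criterion, the equal-width bins, the half-bin-width randomization interval and the names `gini_from_counts` and `randomized_histogram_split` are all choices made here for the example.

```python
import random
from collections import Counter


def gini_from_counts(counts):
    """Gini impurity of a node described by its class counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def randomized_histogram_split(values, labels, n_bins=8, rng=None):
    """Return (best_boundary, randomized_split) for one numeric feature."""
    rng = rng or random.Random()
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("feature is constant; nothing to split")
    width = (hi - lo) / n_bins

    # Histogram: class counts per equal-width bin.
    bins = [Counter() for _ in range(n_bins)]
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx][y] += 1

    # Score each interior bin boundary using only the histogram counts.
    total = len(values)
    left, right = Counter(), Counter()
    for b in bins:
        right.update(b)
    best_score, best_boundary = None, None
    for i in range(1, n_bins):
        left.update(bins[i - 1])
        right.subtract(bins[i - 1])
        n_left = sum(left.values())
        score = (n_left * gini_from_counts(left)
                 + (total - n_left) * gini_from_counts(right)) / total
        if best_score is None or score < best_score:
            best_score, best_boundary = score, lo + i * width

    # Randomize: draw the split uniformly in an interval around the best boundary
    # (assumed here to be half a bin width on each side).
    split = rng.uniform(best_boundary - width / 2, best_boundary + width / 2)
    return best_boundary, split


values = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
best, split = randomized_histogram_split(values, labels, rng=random.Random(0))
```

    Calling the function repeatedly with different random generators yields slightly different split points for the same data, which is what gives the trees in an ensemble their diversity.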

  13. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  14. Cardiovascular Dysautonomias Diagnosis Using Crisp and Fuzzy Decision Tree: A Comparative Study.

    Science.gov (United States)

    Kadi, Ilham; Idri, Ali

    2016-01-01

    Decision trees (DTs) are one of the most popular techniques for learning classification systems, especially when it comes to learning from discrete examples. In the real world, much data occurs in fuzzy form, so a DT must be able to deal with such fuzzy data. In fact, integrating fuzzy logic when dealing with imprecise and uncertain data reduces uncertainty and makes it possible to model fine knowledge details. In this paper, a fuzzy decision tree (FDT) algorithm was applied to a dataset extracted from the ANS (Autonomic Nervous System) unit of the Moroccan university hospital Avicenne. This unit specializes in performing several dynamic tests to diagnose patients with autonomic disorders and to suggest appropriate treatment. A set of fuzzy classifiers was generated using FID 3.4. The error rates of the generated FDTs were calculated to measure their performance. Moreover, a comparison of the error rates obtained using crisp DTs and FDTs showed that the FDTs performed better than the crisp DTs.

  15. An overview of decision tree applied to power systems

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe

    2013-01-01

    The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, Decision Tree (DT), also known as Classification And Regression Tree (CART), has gained increasing interest because of its high performance in terms of computational efficiency, uncertainty manageability, and interpretability. This paper presents an overview of a variety of DT applications to power systems for better interfacing of power systems with data analytics. The fundamental knowledge of the CART algorithm is also introduced, followed by examples of both classification trees and regression trees with the help of a case study on security assessment of the Danish power system.

  16. Data mining usage in health care management: literature survey and decision tree application

    Directory of Open Access Journals (Sweden)

    Dijana Ćosić

    2008-02-01

    Aim: To show the benefits of data mining in health care management. In this example, we show a way to raise the awareness of women regarding the contraceptive methods they use (or do not use). Methods: The goal of the data mining analysis was to determine whether there are common characteristics of the women according to their choice of contraception (a typical classification problem). Therefore, we decided to use decision trees. We generated a CHAID model in “Statistica”, based on a database formed as a result of Indonesian research conducted in 1987. The sample contains married women who were either not pregnant or did not know if they were pregnant at the time of the interview. The database consists of 1473 cases. In addition, an extensive internet search was conducted in order to identify articles cited in scientific databases on the subject of data mining in health care management. Results: The most important variable in women’s choice of contraceptive methods was the husband’s profession. We also retrieved 221 articles published on the application of data mining in health care. Conclusion: The goal of the paper is achieved in two ways: first, by retrieving 221 articles published on the subject, we have demonstrated the benefits of data mining in health care management; second, the decision tree method is successfully applied to explain women’s choice of contraceptive methods.

  17. Improved classification of soil salinity by decision tree on remotely sensed images

    Science.gov (United States)

    Rao, Ping; Chen, Shengbo; Sun, Ke

    2006-01-01

    Soil salinity, caused by natural or human-induced processes, is not only a major cause of soil degradation but also a major environmental hazard all over the world. This results in increasing impact on crop yields and agricultural production in both dry and irrigated areas due to poor land and water management. Multi-temporal optical and microwave remote sensing can significantly contribute to detecting spatial-temporal changes of salt-related surface features. The study area is located in the west of Jilin Province, Northeast China, which is one of the most important saline-alkalized areas in the semi-arid and arid regions of North China. Decision tree classifiers are used to improve the classification of soil salinity on Landsat Thematic Mapper (TM) images from late autumn of 1996. The Kauth-Thomas (K-T) transformation was performed after TM image preprocessing, including image registration, mosaicking and resizing for the study area. Then the first component of the K-T transformation, TM 6 imagery (thermal infrared imagery), and NDVI (Normalized Difference Vegetation Index) from TM 4 and TM 3 images were each density-sliced to establish suitable feature classes of soil salinity as the decision nodes. Thus, the classification of soil salinity was improved using decision trees based on these feature classes. Compared with the conventional maximum likelihood classification, this method is more effective at distinguishing soil salinity from mixed residential and sand areas in the west of Jilin Province, China.
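
    As a rough illustration of the NDVI and density-slicing steps mentioned above, the sketch below computes NDVI from near-infrared (TM band 4) and red (TM band 3) values using the standard formula (NIR - Red) / (NIR + Red) and assigns a value to a slice. The breakpoints and the function names `ndvi` and `density_slice` are hypothetical; the abstract does not give the thresholds the authors used.

```python
from bisect import bisect_right


def ndvi(tm4_nir, tm3_red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = tm4_nir + tm3_red
    if denom == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (tm4_nir - tm3_red) / denom


def density_slice(value, breakpoints):
    """Return the index of the slice that `value` falls into.

    `breakpoints` must be sorted ascending; slice 0 lies below the first
    breakpoint and slice len(breakpoints) at or above the last one.
    """
    return bisect_right(breakpoints, value)


# Hypothetical breakpoints, for illustration only.
example_breaks = [0.1, 0.3]
pixel_ndvi = ndvi(0.5, 0.1)
pixel_slice = density_slice(pixel_ndvi, example_breaks)
```

    Each slice index can then act as a categorical feature class at a decision node, alongside slices of the other input layers.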

  18. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  19. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study.

    Science.gov (United States)

    Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad

    2014-09-01

    The aim of this study was to create a prediction model using data mining approach to identify low risk individuals for incidence of type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a 6647 population without diabetes, aged ≥20 years, followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity, 97.9% specificity; and for the subjects without diabetes, precision and f-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
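
    The figures reported for subjects without diabetes can be cross-checked with a few lines of arithmetic. The sketch below assumes that recall for the non-diabetes class equals the reported specificity (97.9%), which holds by definition when diabetes is treated as the positive class; the function name `f_measure` is illustrative and not taken from the study.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for subjects without diabetes.
precision_no_diabetes = 0.92
recall_no_diabetes = 0.979  # assumption: equals the reported specificity

f_no_diabetes = f_measure(precision_no_diabetes, recall_no_diabetes)
print(round(f_no_diabetes, 2))  # prints 0.95
```

    Rounded to two decimals, the result agrees with the reported f-measure of 0.95.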

  20. Decision tree analysis of factors influencing rainfall-related building damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained).
Still, a

  1. A greedy algorithm for construction of decision trees for tables with many-valued decisions - A comparative study

    KAUST Repository

    Azad, Mohammad

    2013-11-25

    In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from the UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for the proposed approach and an approach based on the generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.

  2. Identification of Potential Sources of Mercury (Hg) in Farmland Soil Using a Decision Tree Method in China

    Directory of Open Access Journals (Sweden)

    Taiyang Zhong

    2016-11-01

    Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and take reclamation measures. The natural and anthropogenic sources and their contributions of Hg in Chinese farmland soil were identified based on a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis gained an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil.

  3. Calculation of the number of branches of multi-valued decision trees in computer aided importance rank of parameters

    Directory of Open Access Journals (Sweden)

    Tiszbierek Agnieszka

    2017-01-01

    A computer programme has been developed to support the time-consuming process of ranking the importance of construction and operation parameters by determining optimum sets; it is based on the Quine-McCluskey algorithm for minimizing individual partial multi-valued logic functions. An example with real-time data, calculated by means of the programme, showed that some of the obtained optimum sets had a different number of real branches when presented on the multi-valued logic decision tree. This motivated a further function of the programme: a module that calculates the number of real branches of the multi-valued logic decision trees presenting the optimum sets chosen by the programme. This paper presents the idea and the method for developing this module, which calculates the number of real branches for each of the optimum sets indicated by the programme, as well as the calculation process.

  4. Identification of Potential Sources of Mercury (Hg) in Farmland Soil Using a Decision Tree Method in China.

    Science.gov (United States)

    Zhong, Taiyang; Chen, Dongmei; Zhang, Xiuying

    2016-11-09

    Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and take reclamation measures. The natural and anthropogenic sources and their contributions of Hg in Chinese farmland soil were identified based on a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis gained an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil.

  5. A Decision Tree for Psychology Majors: Supplying Questions as Well as Answers.

    Science.gov (United States)

    Poe, Retta E.

    1988-01-01

    Outlines the development of a psychology careers decision tree to help faculty advise students in planning their program. States that students using the decision tree may benefit by learning more about their career options and by acquiring better question-asking skills. (GEA)

  6. A new approach to enhance the performance of decision tree for classifying gene expression data.

    Science.gov (United States)

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches for addressing such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm in which each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in the literature. We experimentally compare the effect of our scheme against other well-known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
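
    The idea of scoring a linear composite of several genes by its AUC can be sketched as below. This is a minimal illustration, not the authors' algorithm: the weights, the toy expression values and the names `composite_feature` and `auc` are invented for the example, and the AUC is computed by the generic pairwise-ranking definition.

```python
def composite_feature(expression, weights):
    """Linear combination of the selected genes' expression values."""
    return sum(w * x for w, x in zip(weights, expression))


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two genes per sample, binary class labels.
samples = [[3, 1], [4, 4], [5, 2], [1, 0]]
labels = [1, 0, 1, 0]

single_gene_auc = auc([s[0] for s in samples], labels)
composite_auc = auc([composite_feature(s, [1, -1]) for s in samples], labels)
```

    In this toy case the first gene alone separates the classes imperfectly (AUC 0.75), while the composite gene 1 minus gene 2 separates them completely (AUC 1.0), which is the kind of gain a multi-gene node is meant to capture.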

  7. An Improved Decision Tree for Predicting a Major Product in Competing Reactions

    Science.gov (United States)

    Graham, Kate J.

    2014-01-01

    When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…

  8. Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk

    Directory of Open Access Journals (Sweden)

    Ranque Stéphane

    2005-07-01

    Background: In order to detect potential disease clusters where a putative source cannot be specified, classical procedures scan the geographical area with circular windows through a specified grid imposed on the map. However, the choice of the windows' shapes, sizes and centers is critical, and different choices may not provide exactly the same results. The aim of our work was to use an Oblique Decision Tree model (ODT), which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we have developed an ODT-algorithm to find an oblique partition of the space defined by the geographic coordinates. Methods: ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions maximizing the interclass variance of the independent variable. Since this is an NP-hard problem in R^N, classical ODT-algorithms use evolutionary procedures or heuristics. We have developed an optimal ODT-algorithm in R^2, based on the directions defined by each couple of point locations. This partition provided potential clusters which can be tested with Monte-Carlo inference. We applied the ODT-model to a dataset in order to identify potential high risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan™. Results: The ODT procedure provided four classes of risk of infection. In the first high risk class, 60% (95% confidence interval (CI95%) [52.22–67.55]) of the children were infected. Monte-Carlo inference showed that the spatial pattern issued from the ODT-model was significant (p …). SaTScan results yielded one significant cluster where the risk of disease was high, with an infection rate of 54.21%, CI95% [47.51–60.75]. Its center was clearly located within the first high risk ODT class. Both procedures provided similar results identifying a high risk

  9. Comparison of robustness against missing values of alternative decision tree and multiple logistic regression for predicting clinical data in primary breast cancer.

    Science.gov (United States)

    Sugimoto, Masahiro; Takada, Masahiro; Toi, Masakazu

    2013-01-01

    Nomogram based on multiple logistic regression (MLR) is a standard technique for predicting diagnostic and treatment outcomes in medical fields. However, the applicability of MLR to data mining of clinical information is limited. To overcome these issues, we have developed prediction models using ensembles of alternative decision trees (ADTree). Here, we compare the performance of MLR and ADTree models in terms of robustness against missing values. As a case study, we employ datasets including pathological complete response (pCR) of neoadjuvant therapy, one of the most important decision-making factors in the diagnosis and treatment of primary breast cancer. Ensembled ADTree models are more robust against missing values than MLR. Sufficient robustness is attained at low boosting and ensemble number, and is compromised as these numbers increase.

  10. Sistem Pakar Untuk Diagnosa Penyakit Kehamilan Menggunakan Metode Dempster-Shafer Dan Decision Tree

    Directory of Open Access Journals (Sweden)

    joko popo minardi

    2016-01-01

    Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probabilistic theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete. With the Dempster-Shafer method and expert system rules, an incomplete combination of symptoms can still yield an appropriate diagnosis, while the decision tree is used as a decision support tool for tracking disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method, producing a trust value for each disease diagnosis. Based on the results of diagnostic testing with the Dempster-Shafer method and the expert system, the resulting accuracy was 76%. Keywords: Expert system; Diseases of pregnancy; Dempster Shafer
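
    How Dempster-Shafer theory combines separate pieces of evidence can be sketched with Dempster's rule of combination. The example below is a generic illustration, not the expert system described in the record: the two-hypothesis frame, the mass values and the function name `combine` are hypothetical, with "A" and "B" standing in for two candidate diagnoses.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


A, B = frozenset({"A"}), frozenset({"B"})
EITHER = A | B  # mass on the whole frame expresses ignorance

symptom_1 = {A: 0.6, EITHER: 0.4}  # hypothetical evidence from one symptom
symptom_2 = {B: 0.5, EITHER: 0.5}  # hypothetical evidence from another symptom
belief = combine(symptom_1, symptom_2)
```

    Here 0.3 of the joint mass is conflicting, so the remaining masses are divided by 0.7, giving about 0.43 for A and about 0.29 each for B and for the undecided set.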

  11. Independent component analysis and decision trees for ECG holter recording de-noising.

    Directory of Open Access Journals (Sweden)

    Jakub Kuzilek

    We have developed a method for ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method in comparison to standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, baseline wander, EMG), but noticeably better results when uncommon noise (electrode cable movement artefact) was compared.
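
    The RMSE criterion mentioned above can be sketched as follows. The sample values are hypothetical and only show the arithmetic; they are not ECG data from the study.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally long signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared / len(reference))


# Hypothetical samples: a clean signal and its de-noised estimate.
clean = [0.0, 0.1, 1.0, 0.2, 0.0]
filtered = [0.0, 0.2, 0.9, 0.2, 0.1]

error = rmse(clean, filtered)
print(round(error, 4))
```

    A lower value means the filtered signal is closer to the original recording.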

  12. The use of decision trees in the classification of beach forms/patterns on IKONOS-2 data

    Science.gov (United States)

    Teodoro, A. C.; Ferreira, D.; Gonçalves, H.

    2013-10-01

    Evaluation of beach hydromorphological behaviour and its classification is highly complex. The available beach morphologic and classification models are mainly based on wave, tidal and sediment parameters. Since these parameters are usually unavailable for some regions - such as in the Portuguese coastal zone - a morphologic analysis using remotely sensed data seems to be a valid alternative. Data mining for spatial pattern recognition is the process of discovering useful information, such as patterns/forms, changes and significant structures from large amounts of data. This study focuses on the application of data mining techniques, particularly Decision Trees (DT), to an IKONOS-2 image in order to classify beach features/patterns, in a stretch of the northwest coast of Portugal. Based on the knowledge of the coastal features, five classes were defined: Sea, Suspended-Sediments, Breaking-Zone, Beachface and Beach. The dataset was randomly divided into training and validation subsets. Based on the analysis of several DT algorithms, the CART algorithm was found to be the most adequate and was thus applied. The performance of the DT algorithm was evaluated by the confusion matrix, overall accuracy, and Kappa coefficient. In the classification of beach features/patterns, the algorithm presented an overall accuracy of 98.2% and a kappa coefficient of 0.97. The DTs were compared with a neural network algorithm, and the results were in agreement. The methodology presented in this paper provides promising results and should be considered in further applications of beach forms/patterns classification.
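
    The overall accuracy and Kappa coefficient used for evaluation can both be derived from the confusion matrix. The sketch below uses a hypothetical two-class matrix rather than the five-class matrix of the study, whose values are not given in the abstract.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical matrix: rows are reference classes, columns are predictions.
matrix = [
    [45, 5],
    [10, 40],
]

accuracy = overall_accuracy(matrix)
kappa = cohen_kappa(matrix)
print(accuracy, round(kappa, 2))
```

    Kappa is lower than the raw accuracy because part of the agreement would occur by chance alone.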

  13. Discovering Patterns in Brain Signals Using Decision Trees

    Directory of Open Access Journals (Sweden)

    Narusci S. Bastos

    2016-01-01

    Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. So we propose to use a data mining technique to help us in this task. As a case study, we analyzed the brain’s behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If an object is given to sighted people and we ask them to identify this object, the sense of vision will probably be the most determinant one. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate how differently the brains of blind people and people with vision react to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain’s behaviour.

  14. Discovering Patterns in Brain Signals Using Decision Trees.

    Science.gov (United States)

    Bastos, Narusci S; Adamatti, Diana F; Billa, Cleo Z

    Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. So we propose to use a data mining technique to help us in this task. As a case study, we analyzed the brain's behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If an object is given to sighted people and we ask them to identify this object, the sense of vision will probably be the most determinant one. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate how differently the brains of blind people and people with vision react to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour.

  15. Efficient OCR using simple features and decision trees with backtracking

    International Nuclear Information System (INIS)

    Abuhaiba, Ibrahim S.I.

    2006-01-01

    In this paper, it is shown that simple and easy-to-compute features, such as those we call sliced horizontal and vertical projections, are adequate to solve the OCR problem for machine-printed documents. Recognition is achieved using a decision tree supported with backtracking, smoothing, row and column cropping, and other additions to increase the success rate. Symbols from the Times New Roman typeface are used to train our system. Activating backtracking, smoothing and cropping achieved a success rate of more than 98% with a recognition time below 30 ms per character. The recognition algorithm was subjected to a hard test by polluting the original dataset with additional artificial noise, and it maintained a high success rate and low error rate for highly polluted images, which is a result of backtracking, smoothing, and row and column cropping. Results indicate that we can depend on simple features and hints to reliably recognize characters. The error rate can be decreased by increasing the size of the training dataset. The recognition time can be reduced by using programming optimization techniques and more powerful computers. (author)

  16. Using decision trees to explore the association between the length of stay and potentially avoidable readmissions: A retrospective cohort study.

    Science.gov (United States)

    Alyahya, Mohammad S; Hijazi, Heba H; Alshraideh, Hussam A; Al-Nasser, Amjad D

    2017-12-01

    There is a growing concern that reduction in hospital length of stay (LOS) may raise the rate of hospital readmission. This study aims to identify the rate of avoidable 30-day readmission and find out the association between LOS and readmission. All consecutive patient admissions to the internal medicine services (n = 5,273) at King Abdullah University Hospital in Jordan between 1 December 2012 and 31 December 2013 were analyzed. To identify avoidable readmissions, a validated computerized algorithm called SQLape was used. The multinomial logistic regression was firstly employed. Then, detailed analysis was performed using the Decision Trees (DTs) model, one of the most widely used data mining algorithms in Clinical Decision Support Systems (CDSS). The potentially avoidable 30-day readmission rate was 44%, and patients with longer LOS were more likely to be readmitted avoidably. However, LOS had a significant negative effect on unavoidable readmissions. The avoidable readmission rate is still highly unacceptable. Because LOS potentially increases the likelihood of avoidable readmission, it is still possible to achieve a shorter LOS without increasing the readmission rate. Moreover, the way the DT model classified patient subgroups of readmissions based on patient characteristics and LOS is applicable in real clinical decisions.

  17. Landslide Susceptibility Mapping of Tegucigalpa, Honduras Using Artificial Neural Network, Bayesian Network and Decision Trees

    Science.gov (United States)

    Garcia Urquia, E. L.; Braun, A.; Yamagishi, H.

    2016-12-01

    Tegucigalpa, the capital city of Honduras, experiences rainfall-induced landslides on a yearly basis. The high precipitation regime and the rugged topography the city has been built in couple with the lack of a proper urban expansion plan to contribute to the occurrence of landslides during the rainy season. Thousands of inhabitants live at risk of losing their belongings due to the construction of precarious shelters in landslide-prone areas on mountainous terrains and next to the riverbanks. Therefore, the city is in the need for landslide susceptibility and hazard maps to aid in the regulation of future development. Major challenges in the context of highly dynamic urbanizing areas are the overlap of natural and anthropogenic slope destabilizing factors, as well as the availability and accuracy of data. Data-driven multivariate techniques have proven to be powerful in discovering interrelations between factors, identifying important factors in large datasets, capturing non-linear problems and coping with noisy and incomplete data. This analysis focuses on the creation of a landslide susceptibility map using different methods from the field of data mining, Artificial Neural Networks (ANN), Bayesian Networks (BN) and Decision Trees (DT). The input dataset of the study contains geomorphological and hydrological factors derived from a digital elevation model with a 10 m resolution, lithological factors derived from a geological map, and anthropogenic factors, such as information on the development stage of the neighborhoods in Tegucigalpa and road density. Moreover, a landslide inventory map that was developed in 2014 through aerial photo interpretation was used as target variable in the analysis. The analysis covers an area of roughly 100 km2, while 8.95 km2 are occupied by landslides. In a first step, the dataset was explored by assessing and improving the data quality, identifying unimportant variables and finding interrelations. Then, based on a training

  18. Extensions of Dynamic Programming: Decision Trees, Combinatorial Optimization, and Data Mining

    KAUST Repository

    Hussain, Shahid

    2016-07-10

    This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.

  19. Minimization of decision tree depth for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-10-01

    In this paper, we consider multi-label decision tables that have a set of decisions attached to each row. Our goal is to find one decision from the set of decisions for each row, using a decision tree as our tool. With the aim of minimizing the depth of the decision tree, we devised various kinds of greedy algorithms as well as a dynamic programming algorithm. Compared with the optimal result obtained from the dynamic programming algorithm, some greedy algorithms produce results that are close to optimal for the minimization of the depth of decision trees.
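
    To make the dynamic programming idea concrete, here is a minimal sketch that computes the minimum depth of a decision tree for a small table in which each row carries a set of acceptable decisions. It is an illustrative recursion over subtables under the assumptions stated in the comments, not the authors' algorithm; the table contents and the name `minimum_depth` are hypothetical.

```python
from functools import lru_cache


def minimum_depth(rows, decisions):
    """Minimum decision tree depth for a table with many-valued decisions.

    rows: tuple of attribute-value tuples, one per row.
    decisions: tuple of sets, the acceptable decisions for each row.
    Assumes rows with identical attribute values share at least one decision.
    """
    n_attributes = len(rows[0])

    @lru_cache(maxsize=None)
    def depth(subset):
        # A single leaf suffices when one decision is valid for every row.
        if set.intersection(*(set(decisions[i]) for i in subset)):
            return 0
        best = float("inf")
        for attribute in range(n_attributes):
            parts = {}
            for i in subset:
                parts.setdefault(rows[i][attribute], []).append(i)
            if len(parts) < 2:
                continue  # this attribute does not split the subtable
            worst_child = max(depth(frozenset(p)) for p in parts.values())
            best = min(best, 1 + worst_child)
        return best

    return depth(frozenset(range(len(rows))))


# Hypothetical table with two binary attributes.
rows = ((0, 0), (0, 1), (1, 0), (1, 1))
decisions = ({1}, {1, 2}, {2}, {2, 3})

print(minimum_depth(rows, decisions))
```

    Memoizing on the set of row indices is what makes this dynamic programming: each subtable is solved once, however many attribute orderings lead to it.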

  20. An Analysis on Performance of Decision Tree Algorithms using Student’s Qualitative Data

    OpenAIRE

    T.Miranda Lakshmi; A.Martin; R.Mumtaj Begum; V.Prasanna Venkatesan

    2013-01-01

    Decision Tree is the most widely applied supervised classification technique. The learning and classification steps of decision tree induction are simple and fast and it can be applied to any domain. In this research student qualitative data has been taken from educational data mining and the performance analysis of the decision tree algorithm ID3, C4.5 and CART are compared. The comparison result shows that the Gini Index of CART influence information Gain Ratio of ID3 and C4.5. The classif...

  1. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    Science.gov (United States)

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Construction of α-decision trees for tables with many-valued decisions

    KAUST Repository

    Moshkov, Mikhail

    2011-01-01

    The paper is devoted to the study of greedy algorithm for construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider bound on the number of algorithm steps, and bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.

  3. Diagnostic assessment of intraoperative cytology for papillary thyroid carcinoma: using a decision tree analysis.

    Science.gov (United States)

    Pyo, J-S; Sohn, J H; Kang, G

    2017-03-01

    The aim of this study was to elucidate the cytological characteristics and the diagnostic usefulness of intraoperative cytology (IOC) for papillary thyroid carcinoma (PTC). In addition, using decision tree analysis, effective features for accurate cytological diagnosis were sought. We investigated cellularity, cytological features and diagnosis based on the Bethesda System for Reporting Thyroid Cytopathology in IOC of 240 conventional PTCs. The cytological features were evaluated in terms of nuclear score with nuclear features, and additional figures such as presence of swirling sheets, psammoma bodies, and multinucleated giant cells. The nuclear score (range 0-7) was made via seven nuclear features, including (1) enlarged, (2) oval or irregularly shaped nuclei, (3) longitudinal nuclear grooves, (4) intranuclear cytoplasmic pseudoinclusion, (5) pale nuclei with powdery chromatin, (6) nuclear membrane thickening, and (7) marginally placed micronucleoli. Nuclear scores in PTC, suspicious for malignancy, and atypia of undetermined significance cases were 6.18 ± 0.80, 4.48 ± 0.82, and 3.15 ± 0.67, respectively. Additional figures more frequent in PTC than in other diagnostic categories were identified. Cellularity of IOC significantly correlated with tumor size, nuclear score, and presence of additional figures. Also, IOCs with higher nuclear scores (4-7) significantly correlated with larger tumor size and presence of additional figures. In decision tree analysis, IOCs with nuclear score >5 and swirling sheets could be considered diagnostic for PTCs. Our study suggests that IOCs using nuclear features and additional figures could be useful with decreasing the likelihood of inconclusive results.

  4. Decision-Tree Analysis for Predicting First-Time Pass/Fail Rates for the NCLEX-RN® in Associate Degree Nursing Students.

    Science.gov (United States)

    Chen, Hsiu-Chin; Bennett, Sean

    2016-08-01

    Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN® in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor® accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEX-RN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.

  5. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...

  6. Statistical Sensitive Data Protection and Inference Prevention with Decision Tree Methods

    National Research Council Canada - National Science Library

    Chang, LiWu

    2003-01-01

    .... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...

  7. Bi-Criteria Optimization of Decision Trees with Applications to Data Analysis

    KAUST Repository

    Chikalov, Igor

    2017-10-19

    This paper is devoted to the study of bi-criteria optimization problems for decision trees. We consider different cost functions such as depth, average depth, and number of nodes. We design algorithms that allow us to construct the set of Pareto optimal points (POPs) for a given decision table and the corresponding bi-criteria optimization problem. These algorithms are suitable for investigation of medium-sized decision tables. We discuss three examples of applications of the created tools: the study of relationships among depth, average depth and number of nodes for decision trees for corner point detection (such trees are used in computer vision for object tracking), study of systems of decision rules derived from decision trees, and comparison of different greedy algorithms for decision tree construction as single- and bi-criteria optimization algorithms.
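
    The notion of Pareto optimal points for two cost functions can be sketched as below. The (depth, number of nodes) pairs are hypothetical and the function only filters a given list of candidate trees; it does not reproduce the authors' algorithms, which construct the set from a decision table.

```python
def pareto_front(points):
    """Return the points not dominated by any other, both criteria minimized."""
    front = []
    best_second = float("inf")
    # After sorting by the first criterion, a point is Pareto optimal only
    # if it strictly improves the best second criterion seen so far.
    for point in sorted(set(points)):
        if point[1] < best_second:
            front.append(point)
            best_second = point[1]
    return front


# Hypothetical (depth, number of nodes) pairs for candidate trees.
candidates = [(2, 15), (3, 9), (3, 11), (4, 9), (5, 7), (6, 7)]

print(pareto_front(candidates))
```

    Each remaining point represents a trade-off: reducing one cost further is only possible by increasing the other.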

  8. EVALUATION OF DECISION TREE CLASSIFICATION ACCURACY TO MAP LAND COVER IN CAPIXABA, ACRE

    Directory of Open Access Journals (Sweden)

    Symone Maria de Melo Figueiredo

    2006-03-01

    This study evaluated the accuracy of mapping land cover in Capixaba, state of Acre, Brazil, using decision trees. Eleven attributes were used to build the decision trees: TM Landsat data from bands 1, 2, 3, 4, 5, and 7; fraction images derived from linear spectral unmixing; and the normalized difference vegetation index (NDVI). The Kappa values were greater than 0.83, producing excellent classification results and demonstrating that the technique is promising for mapping land cover in the study area.

  9. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions

    KAUST Repository

    Azad, Mohammad

    2014-09-13

    The paper is devoted to the analysis of greedy algorithms for the minimization of average depth of decision trees for decision tables in which each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. Compared with the optimal result obtained from the dynamic programming algorithm, some greedy algorithms produce results that are close to optimal for the minimization of average depth of decision trees.

  10. Greedy heuristics for minimization of number of terminal nodes in decision trees

    KAUST Repository

    Hussain, Shahid

    2014-10-01

    This paper describes, in detail, several greedy heuristics for construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related with the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from UCI ML Repository and randomly generated data) as well as compare with the optimal results obtained using dynamic programming method.

  11. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    Science.gov (United States)

    2008-04-01

    REAL-TIME SPEECH/ MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/ music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and can not work properly in the existence of multimedia signals. This paper proposes a real-time speech/ music

  12. Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree

    Science.gov (United States)

    Wahyuni, Sri

    2018-03-01

    Data mining is the process of finding useful information in large databases. One of the established techniques in data mining is classification. The method used here is the decision tree, built with the C4.5 algorithm. The decision tree method transforms a large body of facts into a tree that represents rules. Decision trees are useful for exploring data and for finding hidden relationships between a number of potential input variables and a target variable. A C4.5 decision tree is constructed in several stages: selecting an attribute as the root, creating a branch for each value, and dividing the cases among the branches. These stages are repeated for each branch until all cases on the branch belong to the same class. The resulting tree yields a set of rules for a case. In this study, the researcher classified data on prisoners at Labuhan Deli prison to identify the factors associated with detainees committing drug offences. Applying the C4.5 algorithm yielded knowledge that can serve as information for reducing drug offences. The findings show that the most influential factor in detainees committing drug offences was the address variable.
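
    The root-selection stage described above can be sketched with the gain ratio criterion that C4.5 uses to rank attributes. This is a simplified illustration for categorical attributes only; the column names and records are hypothetical and are not the prison data of the study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def choose_root(rows, attributes, target):
    """Pick the attribute with the highest gain ratio as the root."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical records; column names are assumptions for this example.
rows = [
    {"address_area": "A", "employed": "yes", "drug_case": "yes"},
    {"address_area": "A", "employed": "no", "drug_case": "yes"},
    {"address_area": "B", "employed": "yes", "drug_case": "no"},
    {"address_area": "B", "employed": "no", "drug_case": "no"},
]

root = choose_root(rows, ["address_area", "employed"], "drug_case")
print(root)
```

    Applying `choose_root` again to the cases in each branch, until every branch holds a single class, gives the repeated stages described in the abstract.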

  13. Metric Sex Determination of the Human Coxal Bone on a Virtual Sample using Decision Trees.

    Science.gov (United States)

    Savall, Frédéric; Faruch-Bilfeld, Marie; Dedouit, Fabrice; Sans, Nicolas; Rousseau, Hervé; Rougé, Daniel; Telmon, Norbert

    2015-11-01

    Decision trees provide an alternative to multivariate discriminant analysis, which is still the most commonly used in anthropometric studies. Our study analyzed the metric characterization of a recent virtual sample of 113 coxal bones using decision trees for sex determination. From 17 osteometric type I landmarks, a dataset was built with five classic distances traditionally reported in the literature and six new distances selected using the two-step ratio method. A ten-fold cross-validation was performed, and a decision tree was established on two subsamples (training and test sets). The decision tree established on the training set included three nodes and its application to the test set correctly classified 92% of individuals. This percentage was similar to the data of the literature. The usefulness of decision trees has been demonstrated in numerous fields. They have been already used in sex determination, body mass prediction, and ancestry estimation. This study shows another use of decision trees enabling simple and accurate sex determination. © 2015 American Academy of Forensic Sciences.

  14. Interpretable decision-tree induction in a big data parallel framework

    Directory of Open Access Journals (Sweden)

    Weinberg Abraham Itzhak

    2017-12-01

    When running data-mining algorithms on big data platforms, a parallel, distributed framework, such as MapReduce, may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms, such as majority voting, aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
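
    The selection step, choosing the local model most similar to the others, can be sketched as follows. The abstract does not define the syntactic similarity measure, so this example substitutes Jaccard similarity over sets of split conditions as a stand-in; the models, condition strings and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (a stand-in for the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_representative(models):
    """Index of the model with the highest mean similarity to all others.

    Expects at least two models.
    """
    def mean_similarity(i):
        scores = [jaccard(models[i], m) for j, m in enumerate(models) if j != i]
        return sum(scores) / len(scores)

    return max(range(len(models)), key=mean_similarity)


# Hypothetical local trees, each reduced to its set of split conditions.
models = [
    {"x<=5", "y>2"},
    {"x<=5", "y>2", "z<=1"},
    {"x<=5", "w>0"},
]

print(most_representative(models))
```

    Unlike majority voting, this returns one existing tree, which is why the selected model stays compact and readable.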

  15. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Entrepreneurial intentions of students are important to recognize during their studies in order to provide those students with an educational background that will support such intentions and lead them to successful entrepreneurship after graduation. The paper aims to develop a model that will classify students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university on a sample of first-year students. Input variables described students’ demographics, importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Because of the high dimensionality of the input space, a feature selection method was used in the pre-processing stage. For comparison, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure was conducted to test the generalization ability of the models. The models were compared according to their classification accuracy, as well as according to input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which universities could take into consideration when designing their academic programs.

  16. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    Directory of Open Access Journals (Sweden)

    Trefz Florian M

    2012-12-01

    Background: The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24-hour study period the investigator was blinded to all laboratory findings. Results: After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions: Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed

  17. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Science.gov (United States)

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity and accuracy of 96%, 87% and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
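
    For readers unfamiliar with the three reported measures, the sketch below shows how sensitivity, specificity and accuracy follow from the four cells of a binary confusion matrix. The counts are hypothetical, since the abstract does not report the underlying cell counts.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # share of diseased cases detected
    specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a binary classifier.
sensitivity, specificity, accuracy = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
print(sensitivity, specificity, accuracy)
```

    Accuracy depends on the balance between diseased and healthy cases, so it is usually read together with the other two measures.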

  18. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers.

    Science.gov (United States)

    Malueka, Rusdy Ghazali; Takaoka, Yutaka; Yagi, Mariko; Awano, Hiroyuki; Lee, Tomoko; Dwianingsih, Ery Kus; Nishida, Atsushi; Takeshima, Yasuhiro; Matsuo, Masafumi

    2012-03-31

    Duchenne muscular dystrophy, a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.

  19. Agent-based modeling of sustainable behaviors

    CERN Document Server

    Sánchez-Maroño, Noelia; Fontenla-Romero, Oscar; Polhill, J; Craig, Tony; Bajo, Javier; Corchado, Juan

    2017-01-01

    Using the O.D.D. (Overview, Design concepts, Detail) protocol, this title explores the role of agent-based modeling in predicting the feasibility of various approaches to sustainability. The chapters incorporated in this volume consist of real case studies to illustrate the utility of agent-based modeling and complexity theory in discovering a path to more efficient and sustainable lifestyles. The topics covered within include: households' attitudes toward recycling, designing decision trees for representing sustainable behaviors, negotiation-based parking allocation, auction-based traffic signal control, and others. This selection of papers will be of interest to social scientists who wish to learn more about agent-based modeling as well as experts in the field of agent-based modeling.

  20. Decision trees and decision committee applied to star/galaxy separation problem

    Science.gov (United States)

    Vasconcellos, Eduardo Charles

    Vasconcellos et al [1] study the efficiency of 13 different decision tree algorithms applied to photometric data in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7) to perform star/galaxy separation. Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. In that work we extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. We find that Functional Tree algorithm (FT) yields the best results by the mean completeness function (galaxy true positive rate) in two magnitude intervals: 14 [...] =19 (82.1%). We compare FT classification to the SDSS parametric, 2DPHOT and Ball et al (2006) classifications. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination ( 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 [...] train six FT classifiers with randomly selected objects from the same 884,126 SDSS-DR7 objects with spectroscopic data that we used before. Both the decision committee and our previous single FT classifier will be applied to the new objects from SDSS data releases eight, nine and ten. Finally, we will compare the performance of both methods on this new data set. [1] Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Fraga Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R.. Decision Tree Classifiers for Star/Galaxy Separation. The Astronomical Journal, Volume 141, Issue 6, 2011.

  1. Decision making for health care professionals: use of decision trees within the community mental health setting.

    Science.gov (United States)

    Bonner, G

    2001-08-01

    To examine the application of the decision tree approach to collaborative clinical decision-making in mental health care in the United Kingdom (UK). While this approach to decision-making has been examined in the acute care setting, there is little published evidence of its use in clinical decision-making within the mental health setting. The complexities of dual diagnosis (schizophrenia and substance misuse in this case example) and the varied viewpoints of different professionals often hamper the decision-making process. This paper highlights how the approach was used successfully as a multiprofessional collaborative approach to decision-making in the context of British community mental health care. A selective review of the relevant literature and a case study application of the decision tree framework. The process of applying the decision tree framework to clinical decision-making in mental health practice can be time consuming and client inclusion within the process is not always appropriate. The approach offers a method of assigning numerical values to support complex multiprofessional decision-making as well as considering underpinning literature to inform the final decision. Use of the decision tree offers a common framework that can assist professionals to examine the options available to them in depth, while considering the complex variables that influence decision-making in collaborative mental health practice. Use of the decision tree warrants further consideration in mental health care in terms of practice and education.

  2. [Comparison of Discriminant Analysis and Decision Trees for the Detection of Subclinical Keratoconus].

    Science.gov (United States)

    Kleinhans, Sonja; Herrmann, Eva; Kohnen, Thomas; Bühren, Jens

    2017-08-15

    Background Iatrogenic keratectasia is one of the most dreaded complications of refractive surgery. In most cases, keratectasia develops after refractive surgery of eyes suffering from subclinical stages of keratoconus with few or no signs. Unfortunately, there has been no reliable procedure for the early detection of keratoconus. In this study, we used binary decision trees (recursive partitioning) to assess their suitability for discrimination between normal eyes and eyes with subclinical keratoconus. Patients and Methods The method of decision tree analysis was compared with discriminant analysis which has shown good results in previous studies. Input data were 32 eyes of 32 patients with newly diagnosed keratoconus in the contralateral eye and preoperative data of 10 eyes of 5 patients with keratectasia after laser in-situ keratomileusis (LASIK). The control group was made up of 245 normal eyes after LASIK and 12-month follow-up without any signs of iatrogenic keratectasia. Results Decision trees gave better accuracy and specificity than did discriminant analysis. The sensitivity of decision trees was lower than the sensitivity of discriminant analysis. Conclusion On the basis of the patient population of this study, decision trees did not prove to be superior to linear discriminant analysis for the detection of subclinical keratoconus. Georg Thieme Verlag KG Stuttgart · New York.

  3. Computerized Adaptive Test vs. decision trees: Development of a support decision system to identify suicidal behavior.

    Science.gov (United States)

    Delgado-Gomez, D; Baca-Garcia, E; Aguado, D; Courtet, P; Lopez-Castroman, J

    2016-12-01

    Several Computerized Adaptive Tests (CATs) have been proposed to facilitate assessments in mental health. These tests are built in a standard way, disregarding useful and usually available information not included in the assessment scales that could increase the precision and utility of CATs, such as the history of suicide attempts. Using the items of a previously developed scale for suicidal risk, we compared the performance of a standard CAT and a decision tree in a support decision system to identify suicidal behavior. We included the history of past suicide attempts as a class for the separation of patients in the decision tree. The decision tree needed an average of four items to achieve an accuracy similar to that of a standard CAT with nine items. The accuracy of the decision tree, obtained after 25 cross-validations, was 81.4%. A shortened test adapted for the separation of suicidal and non-suicidal patients was developed. CATs can be very useful tools for the assessment of suicidal risk. However, standard CATs do not use all the information that is available. Decision trees can improve the precision of the assessment since they are constructed using a priori information. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Decision-tree analysis of clinical data to aid diagnostic reasoning for equine laminitis: a cross-sectional study.

    Science.gov (United States)

    Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R

    2016-04-23

    The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all P [...] laminitis (OR 40.5, P [...] laminitis. 'Presence of a flat/convex sole' also significantly enhanced clinical diagnosis discrimination (OR 15.5, P [...] laminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements. British Veterinary Association.

  5. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees. In this particular case of total path length and number of terminal nodes, the relationships between these two cost functions are closely related to the space-time trade-off. In addition to an algorithm to compute the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier or Pareto set of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.
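
    To make the two cost functions named in the abstract above concrete, here is a small sketch that evaluates them for one fixed tree and one decision table. The tuple-based tree representation, the attribute names and the toy table are assumptions made for illustration; the paper's algorithm for relating the two costs across all trees is not reproduced here.

```python
# A tree is either ("leaf", decision) or ("node", attribute, {value: subtree}).
# This representation is hypothetical and chosen only for brevity.

def count_terminal_nodes(tree):
    """Number of leaves, a proxy for the space needed to store the tree."""
    if tree[0] == "leaf":
        return 1
    return sum(count_terminal_nodes(child) for child in tree[2].values())


def path_length(tree, row):
    """Number of attribute tests performed when classifying one row."""
    depth = 0
    while tree[0] == "node":
        tree = tree[2][row[tree[1]]]
        depth += 1
    return depth


def total_path_length(tree, rows):
    """Sum of path lengths over all rows of the decision table."""
    return sum(path_length(tree, row) for row in rows)


# Toy decision table for the Boolean function f1 AND f2.
rows = [{"f1": a, "f2": b} for a in (0, 1) for b in (0, 1)]
tree = ("node", "f1", {0: ("leaf", 0),
                       1: ("node", "f2", {0: ("leaf", 0), 1: ("leaf", 1)})})

leaves = count_terminal_nodes(tree)
total = total_path_length(tree, rows)
average_depth = total / len(rows)
print(leaves, total, average_depth)
```

    Dividing the total path length by the number of rows gives the average depth, which is why the abstract treats the two as interchangeable.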

  6. Transforming clinical practice guidelines and clinical pathways into fast-and-frugal decision trees to improve clinical care strategies.

    Science.gov (United States)

    Djulbegovic, Benjamin; Hozo, Iztok; Dale, William

    2018-02-27

    Contemporary delivery of health care is inappropriate in many ways, largely due to suboptimal decision-making. A typical approach to improve practitioners' decision-making is to develop evidence-based clinical practice guidelines (CPG) by guidelines panels, who are instructed to use their judgments to derive practice recommendations. However, mechanisms for the formulation of guideline judgments remain a "black-box" operation, a process with defined inputs and outputs but without sufficient knowledge of its internal workings. Increased explicitness and transparency in the process can be achieved by implementing CPG as clinical pathways (CPs) (also known as clinical algorithms or flow-charts). However, clinical recommendations thus derived are typically ad hoc and developed by experts in a theory-free environment. As any recommendation can be right (true positive or negative), or wrong (false positive or negative), the lack of theoretical structure precludes the quantitative assessment of the management strategies recommended by CPGs/CPs. To realize the full potential of CPGs/CPs, they need to be placed on more solid theoretical grounds. We believe this potential can be best realized by converting CPGs/CPs within the heuristic theory of decision-making, often implemented as fast-and-frugal (FFT) decision trees. This is possible because FFT heuristic strategy of decision-making can be linked to signal detection theory, evidence accumulation theory, and a threshold model of decision-making, which, in turn, allows quantitative analysis of the accuracy of clinical management strategies. The fast-and-frugal approach provides a simple and transparent, yet solid and robust, methodological framework connecting decision science to clinical care, a sorely needed missing link between CPGs/CPs and patient outcomes. We therefore advocate that all guidelines panels express their recommendations as CPs, which in turn should be converted into FFTs to guide clinical care. © 2018 John Wiley
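
    A fast-and-frugal tree is commonly described as an ordered list of cues in which each cue offers an immediate exit. The sketch below illustrates that general structure only; the cue names, decisions and patient records are hypothetical and do not come from any guideline or from the paper.

```python
def fft_decide(cues, patient, default):
    """Walk the cues in order and exit at the first one that fires.

    cues: list of (cue_name, exit_value, decision) tuples.
    default: decision returned when no cue fires.
    """
    for cue_name, exit_value, decision in cues:
        if patient[cue_name] == exit_value:
            return decision
    return default


# Hypothetical three-cue tree for illustration only.
cues = [
    ("red_flag_symptom", True, "treat"),
    ("low_risk_score", True, "observe"),
    ("test_positive", True, "treat"),
]

patient_a = {"red_flag_symptom": False, "low_risk_score": False, "test_positive": True}
patient_b = {"red_flag_symptom": False, "low_risk_score": True, "test_positive": True}

decision_a = fft_decide(cues, patient_a, default="observe")
decision_b = fft_decide(cues, patient_b, default="observe")
print(decision_a, decision_b)
```

    Because every patient leaves the tree at exactly one exit, each exit can be scored as a true or false positive or negative, which is the kind of quantitative assessment the abstract argues for.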

  7. Development of diagnostic model of lung cancer based on multiple tumor markers and data mining.

    Science.gov (United States)

    Wang, Zhaoxian; Feng, Feifei; Zhou, Xiaoshan; Duan, Liju; Wang, Jing; Wu, Yongjun; Wang, Na

    2017-11-07

    To develop an early intelligent discriminative model of lung cancer and evaluate its diagnostic value. Based on the genetic polymorphism profile of CYP1A1-rs1048943, GSTM1, mEH-rs1051740, XRCC1-rs1799782 and XRCC1-rs25489, the methylation of the p16 and RASSF1A genes, and the length of telomeres in peripheral blood from 200 lung cancer patients and 200 healthy persons, the discriminative model was established using decision tree and ANN techniques. The AUC of the discriminative model based on multiple tumour markers increased by about 10%; the accuracy rates of the decision tree model and the ANN model on the testing set were 93.00% and 89.62%, respectively. The ROC analysis showed that the decision tree model's AUC was 0.929 (0.894∼0.964) and the ANN model's AUC was 0.894 (0.853∼0.935). However, the classification accuracy rate and AUC of the Fisher discriminant analysis model were both about 0.7. The early intelligent discriminative model of lung cancer based on multiple tumor markers and data mining techniques has a higher accuracy rate and might be useful for early diagnosis of lung cancer.

  8. Decision Trees Predicting Tumor Shrinkage for Head and Neck Cancer: Implications for Adaptive Radiotherapy.

    Science.gov (United States)

    Surucu, Murat; Shah, Karan K; Mescioglu, Ibrahim; Roeske, John C; Small, William; Choi, Mehee; Emami, Bahman

    2016-02-01

    To develop decision trees predicting for tumor volume reduction in patients with head and neck (H&N) cancer using pretreatment clinical and pathological parameters. Forty-eight patients treated with definitive concurrent chemoradiotherapy for squamous cell carcinoma of the nasopharynx, oropharynx, oral cavity, or hypopharynx were retrospectively analyzed. These patients were rescanned at a median dose of 37.8 Gy and replanned to account for anatomical changes. The percentages of gross tumor volume (GTV) change from initial to rescan computed tomography (CT; %GTVΔ) were calculated. Two decision trees were generated to correlate %GTVΔ in primary and nodal volumes with 14 characteristics including age, gender, Karnofsky performance status (KPS), site, human papilloma virus (HPV) status, tumor grade, primary tumor growth pattern (endophytic/exophytic), tumor/nodal/group stages, chemotherapy regimen, and primary, nodal, and total GTV volumes in the initial CT scan. The C4.5 Decision Tree induction algorithm was implemented. The median %GTVΔ for primary, nodal, and total GTVs was 26.8%, 43.0%, and 31.2%, respectively. Type of chemotherapy, age, primary tumor growth pattern, site, KPS, and HPV status were the most predictive parameters for primary %GTVΔ decision tree, whereas for nodal %GTVΔ, KPS, site, age, primary tumor growth pattern, initial primary GTV, and total GTV volumes were predictive. Both decision trees had an accuracy of 88%. There can be significant changes in primary and nodal tumor volumes during the course of H&N chemoradiotherapy. Considering the proposed decision trees, radiation oncologists can select patients predicted to have high %GTVΔ, who would theoretically gain the most benefit from adaptive radiotherapy, in order to better use limited clinical resources. © The Author(s) 2015.

  9. Boosted decision trees in the CMS Level-1 endcap muon trigger

    CERN Document Server

    Low, Jia Fu; Busch, Elena Laura; Carnes, Andrew Mathew; Furic, Ivan-Kresimir; Gleyzer, Sergei; Kotov, Khristian; Madorsky, Alexander; Rorie, Jamal Tildon; Scurlock, Bobby; Shi, Wei; Acosta, Darin Edward

    2017-01-01

    The first implementation of Boosted Decision Trees (BDTs) inside a Level-1 trigger system at the LHC is presented. The Endcap Muon Track Finder (EMTF) at CMS uses BDTs to infer the momentum of muons in the forward region of the detector, based on 25 different variables. Combinations of these variables are evaluated offline using regression BDTs, whose output is stored in 1.2 GB look-up tables (LUTs) in the EMTF hardware. These BDTs take advantage of complex correlations between variables, the inhomogeneous magnetic field, and non-linear effects such as inelastic scattering to distinguish high-momentum signal muons from the overwhelming low-momentum background. The LUTs are used to turn the complex BDT evaluation into a simple look-up operation in fixed low latency. The new momentum assignment algorithm has reduced the trigger rate by a factor of 3 at the 25 GeV trigger threshold with respect to the legacy system, with further improvements foreseen in the coming year.
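
    The abstract above describes evaluating a model offline and storing its output in look-up tables so that the online step is a single memory read. The sketch below shows that general precompute-then-look-up pattern on a toy scale. The stand-in function, the 4-bit input width and the address packing are all assumptions for illustration and do not reflect the actual EMTF firmware or its BDTs.

```python
from itertools import product

BITS = 4  # assumed width of each quantized input: 16 x 16 = 256 table entries


def slow_model(a, b):
    """Stand-in for an expensive regression model (hypothetical)."""
    return 100.0 / (1 + a + 2 * b)


def pack_address(a, b):
    """Concatenate the two quantized inputs into one table address."""
    return (a << BITS) | b


# Offline step: evaluate the model once for every possible input combination.
LUT = [0.0] * (1 << (2 * BITS))
for a, b in product(range(1 << BITS), repeat=2):
    LUT[pack_address(a, b)] = slow_model(a, b)


def fast_lookup(a, b):
    """Online step: one indexed read, with a cost independent of model complexity."""
    return LUT[pack_address(a, b)]


print(len(LUT), fast_lookup(3, 5))
```

    The table grows exponentially with the total number of address bits, so the choice of which variables to include, and at what precision, determines its size.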

  10. Boosted Decision Trees in the CMS Level-1 Endcap Muon Trigger

    CERN Document Server

    Acosta, Darin Edward; Busch, Elena Laura; Carnes, Andrew Mathew; Furic, Ivan-Kresimir; Gleyzer, Sergei; Kotov, Khristian; Low, Jia Fu; Madorsky, Alexander; Rorie, Jamal Tildon; Scurlock, Bobby; Shi, Wei

    2017-01-01

    The first implementation of Boosted Decision Trees (BDTs) inside a Level-1 trigger system at the LHC is presented. The Endcap Muon Track Finder (EMTF) at CMS uses BDTs to infer the momentum of muons in the forward region of the detector, based on 25 different variables. Combinations of these variables are evaluated offline using regression BDTs, whose output is stored in 1.2 GB look-up tables (LUTs) in the EMTF hardware. These BDTs take advantage of complex correlations between variables, the inhomogeneous magnetic field, and non-linear effects such as inelastic scattering to distinguish high-momentum signal muons from the overwhelming low-momentum background. The LUTs are used to turn the complex BDT evaluation into a simple look-up operation in fixed low latency. The new momentum assignment algorithm has reduced the trigger rate by a factor of 3 at the 25 GeV trigger threshold with respect to the legacy system, with further improvements foreseen in the coming year.

  11. A New Decision Tree to Solve the Puzzle of Alzheimer's Disease Pathogenesis Through Standard Diagnosis Scoring System.

    Science.gov (United States)

    Kumar, Ashwani; Singh, Tiratha Raj

    2017-03-01

    Alzheimer's disease (AD) is a progressive, incurable and terminal neurodegenerative disorder of the brain and is associated with mutations in amyloid precursor protein, presenilin 1, presenilin 2 or apolipoprotein E, but its underlying mechanisms are still not fully understood. The healthcare sector is generating a large amount of information corresponding to diagnosis, disease identification and treatment of an individual. Mining knowledge and providing scientific decision-making for the diagnosis and treatment of disease from the clinical dataset are therefore increasingly becoming necessary. The current study deals with the construction of classifiers that can be human readable as well as robust in performance for gene dataset of AD using a decision tree. Models of classification for different AD genes were generated according to Mini-Mental State Examination scores and all other vital parameters to achieve the identification of the expression level of different proteins of disorder that may possibly determine the involvement of genes in various AD pathogenesis pathways. The effectiveness of decision tree in AD diagnosis is determined by information gain with confidence value (0.96), specificity (92%), sensitivity (98%) and accuracy (77%). Besides this functional gene classification using different parameters and enrichment analysis, our findings indicate that the measures of all the genes assessed in single cohorts are sufficient to diagnose AD and will help in the prediction of important parameters for other relevant assessments.

  12. Cost-effectiveness of exercise 201Tl myocardial SPECT in patients with chest pain assessed by decision-tree analysis

    International Nuclear Information System (INIS)

    Kosuda, Shigeru; Momiyama, Yukihiko; Ohsuzu, Fumitaka; Kusano, Shoichi; Ichihara, Kiyoshi

    1999-01-01

    To evaluate the potential cost-effectiveness of exercise 201Tl myocardial SPECT in outpatients with angina-like chest pain, we developed a decision-tree model which comprises three 1000-patient groups, i.e., a coronary arteriography (CAG) group, a follow-up group, and a SPECT group, and total cost and cardiac events, including cardiac deaths, were calculated. Variables used for the decision-tree analysis were obtained from references and the data available at our hospital. The sensitivity and specificity of 201Tl SPECT for diagnosing angina pectoris, and its prevalence were assumed to be 95%, 85%, and 33%, respectively. The mean costs were 84.9 × 10^4 yen/patient in the CAG group, 30.2 × 10^4 yen/patient in the follow-up group, and 71.0 × 10^4 yen/patient in the SPECT group. The numbers of cardiac events and cardiac deaths were 56 and 15, respectively in the CAG group, 264 and 81 in the follow-up group, and 65 and 17 in the SPECT group. SPECT increases cardiac events and cardiac deaths by 0.9% and 0.2%, but it reduces the number of CAG studies by 50.3%, and saves 13.8 × 10^4 yen/patient, as compared to the CAG group. In conclusion, the exercise 201Tl myocardial SPECT strategy for patients with chest pain has the potential to reduce health care costs in Japan. (author)
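
    The comparison between the SPECT and CAG strategies in the abstract above can be retraced from the rounded figures it quotes. The sketch below does only that arithmetic; the variable names are illustrative. Note that the rounded mean costs give a difference of 13.9, slightly above the 13.8 reported, which presumably (an assumption) was computed from unrounded values.

```python
GROUP_SIZE = 1000  # patients per strategy, as stated in the abstract

# Mean cost per patient in units of 10^4 yen, as quoted in the abstract.
cost_cag, cost_spect = 84.9, 71.0

# Cardiac events and cardiac deaths per 1000 patients, as quoted.
events_cag, deaths_cag = 56, 15
events_spect, deaths_spect = 65, 17

# Differences of the SPECT strategy relative to the CAG strategy.
extra_events_pct = 100 * (events_spect - events_cag) / GROUP_SIZE
extra_deaths_pct = 100 * (deaths_spect - deaths_cag) / GROUP_SIZE
saving_per_patient = cost_cag - cost_spect

print(f"extra cardiac events: {extra_events_pct:.1f}% of patients")
print(f"extra cardiac deaths: {extra_deaths_pct:.1f}% of patients")
print(f"saving: {saving_per_patient:.1f} x 10^4 yen per patient")
```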

  13. Vlsi implementation of flexible architecture for decision tree classification in data mining

    Science.gov (United States)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is a central problem in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.

  14. The decision tree classifier - Design and potential. [for Landsat-1 data]

    Science.gov (United States)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
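
    The abstract above mentions encoding the tree diagram as a string of symbols with a one-to-one relationship to the tree. The authors' actual encoding is not given, so the sketch below uses a parenthesized pre-order format as a stand-in to show how such a reversible encoding can work. The decision names and class labels are hypothetical, and labels are assumed to contain no spaces or parentheses.

```python
def encode(tree):
    """Serialize a binary decision tree to a string.

    A leaf is a class-label string; an inner node is a tuple
    (decision_name, left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return tree
    name, left, right = tree
    return f"({name} {encode(left)} {encode(right)})"


def decode(text):
    """Rebuild the tree from the string produced by encode()."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token          # a leaf label
        name = tokens[pos]
        pos += 1
        left = parse()
        right = parse()
        pos += 1                  # skip the closing ")"
        return (name, left, right)

    return parse()


# Hypothetical two-stage tree for illustration only.
tree = ("d1", "water", ("d2", "forest", "grass"))
encoded = encode(tree)
print(encoded)
```

    Decoding the encoded string returns the original tree, which is the round-trip property that a one-to-one encoding requires.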

  15. Risk Factors Predicting Infectious Lactational Mastitis: Decision Tree Approach versus Logistic Regression Analysis.

    Science.gov (United States)

    Fernández, Leónides; Mediano, Pilar; García, Ricardo; Rodríguez, Juan M; Marín, María

    2016-09-01

    Objectives Lactational mastitis frequently leads to a premature abandonment of breastfeeding; its development has been associated with several risk factors. This study aims to use a decision tree (DT) approach to establish the main risk factors involved in mastitis and to compare its performance for predicting this condition with a stepwise logistic regression (LR) model. Methods Data from 368 cases (breastfeeding women with mastitis) and 148 controls were collected by a questionnaire about risk factors related to medical history of mother and infant, pregnancy, delivery, postpartum, and breastfeeding practices. The performance of the DT and LR analyses was compared using the area under the receiver operating characteristic (ROC) curve. Sensitivity, specificity and accuracy of both models were calculated. Results Cracked nipples, antibiotics and antifungal drugs during breastfeeding, infant age, breast pumps, familial history of mastitis and throat infection were significant risk factors associated with mastitis in both analyses. Bottle-feeding and milk supply were related to mastitis for certain subgroups in the DT model. The areas under the ROC curves were similar for LR and DT models (0.870 and 0.835, respectively). The LR model had better classification accuracy and sensitivity than the DT model, but the latter presented better specificity at the optimal threshold of each curve. Conclusions The DT and LR models constitute useful and complementary analytical tools to assess the risk of lactational infectious mastitis. The DT approach identifies high-risk subpopulations that need specific mastitis prevention programs and, therefore, it could be used to make the most of public health resources.

  16. Behaviour change in overweight and obese pregnancy: a decision tree to support the development of antenatal lifestyle interventions.

    Science.gov (United States)

    Ainscough, Kate M; Lindsay, Karen L; O'Sullivan, Elizabeth J; Gibney, Eileen R; McAuliffe, Fionnuala M

    2017-10-01

    Antenatal healthy lifestyle interventions are frequently implemented in overweight and obese pregnancy, yet there is inconsistent reporting of the behaviour-change methods and behavioural outcomes. This limits our understanding of how and why such interventions were successful or not. The current paper discusses the application of behaviour-change theories and techniques within complex lifestyle interventions in overweight and obese pregnancy. The authors propose a decision tree to help guide researchers through intervention design, implementation and evaluation. The implications for adopting behaviour-change theories and techniques, and using appropriate guidance when constructing and evaluating interventions in research and clinical practice are also discussed. To enhance the evidence base for successful behaviour-change interventions during pregnancy, adoption of behaviour-change theories and techniques, and use of published guidelines when designing lifestyle interventions are necessary. The proposed decision tree may be a useful guide for researchers working to develop effective behaviour-change interventions in clinical settings. This guide directs researchers towards key literature sources that will be important in each stage of study development.

  17. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    Science.gov (United States)

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  18. Relationships between average depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2014-02-14

    This paper presents a new tool for the study of relationships between the total path length or the average depth and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from UCI ML Repository [9] and datasets representing Boolean functions with 10 variables.

  19. Decision tree analysis to evaluate dry cow strategies under UK conditions

    NARCIS (Netherlands)

    Berry, E.A.; Hogeveen, H.; Hillerton, J.E.

    2004-01-01

    Economic decisions on animal health strategies address the cost-benefit aspect along with animal welfare and public health concerns. Decision tree analysis at an individual cow level highlighted that there is little economic difference between the use of either dry cow antibiotic or an internal teat

  20. Dynamic Programming Strategies on the Decision Tree Hidden behind the Optimizing Problems

    OpenAIRE

    Zoltan KATAI

    2007-01-01

    The aim of the paper is to present the characteristics of certain dynamic programming strategies on the decision tree hidden behind optimization problems, and thus to offer a clear tool for their study and classification that can help in understanding the essence of this programming technique.

  1. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    Science.gov (United States)

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  2. Construction and application of hierarchical decision tree for classification of ultrasonographic prostate images

    NARCIS (Netherlands)

    Giesen, R. J.; Huynen, A. L.; Aarnink, R. G.; de la Rosette, J. J.; Debruyne, F. M.; Wijkstra, H.

    1996-01-01

    A non-parametric algorithm is described for the construction of a binary decision tree classifier. This tree is used to correlate textural features, computed from ultrasonographic prostate images, with the histopathology of the imaged tissue. The algorithm consists of two parts; growing and pruning.

  3. Visualization of Decision Tree State for the Classification of Parkinson's Disease

    NARCIS (Netherlands)

    Valentijn, E

    2016-01-01

    Decision trees have been shown to be effective at classifying subjects with Parkinson’s disease when provided with features (subject scores) derived from FDG-PET data. Such subject scores have strong discriminative power but are not intuitive to understand. We therefore augment each decision node

  4. Relationships Between Average Depth and Number of Nodes for Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-07-24

    This paper presents a new tool for the study of relationships between total path length or average depth and number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [1]. © Springer-Verlag Berlin Heidelberg 2014.

  5. Can Religious Beliefs be a Protective Factor for Suicidal Behavior? A Decision Tree Analysis in a Mid-Sized City in Iran, 2013.

    Science.gov (United States)

    Baneshi, Mohammad Reza; Haghdoost, Ali Akbar; Zolala, Farzaneh; Nakhaee, Nouzar; Jalali, Maryam; Tabrizi, Reza; Akbari, Maryam

    2017-04-01

    This study aimed to assess using tree-based models the impact of different dimensions of religion and other risk factors on suicide attempts in the Islamic Republic of Iran. Three hundred patients who attempted suicide and 300 age- and sex-matched patient attendants with other types of disease who referred to Kerman Afzalipour Hospital were recruited for this study following a convenience sampling. Religiosity was assessed by the Duke University Religion Index. A tree-based model was constructed using the Gini Index as the homogeneity criterion. A complementary discrimination analysis was also applied. Variables contributing to the construction of the tree were stressful life events, mental disorder, family support, and religious belief. Strong religious belief was a protective factor for those with a low number of stressful life events and those with a high mental disorder score; 72 % of those who formed these two groups had not attempted suicide. Moreover, 63 % of those with a high number of stressful life events, strong family support, strong problem-solving skills, and a low mental disorder score were less likely to attempt suicide. The significance of four other variables, GHQ, problem-coping skills, friend support, and neuroticism, was revealed in the discrimination analysis. Religious beliefs seem to be an independent factor that can predict risk for suicidal behavior. Based on the decision tree, religious beliefs among people with a high number of stressful life events might not be a dissuading factor. Such subjects need more family support and problem-solving skills.
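
    The record above names the Gini Index as the homogeneity criterion for its tree. A minimal sketch of how that criterion scores a candidate split follows; the label strings and the toy group sizes are assumptions for illustration and do not come from the study's data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two children of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical node with 10 subjects, split on some binary variable.
parent = ["attempt"] * 5 + ["no_attempt"] * 5
left = ["attempt"] * 4 + ["no_attempt"] * 1
right = ["attempt"] * 1 + ["no_attempt"] * 4

parent_impurity = gini(parent)
child_impurity = split_gini(left, right)
gain = parent_impurity - child_impurity  # larger gain = more homogeneous children
```

    A tree builder using this criterion would evaluate every candidate variable in this way and split on the one with the largest impurity reduction.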

  6. A Clinical Decision Tree to Predict Whether a Bacteremic Patient Is Infected With an Extended-Spectrum β-Lactamase-Producing Organism.

    Science.gov (United States)

    Goodman, Katherine E; Lessler, Justin; Cosgrove, Sara E; Harris, Anthony D; Lautenbach, Ebbing; Han, Jennifer H; Milstone, Aaron M; Massey, Colin J; Tamma, Pranita D

    2016-10-01

    Timely identification of extended-spectrum β-lactamase (ESBL) bacteremia can improve clinical outcomes while minimizing unnecessary use of broad-spectrum antibiotics, including carbapenems. However, most clinical microbiology laboratories currently require at least 24 additional hours from the time of microbial genus and species identification to confirm ESBL production. Our objective was to develop a user-friendly decision tree to predict which organisms are ESBL producing, to guide appropriate antibiotic therapy. We included patients ≥18 years of age with bacteremia due to Escherichia coli or Klebsiella species from October 2008 to March 2015 at Johns Hopkins Hospital. Isolates with ceftriaxone minimum inhibitory concentrations ≥2 µg/mL underwent ESBL confirmatory testing. Recursive partitioning was used to generate a decision tree to determine the likelihood that a bacteremic patient was infected with an ESBL producer. Discrimination of the original and cross-validated models was evaluated using receiver operating characteristic curves and by calculation of C-statistics. A total of 1288 patients with bacteremia met eligibility criteria. For 194 patients (15%), bacteremia was due to a confirmed ESBL producer. The final classification tree for predicting ESBL-positive bacteremia included 5 predictors: history of ESBL colonization/infection, chronic indwelling vascular hardware, age ≥43 years, recent hospitalization in an ESBL high-burden region, and ≥6 days of antibiotic exposure in the prior 6 months. The decision tree's positive and negative predictive values were 90.8% and 91.9%, respectively. Our findings suggest that a clinical decision tree can be used to estimate a bacteremic patient's likelihood of infection with ESBL-producing bacteria. Recursive partitioning offers a practical, user-friendly approach for addressing important diagnostic questions. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of

  7. Utilizing Home Healthcare Electronic Health Records for Telehomecare Patients With Heart Failure: A Decision Tree Approach to Detect Associations With Rehospitalizations.

    Science.gov (United States)

    Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H

    2016-04-01

    Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date, there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home healthcare electronic medical record called the Outcome and Assessment Information Set-C for 552 telemonitored heart failure patients. Bivariate analyses using SAS and a decision tree technique using Waikato Environment for Knowledge Analysis were used. From the decision tree technique, the presence of skin issues was identified as the top predictor of rehospitalization that could be identified during the start of care assessment, followed by patient's living situation, patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the Outcome and Assessment Information Set-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional supports.

  8. Using decision tree classifier to predict income levels

    OpenAIRE

    Bekena, Sisay Menji

    2017-01-01

    In this study, the Random Forest Classifier machine learning algorithm is applied to predict income levels of individuals based on attributes including education, marital status, gender, occupation, country and others. Income levels are defined as a binary variable 0 for income

  9. Decision tree approach for classification of remotely sensed satellite ...

    Indian Academy of Sciences (India)

    The performance of this type of classifier depends on how well the data match the pre-defined model ... ious types of data sources under a single classifier framework has proved to be advantageous in using DTC. .... tion of machine learning algorithms for data mining tasks. The algorithms can either be applied directly.

  10. Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

    OpenAIRE

    O.BENCHAREF; M.FAKIR; B. MINAOUI; B.BOUIKHALENE

    2011-01-01

    The recognition of Tifinagh characters cannot be perfectly carried out using the conventional methods which are based on the invariance, this is due to the similarity that exists between some characters which differ from each other only by size or rotation, hence the need to come up with new methods to remedy this shortage. In this paper we propose a direct method based on the calculation of what is called Geodesic Descriptors which have shown significant reliability vis-à-vis the change of s...

  11. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    Science.gov (United States)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy with the help of improving technology, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample, with dimensions ranging from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument etc.) and the incident electron beam (accelerating voltage, beam current, spot size etc.) control the characteristic X-rays produced, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions and thereby produce a classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated and a preliminary 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, have been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  12. Single nucleotide polymorphism barcoding of cytochrome c oxidase I sequences for discriminating 17 species of Columbidae by decision tree algorithm.

    Science.gov (United States)

    Yang, Cheng-Hong; Wu, Kuo-Chuan; Dahms, Hans-Uwe; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-07-01

    DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most of the conventional DNA barcode sequences contain the whole information of a given barcoding gene. Most of the sequence information does not vary and is uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the amount of noninformative nucleotides in a given barcoding sequence of a major taxon, like the prokaryotes, or eukaryotic animals, plants, or fungi. The actual differences in genetic sequences, called single nucleotide polymorphism (SNP) genotyping, provide a tool for developing a rapid, reliable, and high-throughput assay for the discrimination between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose here a decision tree-based SNP barcoding (DTSB) algorithm where SNP patterns are selected from the DNA barcoding sequence of several evolutionarily related species in order to identify a single species with pigeons as an example. This approach can make use of any established barcoding system. We here firstly used as an example the mitochondrial gene COI information of 17 pigeon species (Columbidae, Aves) using DTSB after sequence trimming and alignment. SNPs were chosen which followed the rule of decision tree and species-specific SNP barcodes. The shortest barcode of about 11 bp was then generated for discriminating 17 pigeon species using the DTSB method. This method provides a sequence alignment and tree decision approach to parsimoniously assign a unique and shortest SNP barcode for any known species of a chosen monophyletic taxon where a barcoding sequence is available.

  13. Learning machines and sleeping brains: Automatic sleep stage classification using decision-tree multi-class support vector machines.

    Science.gov (United States)

    Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim

    2015-07-30

    Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Three-dimensional object recognition using similar triangles and decision trees

    Science.gov (United States)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.

  15. 'Misclassification error' greedy heuristic to construct decision trees for inconsistent decision tables

    KAUST Repository

    Azad, Mohammad

    2014-01-01

    A greedy algorithm has been presented in this paper to construct decision trees for three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. In this algorithm, a greedy heuristic ‘misclassification error’ is used which performs faster, and for some cost function, results are better than ‘number of boundary subtables’ heuristic in literature. Therefore, it can be used in the case of larger data sets and does not require huge amount of memory. Experimental results of depth, average depth and number of nodes of decision trees constructed by this algorithm are compared in the framework of each of the three approaches.
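
    The greedy step described above can be sketched roughly as follows. This is a simplified illustration that assumes one decision per row (the paper itself handles rows with multiple decisions) and takes the misclassification error of a subtable to be the number of rows not covered by its most common decision. The function names and toy table are hypothetical.

```python
from collections import Counter, defaultdict


def misclassification_error(decisions):
    """Number of rows not covered by the most common decision."""
    if not decisions:
        return 0
    return len(decisions) - Counter(decisions).most_common(1)[0][1]


def best_attribute(rows, decisions):
    """Greedy step: choose the attribute whose split leaves the fewest errors."""
    best_attr, best_err = None, None
    for attr in range(len(rows[0])):
        # Partition the decisions by the value of this attribute.
        groups = defaultdict(list)
        for row, decision in zip(rows, decisions):
            groups[row[attr]].append(decision)
        err = sum(misclassification_error(g) for g in groups.values())
        if best_err is None or err < best_err:
            best_attr, best_err = attr, err
    return best_attr, best_err


# Toy decision table (illustrative): the decision depends only on attribute 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = ["a", "b", "a", "b"]
attr, err = best_attribute(rows, decisions)
```

    Each evaluation needs only one pass over the rows and a few counters, which is consistent with the abstract's remark that the heuristic is fast and light on memory.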

  16. Advocating the broad use of the decision tree method in education

    OpenAIRE

    Almeida, Leandro S.; Gomes, Cristiano Mauro Assis

    2017-01-01

    Predictive studies have been widely undertaken in the field of education to provide strategic information about the extensive set of processes related to teaching and learning, as well as about what variables predict certain educational outcomes, such as academic achievement or dropout. As in any other area, there is a set of standard techniques that is usually used in predictive studies in the field of education. Even though the Decision Tree Method is a well-known and standard approach in Data...

  17. The Studies of Decision Tree in Estimation of Breast Cancer Risk by Using Polymorphism Nucleotide

    OpenAIRE

    Frida Seyedmir; Kamal Mirzaie; Morteza Bitaraf Sani

    2017-01-01

    Introduction: Decision trees are data mining tools for collecting, sifting, and accurately predicting information from massive amounts of data, and they are widely used in computational biology and bioinformatics. In bioinformatics, they can be used to predict diseases, including breast cancer. The use of genomic data, including single nucleotide polymorphisms (SNPs), is a very important factor in predicting the risk of diseases. The number of seven important SNP among hundreds of thousan...

  18. Validating a decision tree for serious infection: diagnostic accuracy in acutely ill children in ambulatory care.

    Science.gov (United States)

    Verbakel, Jan Y; Lemiengre, Marieke B; De Burghgraeve, Tine; De Sutter, An; Aertgeerts, Bert; Bullens, Dominique M A; Shinkins, Bethany; Van den Bruel, Ann; Buntinx, Frank

    2015-08-07

    Acute infection is the most common presentation of children in primary care with only few having a serious infection (eg, sepsis, meningitis, pneumonia). To avoid complications or death, early recognition and adequate referral are essential. Clinical prediction rules have the potential to improve diagnostic decision-making for rare but serious conditions. In this study, we aimed to validate a recently developed decision tree in a new but similar population. Diagnostic accuracy study validating a clinical prediction rule. Acutely ill children presenting to ambulatory care in Flanders, Belgium, consisting of general practice and paediatric assessment in outpatient clinics or the emergency department. Physicians were asked to score the decision tree in every child. The outcome of interest was hospital admission for at least 24 h with a serious infection within 5 days after initial presentation. We report the diagnostic accuracy of the decision tree in sensitivity, specificity, likelihood ratios and predictive values. In total, 8962 acute illness episodes were included, of which 283 led to admission to hospital with a serious infection. Sensitivity of the decision tree was 100% (95% CI 71.5% to 100%) at a specificity of 83.6% (95% CI 82.3% to 84.9%) in the general practitioner setting with 17% of children testing positive. In the paediatric outpatient and emergency department setting, sensitivities were below 92%, with specificities below 44.8%. In an independent validation cohort, this clinical prediction rule has been shown to be extremely sensitive to identify children at risk of hospital admission for a serious infection in general practice, making it suitable for ruling out. NCT02024282. Published by the BMJ Publishing Group Limited.
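
    The accuracy measures reported in this record (sensitivity, specificity, likelihood ratios and predictive values) all derive from a 2x2 table of test result against outcome. The sketch below uses the standard definitions; the counts are hypothetical and are not the study's data.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table of counts.

    tp/fn: diseased cases flagged / missed by the rule.
    fp/tn: non-diseased cases flagged / correctly cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_positive": (
            sensitivity / (1 - specificity) if specificity < 1 else float("inf")
        ),
        "lr_negative": (
            (1 - sensitivity) / specificity if specificity > 0 else float("inf")
        ),
    }


# Hypothetical counts chosen only to keep the arithmetic easy to follow.
metrics = diagnostic_accuracy(tp=9, fp=20, fn=1, tn=80)
```

    A rule intended for ruling out, as in this record, is judged mainly on sensitivity and the negative likelihood ratio: the closer the latter is to zero, the more a negative result lowers the probability of serious infection.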

  19. Non-compliance with a postmastectomy radiotherapy guideline: Decision tree and cause analysis

    OpenAIRE

    Razavi, Amir R; Gill, Hans; Åhlfeldt, Hans; Shahsavar, Nosrat

    2008-01-01

    Background: The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance are crucial in maintaining the quality of care in Oncology. Methods: Analysis of 759 patients with malignant breast cancer using decision tree induction (DTI) found patterns of non-compliance with the guideline. The PMRT guideline was us...

  20. Classification of Different Degrees of Disability Following Intracerebral Hemorrhage: A Decision Tree Analysis from VISTA-ICH Collaboration.

    Science.gov (United States)

    Phan, Thanh G; Chen, Jian; Beare, Richard; Ma, Henry; Clissold, Benjamin; Van Ly, John; Srikanth, Velandai

    2017-01-01

    Prognostication following intracerebral hemorrhage (ICH) has focused on poor outcome at the expense of lumping together mild and moderate disability. We aimed to develop a novel approach at classifying a range of disability following ICH. The Virtual International Stroke Trial Archive collaboration database was searched for patients with ICH and known volume of ICH on baseline CT scans. Disability was partitioned into mild [modified Rankin Scale (mRS) at 90 days of 0-2], moderate (mRS = 3-4), and severe disabilities (mRS = 5-6). We used binary and trichotomy decision tree methodology. The data were randomly divided into training (2/3 of data) and validation (1/3 data) datasets. The area under the receiver operating characteristic curve (AUC) was used to calculate the accuracy of the decision tree model. We identified 957 patients, age 65.9 ± 12.3 years, 63.7% males, and ICH volume 22.6 ± 22.1 ml. The binary tree showed that lower ICH volume (27.9 ml), older age (>69.5 years), and low Glasgow Coma Scale (tree showed that ICH volume, age, and serum glucose can separate mild, moderate, and severe disability groups with AUC 0.79 (95% CI 0.71-0.87). Both the binary and trichotomy methods provide equivalent discrimination of disability outcome after ICH. The trichotomy method can classify three categories at once, whereas this action was not possible with the binary method. The trichotomy method may be of use to clinicians and trialists for classifying a range of disability in ICH.

  1. Multivariate decision tree design for the classification of multi-jet topologies in $e^{+}e^{-}$ collisions

    CERN Document Server

    Mjahed, M

    2002-01-01

    The binary decision tree method is used to separate several multi-jet topologies in $e^{+}e^{-}$ collisions. Instead of the univariate process usually taken, a new design procedure for constructing multivariate decision trees is proposed. The segmentation is obtained by considering some feature functions, where linear and nonlinear discriminant functions and a minimal distance method are used. The classification focuses on ALEPH simulated events, with multi-jet topologies. Compared to a standard univariate tree, the multivariate decision trees offer significantly better performance. (30 refs).

  2. Decision Trees for Continuous Data and Conditional Mutual Information as a Criterion for Splitting Instances.

    Science.gov (United States)

    Drakakis, Georgios; Moledina, Saadiq; Chomenidis, Charalampos; Doganis, Philip; Sarimveis, Haralambos

    2016-01-01

    Decision trees are renowned in the computational chemistry and machine learning communities for their interpretability. Their capacity and usage are somewhat limited by the fact that they normally work on categorical data. Improvements to known decision tree algorithms are usually carried out by increasing and tweaking parameters, as well as the post-processing of the class assignment. In this work we attempted to tackle both these issues. Firstly, conditional mutual information was used as the criterion for selecting the attribute on which to split instances. The algorithm performance was compared with the results of C4.5 (WEKA's J48) using default parameters and no restrictions. Two datasets were used for this purpose, DrugBank compounds for HRH1 binding prediction and Traditional Chinese Medicine formulation predicted bioactivities for therapeutic class annotation. Secondly, an automated binning method for continuous data was evaluated, namely Scott's normal reference rule, in order to allow any decision tree to easily handle continuous data. This was applied to all approved drugs in DrugBank for predicting the RDKit SLogP property, using the remaining RDKit physicochemical attributes as input.
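
    Scott's normal reference rule, mentioned above as the automated binning method, sets the bin width to roughly 3.49 times the standard deviation times n to the power of -1/3. The sketch below applies it to turn a continuous attribute into integer bin indices that a categorical decision tree could consume. Using the sample standard deviation and anchoring the bins at the minimum value are assumptions of this sketch, not details taken from the paper.

```python
import statistics


def scott_bin_width(values):
    """Scott's normal reference rule: h = 3.49 * sigma * n^(-1/3)."""
    n = len(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumption)
    return 3.49 * sigma * n ** (-1 / 3)


def bin_values(values):
    """Map each continuous value to an integer bin index, starting at min."""
    h = scott_bin_width(values)
    lowest = min(values)
    if h == 0:  # all values identical: a single bin
        return [0] * len(values)
    return [int((v - lowest) // h) for v in values]


# Illustrative attribute values (not DrugBank data).
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
width = scott_bin_width(values)
bins = bin_values(values)
```

    For these eight values the width comes out a little above 4, so the attribute collapses into two categories.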

  3. Comparative analysis of tree classification models for detecting fusarium oxysporum f. sp cubense (TR4) based on multi soil sensor parameters

    Science.gov (United States)

    Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica

    2017-09-01

    Use of wireless sensor networks and smartphone integration design to monitor environmental parameters surrounding plantations is made possible because of readily available and affordable sensors. Providing low cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that will use multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node from which the smartphone collects data via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will yield coherent results or not. Soil sensor data that resides on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters, indicating that there are variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1 measure scores from tree classification models shows homogeneity among the NBTree, J48graft and J48 tree classification models.

  4. LOCAL BINARIZATION FOR DOCUMENT IMAGES CAPTURED BY CAMERAS WITH DECISION TREE

    Directory of Open Access Journals (Sweden)

    Naser Jawas

    2012-07-01

    Character recognition in a document image captured by a digital camera requires a good binary image as the input for separating the text from the background. Global binarization does not provide such good separation because of uneven levels of lighting in images captured by cameras. Local binarization overcomes the problem but requires a method to partition the large image into local windows properly. In this paper, we propose a local binarization method with dynamic image partitioning using an integral image, and a decision tree for the binarization decision. The integral image is used to estimate the number of lines in the document image. The number of lines in the document image is used to divide the document into local windows. The decision tree makes a decision for the threshold in every local window. The result shows that the proposed method can separate the text from the background better than global thresholding, with a best OCR result on the binarized image of 99.4%.
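
    The integral image that this record relies on can be sketched briefly. After one pass over the pixels, the sum of any rectangular window is available from four table lookups, which keeps per-window statistics cheap. The zero-padded layout and function names below are assumptions of this sketch; the paper's line-count estimation and decision tree thresholding are not reproduced here.

```python
def integral_image(img):
    """Build a zero-padded summed-area table.

    ii[y][x] holds the sum of img over rows < y and columns < x.
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Tiny illustrative "image" of grey levels.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = window_sum(ii, 0, 0, 3, 3)        # whole image
lower_right = window_sum(ii, 1, 1, 3, 3)  # bottom-right 2x2 window
```

    Dividing a window sum by the window area gives the local mean intensity, one plausible input for a per-window threshold decision.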

  5. Use of decision trees for evaluating severe accident management strategies in nuclear power plants

    Energy Technology Data Exchange (ETDEWEB)

    Jae, Moosung [Hanyang Univ., Seoul (Korea, Republic of). Dept. of Nuclear Engineering; Lee, Yongjin; Jerng, Dong Wook [Chung-Ang Univ., Seoul (Korea, Republic of). School of Energy Systems Engineering

    2016-07-15

    Accident management strategies are defined as innovative actions taken by plant operators to prevent core damage or to maintain the sound containment integrity. Such actions minimize the chance of offsite radioactive substance leaks that lead to and intensify core damage under power plant accident conditions. Accident management extends the concept of Defense in Depth against core meltdown accidents. In pressurized water reactors, emergency operating procedures are performed to extend the core cooling time. The effectiveness of Severe Accident Management Guidance (SAMG) has become an important issue. Severe accident management strategies are evaluated with a methodology utilizing the decision tree technique.

  6. A comparison of student academic achievement using decision trees techniques: Reflection from University Malaysia Perlis

    Science.gov (United States)

    Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy

    2015-05-01

    A decision tree is one of the data mining techniques used for prediction. With this method, hidden information can be extracted from abundant data and interpreted as useful knowledge. In this paper, the academic performance of students from 2002 to 2012 is examined for two faculties, the Faculty of Manufacturing Engineering and the Faculty of Microelectronic Engineering, at University Malaysia Perlis (UniMAP). The objectives of this study are to determine and compare the factors that affect students' academic achievement in the two faculties. The prediction results show that five attributes were identified as factors that influence students' academic performance.

  7. An Examination of Mathematically Gifted Students' Learning Styles by Decision Trees

    Directory of Open Access Journals (Sweden)

    Esra Aksoy

    2015-12-01

    Full Text Available The aim of this study was to examine mathematically gifted students' learning styles through a data mining method. The ‘Learning Style Inventory’ and ‘Multiple Intelligences Scale’ were used to collect data. The sample included 234 mathematically gifted middle school students. A decision tree was constructed and examined to predict mathematically gifted students’ learning styles according to their multiple intelligences, gender and grade level. Results showed that all the variables used in the study had a significant effect on mathematically gifted students’ learning styles, but the most effective attribute was intelligence type.

  8. Development of a decision tree to determine appropriateness of NVivo in analyzing qualitative data sets.

    Science.gov (United States)

    Auld, Garry W; Diker, Ann; Bock, M Ann; Boushey, Carol J; Bruhn, Christine M; Cluskey, Mary; Edlefsen, Miriam; Goldberg, Dena L; Misner, Scottie L; Olson, Beth H; Reicks, Marla; Wang, Changzheng; Zaghloul, Sahar

    2007-01-01

    A decision tree was developed to determine when NVivo is an appropriate tool for qualitative analysis. NVivo, a qualitative analysis software package, was used to analyze interviews of 204 Asian, Hispanic, and white parents in 12 states. The experience provided insight into issues that should be considered when deciding to use the software. NVivo can enhance the qualitative research process, quickly process queries, and expand analytical avenues. Before using, however, the following must be considered: training time, establishing inter-coder reliability, number and length of documents, coding time, coding structure, use of automated coding, and possible need for separate databases or additional supporting software.

  9. The use of decision tree induction and artificial neural networks for recognizing the geochemical distribution patterns of LREE in the Choghart deposit, Central Iran

    Science.gov (United States)

    Zaremotlagh, S.; Hezarkhani, A.

    2017-04-01

    Some evidence of rare earth element (REE) concentrations is found in iron oxide-apatite (IOA) deposits located in the Central Iranian microcontinent. There are many unsolved problems about the origin and metallogenesis of IOA deposits in this district. Although it is considered that felsic magmatism and mineralization were simultaneous in the district, interaction of multi-stage hydrothermal-magmatic processes within the Early Cambrian volcano-sedimentary sequence probably caused some epigenetic mineralizations. Secondary geological processes (e.g., multi-stage mineralization, alteration, and weathering) have affected variations of major elements and the possible redistribution of REE in IOA deposits. Hence, the geochemical behaviors and distribution patterns of REE are expected to be complicated in different zones of these deposits. The aim of this paper is to recognize LREE distribution patterns based on whole-rock chemical compositions and to discover their geochemical rules automatically. For this purpose, pattern recognition techniques including decision trees and neural networks were applied to a high-dimensional geochemical dataset from the Choghart IOA deposit. Because some data features were irrelevant or redundant in recognizing the distribution patterns of each LREE, a greedy attribute subset selection technique was employed to select the best subset of predictors used in classification tasks. The decision trees (CART algorithm) were pruned optimally so that they categorized independent test data more accurately than unpruned ones. The most effective classification rules were extracted from the pruned tree to describe the meaningful relationships between the predictors and different concentrations of LREE. A feed-forward artificial neural network was also applied to reliably predict the influence of various rock compositions on the spatial distribution patterns of LREE, with better performance than decision tree induction. 
The findings of this study could be

  10. Method of decision tree applied in adopting the decision for promoting a company

    Directory of Open Access Journals (Sweden)

    Cezarina Adina TOFAN

    2015-09-01

    Full Text Available A decision can be defined as the course of action chosen from several possible ones to achieve an objective. Decision-making methods play an important role in the functioning of the decisional-informational system. Decision trees are proving to be very useful tools for making financial or numerical decisions, where a large amount of complex information must be considered. They provide an effective structure in which alternative decisions and the implications of choosing them can be assessed, and they help to form a correct and balanced view of the risks and rewards that may result from a certain choice. For these reasons, this communication reviews a series of decision-making criteria. It also analyses the benefits of using the decision tree method in the decision-making process by providing a numerical example. On this basis, it can be concluded that the procedure may prove useful in making decisions for companies operating on markets where competition intensity is differentiated.
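    The record does not reproduce its numerical example, so the following is only a hedged sketch of the standard expected-value "rollback" used with decision trees of this kind. The node layout, the option names and all payoffs and probabilities are invented for illustration.

```python
def rollback(node):
    """Expected value of a node: payoff, probability-weighted chance, or best decision."""
    kind = node["kind"]
    if kind == "payoff":
        return node["value"]
    if kind == "chance":
        return sum(p * rollback(child) for p, child in node["branches"])
    if kind == "decision":
        return max(rollback(child) for _, child in node["options"])
    raise ValueError(f"unknown node kind: {kind}")


def best_option(node):
    """Label of the option with the highest expected value at a decision node."""
    return max(node["options"], key=lambda option: rollback(option[1]))[0]


def payoff(value):
    return {"kind": "payoff", "value": value}


# Hypothetical promotion decision: two campaigns with uncertain outcomes,
# or no promotion at all. All figures are made up.
promotion = {
    "kind": "decision",
    "options": [
        ("campaign A", {"kind": "chance",
                        "branches": [(0.6, payoff(100)), (0.4, payoff(-20))]}),
        ("campaign B", {"kind": "chance",
                        "branches": [(0.5, payoff(150)), (0.5, payoff(-60))]}),
        ("no promotion", payoff(0)),
    ],
}

choice = best_option(promotion)
expected = rollback(promotion)
```

    Maximizing expected value is only one of several possible decision criteria; a risk-averse company might prefer a criterion that also penalizes the spread of outcomes.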

  11. A Modular Approach Utilizing Decision Tree in Teaching Integration Techniques in Calculus

    Directory of Open Access Journals (Sweden)

    Edrian E. Gonzales

    2015-08-01

    Full Text Available This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought an answer to the question: Is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice-type quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that modular and traditional instruction were equally effective in facilitating the learning of integration by parts. The other result revealed that the modular approach utilizing a decision tree was more effective than the traditional method in teaching integration by trigonometric transformation.

  12. Using decision-tree classifier systems to extract knowledge from databases

    Science.gov (United States)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.

  13. Klasifikasi Nilai Kelayakan Calon Debitur Baru Menggunakan Decision Tree C4.5

    Directory of Open Access Journals (Sweden)

    Bambang Hermanto

    2017-01-01

    Full Text Available To improve the quality of customer service, especially the feasibility assessment of borrowers, and given the increasing number of new prospective borrowers seeking financing for motor vehicle purchases, the company needs a decision-making tool that makes it easy and quick to estimate whether a debtor is able to pay off a loan. This study discusses the process of generating a decision tree with the C4.5 algorithm using a learning dataset of motorcycle financing debtors. The decision tree is then interpreted in the form of decision rules that can be understood and used as a reference when processing borrower data to determine the feasibility of prospective new borrowers. The feasibility value refers to the value of the target parameter, credit status. If the credit status is "paid off", the prospective borrower is estimated to be able to repay the loan in question; if the credit status is estimated as "pull", the prospective debtor concerned is estimated to be unable to repay the loan. System testing is done by comparing the results on the testing data with the learning data in three scenarios, with the data judged valid at over 70% for all scenarios. Moreover, generating the tree and the rules is fairly quick, taking no more than 15 minutes for each test scenario.
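    C4.5 selects the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below shows only that criterion; it omits continuous attributes, missing values and pruning. The attribute names (`income`, `region`) and the four example rows are hypothetical and not taken from the study's dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, normalized by the split entropy."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical debtor records; "status" is the credit status to predict.
debtors = [
    {"income": "high", "region": "a", "status": "paid off"},
    {"income": "high", "region": "b", "status": "paid off"},
    {"income": "low", "region": "a", "status": "pull"},
    {"income": "low", "region": "b", "status": "pull"},
]

best_attribute = max(["income", "region"],
                     key=lambda a: gain_ratio(debtors, a, "status"))
```

    Applying the same selection recursively to each subset of rows yields the tree, and each root-to-leaf path reads as one decision rule.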

  14. Application of Decision Tree on Collision Avoidance System Design and Verification for Quadcopter

    Science.gov (United States)

    Chen, C.-W.; Hsieh, P.-H.; Lai, W.-H.

    2017-08-01

    The purpose of the research is to build a collision avoidance system for quadcopters using a decision tree algorithm. When the ultrasonic range finder judges that the distance is within the collision avoidance interval, control passes from the operator to the system, which controls the altitude of the UAV. Based on prior experience in operating quadcopters, we can obtain the appropriate pitch angle. The UAS implements the following three motions to avoid collisions: Case 1, the initial slow avoidance stage; Case 2, the slow avoidance stage; and Case 3, the rapid avoidance stage. The training data of the collision avoidance test are then transmitted to the ground station via a wireless transmission module for further analysis. The entire decision tree algorithm of the collision avoidance system, the data transmission, and the ground station have been verified in flight tests. In the flight tests, the quadcopter can implement the avoidance motion in real time and move away from obstacles steadily. In the avoidance area, the collision avoidance system has higher authority than the operator and carries out the avoidance process. The quadcopter can successfully fly away from obstacles at 1.92 meters per second, and the minimum distance between the quadcopter and the obstacle is 1.05 meters.

  15. APPLICATION OF DECISION TREE ON COLLISION AVOIDANCE SYSTEM DESIGN AND VERIFICATION FOR QUADCOPTER

    Directory of Open Access Journals (Sweden)

    C.-W. Chen

    2017-08-01

    Full Text Available The purpose of the research is to build a collision avoidance system for quadcopters using a decision tree algorithm. When the ultrasonic range finder judges that the distance is within the collision avoidance interval, control passes from the operator to the system, which controls the altitude of the UAV. Based on prior experience in operating quadcopters, we can obtain the appropriate pitch angle. The UAS implements the following three motions to avoid collisions: Case 1, the initial slow avoidance stage; Case 2, the slow avoidance stage; and Case 3, the rapid avoidance stage. The training data of the collision avoidance test are then transmitted to the ground station via a wireless transmission module for further analysis. The entire decision tree algorithm of the collision avoidance system, the data transmission, and the ground station have been verified in flight tests. In the flight tests, the quadcopter can implement the avoidance motion in real time and move away from obstacles steadily. In the avoidance area, the collision avoidance system has higher authority than the operator and carries out the avoidance process. The quadcopter can successfully fly away from obstacles at 1.92 meters per second, and the minimum distance between the quadcopter and the obstacle is 1.05 meters.

  16. Clinical elements that predict outcome after traumatic brain injury: a prospective multicenter recursive partitioning (decision-tree) analysis.

    Science.gov (United States)

    Brown, Allen W; Malec, James F; McClelland, Robyn L; Diehl, Nancy N; Englander, Jeffrey; Cifu, David X

    2005-10-01

    Traumatic brain injury (TBI) often presents clinicians with a complex combination of clinical elements that can confound treatment and make outcome prediction challenging. Predictive models have commonly used acute physiological variables and gross clinical measures to predict mortality and basic outcome endpoints. The primary goal of this study was to consider all clinical elements available concerning a survivor of TBI admitted for inpatient rehabilitation, and identify those factors that predict disability, need for supervision, and productive activity one year after injury. The Traumatic Brain Injury Model Systems (TBIMS) database was used for decision tree analysis using recursive partitioning (n = 3463). Outcome measures included the Functional Independence Measure, the Disability Rating Scale, the Supervision Rating Scale, and a measure of productive activity. Predictor variables included all physical examination elements, measures of injury severity (initial Glasgow Coma Scale score, duration of post-traumatic amnesia [PTA], length of coma, CT scan pathology), gender, age, and years of education. The duration of PTA, age, and most elements of the physical examination were predictive of early disability. The duration of PTA alone was selected to predict late disability and independent living. The duration of PTA, age, sitting balance, and limb strength were selected to predict productive activity at 1 year. The duration of PTA was the best predictor of outcome selected in this model for all endpoints and elements of the physical examination provided additional predictive value. Valid and reliable measures of PTA and physical impairment after TBI are important for accurate outcome prediction.

  17. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    Science.gov (United States)

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with the C4.5 decision tree (PSO+C4.5) classifier by applying the Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of the proposed method, it is applied to 1 microarray dataset and 5 different medical datasets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation was used to assess the performance of the classifiers. Experimental results show that the proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods.

  18. The effect of the fragmentation problem in decision tree learning applied to the search for single top quark production

    International Nuclear Information System (INIS)

    Vilalta, R; Ocegueda-Hernandez, F; Valerio, R; Watts, G

    2010-01-01

    Decision tree learning constitutes a suitable approach to classification due to its ability to partition the variable space into regions of class-uniform events, while providing a structure amenable to interpretation, in contrast to other methods such as neural networks. But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system called DTFE, for Decision Tree Fragmentation Evaluator, that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as Spectral Clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies on the search for single top quark production, a challenging problem due to large and similar backgrounds, low energetic signals, and low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of data fragmentation.

  19. Determining The Effect of Some Mechanical Properties on Color Maturity of Tomato With K-Star, Random Forest and Decision Tree (C4.5) Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Hande Küçükönder

    2015-02-01

    Full Text Available This study was conducted to determine the effect of mechanical properties, namely the maximum force at the skin rupture point, the energy at the skin rupture point and the skin firmness, on the color maturity of tomato using supervised learning algorithms of data mining. A total of 88 tomato samples were used; color measurements were performed in 4 different equatorial regions of each tomato, giving a total of 352 color measurement units. In the classification processes performed according to these mechanical properties, the K-Star, Random Forest and Decision Tree (C4.5) algorithms of data mining were utilized. In comparing the resulting classification models, the preferred model was the one with low Root Mean Square Error (RMSE), Mean absolute error (MAE), Root relative squared error (RRSE) and Relative absolute error (RAE) values, which are some of the criteria of error variance, and a high classification accuracy rate. As a result of the comparison, the classification model formed according to the K-Star instance-based algorithm [MAE: 0.004, RMSE: 0.006, %RAE: 1.73, %RRSE: 1.70] was found to be a better classifier than the others. With the classification made according to the K-Star algorithm, the effects of the maximum force at the skin rupture point and of the skin firmness on the degree of maturity of tomato were found to be non-significant during the green and light red color conversion periods and significant during other periods, while the energy at the skin rupture point was found to be significant only during the pink color conversion stage and non-significant during other stages.
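    The four error measures named in this record have simple definitions, sketched below under the assumption that they follow the usual forms: RAE and RRSE divide the model's error by the error of a baseline that always predicts the mean of the actual values. The function name `error_metrics` and the sample numbers are illustrative and are not the study's data.

```python
from math import sqrt


def error_metrics(actual, predicted):
    """Return MAE, RMSE, and the relative errors RAE and RRSE in percent."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline errors: what a predictor that always outputs the mean would make.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": sqrt(sum(sq_err) / n),
        "RAE%": 100 * sum(abs_err) / base_abs,
        "RRSE%": 100 * sqrt(sum(sq_err) / base_sq),
    }


metrics = error_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```

    Relative errors below 100% mean the model beats the mean-only baseline, which is why low values on all four measures were treated as evidence of a better classifier.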

  20. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining that determines the attitude of people towards a particular product, topic or politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions can be in different languages (English, Urdu, Arabic, etc.). Handling each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages such as Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions as labeled examples. A testing data set is supplied to the three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.
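    The four measures used to compare the classifiers can be computed directly from predicted and actual labels. The sketch below uses the standard binary definitions; the function name and the six sample labels are made up for illustration.

```python
def classification_scores(actual, predicted, positive):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical opinion labels: three positive and three negative examples.
truth = ["pos", "pos", "pos", "neg", "neg", "neg"]
guess = ["pos", "pos", "neg", "pos", "neg", "neg"]
scores = classification_scores(truth, guess, positive="pos")
```

    With a balanced test set such as the 150 positive and 150 negative opinions described above, accuracy is informative on its own; precision and recall matter more when the classes are skewed.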

  1. A Novel Treatment Decision Tree and Literature Review of Retrograde Peri-Implantitis.

    Science.gov (United States)

    Sarmast, Nima D; Wang, Howard H; Soldatos, Nikolaos K; Angelov, Nikola; Dorn, Samuel; Yukna, Raymond; Iacono, Vincent J

    2016-12-01

    Although retrograde peri-implantitis (RPI) is not a common sequela of dental implant surgery, its prevalence has been reported in the literature to be 0.26%. Incidence of RPI is reported to increase to 7.8% when teeth adjacent to the implant site have a previous history of root canal therapy, and it is correlated with distance between implant and adjacent tooth and/or with time from endodontic treatment of adjacent tooth to implant placement. Minimum 2 mm space between implant and adjacent tooth is needed to decrease incidence of apical RPI, with minimum 4 weeks between completion of endodontic treatment and actual implant placement. The purpose of this study is to compile all available treatment modalities and to provide a decision tree as a general guide for clinicians to aid in diagnosis and treatment of RPI. Literature search was performed for articles published in English on the topic of RPI. Articles selected were case reports with study populations ranging from 1 to 32 patients. Any case report or clinical trial that attempted to treat or rescue an implant diagnosed with RPI was included. Predominant diagnostic presentation of a lesion was presence of sinus tract at buccal or facial abscess of apical portion of implant, and subsequent periapical radiographs taken demonstrated a radiolucent lesion. On the basis of case reports analyzed, RPI was diagnosed between 1 week and 4 years after implant placement. Twelve of 20 studies reported that RPI lesions were diagnosed within 6 months after implant placement. A step-by-step decision tree is provided to allow clinicians to triage and properly manage cases of RPI on the basis of recommendations and successful treatments provided in analyzed case reports. It is divided between symptomatic and asymptomatic implants and adjacent teeth with vital and necrotic pulps. Most common etiology of apical RPI is endodontic infection from neighboring teeth, which was diagnosed within 6 months after implant placement. Most

  2. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Baysian and SVM approaches

    Science.gov (United States)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in Blood Transfusion Organizations could be useful for improving the performance of the blood donation service. The aim of this research is to predict the healthiness of blood donors in a Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, Decision Tree C4.5, the Naïve Bayesian classifier, and the Support Vector Machine, were chosen and applied to a real database of 11006 donors. Seven fields (sex, age, job, education, marital status, type of donor, and results of blood tests, i.e. doctors' comments and lab results about healthy or unhealthy blood donors) were selected as input to these algorithms. The results of the three algorithms were compared and an error cost analysis was performed. According to this research and the obtained results, the best algorithm, with low error cost and high accuracy, is SVM. This research helps the BTO to build a model of blood donors in each area in order to predict whether donors' blood is healthy or unhealthy. This research could be useful if used in parallel with laboratory tests to better separate unhealthy blood.

  3. Application of decision trees in the identification of patterns of fatal injuries by external cause in the municipality of Pasto, Colombia

    Directory of Open Access Journals (Sweden)

    Ricardo Timaran-Pereira

    2017-12-01

    Full Text Available Introduction: The Pan American Health Organization (PAHO) and the World Health Organization (WHO) have accepted, since 1993 and 1996 respectively, that violence is a public health problem. This is corroborated by the report on violence and health, in which Latin America presented a homicide rate of 18 per 100,000 people and is considered one of the most violent regions in the world. Objective: To detect criminal patterns with data mining techniques in the Crime Observatory of the municipality of Pasto (Colombia). Materials and methods: The Cross Industry Standard Process for Data Mining (CRISP-DM), one of the methodologies used in the development of data mining projects in academic and industrial environments, was applied. The source of information was the Crime Observatory of the municipality of Pasto, which stores the cleaned and transformed historical figures on injuries from external causes (fatal and nonfatal) recorded over 11 years. Results: A decision tree-based classification model was built that allowed the discovery of patterns of deaths from external causes. Homicides happened mostly in commune 5 of Pasto under the following circumstances: during weekends, in the early morning, in the second semester of the year and on public thoroughfares; the victims were adult men of various professions; and the homicides were caused by quarrels and committed with firearms. Conclusion: The generated knowledge will help government and security agencies make effective decisions regarding the implementation of crime prevention and citizen security plans.

  4. New energy opinion leaders' lifestyles and media usage - applying data mining decision tree analysis for UNIDO - ICHET web site users

    International Nuclear Information System (INIS)

    Tsai, M.; Veziroglu, A.; Warren, S.; Que, Y.

    2007-01-01

    According to the innovation diffusion research, the innovators, opinion leaders, and diffusion agents play vital roles in promoting the acceptance of innovation. The innovators and opinion leaders must be able to cope with the high degree of uncertainty about an innovation and usually they have higher innovation-related media usage than the majority. Based on consumer behavior studies, lifestyle analysis could help researchers divide consumers into different lifestyle groups to understand and predict consumer behaviors. Lifestyle allows researchers to investigate consumers via their activities, interests and opinions instead of using demographic variables. The purpose of this research is to investigate how new energy innovators and opinion leaders' different lifestyles affect their new energy product adoption, and their media usage regarding new energy reports or promotion. In order to achieve the purposes listed above, the researchers need to locate and contact the potential innovators and opinion leaders in this field. Thus the researchers cooperate with UNIDO-ICHET to launch this survey. This cross-discipline online survey was formally launched from Aug 2005 to Oct 2006. The result of this survey successfully collected 2040 new energy innovators and opinion leaders' information. The researchers analyzed the data using SPSS statistics software and Data Mining decision tree analysis. Then the researchers divided new energy innovators into four groups: social-oriented, young modern, conservative, and show-off-oriented. They also analyzed which lifestyle groups are better targets for innovation agencies to launch innovation-related promotions or campaigns

  5. Bonsai trees in your head: how the pavlovian system sculpts goal-directed choices by pruning decision trees.

    Science.gov (United States)

    Huys, Quentin J M; Eshel, Neir; O'Nions, Elizabeth; Sheridan, Luke; Dayan, Peter; Roiser, Jonathan P

    2012-01-01

    When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders.
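    The pruning strategy described here can be illustrated with a simplified, deterministic sketch. Several things are assumptions made for illustration and are not taken from the study: the tree encoding, the `prune_at` threshold, and the rule that a pruned branch is valued at the loss alone. It shows how cutting off evaluation at a large loss can hide the best sequence.

```python
def best_value(tree, prune_at=None):
    """Best total reward over paths in a tree of (reward, subtree) pairs.

    If prune_at is set, any step whose reward is at or below it ends the
    evaluation of that branch, so whatever follows the loss is ignored.
    """
    if not tree:
        return 0
    values = []
    for reward, subtree in tree:
        if prune_at is not None and reward <= prune_at:
            values.append(reward)  # branch cut off at the large loss
        else:
            values.append(reward + best_value(subtree, prune_at))
    return max(values)


# Hypothetical two-step task: a large loss followed by a larger gain,
# versus a small loss followed by a small gain.
task = [
    (-70, [(140, [])]),
    (-20, [(20, [])]),
]

full_search = best_value(task)
pruned_search = best_value(task, prune_at=-70)
```

    In this toy task the exhaustive search finds the sequence worth 70, while the pruned search settles for 0, mirroring the observation that the strategy persisted even when it was counterproductive.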

  6. Bonsai trees in your head: how the pavlovian system sculpts goal-directed choices by pruning decision trees.

    Directory of Open Access Journals (Sweden)

    Quentin J M Huys

    Full Text Available When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders.

  7. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
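    The three ways of collapsing a group of equal rows can be illustrated with a small sketch. The table contents and variable names below are made up for illustration, and the integer codes used for the generalized decision are one arbitrary choice of "unique code".

```python
from collections import Counter, defaultdict

# Hypothetical inconsistent table: (conditional attribute values, decision).
rows = [
    ((0, 1), "a"),
    ((0, 1), "b"),
    ((0, 1), "a"),
    ((1, 0), "c"),
]

# Group the decisions of rows that share the same conditional attributes.
groups = defaultdict(list)
for attrs, decision in rows:
    groups[attrs].append(decision)

# (i) many-valued decision: attach the set of all decisions in the group.
many_valued = {k: frozenset(v) for k, v in groups.items()}

# (ii) most common decision: attach the most frequent decision in the group.
most_common = {k: Counter(v).most_common(1)[0][0] for k, v in groups.items()}

# (iii) generalized decision: attach a unique code for each distinct decision set.
codes = {}
generalized = {}
for k, decision_set in many_valued.items():
    generalized[k] = codes.setdefault(decision_set, len(codes))
```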

  8. Prediction of heart disease using Apache Spark analysing decision trees and gradient boosting algorithm

    Science.gov (United States)

    Chugh, Saryu; Arivu Selvan, K.; Nadesh, RK

    2017-11-01

    Numerous harmful factors, such as hypertension, smoking, obesity and inappropriate medication, affect the functioning of the human body and cause diseases such as diabetes, thyroid disorders, strokes and coronary disease. The instability and severity of environmental conditions is also a cause of coronary disease. The structure of Apache Spark relies on an evolution that requires gathering of the data. To analyse data-centric applications, Apache Spark ought to be used; it offers various advantages and is fast because it uses in-memory processing. Apache Spark runs in a distributed environment and splits the data into batches, giving a high throughput rate. The use of mining techniques in the diagnosis of coronary disease has been examined extensively, showing acceptable levels of precision. Decision trees, neural networks and the gradient boosting algorithm are among the Apache Spark capabilities that help in collecting the information.

  9. STRENGTHS AND WEAKNESSES OF SMES LISTED IN ISE: A CHAID DECISION TREE APPLICATION

    Directory of Open Access Journals (Sweden)

    ALİ SERHAN KOYUNCUGİL

    2013-06-01

    Full Text Available The aim of this study is to detect the strengths and weaknesses of SMEs, which have a significant position in globalization. The study covered 697 SMEs listed on the İstanbul Stock Exchange (ISE) during the years 2000-2005. Data mining, which can be described as a collection of techniques that aim to find useful but undiscovered patterns in collected data, was applied, and the Chi-Square Automatic Interaction Detector (CHAID) decision tree algorithm, one of the data mining methods, was used for segmentation. As a result, the SMEs listed on the ISE were categorized into 19 different profiles by CHAID, and it was found that their strengths and weaknesses were determined by strategies concerning equity and asset productivity, financing of fixed assets, management of accounts receivable and liquidity.
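    As a rough illustration of the chi-square test behind CHAID, the sketch below scores two categorical predictors against a target and keeps the one with the larger Pearson statistic. It is a simplification under stated assumptions: full CHAID also merges categories and compares Bonferroni-adjusted p-values, and the predictor names and data here are invented, not taken from the study. Because both toy predictors have the same degrees of freedom, comparing the statistics gives the same ranking as comparing p-values.

```python
from collections import Counter
from typing import Sequence


def chi_square(xs: Sequence[str], ys: Sequence[str]) -> float:
    """Pearson chi-square statistic of the contingency table of xs versus ys."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


# Invented toy data: four firms, a target profile and two candidate predictors.
target = ["strong", "strong", "weak", "weak"]
predictors = {
    "liquidity": ["high", "high", "low", "low"],
    "size": ["small", "large", "small", "large"],
}

scores = {name: chi_square(values, target) for name, values in predictors.items()}
best_split = max(scores, key=scores.get)
```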

  10. Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?

    Directory of Open Access Journals (Sweden)

    Robert Haworth

    2010-01-01

    Full Text Available The popularity and usage of digital games has increased in recent years, bringing further attention to their design. Some digital games require a significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.

  11. Multi-output decision trees for lesion segmentation in multiple sclerosis

    Science.gov (United States)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects, obtaining a true positive rate of 0.41 and a positive predictive value of 0.36.

  12. Models, methods and software for distributed knowledge acquisition for the automated construction of integrated expert systems knowledge bases

    International Nuclear Information System (INIS)

    Dejneko, A.O.

    2011-01-01

    Based on an analysis of existing models, methods and means of acquiring knowledge, a base method of automated knowledge acquisition has been chosen. On the basis of this method, a new approach to integrating information acquired from knowledge sources of different typologies has been proposed, and the concept of distributed knowledge acquisition, with the aim of computerized formation of the most complete and consistent models of problem areas, has been introduced. An original algorithm for distributed knowledge acquisition from databases, based on the construction of binary decision trees, has been developed. [ru]

  13. On P versus NP ∩ co-NP for Decision Trees and Read-Once Branching Programs

    Czech Academy of Sciences Publication Activity Database

    Jukna, S.; Razborov, A.; Savický, Petr; Wegener, I.

    1999-01-01

    Vol. 8, No. 4 (1999), pp. 357-370 ISSN 1016-3328 R&D Projects: GA ČR GA201/95/0976 Institutional research plan: AV0Z1030915 Keywords: computational complexity * Boolean functions * decision trees * branching programs * P versus NP intersection co-NP Subject RIV: BA - General Mathematics Impact factor: 0.161, year: 1999

  14. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree

    DEFF Research Database (Denmark)

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva

    2004-01-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study...

  15. Inductive Decision Tree Analysis of the Validity Rank of Construction Parameters of Innovative Gear Pump after Tooth Root Undercutting

    Directory of Open Access Journals (Sweden)

    Deptuła A.

    2017-02-01

    Full Text Available The article presents an innovative use of an inductive algorithm for generating a decision tree to analyse the validity rank of the construction and maintenance parameters of a gear pump with undercut teeth. It presents an alternative to the existing methods of discrete optimization for generating sets of decisions and determining the hierarchy of decision variables.

  16. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    Science.gov (United States)

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  17. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    Science.gov (United States)

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  18. Inductive Decision Tree Analysis of the Validity Rank of Construction Parameters of Innovative Gear Pump after Tooth Root Undercutting

    Science.gov (United States)

    Deptuła, A.; Partyka, M. A.

    2017-02-01

    The article presents an innovative use of an inductive algorithm for generating a decision tree to analyse the validity rank of the construction and maintenance parameters of a gear pump with undercut teeth. It presents an alternative to the existing methods of discrete optimization for generating sets of decisions and determining the hierarchy of decision variables.

  19. Investigation on the Expansion of Urban Construction Land Use Based on the CART-CA Model

    Directory of Open Access Journals (Sweden)

    Yongxiang Yao

    2017-05-01

    Full Text Available Change in urban construction land use is an important factor when studying urban expansion. Many scholars have combined cellular automata (CA) with data mining algorithms to perform relevant simulation studies. However, the parameters for rule extraction are difficult to determine and the rules are simple; together, these factors tend to introduce overfitting problems and low modeling accuracy. In this paper, we propose a method to extract the transformation rules for a CA model based on the Classification and Regression Tree (CART). This method first adopts the CART decision tree using the bootstrap algorithm to mine the rules from the urban land use while considering the factors that impact the geographic spatial variables in the CART regression procedure. The weights of individual impact factors are calculated to generate a logistic regression function that reflects the change in urban construction land use. Finally, a CA model is constructed to simulate and predict urban construction land expansion. The urban area of Xinyang City in China is used as an example for this experimental research. After removing the spatial invariant region, the overall simulation accuracy is 81.38% and the kappa coefficient is 0.73. The results indicate that using the CART decision tree to train the impact factor weights and extract the rules can effectively increase the simulation accuracy of the CA model. From the perspectives of convenience and accuracy in rule extraction, the structure of the CART decision tree is clear, and it is very suitable for obtaining the cellular rules. The CART-CA model has a relatively high simulation accuracy in modeling urban construction land use expansion, provides reliable results, and is suitable for use as a scientific reference for urban construction land use expansion.
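    The two figures of merit quoted in this abstract, overall accuracy and the kappa coefficient, can both be derived from a confusion matrix. The sketch below uses the standard Cohen's kappa formula; the function name and the counts are illustrative assumptions and do not reproduce the paper's data.

```python
from typing import List, Tuple


def accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: simulated classes)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    return observed, (observed - expected) / (1.0 - expected)


# Made-up counts for two classes (urban / non-urban), not the paper's data.
toy_confusion = [[40, 10], [10, 40]]
acc, kappa = accuracy_and_kappa(toy_confusion)
```

    Kappa is lower than the raw accuracy here because half of the agreement in this balanced toy matrix would be expected by chance.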

  20. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

    Full Text Available Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the whole of MG and another that included classification errors. The resulting map was 62.9% accurate.

  1. The risk evaluation of difficult substances in USES 2.0 and EUSES. A decision tree for data gap filling of Kow, Koc and BCF

    NARCIS (Netherlands)

    Beelen P van; ECO

    2000-01-01

    This report presents a decision tree for the risk evaluation of the so-called "difficult" substances with the Uniform System for the Evaluation of Substances (USES). The decision tree gives practical guidelines for the regulatory authorities to evaluate notified substances like organometallic

  2. Energy spectra unfolding of fast neutron sources using the group method of data handling and decision tree algorithms

    Science.gov (United States)

    Hosseini, Seyed Abolfazl; Afrakoti, Iman Esmaili Paeen

    2017-04-01

    Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using the developed computational codes based on the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator has been simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The developed computational codes based on the GMDH and DT algorithms use some data for training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the simulated neutron pulse height distributions by MCNPX-ESUT for each energy spectrum are used as the output and input data. Since there is no need to solve the inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The 241Am-9Be and 252Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for the used fast neutron sources have an excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than those obtained from the DT. The results obtained in the present study have good accuracy in comparison with the previously published paper based on the logsig and tansig transfer functions.

  3. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Monica [Brown Univ., Providence, RI (United States)

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to different theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb$^{-1}$ of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism, σ(p$\bar{p}$ → tb + X, tqb + X) = $4.30^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of W' using the same samples from the electroweak produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using the decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'$_L$ with SM couplings) > 840 GeV; M(W'$_R$) > 880 GeV or 890 GeV if the

  4. Decision tree analysis as a supplementary tool to enhance histomorphological differentiation when distinguishing human from non-human cranial bone in both burnt and unburnt states: A feasibility study.

    Science.gov (United States)

    Simmons, T; Goodburn, B; Singhrao, S K

    2016-01-01

    This feasibility study was undertaken to describe and record the histological characteristics of burnt and unburnt cranial bone fragments from human and non-human bones. Reference series of fully mineralized, transverse sections of cranial bone, from all variables and specimen states, were prepared by manual cutting and semi-automated grinding and polishing methods. A photomicrograph catalogue reflecting differences in burnt and unburnt bone from human and non-humans was recorded and qualitative analysis was performed using an established classification system based on primary bone characteristics. The histomorphology associated with human and non-human samples was, for the main part, preserved following burning at high temperature. Clearly, fibro-lamellar complex tissue subtypes, such as plexiform or laminar primary bone, were only present in non-human bones. A decision tree analysis based on histological features provided a definitive identification key for distinguishing human from non-human bone, with an accuracy of 100%. The decision tree for samples where burning was unknown was 96% accurate, and multi-step classification to taxon was possible with 100% accuracy. The results of this feasibility study strongly suggest that histology remains a viable alternative technique if fragments of cranial bone require forensic examination in both burnt and unburnt states. The decision tree analysis may provide an additional but vital tool to enhance data interpretation. Further studies are needed to assess variation in histomorphology taking into account other cranial bones, ontogeny, species and burning conditions. © The Author(s) 2015.

  5. Calibrating emergent phenomena in stock markets with agent based models.

    Science.gov (United States)

    Fievet, Lucas; Sornette, Didier

    2018-01-01

    Since the 2008 financial crisis, agent-based models (ABMs), which account for out-of-equilibrium dynamics, heterogeneous preferences, time horizons and strategies, have often been envisioned as the new frontier that could revolutionise and displace the more standard models and tools in economics. However, their adoption and generalisation is drastically hindered by the absence of general reliable operational calibration methods. Here, we start with a different calibration angle that qualifies an ABM for its ability to achieve abnormal trading performance with respect to the buy-and-hold strategy when fed with real financial data. Starting from the common definition of standard minority and majority agents with binary strategies, we prove their equivalence to optimal decision trees. This efficient representation allows us to exhaustively test all meaningful single agent models for their potential anomalous investment performance, which we apply to the NASDAQ Composite index over the last 20 years. We uncover large significant predictive power, with anomalous Sharpe ratio and directional accuracy, in particular during the dotcom bubble and crash and the 2008 financial crisis. A principal component analysis reveals transient convergence between the anomalous minority and majority models. A novel combination of the optimal single-agent models of both classes into a two-agent model leads to remarkably superior investment performance, especially during the periods of bubbles and crashes. Our design opens the field of ABMs to construct novel types of advanced warning systems of market crises, based on the emergent collective intelligence of ABMs built on carefully designed optimal decision trees that can be reverse engineered from real financial data.

  6. Calibrating emergent phenomena in stock markets with agent based models

    Science.gov (United States)

    Sornette, Didier

    2018-01-01

    Since the 2008 financial crisis, agent-based models (ABMs), which account for out-of-equilibrium dynamics, heterogeneous preferences, time horizons and strategies, have often been envisioned as the new frontier that could revolutionise and displace the more standard models and tools in economics. However, their adoption and generalisation is drastically hindered by the absence of general reliable operational calibration methods. Here, we start with a different calibration angle that qualifies an ABM for its ability to achieve abnormal trading performance with respect to the buy-and-hold strategy when fed with real financial data. Starting from the common definition of standard minority and majority agents with binary strategies, we prove their equivalence to optimal decision trees. This efficient representation allows us to exhaustively test all meaningful single agent models for their potential anomalous investment performance, which we apply to the NASDAQ Composite index over the last 20 years. We uncover large significant predictive power, with anomalous Sharpe ratio and directional accuracy, in particular during the dotcom bubble and crash and the 2008 financial crisis. A principal component analysis reveals transient convergence between the anomalous minority and majority models. A novel combination of the optimal single-agent models of both classes into a two-agent model leads to remarkably superior investment performance, especially during the periods of bubbles and crashes. Our design opens the field of ABMs to construct novel types of advanced warning systems of market crises, based on the emergent collective intelligence of ABMs built on carefully designed optimal decision trees that can be reverse engineered from real financial data. PMID:29499049

  7. Proposition of a multicriteria model to select logistics services providers

    Directory of Open Access Journals (Sweden)

    Miriam Catarina Soares Aharonovitz

    2014-06-01

    Full Text Available This study aims to propose a multicriteria model to select logistics service providers through the development of a decision tree. The methodology consists of a survey, which resulted in a sample of 181 responses. The sample was analyzed using statistical methods, including descriptive statistics, multivariate analysis, analysis of variance, and parametric tests to compare means. Based on these results, it was possible to obtain the decision tree and information to support the multicriteria analysis. The AHP (Analytic Hierarchy Process) was applied to determine the data influence and thus ensure better consistency in the analysis. The decision tree categorizes the criteria according to the decision levels (strategic, tactical and operational). Furthermore, it allows the importance of each criterion in the supplier selection process to be evaluated generically from the point of view of logistics services contractors.

  8. [Identification of subgroups with lower level of stroke knowledge using decision-tree analysis].

    Science.gov (United States)

    Kim, Hyun Kyung; Jeong, Seok Hee; Kang, Hyun Cheol

    2014-02-01

    This study was performed to explore levels of stroke knowledge and identify subgroups with lower levels of stroke knowledge among adults in Korea. A cross-sectional survey was used and data were collected in 2012. A national sample of 990 Koreans aged 20 to 74 years participated in this study. Knowledge of risk factors, warning signs, and first action for stroke were surveyed using face-to-face interviews. Descriptive statistics and decision tree analysis were performed using SPSS WIN 20.0 and Answer Tree 3.1. Mean score for stroke risk factor knowledge was 7.7 out of 10. The least recognized risk factor was diabetes and four subgroups with lower levels of knowledge were identified. Score for knowledge of stroke warning signs was 3.6 out of 6. The least recognized warning sign was sudden severe headache and six subgroups with lower levels of knowledge were identified. The first action for stroke was recognized by 65.7 percent of participants and four subgroups with lower levels of knowledge were identified. Multi-faceted education should be designed to improve stroke knowledge among Korean adults, particularly focusing on subgroups with lower levels of knowledge and less recognition of items in this study.

  9. Sediment source fingerprinting as an aid to catchment management: A review of the current state of knowledge and a methodological decision-tree for end-users

    Science.gov (United States)

    Collins, A.L; Pulley, S.; Foster, I.D.L; Gellis, Allen; Porto, P.; Horowitz, A.J.

    2017-01-01

    The growing awareness of the environmental significance of fine-grained sediment fluxes through catchment systems continues to underscore the need for reliable information on the principal sources of this material. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting, or tracing, procedures have emerged as a potentially valuable alternative. Despite the rapidly increasing number of studies reporting the use of sediment source fingerprinting, several key challenges and uncertainties continue to hamper consensus among the international scientific community on key components of the existing methodological procedures. Accordingly, this contribution reviews and presents recent developments for several key aspects of fingerprinting, namely: sediment source classification, catchment source and target sediment sampling, tracer selection, grain size issues, tracer conservatism, source apportionment modelling, and assessment of source predictions using artificial mixtures. Finally, a decision-tree representing the current state of knowledge is presented, to guide end-users in applying the fingerprinting approach.

  10. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Zhiyi [China Inst. of Atomic Energy (CIAE), Beijing (China)

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element Vtb and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb$^{-1}$ of Tevatron Run II data in p$\bar{p}$ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed in discriminating the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as $3.4^{+2.0}_{-1.8}$ pb. The result of the single top quark production in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the electron, muon and tau combined analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations in electron and muon alone. The measured cross section in the three combined final states is σ(p$\bar{p}$ → tb + X, tqb + X) = $3.84^{+0.89}_{-0.83}$ pb. A lower limit on |Vtb| is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  11. Decision tree analysis to stratify risk of de novo non-melanoma skin cancer following liver transplantation.

    Science.gov (United States)

    Tanaka, Tomohiro; Voigt, Michael D

    2018-03-01

    Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r² = 0.971, p risk groups at 5/10/20 year was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p risk of developing NMSC in the long-term after LT.

  12. Domain-Based Predictive Models for Protein-Protein Interaction Prediction

    Directory of Open Access Journals (Sweden)

    Chen Xue-Wen

    2006-01-01

    Full Text Available Protein interactions are of biological interest because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Recently, methods for predicting protein interactions using domain information have been proposed, and preliminary results have demonstrated their feasibility. In this paper, we develop two domain-based statistical models (neural networks and decision trees) for protein interaction predictions. Unlike most of the existing methods, which consider only domain pairs (one domain from one protein) and assume that domain-domain interactions are independent of each other, the proposed methods are capable of exploring all possible interactions between domains and make predictions based on all the domains. Compared to maximum-likelihood estimation methods, our experimental results show that the proposed schemes can predict protein-protein interactions with higher specificity and sensitivity, while requiring less computation time. Furthermore, the decision tree-based model can be used to infer the interactions not only between two domains, but among multiple domains as well.
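    One way to let a classifier see all domains of a protein pair at once, rather than isolated domain pairs, is to encode the pair as a vector over a domain vocabulary. The sketch below is an assumption about how such an encoding might look (0, 1 or 2 proteins containing each domain); the abstract does not specify the authors' feature scheme, and the domain names and the helper `pair_features` are purely illustrative.

```python
from typing import List, Sequence, Set


def pair_features(
    domains_a: Set[str], domains_b: Set[str], vocabulary: Sequence[str]
) -> List[int]:
    """Encode a protein pair as counts over a fixed domain vocabulary.

    Each entry is 0, 1 or 2: the number of proteins in the pair that
    contain the corresponding domain.
    """
    return [int(d in domains_a) + int(d in domains_b) for d in vocabulary]


# Illustrative vocabulary and proteins (hypothetical annotations).
vocabulary = ["SH3", "SH2", "Kinase"]
protein_a = {"SH3", "Kinase"}
protein_b = {"SH3"}

features = pair_features(protein_a, protein_b, vocabulary)
```

    A vector of this kind could then be passed to any decision tree or neural network learner, so that splits or weights can involve several domains jointly.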

  13. Energy spectra unfolding of fast neutron sources using the group method of data handling and decision tree algorithms

    International Nuclear Information System (INIS)

    Hosseini, Seyed Abolfazl; Afrakoti, Iman Esmaili Paeen

    2017-01-01

    Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using the developed computational codes based on the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator has been simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The developed computational codes based on the GMDH and DT algorithms use some data for training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the simulated neutron pulse height distributions by MCNPX-ESUT for each energy spectrum are used as the output and input data. Since there is no need to solve the inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The ²⁴¹Am-⁹Be and ²⁵²Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for the used fast neutron sources have an excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than those obtained from the DT. The results obtained in the present study have good accuracy in comparison with the previously published paper based on the logsig and tansig transfer functions. - Highlights: • The neutron pulse height distribution was simulated using MCNPX-ESUT. • The energy spectrum of the neutron source was unfolded using GMDH. • The energy spectrum of the neutron source was unfolded using

  14. Energy spectra unfolding of fast neutron sources using the group method of data handling and decision tree algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Hosseini, Seyed Abolfazl, E-mail: sahosseini@sharif.edu [Department of Energy Engineering, Sharif University of Technology, Tehran 8639-11365 (Iran, Islamic Republic of); Afrakoti, Iman Esmaili Paeen [Faculty of Engineering & Technology, University of Mazandaran, Pasdaran Street, P.O. Box: 416, Babolsar 47415 (Iran, Islamic Republic of)

    2017-04-11

    Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using the developed computational codes based on the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator has been simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The developed computational codes based on the GMDH and DT algorithms use some data for training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the simulated neutron pulse height distributions by MCNPX-ESUT for each energy spectrum are used as the output and input data. Since there is no need to solve the inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The ²⁴¹Am-⁹Be and ²⁵²Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for the used fast neutron sources have an excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than those obtained from the DT. The results obtained in the present study have good accuracy in comparison with the previously published paper based on the logsig and tansig transfer functions. - Highlights: • The neutron pulse height distribution was simulated using MCNPX-ESUT. • The energy spectrum of the neutron source was unfolded using GMDH. • The energy spectrum of the neutron source was

  15. Improving Rolling Bearing Fault Diagnosis by DS Evidence Theory Based Fusion Model

    Directory of Open Access Journals (Sweden)

    Xuemei Yao

    2017-01-01

    Rolling bearings play an important role in rotating machinery, and their working condition directly affects equipment efficiency. While dozens of methods have been proposed for real-time bearing fault diagnosis and monitoring, the fault classification accuracy of existing algorithms is still not satisfactory. This work presents a novel algorithm fusion model based on principal component analysis and Dempster-Shafer evidence theory for rolling bearing fault diagnosis. It combines the advantages of the learning vector quantization (LVQ) neural network model and the decision tree model. Experiments under three different bearing rotation speeds and two different crack sizes show that our fusion model has better performance and higher accuracy than either of the base classification models for rolling bearing fault diagnosis, which is achieved via synergic prediction from both types of models.
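
    The fusion step named in this abstract can be illustrated with Dempster's rule of combination. The following is a minimal sketch, not the authors' implementation: the fault class names and the two mass functions standing in for the LVQ and decision tree outputs are hypothetical, and the abstract does not say how classifier outputs are converted into masses.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the common subset.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by 1 - K so the combined masses sum to one.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Hypothetical fault classes and hypothetical classifier beliefs.
NORMAL, INNER, OUTER = "normal", "inner_race", "outer_race"
THETA = frozenset({NORMAL, INNER, OUTER})  # full frame = "don't know"

m_lvq = {frozenset({INNER}): 0.6, frozenset({OUTER}): 0.3, THETA: 0.1}
m_tree = {frozenset({INNER}): 0.7, frozenset({NORMAL}): 0.2, THETA: 0.1}

fused = dempster_combine(m_lvq, m_tree)
# Pick the single fault class with the highest fused mass.
decision = max((s for s in fused if len(s) == 1), key=fused.get)
print(sorted(decision), round(fused[decision], 4))
```

    In this toy case both sources lean towards the same class, so the fused mass on that class is higher than either source assigned alone, which is the kind of synergy the abstract describes.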

  16. Prediction of different types of liver diseases using rule based classification model.

    Science.gov (United States)

    Kumar, Yugal; Sahoo, G

    2013-01-01

    Diagnosing different types of liver diseases clinically is a laborious process because patients have to undergo a large number of independent laboratory tests. Different liver diseases are classified on the basis of the results and analysis of these tests. To simplify this complex process, we have developed a Rule Base Classification Model (RBCM) to predict different types of liver diseases. The proposed model combines rules with different data mining techniques. The objective of this paper is to propose a rule based classification model with machine learning techniques for the prediction of different types of liver diseases. A dataset was developed with twelve attributes that includes the records of 583 patients, of whom 441 were male and the rest were female. Support Vector Machine (SVM), Rule Induction (RI), Decision Tree (DT), Naive Bayes (NB) and Artificial Neural Network (ANN) data mining techniques with the K-fold cross-validation technique are used with the proposed model for the prediction of liver diseases. The performance of these data mining techniques is evaluated with accuracy, sensitivity, specificity and kappa parameters, and statistical techniques (ANOVA and Chi square test) are used to analyze the liver disease dataset and the independence of attributes. Out of 583 patients, 416 have liver disease and the remaining 167 are healthy. The proposed model with the decision tree (DT) technique provides the best result among all techniques (RI, SVM, ANN and NB) on all parameters (Accuracy 98.46%, Sensitivity 95.7%, Specificity 95.28% and Kappa 0.983), while the SVM exhibits poor performance (Accuracy 82.33%, Sensitivity 68.03%, Specificity 91.28% and Kappa 0.801). It is also found that the best performance of the model without rules (RI, Accuracy 82.68%, Sensitivity 86.34%, Specificity 90.51% and Kappa 0.619) is almost similar to the worst performance of the rule based classification model (SVM, Accuracy 82
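
    The four evaluation measures named in this abstract can all be derived from a binary confusion matrix. The sketch below is illustrative only: the class totals (416 diseased, 167 healthy) come from the abstract, but the split into true and false predictions is hypothetical and does not reproduce any reported result.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_positive = ((tp + fn) / n) * ((tp + fp) / n)
    p_negative = ((fp + tn) / n) * ((fn + tn) / n)
    p_expected = p_positive + p_negative
    kappa = (accuracy - p_expected) / (1.0 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical predictions for 416 diseased and 167 healthy patients.
metrics = binary_metrics(tp=400, fn=16, fp=10, tn=157)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

    Kappa discounts the agreement that the class imbalance alone would produce, which is why it is usually lower than raw accuracy on a dataset like this one.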

  17. Forecasting Reading Anxiety for Promoting English-Language Reading Performance Based on Reading Annotation Behavior

    Science.gov (United States)

    Chen, Chih-Ming; Wang, Jung-Ying; Chen, Yong-Ting; Wu, Jhih-Hao

    2016-01-01

    To reduce effectively the reading anxiety of learners while reading English articles, a C4.5 decision tree, a widely used data mining technique, was used to develop a personalized reading anxiety prediction model (PRAPM) based on individual learners' reading annotation behavior in a collaborative digital reading annotation system (CDRAS). In…

  18. A Multi-industry Default Prediction Model using Logistic Regression and Decision Tree

    OpenAIRE

    Suresh Ramakrishnan; Maryam Mirzaei; Mahmoud Bekri

    2015-01-01

    The accurate prediction of corporate bankruptcy for the firms in different industries is of a great concern to investors and creditors, as the reduction of creditors’ risk and a considerable amount of saving for an industry economy can be possible. Financial statements vary between industries. Therefore, economic intuition suggests that industry effects should be an important component in bankruptcy prediction. This study attempts to detail the characteristics of each industry using sector in...

  19. A combined neural network and decision trees model for prognosis of breast cancer relapse.

    Science.gov (United States)

    Jerez-Aragonés, José M; Gómez-Ruiz, José A; Ramos-Jiménez, Gonzalo; Muñoz-Pérez, José; Alba-Conejo, Emilio

    2003-01-01

    The prediction of clinical outcome of patients after breast cancer surgery plays an important role in medical tasks such as diagnosis and treatment planning. Different prognostic factors for breast cancer outcome appear to be significant predictors for overall survival, but probably form part of a bigger picture comprising many factors. Survival estimations are currently performed by clinicians using the statistical techniques of survival analysis. In this sense, artificial neural networks are shown to be a powerful tool for analysing datasets where there are complicated non-linear interactions between the input data and the information to be predicted. This paper presents a decision support tool for the prognosis of breast cancer relapse that combines a novel algorithm TDIDT (control of induction by sample division method, CIDIM), to select the most relevant prognostic factors for the accurate prognosis of breast cancer, with a system composed of different neural networks topologies that takes as input the selected variables in order for it to reach good correct classification probability. In addition, a new method for the estimate of Bayes' optimal error using the neural network paradigm is proposed. Clinical-pathological data were obtained from the Medical Oncology Service of the Hospital Clinico Universitario of Málaga, Spain. The results show that the proposed system is a useful tool to be used by clinicians to search through large datasets seeking subtle patterns in prognostic factors, and that may further assist the selection of appropriate adjuvant treatments for the individual patient.

  20. Identifying Characteristics of High School Dropouts: Data Mining with A Decision Tree Model

    Science.gov (United States)

    Veitch, William Robert.

    2004-01-01

    The notion that all students should finish high school has grown throughout the last century and continues to be an important goal for all educational levels in this new century. Non-completion has been related to all sorts of social, financial, and psychological issues. Many studies have attempted to put together a process that will identify…

  1. Manufacturing Decision Tree Model Optimization for Finishing Additive Manufactured Components, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — This Phase I program addresses the challenge of gaining the necessary knowledge needed to support certification of additive manufacturing (AM) hardware and achieving...

  2. Analisis Perbandingan Teknik Support Vector Regression (SVR) Dan Decision Tree C4.5 Dalam Data Mining

    OpenAIRE

    Astuti, Yuniar Andi

    2011-01-01

    This study examines the Support Vector Regression (SVR) and Decision Tree C4.5 techniques, which have been used in studies in various fields, in order to identify the advantages and disadvantages of both techniques in data mining. Across the ten studies that use both techniques, the analysis showed an accuracy of 59.64% for SVR and 76.97% for C4.5. The study therefore concludes that C4.5 performs better than SVR.

  3. Perbandingan Analisis Klasifikasi Antara Decision Tree Dan Support Vector Machine Multiclass Untuk Penentuan Jurusan Pada Siswa SMA

    OpenAIRE

    Putranto, Rizky Ade; Wuryandari, Triastuti; Sudarno, Sudarno

    2015-01-01

    Data mining is a process that employs one or more machine learning techniques to analyze and extract knowledge automatically. Classification, a form of supervised learning, assigns a new data record to one of several previously defined categories. The decision tree is one of the best-known classification techniques in data mining and is one of the popular methods in the decision making process of a case in which the method is obtained e...

  4. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    Science.gov (United States)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  5. Using Decision Trees to Examine Relationships between Inter-Annual Vegetation Variability, Topographic Attributes, and Climate Signals

    Science.gov (United States)

    White, A. B.; Kumar, P.

    2003-12-01

    The objective of this research is to develop KDD (knowledge discovery in databases) techniques for spatio-temporal geo-data, and use these techniques to examine inter-annual vegetation health signals. The underlying hypothesis of the research is that the signatures of inter-annual variability of climate on vegetation dynamics as represented by the statistical descriptors of vegetation index variations depend upon a variety of attributes related to the topography, hydrology, physiography, and climate. NDVI (normalized difference vegetation index) is enlisted to represent vegetation health, and relationships between this index and topographic attributes such as elevation, slope, aspect, compound topographic index (CTI), and the proximity to a stream are analyzed. Several scientific questions related to the identification and characterization of the inter-annual variability ensue as a consequence of our hypothesis. Investigations were performed using 13 years of 1-km resolution NDVI data from the AVHRR instrument on NOAA's POES (polar-orbiting operational environmental satellite) over the continental U.S. Various temporal change indices were used in order to identify anomalous inter-annual behavior in the NDVI index, including maximum absolute and relative deviations from the 13-year mean and positive and negative persistence indices (after Zhou et al., 2001). The KDD technique used in this research is the decision tree, which falls under the classification and prediction division of data mining techniques. The algorithm is similar to C4.5 and ID3, but can handle continuous input and output values without binning and is optimized to determine the minimum error. Future work will incorporate clustering algorithms (both distance and density-based) and association rule algorithms (constraint-based) adapted for spatial-temporal data. Investigations will also be performed at smaller spatial scales, integrating higher resolution data. Throughout the growing season
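
    As a rough illustration of the deviation-based change indices mentioned above, the sketch below computes the largest-magnitude deviation of an annual NDVI series from its multi-year mean, in absolute and relative form. The exact index definitions are not given in the abstract, so taking the signed deviation of largest magnitude is an assumption, and the 13 NDVI values are hypothetical.

```python
def deviation_indices(series):
    """Return (mean, max absolute deviation, max relative deviation) for one pixel.

    The deviation kept is the signed one with the largest magnitude, so a
    strongly negative anomaly (for example a drought year) stays negative.
    """
    mean = sum(series) / len(series)
    deviations = [value - mean for value in series]
    max_abs_dev = max(deviations, key=abs)
    # Relative deviation expresses the anomaly as a fraction of the mean.
    max_rel_dev = max_abs_dev / mean if mean != 0 else float("nan")
    return mean, max_abs_dev, max_rel_dev


# Hypothetical growing-season NDVI for one 1-km pixel over 13 years.
ndvi = [0.52, 0.55, 0.50, 0.53, 0.51, 0.54, 0.30,
        0.52, 0.53, 0.55, 0.51, 0.50, 0.54]

mean_ndvi, abs_dev, rel_dev = deviation_indices(ndvi)
print(f"mean={mean_ndvi:.3f} abs_dev={abs_dev:.3f} rel_dev={rel_dev:.1%}")
```

    Values like these, computed per pixel, are the kind of continuous target a regression-style decision tree could then relate to elevation, slope, aspect and similar attributes.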

  6. The management of an endodontically abscessed tooth: patient health state utility, decision-tree and economic analysis

    Directory of Open Access Journals (Sweden)

    Shepperd Sasha

    2007-12-01

    Background: A frequent encounter in clinical practice is the middle-aged adult patient complaining of a toothache caused by the spread of a carious infection into the tooth's endodontic complex. Decisions about the range of treatment options (conventional crown with a post and core technique (CC), a single tooth implant (STI), a conventional dental bridge (CDB), and a partial removable denture (RPD)) have to balance the prognosis, utility and cost. Little is known about the utility patients attach to the different treatment options for an endodontically abscessed mandibular molar and maxillary incisor. We measured patients' dental-health-state utilities and ranking preferences of the treatment options for these dental problems. Methods: Forty school teachers ranked their preferences for conventional crown with a post and core technique, a single tooth implant, a conventional dental bridge, and a partial removable denture using a standard gamble and willingness to pay. Data previously reported on treatment prognosis and direct "out-of-pocket" costs were used in a decision-tree and economic analysis. Results: The standard gamble utilities for the restoration of a mandibular 1st molar with either the conventional crown (CC), single-tooth-implant (STI), conventional dental bridge (CDB) or removable-partial-denture (RPD) were 74.47 [± 6.91], 78.60 [± 5.19], 76.22 [± 5.78], 64.80 [± 8.1] respectively (p The standard gamble utilities for the restoration of a maxillary central incisor with a CC, STI, CDB and RPD were 88.50 [± 6.12], 90.68 [± 3.41], 89.78 [± 3.81] and 91.10 [± 3.57] respectively (p > 0.05). Their respective willingness-to-pay values ($CDN) were: 1,782.05 [± 361.42], 1,871.79 [± 349.44], 1,605.13 [± 348.10] and 1,351.28 [± 368.62]. A statistical difference was found between the utility of treating a maxillary central incisor and mandibular 1st-molar (p The expected-utility-value for a 5-year prosthetic survival was highest for the CDB and the

  7. Discrimination of Crop and Weeds on Visible and Visible/Near-Infrared Spectrums Using Support Vector Machine, Artificial Neural Network and Decision Tree

    Directory of Open Access Journals (Sweden)

    Wei Deng

    2014-03-01

    Weeds are regarded as farmers' natural enemy. To avoid excessive pesticide residues and the destruction of the ecological environment, and to guarantee the quality and safety of agricultural products, it is urgent to develop highly efficient weed management methods. Among these, weed discrimination is the key part. There has been much research on weed detection/discrimination using spectral measurement on plant leaf/canopy. However, the spectral ranges reported so far were not consistent, and no research has been reported to determine the more efficient wavelength range for weed classification. Some researchers adopted the visible spectrum, some adopted the near-infrared spectrum, and others adopted both the visible and near-infrared spectrum. The purpose of this study was to compare the classifications of the spectral reflectance in the range of 350 ~ 760 nm and in 350 ~ 2500 nm for crop/weed discrimination. Spectral analysis of these data using three kinds of modeling methods, Support Vector Machines (SVMs), Artificial Neural Network (ANN), and Decision Tree (DT), showed that the three classifiers could differentiate crop and weeds better in the 350 ~ 760 nm wavelength range than in 350 ~ 2500 nm. Therefore, the visible wavelength range could be good enough to meet the requirement for crop/weed spectral discrimination, which might reduce the cost of weed detection sensors.

  8. Sediment source fingerprinting as an aid to catchment management: A review of the current state of knowledge and a methodological decision-tree for end-users.

    Science.gov (United States)

    Collins, A L; Pulley, S; Foster, I D L; Gellis, A; Porto, P; Horowitz, A J

    2017-06-01

    The growing awareness of the environmental significance of fine-grained sediment fluxes through catchment systems continues to underscore the need for reliable information on the principal sources of this material. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting or tracing procedures have emerged as a potentially valuable alternative. Despite the rapidly increasing numbers of studies reporting the use of sediment source fingerprinting, several key challenges and uncertainties continue to hamper consensus among the international scientific community on key components of the existing methodological procedures. Accordingly, this contribution reviews and presents recent developments for several key aspects of fingerprinting, namely: sediment source classification, catchment source and target sediment sampling, tracer selection, grain size issues, tracer conservatism, source apportionment modelling, and assessment of source predictions using artificial mixtures. Finally, a decision-tree representing the current state of knowledge is presented, to guide end-users in applying the fingerprinting approach. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  9. Discrimination of the sensory quality of the Coffea arabica L. (cv. Yellow Bourbon) produced in different altitudes using decision trees obtained by the CHAID method.

    Science.gov (United States)

    Ramos, Mariana Figueira; Ribeiro, Diego Egídio; Cirillo, Marcelo Ângelo; Borém, Flávio Meira

    2016-08-01

    Knowledge of the sensory profile of coffee quality, associated with genetic and environmental factors, is of utmost importance for the international market, as well as for the productive sector. In this context, the goal of this study was to classify the quality of Coffea arabica L., cv. Yellow Bourbon, according to different scores obtained through sensory evaluations based on the Specialty Coffee Association of America protocol (SCAA), and by means of decision trees resulting from applying the CHAID method (chi-square automatic interaction detection). To that end, we used a database with the sensory characteristics of cv. Yellow Bourbon and the environmental characteristics of the Mantiqueira de Minas region, State of Minas Gerais, Brazil. The method used exhibited promising results regarding accuracy and success rates in order to discriminate coffee sensory quality as a function of the production environment. The results obtained clearly show the effect of the coffee growing environment on the Yellow Bourbon variety, resulting in notable sensory differences in the beverage. It was possible to discriminate cv. Yellow Bourbon coffee samples, the sensory evaluations of which resulted in scores of ≥88 points, which are associated with growing environments at altitudes of ≥1200 m. © 2015 Society of Chemical Industry.

  10. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    Science.gov (United States)

    Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar

    2018-01-01

    The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to the atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. Using different cluster methods, it is shown that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.

  11. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    Directory of Open Access Journals (Sweden)

    Iliev Iliycho

    2018-01-01

    The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to the atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. Using different cluster methods, it is shown that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.

  12. A Decision Tree Analysis to Support Potential Climate Change Adaptations of Striped Catfish (Pangasianodon hypophthalmus Sauvage) Farming in the Mekong Delta, Vietnam

    NARCIS (Netherlands)

    Nguyen, L.A.; Verreth, J.A.J.; Leemans, H.B.J.; Bosma, R.H.; Silva, De S.

    2016-01-01

    This study uses the decision tree framework to analyse possible climate change impact adaptation options for pangasius (Pangasianodon hypophthalmus Sauvage) farming in the Mekong Delta. Here we present the risks for impacts and the farmers' autonomous and planned public adaptation by using primary

  13. Decline curve based models for predicting natural gas well performance

    Directory of Open Access Journals (Sweden)

    Arash Kamari

    2017-06-01

    The productivity of a gas well declines over its production life until it can no longer meet economic requirements. To overcome such problems, the production performance of gas wells should be predicted by applying reliable methods to analyse the decline trend. Therefore, reliable models are developed in this study on the basis of powerful artificial intelligence techniques, viz. the artificial neural network (ANN) modelling strategy, least square support vector machine (LSSVM) approach, adaptive neuro-fuzzy inference system (ANFIS), and decision tree (DT) method, for the prediction of cumulative gas production as well as initial decline rate multiplied by time as a function of the Arps' decline curve exponent and ratio of initial gas flow rate over total gas flow rate. It was concluded that the results obtained from the models developed in the current study are in satisfactory agreement with the actual gas well production data. Furthermore, the results of the comparative study demonstrate that the LSSVM strategy is superior to the other models investigated for the prediction of both cumulative gas production and initial decline rate multiplied by time.
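
    For orientation, the quantities these models predict can be written down directly from the classical Arps hyperbolic decline relations. The sketch below uses those textbook relations, not the paper's trained models; the well parameters are hypothetical, and treating the rate ratio as initial rate over current rate is an assumption about the abstract's wording.

```python
def arps_rate(qi, di, b, t):
    """Hyperbolic Arps rate q(t) = qi / (1 + b*Di*t)**(1/b), valid for 0 < b < 1."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)


def arps_dimensionless_time(rate_ratio, b):
    """Initial decline rate multiplied by time, Di*t, from the ratio qi/q and exponent b."""
    return (rate_ratio ** b - 1.0) / b


def arps_cumulative(qi, di, b, t):
    """Cumulative production: the integral of arps_rate from 0 to t."""
    return qi / ((1.0 - b) * di) * (1.0 - (1.0 + b * di * t) ** (-(1.0 - b) / b))


# Hypothetical well: initial rate 1000 units/month, Di = 0.1 per month, b = 0.5.
qi, di, b, t = 1000.0, 0.1, 0.5, 12.0

q_t = arps_rate(qi, di, b, t)
di_t = arps_dimensionless_time(qi / q_t, b)
gp_t = arps_cumulative(qi, di, b, t)
print(f"rate={q_t:.3f} Di*t={di_t:.3f} cumulative={gp_t:.1f}")
```

    Inverting the rate equation recovers Di*t from only the exponent and the rate ratio, which matches the pair of inputs the abstract lists for the machine learning models.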

  14. Cost comparison of MRSA screening and management – a decision tree analysis

    Directory of Open Access Journals (Sweden)

    Tübbicke Andrea

    2012-12-01

    Background: Methicillin-resistant Staphylococcus aureus (MRSA) infections represent a serious challenge for health-care institutions. Rapid and precise identification of MRSA carriers can help to reduce both nosocomial transmissions and unnecessary isolations and associated costs. The practical details of MRSA screening (who, how, when and where to screen) remain a controversial issue. Methods: The aim of this study was to determine which MRSA screening and management strategy causes the lowest expected cost for a hospital. For this cost analysis a decision analytic cost model was developed, primarily based on data from peer-reviewed literature. Single and multiplex sensitivity analyses of the parameters “costs per MRSA case per day”, “costs for pre-emptive isolation per day”, “MRSA rate of transmission not in isolation per day” and “MRSA prevalence” were conducted. Results: The omission of MRSA screening was identified as the alternative with the highest risk for the hospital. Universal MRSA screening strategies are by far more cost-intensive than targeted screening approaches. Culture confirmation of positive PCR results in combination with pre-emptive isolation generates the lowest costs for a hospital. This strategy minimizes the chance of false-positive results as well as the possibility of MRSA cross transmissions and therefore contains the costs for the hospital. These results were confirmed by multiplex and single sensitivity analyses. Single sensitivity analyses have shown that the parameters “MRSA prevalence” and the “rate of MRSA transmission per day of non-isolated patients” exert the greatest influence on the choice of the preferred screening strategy. Conclusions: It was shown that universal MRSA screening strategies are far more cost-intensive than the targeted screening approaches. In addition, it was demonstrated that all targeted screening strategies produce lower costs than not performing a screening at

  15. Fast Screening Technology for Drug Emergency Management: Predicting Suspicious SNPs for ADR with Information Theory-based Models.

    Science.gov (United States)

    Liang, Zhaohui; Liu, Jun; Huang, Jimmy Xiangji; Zeng, Xing

    2018-01-14

    The genetic polymorphism of Cytochrome P450 (CYP 450) is considered one of the main causes of adverse drug reactions (ADRs). In order to explore the latent correlations between ADRs and potentially corresponding single-nucleotide polymorphisms (SNPs) in CYP450, three algorithms based on information theory are used as the main method to predict the possible relation. The study uses a retrospective case-control design to explore the potential relation of ADRs to specific genomic locations and single-nucleotide polymorphisms (SNPs). The genomic data collected from 53 healthy volunteers are used for the analysis; another group of genomic data collected from 30 healthy volunteers excluded from the study is used as the control group. The SNPs at five loci of CYP2D6*2, *10, *14 and CYP1A2*1C, *1F are detected by the Applied Biosystem 3130xl. The raw data are processed by ChromasPro to detect the specific alleles at the above loci in each sample. The secondary data are reorganized and processed by R, combined with the reports of ADRs from clinical reports. Three information theory based algorithms are implemented for the screening task: JMI, CMIM, and mRMR. If a SNP is selected by more than two algorithms, we are confident to conclude that it is related to the corresponding ADR. The selection results are compared with the control decision tree + LASSO regression model. In the study group where ADRs occur, 10 SNPs are considered relevant to the occurrence of a specific ADR by the combined information theory model. In comparison, only 5 SNPs are considered relevant to a specific ADR by the decision tree + LASSO regression model. In addition, the new method detects more relevant pairs of SNP and ADR which are affected both by SNP and dosage. This implies that the new information theory based model is effective at discovering correlations between ADRs and CYP 450 SNPs and is helpful for predicting the potentially vulnerable genotype for some ADRs. The newly proposed
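
    The consensus rule described above, keeping a SNP only when enough selectors agree on it, can be sketched as a simple vote count. This is an illustrative sketch: the per-algorithm selections are hypothetical, and the vote threshold is left as a parameter because "more than two" of three algorithms could be read as requiring all three.

```python
from collections import Counter


def consensus_features(selections, min_votes):
    """Return the features chosen by at least `min_votes` of the selectors.

    `selections` maps a selector name to the list of features it picked.
    """
    votes = Counter(
        feature
        for picked in selections.values()
        for feature in set(picked)  # each selector votes at most once per feature
    )
    return sorted(feature for feature, count in votes.items() if count >= min_votes)


# Hypothetical outputs of the three selectors for one ADR.
selections = {
    "JMI": ["CYP2D6*10", "CYP1A2*1F", "CYP2D6*2"],
    "CMIM": ["CYP2D6*10", "CYP1A2*1F"],
    "mRMR": ["CYP2D6*10", "CYP2D6*2", "CYP1A2*1F"],
}

strict = consensus_features(selections, min_votes=3)   # all three must agree
lenient = consensus_features(selections, min_votes=2)  # any two suffice
print(strict)
print(lenient)
```

    The stricter threshold trades recall for confidence, which is the stated motivation for requiring agreement between algorithms before linking a SNP to an ADR.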

  16. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery

    Science.gov (United States)

    Helmer, E.H.; Kennaway, T.A.; Pedreros, D.H.; Clark, M.L.; Marcano-Vega, H.; Tieszen, L.L.; Ruzycki, T.R.; Schill, S.R.; Carrington, C.M.S.

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius, testing a more detailed classification than earlier work in the latter three islands. Secondly, we estimate the extents of land cover and protected forest by formation for five islands and ask how land cover has changed over the second half of the 20th century. The image interpretation approach combines image mosaics and ancillary geographic data, classifying the resulting set of raster data with decision tree software. Cloud-free image mosaics for one or two seasons were created by applying regression tree normalization to scene dates that could fill cloudy areas in a base scene. Such mosaics are also known as cloud-filled, cloud-minimized or cloud-cleared imagery, mosaics, or composites. The approach accurately distinguished several classes that more standard methods would confuse; the seamless mosaics aided reference data collection; and the multiseason imagery allowed us to separate drought deciduous forests and woodlands from semi-deciduous ones. Cultivated land areas declined 60 to 100 percent from about 1945 to 2000 on several islands. Meanwhile, forest cover has increased 50 to 950%. This trend will likely continue where sugar cane cultivation has dominated. Like the island of Puerto Rico, most higher-elevation forest formations are protected in formal or informal reserves. Also similarly, lowland forests, which are drier forest types on these islands, are not well represented in reserves. Former cultivated lands in lowland areas could provide lands for new reserves of drier forest types. The land-use history of these islands may provide insight for planners in countries currently considering

  17. Interactive Electronic Decision Trees for the Integrated Primary Care Management of Febrile Children in Low Resource Settings - Review of existing tools.

    Science.gov (United States)

    Keitel, Kristina; D'Acremont, Valérie

    2018-04-20

    The lack of effective, integrated diagnostic tools poses a major challenge to the primary care management of febrile childhood illnesses. These limitations are especially evident in low-resource settings and are often inappropriately compensated for by antimicrobial over-prescription. Interactive electronic decision trees (IEDTs) have the potential to close these gaps by guiding antibiotic use and better identifying serious disease. This narrative review summarizes existing IEDTs to provide an overview of their degree of validation, as well as to identify gaps in current knowledge and prospects for future innovation. The methods were a structured literature review in PubMed and Embase, complemented by a Google search and contact with developers. Six integrated IEDTs were identified: three (eIMCI, REC, and Bangladesh digital IMCI) based on Integrated Management of Childhood Illnesses (IMCI); four (SL eCCM, MEDSINC, e-iCCM, and D-Tree eCCM) on Integrated Community Case Management (iCCM); two (ALMANACH, MSFeCARE) with a modified IMCI content; and one (ePOCT) that integrates novel content with biomarker testing. The types of publications and evaluation studies varied greatly: the content and evidence base was published for two (ALMANACH and ePOCT), and ALMANACH and ePOCT were validated in efficacy studies. Other types of evaluations, such as compliance and acceptability, were available for D-Tree eCCM, eIMCI, and ALMANACH. Several evaluations are still ongoing. Future prospects include conducting effectiveness and impact studies, using data gathered through larger studies to adapt the medical content to local epidemiology, improving the software and sensors, and assessing factors that influence compliance and scale-up. IEDTs are valuable tools that have the potential to improve management of febrile children in primary care and increase the rational use of diagnostics and antimicrobials. Next steps in the evidence pathway should be larger effectiveness and impact studies (including cost analysis) and

  18. Perception Modelling of Visitors in Vargas Museum Using Agent-Based Simulation and Visibility Analysis

    Science.gov (United States)

    Carcellar, B. G., III

    2017-10-01

    Museum exhibit management is one of the usual undertakings of museum facilitators. Artworks must be strategically placed to achieve maximum viewing by the visitors. The positioning of the artworks also strongly influences the quality of the visitors' experience. One solution to such problems is to use GIS and Agent-Based Modelling (ABM). In ABM, persistent interacting objects are modelled as agents. These agents are given attributes and behaviors that describe their properties as well as their motion. In this study, an ABM approach that incorporates GIS is used to perform an analytical assessment of the placement of the artworks in the Vargas Museum. GIS serves as the backbone for the spatial aspect of the simulation, such as the placement of the artwork exhibits and possible obstructions to perception such as the columns, walls, and panel boards. Visibility Analysis is also applied to the model in GIS to assess the overall visibility of the artworks. The ABM is built using the initial GIS outputs and GAMA, an open source ABM software. Visitors are modelled as agents moving inside the museum following a specific decision tree. The simulation is run for three use cases: a 10 %, 20 %, and 30 % chance of a visitor arriving in the next minute. For this museum, the 10 % chance is determined to be the simulation case closest to the actual situation, and the recommended minimum time to achieve maximum artwork perception is 1 hour and 40 minutes. Initial assessment of the results shows that even after 3 hours of simulation, small parts of the exhibit lack viewers because of their distance from the entrance. A more detailed decision tree for the visitor agents can be incorporated to obtain a more realistic simulation.
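
    The three use cases above amount to a per-minute chance of a visitor arriving. A minimal sketch of that arrival rule follows, assuming one independent draw per simulated minute. The function name `simulate_arrivals` and its parameters are hypothetical, and the code is plain Python for illustration rather than the GAMA model used in the study.

```python
import random


def simulate_arrivals(minutes, p_arrival, seed=None):
    """Return the minutes (0-based) at which a visitor enters.

    Assumption: each minute is an independent trial that admits one
    visitor with probability p_arrival.
    """
    rng = random.Random(seed)
    return [t for t in range(minutes) if rng.random() < p_arrival]


# Compare the three arrival rates over a 3-hour (180-minute) run.
for p in (0.10, 0.20, 0.30):
    arrivals = simulate_arrivals(180, p, seed=42)
    print(f"p={p:.2f}: {len(arrivals)} visitors")
```

    In a fuller model each arrival would spawn an agent that walks the floor plan according to its decision tree; this sketch covers only the arrival process.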

  19. Model-Based Analysis of the Potential of Macroinvertebrates as Indicators for Microbial Pathogens in Rivers

    Directory of Open Access Journals (Sweden)

    Rubén Jerves-Cobo

    2018-03-01

    The quality of water prior to its use for drinking, farming or recreational purposes must comply with several physicochemical and microbiological standards to safeguard society and the environment. Satisfying these standards requires expensive analyses and highly trained laboratory personnel. Macroinvertebrates, by contrast, have been used as ecological indicators to review the health of aquatic ecosystems. In this research, the relationship between microbial pathogens and macrobenthic invertebrate taxa was examined in the Machangara River, located in the southern Andes of Ecuador, in which 33 sites were chosen according to their land use to collect physicochemical, microbiological and biological parameters. Decision tree models (DTMs) were used to generate rules that link the presence and abundance of some benthic families to microbial pathogen standards. These DTMs provide an indirect, approximate, and quick way of checking the fulfillment of Ecuadorian regulations for water use related to microbial pathogens. The models, built and optimized with the WEKA package, were evaluated based on both statistical and ecological criteria to make them as clear and simple as possible. As a result, two different and reliable models were obtained, which could be used as proxy indicators in a preliminary assessment of microbial pathogen pollution in rivers. The DTMs can be easily applied by staff with minimal training in the identification of the sensitive taxa selected by the models. The presence of selected macroinvertebrate taxa, in conjunction with the decision trees, can be used as a screening tool to evaluate sites that require additional follow-up analyses to confirm whether microbial water quality standards are met.

  20. DoD Information Assurance Certification and Accreditation Process (DIACAP) Survey and Decision Tree

    Science.gov (United States)

    2011-07-01

    CVC Compliance and Validation Certification DAA designated accrediting authority DATO denial of authorization to operate DIACAP DoD Information...standard based on implementation of the best practices listed in paragraph 2.3. c. Direct the DSG to rename the Data Protection Committee to the...Information Grid (GIG)- based environment. Figure A-1. DoD IA program management. 1.1.1 DIACAP Background. a. Interim DIACAP signed 6 July 2006

  1. Variable length and context-dependent HMM letter form models for Arabic handwritten word recognition

    Science.gov (United States)

    Bianne-Bernard, Anne-Laure; Menasri, Fares; Likforman-Sulem, Laurence; Mokbel, Chafic; Kermorvant, Christopher

    2012-01-01

    We present in this paper an HMM-based recognizer for the recognition of unconstrained Arabic handwritten words. The recognizer is a context-dependent HMM which considers variable topology and contextual information for a better modeling of writing units. We propose an algorithm to adapt the topology of each HMM to the character to be modeled. For modeling the contextual units, a state-tying process based on decision tree clustering is introduced which significantly reduces the number of parameters. Decision trees are built according to a set of expert-based questions on how characters are written. Questions are divided into global questions yielding larger clusters and precise questions yielding smaller ones. We apply this modeling to the recognition of Arabic handwritten words. Experiments conducted on the OpenHaRT2010 database show that variable length topology and contextual information significantly improve the recognition rate.

  2. Decision tree learning for detecting turning points in business process orientation: a case of Croatian companies

    Directory of Open Access Journals (Sweden)

    Ljubica Milanović Glavan

    2015-03-01

    Companies worldwide are embracing Business Process Orientation (BPO) in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing the maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods to detect key turning points in BPO. Based on survey results obtained in 2013, the data mining technique of classification and regression trees (C&RT) was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.

  3. Prediction of financial crises by means of rough sets and decision trees

    Directory of Open Access Journals (Sweden)

    Zuleyka Díaz-Martínez

    2011-03-01

    This paper further investigates the factors behind a financial crisis. Using a large sample of countries in the period 1981 to 1999, it applies two methods from Artificial Intelligence (Rough Sets theory and the C4.5 algorithm) to analyze the role of a set of macroeconomic and financial variables in explaining banking crises. These variables are both quantitative and qualitative. These methods do not require the variables or data used to satisfy any assumptions. The statistical methods traditionally employed call for the explanatory variables to satisfy statistical assumptions, which is quite difficult to achieve in practice. This fact complicates the analysis. We obtained good results based on the classification accuracies (80% of countries from an independent sample were correctly classified), which supports the suitability of both methods.
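
    C4.5 chooses splits by gain ratio, that is, information gain divided by the split information of the candidate attribute. A minimal sketch of that criterion for a categorical attribute follows. The function names and the toy data are hypothetical and are not taken from the paper, and the threshold search C4.5 uses for quantitative variables is omitted.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to the class labels."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a qualitative indicator and a crisis / no-crisis label.
regime = ["fixed", "fixed", "float", "float", "fixed", "float"]
crisis = ["yes", "yes", "no", "no", "yes", "no"]
print(gain_ratio(regime, crisis))
```

    At each node the attribute with the highest gain ratio would be selected, and the procedure repeats on each resulting subset.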

  4. Fuzzy decision trees as a decision-making framework in the public sector

    Directory of Open Access Journals (Sweden)

    Benčina Jože

    2011-01-01

    Systematic approaches to making decisions in the public sector are becoming very common. Most often, these approaches concern expert decision models. The spread of e-participation and e-democracy has been influenced by the development of technology. All stakeholders are supposed to participate in decision making, which brings a new feature to the decision-making process: amateurs and non-specialists participate in decision making instead of experts. To understand the needs and wishes of stakeholders, it is not enough for them to vote for alternatives - it is important that they participate in solution-finding and express opinions about the important elements of these matters. The solution presented in this paper is a fuzzy decision-making framework. This framework combines the advantages of representing the decision-making problem in a tree structure with the flexibility of the fuzzy approach. The possibilities of implementing the framework in practice are illustrated by case studies of investment project appraisal in a community and of the assessment of the efficiency and effectiveness of public institutions.

  5. Classification decision tree in CT imaging: application to the differential diagnosis of solitary pulmonary nodules

    International Nuclear Information System (INIS)

    Ma Hongxia; Guo Yulin; Wang Qiuping; Qiang Yongqian; Liu Min; Guo Xiaojuan; Guo Youmin; Chen Qihang

    2008-01-01

    Objective: To establish a classification and regression tree (CART) for differentiating benign from malignant solitary pulmonary nodules (SPN). Methods: One hundred and sixteen consecutive cases with 116 solitary pulmonary nodules, which were finally pathologically proven to be 54 malignant nodules and 62 benign nodules, were prospectively registered in this research. Twelve clinical presentations and 22 CT findings were collected as predictors. A classification tree was established to distinguish benign SPNs from malignant ones. In the observer test, two groups (one made up of junior radiologists and one of senior radiologists) were independently presented with clinical information and CT images without knowing the pathologic and machine-learning results. Performance of observers and CART were compared by receiver operating characteristic analysis. Results: Receiver operating characteristic analysis showed that the areas under the curve of CART, senior radiologists and junior radiologists were 0.910±0.029, 0.827±0.038 and 0.612±0.052, respectively. Difference between areas (DBF) between CART and junior radiologists was 0.297 (P<0.01). DBF between CART and senior radiologists was 0.083 (P<0.05). DBF between senior and junior radiologists was 0.214 (P<0.01). CART showed the best diagnostic efficiency, followed by senior radiologists, and then junior radiologists. Conclusion: Our data mining techniques using CART show high accuracy in differentiating benign from malignant pulmonary nodules based on clinical variables and CT findings. It will be a potentially useful tool in further application of artificial intelligence in imaging diagnosis. (authors)
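
    The area under the ROC curve reported above can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. A minimal sketch of that pairwise computation follows. The function name and the scores are hypothetical illustrations, not data from the study, and the standard errors and significance tests quoted in the abstract are not reproduced.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical malignancy scores assigned by a reader or a tree.
malignant = [0.9, 0.8, 0.6, 0.4]
benign = [0.7, 0.3, 0.2, 0.1]
print(auc(malignant, benign))  # 14 of 16 pairs ranked correctly -> 0.875
```

    Computing this value for each reader group and for the tree on the same cases gives the quantities whose pairwise differences the abstract reports.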

  6. The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

    Science.gov (United States)

    Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

    2014-09-01

    This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®

  7. Recent advances using rodent models for predicting human allergenicity

    International Nuclear Information System (INIS)

    Knippels, Leon M.J.; Penninks, Andre H.

    2005-01-01

    The potential allergenicity of newly introduced proteins in genetically engineered foods has become an important safety evaluation issue. However, there are still no widely accepted and reliable test systems to evaluate the potential allergenicity and the potency of new proteins in our food. The best-known allergy assessment proposal for foods derived from genetically engineered plants was the careful stepwise process presented in the so-called ILSI/IFBC decision tree. A revision of this decision tree strategy was proposed by a FAO/WHO expert consultation. As prediction of the sensitizing potential of the novel introduced protein based on animal testing was considered to be very important, animal models were introduced as one of the new test items, despite the fact that none of the currently studied models has been widely accepted and validated yet. In this paper, recent results are summarized of promising models developed in rat and mouse

  8. Decision tree analysis to assess the cost-effectiveness of yttrium microspheres for treatment of hepatic metastases from colorectal cancer

    International Nuclear Information System (INIS)

    Kelley, B.B.; Walker, G.D.; Miles, K.A.

    2002-01-01

    Full text: The aim is to determine the cost-effectiveness of yttrium microsphere treatment of hepatic metastases from colorectal cancer, with and without FDG-PET for detection of extra-hepatic disease. A decision tree was created comparing two strategies for yttrium treatment with chemotherapy, one incorporating PET in addition to CT in the pre-treatment work-up, to a strategy of chemotherapy alone. The sensitivity and specificity of PET and CT were obtained from the Federal Government PET review. Imaging costs were obtained from the Medicare benefits schedule with an additional capital component added for PET (final cost $1,200). The cost of yttrium treatment was determined by patient-tracking. Previously published reports indicated a mean gain in life-expectancy from treatment of 0.52 years. Patients with extra-hepatic metastases were assumed to receive no survival benefit. Cost-effectiveness was expressed as incremental cost per life-year gained (ICER). Sensitivity analysis determined the effect of prior probability of extra-hepatic disease on cost-savings and cost-effectiveness. The cost of yttrium treatment, including angiography, particle perfusion studies and bed-stays, was $10,530. A baseline value for prior probability of extra-hepatic disease of 0.35 gave ICERs of $26,378 and $25,271 for the no-PET and PET strategies respectively. The PET strategy was less expensive if the prior probability of extra-hepatic metastases was greater than 0.16 and more cost-effective if above 0.28. Yttrium microsphere treatment is less cost-effective than other interventions for colon cancer but comparable to other accepted health interventions. Incorporating PET into the pre-treatment assessment is likely to save costs and improve cost-effectiveness. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
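
    The two calculations this kind of analysis rests on are the expected value at a chance node and the incremental cost per life-year gained. A minimal sketch follows, assuming a single chance node for extra-hepatic disease. The function names and the strategy costs passed to `icer` are hypothetical; only the prior probability (0.35) and the mean gain (0.52 years) come from the abstract, and the sketch is not the authors' full tree, so it does not reproduce their ICERs.

```python
def expected_value(branches):
    """Expected value at a chance node given (probability, value) branches."""
    total_p = sum(p for p, _ in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * v for p, v in branches)


def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost per unit of effect gained (e.g. per life-year)."""
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("strategies have identical effect; ICER is undefined")
    return (cost_new - cost_ref) / delta_effect


# Expected life-years gained if every patient is treated without PET:
# no benefit with extra-hepatic disease (p = 0.35), 0.52 years otherwise.
gain_no_pet = expected_value([(0.35, 0.0), (0.65, 0.52)])
print(round(gain_no_pet, 3))

# Hypothetical strategy totals, for illustration only.
print(icer(cost_new=33000.0, cost_ref=20000.0, effect_new=1.5, effect_ref=1.0))
```

    A sensitivity analysis of the kind described would repeat these calculations while varying the prior probability of extra-hepatic disease.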

  9. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-06-01

    Since the introduction of in vitro fertilization (IVF) in clinical practice of infertility treatment, indicators for high quality embryos have been investigated. Cumulus cells (CC) have a specific gene expression profile according to the developmental potential of the oocyte they surround, and therefore specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in the predictive value for embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR)] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC) for the two prediction models was calculated. Among the tested genes, AMHR2 and LIF showed a significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. The two prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.

  10. Decision-Tree Program

    Science.gov (United States)

    Buntine, Wray

    1994-01-01

    IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.

  11. Comparison of three types of models for the prediction of final academic achievement

    Directory of Open Access Journals (Sweden)

    Silvana Gasar

    2002-12-01

    For efficient prevention of inappropriate secondary school choices, and thereby of academic failure, school counselors need a tool for predicting an individual pupil's final academic achievement. Using data mining techniques on a pupil database and expert modeling, we developed several models for the prediction of final academic achievement in an individual high school educational program. For data mining, we used statistical analyses, clustering and two machine learning methods: developing classification decision trees and hierarchical decision models. Using the expert system shell DEX, an expert system based on a hierarchical multi-attribute decision model was developed manually. All the models were validated and evaluated from the viewpoint of their applicability. The predictive accuracy of the DEX models and decision trees was equal and very satisfying, as it reached the predictive accuracy of an experienced counselor. With respect on the efficiency a