Bouissou, M. [Electricite de France (EDF), 75 - Paris (France)]
Binary Decision Diagrams (BDD) have recently made a noticeable entry in the RAMS field. This kind of representation for boolean functions makes possible the assessment of complex fault-trees, both qualitatively (minimal cut-sets search) and quantitatively (exact calculation of top event probability). The object of the paper is to present a pre-processing of the fault-tree which ensures that the results given by different heuristics on the 'optimized' fault-tree are not too sensitive to the way the tree is written. This property is based on a theoretical proof. In contrast with some well known heuristics, the method proposed is not based only on intuition and practical experiments. (author) 12 refs.
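As an illustration of the "exact calculation of top event probability" mentioned above, the following is a minimal sketch, not the paper's method. It applies plain Shannon expansion over a fixed variable order, which is the decomposition a BDD encodes, but without the node sharing that makes a real BDD compact. The fault tree, the event names, and the probabilities are all hypothetical.

```python
# Hypothetical fault tree: TOP = (A AND B) OR (A AND C).
# Event names and probabilities are illustrative assumptions.
PROBS = {"A": 0.1, "B": 0.2, "C": 0.3}
ORDER = ("A", "B", "C")  # variable order, as in an ordered BDD


def top(assign):
    """Structure function of the hypothetical fault tree."""
    return (assign["A"] and assign["B"]) or (assign["A"] and assign["C"])


def top_probability(structure, order, probs):
    """Exact P(structure is True) for independent basic events.

    Uses Shannon expansion: P(f) = p * P(f | x=1) + (1 - p) * P(f | x=0).
    This enumerates all branches; a real BDD would share identical subgraphs.
    """
    def expand(i, fixed):
        if i == len(order):
            return 1.0 if structure(dict(fixed)) else 0.0
        var = order[i]
        p = probs[var]
        high = expand(i + 1, fixed + ((var, True),))
        low = expand(i + 1, fixed + ((var, False),))
        return p * high + (1.0 - p) * low

    return expand(0, ())


p_top = top_probability(top, ORDER, PROBS)
```

For this toy tree the result can be checked by hand, since TOP = A AND (B OR C) gives 0.1 × (1 − 0.8 × 0.7) = 0.044.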
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
Tran Hoai Linh
The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, the three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to-peak periods of the ECG signals will be used as features for the classifiers. Numerical experiments will be performed for the recognition of different types of arrhythmia in the ECG signals taken from the MIT-BIH (Massachusetts Institute of Technology and Boston’s Beth Israel Hospital) Arrhythmia Database. The results will be compared with individual base classifiers’ performances and with other integration methods to show the high quality of the proposed solution.
Our interest in this lattice stems from its application to binary decision trees. Binary decision trees form a crucial tool for algorithmic time analysis. The lattice properties of Tn are studied and we show that every Tn has a sublattice isomorphic to Tn-1 and prove that Tn is generated by Tn-1. Also we show that the distance from ...
Brodal, Gerth Stølting; Moruz, Gabriel
It is well known that to minimize the number of comparisons a binary search tree should be perfectly balanced. Previous work has shown that a dominating factor over the running time for a search is the number of cache faults performed, and that an appropriate memory layout of a binary search tree can reduce the number of cache faults by several hundred percent. Motivated by the fact that during a search branching to the left or right at a node does not necessarily have the same cost, e.g. because of branch prediction schemes, we in this paper study the class of skewed binary search trees. For all nodes in a skewed binary search tree the ratio between the size of the left subtree and the size of the tree is a fixed constant (a ratio of 1/2 gives perfect balanced trees). In this paper we present an experimental study of various memory layouts of static skewed binary search trees, where each...
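The definition of a skewed binary search tree given above can be sketched as follows. This is an illustrative construction only, not the authors' implementation or any of their memory layouts: the function names, the nested-tuple representation, and the rounding rule `int(alpha * n)` are assumptions made for the example.

```python
def build_skewed(keys, alpha=0.5):
    """Build a static skewed BST from sorted `keys`.

    Each node is a tuple (left, key, right). The left subtree receives
    roughly a fraction `alpha` of the keys in the subtree (rounding down
    is an assumption of this sketch).
    """
    n = len(keys)
    if n == 0:
        return None
    left_size = min(n - 1, int(alpha * n))
    return (
        build_skewed(keys[:left_size], alpha),
        keys[left_size],
        build_skewed(keys[left_size + 1:], alpha),
    )


def height(tree):
    """Number of nodes on the longest root-to-leaf path."""
    if tree is None:
        return 0
    left, _, right = tree
    return 1 + max(height(left), height(right))


def contains(tree, key):
    """Standard BST search; branches left or right by comparison."""
    while tree is not None:
        left, node_key, right = tree
        if key == node_key:
            return True
        tree = left if key < node_key else right
    return False


def inorder(tree):
    """Return the keys in sorted order."""
    if tree is None:
        return []
    left, key, right = tree
    return inorder(left) + [key] + inorder(right)


balanced = build_skewed(list(range(7)), alpha=0.5)
skewed = build_skewed(list(range(7)), alpha=0.25)
```

With seven keys, a ratio of 1/2 yields a perfectly balanced tree of height 3, while a ratio of 1/4 produces a deeper tree in which most searches branch to the right.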
Ibanez-Llano, Cristina, E-mail: firstname.lastname@example.org [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain); Rauzy, Antoine, E-mail: Antoine.RAUZY@3ds.co [Dassault Systemes, 10 rue Marcel Dassault CS 40501, 78946 Velizy Villacoublay, Cedex (France); Melendez, Enrique, E-mail: email@example.com [Consejo de Seguridad Nuclear (CSN), C/Justo Dorado 11, 28040 Madrid (Spain); Nieto, Francisco, E-mail: firstname.lastname@example.org [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain)
Over the last two decades binary decision diagrams have been applied successfully to improve Boolean reliability models. In contrast to the classical approach based on the computation of the minimal cutsets (MCS), the BDD approach involves no approximation in the quantification of the model and is able to handle negative logic correctly. However, when models are sufficiently large and complex, as for example the ones coming from the PSA studies of the nuclear industry, it becomes infeasible to compute the BDD within a reasonable amount of time and computer memory. Therefore, simplification or reduction of the full model has to be considered in some way to adapt the application of the BDD technology to the assessment of such models in practice. This paper proposes a reduction process based on using information provided by the set of the most relevant minimal cutsets of the model in order to perform the reduction directly on it. This allows controlling the degree of reduction and therefore the impact of such simplification on the final quantification results. This reduction is integrated in an incremental procedure that is compatible with the dynamic generation of the event trees and therefore adaptable to the recent dynamic developments and extensions of the PSA studies. The proposed method has been applied to a real case study, and the results obtained confirm that the reduction enables the BDD computation while maintaining accuracy.
Golzari, Fahimeh; Jalili, Saeed
In the protein function prediction (PFP) problem, the goal is to predict the function of the many well-sequenced proteins whose function is not yet precisely known. PFP is a special and complex machine learning problem in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or directed acyclic graph. Decision trees are one of the common learning methods proposed for this problem; because they partition data into sets with sharp boundaries, small changes in the attribute values of a new instance may cause an incorrect change in its predicted label and, finally, misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict functions of the proteins. This algorithm just fuzzifies the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It has the ability of assigning multiple functions to each protein simultaneously and preserves the hierarchy consistency between functional classes. It uses the label variance reduction as splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising. Copyright © 2015 Elsevier Ltd. All rights reserved.
Binary trees are very useful tools in computer science for estimating the running time of so-called comparison based algorithms, algorithms in which every action is ultimately based on a prior comparison between two elements. For two given algorithms A and B where the decision tree of A is more balanced than that of B, it is known that the average and worst case times of A will be better than those of B, i.e., $\overline{T}_A(n) \le \overline{T}_B(n)$ and $T^W_A(n) \le T^W_B(n)$. Thus the most balanced and the most imbalanced binary trees play a main role. Here we consider them as semilattices and characterize the most balanced and the most imbalanced binary trees by topological and categorical properties. Also we define the composition of binary trees as a commutative binary operation, $*$, such that for binary trees A and B, $A * B$ is the binary tree obtained by attaching a copy of B to any leaf of A. We show that $(T, *)$ is a commutative po-monoid and investigate its properties.
J.F. Groote (Jan Friso); J.C. van de Pol (Jaco)
We incorporate equations in binary decision diagrams (BDD). The resulting objects are called EQ-BDDs. A straightforward notion of ordered EQ-BDDs (EQ-OBDD) is defined, and it is proved that each EQ-BDD is logically equivalent to an EQ-OBDD. Moreover, on EQ-OBDDs satisfiability and
Hansen, Esben Rune; Satti, Srinivasa Rao; Tiedemann, Peter
The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD and compression will in many cases reduce the size of the BDD to 1-2 bits per node. Empirical results for our compression technique are presented, including comparisons with previously introduced techniques, showing that the new technique dominates on all tested instances.
Nusbaumer, O. P. M.
This study is concerned with the quantification of Probabilistic Risk Assessment (PRA) using linked Fault Tree (FT) models. Probabilistic Risk assessment (PRA) of Nuclear Power Plants (NPPs) complements traditional deterministic analysis; it is widely recognized as a comprehensive and structured approach to identify accident scenarios and to derive numerical estimates of the associated risk levels. PRA models as found in the nuclear industry have evolved rapidly. Increasingly, they have been broadly applied to support numerous applications on various operational and regulatory matters. Regulatory bodies in many countries require that a PRA be performed for licensing purposes. PRA has reached the point where it can considerably influence the design and operation of nuclear power plants. However, most of the tools available for quantifying large PRA models are unable to produce analytically correct results. The algorithms of such quantifiers are designed to neglect sequences when their likelihood decreases below a predefined cutoff limit. In addition, the rare event approximation (e.g. Moivre's equation) is typically implemented for the first order, ignoring the success paths and the possibility that two or more events can occur simultaneously. This is only justified in assessments where the probabilities of the basic events are low. When the events in question are failures, the first order rare event approximation is always conservative, resulting in wrong interpretation of risk importance measures. Advanced NPP PRA models typically include human errors, common cause failure groups, seismic and phenomenological basic events, where the failure probabilities may approach unity, leading to questionable results. It is accepted that current quantification tools have reached their limits, and that new quantification techniques should be investigated. A novel approach using the mathematical concept of Binary Decision Diagram (BDD) is proposed to overcome these deficiencies
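The gap between the first-order rare event approximation and an exact result, as described above, can be seen in a small worked example. This is a sketch under simplifying assumptions: the top event is taken to be the union of independent events, and the probabilities are made up for illustration. It does not reproduce the study's models or tools.

```python
from math import prod


def rare_event_approximation(probs):
    """First-order approximation: P(union) is taken as the sum of probabilities."""
    return sum(probs)


def exact_union(probs):
    """Exact P(at least one event occurs) for independent events."""
    return 1.0 - prod(1.0 - p for p in probs)


# Illustrative probabilities only (assumptions, not data from the study).
low = [1e-3, 2e-3]   # rare events: the approximation is close
high = [0.5, 0.6]    # probabilities approaching unity: the approximation breaks down

low_gap = rare_event_approximation(low) - exact_union(low)
high_gap = rare_event_approximation(high) - exact_union(high)
```

For the low-probability case the two values differ only by the product term 2 × 10⁻⁶, whereas for the high-probability case the approximation gives 1.1, which is not even a valid probability, against an exact value of 0.8.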
...documents will provide the reader in-depth background on the science and engineering mechanisms of phytoremediation. Using the decision tree and the... (ITRC, Phytoremediation Decision Tree, December 1999.) Topics listed: contaminant levels; plant selection; treatability; irrigation, agronomic inputs, and...
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram or branching program is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than a table of values or even a formula. Representing a function in the form of a decision tree allows applying graph algorithms for various transformations. Decision trees and branching programs are used for effective hardware and software implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of a decision tree characterizes the expected computing time, and the number of nodes in a branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal on both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.
A.M. Silva (Alexandra); J.J.M.M. Rutten (Jan)
We study the set T_A of infinite binary trees with nodes labelled in a semiring A from a coalgebraic perspective. We present coinductive definition and proof principles based on the fact that T_A carries a final coalgebra structure. By viewing trees as formal power series, we develop a
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly
To illustrate the use of decision trees with a utility index in clinical decision making. A decision tree was created related to whether or not to perform a tonsillectomy. Data from the literature were applied to a common hypothetical clinical scenario. A decision tree graphically represents the typical decision-making process that many clinicians use. The addition of utility functions permitted consideration of the adverse or beneficial effects of outcomes, altering the treatment decision. Quantitative tools such as decision trees may quantify outcome preferences and aid in clinical decision making, but the proper tool and background data are essential.
Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of action were available and for training operators. The trees also have a consequence component so that not only the decision can be explored, but also the consequence of that decision.
IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
Putora, Paul Martin; Panje, Cedric M; Papachristofilou, Alexandros; Dal Pra, Alan; Hundsberger, Thomas; Plasswilm, Ludwig
Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information, objective consensus based on recommendations in decision tree format from multiple sources. Based on nine sample recommendations in decision tree format a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format and analyzed in order to determine the objective consensus. Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage) resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.
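The mode-based consensus described above can be sketched as follows. This is only an illustration of the idea, not the authors' software: the three "centres", their rules, and the recommendation labels "X" and "Y" are hypothetical, and the three parameters are encoded as levels 0 to 2 (two cut-off values give three levels each, hence 3³ = 27 combinations).

```python
from collections import Counter
from itertools import product

LEVELS = (0, 1, 2)  # two cut-off values per parameter give three levels


# Hypothetical centres: each decision tree maps the levels of
# (gleason, psa, t_stage) to a recommendation label.
def centre_a(g, p, t):
    return "Y" if max(g, p, t) == 2 else "X"


def centre_b(g, p, t):
    return "Y" if g >= 1 else "X"


def centre_c(g, p, t):
    return "X"


def mode_consensus(trees):
    """Map every parameter combination to its most common recommendation.

    Ties are resolved by first occurrence, which is an assumption of this
    sketch; a real analysis would need an explicit tie policy.
    """
    consensus = {}
    for combo in product(LEVELS, repeat=3):
        votes = Counter(tree(*combo) for tree in trees)
        consensus[combo] = votes.most_common(1)[0][0]
    return consensus


consensus = mode_consensus([centre_a, centre_b, centre_c])
```

Because each tree is evaluated on every combination, disagreement between sources shows up per combination rather than as a single overall conflict.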
The Dyson relations between renormalized and bare photon and electron propagators $Z_3 \bar{D}(q) = D(q)$ and $Z_2 \bar{S}(q) = S(q)$ are expanded over planar binary trees. This yields explicit recursive relations for the terms of the expansions. When all the trees corresponding to a given power of the electron charge are summed, recursive relations are obtained for the finite coefficients of the renormalized photon and electron propagators. These relations significantly decrease the number of integrals to carry out, as compared to the standard Feynman diagram technique. In the case of massless quantum electrodynamics (QED), the relation between renormalized and bare coefficients of the perturbative expansion is given in terms of a Hopf algebra structure. (orig.)
The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms are based on the separation heuristic [13, 31] that at each step tries dividing the set of objects as evenly as possible. Later, Garey and Graham showed that such an algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest proved NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and uniform probability distribution. Cox et al. showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.
Zhang, Quanshi; Yang, Yu; Wu, Ying Nian; Zhu, Song-Chun
This paper presents a method to learn a decision tree to quantitatively explain the logic of each prediction of a pre-trained convolutional neural network (CNN). Our method boosts the following two aspects of network interpretability. 1) In the CNN, each filter in a high conv-layer must represent a specific object part, instead of describing mixed patterns without clear meanings. 2) People can explain each specific prediction made by the CNN at the semantic level using a decision tree, i.e....
We completely characterise the complexity in the decision tree model of computing composite relations of the form h = g(f^1,...,f^n), where each relation f^i is boolean-valued. Immediate corollaries include a direct sum theorem for decision tree complexity and a tight characterisation of the decision tree complexity of iterated boolean functions.
This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.
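To make the dynamic programming baseline concrete, here is a minimal sketch that computes the minimum depth of a decision tree for a small decision table by recursing over subtables. It is not the authors' implementation: the function name, the table encoding (a list of attribute tuples plus a parallel list of decisions), and the assumption that rows are pairwise distinct are all choices made for this example.

```python
from functools import lru_cache


def min_depth(rows, decisions):
    """Minimum depth of a decision tree for a decision table.

    rows      -- list of equal-length attribute tuples, assumed pairwise distinct
    decisions -- list with one decision per row
    """
    n_attrs = len(rows[0])

    @lru_cache(maxsize=None)
    def solve(subset):
        # A subtable whose rows share one decision needs no further queries.
        if len({decisions[i] for i in subset}) <= 1:
            return 0
        best = None
        for a in range(n_attrs):
            # Split the subtable by the value of attribute `a`.
            parts = {}
            for i in subset:
                parts.setdefault(rows[i][a], []).append(i)
            if len(parts) < 2:
                continue  # attribute is constant here and separates nothing
            depth = 1 + max(solve(frozenset(p)) for p in parts.values())
            best = depth if best is None else min(best, depth)
        return best

    return solve(frozenset(range(len(rows))))


# XOR of two binary attributes: both attributes must be queried.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(min_depth(xor_rows, [0, 1, 1, 0]))
```

The memoised subtables play the role of the nodes of the graph that dynamic programming explores; a greedy heuristic would instead commit to one attribute per subtable, which is why its result can be compared against this optimum.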
Valero Valbuena, Silvia
Extraordinary doctoral award, academic year 2011-2012, field of ICT Engineering. The optimal exploitation of the information provided by hyperspectral images requires the development of advanced image processing tools. Therefore, under the title Hyperspectral image representation and Processing with Binary Partition Trees, this PhD thesis proposes the construction and the processing of a new region-based hierarchical hyperspectral image representation: the Binary Partition Tree (BPT). This hierarc...
We study decision trees which are totally optimal relative to different sets of complexity parameters for Boolean functions. A totally optimal tree is an optimal tree relative to each parameter from the set simultaneously. We consider the parameters characterizing both time (in the worst- and average-case) and space complexity of decision trees, i.e., depth, total path length (average depth), and number of nodes. We have created tools based on extensions of dynamic programming to study totally optimal trees. These tools are applicable to both exact and approximate decision trees, and allow us to make multi-stage optimization of decision trees relative to different parameters and to count the number of optimal trees. Based on the experimental results we have formulated the following hypotheses (and subsequently proved): for almost all Boolean functions there exist totally optimal decision trees (i) relative to the depth and number of nodes, and (ii) relative to the depth and average depth.
The book focuses on different variants of decision tree induction but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. The book discusses different variants of decision tree induction and represents a useful source of information to readers wishing to review some of the techniques used in decision tree learning, as well as different ensemble methods that involve decision trees. It is shown that the knowledge of different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms with the aim of identifying the top-most performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...
Van Pelt, J; Uylings, H B; Verwer, R W; Pentney, R J; Woldenberg, M J
The topological structure of a binary tree is characterized by a measure called tree asymmetry, defined as the mean value of the asymmetry of its partitions. The statistical properties of this tree-asymmetry measure have been studied using a growth model for binary trees. The tree-asymmetry measure appears to be sensitive to topological differences, and the tree-asymmetry expectation for the growth model that we used appears to be almost independent of the size of the trees. These properties and the simple definition make the measure suitable for practical use, for instance for characterizing, comparing and interpreting sets of branching patterns. Examples are given of the analysis of three sets of neuronal branching patterns. It is shown that the variance in tree-asymmetry values for these observed branching patterns corresponds perfectly with the variance predicted by the growth model used.
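A small sketch of how such a measure could be computed is given below. The abstract only states that tree asymmetry is the mean of the partition asymmetries; the specific partition formula used here, |r - s| / (r + s - 2) with the value 0 for a (1, 1) partition, is an assumption of this example, as is the nested-tuple tree encoding.

```python
def tree_asymmetry(tree):
    """Mean partition asymmetry over all bifurcation points of a binary tree.

    A tree is either a leaf (any non-tuple value) or a pair (left, right).
    Assumed partition asymmetry for subtrees with r and s leaves:
    |r - s| / (r + s - 2), taken as 0 when r = s = 1.
    """
    values = []

    def count_leaves(node):
        if not isinstance(node, tuple):
            return 1
        r = count_leaves(node[0])
        s = count_leaves(node[1])
        values.append(0.0 if r + s == 2 else abs(r - s) / (r + s - 2))
        return r + s

    count_leaves(tree)
    return sum(values) / len(values) if values else 0.0


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(tree_asymmetry(balanced), tree_asymmetry(caterpillar))
```

Under this assumed formula a fully balanced tree scores 0 and maximally lopsided trees score higher, so the value can be compared across trees of different sizes.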
Davoodi, Pooya; Raman, Rajeev; Satti, Srinivasa
We provide two succinct representations of binary trees that can be used to represent the Cartesian tree of an array A of size n. Both the representations take the optimal 2n + o(n) bits of space in the worst case and support range minimum queries (RMQs) in O(1) time. The first one is a modificat...
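For readers unfamiliar with the connection between Cartesian trees and range minimum queries, the following sketch builds a plain pointer-based Cartesian tree and answers an RMQ by descending from the root. It is deliberately not the succinct 2n + o(n)-bit representation of the abstract, and the query takes time proportional to the tree height rather than O(1); the function names are chosen for this example.

```python
def cartesian_tree(a):
    """Build the min-rooted Cartesian tree of `a` in O(n) time with a stack.

    Returns (root, left, right); left[i] and right[i] are child indices or -1.
    """
    n = len(a)
    left, right = [-1] * n, [-1] * n
    stack = []  # indices on the rightmost path, values non-decreasing
    for i in range(n):
        last = -1
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        left[i] = last          # popped chain becomes the left subtree of i
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    return (stack[0] if stack else -1), left, right


def rmq(tree, i, j):
    """Index of a minimum of a[i..j] (inclusive), found by walking down."""
    root, left, right = tree
    v = root
    while not (i <= v <= j):
        # In-order position equals array index, so this moves toward [i, j].
        v = right[v] if v < i else left[v]
    return v


data = [3, 1, 4, 1, 5, 9, 2, 6]
tree = cartesian_tree(data)
print(rmq(tree, 4, 7))
```

The first node on the root-to-range path whose index falls inside [i, j] is the lowest common ancestor of positions i and j, which is why it holds the range minimum.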
Yildiz, Olcay Taner
In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross validation.
Brodal, G.S.; Fagerberg, R.; Jacob, R.
We propose a version of cache oblivious search trees which is simpler than the previous proposal of Bender, Demaine and Farach-Colton and has the same complexity bounds. In particular, our data structure avoids the use of weight balanced B-trees, and can be implemented as just a single array ... and range queries in worst case O(log_B n + k/B) memory transfers, where k is the size of the output. The basic idea of our data structure is to maintain a dynamic binary tree of height log n + O(1) using existing methods, embed this tree in a static binary tree, which in turn is embedded in an array in a cache ... oblivious fashion, using the van Emde Boas layout of Prokop. We also investigate the practicality of cache obliviousness in the area of search trees, by providing an empirical comparison of different methods for laying out a search tree in memory ...
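The van Emde Boas layout mentioned above can be illustrated with a short recursive sketch for a complete binary tree. Nodes are named by their breadth-first (heap) index, starting at 1; the function name and the choice to give the top part the smaller half of the height when the height is odd are assumptions of this example, not details taken from the paper.

```python
def veb_order(height, root=1):
    """Breadth-first indices of a complete binary tree in van Emde Boas order.

    `height` counts levels, so the tree has 2**height - 1 nodes.
    The tree is cut at half height: the top part is laid out first,
    followed by each bottom subtree from left to right, recursively.
    """
    if height == 1:
        return [root]
    top = height // 2          # assumption: top part gets the smaller half
    bottom = height - top
    order = veb_order(top, root)
    first = root << top        # leftmost descendant of `root`, `top` levels down
    for sub_root in range(first, first + (1 << top)):
        order.extend(veb_order(bottom, sub_root))
    return order


print(veb_order(4))
```

Storing nodes in this order keeps every small subtree contiguous in memory, so a root-to-leaf search touches few blocks regardless of the block size, which is the property the cache-oblivious analysis relies on.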
Redig, F.; Ruszel, W.M.; Saada, E.
We study the abelian sandpile model on a random binary tree. Using a transfer matrix approach introduced by Dhar and Majumdar, we prove exponential decay of correlations, and in a small supercritical region (i.e., where the branching process survives with positive probability) exponential decay of
Wu, C.; Landgrebe, D. A.; Swain, P. H.
A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Giesen, R. J.; Huynen, A. L.; Aarnink, R. G.; de la Rosette, J. J.; Debruyne, F. M.; Wijkstra, H.
A non-parametric algorithm is described for the construction of a binary decision tree classifier. This tree is used to correlate textural features, computed from ultrasonographic prostate images, with the histopathology of the imaged tissue. The algorithm consists of two parts: growing and pruning.
Li, Yuanhong; Dong, Ming; Kothari, Ravi
Top-down induction of decision trees is a simple and powerful method of pattern classification. In a decision tree, each node partitions the available patterns into two or more sets. New nodes are created to handle each of the resulting partitions and the process continues. A node is considered terminal if it satisfies some stopping criteria (for example, purity, i.e., all patterns at the node are from a single class). Decision trees may be univariate, linear multivariate, or nonlinear multivariate depending on whether a single attribute, a linear function of all the attributes, or a nonlinear function of all the attributes is used for the partitioning at each node of the decision tree. Though nonlinear multivariate decision trees are the most powerful, they are more susceptible to the risks of overfitting. In this paper, we propose to perform model selection at each decision node to build omnivariate decision trees. The model selection is done using a novel classifiability measure that captures the possible sources of misclassification with relative ease and is able to accurately reflect the complexity of the subproblem at each node. The proposed approach is fast and does not suffer from as high a computational burden as that incurred by typical model selection algorithms. Empirical results over 26 data sets indicate that our approach is faster and achieves better classification accuracy compared to statistical model selection algorithms.
The PRIA 3 decision tree will help applicants requesting a pesticide registration or certain tolerance action to accurately identify the category of their application and the amount of the required fee before they submit the application.
Developed by US EPA's RE-Powering America's Land Initiative, the RE-Powering Decision Trees tool guides interested parties through a process to screen sites for their suitability for solar photovoltaics or wind installations.
.... This implies that we recognize words as units, without recognizing their subcomponents. Multiple randomized decision trees are used to access the large pool of acoustic events in a systematic manner and are aggregated to produce the classifier.
EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.
Safavian, S. R.; Landgrebe, David
Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
Kamath, Chandrika [Dublin, CA]; Cantu-Paz, Erick [Oakland, CA]
A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.
Thompson, David R.
Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
Hush, Don [Los Alamos National Laboratory]; Porter, Reid [Los Alamos National Laboratory]
A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.
The paper is devoted to the analysis of greedy algorithms for the minimization of average depth of decision trees for decision tables such that each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. Comparing against the optimal result obtained from a dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of average depth of decision trees.
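The "integral image" transform referred to above needs only integer additions, which is what makes it attractive for constrained hardware. The sketch below is a generic pure-Python version written for this listing, not the flight code described in the abstract; the function names and the half-open box convention are assumptions of the example.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1,
    computed with integer additions only.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(box_sum(table, 1, 1, 3, 3))
```

Once the table is built, any rectangular sum costs four lookups, so box-filter features can be evaluated at every pixel and fed to a decision tree at a fixed, small cost.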
Kordi, Misagh; Bansal, Mukul S
Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of a non-binary input gene tree, and (iii) apply our algorithms to a large empirical data set of over 4700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.
This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of a new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.
A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The
Pol, J. van de; Zantema, H.
BDDs provide an established technique for propositional formula manipulation. In this paper we redevelop the basic BDD theory using standard rewriting techniques. Since a BDD is a DAG instead of a tree, we need a notion of shared rewriting and develop the appropriate theory. A rewriting system is
Wang, Xian-Qiang; Liu, Zhe; Lv, Wen-Ping; Luo, Ying; Yang, Guang-Yun; Li, Chong-Hui; Meng, Xiang-Fei; Liu, Yang; Xu, Ke-Sen; Dong, Jia-Hong
To evaluate a different decision tree for safe liver resection and verify its efficiency. A total of 2457 patients underwent hepatic resection between January 2004 and December 2010 at the Chinese PLA General Hospital, and 634 hepatocellular carcinoma (HCC) patients were eligible for the final analyses. Post-hepatectomy liver failure (PHLF) was identified by the association of prothrombin time <50% and serum bilirubin >50 μmol/L (the "50-50" criteria), which were assessed at day 5 postoperatively or later. The Swiss-Clavien decision tree, Tokyo University-Makuuchi decision tree, and Chinese consensus decision tree were adopted to divide patients into two groups based on those decision trees in sequence, and the PHLF rates were recorded. The overall mortality and PHLF rate were 0.16% and 3.0%. A total of 19 patients experienced PHLF. The numbers of patients to whom the Swiss-Clavien, Tokyo University-Makuuchi, and Chinese consensus decision trees were applied were 581, 573, and 622, and the PHLF rates were 2.75%, 2.62%, and 2.73%, respectively. Significantly more cases satisfied the Chinese consensus decision tree than the Swiss-Clavien decision tree and Tokyo University-Makuuchi decision tree (P decision trees. The Chinese consensus decision tree expands the indications for hepatic resection for HCC patients and does not increase the PHLF rate compared to the Swiss-Clavien and Tokyo University-Makuuchi decision trees. It would be a safe and effective algorithm for hepatectomy in patients with hepatocellular carcinoma.
In the context of the growing ubiquity of sensors, surveillance equipment and other mobile devices, a shift in the data processing paradigm was necessary. New systems are required to be capable of processing data streams of infinite length, having a high throughput, that cannot be stored and processed using classical Database Management Systems (DBMSs). These are called Data Stream Management Systems (DSMSs) within the scientific community. A first step performed by them is time synchronization between events arriving on different timestamped data streams. Within this paper an event synchronization method that makes use of binary trees to achieve its task is introduced and compared with other approaches in order to emphasize its strengths. Furthermore, the integration with DSCPE (our Data Stream Continuous Processing Engine) is proposed.
Decision trees are one of the best-known classification methods in data mining. Many studies have focused on improving the performance of decision trees. However, those algorithms were developed for and run on traditional distributed systems. Without the help of new technology, latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes. In order to improve data processing latency in large-scale data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.
The binary decision tree method is used to separate several multi-jet topologies in $e^+e^-$ collisions. Instead of the univariate process usually taken, a new design procedure for constructing multivariate decision trees is proposed. The segmentation is obtained by considering some feature functions, where linear and nonlinear discriminant functions and a minimal distance method are used. The classification focuses on ALEPH simulated events with multi-jet topologies. Compared to a standard univariate tree, the multivariate decision trees offer significantly better performance. (30 refs).
Presents a "Decision Tree" process for structuring team decision making and problem solving about specific student behavioral goals. The Decision Tree involves a sequence of questions/decisions that can be answered in "yes/no" terms. Questions address reasonableness of the goal, time factors, importance of the goal, responsibilities, safety,…
Introduction: Price is considered a neglected element of the marketing mix because of the complexity of price management and the sensitivity of customers to price changes. Price changes trigger the fastest customer reactions. Accordingly, the process of making shopping decisions can be very challenging for the customer. Objective: The aim of this paper is to create a model that is able to predict shopping intention and classify respondents into one of two categories, depending on whether they intend to shop or not. Methods: The data sample consists of 305 respondents, who are persons older than 18 years involved in buying groceries for their household. The research was conducted in February 2017. In order to create a model, the decision trees method was used with several of its classification algorithms. Results: All models, except the one that used the RandomTree algorithm, achieved a relatively high classification rate (over 80%). The highest classification accuracy, 84.75%, was achieved by the J48 and RandomForest algorithms. Since there is no statistically significant difference between those two algorithms, the authors chose the J48 algorithm to build a decision tree. Conclusions: The value for money and the price level in the store were the most significant variables for classification of shopping intention. A future study is planned to compare this model with other data mining techniques, such as neural networks or support vector machines, since these techniques achieved very good accuracy in some previous research in this field.
Spinka, T.; Carpenter, T.; Brunner, R. J.; Aydt, R.; Auvil, L.; Redman, T.; Tcheng, D.
The massive amounts of data flooding into the astronomy field hold many answers to important problems in contemporary astrophysics. The biggest problem is sifting through massive amounts of data to uncover these secrets. In this presentation, we identify an approach in which we apply data-mining techniques to the problem of photometric quasar identification. We employ decision trees to quickly and robustly identify potential quasars to a high degree of accuracy. We emphasize computational scalability due to the high volume of data and complexity of the data-mining algorithms.
AbouEisha, Hassan M.
We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We show also that each decision tree for sorting 8 elements, which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365), has also minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to make sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.
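The quoted value can be put into perspective with a little exact arithmetic. The sketch below reduces 620160/8! to lowest terms and compares it with the smallest average depth that any binary tree with exactly 8! leaves could have; that comparison is an addition of this example (using the standard formula for minimum external path length), and the helper name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def min_external_path_length(leaves):
    """Smallest possible sum of leaf depths in a binary tree with `leaves` leaves.

    With q = floor(log2(leaves)), the optimum places leaves on levels q and q + 1,
    giving leaves * q + 2 * (leaves - 2**q).
    """
    q = leaves.bit_length() - 1
    return leaves * q + 2 * (leaves - (1 << q))


n_perms = factorial(8)                      # 40320 orderings of 8 elements
avg_depth = Fraction(620160, n_perms)       # value stated in the abstract
shape_bound = Fraction(min_external_path_length(n_perms), n_perms)

print(avg_depth, float(avg_depth))          # reduces to 323/21
print(shape_bound, float(shape_bound))
```

The stated optimum, 323/21 (about 15.38 comparisons), lies slightly above the shape-only bound of 619904/40320 (about 15.37), so the best sorting tree for 8 elements cannot be a perfectly balanced tree on 8! leaves.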
A new combining criterion, the Multiplicative Proportional Deviative Influence (MPDI), is presented for combining or aggregating multi-expert numerical judgments in Yes-or-No type ill-structured group decision making situations. This newly proposed criterion performs well in comparison with the widely used aggregation means, the Arithmetic Mean (AM) and the Geometric Mean (GM), especially in better reflecting the degree of agreement between criteria levels or numerical experts' judgments. The MPDI can be considered as another class of combining criteria that take into account the degree of agreement among multiple numerical judgments. The MPDI is applicable in integrating several collaborative or synergistic decision making systems through combining final numerical decision outputs. A generalization of the proposed MPDI is discussed with a numerical example.
The paper describes a tool which allows us, for relatively small decision tables, to perform consecutive optimization of decision trees relative to various complexity measures such as number of nodes, average depth, and depth, and to find parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.
Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald
Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without a statistical background to identify suitable decision trees confidently and efficiently.
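The idea of restricting attention to Pareto-optimal candidates can be shown with a small filter over two objectives. This is a generic sketch, not TreePOD's code: the pairing of objectives (maximise accuracy, minimise number of leaves), the function name, and the candidate values are all invented for illustration.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    candidates -- list of (name, accuracy, n_leaves) tuples.
    One tree dominates another if it is at least as accurate and at least
    as small, and strictly better in one of the two.
    """
    front = []
    for name, acc, size in candidates:
        dominated = any(
            a >= acc and s <= size and (a > acc or s < size)
            for _, a, s in candidates
        )
        if not dominated:
            front.append((name, acc, size))
    return front


# Hypothetical trees produced by sampling construction parameters.
trees = [
    ("t1", 0.90, 30),
    ("t2", 0.88, 10),
    ("t3", 0.85, 12),
    ("t4", 0.90, 25),
    ("t5", 0.80, 3),
]
print([name for name, _, _ in pareto_front(trees)])
```

Only the surviving candidates represent genuine trade-offs, so a user choosing between accuracy and interpretability needs to inspect just these rather than the full sample.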
We study problems of optimization of decision and inhibitory trees for decision tables with many-valued decisions. As cost functions, we consider depth, average depth, number of nodes, and number of terminal/nonterminal nodes in trees. Decision tables with many-valued decisions (multi-label decision tables) are often more accurate models for real-life data sets than usual decision tables with single-valued decisions. Inhibitory trees can sometimes capture more information from decision tables than decision trees. In this paper, we create dynamic programming algorithms for multi-stage optimization of trees relative to a sequence of cost functions. We apply these algorithms to prove the existence of totally optimal (simultaneously optimal relative to a number of cost functions) decision and inhibitory trees for some modified decision tables from the UCI Machine Learning Repository.
The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relatively to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in  to constructing approximate decision trees. Adjustable approximation rate allows controlling algorithm complexity. The algorithm is applied to build optimal α-decision trees for two data sets from UCI Machine Learning Repository . © 2010 Springer-Verlag Berlin Heidelberg.
A comparison among different heuristics that are used by greedy algorithms which construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from the UCI Machine Learning Repository. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.
Barros, Rodrigo C; Freitas, Alex A
Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue on the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o
Karafet, Tatiana M; Mendez, Fernando L; Meilerman, Monica B; Underhill, Peter A; Zegura, Stephen L; Hammer, Michael F
Markers on the non-recombining portion of the human Y chromosome continue to have applications in many fields including evolutionary biology, forensics, medical genetics, and genealogical reconstruction. In 2002, the Y Chromosome Consortium published a single parsimony tree showing the relationships among 153 haplogroups based on 243 binary markers and devised a standardized nomenclature system to name lineages nested within this tree. Here we present an extensively revised Y chromosome tree containing 311 distinct haplogroups, including two new major haplogroups (S and T), and incorporating approximately 600 binary markers. We describe major changes in the topology of the parsimony tree and provide names for new and rearranged lineages within the tree following the rules presented by the Y Chromosome Consortium in 2002. Several changes in the tree topology have important implications for studies of human ancestry. We also present demography-independent age estimates for 11 of the major clades in the new Y chromosome tree.
Quellec, Gwénolé; Lamard, Mathieu; Bekri, Lynda; Cazuguel, Guy; Cochener, Béatrice; Roux, Christian
In this paper, we present a Case Based Reasoning (CBR) system for the retrieval of medical cases made up of a series of images with contextual information (such as the patient age, sex and medical history). Indeed, medical experts generally need varied sources of information (which might be incomplete) to diagnose a pathology. Consequently, we derive a retrieval framework from decision trees, which are well suited to process heterogeneous and incomplete information. To be integrated in the system, images are indexed by their digital content. The method is evaluated on a classified diabetic retinopathy database. On this database, results are promising: the retrieval sensitivity reaches 79.5% for a window of 5 cases, which is almost twice as good as the retrieval of single images alone. As a comparison, the retrieval sensitivity is 52.3% for a standard multimodal case retrieval using a linear combination of heterogeneous distances. PMID:18003014
This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on a dynamic programming approach and need the consideration of subtables of the initial decision table. So this approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and a cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions such as: depth and average depth of a decision tree and number of nodes (and number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions. The uncertainty of an approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for such tools we also present experimental results applied to various datasets acquired from the UCI ML Repository. © Springer-Verlag Berlin Heidelberg 2013.
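The two uncertainty definitions in this abstract translate directly into a short computation. The sketch below is only an illustration of those definitions, not the chapter's algorithms; the function names and the toy decision values are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Sequence

def table_uncertainty(decisions: Sequence[Hashable]) -> int:
    """Number of unordered pairs of rows that carry different decisions.

    Computed as all unordered pairs minus the pairs sharing a decision.
    """
    n = len(decisions)
    all_pairs = n * (n - 1) // 2
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(decisions).values())
    return all_pairs - same_pairs

def tree_uncertainty(leaf_subtables: Sequence[Sequence[Hashable]]) -> int:
    """Maximum uncertainty over the subtables reaching the terminal nodes."""
    return max(table_uncertainty(d) for d in leaf_subtables)

# Toy table: one decision per row (hypothetical values).
root = ["a", "a", "b", "c"]
# A hypothetical tree that splits the rows into two terminal subtables.
leaves = [["a", "a"], ["b", "c"]]

root_u = table_uncertainty(root)   # 6 pairs in total, 1 pair shares a decision
tree_u = tree_uncertainty(leaves)  # the second leaf still mixes b and c
```

An exact decision tree corresponds to `tree_uncertainty(...) == 0`; an approximate tree only needs the value to stay below the chosen uncertainty threshold.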
An approximate algorithm for minimization of the weighted depth of decision trees is considered. A bound on the accuracy of this algorithm is obtained which is unimprovable in the general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to the best polynomial approximate algorithms for minimization of the weighted depth of decision trees.
... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 ...
It is found that an ensemble of randomized soft decision trees has outperformed the related existing soft decision tree. Robustness against the presence of noise is shown by injecting various levels of noise into the training set, and a comparison is drawn with other related methods which favors the proposed method.
A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
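The general idea of trading a tree walk for a single table lookup can be sketched as follows. This is not the formulation described in the abstract: it skips the symbolic optimization steps, uses a flat list as a stand-in for the multi-dimensional object, and the tree encoding, function names, and leaf labels are all assumptions made for illustration.

```python
from itertools import product

# A tree is either a leaf label (str) or a tuple
# (variable_index, subtree_if_false, subtree_if_true).

def evaluate(tree, bits):
    """Walk the tree node by node for one assignment of the Boolean variables."""
    while not isinstance(tree, str):
        var, if_false, if_true = tree
        tree = if_true if bits[var] else if_false
    return tree

def index_of(bits):
    """Pack the Boolean variables into one integer table index."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | b
    return idx

def to_lookup_table(tree, n_vars):
    """Precompute the tree's answer for every one of the 2**n_vars assignments."""
    table = [None] * (1 << n_vars)
    for bits in product((0, 1), repeat=n_vars):
        table[index_of(bits)] = evaluate(tree, bits)
    return table

# Hypothetical balanced tree over two Boolean decision variables.
tree = (0, (1, "reject", "review"), (1, "review", "accept"))
table = to_lookup_table(tree, 2)

# At run time the decision is a single indexed read instead of a tree walk.
decision = table[index_of((1, 1))]
```

The precomputation costs time and memory exponential in the number of variables, so a lookup of this kind only pays off when the tree is evaluated many times over few variables.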
Recently, decision tree algorithms have been widely used in dealing with data mining problems to find out valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in the implementation. In this paper, we propose an innovative machine learning approach (we call our approach GAIT), combining genetic algorithm, statistical sampling, and decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against standard decision tree algorithm, neural network classifier, and statistical discriminant technique, respectively. The computational results show that our approach outperforms standard decision tree algorithm profoundly at lower sampling levels, and achieves significantly better results with less effort than both neural network and discriminant classifiers.
In this chapter, we study, in detail, the relationships between various pairs of cost functions and between uncertainty measure and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as provide experimental results on decision tables acquired from the UCI ML Repository. The algorithms presented in this chapter have already been implemented and are now a part of Dagger, which is a software system for construction/optimization of decision trees and decision rules. The main results presented in this chapter deal with two types of algorithms for computing relationships. First, we discuss the case where we construct approximate decision trees and are interested in relationships between a certain cost function, such as depth or number of nodes of a decision tree, and an uncertainty measure, such as misclassification error (accuracy) of a decision tree. Secondly, relationships between two different cost functions are discussed, for example, the number of misclassifications of a decision tree versus the number of nodes in a decision tree. The results of experiments, presented in the chapter, provide further insight. © 2014 Springer International Publishing Switzerland.
This paper first analyzes decision trees and then offers a further analysis of CLS on that basis. As CLS contains the most substantial and most primitive decision-making idea, it can provide the basis for establishing a decision tree. Because CLS is limited in certain details, the ID3 decision tree algorithm is introduced to supply them. ID3 applies information gain as its attribute selection metric, which provides a reference for seeking the optimal split point. Finally, the ID3 algorithm is applied to football training. The algorithm is verified and shown to be effective and reasonable.
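The information gain criterion that ID3 uses for attribute selection can be illustrated with a short sketch. The attribute names, rows, and labels below are hypothetical and are not taken from the paper's football training data; the functions show only the standard entropy-based definition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical training records: should a player train or rest?
rows = [
    {"fitness": "high", "weather": "dry"},
    {"fitness": "high", "weather": "wet"},
    {"fitness": "low", "weather": "dry"},
    {"fitness": "low", "weather": "wet"},
]
labels = ["train", "train", "rest", "rest"]

gain_fitness = information_gain(rows, labels, "fitness")
gain_weather = information_gain(rows, labels, "weather")

# ID3 places the attribute with the largest gain at the current node.
best = max(["fitness", "weather"], key=lambda a: information_gain(rows, labels, a))
```

In this toy table `fitness` separates the classes perfectly, so it receives the full one bit of gain, while `weather` carries no information about the label.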
McGrath, Robert E
Professional psychologists are often confronted with the task of making binary decisions about individuals, such as predictions about future behavior or employee selection. Test users familiar with linear models and Bayes's theorem are likely to assume that the accuracy of decisions is consistently improved by combination of outcomes across valid predictors. However, neither statistical method accurately estimates the increment in accuracy that results from use of additional predictors in the typical applied setting. It was demonstrated that the best single predictor often can perform better than do multiple predictors when the predictors are combined using methods common in applied settings. This conclusion is consistent with previous findings concerning G. Gigerenzer and D. Goldstein's (1996) "take the best" heuristic. Furthermore, the information needed to ensure an increment in fit over the best single predictor is rarely available. (c) 2008 APA, all rights reserved.
In this paper, we consider multi-label decision tables that have a set of decisions attached to each row. Our goal is to find one decision from the set of decisions for each row by using a decision tree as our tool. With the aim of minimizing the depth of the decision tree, we devised various kinds of greedy algorithms as well as a dynamic programming algorithm. Comparing against the optimal result obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of the depth of decision trees.
The paper is devoted to the study of a greedy algorithm for construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider a bound on the number of algorithm steps, and a bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.
Kleinhans, Sonja; Herrmann, Eva; Kohnen, Thomas; Bühren, Jens
Background Iatrogenic keratectasia is one of the most dreaded complications of refractive surgery. In most cases, keratectasia develops after refractive surgery of eyes suffering from subclinical stages of keratoconus with few or no signs. Unfortunately, there has been no reliable procedure for the early detection of keratoconus. In this study, we used binary decision trees (recursive partitioning) to assess their suitability for discrimination between normal eyes and eyes with subclinical keratoconus. Patients and Methods The method of decision tree analysis was compared with discriminant analysis which has shown good results in previous studies. Input data were 32 eyes of 32 patients with newly diagnosed keratoconus in the contralateral eye and preoperative data of 10 eyes of 5 patients with keratectasia after laser in-situ keratomileusis (LASIK). The control group was made up of 245 normal eyes after LASIK and 12-month follow-up without any signs of iatrogenic keratectasia. Results Decision trees gave better accuracy and specificity than did discriminant analysis. The sensitivity of decision trees was lower than the sensitivity of discriminant analysis. Conclusion On the basis of the patient population of this study, decision trees did not prove to be superior to linear discriminant analysis for the detection of subclinical keratoconus. Georg Thieme Verlag KG Stuttgart · New York.
Argentiero, P.; Chin, R.; Beaudet, P.
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Song, Yan-Yan; Lu, Ying
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces algorithms frequently used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Convertible bonds usually have multiple additional provisions that make their pricing problem more difficult than that of straight bonds and options. This paper uses the binary tree method to model the financial market. As the underlying stock prices and the interest rates are important to convertible bonds, we describe their dynamic processes by different binary trees. Moreover, we consider the influence of credit risk on convertible bonds, described by the default rate and the recovery rate; a two-factor binary tree model involving the credit risk is then established. On the basis of the theoretical analysis, we perform numerical simulations and obtain pricing results when the stock prices follow the CRR model and the interest rates follow constant volatility and time-varying volatility, respectively. This model can be extended to other financial derivative instruments.
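For readers unfamiliar with the CRR (Cox-Ross-Rubinstein) stock-price tree named in this abstract, the sketch below prices a plain European call on such a tree by backward induction. It is deliberately much simpler than the paper's model: there is no interest-rate tree, no credit risk, and no conversion or call provision, and the function name and all parameter values are assumptions chosen for illustration.

```python
from math import exp, sqrt

def crr_european_call(s0, strike, rate, sigma, maturity, steps):
    """Price a European call on a CRR binomial tree by backward induction."""
    dt = maturity / steps
    u = exp(sigma * sqrt(dt))          # up factor
    d = 1.0 / u                        # down factor
    p = (exp(rate * dt) - d) / (u - d) # risk-neutral up probability
    disc = exp(-rate * dt)             # one-step discount factor

    # Payoffs at maturity; node j has seen j up-moves and steps - j down-moves.
    values = [max(s0 * u**j * d**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Roll the expected discounted value back to the root, one layer at a time.
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

# Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
price = crr_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 200)
```

A convertible bond model along the lines of the abstract would replace the terminal payoff and add, at each node, the holder's conversion decision and the default and recovery terms; those parts are not shown here.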
In this paper we describe a waveform recognition method that extracts characteristic parameters from waveforms and a method of automated sleep stage scoring using decision tree learning that is in...
DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source ...
Kushi, Yusuke; Inazumi, Hiroshige
A decision tree is one of the machine learning techniques and also one of the major knowledge representations of data mining results. This is because its meaning is easy for human analysts to understand. Even ID3, the representative algorithm, is known to exhibit remarkable performance deterioration under certain circumstances, particularly due to strong correlation between attributes representing the class of examples. One of the approaches to obtaining better decision trees is pre-processing the training data to extend its description, such as attribute generation and attribute selection. There is also the idea of decision trees with a region rule. In this paper, we consider two approaches, i.e., decision trees with a region rule allowing multiple attributes, and a pre-processing method of a region rule that enables any suitable number of attributes to correspond to branch nodes, where an optimal division condition with arbitrarily many attributes is acquired. By using this method, we propose a new decision tree generation algorithm guaranteed to select effective compound attributes at each branch node, where a new MDL-based evaluation criterion is also defined for determining the optimal number of compound attributes specified for each node. This algorithm is applied to datasets containing only nominal values. It consists of three processes: compound attribute selection, parent node integration, and pruning. We call these new decision trees DTMACC (Decision Trees with Multiple Attributes Concept Clustering). The effectiveness and comprehensiveness of the proposed algorithm are confirmed through experiments comparing it to ordinary decision trees and an effective pre-processing method.
To examine the application of the decision tree approach to collaborative clinical decision-making in mental health care in the United Kingdom (UK). While this approach to decision-making has been examined in the acute care setting, there is little published evidence of its use in clinical decision-making within the mental health setting. The complexities of dual diagnosis (schizophrenia and substance misuse in this case example) and the varied viewpoints of different professionals often hamper the decision-making process. This paper highlights how the approach was used successfully as a multiprofessional collaborative approach to decision-making in the context of British community mental health care. A selective review of the relevant literature and a case study application of the decision tree framework. The process of applying the decision tree framework to clinical decision-making in mental health practice can be time consuming and client inclusion within the process is not always appropriate. The approach offers a method of assigning numerical values to support complex multiprofessional decision-making as well as considering underpinning literature to inform the final decision. Use of the decision tree offers a common framework that can assist professionals to examine the options available to them in depth, while considering the complex variables that influence decision-making in collaborative mental health practice. Use of the decision tree warrants further consideration in mental health care in terms of practice and education.
Zhang, Liyuan; Li, Tao; Xu, Xuanhua
The aim of this paper is to develop a methodology for intuitionistic trapezoidal fuzzy multiple criteria group decision making problems based on binary relation. Firstly, the similarity measure between two vectors based on binary relation is defined, which can be utilized to aggregate preference information. Some desirable properties of the similarity measure based on fuzzy binary relation are also studied. Then, a methodology for fuzzy multiple criteria group decision making is proposed, in ...
Chang, Chi-Yung (Inventor); Fang, Wai-Chi (Inventor); Curlander, John C. (Inventor)
A system for data compression utilizing systolic array architecture for Vector Quantization (VQ) is disclosed for both full-searched and tree-searched. For a tree-searched VQ, the special case of a Binary Tree-Search VQ (BTSVQ) is disclosed with identical Processing Elements (PE) in the array for both a Raw-Codebook VQ (RCVQ) and a Difference-Codebook VQ (DCVQ) algorithm. A fault tolerant system is disclosed which allows a PE that has developed a fault to be bypassed in the array and replaced by a spare at the end of the array, with codebook memory assignment shifted one PE past the faulty PE of the array.
Knaggs, Sara J.; And Others
A 'decision tree' consists of an outline of the patient's symptoms and a logic for decision and action. It is felt that this approach to the decisionmaking process better facilitates each learner's application of his own level of knowledge and skills. (Author)
Beck, Kirk A.
This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…
Dahan, Haim; Rokach, Lior; Maimon, Oded
This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite
In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory, coding theory, and the design and analysis of algorithms (e.g., sorting). For any diagnostic problem, the minimum average depth of decision tree is bounded from below by the entropy of probability distribution (with a multiplier 1/log_2 k for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building an optimal prefix code and a blood test study under the assumption that exactly one patient is ill). For such problems, the minimum average depth of decision tree exceeds the lower bound by at most one. The minimum average depth reaches the maximum on the problems in which each attribute is "indispensable" (e.g., a diagnostic problem with n attributes and k^n pairwise different rows in the decision table and the problem of implementing the modulo 2 summation function). These problems have the minimum average depth of decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
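The entropy lower bound and the "at most one above it" statement can be checked numerically on the optimal prefix code example the abstract mentions. The sketch below covers only the binary case (k = 2) and uses Huffman's construction for the optimal prefix code; the probability distributions and function names are assumptions made for illustration.

```python
import heapq
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_average_depth(probs):
    """Average codeword length of a binary Huffman code.

    Each merge of the two lightest subtrees pushes every leaf below them
    one level deeper, so the average depth is the sum of the merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

# Hypothetical distributions over four outcomes.
dyadic = [0.5, 0.25, 0.125, 0.125]
skewed = [0.4, 0.3, 0.2, 0.1]

h_dyadic, d_dyadic = entropy(dyadic), huffman_average_depth(dyadic)
h_skewed, d_skewed = entropy(skewed), huffman_average_depth(skewed)
```

For the dyadic distribution the average depth equals the entropy exactly, while for the second distribution it lies strictly between the entropy and the entropy plus one.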
Viikki, K; Kentala, E; Juhola, M; Pyykkö, I
Expert systems have been applied in medicine as diagnostic aids and education tools. The construction of a knowledge base for an expert system may be a difficult task; to automate this task several machine learning methods have been developed. These methods can be also used in the refinement of knowledge bases for removing inconsistencies and redundancies, and for simplifying decision rules. In this study, decision tree induction was employed to acquire diagnostic knowledge for otoneurological diseases and to extract relevant parameters from the database of an otoneurological expert system ONE. The records of patients with benign positional vertigo, Meniere's disease, sudden deafness, traumatic vertigo, vestibular neuritis and vestibular schwannoma were retrieved from the database of ONE, and for each disease, decision trees were constructed. The study shows that decision tree induction is a useful technique for acquiring diagnostic knowledge for otoneurological diseases and for extracting relevant parameters from a large set of parameters.
Frohwein, H.I.; Lambert, J.H.; Haimes, Y.Y.
A need for a methodology to control the extreme events, defined as low-probability, high-consequence incidents, in sequential decisions is identified. A variety of alternative and complementary measures of the risk of extreme events are examined for their usability as objective functions in sequential decisions, represented as single- or multiple-objective decision trees. Earlier work had addressed difficulties, related to non-separability, with the minimization of some measures of the risk of extreme events in sequential decisions. In an extension of these results, it is shown how some non-separable measures of the risk of extreme events can be interpreted in terms of separable constituents of risk, thereby enabling a wider class of measures of the risk of extreme events to be handled in a straightforward manner in a decision tree. Also for extreme events, results are given to enable minimax- and Hurwicz-criterion analyses in decision trees. An example demonstrates the incorporation of different measures of the risk of extreme events in a multi-objective decision tree. Conceptual formulations for optimizing non-separable measures of the risk of extreme events are identified as an important area for future investigation
Simon, Svenja; Guthke, Reinhard; Kamradt, Thomas; Frey, Oliver
Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents now allow the simultaneous assessment of multiple markers on a single cell level, generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. In order to examine whether the production of a certain cytokine is dependent on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
Background: Type 2 Diabetes Mellitus (T2DM) is one of the most important risk factors in cardiovascular disorders considered as a common clinical and public health problem. Early diagnosis can reduce the burden of the disease. Decision tree, as an advanced data mining method, can be used as a reliable tool to predict T2DM. Objectives: This study aimed to present a simple model for predicting T2DM using decision tree modeling. Materials and Methods: This analytical model-based study used a part of the cohort data obtained from a database in Healthy Heart House of Shiraz, Iran. The data included routine information, such as age, gender, Body Mass Index (BMI), family history of diabetes, and systolic and diastolic blood pressure, which were obtained from the individuals referred for gathering baseline data in Shiraz cohort study from 2014 to 2015. Diabetes diagnosis was used as binary datum. Decision tree technique and J48 algorithm were applied using the WEKA software (version 3.7.5, New Zealand). Additionally, Receiver Operator Characteristic (ROC) curve and Area Under Curve (AUC) were used for checking the goodness of fit. Results: The age of the 11302 cases obtained after data preparation ranged from 18 to 89 years with the mean age of 48.1 ± 11.4 years. Additionally, 51.1% of the cases were male. In the tree structure, blood pressure and age were placed where most information was gained. In our model, however, gender was not important and was placed on the final branch of the tree. Total precision and AUC were 87% and 89%, respectively. This indicated that the model had good accuracy for distinguishing patients from normal individuals. Conclusions: The results showed that T2DM could be predicted via decision tree model without laboratory tests. Thus, this model can be used in pre-clinical and public health screening programs.
We used decision trees as a model to discover knowledge from multi-label decision tables where each row has a set of decisions attached to it and our goal is to find one arbitrary decision from the set of decisions attached to a row. The size of the decision tree can be small as well as very large. We study here different greedy as well as dynamic programming algorithms to minimize the size of the decision trees. Comparing against the optimal result from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of number of nodes (at most 18.92% difference), number of nonterminal nodes (at most 20.76% difference), and number of terminal nodes (at most 18.71% difference).
Character recognition in a document image captured by a digital camera requires a good binary image as input for separating the text from the background. Global binarization methods do not provide such good separation because of uneven lighting levels in images captured by cameras. Local binarization methods overcome the problem but require a method to partition the large image into local windows properly. In this paper, we propose a local binarization method with dynamic image partitioning, using an integral image and a decision tree for the binarization decision. The integral image is used to estimate the number of lines in the document image. The number of lines is then used to divide the document into local windows. The decision tree decides the threshold in every local window. The results show that the proposed method separates the text from the background better than global thresholding, with a best OCR result on the binarized image of 99.4%.
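The integral image mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the per-window mean threshold is an assumed stand-in for the decision tree that the paper uses to choose each threshold.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and zero column."""
    height, width = len(img), len(img[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def window_sum(sat, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1, in O(1)."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def binarize_window(img, sat, top, left, bottom, right):
    """Threshold one local window at its mean intensity (assumed rule)."""
    area = (bottom - top) * (right - left)
    mean = window_sum(sat, top, left, bottom, right) / area
    return [[1 if img[y][x] > mean else 0 for x in range(left, right)]
            for y in range(top, bottom)]
```

Once the table is built, the sum over any window costs four lookups regardless of window size, which is what makes it practical to evaluate many candidate partitions of a large page.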
Jensen, Rune Møller; Leknes, Eilif; Bebbington, Thomas
Low cost containerized shipping requires high quality stowage plans. Scalable stowage planning optimization algorithms have been developed recently. All of these algorithms, however, produce monolithic solutions that are hard for stowage coordinators to modify, which is necessary in practice due to the application of approximate optimization models. This paper introduces an approach for modifying a stowage plan interactively without breaking its constraints. We focus on re-arranging the containers in a single bay section and use a symbolic configuration technique based on binary decision diagrams to provide...
Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun
This paper aims to solve the automated feature selection problem in brain-computer interfaces (BCI). In order to automate the feature selection process, we propose a novel EEG feature selection method based on decision trees (DT). During electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the selection process based on the decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier, the support vector machine (SVM), was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision trees to the BCI Competition II dataset Ia, and the experiment showed encouraging results.
Detector of an n bit binary sequence code within a serial binary data system assigns states to memory elements of a code sequence detector by employing the same order of states for the sequence detector as that of the sequence generator when the linear recursion relationship employed by the sequence generator is given.
Ibanez-Llano, Cristina, E-mail: email@example.com [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain); Rauzy, Antoine, E-mail: Antoine.RAUZY@3ds.co [Dassault Systemes, 10 rue Marcel Dassault CS 40501, 78946 Velizy Villacoublay Cedex (France); Melendez, Enrique, E-mail: firstname.lastname@example.org [Consejo de Seguridad Nuclear (CSN), C/Justo Dorado 11, 28040 Madrid (Spain); Nieto, Francisco, E-mail: email@example.com [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain)
Binary decision diagrams are a well-known alternative to the minimal cutsets approach for assessing Boolean reliability models. They have been applied successfully to improve the assessment of fault tree models. However, their application to large models, and in particular the event trees coming from the PSA studies of the nuclear industry, remains to date out of reach of an exact evaluation. For many real PSA models it may not be possible to compute the BDD within a reasonable amount of time and memory without considering the truncation or simplification of the model. This paper presents a new approach to estimate the exact probabilistic quantification results (probability/frequency) based on combining the calculation of the MCS and the truncation limits with the BDD approach, in order to have better control over the reduction of the model and to properly account for the success branches. The added value of this methodology is that it is possible to ensure a real confidence interval of the exact value and therefore an explicit knowledge of the error bound. Moreover, it can be used to measure the acceptability of the results obtained with traditional techniques. The new method was applied to a real-life PSA study, and the results obtained confirm the applicability of the methodology and open a new viewpoint for further developments.
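To make the gap between an exact evaluation and a minimal-cutset (MCS) estimate concrete, here is a small sketch. It is an assumption-laden illustration, not the method of the paper: the exact value is obtained by plain Shannon decomposition over the basic events (the principle behind BDD quantification, but without the node sharing that makes real BDDs compact), and the MCS estimate is the simple sum of cutset probabilities, often called the rare-event approximation. All names and the toy fault tree are hypothetical.

```python
def exact_probability(structure, probs, assignment=None):
    """Exact P(top event) by Shannon decomposition on each basic event in turn."""
    assignment = dict(assignment or {})
    remaining = [event for event in probs if event not in assignment]
    if not remaining:
        return 1.0 if structure(assignment) else 0.0
    event = remaining[0]
    p = probs[event]
    high = exact_probability(structure, probs, {**assignment, event: True})
    low = exact_probability(structure, probs, {**assignment, event: False})
    return p * high + (1.0 - p) * low


def rare_event_approximation(cut_sets, probs):
    """Sum over minimal cutsets of the product of their event probabilities."""
    total = 0.0
    for cut in cut_sets:
        product = 1.0
        for event in cut:
            product *= probs[event]
        total += product
    return total


# Toy fault tree (hypothetical): TOP = (a AND b) OR (a AND c)
probs = {"a": 0.1, "b": 0.1, "c": 0.1}
top = lambda s: (s["a"] and s["b"]) or (s["a"] and s["c"])
exact = exact_probability(top, probs)
approx = rare_event_approximation([("a", "b"), ("a", "c")], probs)
```

For this toy tree the exact value is 0.1 × (1 − 0.9 × 0.9) = 0.019, while the cutset sum gives 0.02, so the approximation overestimates by about 5%. The brute-force recursion is exponential in the number of basic events and is only meant to show what "exact" refers to.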
I. A. Bessmertny
The paper considers the problem of organizing mutual payments between business entities by means of clearing, which is solved by searching for paths in a graph. To reduce the decision tree complexity, a method of precedents is proposed that consists in saving intermediate solutions while moving along the decision tree. An algorithm and an example are presented, demonstrating solution complexity close to linear. Tests carried out in a civil aviation settlement system demonstrate an approximately 30 percent reduction in real money transfers. The proposed algorithm is also planned to be implemented in other clearing organizations of the Russian Federation.
Bentayeb, Fadila; Darmont, Jérôme
Data mining is a useful decision support technique that can be used to discover production rules in warehouses or corporate data. Data mining research has made much effort to apply various mining algorithms efficiently on large databases. However, a serious problem in their practical application is the long processing time of such algorithms. Nowadays, one of the key challenges is to integrate data mining methods within the framework of traditional database systems. In...
Nadezhda Astakhova; Liliya Demidova; Evgeny Nikulchev
The optimization problem of developing forecasting models based on strictly binary trees is considered. The aim of the paper is a comparative analysis of two optimization variants applied to the development of the forecasting models. The first optimization variant uses one quality indicator of the forecasting model, named the affinity indicator, and the second variant uses two quality indicators ...
BAKIRLI, GÖZDE; BİRANT, DERYA
A number of recent studies have used a decision tree approach as a data mining technique; some of them needed to evaluate the similarity of decision trees to compare the knowledge reflected in different trees or datasets. There have been multiple perspectives and multiple calculation techniques to measure the similarity of two decision trees, such as using a simple formula or an entropy measure. The main objective of this study is to compute the similarity of decision trees using ...
Šebalj, Dario; Franjković, Jelena; Hodak, Kristina
Introduction: Price is considered a neglected marketing mix element due to the complexity of price management and the sensitivity of customers to price changes. A price change provokes the fastest customer reactions. Accordingly, the process of making shopping decisions can be very challenging for customers. Objective: The aim of this paper is to create a model that is able to predict shopping intention and classify respondents into one of the two categories, depending on whether they inten...
Behzadi, Naghi; Ahansaz, Bahram
We propose a mechanism for quantum state transfer (QST) over a binary tree spin network on the basis of incomplete collapsing measurements. To this aim, we initially perform a weak measurement (WM) on the central qubit of the binary tree network, where the state of concern has been prepared. After the time evolution of the whole system, a quantum measurement reversal (QMR) is performed on a chosen target qubit. By taking the optimal value for the strength of the QMR, it is shown that the QST quality from the sending qubit to any typical target qubit on the binary tree is considerably improved in terms of the WM strength. Also, we show how high-quality entanglement distribution over the binary tree network is achievable by using this approach.
Doubravsky, Karel; Dohnal, Mirko
Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions which can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incompleteness of the data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm showing how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (such as fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail.
Kamiński, Bogumił; Jakubczyk, Michał; Szufel, Przemysław
In the paper, we consider sequential decision problems with uncertainty, represented as decision trees. Sensitivity analysis is always a crucial element of decision making and in decision trees it often focuses on probabilities. In the stochastic model considered, the user often has only limited information about the true values of probabilities. We develop a framework for performing sensitivity analysis of optimal strategies accounting for this distributional uncertainty. We design this robust optimization approach in an intuitive and not overly technical way, to make it simple to apply in daily managerial practice. The proposed framework allows for (1) analysis of the stability of the expected-value-maximizing strategy and (2) identification of strategies which are robust with respect to pessimistic/optimistic/mode-favoring perturbations of probabilities. We verify the properties of our approach in two cases: (a) probabilities in a tree are the primitives of the model and can be modified independently; (b) probabilities in a tree reflect some underlying, structural probabilities, and are interrelated. We provide a free software tool implementing the methods described.
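The basic operation behind this kind of sensitivity analysis can be sketched with a tiny rollback evaluator. This is only an illustrative sketch under stated assumptions, not the authors' software tool: the dictionary-based tree encoding, the function names, and the payoffs are all hypothetical.

```python
def rollback(node):
    """Expected value of a node under the expected-value-maximizing strategy."""
    kind = node["type"]
    if kind == "terminal":
        return node["payoff"]
    if kind == "chance":
        return sum(p * rollback(child) for p, child in node["branches"])
    # Decision node: pick the option with the highest expected value.
    return max(rollback(child) for _, child in node["options"])


def best_option(node):
    """Label of the option chosen at a decision node."""
    return max(node["options"], key=lambda option: rollback(option[1]))[0]


def make_tree(p_success):
    """Hypothetical tree: a sure payoff of 50 versus a gamble paying 100 or 0."""
    return {"type": "decision", "options": [
        ("safe", {"type": "terminal", "payoff": 50.0}),
        ("risky", {"type": "chance", "branches": [
            (p_success, {"type": "terminal", "payoff": 100.0}),
            (1.0 - p_success, {"type": "terminal", "payoff": 0.0}),
        ]}),
    ]}
```

With `p_success = 0.6` the gamble is worth 60 and is preferred; perturbing the probability to 0.4 lowers it to 40 and flips the choice to the sure payoff. A stability analysis of the kind described in the abstract asks how far the probabilities can move before such a flip occurs (here, the break-even point is 0.5).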
Jaworski, Maciej; Duda, Piotr; Rutkowski, Leszek
The most popular tools for stream data mining are based on decision trees. In the previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relied on Hoeffding's inequality, and hundreds of researchers followed this scheme. Recently, we have demonstrated that although Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms, which are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. A general division of splitting criteria into two types is proposed. Attributes chosen based on type-$I$ splitting criteria guarantee, with high probability, the highest expected value of the split measure. Type-$II$ criteria ensure that the chosen attribute is the same, with high probability, as the one that would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are combinations of single criteria based on the misclassification error and the Gini index.
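For orientation, the quantities named in this abstract can be written down in a few lines. The sketch below shows the two impurity measures and the classical Hoeffding-style margin used by the very fast decision tree approach, which is the heuristic the abstract criticizes; it does not implement the new type-I or type-II criteria, whose bounds are not given here. Function names are hypothetical.

```python
import math


def gini(counts):
    """Gini index of a node from its per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def misclassification_error(counts):
    """Fraction of examples outside the majority class."""
    total = sum(counts)
    return 1.0 - max(counts) / total


def hoeffding_epsilon(value_range, delta, n):
    """Hoeffding margin: sqrt(R^2 * ln(1/delta) / (2n)) for n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

In the classical scheme, a node is split once the gap between the best and second-best attribute scores exceeds `hoeffding_epsilon(...)`; the margin shrinks as more stream elements arrive, roughly as one over the square root of `n`.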
Shiffman, Smadar; Nemani, Ramakrishna
Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).
de Hoogh, Sebastiaan; Schoenmakers, Berry; Chen, Ping; op den Akker, Harm
In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our
Langley, Natalie R; Dudzik, Beatrix; Cloutier, Alesia
This study uses five well-documented cranial nonmetric traits (glabella, mastoid process, mental eminence, supraorbital margin, and nuchal crest) and one additional trait (zygomatic extension) to develop a validated decision tree for sex assessment. The decision tree was built and cross-validated on a sample of 293 U.S. White individuals from the William M. Bass Donated Skeletal Collection. Ordinal scores from the six traits were analyzed using the partition modeling option in JMP Pro 12. A holdout sample of 50 skulls was used to test the model. The most accurate decision tree includes three variables: glabella, zygomatic extension, and mastoid process. This decision tree yielded 93.5% accuracy on the training sample, 94% on the cross-validated sample, and 96% on a holdout validation sample. Linear weighted kappa statistics indicate acceptable agreement among observers for these variables. Mental eminence should be avoided, and definitions and figures should be referenced carefully to score nonmetric traits. © 2017 American Academy of Forensic Sciences.
The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with the image classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. Classification result ...
This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with three datasets from the UCI Machine Learning Repository. © 2011 Springer-Verlag.
A greedy algorithm has been presented in this paper to construct decision trees for three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. In this algorithm, a greedy heuristic ‘misclassification error’ is used which performs faster, and for some cost function, results are better than ‘number of boundary subtables’ heuristic in literature. Therefore, it can be used in the case of larger data sets and does not require huge amount of memory. Experimental results of depth, average depth and number of nodes of decision trees constructed by this algorithm are compared in the framework of each of the three approaches.
Liu, Leo; Rather, Zakir Hussain; Chen, Zhe
The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, the Decision Tree (DT), also known as Classification And Regression Tree (CART), has gained increasing interest because of its high performance in terms of computational efficiency, uncertainty manageability, and interpretability. This paper presents an overview of a variety of DT applications to power systems for better interfacing of power systems with data analytics. The fundamental knowledge of the CART algorithm is also introduced, followed by examples of both classification trees and regression trees with the help of a case study for security assessment of the Danish power system....
Data mining plays an important role in analyzing the massive amount of data collected in today’s world. However, due to the public’s rising awareness of privacy and lack of trust in organizations, suitable Privacy Preserving Data Mining (PPDM) techniques have become vital. A PPDM technique provides individual privacy while allowing useful data mining. We present a novel noise addition technique called Forest Framework, two novel data quality evaluation techniques called EDUDS and EDUSC, and a security evaluation technique called SERS. Forest Framework builds a decision forest from a dataset and preserves all the patterns (logic rules) of the forest while adding noise to the dataset. We compare Forest Framework to its predecessor, Framework, and another established technique, GADP. Our comparison is done using our three evaluation criteria, as well as Prediction Accuracy. Our experimental results demonstrate the success of our proposed extensions to Framework and the usefulness of our evaluation criteria.
The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of tree) and decision trees (which allow arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.
In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
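The three approaches described above differ only in what is attached to each group of equal rows, which a short sketch can make concrete. The representation of rows as `(attributes, decision)` pairs and the function name are assumptions made for illustration; the integer codes used for the generalized decision are arbitrary labels.

```python
from collections import Counter, defaultdict


def collapse(rows):
    """Collapse groups of rows with equal attributes under the three approaches.

    rows: iterable of (attribute_tuple, decision) pairs.
    Returns three dicts keyed by attribute tuple.
    """
    groups = defaultdict(list)
    for attributes, decision in rows:
        groups[attributes].append(decision)

    # (i) many-valued decision: the set of all decisions in the group
    many_valued = {a: frozenset(d) for a, d in groups.items()}
    # (ii) most common decision in the group
    most_common = {a: Counter(d).most_common(1)[0][0] for a, d in groups.items()}
    # (iii) generalized decision: one unique code per distinct decision set
    codes = {}
    generalized = {a: codes.setdefault(s, len(codes)) for a, s in many_valued.items()}
    return many_valued, most_common, generalized
```

For example, the rows `((0, 1), "a")`, `((0, 1), "b")`, `((0, 1), "a")` collapse to the set `{"a", "b"}` under (i), to `"a"` under (ii), and to a single code standing for that set under (iii).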
Learning decision trees from very large amounts of data is not practical on single-node computers because of the huge amount of computation the process requires. Apache Hadoop is a large-scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks on very large datasets. This work presents a parallel decision tree learning algorithm, expressed in the MapReduce programming model, that runs on the Apache Hadoop platform and scales very well with dataset size.
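The abstract does not say how the work is divided between mappers and reducers. One common way to express the counting step of decision tree learning in MapReduce terms is sketched below, simulated in plain Python rather than on Hadoop; the function names and record layout are assumptions for illustration only.

```python
from collections import Counter
from itertools import chain


def map_record(record, label):
    """Mapper: emit one ((attribute, value, label), 1) pair per attribute."""
    return [((attribute, value, label), 1) for attribute, value in record.items()]


def reduce_counts(pairs):
    """Reducer: sum the counts that share the same key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


# Hypothetical toy data set
data = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "rain"}, "yes"),
]
counts = reduce_counts(chain.from_iterable(map_record(r, l) for r, l in data))
```

The resulting per-attribute, per-value class counts are exactly what a split criterion such as information gain needs, so each tree level can be scored from the reducer output without any single machine holding the full dataset.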
Delgado-Gomez, D; Baca-Garcia, E; Aguado, D; Courtet, P; Lopez-Castroman, J
Several Computerized Adaptive Tests (CATs) have been proposed to facilitate assessments in mental health. These tests are built in a standard way, disregarding useful and usually available information not included in the assessment scales that could increase the precision and utility of CATs, such as the history of suicide attempts. Using the items of a previously developed scale for suicidal risk, we compared the performance of a standard CAT and a decision tree in a decision support system to identify suicidal behavior. We included the history of past suicide attempts as a class for the separation of patients in the decision tree. The decision tree needed an average of four items to achieve an accuracy similar to that of a standard CAT with nine items. The accuracy of the decision tree, obtained after 25 cross-validations, was 81.4%. A shortened test adapted for the separation of suicidal and non-suicidal patients was developed. CATs can be very useful tools for the assessment of suicidal risk. However, standard CATs do not use all the information that is available. Decision trees can improve the precision of the assessment since they are constructed using a priori information. Copyright © 2016 Elsevier B.V. All rights reserved.
Ito, Yuki; Shiraishi, Eriko; Kato, Atsuko; Haino, Takayuki; Sugimoto, Kouhei; Okamoto, Aikou; Suzuki, Nao
To identify the utility and issues associated with the use of decision trees in oncofertility patient care in Japan. A total of 35 women who had been diagnosed with cancer, but had not begun anticancer treatment, were enrolled. We applied the oncofertility decision tree for women published by Gardino et al. to counsel a consecutive series of women on fertility preservation (FP) options following cancer diagnosis. Percentage of women who decided to undergo oocyte retrieval for embryo cryopreservation and the expected live-birth rate for these patients were calculated using the following equation: expected live-birth rate = pregnancy rate at each age per embryo transfer × (1 - miscarriage rate) × No. of cryopreserved embryos. Oocyte retrieval was performed for 17 patients (48.6%; mean ± standard deviation [SD] age, 36.35 ± 3.82 years). The mean ± SD number of cryopreserved embryos was 5.29 ± 4.63. The expected live-birth rate was 0.66. The expected live-birth rate with FP indicated that one in three oncofertility patients would not expect to have a live birth following oocyte retrieval and embryo cryopreservation. While the decision trees were useful as decision-making tools for women contemplating FP, in the context of the current restrictions on oocyte donation and the extremely small number of adoptions in Japan, the remaining options for fertility after cancer are limited. In order for cancer survivors to feel secure in their decisions, the decision tree may need to be adapted simultaneously with improvements to the social environment, such as greater support for adoption.
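The equation quoted in this abstract is a simple product, shown here as a one-line function. The input values in the usage note are hypothetical and are not the age-specific rates used in the study.

```python
def expected_live_birth_rate(pregnancy_rate_per_transfer, miscarriage_rate, n_embryos):
    """Pregnancy rate per embryo transfer x (1 - miscarriage rate) x embryo count."""
    return pregnancy_rate_per_transfer * (1.0 - miscarriage_rate) * n_embryos
```

For instance, an assumed pregnancy rate of 0.20 per transfer, an assumed miscarriage rate of 0.25, and 4 cryopreserved embryos give 0.20 × 0.75 × 4 = 0.6. Because the result scales linearly with the number of embryos, it is an expected count of live births and can exceed 1 for large embryo numbers.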
Mutasem Sh. Alkhasawneh
This paper proposes a decision tree model for specifying the importance of 21 factors causing landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance, and are usually represented by an easy-to-interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%), compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
In this paper, we consider a problem that is originated in computer vision: determining an optimal testing strategy for the corner point detection problem that is a part of FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.
complexity theory, is then reviewed. The various findings are regrouped in a short summary of the "state-of-the-art" knowledge about decision trees. 3.2...tables, and tables incorporating calls to subtables in place of actions (each of which is beyond the reach of published analyses). The extension to...I, 135-143. Knuth, D. E. (1973). The Art of Computer Programming. Volume 1: Fundamental Algorithms. Addison-Wesley, Reading, Mass. (2nd ed.). 122
Schetinin, V.; Fieldsend, J. E.; Partridge, D.; Krzanowski, W. J.; Everson, R. M.; Bailey, T. C.; Hernandez, A.
The uncertainty of classification outcomes is of crucial importance for many safety critical applications including, for example, medical diagnostics. In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows the use of prior information. Decision Tree (DT) classification models used within such a technique give experts additional information by making this classification scheme observable. The use of the Markov C...
Decision trees are a widely used technique for discovering patterns in consistent data sets. But if the data set is inconsistent, that is, if there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge in the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we then compared them based on the optimization and classification results. It was found that the greedy algorithms Mult_ws_entSort and Mult_ws_entML are good for both optimization and classification.
Landim, C.; Portugal, R. D.; Svaiter, B. F.
Inspired by biological dynamics, we consider a growth Markov process taking values on the space of rooted binary trees, similar to the Aldous-Shields (Probab. Theory Relat. Fields 79(4):509-542, 1988) model. Fix $n \ge 1$ and $\beta > 0$. We start at time 0 with the tree composed of a root only. At any time, each node with no descendants, independently from the other nodes, produces two successors at rate $\beta (n-k)/n$, where $k$ is the distance from the node to the root. Denote by $Z_n(t)$ the number of nodes with no descendants at time $t$ and let $T_n = \beta^{-1} n \ln(n/\ln 4) + (\ln 2)/(2\beta)$. We prove that $2^{-n} Z_n(T_n + n\tau)$, $\tau \in \mathbb{R}$, converges to the Gompertz curve $\exp(-(\ln 2)\, e^{-\beta\tau})$. We also prove a central limit theorem for the martingale associated to $Z_n(t)$.
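The two closed-form expressions in this abstract, the centering time and the limiting Gompertz curve, can be evaluated directly. The sketch below only tabulates those formulas as read from the abstract; it does not simulate the tree-valued process, and the function names are hypothetical.

```python
import math


def centering_time(n, beta):
    """T_n = n * ln(n / ln 4) / beta + ln(2) / (2 * beta)."""
    return n * math.log(n / math.log(4.0)) / beta + math.log(2.0) / (2.0 * beta)


def gompertz_limit(tau, beta):
    """Limit curve exp(-(ln 2) * exp(-beta * tau)) for the rescaled leaf count."""
    return math.exp(-math.log(2.0) * math.exp(-beta * tau))
```

At `tau = 0` the limit equals exp(−ln 2) = 1/2, so under this reading `centering_time(n, beta)` marks the time scale at which about half of the 2^n possible leaves at depth n are present; the curve tends to 0 as `tau` decreases and to 1 as `tau` increases.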
Phan, Thanh G; Chen, Jian; Beare, Richard; Ma, Henry; Clissold, Benjamin; Van Ly, John; Srikanth, Velandai
Prognostication following intracerebral hemorrhage (ICH) has focused on poor outcome at the expense of lumping together mild and moderate disability. We aimed to develop a novel approach at classifying a range of disability following ICH. The Virtual International Stroke Trial Archive collaboration database was searched for patients with ICH and known volume of ICH on baseline CT scans. Disability was partitioned into mild [modified Rankin Scale (mRS) at 90 days of 0-2], moderate (mRS = 3-4), and severe disabilities (mRS = 5-6). We used binary and trichotomy decision tree methodology. The data were randomly divided into training (2/3 of data) and validation (1/3 data) datasets. The area under the receiver operating characteristic curve (AUC) was used to calculate the accuracy of the decision tree model. We identified 957 patients, age 65.9 ± 12.3 years, 63.7% males, and ICH volume 22.6 ± 22.1 ml. The binary tree showed that lower ICH volume (27.9 ml), older age (>69.5 years), and low Glasgow Coma Scale (tree showed that ICH volume, age, and serum glucose can separate mild, moderate, and severe disability groups with AUC 0.79 (95% CI 0.71-0.87). Both the binary and trichotomy methods provide equivalent discrimination of disability outcome after ICH. The trichotomy method can classify three categories at once, whereas this action was not possible with the binary method. The trichotomy method may be of use to clinicians and trialists for classifying a range of disability in ICH.
Cezarina Adina TOFAN
A decision can be defined as the course of action chosen from several possible ones to achieve an objective. Decision-making methods play an important role in the functioning of the decisional-informational system. Decision trees prove to be very useful tools for making financial decisions or decisions involving numbers, where a large amount of complex information must be considered. They provide an effective structure in which alternative decisions and the implications of their choice can be assessed, and they help to form a correct and balanced view of the risks and rewards that may result from a certain choice. For these reasons, this communication reviews a series of decision-making criteria. It also analyses the benefits of using the decision tree method in the decision-making process by providing a numerical example. On this basis, it can be concluded that the procedure may prove useful in making decisions for companies operating on markets where competition intensity is differentiated.
Neumann, Anke; Holstein, Josiane; Le Gall, Jean-Roger; Lepage, Eric
The purpose of this paper is to investigate the suitability of boosted decision trees for the case-mix adjustment involved in comparing the performance of various health care entities. First, we present logistic regression, decision trees, and boosted decision trees in a unified framework. Second, we study in detail their application for two common performance indicators, the mortality rate in intensive care and the rate of potentially avoidable hospital readmissions. For both examples the technique of boosting decision trees outperformed standard prognostic models, in particular linear logistic regression models, with regard to predictive power. On the other hand, boosting decision trees was computationally demanding and the resulting models were rather complex and needed additional tools for interpretation. Boosting decision trees represents a powerful tool for case-mix adjustment in health care performance measurement. Depending on the specific priorities set in each context, the gain in predictive power might compensate for the inconvenience in the use of boosted decision trees.
We have developed a method for ECG signal de-noising using independent component analysis (ICA). The approach combines JADE source separation with a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method against standard filtering, a wavelet-based de-noising method was used for comparison. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and the filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, baseline wander, EMG), but noticeably better results were achieved for uncommon noise (electrode cable movement artefact).
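The RMSE criterion used for the comparison can be written in a few lines. This is a generic sketch; the function name and the toy signals are assumptions and do not come from the paper.

```python
from math import sqrt


def rmse(reference, estimate):
    """Root mean square error between two equally long signals."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return sqrt(squared / len(reference))


# Toy signals (hypothetical): a flat reference and an estimate that is off by 1.
error = rmse([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0])
```

A lower value means the filtered signal is closer to the clean reference.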
Before a health education institution begins the new school year, the first step is the selection of new admissions from graduates of general and vocational secondary education. This study predicts new student admissions from multiple data attributes. The model is a decision tree, a classification and prediction method that creates a tree consisting of a root node, internal nodes and terminal nodes. The root node and internal nodes are variables/features, while the terminal nodes are class labels. Based on the experimental results and evaluation, it can be concluded that the C4.5 algorithm with Uncertainty obtained an accuracy of 80.39%, a precision of 94.44% and a recall of 75.00%, while the C4.5 algorithm with Information Gain Ratio obtained an accuracy of 88.24%, a precision of 98.28% and a recall of 83.82%.
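To make the gain ratio criterion concrete, the sketch below computes information gain and gain ratio for one categorical attribute. It follows the textbook definitions associated with C4.5; the function names and toy data are assumptions, and it is not the implementation used in the study.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def info_gain_and_ratio(values, labels):
    """Information gain and gain ratio of splitting labels by attribute values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition itself
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


labels = [1, 1, 0, 0]
# A two-valued attribute that separates the classes perfectly.
gain_a, ratio_a = info_gain_and_ratio(["a", "a", "b", "b"], labels)
# An ID-like attribute: same gain, but penalised by a larger split information.
gain_id, ratio_id = info_gain_and_ratio(["1", "2", "3", "4"], labels)
```

Both attributes reach the same information gain, yet the ID-like attribute gets half the gain ratio, which is why the ratio is less biased toward many-valued attributes.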
Goetz, W.W.J.; Seebregts, A.J.; Bedford, T.J.
A review of relevant methodologies based on Influence Diagrams (IDs), Decision Trees (DTs), and Containment Event Trees (CETs) was conducted to assess the practicality of these methods for the selection of effective strategies for Severe Accident Management (SAM). The review included an evaluation of some software packages for these methods. The emphasis was on possible pitfalls of using IDs and on practical aspects, the latter by performance of a case study that was based on an existing Level 2 Probabilistic Safety Assessment (PSA). The study showed that the use of a combined ID/DT model has advantages over CET models, in particular when conservatisms in the Level 2 PSA have been identified and replaced by fair assessments of the uncertainties involved. It is recommended to use ID/DT models as complementary to CET models. (orig.)
Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L
Demonstrate the application of decision trees--classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)--to understand structure in missing data. Data taken from employees at 3 different industrial sites in Australia. 7915 observations were included. The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the 'rpart' and 'gbm' packages for CART and BRT analyses, respectively, from the statistical software 'R'. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Researchers are encouraged to use CART and BRT models to explore and understand missing data. Published by the BMJ Publishing Group Limited.
In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.
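The notion of a totally optimal tree can be checked by brute force for small functions. The sketch below is not Dagger; it is a plain recursive search, written for this note, that returns the Pareto-optimal (depth, internal node count) pairs over all decision trees computing a Boolean function. A function has a totally optimal tree exactly when this set contains a single point. All names are assumptions.

```python
from functools import lru_cache
from itertools import product


def pareto(points):
    """Keep the points that are not dominated in both coordinates."""
    pts = set(points)
    return frozenset(
        p for p in pts
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in pts)
    )


@lru_cache(maxsize=None)
def tradeoffs(rows):
    """Pareto-optimal (depth, internal nodes) pairs for a set of (input, output) rows."""
    if len({out for _, out in rows}) <= 1:
        return frozenset({(0, 0)})  # constant on these rows: a single leaf
    n = len(next(iter(rows))[0])
    candidates = []
    for i in range(n):
        lo = frozenset(r for r in rows if r[0][i] == 0)
        hi = frozenset(r for r in rows if r[0][i] == 1)
        if not lo or not hi:
            continue  # variable i is already fixed on this subtable
        for d0, n0 in tradeoffs(lo):
            for d1, n1 in tradeoffs(hi):
                candidates.append((1 + max(d0, d1), 1 + n0 + n1))
    return pareto(candidates)


def table(f, n):
    """Full truth table of f over n binary variables as a frozenset of rows."""
    return frozenset((x, int(bool(f(x)))) for x in product((0, 1), repeat=n))


def is_totally_optimal(f, n):
    """True if one tree minimises depth and node count simultaneously."""
    return len(tradeoffs(table(f, n))) == 1


def majority3(x):
    """Majority of three inputs, a monotone Boolean function."""
    return int(sum(x) >= 2)


front = tradeoffs(table(majority3, 3))
```

Internal nodes are counted here; for binary trees the total number of nodes is twice that plus one, so the ordering of trees is unchanged. The search is exponential and only meant for functions with a handful of variables.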
The aim of this paper is to develop a methodology for intuitionistic trapezoidal fuzzy multiple criteria group decision making problems based on binary relation. Firstly, the similarity measure between two vectors based on binary relation is defined, which can be utilized to aggregate preference information. Some desirable properties of the similarity measure based on fuzzy binary relation are also studied. Then, a methodology for fuzzy multiple criteria group decision making is proposed, in which the criteria values are in terms of intuitionistic trapezoidal fuzzy numbers (ITFNs). Simple and exact formulas are also proposed to determine the vector of the aggregation and group set. According to the weighted expected values of group set, it is easy to rank the alternatives and select the best one. Finally, we apply the proposed method and the Cosine similarity measure method to a numerical example; the numerical results show that our method is effective and practical.
The application of fault tree analysis (FTA) to system safety and reliability is presented within the framework of system safety analysis. The concepts and techniques involved in manual and automated fault tree construction are described and their differences noted. The theory of mathematical reliability pertinent to FTA is presented with emphasis on engineering applications. An outline of the quantitative reliability techniques of the Reactor Safety Study is given. Concepts of probabilistic importance are presented within the fault tree framework and applied to the areas of system design, diagnosis and simulation. The computer code IMPORTANCE ranks basic events and cut sets according to a sensitivity analysis. A useful feature of the IMPORTANCE code is that it can accept relative failure data as input. The output of the IMPORTANCE code can assist an analyst in finding weaknesses in system design and operation, suggest the most optimal course of system upgrade, and determine the optimal location of sensors within a system. A general simulation model of system failure in terms of fault tree logic is described. The model is intended for efficient diagnosis of the causes of system failure in the event of a system breakdown. It can also be used to assist an operator in making decisions under a time constraint regarding the future course of operations. The model is well suited for computer implementation. New results incorporated in the simulation model include an algorithm to generate repair checklists on the basis of fault tree logic and a one-step-ahead optimization procedure that minimizes the expected time to diagnose system failure. (80 figures, 20 tables)
Background: Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Results: Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Conclusions: Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data. PMID:24884964
Quellec, Gwénolé; Lamard, Mathieu; Bekri, Lynda; Cazuguel, Guy; Roux, Christian; Cochener, Béatrice
A novel content-based information retrieval framework, designed to cover several medical applications, is presented in this paper. The presented framework allows the retrieval of possibly incomplete medical cases consisting of several images together with semantic information. It relies on a committee of decision trees, decision support tools well suited to process this type of information. In our proposed framework, images are characterized by their digital content. It was applied to two heterogeneous medical datasets for computer-aided diagnoses: a diabetic retinopathy follow-up dataset (DRD) and a mammography-screening dataset (DDSM). A precision among the top five retrieved results of 0.788 ± 0.137 and 0.869 ± 0.161 was obtained on DRD and DDSM, respectively. On DRD, for instance, it increases by half the retrieval of single images.
This paper is devoted to the consideration of the software system Dagger created at KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees), and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.
Hammann, Felix; Drewe, Juergen
Decision tree induction (DTI) is a powerful means of modeling data without much prior preparation. Models are readable by humans, robust and easily applied in real-world applications, features that are mutually exclusive in other commonly used machine learning paradigms. While DTI is widely used in disciplines ranging from economics to medicine, it is an intriguing option in pharmaceutical research, especially when dealing with large data stores. This review covers the automated technologies available for creating decision trees and other rules efficiently, even from large datasets such as chemical libraries. The authors discuss the need for properly documented and validated models. Lastly, the authors cover several case studies in hit discovery, drug metabolism and toxicology, and drug surveillance, and compare them with other established techniques. DTI is a competitive and easy-to-use tool in basic research as well as in hit and drug discovery. Its strengths lie in its ability to handle all sorts of different data formats, the visual nature of the models, and the small computational effort needed for implementation in real-world systems. Limitations include lack of robustness and over-fitted models for certain types of data. As with any modeling technique, proper validation and quality measures are of utmost importance. © 2012 Informa UK, Ltd.
Diagnosis of peripheral oral exophytic lesions might be quite challenging. This review article aimed to introduce a decision tree for oral exophytic lesions according to their clinical features. General search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of keywords such as “oral soft tissue lesion,” “oral tumor like lesion,” “oral mucosal enlargement,” and “oral exophytic lesion.” Related English-language articles published from 1988 to 2016 in both medical and dental journals were appraised. Upon compilation of data, peripheral oral exophytic lesions were categorized into two major groups according to their surface texture: smooth (mesenchymal or nonsquamous epithelium-originated) and rough (squamous epithelium-originated). Lesions with smooth surface were also categorized into three subgroups according to their general frequency: reactive hyperplastic lesions/inflammatory hyperplasia, salivary gland lesions (nonneoplastic and neoplastic), and mesenchymal lesions (benign and malignant neoplasms). In addition, lesions with rough surface were summarized in six more common lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by a stepwise progression method.
Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi
Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely only after the scheduled requirements elicitation process is finished. Such a situation is regarded as problematic. In our study, the difficulties of eliciting various kinds of requirements are observed by components. We refer to the components as observation targets (OTs) and introduce the term “requirements maturation,” which means when and how requirements are elicited completely in the project. Requirements maturation is discussed for physical and logical OTs. OTs viewed from a logical viewpoint are called logical OTs, e.g., quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., include functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. According to the observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed the actual cases with their requirements elicitation process and extracted essential factors that influence the requirements maturation. The results of interviews with project managers were analyzed with WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree through real projects and discuss its ability to infer the requirements maturation types.
Yoshikawa, Nobuyuki; Koshiyama, J.
We have proposed a top-down design methodology for RSFQ logic circuits using a binary decision diagram (BDD). The BDD is a way to represent a logical function by a directed graph, which consists of binary switches having one input and two outputs. The important features of the BDD RSFQ logic circuits are a small number of primitives, dual rail and non-clocked logic style, and a small gate count. We have constructed a cell library for the BDD RSFQ logic design, which is composed of five square...
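The directed-graph representation can be illustrated in software. The following sketch builds a reduced ordered BDD by Shannon expansion, sharing identical nodes and dropping nodes whose two outputs coincide. It is a generic textbook construction with assumed names, exponential in the number of variables, and says nothing about the RSFQ cell library itself.

```python
from itertools import product


def build_bdd(f, n):
    """Build a reduced ordered BDD for f over x0..x(n-1), tested in that order.

    Returns (root, nodes). nodes maps a node id to (var, low, high);
    ids 0 and 1 are the terminal nodes for False and True.
    """
    unique = {}  # (var, low, high) -> node id, so equal nodes are shared
    nodes = {}

    def mk(var, low, high):
        if low == high:
            return low  # redundant test: both outputs lead to the same node
        key = (var, low, high)
        if key not in unique:
            node_id = len(unique) + 2
            unique[key] = node_id
            nodes[node_id] = key
        return unique[key]

    def expand(prefix):
        if len(prefix) == n:
            return int(bool(f(prefix)))
        low = expand(prefix + (0,))   # cofactor with the variable set to 0
        high = expand(prefix + (1,))  # cofactor with the variable set to 1
        return mk(len(prefix), low, high)

    return expand(()), nodes


def evaluate(root, nodes, x):
    """Follow one path from the root to a terminal for the assignment x."""
    node = root
    while node > 1:
        var, low, high = nodes[node]
        node = high if x[var] else low
    return node


def example(x):
    """Example function: (x0 AND x1) OR x2."""
    return (x[0] and x[1]) or x[2]


root, nodes = build_bdd(example, 3)
agrees = all(
    evaluate(root, nodes, x) == int(bool(example(x)))
    for x in product((0, 1), repeat=3)
)
```

Each non-terminal node corresponds to one binary switch: one incoming path and two outgoing branches selected by the variable's value. For the example function with this variable order, three such switches suffice.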
Hassan, Md; Kotagiri, Ramamohanarao
Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm where each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
Garcia, Ryan J B; von Winterfeldt, Detlof
We propose a methodology, called defender-attacker decision tree analysis, to evaluate defensive actions against terrorist attacks in a dynamic and hostile environment. Like most game-theoretic formulations of this problem, we assume that the defenders act rationally by maximizing their expected utility or minimizing their expected costs. However, we do not assume that attackers maximize their expected utilities. Instead, we encode the defender's limited knowledge about the attacker's motivations and capabilities as a conditional probability distribution over the attacker's decisions. We apply this methodology to the problem of defending against possible terrorist attacks on commercial airplanes, using one of three weapons: infrared-guided MANPADS (man-portable air defense systems), laser-guided MANPADS, or visually targeted RPGs (rocket propelled grenades). We also evaluate three countermeasures against these weapons: DIRCMs (directional infrared countermeasures), perimeter control around the airport, and hardening airplanes. The model includes deterrence effects, the effectiveness of the countermeasures, and the substitution of weapons and targets once a specific countermeasure is selected. It also includes a second stage of defensive decisions after an attack occurs. Key findings are: (1) due to the high cost of the countermeasures, not implementing countermeasures is the preferred defensive alternative for a large range of parameters; (2) if the probability of an attack and the associated consequences are large, a combination of DIRCMs and ground perimeter control are preferred over any single countermeasure. © 2016 Society for Risk Analysis.
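The core calculation of such an analysis can be sketched as follows: the defender picks the action with the lowest expected cost, while the attacker's choice is described by a conditional probability distribution given the defence rather than by an optimisation. Every number, name and option below is hypothetical and chosen only to make the arithmetic visible; none is taken from the study.

```python
def expected_cost(defense, defense_costs, attack_probs, success_probs, loss):
    """Defender's expected cost: countermeasure cost plus expected attack loss."""
    cost = defense_costs[defense]
    for attack, p_attack in attack_probs[defense].items():
        # Attacks without a listed success probability (e.g. "no attack") add nothing.
        cost += p_attack * success_probs.get((defense, attack), 0.0) * loss
    return cost


# Hypothetical inputs, in arbitrary cost units.
defense_costs = {"none": 0.0, "dircm": 10.0}
# P(attack type | defence): the second row shifts probability from MANPADS to RPG,
# a crude stand-in for weapon substitution after a countermeasure is deployed.
attack_probs = {
    "none": {"manpads": 0.010, "rpg": 0.005, "no attack": 0.985},
    "dircm": {"manpads": 0.002, "rpg": 0.008, "no attack": 0.990},
}
# P(attack succeeds | defence, attack type).
success_probs = {
    ("none", "manpads"): 0.5, ("none", "rpg"): 0.2,
    ("dircm", "manpads"): 0.1, ("dircm", "rpg"): 0.2,
}
loss = 100.0

costs = {
    d: expected_cost(d, defense_costs, attack_probs, success_probs, loss)
    for d in defense_costs
}
preferred = min(costs, key=costs.get)
```

With these made-up inputs the countermeasure costs more than the loss it averts, so doing nothing has the lower expected cost; raising the attack probabilities or the loss reverses the ranking.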
This paper is devoted to the study of bi-criteria optimization problems for decision trees. We consider different cost functions such as depth, average depth, and number of nodes. We design algorithms that allow us to construct the set of Pareto optimal points (POPs) for a given decision table and the corresponding bi-criteria optimization problem. These algorithms are suitable for investigation of medium-sized decision tables. We discuss three examples of applications of the created tools: the study of relationships among depth, average depth and number of nodes for decision trees for corner point detection (such trees are used in computer vision for object tracking), study of systems of decision rules derived from decision trees, and comparison of different greedy algorithms for decision tree construction as single- and bi-criteria optimization algorithms.
Kamath, Chandrika; Cantu-Paz, Erick; Littau, David
A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.
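The split-selection steps listed above can be sketched for a single numeric feature. The choices made here are assumptions for illustration: equal-width bins, the Gini index as the split criterion, and a sampling interval of half a bin width on either side of the best bin edge. The description itself only says "some criterion" and "an interval around the best split".

```python
import random


def gini(counts):
    """Gini impurity of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def randomized_histogram_split(xs, ys, bins=8, rng=None):
    """Pick a threshold at random near the best histogram-based split of xs."""
    rng = rng or random.Random()
    lo, hi = min(xs), max(xs)
    if lo == hi:
        raise ValueError("feature is constant; nothing to split")
    width = (hi - lo) / bins
    classes = sorted(set(ys))

    # Step 1: per-bin class histogram.
    hist = [[0] * len(classes) for _ in range(bins)]
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), bins - 1)
        hist[b][classes.index(y)] += 1

    # Step 2: score every bin edge using cumulative counts only.
    total = [sum(col) for col in zip(*hist)]
    left = [0] * len(classes)
    best_score, best_edge = None, None
    for b in range(bins - 1):
        left = [l + h for l, h in zip(left, hist[b])]
        right = [t - l for t, l in zip(total, left)]
        score = (sum(left) * gini(left) + sum(right) * gini(right)) / len(xs)
        if best_score is None or score < best_score:
            best_score, best_edge = score, lo + (b + 1) * width

    # Step 3: draw the actual threshold uniformly around the best edge.
    return rng.uniform(best_edge - width / 2, best_edge + width / 2)


xs = [0, 1, 2, 3, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
threshold = randomized_histogram_split(xs, ys, bins=8, rng=random.Random(0))
```

Because the threshold is random, repeated calls on the same data yield different trees, which is what makes combining them in an ensemble worthwhile. The drawn threshold can land slightly on the wrong side of a boundary sample, a cost that the ensemble is expected to average out.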
The domestic and international development status of human body motion gesture data fusion is analyzed. A triaxial accelerometer is adopted to develop a wearable human body motion gesture monitoring system aimed at healthcare for the elderly. Following a brief introduction to the decision tree algorithm, the WEKA workbench is used to generate a human body motion gesture decision tree. Finally, the classification quality of the decision tree is validated through experiments. The experimental results show that the decision tree algorithm can reach an average prediction accuracy of 97.5% at a lower time cost.
Systematic approaches to making decisions in the public sector are becoming very common. Most often, these approaches concern expert decision models. The expansion of the idea of the development of e-participation and e-democracy was influenced by the development of technology. All stakeholders are supposed to participate in decision making, so this brings a new feature to the decision-making process, in which amateurs and non-specialists participate in decision making instead of experts. To be able to understand the needs and wishes of stakeholders, it is not enough to vote for alternatives - it is important to participate in solution-finding and to express opinions about the important elements of these matters. The solution presented in this paper is a fuzzy decision-making framework. This framework combines the advantages of structuring the decision-making problem as a tree with the flexibility offered by the fuzzy approach. The possibilities of implementing the framework in practice are illustrated by case studies of investment project appraisal in a community and assessment of the efficiency and effectiveness of public institutions.
Ömür Yaşar SAATÇİOĞLU
Ships may encounter undesirable conditions during operations. As a consequence of a casualty, fire, explosion, flooding, grounding, injury or even death may occur. However, these outcomes can be avoided with precautions and preventive operating processes. In maritime transportation, casualties depend on various factors. These were listed as misuse of the engine equipment and tools, defective machinery or equipment, inadequacy of operational procedures and safety measures, and force majeure effects. Casualty reports which were published in Australia, New Zealand, United Kingdom, Canada and United States until 2015 were examined and the probable causes and consequences of casualties were determined with their occurrence percentages. In this study, 89 marine investigation reports regarding engine room casualties were analyzed. Casualty factors were analyzed with their frequency percentages and their main causes were also identified. This study aims to investigate engine room based casualties, the frequency of each casualty type and the main causes by using the decision tree method.
Scalzo, Fabien; Hamilton, Robert; Asgari, Shadnaz; Kim, Sunghan; Hu, Xiao
Intracranial pressure (ICP) elevation (intracranial hypertension, IH) in neurocritical care is typically treated in a reactive fashion; it is only delivered after bedside clinicians notice prolonged ICP elevation. A proactive solution is desirable to improve the treatment of intracranial hypertension. Several studies have shown that the waveform morphology of the intracranial pressure pulse holds predictors about future intracranial hypertension and could therefore be used to alert the bedside clinician of a likely occurrence of the elevation in the immediate future. In this paper, a computational framework is proposed to predict prolonged intracranial hypertension based on morphological waveform features computed from the ICP. A key contribution of this work is to exploit an ensemble classifier method based on extremely randomized decision trees (Extra-Trees). Experiments on a representative set of 30 patients admitted for various intracranial pressure related conditions demonstrate the effectiveness of the predicting framework on ICP pulses acquired under clinical conditions and the superior results of the proposed approach in comparison to linear and AdaBoost classifiers. Copyright © 2011 IPEM. Published by Elsevier Ltd. All rights reserved.
Bamber, J H; Evans, S A
The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Thuillard, Marc; Fraix-Burnet, Didier
This article presents an innovative approach to phylogenies based on the reduction of multistate characters to binary-state characters. We show that the reduction to binary characters’ approach can be applied to both character- and distance-based phylogenies and provides a unifying framework to explain simply and intuitively the similarities and differences between distance- and character-based phylogenies. Building on these results, this article gives a possible explanation on why phylogenetic trees obtained from a distance matrix or a set of characters are often quite reasonable despite lateral transfers of genetic material between taxa. In the presence of lateral transfers, outer planar networks furnish a better description of evolution than phylogenetic trees. We present a polynomial-time reconstruction algorithm for perfect outer planar networks with a fixed number of states, characters, and lateral transfers. PMID:26508826
Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim
Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Eventually, the processes of land-use conversion and land degradation were serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.
Gessesse, B.; Bewket, W.; Bräuning, A.
Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explores the major determinants of farm level tree planting decision as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decision (Chi-square = 37.29, df = 15, Plabour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system have positively and significantly influenced tree growing investment decisions in the study watershed. Eventually, the processes of land use conversion and land degradation are serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.
We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increase in the number of misclassifications. Based on this approach we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.
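The notion of a Pareto optimal point used in the abstract above can be illustrated with a minimal sketch. It only filters a given list of (number of nodes, number of misclassifications) pairs, both to be minimized; it is not the dynamic programming procedure of the paper, and the candidate values are hypothetical.

```python
def pareto_front(points):
    """Return the Pareto-optimal (nodes, misclassifications) pairs, both minimized."""
    front = []
    # After sorting by node count (ties broken by errors), a point is
    # non-dominated exactly when it has strictly fewer errors than every
    # point kept so far.
    for nodes, errors in sorted(set(points)):
        if not front or errors < front[-1][1]:
            front.append((nodes, errors))
    return front


# Hypothetical candidate trees: (number of nodes, number of misclassifications).
candidates = [(1, 40), (3, 25), (5, 25), (7, 12), (9, 13), (15, 4), (31, 0)]
front = pareto_front(candidates)
print(front)
```

A user would then pick a point on the front that trades a few extra misclassifications for a much smaller tree.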
The main aim of this paper is to build a model to predict the chance of disease occurrence in an area. The paper concentrates on one data mining technique, the decision tree model, to identify the significant parameters for the prediction process. The decision tree model was created with the ID3 algorithm.
Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.
The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…
Poe, Retta E.
Outlines the development of a psychology careers decision tree to help faculty advise students as they plan their programs. States that students using the decision tree may benefit by learning more about their career options and by acquiring better question-asking skills. (GEA)
Graham, Kate J.
When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…
In this paper, we present three approaches for construction of decision rules for decision tables with many-valued decisions. We construct decision rules directly for rows of decision table, based on paths in decision tree, and based on attributes contained in a test (super-reduct). Experimental results for the data sets taken from UCI Machine Learning Repository, contain comparison of the maximum and the average length of rules for the mentioned approaches.
This paper describes, in detail, several greedy heuristics for construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related with the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from UCI ML Repository and randomly generated data) as well as compare with the optimal results obtained using dynamic programming method.
Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek
The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.
Hauska, H.; Swain, P. H.
A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
Background There are several common ways to encode a tree as a matrix, such as the adjacency matrix, the Laplacian matrix (that is, the infinitesimal generator of the natural random walk), and the matrix of pairwise distances between leaves. Such representations involve a specific labeling of the vertices or at least the leaves, and so it is natural to attempt to identify trees by some feature of the associated matrices that is invariant under relabeling. An obvious candidate is the spectrum of eigenvalues (or, equivalently, the characteristic polynomial). Results We show for any of these choices of matrix that the fraction of binary trees with a unique spectrum goes to zero as the number of leaves goes to infinity. We investigate the rate of convergence of the above fraction to zero using numerical methods. For the adjacency and Laplacian matrices, we show that the a priori more informative immanantal polynomials have no greater power to distinguish between trees. Conclusion Our results show that a generic large binary tree is highly unlikely to be identified uniquely by common spectral invariants. PMID:22613173
A method for blind recognition of the coding parameters of binary Bose-Chaudhuri-Hocquenghem (BCH) codes is proposed in this paper. We consider an intelligent communication receiver which can blindly recognize the coding parameters of the received data stream. The only knowledge is that the stream is encoded using binary BCH codes, while the coding parameters are unknown. The problem can be addressed in the context of non-cooperative communications or adaptive coding and modulation (ACM) for cognitive radio networks. The recognition processing includes two major procedures: code length estimation and generator polynomial reconstruction. A hard-decision method has been proposed in the previous literature. In this paper we propose a recognition approach for soft-decision situations with binary phase-shift keying modulation and additive white Gaussian noise (AWGN) channels. The code length is estimated by maximizing the root information dispersion entropy function. We then search for the code roots to reconstruct the primitive and generator polynomials. By utilizing the soft output of the channel, the recognition performance is improved, and the simulations show the efficiency of the proposed algorithm.
... direct 7-class prediction results in high misclassification rates. We therefore construct binary classifiers for all possible binary classification problems and combine them using Error Correcting Output Codes (ECOC) to form a 7-class predictor. ECOC...
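Since the abstract above is truncated, the following is only a generic sketch of how Error Correcting Output Codes can combine binary classifiers into a 7-class predictor. It assumes an exhaustive code (one column per non-trivial two-way split of the classes) and nearest-codeword decoding by Hamming distance; the function names are hypothetical, and the authors' actual code design may differ.

```python
from itertools import product


def exhaustive_code(num_classes):
    """One codeword per class; each column is a distinct two-way split of the classes."""
    columns = []
    # Class 0 is fixed to bit 1 so a split and its complement are not both counted.
    for rest in product((0, 1), repeat=num_classes - 1):
        column = (1,) + rest
        if all(column):  # skip the split that puts every class on the same side
            continue
        columns.append(column)
    # Transpose: row c is the codeword of class c.
    return [tuple(col[c] for col in columns) for c in range(num_classes)]


def decode(codewords, predicted_bits):
    """Return the class whose codeword is nearest to the binary predictions."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(codewords)), key=lambda c: hamming(codewords[c], predicted_bits))


codes = exhaustive_code(7)          # 63 binary problems for 7 classes
noisy = list(codes[4])              # pretend the true class is 4 ...
for i in (0, 5, 9):                 # ... and three binary classifiers are wrong
    noisy[i] ^= 1
predicted = decode(codes, noisy)
print(len(codes[0]), predicted)
```

Any two codewords of this exhaustive code differ in 32 positions, which is why a handful of wrong binary predictions still decodes to the right class.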
Vasconcellos, Eduardo Charles
Vasconcellos et al. study the efficiency of 13 different decision tree algorithms applied to photometric data in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7) to perform star/galaxy separation. Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. In that work we extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. We find that the Functional Tree algorithm (FT) yields the best results by the mean completeness function (galaxy true positive rate) in two magnitude intervals: 14=19 (82.1%). We compare FT classification to the SDSS parametric, 2DPHOT and Ball et al (2006) classifications. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination ( 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ... train six FT classifiers with randomly selected objects from the same 884,126 SDSS-DR7 objects with spectroscopic data that we used before. Both the decision committee and our previous single FT classifier will be applied to the new objects from SDSS data releases eight, nine and ten. Finally we will compare the performance of both methods on this new data set. Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Fraga Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R.. Decision Tree Classifiers for Star/Galaxy Separation. The Astronomical Journal, Volume 141, Issue 6, 2011.
Narusci S. Bastos
Even with emerging technologies, such as Brain-Computer Interfaces (BCI) systems, understanding how our brains work is a very difficult challenge. So we propose to use a data mining technique to help us in this task. As a case of study, we analyzed the brain's behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate their lack of vision using the other senses. If an object is given to sighted people and we asked them to identify this object, probably the sense of vision will be the most determinant one. If the same experiment was repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate the difference of how the brains of blind people and people with vision react against a spatial problem. We choose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour.
Bastos, Narusci S; Adamatti, Diana F; Billa, Cleo Z
Even with emerging technologies, such as Brain-Computer Interfaces (BCI) systems, understanding how our brains work is a very difficult challenge. So we propose to use a data mining technique to help us in this task. As a case of study, we analyzed the brain's behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate their lack of vision using the other senses. If an object is given to sighted people and we asked them to identify this object, probably the sense of vision will be the most determinant one. If the same experiment was repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate the difference of how the brains of blind people and people with vision react against a spatial problem. We choose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour.
Abuhaiba, Ibrahim S.I.
In this paper, it is shown that it is adequate to use simple and easy-to-compute figures, such as those we call sliced horizontal and vertical projections, to solve the OCR problem for machine-printed documents. Recognition is achieved using a decision tree supported with backtracking, smoothing, row and column cropping, and other additions to increase the success rate. Symbols from the Times New Roman typeface are used to train our system. Activating backtracking, smoothing and cropping achieved a success rate of more than 98% for a recognition time below 30 ms per character. The recognition algorithm was exposed to a hard test by polluting the original dataset with additional artificial noise, and it maintained a high success rate and low error rate for highly polluted images, which is a result of backtracking, smoothing, and row and column cropping. Results indicate that we can depend on simple features and hints to reliably recognize characters. The error rate can be decreased by increasing the size of the training dataset. The recognition time can be reduced by using some programming optimization techniques and more powerful computers. (author)
The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in course of the same PDE-based simulation, thereby making solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stages of simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can resolve the problem of over-fitting, which occurs when limited amount of data is available. © 2010 Springer Science+Business Media LLC.
Jerebko, Anna K.; Summers, Ronald M.; Malley, James D.; Franaszek, Marek; Johnson, C. Daniel
Detection of colonic polyps in CT colonography is problematic due to complexities of polyp shape and the surface of the normal colon. Published results indicate the feasibility of computer-aided detection of polyps but better classifiers are needed to improve specificity. In this paper we compare the classification results of two approaches: neural networks and recursive binary trees. As our starting point we collect surface geometry information from three-dimensional reconstruction of the colon, followed by a filter based on selected variables such as region density, Gaussian and average curvature and sphericity. The filter returns sites that are candidate polyps, based on earlier work using detection thresholds, to which the neural nets or the binary trees are applied. A data set of 39 polyps from 3 to 25 mm in size was used in our investigation. For both neural net and binary trees we use tenfold cross-validation to better estimate the true error rates. The backpropagation neural net with one hidden layer trained with Levenberg-Marquardt algorithm achieved the best results: sensitivity 90% and specificity 95% with 16 false positives per study
In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for proposed approach and approach based on generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.
Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore
Altitude exposure increases the body's need for iron (Gassmann and Muckenthaler, 2015), primarily to support accelerated erythropoiesis, yet clear supplementation guidelines do not exist. Athletes are typically recommended to ingest a daily oral iron supplement to facilitate altitude adaptations, and to help maintain iron balance. However, there is some debate as to whether athletes with otherwise healthy iron stores should be supplemented, due in part to concerns of iron overload. Excess iron in vital organs is associated with an increased risk of a number of conditions including cancer, liver disease and heart failure. Therefore clear guidelines are warranted and athletes should be discouraged from "self-prescribing" supplementation without medical advice. In the absence of prospective-controlled studies, decision tree analysis can be used to describe a data set, with the resultant regression tree serving as guide for clinical decision making. Here, we present a regression tree in the context of iron supplementation during altitude exposure, to examine the association between pre-altitude ferritin (Ferritin-Pre) and the haemoglobin mass (Hbmass) response, based on daily iron supplement dose. De-identified ferritin and Hbmass data from 178 athletes engaged in altitude training were extracted from the Australian Institute of Sport (AIS) database. Altitude exposure was predominantly achieved via normobaric Live high: Train low (n = 147) at a simulated altitude of 3000 m for 2 to 4 weeks. The remaining athletes engaged in natural altitude training at venues ranging from 1350 to 2800 m for 3-4 weeks. Thus, the "hypoxic dose" ranged from ~890 km.h to ~1400 km.h. Ethical approval was granted by the AIS Human Ethics Committee, and athletes provided written informed consent. An in depth description and traditional analysis of the complete data set is presented elsewhere (Govus et al., 2015). Iron supplementation was prescribed by a sports physician
Nair, Shalini Rajandran; Tan, Li Kuo; Mohd Ramli, Norlisah; Lim, Shen Yang; Rahmat, Kartini; Mohd Nor, Hazman
To develop a decision tree based on standard magnetic resonance imaging (MRI) and diffusion tensor imaging to differentiate multiple system atrophy (MSA) from Parkinson's disease (PD). 3-T brain MRI and DTI (diffusion tensor imaging) were performed on 26 PD and 13 MSA patients. Regions of interest (ROIs) were the putamen, substantia nigra, pons, middle cerebellar peduncles (MCP) and cerebellum. Linear, volumetry and DTI (fractional anisotropy and mean diffusivity) were measured. A three-node decision tree was formulated, with design goals being 100 % specificity at node 1, 100 % sensitivity at node 2 and highest combined sensitivity and specificity at node 3. Nine parameters (mean width, fractional anisotropy (FA) and mean diffusivity (MD) of MCP; anteroposterior diameter of pons; cerebellar FA and volume; pons and mean putamen volume; mean FA substantia nigra compacta-rostral) showed statistically significant (P decision tree. Threshold values were 14.6 mm, 21.8 mm and 0.55, respectively. Overall performance of the decision tree was 92 % sensitivity, 96 % specificity, 92 % PPV and 96 % NPV. Twelve out of 13 MSA patients were accurately classified. Formation of the decision tree using these parameters was both descriptive and predictive in differentiating between MSA and PD. • Parkinson's disease and multiple system atrophy can be distinguished on MR imaging. • Combined conventional MRI and diffusion tensor imaging improves the accuracy of diagnosis. • A decision tree is descriptive and predictive in differentiating between clinical entities. • A decision tree can reliably differentiate Parkinson's disease from multiple system atrophy.
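The performance figures reported in the abstract above can be checked with a short sketch. The abstract states that 12 of 13 MSA patients were correctly classified; the split of the 26 PD patients into 25 correct and 1 misclassified is not stated and is an assumption here, inferred from the reported 96 % specificity.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among diseased (MSA)
        "specificity": tn / (tn + fp),  # true negatives among non-MSA (PD)
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# tp/fn come from the abstract (12 of 13 MSA patients classified correctly).
# tn/fp are an assumption: 25 of 26 PD patients classified correctly.
metrics = diagnostic_metrics(tp=12, fn=1, tn=25, fp=1)
rounded = {name: round(100 * value) for name, value in metrics.items()}
print(rounded)
```

Under that assumption the rounded values agree with the reported 92 % sensitivity, 96 % specificity, 92 % PPV and 96 % NPV.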
Bekena, Sisay Menji
In this study Random Forest Classifier machine learning algorithm is applied to predict income levels of individuals based on attributes including education, marital status, gender, occupation, country and others. Income levels are defined as a binary variable 0 for income
Chen, Gwo-Dong; Liu, Chen-Chung; Ou, Kuo-Liang; Liu, Baw-Jhiune
Discusses the use of Web logs to record student behavior that can assist teachers in assessing performance and making curriculum decisions for distance learning students who are using Web-based learning systems. Adopts decision tree and data cube information processing methodologies for developing more effective pedagogical strategies. (LRW)
Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Kim, Jong Kyu; Kim, Nam Soo
In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.
T.Miranda Lakshmi; A.Martin; R.Mumtaj Begum; V.Prasanna Venkatesan
Decision Tree is the most widely applied supervised classification technique. The learning and classification steps of decision tree induction are simple and fast, and it can be applied to any domain. In this research, qualitative student data have been taken from educational data mining, and the performance of the decision tree algorithms ID3, C4.5 and CART is compared. The comparison result shows that the Gini Index of CART influences the information Gain Ratio of ID3 and C4.5. The classif...
This paper has conducted a study on the applications of track and field equipment training based on ID3 algorithm of decision tree model. For the selection of the elements used by decision tree, this paper can be divided into track training equipment, field events training equipment and auxiliary training equipment according to the properties of track and field equipment. The decision tree that regards track training equipment as root nodes has been obtained under the conditions of lowering computation cost through the selection of data as well as the application and optimization of ID3 algorithm model.
Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.
Sand, Andreas; Holt, Morten Kragelund; Johansen, Jens
Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances, for rooted and unrooted trees, respectively, are defined as the number of subsets of three or four leaves...
National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...
Kim, Joungbum; Schwarm, Sarah E; Ostendorf, Mari
.... Specifically, combinations of decision trees and language models are used to predict sentence ends and interruption points and given these events transformation based learning is used to detect edit...
.... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...
Symone Maria de Melo Figueiredo
This study evaluated the accuracy of mapping land cover in Capixaba, state of Acre, Brazil, using decision trees. Eleven attributes were used to build the decision trees: TM Landsat data from bands 1, 2, 3, 4, 5, and 7; fraction images derived from linear spectral unmixing; and the normalized difference vegetation index (NDVI). The Kappa values were greater than 0.83, producing excellent classification results and demonstrating that the technique is promising for mapping land cover in the study area.
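Two quantities named in the abstract above have standard definitions that can be sketched briefly: NDVI, computed from near-infrared and red reflectance (bands 4 and 3 on Landsat TM), and Cohen's Kappa, computed from a confusion matrix. The reflectance values and the confusion matrix below are hypothetical, not data from the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def cohens_kappa(matrix):
    """Cohen's Kappa for a square confusion matrix (rows: reference, columns: map)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical pixel: NIR (TM band 4) = 0.5, red (TM band 3) = 0.1.
pixel_ndvi = ndvi(nir=0.5, red=0.1)

# Hypothetical two-class confusion matrix (e.g. forest vs. pasture).
kappa = cohens_kappa([[45, 5], [5, 45]])  # about 0.80
print(round(pixel_ndvi, 3), round(kappa, 3))
```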
REAL-TIME SPEECH/ MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/ music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and can not work properly in the existence of multimedia signals. This paper proposes a real-time speech/ music
Lee, Anna; Joynt, Gavin M; Ho, Anthony M H; Keitz, Sheri; McGinn, Thomas; Wyer, Peter C
Decision analysis is a tool that clinicians can use to choose an option that maximizes the overall net benefit to a patient. It is an explicit, quantitative, and systematic approach to decision making under conditions of uncertainty. In this article, we present two teaching tips aimed at helping clinical learners understand the use and relevance of decision analysis. The first tip demonstrates the structure of a decision tree. With this tree, a clinician may identify the optimal choice among complicated options by calculating probabilities of events and incorporating patient valuations of possible outcomes. The second tip demonstrates how to address uncertainty regarding the estimates used in a decision tree. We field tested the tips twice with interns and senior residents. Teacher preparatory time was approximately 90 minutes. The field test utilized a board and a calculator. Two handouts were prepared. Learners identified the importance of incorporating values into the decision-making process as well as the role of uncertainty. The educational objectives appeared to be reached. These teaching tips introduce clinical learners to decision analysis in a fashion aimed to illustrate principles of clinical reasoning and how patient values can be actively incorporated into complex decision making.
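The two teaching tips described above (rolling back a decision tree, then varying an uncertain estimate) can be sketched as follows. The options, probabilities and utilities are all hypothetical and chosen only for illustration; they do not come from the article.

```python
def rollback(node):
    """Expected utility: average over chance nodes, take the best at decision nodes."""
    if "utility" in node:    # outcome (leaf) valued by the patient
        return node["utility"]
    if "branches" in node:   # chance node: list of (probability, child)
        return sum(p * rollback(child) for p, child in node["branches"])
    return max(rollback(child) for child in node["options"].values())


def build_tree(p_success):
    """Hypothetical two-option decision; every number here is an assumption."""
    return {"options": {
        "surgery": {"branches": [(p_success, {"utility": 0.95}),
                                 (1 - p_success, {"utility": 0.40})]},
        "medication": {"branches": [(0.7, {"utility": 0.90}),
                                    (0.3, {"utility": 0.60})]},
    }}


def preferred(tree):
    """Name of the option with the highest expected utility."""
    options = tree["options"]
    return max(options, key=lambda name: rollback(options[name]))


base = build_tree(0.9)
choice_at_base = preferred(base)               # tip 1: roll back the tree
choice_if_lower = preferred(build_tree(0.7))   # tip 2: one-way sensitivity check
print(choice_at_base, choice_if_lower)
```

With these illustrative numbers the preferred option switches when the assumed success probability of surgery drops, which is the kind of uncertainty the second tip asks learners to explore.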
Data mining is the process of finding useful information in large databases. One of the established techniques in data mining is classification. The method used here was the decision tree, built with the C4.5 algorithm. The decision tree method transforms a large body of facts into a decision tree that represents rules. It is useful for exploring data and for finding hidden relationships between a number of potential input variables and a target variable. A C4.5 decision tree is constructed in several stages: selecting an attribute as the root, creating a branch for each value, and dividing the cases among the branches. These stages are repeated for each branch until all the cases on the branch have the same class. Rules for a case can then be read from the resulting tree. In this case the researcher classified data on prisoners at Labuhan Deli prison to identify the factors associated with detainees committing drug offences. Applying the C4.5 algorithm yielded knowledge that can serve as information for reducing drug offences. The research found that the most influential factor in detainees committing drug offences was the address variable.
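The attribute selection stage described above can be sketched with the gain ratio criterion that C4.5 uses to choose a splitting attribute. The four records below are hypothetical toy data, with attribute names picked only to echo the abstract; they are not the prison data set.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalized by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, not the study's data.
rows = [
    {"address": "urban", "age_group": "young", "drug_offence": "yes"},
    {"address": "urban", "age_group": "old", "drug_offence": "yes"},
    {"address": "rural", "age_group": "young", "drug_offence": "no"},
    {"address": "rural", "age_group": "old", "drug_offence": "no"},
]
root = max(["address", "age_group"], key=lambda a: gain_ratio(rows, a, "drug_offence"))
print(root)
```

The attribute with the highest gain ratio becomes the root, and the same selection is repeated on each branch until the cases on a branch share one class.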
Gorum, Tolga; Celal Tunusluoglu, M.; Sezer, Ebru; Nefeslioglu, Hakan A.; Bozkir, A. Selman; Gokceoglu, Candan
Landslides are accepted as one of the important natural hazards throughout the world, and regional landslide susceptibility assessment is one of the first stages of landslide hazard mitigation efforts. For this purpose, various methods have been applied to produce landslide susceptibility maps for many years. However, the application of decision trees, one of the data mining methods, to landslide susceptibility mapping is not common. Considering this gap in the landslide literature, the main purpose of the present study is an application of the decision tree method to landslide susceptibility mapping. The Inegol region (Northwestern Turkey) is selected as the study area. In the first stage of the study, a landslide inventory is produced by aerial-photo interpretation and field studies. Employing 16 topographic and lithologic variables, the landslide susceptibility analyses are performed by the decision tree method. The AUC (Area Under Curve) value for the ROC (Receiver-Operating Characteristics) curve is calculated as 0.942 for the landslide susceptibility model obtained from the decision tree analysis. According to this AUC value, the decision tree analysis performs well. As a result of the present study, it may be concluded that the decision tree method presents promising results for regional landslide susceptibility assessment. However, the technique should be studied for different landslide-prone areas and compared with other prediction techniques such as logistic regression, artificial neural networks, fuzzy approaches, etc.
Wang, Ting; Li, Weiying; Zheng, Xiaofeng; Lin, Zhifen; Kong, Deyang
Over the past decades, an increasing number of studies have addressed the estrogenic activities of environmental pollutants on amphibians, and many determination methods have been proposed. However, these determination methods are time-consuming and expensive, and a rapid and simple method to screen and test chemicals for estrogenic activity in amphibians is therefore imperative. Herein a new decision tree is proposed that is formulated not only with physicochemical parameters but also with a biological parameter, and it was successfully used to screen the estrogenic activities of chemicals on amphibians. The biological parameter, the CDOCKER interaction energy (Ebinding) between chemicals and the target proteins, was calculated by molecular docking, and it was used to revise the decision tree formulated by Hong, which uses only physicochemical parameters, for screening the estrogenic activity of chemicals in rat. Based on the correlation between Ebinding of rat and Xenopus laevis, a new decision tree for estrogenic activities in Xenopus laevis is finally proposed. It was then validated using 8 randomly selected chemicals to which Xenopus laevis can be frequently exposed, and the agreement between the results from the new decision tree and those from experiments is generally satisfactory. Consequently, the new decision tree can be used to screen the estrogenic activities of chemicals, and the combined use of Ebinding and classical physicochemical parameters can greatly improve Hong's decision tree. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Savall, Frédéric; Faruch-Bilfeld, Marie; Dedouit, Fabrice; Sans, Nicolas; Rousseau, Hervé; Rougé, Daniel; Telmon, Norbert
Decision trees provide an alternative to multivariate discriminant analysis, which is still the most commonly used in anthropometric studies. Our study analyzed the metric characterization of a recent virtual sample of 113 coxal bones using decision trees for sex determination. From 17 osteometric type I landmarks, a dataset was built with five classic distances traditionally reported in the literature and six new distances selected using the two-step ratio method. A ten-fold cross-validation was performed, and a decision tree was established on two subsamples (training and test sets). The decision tree established on the training set included three nodes and its application to the test set correctly classified 92% of individuals. This percentage was similar to the data of the literature. The usefulness of decision trees has been demonstrated in numerous fields. They have been already used in sex determination, body mass prediction, and ancestry estimation. This study shows another use of decision trees enabling simple and accurate sex determination. © 2015 American Academy of Forensic Sciences.
Decision tree classification is one of the most efficient methods for obtaining land use/land cover (LULC) information from remotely sensed imagery. However, traditional decision tree classification methods cannot effectively eliminate the influence of mixed pixels. This study aimed to integrate pixel unmixing and decision tree to improve LULC classification by removing mixed pixel influence. The abundance and minimum noise fraction (MNF) results that were obtained from mixed pixel decomposition were added to decision tree multi-features using a three-dimensional (3D) terrain model, which was created using an image fusion digital elevation model (DEM), to select training samples (ROIs), and improve ROI separability. A Landsat-8 OLI image of the Yunlong Reservoir Basin in Kunming was used to test this proposed method. Study results showed that the Kappa coefficient and the overall accuracy of integrated pixel unmixing and decision tree method increased by 0.093% and 10%, respectively, as compared with the original decision tree method. This proposed method could effectively eliminate the influence of mixed pixels and improve the accuracy in complex LULC classifications.
This paper presents a new tool for studying the relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees. For these two cost functions in particular, the relationships are closely related to the space-time trade-off. In addition to an algorithm to compute the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier or Pareto set of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.
In this paper, we use the decision-making tree to explain the impact attendance has on students’ final success. The paper analyses the results of 56 students in 3 subjects during the academic year 2016/2017 (first-, second- and third-year students of Business Mathematics, Statistics and Managerial Economics at the SEE University in Tetovo). The results show that attendance is the most important of the 5 attributes in this study, placing it at the root of the tree. In constructing the decision-making tree, we have used the ID3 algorithm within the Weka software package.
Esther I. Metting
The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an asthma/chronic obstructive pulmonary disease (COPD) service where patients were assessed using spirometry, the Asthma Control Questionnaire, the Clinical COPD Questionnaire, history data and medication use. All patients were diagnosed through the Internet by a pulmonologist. The Chi-squared Automatic Interaction Detection method was used to build the decision tree. The tree was externally validated in another real-life primary care population (n=3215). Our tree correctly diagnosed 79% of the asthma patients, 85% of the COPD patients and 32% of the asthma–COPD overlap syndrome (ACOS) patients. External validation showed a comparable pattern (correct: asthma 78%, COPD 83%, ACOS 24%). Our decision tree is considered to be promising because it was based on real-life primary care patients with a specialist's diagnosis. In most patients the diagnosis could be correctly predicted. Predicting ACOS, however, remained a challenge. The total decision tree can be implemented in computer-assisted diagnostic systems for individual patients. A simplified version of this tree can be used in daily clinical practice as a desk tool.
Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin
The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.
A decision tree is one of the best-known classifiers based on a recursive partitioning algorithm. This paper introduces the Boundary Expansion Algorithm (BEA) to improve decision tree induction on imbalanced datasets. BEA utilizes all attributes to define non-splittable ranges. The computed means of all attributes for minority instances are used to find the nearest minority instance, which is then expanded along all attributes to cover a minority region. As a result, BEA copes successfully with imbalanced data when compared with C4.5, Gini, asymmetric entropy, top-down tree, and Hellinger distance decision tree on 25 imbalanced datasets from the UCI Repository.
Kim, Yong Hee; Kim, Myung-Joon; Shin, Hyun Joo; Yoon, Haesung; Han, Seok Joo; Koh, Hong; Roh, Yun Ho; Lee, Mi-Jung
To evaluate MRI findings and to generate a decision tree model for diagnosis of biliary atresia (BA) in infants with jaundice. We retrospectively reviewed features of MRI and ultrasonography (US) performed in infants with jaundice between January 2009 and June 2016 under approval of the institutional review board, including the maximum diameter of periportal signal change on MRI (MR triangular cord thickness, MR-TCT) or US (US-TCT), visibility of common bile duct (CBD) and abnormality of gallbladder (GB). Hepatic subcapsular flow was reviewed on Doppler US. We performed conditional inference tree analysis using MRI findings to generate a decision tree model. A total of 208 infants were included, 112 in the BA group and 96 in the non-BA group. Mean age at the time of MRI was 58.7 ± 36.6 days. Visibility of CBD, abnormality of GB and MR-TCT were good discriminators for the diagnosis of BA and the MRI-based decision tree using these findings with MR-TCT cut-off 5.1 mm showed 97.3 % sensitivity, 94.8 % specificity and 96.2 % accuracy. MRI-based decision tree model reliably differentiates BA in infants with jaundice. MRI can be an objective imaging modality for the diagnosis of BA. • MRI-based decision tree model reliably differentiates biliary atresia in neonatal cholestasis. • Common bile duct, gallbladder and periportal signal changes are the discriminators. • MRI has comparable performance to ultrasonography for diagnosis of biliary atresia.
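As a worked check of the reported figures, the sketch below recomputes sensitivity, specificity and accuracy from a confusion matrix. The cell counts (109 of 112 BA cases and 91 of 96 non-BA cases classified correctly) are an assumption chosen because they reproduce the published percentages; the abstract itself reports only the group sizes and the rates.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # correctly identified positives
    specificity = tn / (tn + fp)          # correctly identified negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts: 112 BA infants (109 detected) and 96 non-BA infants (91 cleared).
sens, spec, acc = screening_metrics(tp=109, fn=3, tn=91, fp=5)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Under these assumed counts the three rates round to 97.3%, 94.8% and 96.2%, which agrees with the values stated in the abstract.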
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
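The recursive partitioning step mentioned at the start of this abstract can be illustrated with a single split. The sketch below assumes the Gini impurity criterion that is commonly used in classification and regression trees; the ages and outcome labels are hypothetical and are not drawn from the Tirilazad database.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    p = sum(labels) / total
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Find the cut on one numeric variable that minimizes weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical data: patient age and a dichotomized poor-outcome flag.
ages = [34, 41, 47, 58, 63, 70]
poor_outcome = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(ages, poor_outcome)
```

A full tree would apply the same search to every candidate variable, split on the best one, and repeat within each resulting subgroup until a stopping rule is met.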
Lee, Saro; Park, Inhye
Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
Berry, E.A.; Hogeveen, H.; Hillerton, J.E.
Economic decisions on animal health strategies address the cost-benefit aspect along with animal welfare and public health concerns. Decision tree analysis at an individual cow level highlighted that there is little economic difference between the use of either dry cow antibiotic or an internal teat
Barger, Sara E.
Questions in a decision-tree address mission, faculty interest, administrative support, and practice plan as a way of assessing arrangements for nursing faculty's clinical practice. Decisions should be based on congruence between the human resource allocation and the reward systems. (SK)
Decision trees have been shown to be effective at classifying subjects with Parkinson’s disease when provided with features (subject scores) derived from FDG-PET data. Such subject scores have strong discriminative power but are not intuitive to understand. We therefore augment each decision node
Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto
This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.
Introduction: Decision trees are data mining tools for collecting, sifting and accurately predicting information from massive amounts of data, and they are widely used in computational biology and bioinformatics. In bioinformatics they can be used to predict diseases, including breast cancer. Genomic data, including single nucleotide polymorphisms (SNPs), are a very important factor in predicting disease risk. Seven important SNPs among hundreds of thousands of genetic markers were identified as factors associated with breast cancer. The objective of this study was to evaluate the effect of the training data on the prediction error of a decision tree for breast cancer risk based on SNP genotype. Methods: The risk of breast cancer was calculated using the SNP formula: xj = fo * In human, the decision tree can be used to predict the probability of disease using single nucleotide polymorphisms. Seven SNPs with different odds ratios associated with breast cancer were considered, and a C4.5 decision tree model was coded and designed in the Csharp2013 programming language. In the decision tree created by this coding, the four most important associated SNPs were considered. The decision tree error was assessed both for the coded implementation and for WEKA, and the percentage accuracy of the decision tree in predicting breast cancer was calculated. The number of training samples was obtained by systematic sampling. Two scenarios were evaluated with the coded implementation and three with the WEKA software, using different data sets and different numbers of training and test records. Results: In both coded scenarios, increasing the training percentage from 66/66 to 86/42 reduced the error from 55/56 to 9/09. Likewise, when WEKA was run on three scenarios with different data sets, training counts and test counts, increasing the number of records from 81 to 2187 decreased the error rate from 48/15 to 13
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
Attribute splitting is a major process in Decision Tree C4.5 classification. However, this process has no significant effect on removing irrelevant features when the decision tree is built. This leads to a major problem in decision tree classification called over-fitting, which results from noisy data and irrelevant features. In turn, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is one of the important issues in classification models; it is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high-dimensional data to low-dimensional data with non-correlated attributes. In this research, we propose a framework for selecting relevant and non-correlated feature subsets. We use principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for the classification. In experiments on the Cervical cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework robustly enhances classification, reaching an accuracy of 90.70%.
Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo
Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657
HANG YANG; SIMON FONG
Big data has become a popular research topic since the data explosion of the past decade. An efficient analytical methodology provides a way of discovering the potential value in big data. Sampling is no longer suitable when only the full data can tell the truth. To this end, a data mining algorithm must be robust to imperfect data, which may otherwise lead to tree size explosion and degraded accuracy. In this paper, we propose an incremental optimization mechanism to solve the...
Zaitseva, Elena; Levashenko, Vitaly; Kostolny, Jozef
System availability evaluation, sensitivity analysis, Importance Measures, and optimal design are important issues that have become research topics in reliability engineering. There are different mathematical approaches to the development of these topics. The structure function based approach is one of them. The structure function enables one to analyse a system of any complexity, but structure function based methods are computationally expensive for large-scale networks. We propose two mathematical approaches to address this problem in system importance analysis. The first is the Direct Partial Boolean Derivative. New equations for calculating the Importance Measures are developed in terms of these derivatives. The second is the Binary Decision Diagram (BDD), which supports efficient manipulation of Boolean algebra. Two algorithms for calculating the Direct Partial Boolean Derivative based on the BDD of the structure function are proposed in this paper. The experimental results show the efficiency of the new algorithms for calculating the Direct Partial Boolean Derivative and Importance Measures. - Highlights: • New approach for calculation of Importance Measures is proposed. • Direct Partial Boolean Derivatives are used for calculation of Importance Measures. • New equations for Importance Measures are obtained. • New algorithm to calculate Direct Partial Boolean Derivatives by BDD is developed
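To make the idea of a Direct Partial Boolean Derivative concrete, the sketch below counts, by brute-force enumeration, the state vectors in which the failure of one component (1 to 0) causes system failure (1 to 0), and divides by the number of states of the remaining components to give a structural importance. This is an assumption-laden illustration: it enumerates the truth table instead of using the BDD-based algorithms the paper proposes, and the example system (`x1 AND (x2 OR x3)`) and function names are hypothetical.

```python
from itertools import product


def structure(x):
    """Hypothetical structure function: component 1 in series with 2 and 3 in parallel."""
    x1, x2, x3 = x
    return x1 and (x2 or x3)


def dpbd_count(phi, n, i):
    """Count state vectors where x_i changing 1 -> 0 changes phi from 1 -> 0."""
    count = 0
    for rest in product((0, 1), repeat=n - 1):
        working = rest[:i] + (1,) + rest[i:]   # component i works
        failed = rest[:i] + (0,) + rest[i:]    # component i has failed
        if phi(working) == 1 and phi(failed) == 0:
            count += 1
    return count


def structural_importance(phi, n, i):
    """Share of the other components' states in which component i is critical."""
    return dpbd_count(phi, n, i) / 2 ** (n - 1)


importances = [structural_importance(structure, 3, i) for i in range(3)]
```

For this toy system the series component is critical in three of four states of the other components, while each parallel component is critical in one of four. Enumeration grows as 2^n, which is the cost that motivates a BDD representation for large systems.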
Hall, R.E.; Fragola, J.; Wreathall, J.
This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations
Bo Suk Yang
This paper describes an efficient method to automate vibration diagnosis for rotating machinery using a decision tree, which is applicable to a vibration diagnosis expert system. The decision tree is a widely known formalism for expressing classification knowledge and has been used successfully in many diverse areas such as character recognition, medical diagnosis, and expert systems. In order to build a decision tree for vibration diagnosis, we have to define classes and attributes. A set of cases based on past experience is also needed. This training set is induced using a result-cause matrix newly developed in the present work instead of the conventionally implemented cause-result matrix. The method was applied to diagnostics for various cases taken from published work. It is found that the present method predicts the causes of abnormal vibration for the test cases with high reliability.
In the STBO modeler and tactical surface scheduler for the ATD-2 project, taxi speed decision trees are used to calculate the unimpeded taxi times of flights taxiing on the airport surface. The initial taxi speed values in these decision trees did not predict taxi times accurately. Using more recent, reliable surveillance data, new taxi speed values for the ramp area and the movement area were computed. Before integrating these values into the STBO system, we performed test runs using live data from Charlotte airport with two taxi speed settings: 1) the initial taxi speed values and 2) the new ones. Taxi time prediction performance was evaluated by comparing various metrics. The results show that the new taxi speed decision trees can calculate the unimpeded taxi-out times more accurately.
Liu, Dong-sheng; Fan, Shu-jiang
In order to offer mobile customers better service, mobile users must first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we divide the context into public context and private context classes. We then analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm: users can be classified into Basic service user, E-service user, Plus service user, and Total service user classes, and some rules about mobile users can also be derived. Compared with the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity.
Kucheryavskiy, Sergey V.
Advanced machine learning methods, like convolutional neural networks and decision trees, became extremely popular in the last decade. This, first of all, is directly related to the current boom in Big data analysis, where traditional statistical methods are not efficient. According to kaggle.com, the most popular online resource for Big data problems and solutions, methods based on decision trees and their ensembles are most widely used for solving the problems. It can be noted that decision trees and convolutional neural networks are not very popular in Chemometrics. One of the reasons for that is the landscape of the data matrix: the modern machine learning methods need a number of measurements much larger than the number of variables to avoid overfitting, which is opposite to the layout of the data we usually deal with. Another drawback is a lack of interactive instruments for exploring...
Busbait, Monther I.
We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. Also, we obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant faults depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients for the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we create, to obtain the sharp coefficients.
Wang, Jing; Li, Man; Hu, Yun-tao; Zhu, Yu
In recent years, artificial neural network is advocated in modeling complex multivariable relationships due to its ability of fault tolerance; while decision tree of data mining technique was recommended because of its richness of classification arithmetic rules and appeal of visibility. The aim of our research was to compare the performance of ANN and decision tree models in predicting hospital charges on gastric cancer patients. Data about hospital charges on 1008 gastric cancer patients and related demographic information were collected from the First Affiliated Hospital of Anhui Medical University from 2005 to 2007 and preprocessed firstly to select pertinent input variables. Then artificial neural network (ANN) and decision tree models, using same hospital charge output variable and same input variables, were applied to compare the predictive abilities in terms of mean absolute errors and linear correlation coefficients for the training and test datasets. The transfer function in ANN model was sigmoid with 1 hidden layer and three hidden nodes. After preprocess of the data, 12 variables were selected and used as input variables in two types of models. For both the training dataset and the test dataset, mean absolute errors of ANN model were lower than those of decision tree model (1819.197 vs. 2782.423, 1162.279 vs. 3424.608) and linear correlation coefficients of the former model were higher than those of the latter (0.955 vs. 0.866, 0.987 vs. 0.806). The predictive ability and adaptive capacity of ANN model were better than those of decision tree model. ANN model performed better in predicting hospital charges of gastric cancer patients of China than did decision tree model.
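The two comparison criteria named in this abstract, mean absolute error and the linear (Pearson) correlation coefficient, can be sketched as follows. The charge values and the two prediction lists are invented for illustration and are not the study's data.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Linear correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical hospital charges and two hypothetical sets of predictions.
actual = [10000.0, 12000.0, 15000.0, 20000.0]
model_a = [10500.0, 11800.0, 15400.0, 19500.0]
model_b = [12000.0, 10000.0, 17500.0, 17000.0]

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
r_a = pearson_r(actual, model_a)
r_b = pearson_r(actual, model_b)
```

A model is preferred under these criteria when its error is lower and its correlation higher, which is how the abstract ranks the ANN above the decision tree.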
Surucu, Murat; Shah, Karan K; Mescioglu, Ibrahim; Roeske, John C; Small, William; Choi, Mehee; Emami, Bahman
To develop decision trees predicting for tumor volume reduction in patients with head and neck (H&N) cancer using pretreatment clinical and pathological parameters. Forty-eight patients treated with definitive concurrent chemoradiotherapy for squamous cell carcinoma of the nasopharynx, oropharynx, oral cavity, or hypopharynx were retrospectively analyzed. These patients were rescanned at a median dose of 37.8 Gy and replanned to account for anatomical changes. The percentages of gross tumor volume (GTV) change from initial to rescan computed tomography (CT; %GTVΔ) were calculated. Two decision trees were generated to correlate %GTVΔ in primary and nodal volumes with 14 characteristics including age, gender, Karnofsky performance status (KPS), site, human papilloma virus (HPV) status, tumor grade, primary tumor growth pattern (endophytic/exophytic), tumor/nodal/group stages, chemotherapy regimen, and primary, nodal, and total GTV volumes in the initial CT scan. The C4.5 Decision Tree induction algorithm was implemented. The median %GTVΔ for primary, nodal, and total GTVs was 26.8%, 43.0%, and 31.2%, respectively. Type of chemotherapy, age, primary tumor growth pattern, site, KPS, and HPV status were the most predictive parameters for primary %GTVΔ decision tree, whereas for nodal %GTVΔ, KPS, site, age, primary tumor growth pattern, initial primary GTV, and total GTV volumes were predictive. Both decision trees had an accuracy of 88%. There can be significant changes in primary and nodal tumor volumes during the course of H&N chemoradiotherapy. Considering the proposed decision trees, radiation oncologists can select patients predicted to have high %GTVΔ, who would theoretically gain the most benefit from adaptive radiotherapy, in order to better use limited clinical resources. © The Author(s) 2015.
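The percentage volume change used as the prediction target can be written out as below. The sign convention (a positive value means shrinkage) and the example volumes are assumptions made for illustration; the abstract does not state them.

```python
def percent_gtv_change(initial_cc, rescan_cc):
    """Percentage reduction in gross tumor volume between the initial and rescan CT.

    Assumed convention: positive values mean the volume shrank.
    """
    if initial_cc <= 0:
        raise ValueError("initial volume must be positive")
    return (initial_cc - rescan_cc) / initial_cc * 100.0


# Hypothetical volumes in cubic centimetres.
example_change = percent_gtv_change(40.0, 30.0)
```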
Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi
The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared to existing methods. We also demonstrate an application of our method in a real clinical trial.
Bauernfeind, V.; Ding, Y.
An analytical vibration model of the primary system of a 1300 MW PWR was used for simulating mechanical faults. Deviations in the calculated power density spectra and coherence functions are determined and classified. The decision tree technique is then used for a personal computer supported knowledge presentation and for optimizing the logical relationships between the simulated faults and the observed symptoms. The optimized decision tree forms the knowledge base and can be used to diagnose known cases as well as to include new data into the knowledge base if new faults occur. (author)
Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak
Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main problem faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.
Roeloffzen, C.G.H.; Horst, F.; Offrein, B.J.; Germann, R.; Bona, G.L.; Salemink, H.W.M.; de Ridder, R.M.
A tunable, flat-passband, 1-from-16 add/drop multiplexer for wavelength-division-multiplexing networks is presented. The device is realized in high-index-contrast silicon-oxynitride waveguide technology and is based on cascaded resonant coupler filters in the form of a mirrored binary tree.
Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias
This paper discusses feature selection using genetic algorithms (GA) on datasets for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem when the dataset features are selected using GA. The performance of both models is then compared to see whether accuracy increases. The results show an increase in accuracy when features are selected using GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning repository.
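A minimal sketch of genetic-algorithm feature selection is given below, assuming a bit mask per candidate (1 = keep the feature). The `fitness` function is a hypothetical stand-in for the cross-validated accuracy of a decision tree or Naive Bayes model; the elitism, one-point crossover and bit-flip mutation settings are illustrative choices, not the paper's configuration.

```python
import random


def fitness(mask, useful=(0, 2, 5), penalty=0.05):
    """Hypothetical stand-in for classifier accuracy on the masked features.

    Rewards selecting the (pretend) informative features and penalises extras.
    """
    hits = sum(1 for i in useful if mask[i])
    extras = sum(mask) - hits
    return hits / len(useful) - penalty * extras


def select_features(n_features=8, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve feature masks and return the fittest one found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]            # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)         # pick two distinct parents
            cut = rng.randrange(1, n_features)  # one-point crossover position
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


best_mask = select_features()
```

In a GADT- or GANB-style setup, `fitness` would train the chosen classifier on the selected columns and return its validation accuracy; the evolutionary loop itself would stay the same.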
Kim, Jaekwon; Lee, Jongsik; Lee, Youngho
The importance of the prediction of coronary heart disease (CHD) has been recognized in Korea; however, few studies have been conducted in this area. Therefore, it is necessary to develop a method for the prediction and classification of CHD in Koreans. A model for CHD prediction must be designed according to rule-based guidelines. In this study, a fuzzy logic and decision tree (classification and regression tree [CART])-driven CHD prediction model was developed for Koreans. Datasets derived from the Korean National Health and Nutrition Examination Survey VI (KNHANES-VI) were utilized to generate the proposed model. The rules were generated using a decision tree technique, and fuzzy logic was applied to overcome problems associated with uncertainty in CHD prediction. The accuracy and receiver operating characteristic (ROC) curve values of the proposed system were 69.51% and 0.594, proving that the proposed methods were more efficient than other models.
The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…
This paper presents a new tool for the study of relationships between the total path length or the average depth and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository and datasets representing Boolean functions with 10 variables.
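The quantities being traded off can be made concrete with a small sketch. It assumes the usual definitions: the total path length is the sum, over all rows of a decision table, of the number of tests made on that row, and the average depth is that sum divided by the number of rows. The tuple-based tree encoding and the two-variable Boolean function are illustrative choices, not the paper's tool.

```python
from itertools import product

# A tree is either a leaf (an int label) or a tuple
# (variable_index, subtree_if_0, subtree_if_1).


def walk(tree, row):
    """Follow the tests for one row; return (predicted_label, tests_made)."""
    depth = 0
    while isinstance(tree, tuple):
        var, if_zero, if_one = tree
        tree = if_one if row[var] else if_zero
        depth += 1
    return tree, depth


def tree_costs(tree, table):
    """Return (total_path_length, average_depth, misclassifications)."""
    total = errors = 0
    for row, label in table:
        predicted, depth = walk(tree, row)
        total += depth
        errors += predicted != label
    return total, total / len(table), errors


# Decision table for the Boolean function x0 AND x1.
table = [((a, b), a & b) for a, b in product((0, 1), repeat=2)]

exact_tree = (0, 0, (1, 0, 1))  # tests x0, then x1 when x0 = 1
stump = (0, 0, 1)               # tests x0 only and predicts its value

costs_exact = tree_costs(exact_tree, table)
costs_stump = tree_costs(stump, table)
```

Here the exact tree costs (6, 1.5, 0) and the stump (4, 1.0, 1): the shallower tree saves path length at the price of one misclassified row, which is the kind of relationship the tool is meant to explore.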
The aim of the paper is to present the characteristics of certain dynamic programming strategies on the decision tree hidden behind the optimizing problems and thus to offer such a clear tool for their study and classification which can help in the comprehension of the essence of this programming technique.
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
Sharon M. Hermann; John S. Kush; John C. Gilbert
We created a decision tree based on silvics of longleaf pine (Pinus palustris) and historical descriptions to develop approaches for restoration management at Horseshoe Bend National Military Park located in central Alabama. A National Park Service goal is to promote structure and composition of a forest that likely surrounded the 1814 battlefield....
Kamphuis, C.; Mollenhorst, H.; Feelders, A.; Pietersma, D.; Hogeveen, H.
This study explored the potential of using decision-tree induction to develop models for the detection of clinical mastitis with automatic milking. Sensor data (including electrical conductivity and colour) of over 711,000 quarter milkings were collected from December 2006 till
Metting, Esther I; In 't Veen, Johannes C C M; Dekhuijzen, P N Richard; van Heijst, Ellen; Kocks, Janwillem W H; Muilwijk-Kroes, Jacqueline B; Chavannes, Niels H; van der Molen, Thys
The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an
Rey S. Ofren; Edward Harvey
A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...
Liu, Leo; Bak, Claus Leth; Chen, Zhe
With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...
This paper presents a new tool for the study of relationships between total path length or average depth and number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository. © Springer-Verlag Berlin Heidelberg 2014.
Purpose: The purpose of this paper is to propose a new method to find the appropriate leadership styles based on the followers' preferences using the decision tree technique. Design/methodology/approach: Statistical population includes the students of the University of Isfahan. In total, 750 questionnaires were distributed; out of which, 680…
Lee, So Mi; Cheon, Jung Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun Hye; Kim, In One; You, Sun Kyoung [Dept. of Radiology, Seoul National University College of Medicine, Seoul (Korea, Republic of)
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
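The reported diagnostic performance follows directly from the counts given in the abstract (46 of 46 BA cases detected, 51 of 54 non-BA cases correctly excluded). The sketch below reproduces that arithmetic; the function name is hypothetical.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts implied by the abstract: 46/46 BA detected, 51/54 non-BA excluded.
sens, spec, acc = screening_metrics(tp=46, fn=0, tn=51, fp=3)
```

With these counts the three values come out as 100%, about 94.4% and 97%, matching the figures quoted above.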
Tayefi, Maryam; Esmaeili, Habibollah; Saberi Karimian, Maryam; Amirabadi Zadeh, Alireza; Ebrahimi, Mahmoud; Safarian, Mohammad; Nematy, Mohsen; Parizadeh, Seyed Mohammad Reza; Ferns, Gordon A; Ghayour-Mobarhan, Majid
Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to establish the factors associated with hypertension by using a decision-tree algorithm as a supervised classification method of data mining. Data from a cross-sectional study were used. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset for constructing the decision tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of the decision tree. Two models were evaluated. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables; in model II, age, gender, WBC, RBC, HGB, HCT, MCV, MCH, PLT, RDW and PDW were considered as input variables. The model was validated by constructing a receiver operating characteristic (ROC) curve. The prevalence of hypertension was 32% in our population. For the decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. We have developed a decision tree model to identify the risk factors associated with hypertension that may be used to develop programs for hypertension management. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
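The area under the ROC curve reported for both models can be computed without drawing the curve, using its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels and scores below are invented for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = hypertensive) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

In this toy case 10 of the 12 positive/negative pairs are ordered correctly. The pairwise form is quadratic in the sample size, so for a test set of a few thousand cases a sort-based implementation would be the more practical choice.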
Mohammadzadeh, F; Noorkojuri, H; Pourhoseingholi, M A; Saadat, S; Baghestani, A R
Gastric cancer is the fourth most common cancer worldwide, which motivated us to investigate gastric cancer risk factors using statistical methods. The aim of this study was to identify the most important factors influencing the mortality of patients with gastric cancer and to introduce a decision tree classification approach for predicting the probability of mortality from this disease. Data on 216 patients with gastric cancer who were registered in Taleghani hospital in Tehran, Iran, were analyzed. First, patients were divided into two groups: dead and alive. Then, to fit the decision tree model, we randomly selected 20% of the dataset as the test sample and used the remaining data as the training sample. Finally, the validity of the model was examined with sensitivity, specificity, diagnostic accuracy and the area under the receiver operating characteristic curve. CART version 6.0 and SPSS version 19.0 were used for the analysis. Diabetes, ethnicity, tobacco, tumor size, surgery, pathologic stage, age at diagnosis, exposure to chemical weapons and alcohol consumption were identified as factors affecting mortality from gastric cancer. The sensitivity, specificity and accuracy of the decision tree were 0.72, 0.75 and 0.74, respectively. These indices indicate that the decision tree model has acceptable accuracy for predicting the probability of mortality in gastric cancer patients. A simple decision tree built from the factors affecting mortality from gastric cancer may therefore help clinicians as a reliable and practical tool to predict the probability of mortality in these patients.
Tjortjis, C; Saraee, M; Theodoulidis, B; Keane, J A
Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered by using data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well established data mining methods and identify strengths and weaknesses. We propose T3, a decision tree classifier which builds predictive models based on known classes, by allowing for a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets, in order to identify strengths and weaknesses. We also compare performance with a very successful and well established decision tree classifier. T3 demonstrated impressive performance when predicting unseen cases of stroke resulting in as little as 0.4% classification error while the state of the art decision tree classifier resulted in 33.6% classification error respectively. This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three, and results in high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 based on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produces significantly more accurate and readable classifiers.
Wong G William
Background: Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology, where many of the standard methods cannot be applied directly. Here, we propose a novel machine learning framework in which decision tree based classifier ensembles, coupled with feature selection methods, are applied to proteomics data generated from premalignant pancreatic cancer. Results: This study explores the utility of three different feature selection schemas (Student t test, Wilcoxon rank sum test and genetic algorithm) to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected from each method, we compared the prediction performance of a single decision tree algorithm, C4.5, with six different decision-tree based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that ensemble classifiers always outperform a single decision tree classifier, with greater accuracies and smaller prediction errors, when applied to a pancreatic cancer proteomics dataset. Conclusion: In our cross validation framework, classifier ensembles generally have better classification accuracies than a single decision tree when applied to a pancreatic cancer proteomic dataset, suggesting their utility in future proteomics data analysis. Additionally, the use of feature selection
Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh
Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques, namely decision tree and support vector machine (SVM), to predict the 7-year incidence of metabolic syndrome. This is an applied study in which data from 2107 participants of the Isfahan Cohort Study were used. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the decision tree and SVM methods were used to predict it. Sensitivity, specificity and accuracy were used for validation. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome. According
Lorenzo, Daniel; Ochoa, María; Piulats, Josep Maria; Gutiérrez, Cristina; Arias, Luis; Català, Jaum; Grau, María; Peñafiel, Judith; Cobos, Estefanía; Garcia-Bru, Pere; Rubio, Marcos Javier; Padrón-Pérez, Noel; Dias, Bruno; Pera, Joan; Caminal, Josep Maria
The purpose of this study was to demonstrate the existence of a bimodal survival pattern in metastatic uveal melanoma. Secondary aims were to identify the characteristics and prognostic factors associated with long-term survival and to develop a clinical decision tree. The medical records of 99 metastatic uveal melanoma patients were retrospectively reviewed. Patients were classified as either short (≤ 12 months) or long-term survivors (> 12 months) based on a graphical interpretation of the survival curve after diagnosis of the first metastatic lesion. Ophthalmic and oncological characteristics were assessed in both groups. Of the 99 patients, 62 (62.6%) were classified as short-term survivors, and 37 (37.4%) as long-term survivors. The multivariate analysis identified the following predictors of long-term survival: age ≤ 65 years (p=0.012) and unaltered serum lactate dehydrogenase levels (p=0.018); additionally, the size (smaller vs. larger) of the largest liver metastasis showed a trend towards significance (p=0.063). Based on the variables significantly associated with long-term survival, we developed a decision tree to facilitate clinical decision-making. The findings of this study demonstrate the existence of a bimodal survival pattern in patients with metastatic uveal melanoma. The presence of certain clinical characteristics at diagnosis of distant disease is associated with long-term survival. A decision tree was developed to facilitate clinical decision-making and to counsel patients about the expected course of disease.
To improve the quality of customer service, especially the feasibility assessment of borrowers given the increasing number of prospective new borrowers seeking loans to finance the purchase of a motor vehicle, the company needs a decision-making tool that can easily and quickly estimate whether a debtor is able to pay off the loan. This study discusses the process of generating a decision tree with the C4.5 algorithm, using a learning dataset of motorcycle financing debtors. The decision tree is then interpreted as a set of decision rules that can be understood and used as a reference when processing borrower data to determine the feasibility of prospective new borrowers. The feasibility value refers to the value of the target parameter, credit status. If the predicted credit status is "paid off", the prospective borrower is estimated to be able to repay the loan in question; if the predicted credit status is "pull", the prospective debtor is estimated to be unable to repay the loan. System testing was done by comparing the results on the testing data with the learning data in three scenarios, and the decisions were valid in over 70% of cases for all scenarios. Moreover, generating the tree and the rules is fairly quick, taking no more than 15 minutes for each test scenario.
Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun
Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy with the help of improving technology, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample, with dimensions ranging from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument etc.) and the incident electron beam (accelerating voltage, beam current, spot size etc.) control the produced characteristic X-rays, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions, and thereby produce a classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated and a preliminary set of 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, has been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using
Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael
The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances. © 2015 Society for Conservation Biology.
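The split criteria named in the abstract above (information gain and gain ratio) can be illustrated with a minimal Python sketch. This is not the authors' implementation; the function names and the "dispersed"/"stayed" labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of the partitions."""
    n = len(parent)
    remainder = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - remainder


def gain_ratio(parent, partitions):
    """Information gain normalised by the split information of the partition sizes."""
    n = len(parent)
    split_info = -sum(
        len(p) / n * math.log2(len(p) / n) for p in partitions if p
    )
    if split_info <= 0.0:
        return 0.0
    return information_gain(parent, partitions) / split_info


# Hypothetical outcome labels: a split that separates the two classes perfectly.
parent = ["dispersed", "dispersed", "stayed", "stayed"]
partitions = [["dispersed", "dispersed"], ["stayed", "stayed"]]
gain = information_gain(parent, partitions)
ratio = gain_ratio(parent, partitions)
```

A tree learner would evaluate such a score for every candidate attribute and split on the one with the highest value; gain ratio penalises attributes that fragment the data into many small partitions.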
Charlot, Philippe; Marimoutou, Vêlayoudom
This study examines the volatility and correlation and their relationships among the euro/US dollar exchange rates, the S&P 500 equity indices, and the prices of WTI crude oil and the precious metals (gold, silver, and platinum) over the period 2005 to 2012. Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. The ensuing Hidden Markov Decision Tree (HMDT) model is in fact an extension of the Hidden Markov Model (HMM) introduced by Jordan et al. (1997). The architecture of this model is the opposite of that of the classical deterministic approach based on a binary decision tree, and it allows a probabilistic vision of the relationship between univariate volatility and correlation. Our results are categorized into three groups, namely (1) exchange rates and oil, (2) S&P 500 indices, and (3) precious metals. A switching dynamics is seen to characterize the volatilities, while, in the case of the correlations, the series switch from one regime to another, this movement touching a peak during the period of the Subprime crisis in the US, and again during the days following the Tohoku earthquake in Japan. Our findings show that the relationships between volatility and correlation are dependent upon the nature of the series considered, sometimes corresponding to those found in econometric studies, according to which correlation increases in bear markets, at other times differing from them. - Highlights: • This study examines the volatility and correlation and their relationships of precious metals and crude oil. • Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. • This model allows a probabilistic point of view of the relationship between univariate volatility and correlation. • Results show the relationships between volatility and correlation are dependent upon the nature of the series considered
For the sustainable use of groundwater, this study analyzed groundwater productivity-potential using a decision-tree approach in a geographic information system (GIS) in Boryeong and Pohang cities, Korea. The model was based on the relationship between groundwater-productivity data, including specific capacity (SPC), and its related hydrogeological factors. SPC data, which are measured and calculated for groundwater productivity, and data about related factors, including topography, lineament, geology, forest and soil data, were collected and input into a spatial database. A decision-tree model was applied and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The resulting groundwater-productivity-potential (GPP) maps were validated using area-under-the-curve (AUC) analysis with the well data that had not been used for training the model. In Boryeong city, the CHAID and QUEST algorithms had accuracies of 83.31% and 79.47%, and in Pohang city, the CHAID and QUEST algorithms had accuracies of 86.18% and 80.00%. As another validation, the GPP maps were validated by comparison with the actual SPC data. As a result, in Boryeong city, the CHAID and QUEST algorithms had accuracies of 96.55% and 94.92%, and in Pohang city, the CHAID and QUEST algorithms had accuracies of 87.88% and 87.50%. These results indicate that decision-tree models can be useful for development of groundwater resources.
image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement is carried out by the open source software “R”; the generation of the dense and accurate digital surface model by the “Match-T DSM” program of the Trimble Company. A practical...... like buildings, roads, grassland, trees, hedges, and walls from such an ‘intelligent’ point cloud. The decision tree is derived from training areas which borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using...
Shin, Hoyoung; Jae, Moosung
In this study, a probabilistic assessment of the severe accident management strategy through a filtered containment venting system was performed by using decision tree models. In Korea, the filtered containment venting system has been installed for the first time in Wolsong unit 1 as a part of Fukushima follow-up steps, and it is planned to be applied gradually to all the remaining reactors. The filtered containment venting system, one of the severe accident countermeasures, prevents a gradual pressurization of the containment building by exhausting noncondensable gas and vapor to the outside of the containment building. Containment failure frequencies of each decision were evaluated by the developed decision tree model. The optimum accident management strategies were evaluated by comparing the results. Various strategies in severe accident management guidelines (SAMG) could be improved by utilizing the methodology in this study and the offsite risk analysis methodology.
Verbakel, Jan Y; Lemiengre, Marieke B; De Burghgraeve, Tine; De Sutter, An; Aertgeerts, Bert; Bullens, Dominique M A; Shinkins, Bethany; Van den Bruel, Ann; Buntinx, Frank
Acute infection is the most common presentation of children in primary care with only few having a serious infection (eg, sepsis, meningitis, pneumonia). To avoid complications or death, early recognition and adequate referral are essential. Clinical prediction rules have the potential to improve diagnostic decision-making for rare but serious conditions. In this study, we aimed to validate a recently developed decision tree in a new but similar population. Diagnostic accuracy study validating a clinical prediction rule. Acutely ill children presenting to ambulatory care in Flanders, Belgium, consisting of general practice and paediatric assessment in outpatient clinics or the emergency department. Physicians were asked to score the decision tree in every child. The outcome of interest was hospital admission for at least 24 h with a serious infection within 5 days after initial presentation. We report the diagnostic accuracy of the decision tree in sensitivity, specificity, likelihood ratios and predictive values. In total, 8962 acute illness episodes were included, of which 283 led to admission to hospital with a serious infection. Sensitivity of the decision tree was 100% (95% CI 71.5% to 100%) at a specificity of 83.6% (95% CI 82.3% to 84.9%) in the general practitioner setting with 17% of children testing positive. In the paediatric outpatient and emergency department setting, sensitivities were below 92%, with specificities below 44.8%. In an independent validation cohort, this clinical prediction rule has been shown to be extremely sensitive in identifying children at risk of hospital admission for a serious infection in general practice, making it suitable for ruling out. NCT02024282. Published by the BMJ Publishing Group Limited.
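The likelihood ratios mentioned in the abstract above follow directly from sensitivity and specificity. The sketch below is illustrative only: the function name is hypothetical, and the inputs are the general practitioner point estimates quoted above (sensitivity 100%, specificity 83.6%), not a reanalysis of the study data.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    # Guard the degenerate cases to avoid division by zero.
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    return lr_pos, lr_neg


# Point estimates reported for the general practitioner setting.
lr_pos, lr_neg = likelihood_ratios(1.00, 0.836)
```

With a sensitivity of 100% the negative likelihood ratio is zero, which is the arithmetic behind describing the rule as suitable for ruling out; the wide confidence interval on sensitivity reported above still applies.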
The popularity and usage of digital games has increased in recent years, bringing further attention to their design. Some digital games require a significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.
Ma, Yu-gang; Bi, Yu-xue; Yan, Hong; Deng, Li-na; Liang, Wei-feng; Wang, Bei; Zhang, Xue-li
To study the application of decision trees in research on anemia among rural children. In the Enterprise Miner module of SAS 8.2, 3000 observations were sampled from the database and a decision tree model was built. The model used a CART decision tree based on the Gini impurity index. The misclassification rate of the decision tree model was 21.2% on the training set and 21.9% on the validation set. The root ASE of the decision tree model was 0.399 on the training set and 0.404 on the validation set. The area under the ROC curve was larger than the reference line. The diagnostic chart showed that the corresponding percentage was higher than the other. The decision tree model selected 9 important factors and ranked them by their power, among which maternal anemia (1.00) was the most important factor. Others were children's age (0.75), time of ablactation (0.53), mother's age (0.32), the time of egg supplementation (0.26), category of the project county (0.26), the time of milk supplementation (0.16), number of people in the family (0.13), and the education status of the mother (0.12). The decision tree produced simple and easy rules that might be used to classify and predict in the same research. The decision tree could screen out the important factors of anemia and identify the cut-points for factors. With its wide application, the decision tree would exhibit important application value in research on rural children's health care.
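The Gini impurity index that CART uses to choose splits can be sketched in a few lines of Python. This is a generic illustration, not the SAS Enterprise Miner procedure used in the study; the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split; lower is better."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


# A balanced two-class node has impurity 0.5; a pure node has impurity 0.
balanced = gini_impurity([0, 0, 1, 1])
pure = gini_impurity([1, 1, 1])
perfect_split = split_gini([0, 0], [1, 1])
```

At each node, CART evaluates candidate cut-points on every factor and keeps the split with the lowest weighted impurity, which is how cut-points for factors such as age are identified.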
Holm Atkins, Tara E; Öhman, Malin C; Brabrand, Mikkel
INTRODUCTION: Early warning scores (EWS) have been developed to identify the degree of illness severity among acutely ill patients. One system, The Laboratory Decision Tree Early Warning Score (LDT-EWS) is wholly laboratory data based. Laboratory data was used in the development of a rare...... computerized method, developing a decision tree analysis. This article externally validates LDT-EWS, which is obligatory for an EWS before clinical use. METHOD: We conducted a retrospective review of prospectively collected data based on a time limited sample of all patients admitted through the medical...... with a goodness-of-fit test of X2=5.37 (7 degrees of freedom), p=0.62. CONCLUSION: LDT-EWS has acceptable ability to identify patients at high risk of dying during hospitalization with good precision. Further studies performing impact analysis are required before this score should be implemented in clinical...
Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.
Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circumArctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.
A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.
In this thesis, I studied sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model in order to apply state-of-the-art methods. I investigated the applicability of Monte Carlo Tree Search methods to this problem, and to other problems that are single-player, stochastic and continuous sequential decision making problems. In doing so, I obtained a consistent and anytime algorithm that can easily be combined with existing strong heuristic solvers. (author)
Oral, L. O.; Tecim, V.
Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision tree techniques by considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey with the CRISP-DM knowledge process model.
Baumont, G.; Menage, F.; Schneiter, J.R.; Spurgin, A.; Vogel, A
In the framework of the level 2 Probabilistic Safety Study (PSA 2) project, the Institute for Nuclear Safety and Protection (IPSN) has developed a method for taking into account Human and Organizational Reliability Aspects during accident management. Actions are taken during very degraded installation operations by teams of experts in the French framework of Crisis Organization (ONC). After describing the background of the framework of the Level 2 PSA, the French specific Crisis Organization and the characteristics of human actions in the Accident Progression Event Tree, this paper describes the method developed to introduce in PSA the Human and Organizational Reliability Analysis in Accident Management (HORAAM). This method is based on the Decision Tree method and has gone through a number of steps in its development. The first one was the observation of crisis center exercises, in order to identify the main influence factors (IFs) which affect human and organizational reliability. These IFs were used as headings in the Decision Tree method. Expert judgment was used in order to verify the IFs, to rank them, and to estimate the value of the aggregated factors to simplify the quantification of the tree. A tool based on Mathematica was developed to increase the flexibility and the efficiency of the study.
Deptuła, A.; Partyka, M. A.
The article presents an innovative use of an inductive algorithm for generating a decision tree to analyse the rank validity of the construction and maintenance parameters of a gear pump with undercut teeth. It presents an alternative to existing discrete optimization methods for generating sets of decisions and determining the hierarchy of decision variables.
Thanh G. Phan
Background: Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluate a clinical model and then the added value of MRI data. Method: The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003–2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC curve (auROC) to determine the accuracy of the model. There were 41 (63.7% males) patients having MRI imaging, with an average age of 51.5 ± 18.9 years. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82–1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86–1.00). Conclusion: Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.
Almeida, Leandro S.; Gomes, Cristiano Mauro Assis
Predictive studies have been widely undertaken in the field of education to provide strategic information about the extensive set of processes related to teaching and learning, as well as about what variables predict certain educational outcomes, such as academic achievement or dropout. As in any other area, there is a set of standard techniques that is usually used in predictive studies in the field of education. Even though the Decision Tree Method is a well-known and standard approach in Data...
Frida Seyedmir; Kamal Mirzaie; Morteza Bitaraf Sani
Abstract Introduction: Decision trees are data mining tools for collecting, accurately predicting and sifting information from massive amounts of data, and are used widely in the field of computational biology and bioinformatics. In bioinformatics, they can be used to predict diseases, including breast cancer. The use of genomic data including single nucleotide polymorphisms is a very important factor in predicting the risk of diseases. The number of seven important SNP among hundreds of thousan...
Gligorov, V. V.; Williams, M.
High-level triggering is a vital component of many modern particle physics experiments. This paper describes a modification to the standard boosted decision tree (BDT) classifier, the so-called bonsai BDT, that has the following important properties: it is more efficient than traditional cut-based approaches; it is robust against detector instabilities, and it is very fast. Thus, it is fit-for-purpose for the online running conditions faced by any large-scale data acquisition system.
Razavi, Amir R; Gill, Hans; Åhlfeldt, Hans; Shahsavar, Nosrat
Background: The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance are crucial in maintaining the quality of care in Oncology. Methods: Analysis of 759 patients with malignant breast cancer using decision tree induction (DTI) found patterns of non-compliance with the guideline. The PMRT guideline was us...
Sasser, Howell; Nussbaum, Marcy; Beuhler, Michael; Ford, Marsha
Identification of predictors of potential mass poisonings may increase the speed and accuracy with which patients are recognized, potentially reducing the number ultimately exposed and the degree to which they are affected. This analysis used a decision-tree method to sort such potential predictors. Data from the Toxic Exposure Surveillance System were used to select cyanide and botulism cases from 1993 to 2005 for analysis. Cases of other poisonings from a single poison center were used as controls. After duplication was omitted and removal of cases from the control sample was completed, there remained 1,122 cyanide cases, 262 botulism cases, and 70,804 controls available for both analyses. Classification trees for each poisoning type were constructed, using 131 standardized clinical effects. These decision rules were compared with the current case surveillance definitions of one active poison center and the American Association of Poison Control Centers (AAPCC). The botulism analysis produced a 4-item decision rule with sensitivity (Se) of 68% and specificity (Sp) of 90%. Use of the single poison center and AAPCC definitions produced Se of 19.5% and 16.8%, and Sp of 99.5% and 83.2%, respectively. The cyanide analysis produced a 9-item decision rule with Se of 74% and Sp of 77%. The single poison center and AAPCC case definitions produced Se of 10.2% and 8.6%, and Sp of 99.8% and 99.8%, respectively. These results suggest the possibility of improved poisoning case surveillance sensitivity using classification trees. This method produced substantially higher sensitivities, but not specificities, for both cyanide and botulism. Despite limitations, these results show the potential of a classification-tree approach in the detection of poisoning events.
Hellmuth, Marc; Stadler, Peter F; Wieseke, Nicolas
The concepts of orthology, paralogy, and xenology play a key role in molecular evolution. Orthology and paralogy distinguish whether a pair of genes originated by speciation or duplication. The corresponding binary relations on a set of genes form complementary cographs. Allowing more than two types of ancestral events leads to symmetric symbolic ultrametrics. Horizontal gene transfer, which leads to xenologous gene pairs, however, is inherently asymmetric since one offspring copy "jumps" into another genome, while the other continues to be inherited vertically. We therefore explore here the mathematical structure of the non-symmetric generalization of symbolic ultrametrics. Our main results tie non-symmetric ultrametrics together with di-cographs (the directed generalization of cographs), so-called uniformly non-prime ([Formula: see text]) 2-structures, and hierarchical structures on the set of strong modules. This yields a characterization of relation structures that can be explained in terms of trees and types of ancestral events. This framework accommodates a horizontal-transfer relation in terms of an ancestral event and thus is slightly different from the most commonly used definition of xenology. As a first step towards practical use, we present a simple polynomial-time recognition algorithm for [Formula: see text] 2-structures and investigate the computational complexity of several types of editing problems for [Formula: see text] 2-structures. We show, finally, that these NP-complete problems can be solved exactly as Integer Linear Programs.
Riggs, George A.; Hall, Dorothy K.
Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.
Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica
The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablet types (hydrophilic/lipid) using formulation composition, compression force used for tableting, and tablet porosity and tensile strength as input data. Potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively, whereas the selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by the direct compression method and tested for in vitro dissolution profiles. Optimization of the static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or a genetic algorithms optimizer. Decision trees were constructed following discretization of data. Calculated difference (f(1)) and similarity (f(2)) factors for predicted and experimentally obtained dissolution profiles of test matrix tablet formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablet dissolution profiles. Elman neural networks were compared to the most frequently used static network, the multi-layered perceptron, and the superiority of Elman networks has been demonstrated. The developed methods allow a simple, yet very precise, way of predicting drug release for both hydrophilic and lipid matrix tablets having controlled drug release. Copyright © 2012 Elsevier B.V. All rights reserved.
Drakakis, Georgios; Moledina, Saadiq; Chomenidis, Charalampos; Doganis, Philip; Sarimveis, Haralambos
Decision trees are renowned in the computational chemistry and machine learning communities for their interpretability. Their capacity and usage are somewhat limited by the fact that they normally work on categorical data. Improvements to known decision tree algorithms are usually carried out by increasing and tweaking parameters, as well as the post-processing of the class assignment. In this work we attempted to tackle both these issues. Firstly, conditional mutual information was used as the criterion for selecting the attribute on which to split instances. The algorithm performance was compared with the results of C4.5 (WEKA's J48) using default parameters and no restrictions. Two datasets were used for this purpose, DrugBank compounds for HRH1 binding prediction and Traditional Chinese Medicine formulation predicted bioactivities for therapeutic class annotation. Secondly, an automated binning method for continuous data was evaluated, namely Scott's normal reference rule, in order to allow any decision tree to easily handle continuous data. This was applied to all approved drugs in DrugBank for predicting the RDKit SLogP property, using the remaining RDKit physicochemical attributes as input.
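Scott's normal reference rule, named in the abstract above, sets a histogram bin width of roughly 3.49 times the sample standard deviation times n to the power of minus one third. The sketch below shows how such a width could turn a continuous attribute into bin indices for a categorical decision tree; the function names and sample values are illustrative assumptions, not the authors' implementation.

```python
import statistics

def scott_bin_width(values):
    """Bin width from Scott's normal reference rule: 3.49 * s * n^(-1/3)."""
    n = len(values)
    return 3.49 * statistics.stdev(values) * n ** (-1.0 / 3.0)

def discretize(values):
    """Map each continuous value to an integer bin index, counted from the minimum."""
    width = scott_bin_width(values)
    if width == 0:
        # All values identical: a single bin is enough.
        return [0 for _ in values]
    lowest = min(values)
    return [int((v - lowest) // width) for v in values]

# Hypothetical continuous attribute values.
sample = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(scott_bin_width(sample))
print(discretize(sample))
```

The resulting bin indices can be treated as category labels by any tree learner that expects categorical inputs.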
Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human-readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.
Recently, concern has been expressed about potential toxic effects of both radon emission and release of toxic elements in leachate from inactive uranium mill tailings piles. Remedial action may be required to meet disposal standards set by the states and the US Environmental Protection Agency (EPA). In some cases, a possible disposal option is the exhumation and reburial (either on site or at a new location) of tailings and reliance on engineered barriers to satisfy the objectives established for remedial actions. Liners under disposal pits are the major engineered barrier for preventing contaminant release to ground and surface water. The purpose of this report is to provide a logical sequence of action, in the form of a decision tree, which could be followed to show whether a selected tailings disposal design meets the objectives for subsurface contaminant release without a liner. This information can be used to determine the need and type of liner for sites exhibiting a potential groundwater problem. The decision tree is based on the capability of hydrologic and mass transport models to predict the movement of water and contaminants with time. The types of modeling capabilities and data needed for those models are described, and the steps required to predict water and contaminant movement are discussed. A demonstration of the decision tree procedure is given to aid the reader in evaluating the need for the adequacy of a liner
Background: Data mining is known as a process of discovering and analysing large amounts of data in order to find meaningful rules and trends. In healthcare, data mining offers numerous opportunities to study unknown patterns in a data set. These patterns can be used by physicians for the diagnosis, prognosis and treatment of patients. The main objective of this study was to predict the level of serum ferritin in women with anemia and to specify the basic predictive factors of iron deficiency anemia using data mining techniques. Methods: In this research, 690 patients and 22 variables were studied in a population of women with anemia. The data comprise 11 laboratory and 11 clinical variables for patients referred to the laboratories of Imam Hossein and Shohada-E-Haft Tir hospitals from April 2013 to April 2014. The decision tree technique was used to build the model. Results: The accuracy of the decision tree with all the variables is 75%. Different combinations of variables were examined in order to determine the best predictive model. In the optimum decision tree model obtained, RBC, MCH, MCHC, gastrointestinal cancer and gastrointestinal ulcer were identified as the most important predictive factors. The results indicate that if the values of the MCV, MCHC and MCH variables are normal and the value of the RBC variable is below the normal limit, the patient is diagnosed with iron deficiency anemia with a probability of 90%. Conclusion: Given the simplicity and low cost of the complete blood count examination, the decision tree model was considered for diagnosing iron deficiency anemia in patients. The impact of new factors such as gastrointestinal hemorrhoids, gastrointestinal surgeries, different gastrointestinal diseases and gastrointestinal ulcers is also considered in this paper, whereas previous studies were limited to assessing laboratory variables. The rules of the
Loukis Euripides N
Background: New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex, and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore, the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation, and junior clinicians are not adequately trained in this field. Therefore, efficient decision support systems would be very useful for supporting clinicians in making better heart sound diagnoses. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods: For the purposes of our experiment we used a collection of 84 heart sound signals, including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next, a total of 100 features were determined for every heart sound signal, and their relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and pruned decision tree classifiers was studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub-problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult-to-distinguish cases. Results: Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that
Huys, Quentin J M; Eshel, Neir; O'Nions, Elizabeth; Sheridan, Luke; Dayan, Peter; Roiser, Jonathan P
When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders.
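The pruning strategy described in the abstract above can be illustrated with a small look-ahead search that stops evaluating a sequence once it meets a large loss. This is only a sketch under stated assumptions: the transition table, the reward values and the `large_loss` threshold are hypothetical, and the authors' actual task and model are not reproduced here.

```python
def best_value(state, depth, transitions, large_loss=None):
    """Best summed reward reachable from `state` within `depth` moves.

    transitions maps a state to a list of (next_state, reward) pairs.
    If large_loss is set, any move whose reward is at or below it is not
    explored further, so the sequence is valued at that loss alone.
    """
    options = transitions.get(state, [])
    if depth == 0 or not options:
        return 0.0
    values = []
    for next_state, reward in options:
        if large_loss is not None and reward <= large_loss:
            # Pruned: evaluation is curtailed right after the large loss.
            values.append(float(reward))
        else:
            future = best_value(next_state, depth - 1, transitions, large_loss)
            values.append(reward + future)
    return max(values)

# Hypothetical task: the large loss hides the most rewarding path.
task = {
    "A": [("B", -70), ("C", -20)],
    "B": [("D", 140)],
    "C": [("D", 20)],
}
print(best_value("A", 2, task))                  # full evaluation
print(best_value("A", 2, task, large_loss=-70))  # with pruning
```

In this toy task, full evaluation finds a net gain of 70 through the large loss, while the pruned search settles for 0, which mirrors how such pruning can persist even when it is counterproductive.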
Liu, K E; Lo, C-L; Hu, Y-H
Due to the narrow therapeutic range and high drug-to-drug interactions (DDIs), improving the adequate use of warfarin for the elderly is crucial in clinical practice. This study examines whether the effectiveness of using warfarin among elderly inpatients can be improved when machine learning techniques and data from the laboratory information system are incorporated. Having employed 288 validated clinical cases in the DDI group and 89 cases in the non-DDI group, we evaluate the prediction performance of seven classification techniques, with and without an Adaptive Boosting (AdaBoost) algorithm. Measures including accuracy, sensitivity, specificity and area under the curve are used to evaluate model performance. Decision tree-based classifiers outperform other investigated classifiers in all evaluation measures. The classifiers supplemented with AdaBoost can generally improve the performance. In addition, weight, congestive heart failure, and gender are among the top three critical variables affecting prediction accuracy for the non-DDI group, while age, ALT, and warfarin doses are the most influential factors for the DDI group. Medical decision support systems incorporating decision tree-based approaches improve predicting performance and thus may serve as a supplementary tool in clinical practice. Information from laboratory tests and inpatients' history should not be ignored because related variables are shown to be decisive in our prediction models, especially when the DDIs exist.
Matusik, Stanisław; Laska-Mierzejewska, Teresa; Chrzanowska, Maria
The aim of this study was to assess the usefulness of the decision trees method as a research method of multidimensional associations between menarche and socioeconomic variables. The article is based on data collected from the rural area of Choszczno in the West Pomerania district of Poland between 1987 and 2001. Girls were asked about the appearance of first menstruation (a yes/no method). The average menarchal age was estimated by the probit analysis method, using second grade polynomials. The socioeconomic status of the girls' families was determined using five qualitative variables: fathers' and mothers' educational level, source of income, household appliances and the number of children in a family. For classification based on five socioeconomic variables, one of the most effective algorithms CART (Classification and Regression Trees) was used. In 2001 the menarchal age in 66% of examined girls was properly classified, while a higher efficiency of 70% was obtained for girls examined in 1987. The decision trees method enabled the definition of the hierarchy of socioeconomic variables influencing girls' biological development level. The strongest discriminatory power was attributed to the number of children in a family, and the mother's and then father's educational level. Using this method it is possible to detect differences in strength of socioeconomic variables associated with girls' pubescence before 1987 and after 2001 during the transformation of the economic and political systems in Poland. However, the decision trees method is infrequently applied in social sciences and constitutes a novelty; this article proves its usefulness in examining relations between biological processes and a population's living conditions.
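CART, the algorithm named in the abstract above, typically chooses splits that reduce Gini impurity. The sketch below computes the weighted impurity of a binary split on one categorical variable; the variable names, category values and labels are hypothetical and only illustrate the criterion, not the study's data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(feature_values, labels, value):
    """Weighted Gini impurity after splitting on feature == value vs. the rest."""
    left = [y for x, y in zip(feature_values, labels) if x == value]
    right = [y for x, y in zip(feature_values, labels) if x != value]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical data: number of children in the family vs. menarche status.
children = ["1-2", "1-2", "3+", "3+"]
menarche = ["yes", "yes", "no", "no"]
print(gini(menarche))
print(split_impurity(children, menarche, "1-2"))
```

A variable whose best split yields the largest drop in impurity ends up nearest the root, which is how a tree can express a hierarchy of variables.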
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models.
Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B
To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publicly available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (or AUC) of 0.612, 0.583, and 0.650, respectively, but achieve a lift of at least 1.5 for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency and interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions that are appropriate to the risk levels observed, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions.
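Lift, as reported in the abstract above, is commonly read as the readmission rate within a flagged subgroup divided by the rate in the whole population. The following sketch assumes that definition; the outcome lists are invented for illustration and do not come from the California data.

```python
def lift(subgroup_outcomes, all_outcomes):
    """Ratio of the event rate in a subgroup to the overall event rate.

    Outcomes are 1 (readmitted) or 0 (not readmitted).
    """
    subgroup_rate = sum(subgroup_outcomes) / len(subgroup_outcomes)
    overall_rate = sum(all_outcomes) / len(all_outcomes)
    return subgroup_rate / overall_rate

# Hypothetical outcomes: 2 of 10 patients readmitted overall,
# 2 of the 4 patients that a tree leaf flags as higher risk.
everyone = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
flagged_leaf = [1, 1, 0, 0]
print(lift(flagged_leaf, everyone))
```

A lift above 1 means the leaf concentrates readmissions relative to the baseline, even when the overall AUC of the tree is modest.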
Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh
The aim of this study was to examine a predictive model using features related to the diabetes type 2 risk factors. The data were obtained from a database in a diabetes control system in Tabriz, Iran. The data included all people referred for diabetes screening between 2009 and 2011. The features considered as "Inputs" were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). Moreover, we used diagnosis as "Class". We applied the "Decision Tree" technique and "J48" algorithm in the WEKA (3.6.10 version) software to develop the model. After data preprocessing and preparation, we used 22,398 records for data mining. The model precision to identify patients was 0.717. The age factor was placed in the root node of the tree as a result of higher information gain. The ROC curve indicates the model function in identification of patients and those individuals who are healthy. The curve indicates high capability of the model, especially in identification of the healthy persons. We developed a model using the decision tree for screening T2DM which did not require laboratory tests for T2DM diagnosis.
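The abstract above notes that age was placed at the root because of its higher information gain. The sketch below computes plain information gain for categorical attributes; it is an illustration with made-up records, and it does not reproduce WEKA's J48, which is generally described as using the related gain ratio.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical screening records.
diagnosis = ["diabetic", "diabetic", "healthy", "healthy"]
age_band = ["older", "older", "younger", "younger"]
sex = ["f", "m", "f", "m"]
print(information_gain(age_band, diagnosis))
print(information_gain(sex, diagnosis))
```

In this toy data the age band separates the classes perfectly, so it would be chosen for the root ahead of sex.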
Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo
Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
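The AUC values compared in the abstract above can be understood as the probability that a randomly chosen successful case receives a higher predicted score than a randomly chosen unsuccessful one. The sketch below computes AUC by that pairwise definition; the scores and labels are hypothetical and unrelated to the study's patients.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 (success) or 0 (failure); ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities of successful tube placement.
predicted = [0.9, 0.8, 0.3, 0.1]
observed = [1, 0, 1, 0]
print(auc(predicted, observed))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for values around 0.7.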
Background: Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods: The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results: The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion: The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8 km daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB, and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth's Radiant Energy System S'COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.
Jeng, Albert; Chang, Li-Chung; Chen, Sheng-Hui
Many protocols have been proposed for protecting the privacy and security of Radio Frequency Identification (RFID) systems. A number of these protocols are designed to protect the long-term security of an RFID system using symmetric key or public key cryptosystems. Others are designed to protect user anonymity and privacy. In practice, RFID technology is often used over a short lifespan, as in commodity checkout, supply chain management and so on. Furthermore, designing a long-term security architecture to protect the security and privacy of RFID tag information requires thorough consideration of many different aspects. However, any security enhancement to RFID technology raises its cost, which may be detrimental to its widespread deployment. Due to the severe constraints on RFID tag resources (e.g., power source, computing power, communication bandwidth) and the open-air communication nature of RFID usage, it is a great challenge to secure a typical RFID system. For example, computationally heavy public key and symmetric key cryptography algorithms (e.g., RSA and AES) may be unsuitable or excessive for protecting RFID security or privacy. These factors motivate us to research an efficient and cost-effective solution for RFID security and privacy protection. In this paper, we propose a new, effective, generic binary tree based key agreement protocol (called BKAP) and its variations, and show how it can be applied to secure low-cost, resource-constrained RFID systems. BKAP is not a general-purpose key agreement protocol; rather, it is a special-purpose protocol to protect privacy, un-traceability and anonymity in a single closed RFID system domain.
Oh, Hyo-Sook; Park, Hyeoun-Ae
This study was performed to develop and test a decision-tree model of treatment-seeking behaviors about when Korean patients visit a doctor after experiencing stroke symptoms. The study used methodological triangulation. The model was developed based on qualitative data collected from in-depth interviews with 18 stroke patients. The model was tested using quantitative data collected from interviews and a structured questionnaire involving 150 stroke patients. The predictability of the decision-tree model was quantified as the proportion of participants who followed the pathway predicted by the model. Decision outcomes of the model were categorized into immediate and delayed treatment-seeking behavior. The model was influenced by lowered consciousness, social-group influences, perceived seriousness of symptoms, past history of hypertension or stroke, and barriers to hospital visits. The predictability of the model was found to be 90.7%. The results from this study can help healthcare personnel understand the education needs of stroke patients regarding treatment-seeking behaviors, and hence aid in the development of educational strategies for stroke patients.
Jones, Vincent P; Brunner, Jay F; Grove, Gary G; Petit, Brad; Tangren, Gerald V; Jones, Wendy E
Integrated pest management (IPM) decision-making has become more information intensive in Washington State tree crops in response to changes in pesticide availability, the development of new control tactics (such as mating disruption) and the development of new information on pest and natural enemy biology. The time-sensitive nature of the information means that growers must have constant access to a single source of verified information to guide management decisions. The authors developed a decision support system for Washington tree fruit growers that integrates environmental data [140 Washington State University (WSU) stations plus weather forecasts from NOAA], model predictions (ten insects, four diseases and a horticultural model), management recommendations triggered by model status and a pesticide database that provides information on non-target impacts on other pests and natural enemies. A user survey in 2008 found that the user base was providing recommendations for most of the orchards and acreage in the state, and that users estimated the value at $16 million per year. The design of the system facilitates education on a range of time-sensitive topics and will make it easy to incorporate other models, new management recommendations or information from new sensors as they are developed.
A boosted decision tree is used to identify unique jets in a recently released conference note describing a search for long-lived particles decaying to hadrons in the ATLAS calorimeter. Neutral long-lived particles decaying to hadrons are "typical" signatures in many models, including Hidden Valley models, Higgs Portal models, Baryogenesis, Stealth SUSY, etc. Long-lived neutral particles that decay in the calorimeter leave behind an object that looks like a regular Standard Model jet, with subtle differences. For example, the later in the calorimeter the particle decays, the less energy will be deposited in the early layers of the calorimeter. Because the jet does not originate at the interaction point, it will likely be narrower as reconstructed by the standard Anti-kT jet reconstruction algorithm used by ATLAS. To separate the jets due to neutral long-lived decays from the Standard Model jets, we used a boosted decision tree with thirteen variables as inputs. We used the information from the boosted decision...
Rao, Ping; Chen, Shengbo; Sun, Ke
Soil salinity, caused by natural or human-induced processes, is not only a major cause of soil degradation but also a major environmental hazard all over the world. It has an increasing impact on crop yields and agricultural production in both dry and irrigated areas due to poor land and water management. Multi-temporal optical and microwave remote sensing can significantly contribute to detecting spatial-temporal changes of salt-related surface features. The study area is located in the west of Jilin Province, Northeast China, which is one of the most important saline-alkalized areas in the semi-arid and arid regions of North China. Decision tree classifiers are used to improve the classification of soil salinity on Landsat Thematic Mapper (TM) images from late autumn of 1996. The Kauth-Thomas (K-T) transformation was performed after TM image preprocessing, including image registration, mosaicking and resizing for the study area. Then the first component of the K-T transformation, TM 6 imagery (thermal infrared imagery), and NDVI (Normalized Difference Vegetation Index) from TM 4 and TM 3 images were each density-sliced to establish suitable feature classes of soil salinity as the decision nodes. Thus, the classification of soil salinity was improved using decision trees based on these feature classes. Compared with the conventional maximum likelihood classification, this method is more effective in distinguishing soil salinity from mixed residential and sand areas in the west of Jilin Province, China.
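The NDVI and density-slicing steps described in the abstract above can be sketched per pixel as follows. NDVI is computed from the near-infrared (TM 4) and red (TM 3) values with the standard formula; the breakpoints used for slicing are hypothetical placeholders, since the abstract does not state the thresholds used.

```python
import bisect

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero for empty pixels
    return (nir - red) / total

def density_slice(value, breakpoints):
    """Return the index of the interval that `value` falls into.

    breakpoints must be sorted in ascending order; index 0 is the
    interval below the first breakpoint.
    """
    return bisect.bisect_right(breakpoints, value)

# Hypothetical breakpoints separating three NDVI feature classes.
ndvi_breaks = [0.1, 0.3]
pixel_ndvi = ndvi(60, 20)
print(pixel_ndvi, density_slice(pixel_ndvi, ndvi_breaks))
```

Each sliced layer then provides a discrete feature class that a decision node can test, for example "NDVI class 0 and high thermal class".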
Birjandi, Mehdi; Ayatollahi, Seyyed Mohammad Taghi; Pourahmad, Saeedeh
Tree structured modeling is a data mining technique used to recursively partition a dataset into relatively homogeneous subgroups in order to make more accurate predictions on generated classes. One of the classification tree induction algorithms, GUIDE, is a nonparametric method with suitable accuracy and low bias selection, which is used for predicting binary classes based on many predictors. In this tree, evaluating the accuracy of predicted classes (terminal nodes) is clinically of special importance. For this purpose, we used GUIDE classification tree in two statuses of equal and unequal misclassification cost in order to predict nonalcoholic fatty liver disease (NAFLD), considering 30 predictors. Then, to evaluate the accuracy of predicted classes by using bootstrap method, first the classification reliability in which individuals are assigned to a unique class and next the prediction probability reliability as support for that are considered.
The purpose of this study was to identify risk factors affecting laryngeal pathology in the Korean population and to evaluate the derived prediction model. Cross-sectional study. Data were drawn from the 2008 Korea National Health and Nutritional Examination Survey. The subjects were 3135 persons (1508 male and 2114 female) aged 19 years and older living in the community. The independent variables were age, sex, occupation, smoking, alcohol drinking, and self-reported voice problems. A decision tree analysis was done to identify risk factors for predicting a model of laryngeal pathology. The significant risk factors of laryngeal pathology were age, gender, occupation, smoking, and self-reported voice problem in decision tree model. Four significant paths were identified in the decision tree model for the prediction of laryngeal pathology. Those identified as high risk groups for laryngeal pathology included those who self-reported a voice problem, those who were males in their 50s who did not recognize a voice problem, those who were not economically active males in their 40s, and male workers aged 19 and over and under 50 or 60 and over who currently smoked. The results of this study suggest that individual risk factors, such as age, sex, occupation, health behavior, and self-reported voice problem, affect the onset of laryngeal pathology in a complex manner. Based on the results of this study, early management of the high-risk groups is needed for the prevention of laryngeal pathology. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Firouzi, Farzad; Rashidi, Marjan; Hashemi, Sattar; Kangavari, Mohammadreza; Bahari, Ali; Daryani, Naser Ebrahimi; Emam, Mohammad Mehdi; Naderi, Nosratollah; Shalmani, Hamid Mohaghegh; Farnood, Alma; Zali, Mohammadreza
Decision tree classification is a standard machine learning technique that has been used for a wide range of applications. Patients with inflammatory bowel disease (IBD) are at increased risk of developing low bone mineral density (BMD). This study aimed at developing a new approach to select truly affected IBD patients who are indicated for densitometry, hence subjecting fewer patients to bone densitometry and reducing expenses. Simple decision trees were developed by means of the WEKA (Waikato Environment for Knowledge Analysis) package of machine learning algorithms to predict factors influencing bone density among IBD patients. The BMD status was the outcome variable, whereas age, sex, duration of disease, smoking status, corticosteroid use, oral contraceptive use, calcium or vitamin D supplementation, menstruation, milk abstinence, BMI, and levels of calcium, phosphorous, alkaline phosphatase, and 25-OH vitamin D were all attributes. Testing showed the decision trees to have sensitivities of 65.7-82.8%, specificities of 95.2-96.3%, accuracies of 86.2-89.8%, and Matthews correlation coefficients of 0.68-0.79. Smoking status was the most significant node (root) for the ulcerative colitis and IBD-associated trees, whereas calcium status was the root of the Crohn's disease patients' decision tree. IBD specialists could use such decision trees to substantially reduce the number of patients referred for bone densitometry and potentially save resources.
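The four performance measures quoted in the abstract above all derive from a 2x2 confusion matrix. A minimal sketch of the standard formulas follows; the counts in the usage example are invented for illustration and are not the study's data, and the function assumes both classes are present.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Assumes tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, purely to exercise the formulas.
    for name, value in binary_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews coefficient stays informative when the two classes are unbalanced, which is likely why it is reported alongside the other three.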
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with the C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest Neighbor). A repeated five-fold cross-validation method was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods.
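The repeated five-fold cross-validation mentioned above can be sketched as an index generator. This is a generic, unstratified version written for illustration; the abstract does not say how many repeats were used or whether the folds were stratified, so `repeats` and `seed` here are assumptions.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated k-fold CV.

    Each repeat reshuffles the sample indices and splits them into k
    disjoint folds; every fold serves as the test set exactly once.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            yield train, test


if __name__ == "__main__":
    for train, test in repeated_kfold(n_samples=10, k=5, repeats=1):
        print(sorted(test), "held out;", len(train), "used for training")
```

Averaging a classifier's accuracy over all `k * repeats` splits reduces the dependence of the estimate on any single random partition.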
Poulos, H M; Camp, A E
Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utility's (NU) transmission lines are a critical element of the nation's power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field using limited equipment to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics that differentiated potential risk trees (danger trees) from trees with a high probability of tree fall (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for future application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.
Jae, Moosung [Hanyang Univ., Seoul (Korea, Republic of). Dept. of Nuclear Engineering; Lee, Yongjin; Jerng, Dong Wook [Chung-Ang Univ., Seoul (Korea, Republic of). School of Energy Systems Engineering
Accident management strategies are defined as innovative actions taken by plant operators to prevent core damage or to maintain sound containment integrity. Such actions minimize the chance of offsite radioactive substance leaks that lead to and intensify core damage under power plant accident conditions. Accident management extends the concept of Defense in Depth against core meltdown accidents. In pressurized water reactors, emergency operating procedures are performed to extend the core cooling time. The effectiveness of Severe Accident Management Guidance (SAMG) has become an important issue. Severe accident management strategies are evaluated with a methodology utilizing the decision tree technique.
E. I. Kobysh
In this paper, a control system for a group of hot blast stoves was developed. It operates on the basis of the packing heating control subsystem and the mode-duration forecasting subsystem of the hot blast stove APCS for iron smelting in a blast furnace. Using multi-criteria optimization methods, the behavior of the control system is adjusted to take into account the current production situation arising during the heating of the packing of each hot blast stove in the group. A situation recognition algorithm and a procedure for choosing control scenarios based on a decision tree were developed.
Given the influence of robots in various fields of life, the issue of trusting them is important, especially when a robot deals with people directly. One possible way to gain this trust is to add a moral dimension to robots. Therefore, we present a new architecture for building moral agents that learn from demonstrations. This agent is based on Beauchamp and Childress’s principles of biomedical ethics (a type of deontological theory) and uses decision tree a...
Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy
A decision tree is one of the data mining techniques used for prediction. Using this method, hidden information can be extracted from abundant data and interpreted as useful knowledge. In this paper, the academic performance of students from 2002 to 2012 is examined for two faculties, the Faculty of Manufacturing Engineering and the Faculty of Microelectronic Engineering at University Malaysia Perlis (UniMAP). The objectives of this study are to determine and compare the factors that affect students' academic achievement in the two faculties. The prediction results show that five attributes were identified as factors that influence students' academic performance.
Rather, Zakir Hussain; Liu, Leo; Chen, Zhe
The research work presented in this paper analyzes the impact of wind energy, phasing out of central power plants and cross border power exchange on dynamic security of Danish Power System. Contingency based decision tree (DT) approach is used to assess the dynamic security of present and future Danish Power System. Results from offline time domain simulation for large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of present and future power system. The mentioned approach is implemented... significant impact on dynamic security of Danish power system in future, if alternative measures are not considered seriously.
Deptuła, A.; Partyka, M. A.
The method of minimization of complex partial multi-valued logical functions determines the degree of importance of construction and exploitation parameters playing the role of logical decision variables. Logical functions are taken into consideration in the issues of modelling machine sets. In multi-valued logical functions with weighting products, it is possible to use a modified Quine-McCluskey algorithm for the minimization of multi-valued functions. Taking weighting coefficients into account in the logical tree minimization reflects a physical model of the object being analysed much better.
The aim of this study was to examine mathematically gifted students' learning styles through a data mining method. ‘Learning Style Inventory’ and ‘Multiple Intelligences Scale’ were used to collect data. The sample included 234 mathematically gifted middle school students. A decision tree was constructed to predict mathematically gifted students’ learning styles according to their multiple intelligences, gender and grade level. Results showed that all the variables used in the study had a significant effect on mathematically gifted students’ learning styles, but the most effective attribute found was intelligence type.
Auld, Garry W; Diker, Ann; Bock, M Ann; Boushey, Carol J; Bruhn, Christine M; Cluskey, Mary; Edlefsen, Miriam; Goldberg, Dena L; Misner, Scottie L; Olson, Beth H; Reicks, Marla; Wang, Changzheng; Zaghloul, Sahar
A decision tree was developed to determine when NVivo is an appropriate tool for qualitative analysis. NVivo, a qualitative analysis software package, was used to analyze interviews of 204 Asian, Hispanic, and white parents in 12 states. The experience provided insight into issues that should be considered when deciding to use the software. NVivo can enhance the qualitative research process, quickly process queries, and expand analytical avenues. Before using, however, the following must be considered: training time, establishing inter-coder reliability, number and length of documents, coding time, coding structure, use of automated coding, and possible need for separate databases or additional supporting software.
Dexter H. Locke; J. Morgan Grove; Michael Galvin; Jarlath P.M. ONeil-Dunne; Charles. Murphy
Urban Tree Canopy (UTC) Prioritizations can be both a set of geographic analysis tools and a planning process for collaborative decision-making. In this paper, we describe how UTC Prioritizations can be used as a planning process to provide decision support to multiple government agencies, civic groups and private businesses to aid in reaching a canopy target. Linkages...
joko popo minardi
Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probability theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete; with the Dempster-Shafer method and expert system rules, incomplete combinations of symptoms can still be combined to reach an appropriate diagnosis, while the decision tree is used as a decision support tool for tracing disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method and that produces a trust value for each disease diagnosis. Based on diagnostic testing of the Dempster-Shafer method and expert system, the resulting accuracy was 76%. Keywords: Expert system; Diseases of pregnancy; Dempster Shafer
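To make the combination step concrete, here is a minimal sketch of Dempster's rule of combination for two mass functions, together with the belief and plausibility measures. The hypothesis names `D1`, `D2`, `D3` and the mass values are hypothetical placeholders; the abstract does not list the diseases, symptoms or masses used in the expert system.

```python
def combine(m1, m2):
    """Dempster's rule of combination.

    m1 and m2 map frozensets of hypotheses to masses that sum to 1.
    Products landing on an empty intersection count as conflict and
    the remaining masses are renormalised by (1 - conflict).
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of `hypothesis`."""
    return sum(mass for h, mass in m.items() if h <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass not contradicting `hypothesis`."""
    return sum(mass for h, mass in m.items() if h & hypothesis)


if __name__ == "__main__":
    frame = frozenset({"D1", "D2", "D3"})  # hypothetical candidate diagnoses
    symptom_a = {frozenset({"D1"}): 0.6, frame: 0.4}
    symptom_b = {frozenset({"D2"}): 0.5, frame: 0.5}
    fused = combine(symptom_a, symptom_b)
    d1 = frozenset({"D1"})
    print(round(belief(fused, d1), 3), round(plausibility(fused, d1), 3))
```

Mass assigned to the whole frame represents ignorance, which is how the method tolerates incomplete symptom information.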
Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort has been made to investigate short-term subway ridership prediction considering bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as the cases, the short-term subway ridership and alighting passengers from adjacent bus stops are obtained based on transit smart card data. To optimize the model performance with different combinations of regularization parameters, a series of GBDT models are built with various learning rates and tree complexities by fitting a maximum number of trees. The optimal model performance confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle the multicollinearity effect with high accuracy. In contrast to other machine learning methods, or “black-box” procedures, the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.
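The role of the learning rate and the number of trees can be seen in a toy version of gradient boosting for squared-error regression. The sketch below uses depth-1 trees (stumps) on a single feature and is only meant to show the mechanism; it is not the model from the study, and the data in the usage example are invented. It assumes the feature takes at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Find the single split on a 1-D feature that minimises squared error."""
    best = None
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_gbdt(xs, ys, n_trees=100, learning_rate=0.1):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_mean, right_mean in stumps:
        total += learning_rate * (left_mean if x <= threshold else right_mean)
    return total


if __name__ == "__main__":
    # Invented data: a step change in the target between x = 3 and x = 4.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [10, 10, 10, 20, 20, 20]
    model = fit_gbdt(xs, ys)
    print(round(predict(model, 2), 2), round(predict(model, 5), 2))
```

A smaller learning rate shrinks each tree's contribution and therefore needs more trees to reach the same fit, which is the trade-off that a search over learning rates and tree complexities explores.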
Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.
Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggest that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a
Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.
Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the
Kose, Semir; Altunyurt, Sabahattin; Yıldırım, Nuri; Keskinoğlu, Pembe; Çankaya, Tufan; Bora, Elçin; Erçal, Derya; Özer, Erdener
By reviewing our ethical committee cases, we present the main arguments we use when making a judgment in the face of fetal abnormalities. Our decision-making model is a simplified algorithm of the arguments and concepts we use in scientific-ethical discussion. A retrospective analysis was conducted at a single tertiary referral center of patients evaluated for fetal abnormalities from 2004 to 2014. We hypothesized that all our judgments would fit into a decision-tree model. Of the 553 fetal abnormality cases discussed, 348 (63%) were given a termination of pregnancy (TOP) proposal. Cases with detected genetic disorders (n = 100) and cases with mental retardation risk (n = 93) ended with a TOP proposal. For incompatibility-with-life cases (n = 111) and multimorbidity cases (n = 44), the committee suggested TOP regardless of gestational age. The highest family approval ratios were in the chromosomal abnormalities/genetic disorders group (93%), and the lowest were in the mental retardation risk group (80%). The continuously changing literature on prenatal and postnatal therapy options and the long-term outcome of various fetal abnormalities influences committee decisions. Theoretically high success rates and inconsistent data on the long-term prognosis of some anomaly groups resulted in heterogeneous decisions and varying approval ratios. © 2015 John Wiley & Sons, Ltd.
Alfonso L. Palmer
Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls), whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.
Nourani, Vahid; Molajou, Amir
Previous research has shown that incorporating oceanic-atmospheric climate phenomena such as Sea Surface Temperature (SST) into hydro-climatic models can provide important predictive information about hydro-climatic variability. In this paper, a hybrid application of two data mining techniques (decision tree and association rules) is proposed to discover the association between drought at the Tabriz and Kermanshah synoptic stations (located in Iran) and de-trended SSTs of the Black, Mediterranean and Red Seas. The two major steps of the proposed model were classifying the de-trended SST data and selecting the most effective groups, and extracting the hidden information in the data. Decision trees, which can identify the useful traits in a data set for classification purposes, were used for classification and for selecting the most effective groups, and association rules were employed to extract the hidden predictive information from the large observed data set. To examine the accuracy of the rules, confidence and Heidke Skill Score (HSS) measures were calculated and compared for the different lag times considered. The computed measures confirm reliable performance of the proposed hybrid data mining method in forecasting drought, and the results show a relative correlation between the Mediterranean, Black and Red Sea de-trended SSTs and drought at the Tabriz and Kermanshah synoptic stations, such that the confidence between the monthly Standardized Precipitation Index (SPI) values and the de-trended SST of the seas is higher than 70% and 80% for the Tabriz and Kermanshah synoptic stations, respectively.
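The two rule-quality measures named above have simple closed forms. The sketch below computes the confidence of an association rule over a list of item sets and the Heidke Skill Score for a 2x2 forecast contingency table. The item labels such as `SST_high` and the counts are hypothetical; the abstract does not give the actual categories or data.

```python
def rule_confidence(transactions, antecedent, consequent):
    """confidence(A -> C) = count(A and C) / count(A) over a list of item sets."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """HSS for a 2x2 table: 1 is perfect, 0 is no better than chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    if denominator == 0:
        return 0.0
    return 2 * (a * d - b * c) / denominator


if __name__ == "__main__":
    # Hypothetical monthly records: SST class at some lag plus observed condition.
    months = [
        {"SST_high", "drought"},
        {"SST_high", "drought"},
        {"SST_high", "wet"},
        {"SST_low", "wet"},
    ]
    print(round(rule_confidence(months, {"SST_high"}, {"drought"}), 3))
    print(round(heidke_skill_score(30, 10, 10, 50), 3))
```

Confidence alone can look high when the consequent is common anyway, which is one reason to report a chance-corrected score such as HSS next to it.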
Camila Viana Vieira Farhate
The use of data mining is a promising alternative to predict soil respiration from correlated variables. Our objective was to build a model using variable selection and decision tree induction to predict different levels of soil respiration, taking into account physical, chemical and microbiological variables of soil as well as precipitation in renewal of sugarcane areas. The original dataset was composed of 19 variables (18 independent variables and one dependent (or response) variable). The variable-target refers to soil respiration as the target classification. Due to a large number of variables, a procedure for variable selection was conducted to remove those with low correlation with the variable-target. For that purpose, four approaches of variable selection were evaluated: no variable selection, correlation-based feature selection (CFS), chi-square method (χ2) and Wrapper. To classify soil respiration, we used the decision tree induction technique available in the Weka software package. Our results showed that data mining techniques allow the development of a model for soil respiration classification with accuracy of 81 %, resulting in a knowledge base composed of 27 rules for prediction of soil respiration. In particular, the wrapper method for variable selection identified a subset of only five variables out of 18 available in the original dataset, and they had the following order of influence in determining soil respiration: soil temperature > precipitation > macroporosity > soil moisture > potential acidity.
Edrian E. Gonzales
This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought to answer the question: Is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second-year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that modular and traditional instruction were equally effective in facilitating the learning of integration by parts. The other result revealed that the modular approach utilizing a decision tree in teaching integration by trigonometric transformation was more effective than the traditional method.
St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.
One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.
Chen, C.-W.; Hsieh, P.-H.; Lai, W.-H.
The purpose of this research is to build a collision avoidance system for quadcopters using a decision tree algorithm. When the ultrasonic range finder judges that the distance is within the collision avoidance interval, control of the altitude of the UAV passes from the operator to the system. Based on prior experience operating quadcopters, the appropriate pitch angle can be obtained. The UAS implements the following three motions to avoid collisions: Case 1, initial slow avoidance stage; Case 2, slow avoidance stage; and Case 3, rapid avoidance stage. The training data from the collision avoidance tests are then transmitted to the ground station via a wireless transmission module for further analysis. The entire decision tree algorithm of the collision avoidance system, the transmission data, and the ground station have been verified in flight tests. In the flight test, the quadcopter can implement the avoidance motion in real time and move away from obstacles steadily. In the avoidance area, the authority of the collision avoidance system is higher than that of the operator, and the system implements the avoidance process. The quadcopter can successfully fly away from obstacles at 1.92 meters per second, and the minimum distance between the quadcopter and the obstacle is 1.05 meters.
Kang, Dae Il; Han, Sang Hoon; Lim, Jae Won
KAERI developed a method, called the mapping technique, for the quantification of external events PSA models with one top model for an internal events PSA. The mapping technique can be implemented by the construction of mapping tables. The mapping tables include initiating events and transfer events of fire, and internal PSA basic events affected by a fire. This year, KAERI is making mapping tables for the one top model for the Ulchin Unit 3 and 4 fire PSA using previously obtained fire PSA results for Ulchin Unit 3 and 4. A fire PSA requires a PSA analyst to determine the component failure modes affected by a fire. The component failure modes caused by a fire depend on several factors: whether components are located in fire initiation and propagation areas, fire effects on control and power cables for components, designed failure modes of components, success criteria in a PSA model, etc. Thus, it is not easy to determine component failure modes caused by a fire manually. In this paper, we propose the use of decision trees for the determination of component failure modes affected by a fire and the selection of internal PSA basic events. Section 2 presents the procedure of the previously performed Ulchin Unit 3 and 4 fire PSA and the mapping technique. Section 3 presents the process for identification of basic events and the decision trees. Section 4 presents the concluding remarks.
Aim To show the benefits of data mining in health care management. In this example, we show a way to raise the awareness of women in terms of the contraceptive methods they use (or do not use). Methods The goal of the data mining analysis was to determine if there are common characteristics of the women according to their choice of contraception (a typical classification problem). Therefore, we decided to use decision trees. We generated a CHAID model in “Statistica”, based on the database that was formed as a result of Indonesian research conducted in 1987. The sample contains married women who were either not pregnant or did not know if they were pregnant at the time of the interview. The database consists of 1473 cases. Also, an extensive internet search was conducted in order to detect the number of articles cited in scientific databases published on the subject of data mining in health care management. Results The analysis showed that the most important variable in women’s choice of contraceptive methods is the husband’s profession. We also retrieved 221 articles published on the application of data mining in health care. Conclusion The goal of the paper is achieved in two ways: first, by retrieving 221 articles published on the subject, we have shown the benefits of data mining in health care management. Second, the decision tree method is successfully applied to explain women’s choice of contraceptive methods.
Kadi, Ilham; Idri, Ali
Decision trees (DTs) are one of the most popular techniques for learning classification systems, especially when it comes to learning from discrete examples. In the real world, much data occurs in a fuzzy form; hence, a DT must be able to deal with such fuzzy data. In fact, integrating fuzzy logic when dealing with imprecise and uncertain data reduces uncertainty and provides the ability to model fine knowledge details. In this paper, a fuzzy decision tree (FDT) algorithm was applied to a dataset extracted from the ANS (Autonomic Nervous System) unit of the Moroccan university hospital Avicenne. This unit specializes in performing several dynamic tests to diagnose patients with autonomic disorders and suggest the appropriate treatment. A set of fuzzy classifiers was generated using FID 3.4. The error rates of the generated FDTs were calculated to measure their performance. Moreover, a comparison between the error rates obtained using crisp DTs and FDTs was carried out and showed that the results of FDTs were better than those obtained using crisp DTs.
Christopher Sean Greene
Management decisions grounded in ecological understanding are essential to the maintenance of a healthy urban forest. Decisions about where and what tree species to plant have both short and long-term consequences for the future function and resilience of city trees. Through the construction of a theoretical damage index, this study examines the legacy effects of a street tree planting program in a densely populated North American city confronting an invasion of emerald ash borer (Agrilus planipennis). An investigation of spatial autocorrelation for locations of high damage potential across the City of Toronto, Canada was then conducted using Getis-Ord Gi*. Significant spatial clustering of high damage index values affirmed that past urban tree planting practices placing little emphasis on species diversity have created time-lagged consequences of enhanced vulnerability of trees to insect pests. Such consequences are observed at the geographically local scale, but can easily cascade to become multi-scalar in their spatial extent. The theoretical damage potential index developed in this study provides a framework for contextualizing historical urban tree planting decisions where analysis of damage index values for Toronto reinforces the importance of urban forest management that prioritizes proactive tree planting strategies that consider species diversity in the context of planting location.
Malueka, Rusdy Ghazali; Takaoka, Yutaka; Yagi, Mariko; Awano, Hiroyuki; Lee, Tomoko; Dwianingsih, Ery Kus; Nishida, Atsushi; Takeshima, Yasuhiro; Matsuo, Masafumi
Duchenne muscular dystrophy, a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.
The aim of the study was to investigate factors associated with coliform mastitis in sows, determined at herd level, by applying the decision-tree technique. Coliform mastitis represents an economically important disease in sows after farrowing that also affects the health, welfare and performance of the piglets. The decision-tree technique, a data mining method, may be an effective tool for making large datasets accessible and different sow herd information comparable. It is based on the C4.5 algorithm, which generates trees in a top-down recursive strategy. The technique can be used to detect weak points in farm management. Two datasets from two farms in Germany, consisting of sow-related parameters, were analysed and compared by decision-tree algorithms. Data were collected over the period of April 2007 to August 2010 from 987 sows (499 CM-positive and 488 CM-negative sows) and 596 sows (322 CM-positive and 274 CM-negative sows), respectively. Depending on the dataset, different graphical trees were built showing relevant factors at the herd level which may lead to coliform mastitis. To our understanding, this is the first time decision-tree modeling has been used to assess risk factors for coliform mastitis. Herd-specific risk factors for the disease were illustrated, which could prove beneficial in disease and herd management.
Vilalta, R; Ocegueda-Hernandez, F; Valerio, R; Watts, G
Decision tree learning constitutes a suitable approach to classification due to its ability to partition the variable space into regions of class-uniform events, while providing a structure amenable to interpretation, in contrast to other methods such as neural networks. But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system called DTFE, for Decision Tree Fragmentation Evaluator, that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as Spectral Clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies on the search for single top quark production, a challenging problem due to large and similar backgrounds, low energetic signals, and low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of data fragmentation.
Sarmast, Nima D; Wang, Howard H; Soldatos, Nikolaos K; Angelov, Nikola; Dorn, Samuel; Yukna, Raymond; Iacono, Vincent J
Although retrograde peri-implantitis (RPI) is not a common sequela of dental implant surgery, its prevalence has been reported in the literature to be 0.26%. Incidence of RPI is reported to increase to 7.8% when teeth adjacent to the implant site have a previous history of root canal therapy, and it is correlated with distance between implant and adjacent tooth and/or with time from endodontic treatment of adjacent tooth to implant placement. Minimum 2 mm space between implant and adjacent tooth is needed to decrease incidence of apical RPI, with minimum 4 weeks between completion of endodontic treatment and actual implant placement. The purpose of this study is to compile all available treatment modalities and to provide a decision tree as a general guide for clinicians to aid in diagnosis and treatment of RPI. Literature search was performed for articles published in English on the topic of RPI. Articles selected were case reports with study populations ranging from 1 to 32 patients. Any case report or clinical trial that attempted to treat or rescue an implant diagnosed with RPI was included. Predominant diagnostic presentation of a lesion was presence of sinus tract at buccal or facial abscess of apical portion of implant, and subsequent periapical radiographs taken demonstrated a radiolucent lesion. On the basis of case reports analyzed, RPI was diagnosed between 1 week and 4 years after implant placement. Twelve of 20 studies reported that RPI lesions were diagnosed within 6 months after implant placement. A step-by-step decision tree is provided to allow clinicians to triage and properly manage cases of RPI on the basis of recommendations and successful treatments provided in analyzed case reports. It is divided between symptomatic and asymptomatic implants and adjacent teeth with vital and necrotic pulps. Most common etiology of apical RPI is endodontic infection from neighboring teeth, which was diagnosed within 6 months after implant placement. Most
Kusmartsev, F. V.; Kürten, Karl E.
We propose a new theory of the human mind. The formation of human mind is considered as a collective process of the mutual interaction of people via exchange of opinions and formation of collective decisions. We investigate the associated dynamical processes of the decision making when people are put in different conditions including risk situations in natural catastrophes when the decision must be made very fast or at national elections. We also investigate conditions at which the fast formation of opinion is arising as a result of open discussions or public vote. Under a risk condition the system is very close to chaos and therefore the opinion formation is related to the order disorder transition. We study dramatic changes which may happen with societies which in physical terms may be considered as phase transitions from ordered to chaotic behavior. Our results are applicable to changes which are arising in various social networks as well as in opinion formation arising as a result of open discussions. One focus of this study is the determination of critical parameters, which influence a formation of stable mind, public opinion and where the society is placed "at the edge of chaos". We show that social networks have both, the necessary stability and the potential for evolutionary improvements or self-destruction. We also show that the time needed for a discussion to take a proper decision depends crucially on the nature of the interactions between the entities as well as on the topology of the social networks.
Satomi, Junichiro; Ghaibeh, A Ammar; Moriguchi, Hiroki; Nagahiro, Shinji
The severity of clinical signs and symptoms of cranial dural arteriovenous fistulas (DAVFs) are well correlated with their pattern of venous drainage. Although the presence of cortical venous drainage can be considered a potential predictor of aggressive DAVF behaviors, such as intracranial hemorrhage or progressive neurological deficits due to venous congestion, accurate statistical analyses are currently not available. Using a decision tree data mining method, the authors aimed at clarifying the predictability of the future development of aggressive behaviors of DAVF and at identifying the main causative factors. Of 266 DAVF patients, 89 were eligible for analysis. Under observational management, 51 patients presented with intracranial hemorrhage/infarction during the follow-up period. The authors created a decision tree able to assess the risk for the development of aggressive DAVF behavior. Evaluated by 10-fold cross-validation, the decision tree's accuracy, sensitivity, and specificity were 85.28%, 88.33%, and 80.83%, respectively. The tree shows that the main factor in symptomatic patients was the presence of cortical venous drainage. In its absence, the lesion location determined the risk of a DAVF developing aggressive behavior. Decision tree analysis accurately predicts the future development of aggressive DAVF behavior.
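The accuracy, sensitivity, and specificity figures quoted in the abstract above are all derived from the counts in a binary confusion matrix. The following is a minimal sketch of that arithmetic only; the function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total      # share of all cases classified correctly
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not data from the study).
acc, sens, spec = binary_metrics(tp=40, fp=10, tn=30, fn=20)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a k-fold cross-validation setting, such counts would typically be accumulated over the held-out folds before the ratios are computed.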
Chugh, Saryu; Arivu Selvan, K.; Nadesh, RK
Numerous harmful factors affect the functioning of the human body, such as hypertension, smoking, obesity, and inappropriate use of medication, and these cause many different diseases, including diabetes, thyroid disorders, strokes, and coronary diseases. The impermanence and horribleness of the environment situation is also a reason for coronary disease. The structure of Apache Spark relies on the evolution which requires gathering of the data. To analyze the significance of application programming centered on data structures, Apache Spark should be used; it offers various advantages, as it is fast because it uses in-memory processing. Apache Spark runs in a distributed environment and splits the data into batches, giving a high productivity rate. The use of mining techniques in the diagnosis of coronary disease has been examined extensively, showing acceptable levels of accuracy. Decision trees, neural networks, and the gradient boosting algorithm are among the Apache Spark techniques that help in collecting the information.
Amir Arzy Soltan
Nowadays credit scoring is an important issue for financial and monetary organizations and has a substantial impact on reducing customer attraction risks. Identifying high-risk customers can reduce finished cost. Accurate customer classification with low type 1 and type 2 errors has been investigated in many studies. The primary objective of this paper is to develop a new method that chooses the best neural network architecture from among a one-column hidden-layer MLP, a multiple-column hidden-layer MLP, an RBFN, and decision trees, and combines them in an ensemble using voting methods. The proposed method is run on an Australian credit data set and on data from a private bank in Iran, the Export Development Bank of Iran, and the results are used to support decisions that lower customer attraction risks.
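The voting step mentioned in the abstract above can be illustrated with a plain plurality vote over the labels predicted by several classifiers. This is a minimal sketch under stated assumptions: the abstract does not specify which voting scheme was used, and the function name, labels, and predictions below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label list by plurality vote.

    `predictions` holds one list of labels per model, all of equal length.
    """
    combined = []
    for labels in zip(*predictions):  # labels given to one sample by every model
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers (e.g. an MLP, an RBFN, a decision tree).
preds = [
    ["good", "bad", "good", "bad"],
    ["good", "good", "bad", "bad"],
    ["bad", "good", "good", "bad"],
]
print(majority_vote(preds))
```

Using an odd number of models with two classes avoids ties; with ties, this sketch simply returns the label that was counted first.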
ALİ SERHAN KOYUNCUGİL
The aim of this study is to detect the strengths and weaknesses of SMEs, which have a significant position in globalization. The study covered 697 SMEs listed on the İstanbul Stock Exchange (ISE) during the years 2000-2005. Data mining, which can be described as a collection of techniques that aim to find useful but undiscovered patterns in collected data, was applied, and the Chi-Square Automatic Interaction Detector (CHAID) decision tree algorithm, one of the data mining methods, was used for segmentation. As a result of the study, the SMEs listed on the ISE were categorized into 19 different profiles by CHAID, and it was found that the strengths and weaknesses of the SMEs were identified by strategies of equity and asset productivity, financing of fixed assets, management of accounts receivable, and liquidity.
Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.
Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value 0.36.
Sancak, Eyup Burak; Kılınç, Muhammet Fatih; Yücebaş, Sait Can
The decision on the choice of proximal ureteral stone therapy depends on many factors, and sometimes urologists have difficulty in choosing the treatment option. This study is aimed at evaluating the factors affecting the success of semirigid ureterorenoscopy (URS) using the "decision tree" method. From January 2005 to November 2015, the data of consecutive patients treated for proximal ureteral stone were retrospectively analyzed. A total of 920 patients with proximal ureteral stone treated with semirigid URS were included in the study. All statistically significant attributes were tested using the decision tree method. The model created using decision tree had a sensitivity of 0.993 and an accuracy of 0.857. While URS treatment was successful in 752 patients (81.7%), it was unsuccessful in 168 patients (18.3%). According to the decision tree method, the most important factor affecting the success of URS is whether the stone is impacted to the ureteral wall. The second most important factor affecting treatment was intramural stricture requiring dilatation if the stone is impacted, and the size of the stone if not impacted. Our study suggests that the impacted stone, intramural stricture requiring dilatation and stone size may have a significant effect on the success rate of semirigid URS for proximal ureteral stone. Further studies with population-based and longitudinal design should be conducted to confirm this finding. © 2017 S. Karger AG, Basel.
Objective To analyze the risk factors for prognosis in intracerebral hemorrhage using decision tree (classification and regression tree, CART) model and logistic regression model. Methods CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results Logistic regression analyses showed that hematoma volume (OR-value 0.953), initial Glasgow Coma Scale (GCS) score (OR-value 1.210), pulmonary infection (OR-value 0.295), and basal ganglia hemorrhage (OR-value 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688). Conclusions CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
Kim, Hyun Kyung; Jeong, Seok Hee; Kang, Hyun Cheol
This study was performed to explore levels of stroke knowledge and identify subgroups with lower levels of stroke knowledge among adults in Korea. A cross-sectional survey was used and data were collected in 2012. A national sample of 990 Koreans aged 20 to 74 years participated in this study. Knowledge of risk factors, warning signs, and first action for stroke were surveyed using face-to-face interviews. Descriptive statistics and decision tree analysis were performed using SPSS WIN 20.0 and Answer Tree 3.1. Mean score for stroke risk factor knowledge was 7.7 out of 10. The least recognized risk factor was diabetes and four subgroups with lower levels of knowledge were identified. Score for knowledge of stroke warning signs was 3.6 out of 6. The least recognized warning sign was sudden severe headache and six subgroups with lower levels of knowledge were identified. The first action for stroke was recognized by 65.7 percent of participants and four subgroups with lower levels of knowledge were identified. Multi-faceted education should be designed to improve stroke knowledge among Korean adults, particularly focusing on subgroups with lower levels of knowledge and less recognition of items in this study.
Weinberg Abraham Itzhak
When running data-mining algorithms on big data platforms, a parallel, distributed framework, such as MAPREDUCE, may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms such as majority voting aggregate the local models, rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model which is most similar to others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
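The selection idea described above (pick the local model that is most similar to all the others) can be sketched as follows. This is not the SySM similarity measure itself, which the abstract does not define: as a stand-in assumption, each tree is represented by the set of split tests it contains and compared with Jaccard similarity. All names and the example trees are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (stand-in for a syntactic tree similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_representative(models):
    """Return the index of the model with the highest mean similarity to the others.

    Expects at least two models.
    """
    best_index, best_score = 0, -1.0
    for i, model in enumerate(models):
        sims = [jaccard(model, other) for j, other in enumerate(models) if j != i]
        score = sum(sims) / len(sims)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical local trees, each summarized by the split tests it uses.
local_models = [
    {"age<=30", "income<=50"},
    {"age<=30", "income<=50", "owns_home"},
    {"age<=30", "owns_home"},
]
print(most_representative(local_models))
```

Only one model is kept at the end, which is what makes the result more compact than an ensemble that must retain every local model for voting.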
Pyo, J-S; Sohn, J H; Kang, G
The aim of this study was to elucidate the cytological characteristics and the diagnostic usefulness of intraoperative cytology (IOC) for papillary thyroid carcinoma (PTC). In addition, using decision tree analysis, effective features for accurate cytological diagnosis were sought. We investigated cellularity, cytological features and diagnosis based on the Bethesda System for Reporting Thyroid Cytopathology in IOC of 240 conventional PTCs. The cytological features were evaluated in terms of nuclear score with nuclear features, and additional figures such as presence of swirling sheets, psammoma bodies, and multinucleated giant cells. The nuclear score (range 0-7) was made via seven nuclear features, including (1) enlarged, (2) oval or irregularly shaped nuclei, (3) longitudinal nuclear grooves, (4) intranuclear cytoplasmic pseudoinclusion, (5) pale nuclei with powdery chromatin, (6) nuclear membrane thickening, and (7) marginally placed micronucleoli. Nuclear scores in PTC, suspicious for malignancy, and atypia of undetermined significance cases were 6.18 ± 0.80, 4.48 ± 0.82, and 3.15 ± 0.67, respectively. Additional figures more frequent in PTC than in other diagnostic categories were identified. Cellularity of IOC significantly correlated with tumor size, nuclear score, and presence of additional figures. Also, IOCs with higher nuclear scores (4-7) significantly correlated with larger tumor size and presence of additional figures. In decision tree analysis, IOCs with nuclear score >5 and swirling sheets could be considered diagnostic for PTCs. Our study suggests that IOCs using nuclear features and additional figures could be useful with decreasing the likelihood of inconclusive results.
Fog computing, as a supplement to cloud computing, can provide low-latency services between mobile users and the cloud. However, fog devices may encounter security challenges because fog nodes are close to the end users and have limited computing ability. Traditional network attacks may destroy the system of fog nodes. An intrusion detection system (IDS) is a proactive security protection technology and can be used in the fog environment. Although IDSs in traditional networks have been well investigated, directly using them in the fog environment may be inappropriate. Fog nodes produce massive amounts of data at all times, and thus enabling an IDS over big data in the fog environment is of paramount importance. In this study, we propose an IDS based on a decision tree. Firstly, we propose a preprocessing algorithm to digitize the strings in the given dataset and then normalize the whole data, to ensure the quality of the input data and thereby improve the efficiency of detection. Secondly, we use the decision tree method for our IDS, and then we compare this method with the Naïve Bayesian method as well as the KNN method. Both the 10% dataset and the full dataset are tested. Our proposed method not only completely detects four kinds of attacks but also enables the detection of twenty-two kinds of attacks. The experimental results show that our IDS is effective and precise. Above all, our IDS can be used in a fog computing environment over big data.
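The preprocessing step described above (digitizing string fields, then normalizing the data) might look like the sketch below. The abstract does not state the encoding or the normalization formula, so integer codes per column and min-max scaling to [0, 1] are assumptions, and the function names and sample records are hypothetical.

```python
def digitize_strings(rows):
    """Replace each string field with an integer code assigned per column."""
    codes = {}  # column index -> {string value: integer code}
    out = []
    for row in rows:
        new_row = []
        for col, value in enumerate(row):
            if isinstance(value, str):
                mapping = codes.setdefault(col, {})
                value = mapping.setdefault(value, len(mapping))
            new_row.append(float(value))
        out.append(new_row)
    return out


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns become 0.0 (assumed choice)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical connection records: duration, protocol, bytes sent.
records = [[0, "tcp", 100], [2, "udp", 300], [4, "tcp", 200]]
normalized = min_max_normalize(digitize_strings(records))
print(normalized)
```

Scaling all features to a common range keeps distance-based baselines such as KNN from being dominated by large-valued fields; a decision tree itself is insensitive to this rescaling.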
Suner, A; Karakülah, G; Dicle, O; Sökmen, S; Çelikoğlu, C C
The selection of appropriate rectal cancer treatment is a complex multi-criteria decision making process, in which clinical decision support systems might be used to assist and enrich physicians' decision making. The objective of the study was to develop a web-based clinical decision support tool for physicians in the selection of potentially beneficial treatment options for patients with rectal cancer. The updated decision model contained 8 and 10 criteria in the first and second steps respectively. The decision support model, developed in our previous study by combining the Analytic Hierarchy Process (AHP) method which determines the priority of criteria and decision tree that formed using these priorities, was updated and applied to 388 patients data collected retrospectively. Later, a web-based decision support tool named corRECTreatment was developed. The compatibility of the treatment recommendations by the expert opinion and the decision support tool was examined for its consistency. Two surgeons were requested to recommend a treatment and an overall survival value for the treatment among 20 different cases that we selected and turned into a scenario among the most common and rare treatment options in the patient data set. In the AHP analyses of the criteria, it was found that the matrices, generated for both decision steps, were consistent (consistency ratiodecisions of experts, the consistency value for the most frequent cases was found to be 80% for the first decision step and 100% for the second decision step. Similarly, for rare cases consistency was 50% for the first decision step and 80% for the second decision step. The decision model and corRECTreatment, developed by applying these on real patient data, are expected to provide potential users with decision support in rectal cancer treatment processes and facilitate them in making projections about treatment options.
Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I
Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value.
Trefz Florian M
Background: The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results: After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions: Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for the use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed
Jukna, S.; Razborov, A.; Savický, Petr; Wegener, I.
Vol. 8, No. 4 (1999), pp. 357-370 ISSN 1016-3328 R&D Projects: GA ČR GA201/95/0976 Institutional research plan: AV0Z1030915 Keywords: computational complexity * Boolean functions * decision trees * branching programs * P versus NP intersection co-NP Subject RIV: BA - General Mathematics Impact factor: 0.161, year: 1999
Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of the decision trees on the prediction model of sensation seeking. Prediction of the Zuckerman's Sensation Seeking Scale (SSS-V) score was based on the bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of Eysenck Personality Questionnaire (EPQ) and Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision trees methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, the decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.
Pedersen, Mona H; Hansen, Tine K; Sten, Eva
All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study...
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung
A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…
Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M
Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.
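Leave-one-out cross validation, as used above to estimate classifier performance, can be sketched as follows. Each sample is held out once, the classifier is fitted on the rest, and the held-out sample is predicted. A 1-nearest-neighbour rule stands in for the C4.5 tree purely to keep the sketch dependency-free, and the feature vectors are hypothetical subject scores, not study data.

```python
def nearest_neighbour_label(training, query):
    """Stand-in classifier: return the label of the closest training point (1-NN)."""
    closest = min(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return closest[1]


def leave_one_out_accuracy(samples):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for index, (features, label) in enumerate(samples):
        training = samples[:index] + samples[index + 1:]
        if nearest_neighbour_label(training, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical two-dimensional subject scores with their class labels.
samples = [
    ((0.1, 0.2), "control"),
    ((0.2, 0.1), "control"),
    ((0.0, 0.3), "control"),
    ((1.0, 1.1), "patient"),
    ((1.2, 0.9), "patient"),
    ((0.9, 1.0), "patient"),
]
accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-out accuracy = {accuracy:.2f}")
```

With n samples the classifier is trained n times, which is affordable for small clinical data sets such as the one described.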
In this study we use visible, short-wave infrared and thermal Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data validated with high-resolution Quickbird (QB) and Worldview2 (WV2) for mapping debris cover in the eastern Himalaya using two independent approaches: (a) a decision tree algorithm, and (b) texture analysis. The decision tree algorithm was based on multi-spectral and topographic variables, such as band ratios, surface reflectance, kinetic temperature from ASTER bands 10 and 12, slope angle, and elevation. The decision tree algorithm resulted in 64 km² classified as debris-covered ice, which represents 11% of the glacierized area. Overall, for ten glacier tongues in the Kangchenjunga area, there was an area difference of 16.2 km² (25%) between the ASTER and the QB areas, with mapping errors mainly due to clouds and shadows. Texture analysis techniques included co-occurrence measures, geostatistics and filtering in spatial/frequency domain. Debris cover had the highest variance of all terrain classes, highest entropy and lowest homogeneity compared to the other classes, for example a mean variance of 15.27 compared to 0 for clouds and 0.06 for clean ice. Results of the texture image for debris-covered areas were comparable with those from the decision tree algorithm, with 8% area difference between the two techniques.
Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid
Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
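The 70%/30% random holdout split described above can be sketched as follows. The subject identifiers and the random seed are hypothetical; only the sample size (678) and the resulting group sizes (474 and 204) come from the abstract.

```python
import random


def holdout_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


subject_ids = list(range(678))  # hypothetical identifiers for the 678 subjects
training, testing = holdout_split(subject_ids)
print(len(training), len(testing))
```

Both models are then fitted on the training part only, and the ROC comparison is carried out on the testing part so that performance is measured on subjects the models have not seen.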
Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang
Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.
Beelen P van; ECO
This report presents a decision tree for the risk evaluation of the so-called "difficult" substances with the Uniform System for the Evaluation of Substances (USES). The decision tree gives practical guidelines for the regulatory authorities to evaluate notified substances like organometallic
Moon, Mikyung; Lee, Soo-Kyoung
The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. The data were extracted from the 2014 National Inpatient Sample (NIS) data of the Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, "injuries to the hip and thigh" was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.
Borysiewicz, Mieczysław; Kowal, Karol; Potempski, Sławomir
A new framework of integrated risk informed decision making (IRIDM) has been recently developed in order to improve the risk management of the nuclear facilities. IRIDM is a process in which qualitatively different inputs, corresponding to different types of risk, are jointly taken into account. However, the relative importance of the IRIDM inputs and their influence on the decision to be made is difficult to be determined quantitatively. An improvement of this situation can be achieved by application of the Value Tree Analysis (VTA) methods. The aim of this article is to present the VTA methodology in the context of its potential usage in the decision making on nuclear facilities. The benefits of the VTA application within the IRIDM process were identified while making the decision on fuel conversion of the research reactor MARIA. - Highlights: • New approach to risk informed decision making on nuclear facilities was postulated. • Value tree diagram was developed for decision processes on nuclear installations. • An experiment was performed to compare the new approach with the standard one. • Benefits of the new approach were reached in fuel conversion of a research reactor. • The new approach makes the decision making process more transparent and auditable
Sankari, E Siva; Manimegalai, D
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
McFadden, F. Lee
A self-instructional program on decision making was used in conjunction with workshops to introduce the staff of an instructional materials company to the decision tree process as they used it to study their own film production problem. (Author/MS)
Khosravi, Khabat; Pham, Binh Thai; Chapi, Kamran; Shirzadi, Ataollah; Shahabi, Himan; Revhaug, Inge; Prakash, Indra; Tien Bui, Dieu
Floods are one of the most damaging natural hazards, causing huge loss of property, infrastructure and lives. Predicting the locations of flash floods is very difficult because of sudden changes in climatic conditions and man-made factors. However, flood-susceptible areas can be identified in advance with the help of machine learning techniques for proper and timely management of flood hazards. In this study, we tested four decision-tree-based machine learning models, namely Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT), for flash flood susceptibility mapping at the Haraz Watershed in the northern part of Iran. For this, a spatial database was constructed with 201 present and past flood locations and eleven flood-influencing factors, namely ground slope, altitude, curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), land use, rainfall, river density, distance from river, lithology, and Normalized Difference Vegetation Index (NDVI). Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and Friedman and Wilcoxon signed-rank tests were used to validate and compare the prediction capability of the models. Results show that the ADT model has the highest prediction capability for flash flood susceptibility assessment, followed by the NBT, the LMT, and the REPT, respectively. These techniques have proven successful in quickly determining flood susceptible areas. Copyright © 2018 Elsevier B.V. All rights reserved.
Sand, Andreas; Holt, Morten K; Johansen, Jens; Brodal, Gerth Stølting; Mailund, Thomas; Pedersen, Christian N S
tqDist is a software package for computing the triplet and quartet distances between general rooted or unrooted trees, respectively. The program is based on algorithms with running time [Formula: see text] for the triplet distance calculation and [Formula: see text] for the quartet distance calculation, where n is the number of leaves in the trees and d is the degree of the tree with minimum degree. These are currently the fastest algorithms both in theory and in practice. tqDist can be installed on Windows, Linux and Mac OS X. Doing this will install a set of command-line tools together with a Python module and an R package for scripting in Python or R. The software package is freely available under the GNU LGPL licence at http://birc.au.dk/software/tqDist. © The Author 2014. Published by Oxford University Press. All rights reserved.
Tanaka, Tomohiro; Voigt, Michael D
Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r 2 = 0.971, p risk groups at 5/10/20 year was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p risk of developing NMSC in the long-term after LT.
El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Molinari, Nicolas; Taourel, Patrice
To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. • Appendiceal diameter and marked left tubal thickening allow differentiating PID from AA. • PID should be considered if appendiceal diameter is < 7 mm. • Marked left tubal diameter indicates PID rather than AA when enlarged appendix. • No pathological CT findings were identified in 5 % of PID patients.
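The two criteria reported above can be read as a small rule cascade. The sketch below is only one possible reading of the summary (appendiceal diameter first, then left tubal diameter); the published CART tree may contain further nodes, and the function name and return labels are hypothetical.

```python
def classify_ct(appendiceal_diameter_mm, left_tubal_diameter_mm):
    """Two-split sketch of the reported CT criteria; not the published tree."""
    # First split: an appendix below 7 mm points to PID.
    if appendiceal_diameter_mm < 7:
        return "PID"
    # Second split: with an enlarged appendix, a left tube of 10 mm or more
    # still points to PID rather than AA.
    if left_tubal_diameter_mm >= 10:
        return "PID"
    return "AA"


# Hypothetical measurements in millimetres.
print(classify_ct(5.0, 4.0))
print(classify_ct(9.0, 12.0))
print(classify_ct(9.0, 4.0))
```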
El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Taourel, Patrice; Molinari, Nicolas
To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. (orig.)
Harwati; Sudiya, Amby
The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative screening, without a written test, based on the high school records of prospective students. Currently, this kind of selection has no standard model or criteria. Selection is done only by comparing candidates' application files, so subjective assessment is likely, because there are no standard criteria that can differentiate the quality of one student from another. By applying classification, a data mining technique, a selection model for new students can be built that includes standard criteria such as area of origin, school status, average grade and so on. These criteria are determined using rules derived from the classification of the academic achievement (GPA) of students who entered the university through the same route in previous years. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average grade above 75, and have at least one achievement during their high school studies.
Low, Jia Fu; Busch, Elena Laura; Carnes, Andrew Mathew; Furic, Ivan-Kresimir; Gleyzer, Sergei; Kotov, Khristian; Madorsky, Alexander; Rorie, Jamal Tildon; Scurlock, Bobby; Shi, Wei; Acosta, Darin Edward
The first implementation of Boosted Decision Trees (BDTs) inside a Level-1 trigger system at the LHC is presented. The Endcap Muon Track Finder (EMTF) at CMS uses BDTs to infer the momentum of muons in the forward region of the detector, based on 25 different variables. Combinations of these variables are evaluated offline using regression BDTs, whose output is stored in 1.2 GB look-up tables (LUTs) in the EMTF hardware. These BDTs take advantage of complex correlations between variables, the inhomogeneous magnetic field, and non-linear effects such as inelastic scattering to distinguish high-momentum signal muons from the overwhelming low-momentum background. The LUTs are used to turn the complex BDT evaluation into a simple look-up operation in fixed low latency. The new momentum assignment algorithm has reduced the trigger rate by a factor of 3 at the 25 GeV trigger threshold with respect to the legacy system, with further improvements foreseen in the coming year.
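The look-up-table idea described above (evaluate the regression offline for every possible discretised input, then replace the evaluation with a single memory read) can be sketched as follows. The bit width, the two input names and the stand-in regression function are hypothetical; the real EMTF address format and trained BDT are not described in the abstract.

```python
import itertools

BITS_PER_INPUT = 3  # hypothetical width; each input takes values 0..7
NUM_INPUTS = 2      # hypothetical; the abstract mentions 25 variables in total


def regress_momentum(delta_phi, delta_theta):
    """Stand-in for an offline-trained regression BDT (purely illustrative)."""
    return 100.0 / (1 + delta_phi) + 0.5 * delta_theta


def pack_address(values, bits_per_input):
    """Concatenate the quantised inputs into one integer table address."""
    address = 0
    for value in values:
        address = (address << bits_per_input) | value
    return address


def build_lut(function, bits_per_input, num_inputs):
    """Evaluate the function offline for every input combination."""
    levels = range(2 ** bits_per_input)
    table = [0.0] * (2 ** (bits_per_input * num_inputs))
    for values in itertools.product(levels, repeat=num_inputs):
        table[pack_address(values, bits_per_input)] = function(*values)
    return table


lut = build_lut(regress_momentum, BITS_PER_INPUT, NUM_INPUTS)

# Online use: one address computation and one read, independent of tree depth.
momentum = lut[pack_address((2, 5), BITS_PER_INPUT)]
print(len(lut), momentum)
```

The table size grows as 2 to the power of the total number of address bits, which is why the input variables have to be combined and compressed before they can address a table of practical size.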
Velásquez, Lía; Cruz-Tirado, J P; Siche, Raúl; Quevedo, Roberto
The aim of this study was to develop a system to classify the marbling of beef using hyperspectral imaging technology. The Japanese standard classification of the degree of marbling of beef was used as reference and twelve standards were digitized to obtain the parameters of shape and spatial distribution of marbling of each class. A total of 35 samples of M. longissimus dorsi muscle were scanned by the hyperspectral imaging system at 400-1000 nm in reflectance mode. The wavelength of 528 nm was selected to segment the sample and the background, and 440 nm was used to classify the samples. Image processing algorithms based on the decision tree method were applied to the region of interest, obtaining a classification error of 0.08% in the building stage. The results showed that the proposed technique has great potential as a non-destructive and fast technique that can be used to classify beef with respect to the degree of marbling. Copyright © 2017 Elsevier Ltd. All rights reserved.
Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma; Alonso-González, Itziar
Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, no satisfactory solution has been found that takes both accuracy and system complexity into account. For lightweight mobile devices these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points and device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system leads to substantial improvements in computational complexity over the widely-used traditional fingerprinting methods, and that it is more accurate than they are.
Text classification is the process of assigning unclassified text to appropriate classes based on its content. The most prevalent representation for text classification is the bag of words vector. In this representation, the words that appear in documents often have multiple morphological structures and grammatical forms. In most cases, these morphological variants of a word belong to the same category. In the first part of this paper, a new stemming algorithm was developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms, namely Khoja's stemmer and our new stemmer (referred to hereafter as origin-stemmer), on Arabic text classification. This investigation was carried out using chi-square as a feature selection method to reduce the dimensionality of the feature space, and a decision tree classifier. In order to evaluate the performance of the classifier, this study used a corpus that consists of 5070 documents independently classified into six categories: sport, entertainment, business, Middle East, switch and world, on the WEKA toolkit. The recall, f-measure and precision measures are used to compare the performance of the obtained models. The experimental results show that text classification using the root stemmer outperforms classification using Khoja's stemmer. The f-measure was 92.9% in the sport category and 89.1% in the business category.
Ana Paula de Assis Maia
Thermal comfort is of great importance in preserving body temperature homeostasis during thermal stress conditions. Although the thermal comfort of horses has been widely studied, there is no report of its relationship with surface temperature (T_S). This study aimed to assess the potential of data mining techniques as a tool to associate surface temperature with thermal comfort of horses. T_S was obtained using infrared thermography image processing. Physiological and environmental variables were used to define the predicted class, which classified thermal comfort as "comfort" and "discomfort". The variables of armpit, croup, breast and groin T_S of horses and the predicted classes were then subjected to a machine learning process. All variables in the dataset were considered relevant for the classification problem and the decision-tree model yielded an accuracy rate of 74 %. The feature selection methods used to reduce computational cost and simplify predictive learning decreased model accuracy to 70 %; however, the model became simpler with easily interpretable rules. For both these selection methods and for the classification using all attributes, armpit and breast T_S had a higher power rating for predicting thermal comfort. Data mining techniques show promise in the discovery of new variables associated with the thermal comfort of horses.
The ATLAS detector will begin taking data from p-p collisions in 2009. This experiment will allow for many different physics measurements and searches. The production of tau leptons at the LHC is a key signature of the decay of both the standard model Higgs (via H → ττ) and SUSY particles. Taus have a short lifetime (cτ = 87 μm) and decay hadronically 65% of the time. Many QCD interactions produce similar hadronic showers and have cross-sections about 1 billion times larger than tau production. Multivariate techniques are therefore often used to distinguish taus from this background. Boosted Decision Trees (BDTs) are a machine-learning technique for developing cut-based discriminants which can significantly aid in extracting small signal samples from overwhelming backgrounds. In this study, BDTs are used for tau identification for the ATLAS experiment. They are a fast, flexible alternative to existing discriminants with comparable or better performance.
Kim, Ki Sook; Kim, Kyung Hee
This study was designed to build a theoretical frame to provide practical help to prevent and manage adolescent internet game addiction by developing a prediction model through a comprehensive analysis of related factors. The participants were 1,318 students studying in elementary, middle, and high schools in Seoul and Gyeonggi Province, Korea. Collected data were analyzed using the SPSS program. Decision Tree Analysis using the Clementine program was applied to build an optimum and significant prediction model to predict internet game addiction related to various factors, especially parent related factors. From the data analyses, the prediction model for factors related to internet game addiction presented with 5 pathways. Causative factors included gender, type of school, siblings, economic status, religion, time spent alone, gaming place, payment to Internet café, frequency, duration, parent's ability to use internet, occupation (mother), trust (father), expectations regarding adolescent's study (mother), supervising (both parents), rearing attitude (both parents). The results suggest preventive and managerial nursing programs for specific groups by path. Use of this predictive model can expand the role of school nurses, not only in counseling addicted adolescents but also, in developing and carrying out programs with parents and approaching adolescents individually through databases and computer programming.
Beef is one of the animal food products with high nutritional value because it contains carbohydrates, proteins, fats, vitamins, and minerals. Therefore, the quality of beef should be maintained so that consumers get good-quality beef. Beef quality is commonly determined visually by comparing the actual beef with reference pictures of each beef class. This process has weaknesses, as it is subjective in nature and takes a considerable amount of time. Therefore, an automated system based on image processing that is capable of determining beef quality is required. This research aims to develop an image segmentation method by processing digital images. The system designed consists of image acquisition processes with varied distance, resolution, and angle. Image segmentation is done to separate the images of fat and meat using the Otsu thresholding method. Classification was carried out using the decision tree algorithm, and the best accuracies obtained were 90% for training and 84% for testing. Once developed, the system was embedded in an Android application. Results show that the image processing technique is capable of proper marbling score identification.
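As a rough illustration of the Otsu thresholding step named in this abstract, the sketch below picks the grey level that maximises the between-class variance of a pixel histogram. It is a minimal pure-Python version written for this listing, not the authors' implementation; the function name `otsu_threshold` and the sample intensities are assumptions chosen only for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form the lower class, the rest the upper class.
    `pixels` is a flat sequence of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class so far
    sum0 = 0  # sum of grey levels in the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical intensities: a dark cluster and a bright cluster
sample = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
t = otsu_threshold(sample)
bright_mask = [p > t for p in sample]  # True where a pixel is in the bright class
```

Which of the two classes corresponds to fat and which to meat depends on the imaging setup, so the mask above only separates bright from dark pixels.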
Acosta, Darin Edward; Busch, Elena Laura; Carnes, Andrew Mathew; Furic, Ivan-Kresimir; Gleyzer, Sergei; Kotov, Khristian; Low, Jia Fu; Madorsky, Alexander; Rorie, Jamal Tildon; Scurlock, Bobby; Shi, Wei
The first implementation of Boosted Decision Trees (BDTs) inside a Level-1 trigger system at the LHC is presented. The Endcap Muon Track Finder (EMTF) at CMS uses BDTs to infer the momentum of muons in the forward region of the detector, based on 25 different variables. Combinations of these variables are evaluated offline using regression BDTs, whose output is stored in 1.2 GB look-up tables (LUTs) in the EMTF hardware. These BDTs take advantage of complex correlations between variables, the inhomogeneous magnetic field, and non-linear effects such as inelastic scattering to distinguish high-momentum signal muons from the overwhelming low-momentum background. The LUTs are used to turn the complex BDT evaluation into a simple look-up operation in fixed low latency. The new momentum assignment algorithm has reduced the trigger rate by a factor of 3 at the 25 GeV trigger threshold with respect to the legacy system, with further improvements foreseen in the coming year.
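The look-up-table idea described here, evaluating an expensive model offline for every possible quantised input and replacing the online evaluation with a single indexed read, can be sketched as follows. Everything in this example is an assumption for illustration: the three 4-bit inputs, the address packing, and the stand-in function `slow_model` have nothing to do with the actual EMTF variables or BDTs.

```python
from itertools import product

BITS_PER_INPUT = 4  # hypothetical quantisation: each input is a 4-bit integer
N_VALUES = 1 << BITS_PER_INPUT


def slow_model(a, b, c):
    """Stand-in for an expensive offline regression (NOT the real BDT)."""
    return 0.5 * a + (b * c) / (1 + a)


def pack_address(a, b, c):
    """Concatenate the three quantised inputs into one table address."""
    return (a << (2 * BITS_PER_INPUT)) | (b << BITS_PER_INPUT) | c


def build_lut():
    """Evaluate the model once for every possible input combination."""
    lut = [0.0] * (1 << (3 * BITS_PER_INPUT))
    for a, b, c in product(range(N_VALUES), repeat=3):
        lut[pack_address(a, b, c)] = slow_model(a, b, c)
    return lut


LUT = build_lut()  # built offline, then loaded into fast memory


def fast_lookup(a, b, c):
    """Online path: one address computation and one memory read."""
    return LUT[pack_address(a, b, c)]
```

The table size grows as 2 to the power of the total number of address bits, which is why the number and precision of the inputs have to be limited in this kind of design.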
Classification problems from different domains vary in complexity, size, and imbalance of the number of samples from different classes. Although several classification models have been proposed, selecting the right model and parameters for a given classification task to achieve good performance is not trivial. Therefore, there is a constant interest in developing novel robust and efficient models suitable for a great variety of data. Here, we propose OmniGA, a framework for the optimization of omnivariate decision trees based on a parallel genetic algorithm, coupled with deep learning structure and ensemble learning methods. The performance of the OmniGA framework is evaluated on 12 different datasets taken mainly from biomedical problems and compared with the results obtained by several robust and commonly used machine-learning models with optimized parameters. The results show that OmniGA systematically outperformed these models for all the considered datasets, reducing the F score error in the range from 100% to 2.25%, compared to the best performing model. This demonstrates that OmniGA produces robust models with improved performance. OmniGA code and datasets are available at www.cbrc.kaust.edu.sa/omniga/.
Entrepreneurial intentions of students are important to recognize during their studies in order to provide those students with an educational background that will support such intentions and lead them to successful entrepreneurship after graduation. The paper aims to develop a model that will classify students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university on a sample of first-year students. Input variables described students’ demographics, importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Due to the large dimension of the input space, a feature selection method was used in the pre-processing stage. For comparison, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure was conducted to test the generalization ability of the models. The models were compared according to their classification accuracy, as well as according to input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which universities could take into consideration when designing their academic programs.
Background: In order to detect potential disease clusters where a putative source cannot be specified, classical procedures scan the geographical area with circular windows through a specified grid imposed on the map. However, the choice of the windows' shapes, sizes and centers is critical, and different choices may not provide exactly the same results. The aim of our work was to use an Oblique Decision Tree model (ODT), which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we have developed an ODT-algorithm to find an oblique partition of the space defined by the geographic coordinates. Methods: ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions maximizing the interclass variance of the independent variable. Since this is an NP-hard problem in R^N, classical ODT-algorithms use evolutionary procedures or heuristics. We have developed an optimal ODT-algorithm in R^2, based on the directions defined by each pair of point locations. This partition provided potential clusters which can be tested with Monte-Carlo inference. We applied the ODT-model to a dataset in order to identify potential high-risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan™. Results: The ODT procedure provided four classes of risk of infection. In the first high-risk class, 60% (95% confidence interval, CI95%: [52.22–67.55]) of the children were infected. Monte-Carlo inference showed that the spatial pattern issued from the ODT-model was significant (p Satscan results yielded one significant cluster where the risk of disease was high with an infectious rate of 54.21%, CI95% [47.51–60.75]. Its center was located within the first high-risk ODT class. Both procedures provided similar results identifying a high risk
Lee, Daniel Joseph; Veneri, Diana A
The most common complaint reported by lower limb prosthesis users is an inadequate socket fit. In many scenarios, adjustments to the residual limb-socket interface can be made by the prosthesis user without consulting a clinician, through skilled self-management. Decision trees guide prosthesis wearers through the self-management process, empowering them to rectify fit issues, or referring them to a clinician when necessary. This study examines the development and acceptability testing of patient-centered decision trees for lower limb prosthesis users. Decision trees underwent a four-stage process: literature review and expert consultation, design, two rounds of expert panel review and revisions, and target audience testing. Fifteen lower limb prosthesis users (average age 61 years) reviewed the decision trees and completed an acceptability questionnaire. Participants reported agreement of 80% or above in five of the eight questions related to acceptability of the decision trees. Disagreement was related to the level of experience of the respondent. Decision trees were found to be easy to use, illustrate correct solutions to common issues, and have terminology consistent with that of a new prosthesis user. Some users with more than 1.5 years of experience would not use the decision trees, relying instead on their own self-management skills. Implications for Rehabilitation: Discomfort of the residual limb-prosthetic socket interface is the most common reason for clinician visits. Prosthesis users can use decision trees to guide them through the process of obtaining a proper socket fit independently. Newer users may benefit from using the decision trees more than experienced users.
Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and take reclamation measures. The natural and anthropogenic sources of Hg in Chinese farmland soil, and their contributions, were identified using a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis achieved an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil.
A digital computer programme has been developed to support the time-consuming process of ranking the importance of construction and operation parameters by determining optimum sets; it is based on the Quine–McCluskey algorithm for minimizing individual partial multi-valued logic functions. An example with real time data, calculated by means of the programme, showed that some of the obtained optimum sets had a different number of real branches when presented as a multi-valued logic decision tree. This motivated a further functionality of the programme: a module that calculates the number of branches of the real, multi-valued logic decision trees representing the optimum sets chosen by the programme. This paper presents the idea and the method for developing this module, which calculates the number of real branches for each of the optimum sets indicated by the programme, as well as the calculation process.
Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim
In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust: for the various numbers of SNPs considered in the interaction, it finds results that are statistically discriminatory, using logical interaction, decision tree and contingency table models. A large number of the SNPs discovered here have already been identified in large genome-wide association studies in the literature as related to type II diabetes, lending additional confidence to the results.
As fraudulent financial statements by enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%.
Zhong, Taiyang; Chen, Dongmei; Zhang, Xiuying
Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and take reclamation measures. The natural and anthropogenic sources of Hg in Chinese farmland soil, and their contributions, were identified using a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis achieved an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil.
Liu, Leo; Sun, Kai; Rather, Zakir Hussain
This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost...
Putranto, Rizky Ade; Wuryandari, Triastuti; Sudarno, Sudarno
Data mining is a process that employs one or more Machine Learning techniques to analyze and extract knowledge automatically. One task in data mining is classification: assigning a new data record to one of several previously defined categories, which is also known as Supervised Learning. The Classification Decision Tree is one of the well-known techniques in data mining and is one of the popular methods in the decision making process of a case in which the method is obtained e...
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula; Magro, Fernando
Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling events and of frequently requiring surgical interventions. This article describes a decision-tree based approach that defines CD patients' risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built by applying the CHAIRT algorithm for the selection of variables. Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50-4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09-0.25] and 0.50 [0.24-1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 months after initial surgery) had a higher proportion of reoperation. The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation.
Xin, Zhong; Hua, Lin; Wang, Xu-Hong; Zhao, Dong; Yu, Cai-Guo; Ma, Ya-Hong; Zhao, Lei; Cao, Xi; Yang, Jin-Kui
We reanalyzed previous data to develop a more simplified decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. Then, the model was validated in another rural town. Only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) with fewer branches were used in the new model. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detect...
Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano
Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
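The Dice similarity coefficient used above as the accuracy measure is defined as DSC = 2|A ∩ B| / (|A| + |B|) for a segmented region A and a reference region B. The sketch below computes it for two sets of voxel coordinates; the function name and the toy voxel sets are assumptions for illustration and are unrelated to the ATLAAS code or data.

```python
def dice_coefficient(segmented, reference):
    """Dice similarity coefficient between two collections of voxels.

    Each voxel is any hashable identifier, e.g. an (x, y, z) tuple.
    Returns 1.0 for perfect overlap and 0.0 for no overlap.
    """
    a, b = set(segmented), set(reference)
    if not a and not b:
        return 1.0  # convention chosen here: two empty contours agree
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: three of four voxels are shared between the two contours
auto_contour = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
true_contour = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
dsc = dice_coefficient(auto_contour, true_contour)  # 2 * 3 / (4 + 4) = 0.75
```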
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.
Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad
The aim of this study was to create a prediction model using a data mining approach to identify individuals at low risk of incident type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a population of 6647 individuals without diabetes, aged ≥20 years and followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for the subjects without diabetes, precision and f-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
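The metrics quoted in this abstract all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions; the counts passed to `classification_metrics` are invented for illustration and are not the study's data. The only figures taken from the abstract are the 92% precision and 97.9% specificity, where specificity is the recall of the "no diabetes" class, and these are used to check that the reported f-measure of 0.95 is consistent with them.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive = disease)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),          # recall of the positive class
        "specificity": tn / (tn + fp),          # recall of the negative class
        "precision_negative": tn / (tn + fn),   # precision of the negative class
    }


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to exercise the formulas
m = classification_metrics(tp=30, fp=20, tn=880, fn=70)

# Consistency check with the figures quoted in the abstract for the
# "no diabetes" class: precision 0.92, recall (specificity) 0.979
f_negative = f_measure(0.92, 0.979)  # about 0.949, i.e. 0.95 when rounded
```

The combination of high accuracy with low sensitivity is typical of imbalanced data, where most subjects belong to the negative class.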
Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid
Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Recent advances in remote sensing have witnessed a great amount of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges in processing, analysing and classifying them effectively due to the high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are directed toward pixel-level spectral differentiation, e.g. the Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.
Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.
Recent advances in remote sensing have witnessed a great amount of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges in processing, analysing and classifying them effectively due to the high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are directed toward pixel-level spectral differentiation, e.g. the Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.
Rohulla Kosari Langari
Information technology and the development of the Internet have changed the world and created competitive knowledge in the field of electronic commerce, increasing the competitive potential among organizations. Under these conditions, the growing volume of commercial transactions, which must be fast and of high quality, depends on a dynamic electronic banking system that uses modern technology to facilitate electronic business processes. Internet banking is regarded as a potential opportunity and as one of the fundamental pillars of e-banking, but in cyberspace it faces various obstacles and threats. One of these challenges is the uncertainty in guaranteeing the security of financial transactions, together with the existence of suspicious and unusual behavior, such as mail fraud, for financial abuse. Various systems based on machine intelligence methods and data mining techniques have been designed to detect fraud in users’ behavior and have been applied in industries such as insurance, medicine and banking. The main aim of this article is to recognize unusual user behavior in e-banking systems, that is, to detect user behavior and categorize the emerging patterns in order to prepare the conditions for predicting unauthorized penetration and detecting suspicious behavior. Detecting user behavior in Internet systems involves uncertainty, and records of transactions can be useful for understanding this behavior. Since, among machine learning methods, the decision tree technique is considered a common tool for classification and prediction, this research first determines the effective banking variables and the weight of each in producing Internet behavior, and then combines the various behavior patterns to extract a model of inductive rules that provides the ability to recognize different behaviors. Finally, four algorithms (Chaid, ex_Chaid, C4.5, C5.0) are compared and evaluated for classification and detection of exist
Fernández, Leónides; Mediano, Pilar; García, Ricardo; Rodríguez, Juan M; Marín, María
Objectives: Lactational mastitis frequently leads to a premature abandonment of breastfeeding; its development has been associated with several risk factors. This study aims to use a decision tree (DT) approach to establish the main risk factors involved in mastitis and to compare its performance for predicting this condition with a stepwise logistic regression (LR) model. Methods: Data from 368 cases (breastfeeding women with mastitis) and 148 controls were collected by a questionnaire about risk factors related to medical history of mother and infant, pregnancy, delivery, postpartum, and breastfeeding practices. The performance of the DT and LR analyses was compared using the area under the receiver operating characteristic (ROC) curve. Sensitivity, specificity and accuracy of both models were calculated. Results: Cracked nipples, antibiotics and antifungal drugs during breastfeeding, infant age, breast pumps, familial history of mastitis and throat infection were significant risk factors associated with mastitis in both analyses. Bottle-feeding and milk supply were related to mastitis for certain subgroups in the DT model. The areas under the ROC curves were similar for LR and DT models (0.870 and 0.835, respectively). The LR model had better classification accuracy and sensitivity than the DT model, but the latter presented better specificity at the optimal threshold of each curve. Conclusions: The DT and LR models constitute useful and complementary analytical tools to assess the risk of lactational infectious mastitis. The DT approach identifies high-risk subpopulations that need specific mastitis prevention programs and, therefore, it could be used to make the most of public health resources.
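The area under the ROC curve used above to compare the two models equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the scores and labels are made up for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    `labels` holds 1 for cases and 0 for controls; ties count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks from some model, with the true outcomes
risk = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
outcome = [1, 1, 0, 1, 0, 0]
auc = roc_auc(risk, outcome)  # 8 of 9 pairs are ordered correctly
```

Two models, such as a logistic regression and a decision tree, can be compared by calling the function once per model on the same set of outcomes.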
Geoffrey H. Donovan
Research demonstrating the biophysical benefits of urban trees is often used to justify investments in urban forestry. Far less emphasis, however, is placed on the non-biophysical benefits such as improvements in public health. Indeed, the public-health benefits of trees may be significantly larger than the biophysical benefits, and, therefore, failure to account for...
Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.
Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. Since sub-daily streamflow information is unavailable for most small basins in China, one of the main challenges is finding appropriate parameter values for simulating flash floods in ungauged catchments. In this study, we use decision tree learning to explore parameter set transferability between different catchments. For this purpose, the physically-based, semi-distributed rainfall-runoff model PRMS-OMS is set up for 35 catchments in ten Chinese provinces. Hourly data from more than 800 storm runoff events are used to calibrate the model and evaluate the performance of parameter set transfers between catchments. For each catchment, 58 catchment attributes are extracted from several data sets available for the whole of China. We then use a data mining technique (decision tree learning) to identify catchment similarities that can be related to good transfer performance. Finally, we use the splitting rules of decision trees for finding suitable donor catchments for ungauged target catchments. We show that decision tree learning makes it possible to optimally utilize the information content of available catchment descriptors and outperforms regionalization based on a conventional measure of physiographic-climatic similarity by 15%-20%. Similar performance can be achieved with a regionalization method based on spatial proximity, but decision trees offer flexible rules for selecting suitable donor catchments, not relying on the vicinity of gauged catchments. This flexibility makes the method particularly suitable for implementation in sparsely gauged environments. We evaluate the probability of detecting flood events exceeding a given return period, considering measured discharge and PRMS-OMS simulated flows with regionalized parameters
Patel, Hiten D; Roberts, Eric T; Constenla, Dagna O
Rotavirus gastroenteritis places a significant health and economic burden on Pakistan. To determine the public health impact of a national rotavirus vaccination program, we performed a cost-effectiveness study from the perspective of the health care system. A decision tree model was developed to assess the cost-effectiveness of a national vaccination program in Pakistan. Disease and cost burden with the program were compared to the current state. Disease parameters, vaccine-related costs, and medical treatment costs were based on published epidemiological and economic data, which were specific to Pakistan when possible. An annual birth cohort of children was followed for 5 years to model the public health impact of vaccination on health-related events and costs. The cost-effectiveness was assessed and quantified in cost (2012 US$) per disability-adjusted life-year (DALY) averted and cost per death averted. Sensitivity analyses were performed to assess the robustness of the incremental cost-effectiveness ratios (ICERs). The base case results showed vaccination prevented 1.2 million cases of rotavirus gastroenteritis, 93,000 outpatient visits, 43,000 hospitalizations, and 6700 deaths by 5 years of age for an annual birth cohort scaled from 6% current coverage to DPT3 levels (85%). The medical cost savings would be US$1.4 million from hospitalizations and US$200,000 from outpatient visit costs. The vaccination program would cost US$35 million at a vaccine price of US$5.00. The ICER was US$149.50 per DALY averted or US$4972 per death averted. Sensitivity analyses showed changes in case-fatality ratio, vaccine efficacy, and vaccine cost exerted the greatest influence on the ICER. Across a range of sensitivity analyses, a national rotavirus vaccination program was predicted to decrease health and economic burden due to rotavirus gastroenteritis in Pakistan by ~40%. Vaccination was highly cost-effective in this context. As discussions of implementing the intervention
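The incremental cost-effectiveness ratio reported above is, in its simplest form, net program cost divided by the health effect averted. The sketch below applies that textbook formula to the rounded figures quoted in the abstract; the function and variable names are hypothetical. With these rounded inputs the result is about US$4,985 per death averted, close to but not identical to the reported US$4,972, presumably because the published model used unrounded values.

```python
def icer(program_cost, medical_savings, health_effect_averted):
    """Net cost per unit of health effect averted (deaths, DALYs, ...)."""
    if health_effect_averted <= 0:
        raise ValueError("health effect averted must be positive")
    return (program_cost - medical_savings) / health_effect_averted


# Rounded figures quoted in the abstract (2012 US$).
program_cost = 35_000_000               # vaccination program cost
medical_savings = 1_400_000 + 200_000   # hospitalization + outpatient savings
deaths_averted = 6_700

cost_per_death_averted = icer(program_cost, medical_savings, deaths_averted)
print(f"US${cost_per_death_averted:,.0f} per death averted")
```

The same function gives cost per DALY averted once the number of DALYs averted is supplied; that number is not stated in the abstract, so it is left out here.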
Andries J Smit
Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE), which are considered to be a carrier of glycometabolic memory. We compared SAF and a SAF-based decision tree (SAF-DM) with fasting plasma glucose (FPG) and HbA1c, and additionally with the Finnish Diabetes Risk Score (FINDRISC) questionnaire±FPG, for detection of oral glucose tolerance test (OGTT)- or HbA1c-defined IGT and diabetes in intermediate-risk persons. Participants had ≥1 metabolic syndrome criteria. They underwent an OGTT, HbA1c, SAF and FINDRISC, in addition to SAF-DM, which includes SAF, age, BMI, and conditional questions on DM family history, antihypertensives, renal or cardiovascular disease events (CVE). 218 persons, age 56 yr, 128M/90F, 97 with previous CVE, participated. With OGTT 28 had DM, 46 IGT, 41 impaired fasting glucose, 103 normal glucose tolerance. SAF alone revealed 23 false positives (FP), 34 false negatives (FN) (sensitivity (S) 68%; specificity (SP) 86%). With SAF-DM, FP were reduced to 18, FN to 16 (5 with DM) (S 82%; SP 89%). HbA1c scored 48 FP, 18 FN (S 80%; SP 75%). Using HbA1c-defined DM-IGT/suspicion ≥6%/42 mmol/mol, SAF-DM scored 33 FP, 24 FN (4 DM) (S 76%; SP 72%), FPG 29 FP, 41 FN (S 71%; SP 80%). FINDRISC≥10 points as detection of HbA1c-based diabetes/suspicion scored 79 FP, 23 FN (S 69%; SP 45%). SAF-DM is superior to FPG and non-inferior to HbA1c to detect diabetes/IGT in intermediate-risk persons. SAF-DM's value for diabetes/IGT screening is further supported by its established performance in predicting diabetic complications.
Garcia Urquia, E. L.; Braun, A.; Yamagishi, H.
Tegucigalpa, the capital city of Honduras, experiences rainfall-induced landslides on a yearly basis. The high precipitation regime and the rugged topography the city has been built in couple with the lack of a proper urban expansion plan to contribute to the occurrence of landslides during the rainy season. Thousands of inhabitants live at risk of losing their belongings due to the construction of precarious shelters in landslide-prone areas on mountainous terrains and next to the riverbanks. Therefore, the city is in need of landslide susceptibility and hazard maps to aid in the regulation of future development. Major challenges in the context of highly dynamic urbanizing areas are the overlap of natural and anthropogenic slope destabilizing factors, as well as the availability and accuracy of data. Data-driven multivariate techniques have proven to be powerful in discovering interrelations between factors, identifying important factors in large datasets, capturing non-linear problems and coping with noisy and incomplete data. This analysis focuses on the creation of a landslide susceptibility map using different methods from the field of data mining: Artificial Neural Networks (ANN), Bayesian Networks (BN) and Decision Trees (DT). The input dataset of the study contains geomorphological and hydrological factors derived from a digital elevation model with a 10 m resolution, lithological factors derived from a geological map, and anthropogenic factors, such as information on the development stage of the neighborhoods in Tegucigalpa and road density. Moreover, a landslide inventory map that was developed in 2014 through aerial photo interpretation was used as the target variable in the analysis. The analysis covers an area of roughly 100 km², while 8.95 km² are occupied by landslides. In a first step, the dataset was explored by assessing and improving the data quality, identifying unimportant variables and finding interrelations. Then, based on a training
Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek
The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
Smith, James F., III; Nguyen, ThanhVu H.
A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user. These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons between trees created using a genetic program and those constructed solely by interviewing experts are made. A new co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have been used to validate the data mining algorithm are referenced.
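To give a feel for the bloat rate quoted above (tree size tripling every 50 generations), the following sketch projects average tree size under the assumption that the growth is geometric and steady between those checkpoints. That assumption, the function name and the starting size of 20 nodes are illustrative only and are not taken from the paper.

```python
def projected_tree_size(initial_size, generation, factor=3.0, period=50):
    """Project tree size assuming it multiplies by `factor` every `period` generations."""
    return initial_size * factor ** (generation / period)


# Hypothetical starting size of 20 nodes, projected over 150 generations.
for generation in (0, 50, 100, 150):
    size = projected_tree_size(20, generation)
    print(f"generation {generation:3d}: about {size:.0f} nodes")
```

Under this reading an unchecked population grows 27-fold in 150 generations, which illustrates why the abstract treats bloat control as a convergence issue and not only a readability issue.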
Kumar, Ashwani; Singh, Tiratha Raj
Alzheimer's disease (AD) is a progressive, incurable and terminal neurodegenerative disorder of the brain and is associated with mutations in amyloid precursor protein, presenilin 1, presenilin 2 or apolipoprotein E, but its underlying mechanisms are still not fully understood. The healthcare sector is generating a large amount of information corresponding to diagnosis, disease identification and treatment of individuals. Mining knowledge from clinical datasets and providing scientific decision-making for the diagnosis and treatment of disease are therefore becoming increasingly necessary. The current study deals with the construction of classifiers for an AD gene dataset, using a decision tree, that are human-readable as well as robust in performance. Classification models for different AD genes were generated according to Mini-Mental State Examination scores and all other vital parameters, in order to identify the expression levels of different proteins of the disorder that may determine the involvement of genes in various AD pathogenesis pathways. The effectiveness of the decision tree in AD diagnosis is determined by information gain with confidence value (0.96), specificity (92%), sensitivity (98%) and accuracy (77%). Besides this functional gene classification using different parameters and enrichment analysis, our findings indicate that the measures of all the genes assessed in single cohorts are sufficient to diagnose AD and will help in the prediction of important parameters for other relevant assessments.
Das, Shiva K.; Zhou Sumin; Zhang, Junan; Yin, F.-F.; Dewhirst, Mark W.; Marks, Lawrence B.
Purpose: To develop and test a model to predict for lung radiation-induced Grade 2+ pneumonitis. Methods and Materials: The model was built from a database of 234 lung cancer patients treated with radiotherapy (RT), of whom 43 were diagnosed with pneumonitis. The model augmented the predictive capability of the parametric dose-based Lyman normal tissue complication probability (LNTCP) metric by combining it with weighted nonparametric decision trees that use dose and nondose inputs. The decision trees were sequentially added to the model using a 'boosting' process that enhances the accuracy of prediction. The model's predictive capability was estimated by 10-fold cross-validation. To facilitate dissemination, the cross-validation result was used to extract a simplified approximation to the complicated model architecture created by boosting. Application of the simplified model is demonstrated in two example cases. Results: The area under the model receiver operating characteristics curve for cross-validation was 0.72, a significant improvement over the LNTCP area of 0.63 (p = 0.005). The simplified model used the following variables to output a measure of injury: LNTCP, gender, histologic type, chemotherapy schedule, and treatment schedule. For a given patient RT plan, injury prediction was highest for the combination of pre-RT chemotherapy, once-daily treatment, female gender and lowest for the combination of no pre-RT chemotherapy and nonsquamous cell histologic type. Application of the simplified model to the example cases revealed that injury prediction for a given treatment plan can range from very low to very high, depending on the settings of the nondose variables. Conclusions: Radiation pneumonitis prediction was significantly enhanced by decision trees that added the influence of nondose factors to the LNTCP formulation
Purvis, Dianna; Aldaghlas, Tayseer; Trickey, Amber W; Rizzo, Anne; Sikdar, Siddhartha
Early detection and treatment of blunt cervical vascular injuries prevent adverse neurologic sequelae. Current screening criteria can miss up to 22% of these injuries. The study objective was to investigate bedside transcranial Doppler sonography for detecting blunt cervical vascular injuries in trauma patients using a novel decision tree approach. This prospective pilot study was conducted at a level I trauma center. Patients undergoing computed tomographic angiography for suspected blunt cervical vascular injuries were studied with transcranial Doppler sonography. Extracranial and intracranial vasculatures were examined with a portable power M-mode transcranial Doppler unit. The middle cerebral artery mean flow velocity, pulsatility index, and their asymmetries were used to quantify flow patterns and develop an injury decision tree screening protocol. Student t tests validated associations between injuries and transcranial Doppler predictive measures. We evaluated 27 trauma patients with 13 injuries. Single vertebral artery injuries were most common (38.5%), followed by single internal carotid artery injuries (30%). Compared to patients without injuries, mean flow velocity asymmetry was higher for single internal carotid artery (P = .003) and single vertebral artery (P = .004) injuries. Similarly, pulsatility index asymmetry was higher in single internal carotid artery (P = .015) and single vertebral artery (P = .042) injuries, whereas the lowest pulsatility index was elevated for bilateral vertebral artery injuries (P = .006). The decision tree yielded 92% specificity, 93% sensitivity, and 93% correct classifications. In this pilot feasibility study, transcranial Doppler measures were significantly associated with the blunt cervical vascular injury status, suggesting that transcranial Doppler sonography might be a viable bedside screening tool for trauma. Patient-specific hemodynamic information from transcranial Doppler assessment has the potential to alter
Yang, Cheng-Hong; Wu, Kuo-Chuan; Dahms, Hans-Uwe; Chuang, Li-Yeh; Chang, Hsueh-Wei
DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most of the conventional DNA barcode sequences contain the whole information of a given barcoding gene. Most of the sequence information does not vary and is uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the amount of noninformative nucleotides in a given barcoding sequence of a major taxon, like the prokaryotes, or eukaryotic animals, plants, or fungi. The actual differences in genetic sequences, called single nucleotide polymorphism (SNP) genotyping, provide a tool for developing a rapid, reliable, and high-throughput assay for the discrimination between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose here a decision tree-based SNP barcoding (DTSB) algorithm where SNP patterns are selected from the DNA barcoding sequence of several evolutionarily related species in order to identify a single species with pigeons as an example. This approach can make use of any established barcoding system. We here firstly used as an example the mitochondrial gene COI information of 17 pigeon species (Columbidae, Aves) using DTSB after sequence trimming and alignment. SNPs were chosen which followed the rule of decision tree and species-specific SNP barcodes. The shortest barcode of about 11 bp was then generated for discriminating 17 pigeon species using the DTSB method. This method provides a sequence alignment and tree decision approach to parsimoniously assign a unique and shortest SNP barcode for any known species of a chosen monophyletic taxon where a barcoding sequence is available.
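The idea of picking a short set of SNP positions that still tells every species apart can be illustrated with a greedy, decision-tree-like selection over an alignment. This is only a sketch of the general idea and should not be read as the authors' DTSB algorithm: the function names and the four toy sequences are hypothetical, and a greedy choice is not guaranteed to find the shortest possible barcode.

```python
def variable_sites(alignment):
    """Alignment columns where at least two sequences differ (candidate SNPs)."""
    seqs = list(alignment.values())
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def count_groups(alignment, sites):
    """Number of distinct patterns the species show at the given sites."""
    return len({tuple(seq[i] for i in sites) for seq in alignment.values()})


def select_snp_barcode(alignment):
    """Greedily add the site that separates the most species until all are unique."""
    candidates = variable_sites(alignment)
    chosen = []
    while count_groups(alignment, chosen) < len(alignment):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining, key=lambda c: count_groups(alignment, chosen + [c]),
                   default=None)
        if best is None or (count_groups(alignment, chosen + [best])
                            == count_groups(alignment, chosen)):
            raise ValueError("sequences cannot be fully discriminated")
        chosen.append(best)
    return sorted(chosen)


# Hypothetical aligned sequences of equal length (not real COI data).
toy_alignment = {
    "species_A": "ACGTAC",
    "species_B": "ACGTTC",
    "species_C": "ATGTAC",
    "species_D": "ATGTTC",
}

barcode_sites = select_snp_barcode(toy_alignment)
barcodes = {name: "".join(seq[i] for i in barcode_sites)
            for name, seq in toy_alignment.items()}
```

In the toy alignment only two of the six columns vary, and together they give each species a distinct two-letter barcode, which mirrors the abstract's point that most barcode positions are uninformative within a group of related taxa.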
Das, Shiva K; Zhou, Sumin; Zhang, Junan; Yin, Fang-Fang; Dewhirst, Mark W; Marks, Lawrence B
To develop and test a model to predict for lung radiation-induced Grade 2+ pneumonitis. The model was built from a database of 234 lung cancer patients treated with radiotherapy (RT), of whom 43 were diagnosed with pneumonitis. The model augmented the predictive capability of the parametric dose-based Lyman normal tissue complication probability (LNTCP) metric by combining it with weighted nonparametric decision trees that use dose and nondose inputs. The decision trees were sequentially added to the model using a "boosting" process that enhances the accuracy of prediction. The model's predictive capability was estimated by 10-fold cross-validation. To facilitate dissemination, the cross-validation result was used to extract a simplified approximation to the complicated model architecture created by boosting. Application of the simplified model is demonstrated in two example cases. The area under the model receiver operating characteristics curve for cross-validation was 0.72, a significant improvement over the LNTCP area of 0.63 (p = 0.005). The simplified model used the following variables to output a measure of injury: LNTCP, gender, histologic type, chemotherapy schedule, and treatment schedule. For a given patient RT plan, injury prediction was highest for the combination of pre-RT chemotherapy, once-daily treatment, female gender and lowest for the combination of no pre-RT chemotherapy and nonsquamous cell histologic type. Application of the simplified model to the example cases revealed that injury prediction for a given treatment plan can range from very low to very high, depending on the settings of the nondose variables. Radiation pneumonitis prediction was significantly enhanced by decision trees that added the influence of nondose factors to the LNTCP formulation.
Smith, R.; Kasprzyk, J. R.; Balaji, R.
In light of deeply uncertain factors like future climate change and population shifts, responsible resource management will require new types of information and strategies. For water utilities, this entails potential expansion and efficient management of water supply infrastructure systems for changes in overall supply; changes in frequency and severity of climate extremes such as droughts and floods; and variable demands, all while accounting for conflicting long and short term performance objectives. Multiobjective Evolutionary Algorithms (MOEAs) are emerging decision support tools that have been used by researchers and, more recently, water utilities to efficiently generate and evaluate thousands of planning portfolios. The tradeoffs between conflicting objectives are explored in an automated way to produce (often large) suites of portfolios that strike different balances of performance. Once generated, the sets of optimized portfolios are used to support relatively subjective assertions of priorities and human reasoning, leading to adoption of a plan. These large tradeoff sets contain information about complex relationships between decisions and between groups of decisions and performance that, until now, has not been quantitatively described. We present a novel use of Multivariate Regression Trees (MRTs) to analyze tradeoff sets to reveal these relationships and critical decisions. Additionally, when MRTs are applied to tradeoff sets developed for different realizations of an uncertain future, they can identify decisions that are robust across a wide range of conditions and produce fundamental insights about the system being optimized.
Astuti, Yuniar Andi
This study examines how the Support Vector Regression (SVR) and C4.5 decision tree techniques have been used in studies in various fields, in order to identify the advantages and disadvantages of these two data mining techniques. Across ten studies that used both techniques, the analysis showed an accuracy of 59.64% for SVR and 76.97% for C4.5. The study therefore concludes that C4.5 performs better than SVR.
Zhang, Xuehong; Treitz, Paul M.; Chen, Dongmei; Quan, Chang; Shi, Lixin; Li, Xinhui
Mangrove forests grow in intertidal zones in tropical and subtropical regions and have suffered a dramatic decline globally over the past few decades. Remote sensing data, collected at various spatial resolutions, provide an effective way to map the spatial distribution of mangrove forests over time. However, the spectral signatures of mangrove forests are significantly affected by tide levels. Therefore, mangrove forests may not be accurately mapped with remote sensing data collected during a single-tidal event, especially if not acquired at low tide. This research reports how a decision-tree-based procedure was developed to map mangrove forests using multi-tidal Landsat 5 Thematic Mapper (TM) data and a Digital Elevation Model (DEM). Three indices, including the Normalized Difference Moisture Index (NDMI), the Normalized Difference Vegetation Index (NDVI) and NDVI_L·NDMI_H (the product of NDVI_L and NDMI_H; L: low tide level, H: high tide level), were used in this algorithm to differentiate mangrove forests from other land-cover and land-use types in Fangchenggang City, China. Additionally, recent Landsat 8 OLI (Operational Land Imager) data were selected to validate the results and to assess whether the methodology is reliable. The results demonstrate that short-term multi-tidal remotely-sensed data better represent the unique nearshore coastal wetland habitats of mangrove forests than single-tidal data. Furthermore, multi-tidal remotely-sensed data have led to improved accuracies using two classification approaches, i.e. decision trees and the maximum likelihood classification (MLC). Since mangrove forests are typically found at low elevations, the inclusion of elevation data in the two classification procedures was tested. Given that the decision-tree method does not assume strict data distribution parameters, it was able to optimize the application of multi-tidal and elevation data, resulting in higher classification accuracies of mangrove forests. When using multi
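The three indices named in this abstract follow the standard normalized-difference form: NDVI = (NIR - Red) / (NIR + Red) and NDMI = (NIR - SWIR1) / (NIR + SWIR1), which for Landsat 5 TM correspond to bands 4 and 3 and bands 4 and 5 respectively. The sketch below computes them for a single pixel and forms the low-tide NDVI times high-tide NDMI product. The reflectance values and function names are hypothetical, and the thresholds used in the study's decision tree are not given in the abstract, so none are shown.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index."""
    return normalized_difference(nir, swir1)


def multi_tidal_index(low_tide_pixel, high_tide_pixel):
    """NDVI from the low-tide scene multiplied by NDMI from the high-tide scene."""
    return (ndvi(low_tide_pixel["nir"], low_tide_pixel["red"])
            * ndmi(high_tide_pixel["nir"], high_tide_pixel["swir1"]))


# Hypothetical surface reflectances for one pixel in two scenes.
low_tide = {"red": 0.10, "nir": 0.40, "swir1": 0.15}
high_tide = {"red": 0.08, "nir": 0.30, "swir1": 0.10}

combined = multi_tidal_index(low_tide, high_tide)  # 0.6 * 0.5
```

A decision-tree classifier would then compare values such as `combined`, the single-date indices and elevation against split thresholds learned from training pixels.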
Liu, Leo; Rather, Zakir Hussain; Chen, Zhe
Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time... ...and adopts a methodology of importance sampling to maximize the information contained in the database so as to increase the accuracy of DT. Further, this paper also studies the effectiveness of DT by implementing its corresponding preventive control schemes. These approaches are tested on the detailed model...
Shi Chunsheng; Meng Dapeng
The prediction index for supply risk is developed based on factor identification in the nuclear equipment manufacturing industry. The supply risk prediction model is established with the method of support vector machine and decision tree, based on an investigation of 3 important nuclear power equipment manufacturing enterprises and 60 suppliers. A final case study demonstrates that the combination model is better than the single prediction model, and demonstrates the feasibility and reliability of this model, which provides a method to evaluate the suppliers and measure the supply risk. (authors)
Djulbegovic, Benjamin; Hozo, Iztok; Dale, William
Contemporary delivery of health care is inappropriate in many ways, largely due to suboptimal decision-making. A typical approach to improve practitioners' decision-making is to develop evidence-based clinical practice guidelines (CPG) by guidelines panels, who are instructed to use their judgments to derive practice recommendations. However, mechanisms for the formulation of guideline judgments remain a "black-box" operation: a process with defined inputs and outputs but without sufficient knowledge of its internal workings. Increased explicitness and transparency in the process can be achieved by implementing CPG as clinical pathways (CPs) (also known as clinical algorithms or flow-charts). However, clinical recommendations thus derived are typically ad hoc and developed by experts in a theory-free environment. As any recommendation can be right (true positive or negative), or wrong (false positive or negative), the lack of theoretical structure precludes the quantitative assessment of the management strategies recommended by CPGs/CPs. To realize the full potential of CPGs/CPs, they need to be placed on more solid theoretical grounds. We believe this potential can be best realized by recasting CPGs/CPs within the heuristic theory of decision-making, often implemented as fast-and-frugal decision trees (FFTs). This is possible because the FFT heuristic strategy of decision-making can be linked to signal detection theory, evidence accumulation theory, and a threshold model of decision-making, which, in turn, allows quantitative analysis of the accuracy of clinical management strategies. The fast-and-frugal approach provides a simple and transparent, yet solid and robust, methodological framework connecting decision science to clinical care, a sorely needed missing link between CPGs/CPs and patient outcomes. We therefore advocate that all guidelines panels express their recommendations as CPs, which in turn should be converted into FFTs to guide clinical care. © 2018 John Wiley
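A fast-and-frugal tree asks a short, fixed sequence of yes/no questions, and every question except the last has exactly one branch that exits with a decision while the other moves on to the next question. The sketch below shows that control flow only. The cue names, their order and the "refer"/"discharge" decisions are hypothetical placeholders, not clinical guidance and not taken from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Cue:
    name: str
    test: Callable[[dict], bool]
    exit_if_true: Optional[str]    # decision when the cue is present, None = continue
    exit_if_false: Optional[str]   # decision when the cue is absent, None = continue


def fft_decide(cues: List[Cue], case: dict) -> Tuple[str, str]:
    """Walk the cues in order; return (decision, name of the cue that decided)."""
    for cue in cues:
        decision = cue.exit_if_true if cue.test(case) else cue.exit_if_false
        if decision is not None:
            return decision, cue.name
    raise ValueError("the last cue must exit on both branches")


# Hypothetical three-cue tree; only the final cue exits on both branches.
example_tree = [
    Cue("red_flag", lambda c: c["red_flag"], "refer", None),
    Cue("normal_test", lambda c: c["normal_test"], "discharge", None),
    Cue("risk_factor", lambda c: c["risk_factor"], "refer", "discharge"),
]

first = fft_decide(example_tree, {"red_flag": True, "normal_test": False, "risk_factor": False})
second = fft_decide(example_tree, {"red_flag": False, "normal_test": True, "risk_factor": True})
third = fft_decide(example_tree, {"red_flag": False, "normal_test": False, "risk_factor": False})
```

Because each case leaves the tree at a known cue, the decisions can be tallied against outcomes as true or false positives and negatives, which is the kind of quantitative assessment the abstract argues pathways currently lack.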
Fokkema, M.; Smits, N.; Kelderman, H.; Carlier, I.V.E.; van Hemert, A.M.
For classification problems in psychology (e.g., clinical diagnosis), batteries of tests are often administered. However, not every test or item may be necessary for accurate classification. In the current article, a combination of classification and regression trees (CART) and stochastic
The aim of this study was to verify whether and which parameters of the atmospheric pollen season can distinguish between pollen types, the ranges of parameter values that delineate classes of taxa, and finally which taxa are similar to others within the domain of these parameter ranges. Decision tree algorithms were applied and the best tree was chosen to describe the rules of pollen classification. The study material consisted of airborne pollen grains of the following eight taxa: Alnus, Betula, Carpinus, Corylus, Cupressaceae, Fraxinus, Populus and Ulmus. Research was conducted in Lublin in eastern Poland during 2001-2013. The following six atmospheric pollen season parameters were analyzed: season start and end, duration, maximum daily pollen concentration, date of maximum pollen concentration, and the Seasonal Pollen Index (SPI). Four algorithms were used in data analysis and the J4.8 algorithm was chosen as the best for taxa classification; the date of the end of the season and the SPI value were among the characteristics that served most to discriminate between pollen types. Based on the classification tree, the following four groups of taxa were identified: (i) Ulmus; (ii) Corylus, Alnus, Populus; (iii) Betula; and (iv) Carpinus, Fraxinus, Cupressaceae.
Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.
The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.
Delphin, S; Escobedo, F J; Abd-Elrahman, A; Cropper, W
Information on the effect of direct drivers such as hurricanes on ecosystem services is relevant to landowners and policy makers due to predicted effects from climate change. We identified forest damage risk zones due to hurricanes and estimated the potential loss of 2 key ecosystem services: aboveground carbon storage and timber volume. Using land cover, plot-level forest inventory data, the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, and a decision tree-based framework; we determined potential damage to subtropical forests from hurricanes in the Lower Suwannee River (LS) and Pensacola Bay (PB) watersheds in Florida, US. We used biophysical factors identified in previous studies as being influential in forest damage in our decision tree and hurricane wind risk maps. Results show that 31% and 0.5% of the total aboveground carbon storage in the LS and PB, respectively was located in high forest damage risk (HR) zones. Overall 15% and 0.7% of the total timber net volume in the LS and PB, respectively, was in HR zones. This model can also be used for identifying timber salvage areas, developing ecosystem service provision and management scenarios, and assessing the effect of other drivers on ecosystem services and goods. Copyright © 2013 Elsevier Ltd. All rights reserved.
Miller, Brian; Fridline, Mark; Liu, Pei-Yang; Marino, Deborah
Metabolic syndrome (MetS) in young adults (age 20-39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both CHAID model with no user-specified first level and logistic regression based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3% with its ability to detect MetS at 71.8% which outperformed comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further followup based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS.
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers for effectively detecting and classifying different types of fall and non-fall events. The first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and a support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the range of isothermality, minimum temperature of the coldest month, precipitation of the warmest quarter of the year, altitude and the average daily land surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
Kosuda, Shigeru; Momiyama, Yukihiko; Ohsuzu, Fumitaka; Kusano, Shoichi; Ichihara, Kiyoshi
To evaluate the potential cost-effectiveness of exercise ²⁰¹Tl myocardial SPECT in outpatients with angina-like chest pain, we developed a decision-tree model comprising three 1000-patient groups, i.e., a coronary arteriography (CAG) group, a follow-up group, and a SPECT group, and calculated total cost and cardiac events, including cardiac deaths. Variables used for the decision-tree analysis were obtained from references and the data available at our hospital. The sensitivity and specificity of ²⁰¹Tl SPECT for diagnosing angina pectoris, and its prevalence, were assumed to be 95%, 85%, and 33%, respectively. The mean costs were 84.9 × 10⁴ yen/patient in the CAG group, 30.2 × 10⁴ yen/patient in the follow-up group, and 71.0 × 10⁴ yen/patient in the SPECT group. The numbers of cardiac events and cardiac deaths were 56 and 15, respectively, in the CAG group, 264 and 81 in the follow-up group, and 65 and 17 in the SPECT group. SPECT increases cardiac events and cardiac deaths by 0.9% and 0.2%, but it reduces the number of CAG studies by 50.3%, and saves 13.8 × 10⁴ yen/patient, as compared to the CAG group. In conclusion, the exercise ²⁰¹Tl myocardial SPECT strategy for patients with chest pain has the potential to reduce health care costs in Japan. (author)
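The group-level comparison reported in this abstract can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the figures quoted above; the dictionary layout and the function name `compare` are illustrative assumptions and not part of the authors' decision-tree model, whose internal branch probabilities are not given in the abstract.

```python
# Figures quoted in the abstract, per group of 1000 patients.
# Costs are in units of 10^4 yen per patient.
GROUP_SIZE = 1000
groups = {
    "CAG": {"cost": 84.9, "events": 56, "deaths": 15},
    "follow-up": {"cost": 30.2, "events": 264, "deaths": 81},
    "SPECT": {"cost": 71.0, "events": 65, "deaths": 17},
}


def compare(strategy, reference):
    """Return strategy-minus-reference differences in cost and in
    event/death rates (percentage points of the 1000-patient group)."""
    s, r = groups[strategy], groups[reference]
    return {
        "cost_diff": round(s["cost"] - r["cost"], 1),
        "event_rate_diff_pct": round(100 * (s["events"] - r["events"]) / GROUP_SIZE, 1),
        "death_rate_diff_pct": round(100 * (s["deaths"] - r["deaths"]) / GROUP_SIZE, 1),
    }


result = compare("SPECT", "CAG")
print(result)
```

With these inputs the event and death differences come out at 0.9 and 0.2 percentage points, matching the abstract. The rounded mean costs give a saving of about 13.9 × 10⁴ yen/patient, slightly different from the quoted 13.8, presumably because the published costs are themselves rounded.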
Ainscough, Kate M; Lindsay, Karen L; O'Sullivan, Elizabeth J; Gibney, Eileen R; McAuliffe, Fionnuala M
Antenatal healthy lifestyle interventions are frequently implemented in overweight and obese pregnancy, yet there is inconsistent reporting of the behaviour-change methods and behavioural outcomes. This limits our understanding of how and why such interventions were successful or not. The current paper discusses the application of behaviour-change theories and techniques within complex lifestyle interventions in overweight and obese pregnancy. The authors propose a decision tree to help guide researchers through intervention design, implementation and evaluation. The implications for adopting behaviour-change theories and techniques, and using appropriate guidance when constructing and evaluating interventions in research and clinical practice are also discussed. To enhance the evidence base for successful behaviour-change interventions during pregnancy, adoption of behaviour-change theories and techniques, and use of published guidelines when designing lifestyle interventions are necessary. The proposed decision tree may be a useful guide for researchers working to develop effective behaviour-change interventions in clinical settings. This guide directs researchers towards key literature sources that will be important in each stage of study development.
Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.; Guo, L.
Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. One of the main challenges of setting up such a system is finding appropriate model parameter values for ungauged catchments. Previous studies have shown that the transfer of parameter sets from hydrologically similar gauged catchments is one of the best performing regionalization methods. However, a remaining key issue is the identification of suitable descriptors of similarity. In this study, we use decision tree learning to explore parameter set transferability in the full space of catchment descriptors. For this purpose, a semi-distributed rainfall-runoff model is set up for 35 catchments in ten Chinese provinces. Hourly runoff data from a total of 858 storm events are used to calibrate the model and to evaluate the performance of parameter set transfers between catchments. We then present a novel technique that uses the splitting rules of classification and regression trees (CART) for finding suitable donor catchments for ungauged target catchments. The ability of the model to detect flood events in assumed ungauged catchments is evaluated in a series of leave-one-out tests. We show that CART analysis increases the probability of detection of 10-year flood events in comparison to a conventional measure of physiographic-climatic similarity by up to 20%. Decision tree learning can outperform other regionalization approaches because it generates rules that optimally consider spatial proximity and physical similarity. Spatial proximity can be used as a selection criterion but is skipped in the case where no similar gauged catchments are in the vicinity. We conclude that the CART regionalization concept is particularly suitable for implementation in sparsely gauged and topographically complex environments where a proximity
Fenja V Ziegler
BACKGROUND: People tend to prefer a smaller immediate reward to a larger but delayed reward. Although this discounting of future rewards is often associated with impulsivity, it is not necessarily irrational. Instead it has been suggested that it reflects the decision maker's greater interest in the 'me now' than the 'me in 10 years', such that the concern for our future self is about the same as for someone else who is close to us. METHODOLOGY/PRINCIPAL FINDINGS: To investigate this we used a delay-discounting task to compare discount functions for choices that people would make for themselves against decisions that they think that other people should make, e.g. to accept $500 now or $1000 next week. The psychological distance of the hypothetical beneficiaries was manipulated in terms of the genetic coefficient of relatedness, ranging from zero (e.g. a stranger, or unrelated close friend), .125 (e.g. a cousin), .25 (e.g. a nephew or niece), to .5 (parent or sibling). CONCLUSIONS/SIGNIFICANCE: The observed discount functions were steeper (i.e. more impulsive) for choices in which the decision-maker was the beneficiary than for all other beneficiaries. Impulsiveness of decisions declined systematically with the distance of the beneficiary from the decision-maker. The data are discussed with reference to the impulsivity and interpersonal empathy gaps in decision-making.
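To make the notion of a "steeper" discount function concrete, here is a minimal Python sketch. It assumes a hyperbolic form, V = A / (1 + kD), which is a common choice in the delay-discounting literature but is not stated in this abstract; the function names and the values of the discount rate k are hypothetical and chosen only to illustrate the $500-now versus $1000-next-week choice.

```python
def discounted_value(amount, delay_days, k):
    """Subjective present value under an assumed hyperbolic discount function."""
    return amount / (1.0 + k * delay_days)


def prefers_immediate(now_amount, later_amount, delay_days, k):
    """True if the immediate reward is worth more than the discounted delayed one."""
    return now_amount > discounted_value(later_amount, delay_days, k)


# A larger k means a steeper discount function, i.e. a more impulsive chooser.
steep = prefers_immediate(500, 1000, delay_days=7, k=0.20)    # hypothetical k
shallow = prefers_immediate(500, 1000, delay_days=7, k=0.05)  # hypothetical k
print(steep, shallow)
```

Under this assumed form, the chooser with the steeper function takes the $500 now, while the one with the shallower function waits a week for $1000.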
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free-electron lasers, especially at the 6.45 μm line, which is used in surgery for processing biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. Using different cluster methods, it is shown that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.
Alyahya, Mohammad S; Hijazi, Heba H; Alshraideh, Hussam A; Al-Nasser, Amjad D
There is a growing concern that reduction in hospital length of stay (LOS) may raise the rate of hospital readmission. This study aims to identify the rate of avoidable 30-day readmission and to determine the association between LOS and readmission. All consecutive patient admissions to the internal medicine services (n = 5,273) at King Abdullah University Hospital in Jordan between 1 December 2012 and 31 December 2013 were analyzed. To identify avoidable readmissions, a validated computerized algorithm called SQLape was used. Multinomial logistic regression was employed first. Then, detailed analysis was performed using the Decision Trees (DTs) model, one of the most widely used data mining algorithms in Clinical Decision Support Systems (CDSS). The potentially avoidable 30-day readmission rate was 44%, and patients with longer LOS were more likely to be readmitted avoidably. However, LOS had a significant negative effect on unavoidable readmissions. The avoidable readmission rate is still highly unacceptable. Because LOS potentially increases the likelihood of avoidable readmission, it is still possible to achieve a shorter LOS without increasing the readmission rate. Moreover, the way the DT model classified patient subgroups of readmissions based on patient characteristics and LOS is applicable in real clinical decisions.
Snowden, Jessica A.; Leon, Scott C.; Bryant, Fred B.; Lyons, John S.
This study explored clinical and nonclinical predictors of inpatient hospital admission decisions across a sample of children in foster care over 4 years (N = 13,245). Forty-eight percent of participants were female and the mean age was 13.4 (SD = 3.5 years). Optimal data analysis (Yarnold & Soltysik, 2005) was used to construct a nonlinear…
Zhao Hongquan; Kasai, Seiya; Shiratori, Yuta; Hashizume, Tamotsu
A two-bit arithmetic logic unit (ALU) was successfully fabricated on a GaAs-based regular nanowire network with hexagonal topology. This fundamental building block of central processing units can be implemented on a regular nanowire network structure with simple circuit architecture based on graphical representation of logic functions using a binary decision diagram and topology control of the graph. The four-instruction ALU was designed by integrating subgraphs representing each instruction, and the circuitry was implemented by transferring the logical graph structure to a GaAs-based nanowire network formed by electron beam lithography and wet chemical etching. A path switching function was implemented in nodes by Schottky wrap gate control of nanowires. The fabricated circuit integrating 32 node devices exhibits the correct output waveforms at room temperature allowing for threshold voltage variation.
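The idea of computing a logic function by switching paths through a binary decision diagram can be illustrated in software. The Python sketch below is a generic illustration only: the tuple encoding, the names `XOR_BDD`, `AND_BDD` and `evaluate`, and the choice of a half adder as the example are assumptions made here, not the authors' four-instruction ALU graph or its hexagonal nanowire layout.

```python
# Each internal node is (variable, low_child, high_child); terminals are 0 and 1.
# Sum bit of a half adder: XOR(a, b), testing a first and then b.
XOR_BDD = ("a", ("b", 0, 1), ("b", 1, 0))
# Carry bit of a half adder: AND(a, b).
AND_BDD = ("a", 0, ("b", 0, 1))


def evaluate(node, inputs):
    """Follow a single root-to-terminal path, switching at each node
    according to the value of that node's input variable."""
    while node not in (0, 1):
        var, low, high = node
        node = high if inputs[var] else low
    return node


for a in (0, 1):
    for b in (0, 1):
        bits = {"a": a, "b": b}
        print(a, b, "sum:", evaluate(XOR_BDD, bits), "carry:", evaluate(AND_BDD, bits))
```

Each evaluation visits at most one node per variable, which is the property that lets a node device act as a simple two-way path switch.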
Schetinin, Vitaly; Jakaite, Livia; Jakaitis, Janis; Krzanowski, Wojtek
Trauma and Injury Severity Score (TRISS) models have been developed for predicting the survival probability of injured patients, the majority of whom sustain up to three injuries in six body regions. Practitioners have noted that the accuracy of TRISS predictions is unacceptable for patients with a larger number of injuries. Moreover, the TRISS method is incapable of providing accurate estimates of the predictive density of survival, which are required for calculating confidence intervals. In this paper we propose Bayesian inference for estimating the desired predictive density. The inference is based on decision tree models, which split data along explanatory variables and are therefore interpretable. The proposed method has outperformed the TRISS method in terms of accuracy of prediction on the cases recorded in the US National Trauma Data Bank. The developed method has been made available for evaluation purposes as a stand-alone application. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Wang, Hongcui; Kawahara, Tatsuya
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily falls in the trade-off of coverage of errors and the increase of perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.
Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim
Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard
integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy......-intensive but efficient vessels conducting pelagic or industrial fishing are more inclined to base their decision on fish price only, while numerous smaller and less efficient vessels conducting demersal mixed or crustacean fishery usually consider other flexible factors, e.g., the potential for a large catch, weather...... the adaptations of individual fishermen to resource availability dynamics, increasing fuel prices, changes in regulations, and the consequences of socioeconomic external pressures on harvested stocks. A new methodology is described here to obtain quantitative information on the fishermen’s micro-scale decisions...
Woodruff, Katherine [New Mexico State U.
MicroBooNE is a liquid argon time projection chamber (LArTPC) neutrino experiment that is currently running in the Booster Neutrino Beam at Fermilab. LArTPC technology allows for high-resolution, three-dimensional representations of neutrino interactions. A wide variety of software tools for automated reconstruction and selection of particle tracks in LArTPCs are actively being developed. Short, isolated proton tracks, the signal for low-momentum-transfer neutral current (NC) elastic events, are easily hidden in a large cosmic background. Detecting these low-energy tracks will allow us to probe interesting regions of the proton's spin structure. An effective method for selecting NC elastic events is to combine a highly efficient track reconstruction algorithm to find all candidate tracks with highly accurate particle identification using a machine learning algorithm. We present our work on particle track classification using gradient tree boosting software (XGBoost) and the performance on simulated neutrino data.
Ljubica Milanović Glavan
Companies worldwide are embracing Business Process Orientation (BPO) in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing the maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods in detecting key turning points in BPO. Based on survey results obtained in 2013, the selected data mining technique of classification and regression trees (C&RT) was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.
Šlapák, M.; Neruda, Roman
Vol. 38, No. 4 (2017), pp. 335-342. ISSN 0257-4306. Institutional support: RVO:67985807. Keywords: auction systems; decision making; genetic programming; multi-agent system; task distribution. Subject RIV: IN - Informatics, Computer Science. OECD field: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8). http://rev-inv-ope.univ-paris1.fr/fileadmin/rev-inv-ope/files/38417/38417-04.pdf
Ma Hongxia; Guo Yulin; Wang Qiuping; Qiang Yongqian; Liu Min; Guo Xiaojuan; Guo Youmin; Chen Qihang
Objective: To establish a classification and regression tree (CART) for differentiating benign from malignant solitary pulmonary nodules (SPN). Methods: One hundred and sixteen consecutive cases with 116 solitary pulmonary nodules, which were finally pathologically proven to be 54 malignant nodules and 62 benign nodules, were prospectively registered in this research. Twelve clinical presentations and 22 CT findings were collected as predictors. A classification tree was established to distinguish benign SPNs from malignant ones. In the observer test, two groups (one made up of junior radiologists and one of senior radiologists) were independently presented with clinical information and CT images without knowing the pathologic and machine-learning results. Performance of observers and CART were compared by receiver operating characteristic analysis. Results: Receiver operating characteristic analysis showed that the areas under the curve of CART, senior radiologists and junior radiologists were 0.910±0.029, 0.827±0.038 and 0.612±0.052, respectively. The difference between areas (DBF) between CART and junior radiologists was 0.297 (P<0.01). DBF between CART and senior radiologists was 0.083 (P<0.05). DBF between senior and junior radiologists was 0.214 (P<0.01). CART showed the best diagnostic efficiency, followed by senior radiologists, and then junior radiologists. Conclusion: Our data mining techniques using CART prove highly accurate in differentiating benign from malignant pulmonary nodules based on clinical variables and CT findings. It will be a potentially useful tool in further application of artificial intelligence in imaging diagnosis. (authors)
Shrader, Adrian M.; Bell, Caroline; Bertolli, Liandra; Ward, David
For herbivores, food is distributed spatially in a hierarchical manner ranging from plant parts to regions. Ultimately, utilisation of food is dependent on the scale at which herbivores make foraging decisions. A key factor that influences these decisions is body size, because selection inversely relates to body size. As a result, large animals can be less selective than small herbivores. Savanna elephants (Loxodonta africana) are the largest terrestrial herbivore. Thus, they represent a potential extreme with respect to unselective feeding. However, several studies have indicated that elephants prefer specific habitats and certain woody plant species. Thus, it is unclear at which scale elephants focus their foraging decisions. To determine this, we recorded the seasonal selection of habitats and woody plant species by elephants in the Ithala Game Reserve, South Africa. We expected that during the wet season, when both food quality and availability were high, elephants would select primarily for habitats. This, however, does not mean that they would utilise plant species within these habitats in proportion to availability, but rather that they would show a stronger selection for habitats compared to plants. In contrast, during the dry season, when food quality and availability declined, we expected that elephants would shift and select for the remaining high quality woody species across all habitats. Consistent with our predictions, elephants selected for the larger spatial scale (i.e. habitats) during the wet season. However, elephants did not increase their selection of woody species during the dry season, but rather increased their selection of habitats relative to woody plant selection. Unlike a number of earlier studies, we found that neither palatability (i.e. crude protein, digestibility, and energy) alone nor tannin concentrations had a significant effect on the elephants' selection of woody species. However, the palatability:tannin ratio was
We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
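The smoothing step, in which each segment's decision is averaged with past segment decisions, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the per-segment score encoding (1 for speech, 0 for music), the window of three past segments, the 0.5 threshold, and the function name `smooth_decisions` are all hypothetical, since the abstract does not give the authors' actual parameters.

```python
from collections import deque


def smooth_decisions(raw_scores, history=3, threshold=0.5):
    """Label each segment from the mean of its own score and up to
    `history` preceding scores (1 = speech, 0 = music)."""
    window = deque(maxlen=history + 1)  # current segment plus past segments
    labels = []
    for score in raw_scores:
        window.append(score)
        mean = sum(window) / len(window)
        labels.append("speech" if mean >= threshold else "music")
    return labels


# An isolated "music" decision at index 3 inside a run of speech,
# followed by a genuine switch to music from index 6 onward.
raw = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
labels = smooth_decisions(raw)
print(labels)
```

In this toy sequence the isolated flip is suppressed, while the sustained change is followed after a delay of two segments; the window length trades off stability against how quickly the label adjusts to alternating sections.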
Lee, Seung-Mi; Kang, Jin-Oh; Suh, Yong-Moo
Analysis and prediction of the care charges related to colorectal cancer in Korea are important for the allocation of medical resources and the establishment of medical policies because the incidence of and the hospital charges for colorectal cancer are rapidly increasing. However, previous studies based on statistical analysis to predict the hospital charges for patients did not show satisfactory results. Recently, data mining has emerged as a new technique to extract knowledge from huge and diverse medical data. Thus, we built models using data mining techniques to predict hospital charges for the patients. A total of 1,022 admission records with 154 variables, from 492 patients who had been treated from 1999 to 2002 in the Kyung Hee University Hospital, were used to build prediction models. We built an artificial neural network (ANN) model and a classification and regression tree (CART) model, and compared their prediction accuracy. Linear correlation coefficients were high in both models and the mean absolute errors were similar. However, the ANN model showed a better linear correlation than the CART model (0.813 vs. 0.713 for the hospital charge paid by insurance and 0.746 vs. 0.720 for the hospital charge paid by patients). We suggest that the ANN model performs better in predicting charges of colorectal cancer patients.
Friesen, M.C.; Wheeler, D.C.; Vermeulen, R.; Locke, S.J.; Zaebst, D.D.; Koutros, S.; Pronk, A.; Colt, J.S.; Baris, D.; Karagas, M.R.; Malats, N.; Schwenn, M.; Johnson, A.; Armenti, K.R.; Rothman, N.; Stewart, P.A.; Kogevinas, M.; Silverman, D.T.
Objectives: To efficiently and reproducibly assess occupational diesel exhaust exposure in a Spanish case-control study, we examined the utility of applying decision rules that had been extracted from expert estimates and questionnaire response patterns using classification tree (CT) models from a
Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...
Chen, Hsiu-Chin; Bennett, Sean
Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN® in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor® accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEX-RN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.
Nguyen, L.A.; Verreth, J.A.J.; Leemans, H.B.J.; Bosma, R.H.; Silva, De S.
This study uses the decision tree framework to analyse possible climate change impact adaptation options for pangasius (Pangasianodon hypophthalmus Sauvage) farming in the Mekong Delta. Here we present the risks for impacts and the farmers' autonomous and planned public adaptation by using primary
BACKGROUND: The gene regulatory circuit motif in which two opposing fate-determining transcription factors inhibit each other but activate themselves has been used in mathematical models of binary cell fate decisions in multipotent stem or progenitor cells. This simple circuit can generate multistability and explains the symmetric "poised" precursor state in which both factors are present in the cell at equal amounts as well as the resolution of this indeterminate state as the cell commits to either cell fate characterized by an asymmetric expression pattern of the two factors. This establishes the two alternative stable attractors that represent the two fate options. It has been debated whether cooperativity of molecular interactions is necessary to produce such multistability. PRINCIPAL FINDINGS: Here we take a general modeling approach and argue that this question is not relevant. We show that non-linearity can arise in two distinct models in which no explicit interaction between the two factors is assumed and that distinct chemical reaction kinetic formalisms can lead to the same (generic) dynamical system form. Moreover, we describe a novel type of bifurcation that produces a degenerate steady state that can explain the metastable state of indeterminacy prior to cell fate decision-making and is consistent with biological observations. CONCLUSION: The general model presented here thus offers a novel principle for linking regulatory circuits with the state of indeterminacy characteristic of multipotent (stem) cells.
Downward shortwave radiation (DSR) is an essential parameter in the terrestrial radiation budget and a necessary input for models of land-surface processes. Although several radiation products using satellite observations have been released, coarse spatial resolution and low accuracy limited their application. It is important to develop robust and accurate retrieval methods with higher spatial resolution. Machine learning methods may be powerful candidates for estimating the DSR from remotely sensed data because of their ability to perform adaptive, nonlinear data fitting. In this study, the gradient boosting regression tree (GBRT) was employed to retrieve DSR measurements with the ground observation data in China collected from the China Meteorological Administration (CMA) Meteorological Information Center and the satellite observations from the Advanced Very High Resolution Radiometer (AVHRR) at a spatial resolution of 5 km. The validation results of the DSR estimates based on the GBRT method in China at a daily time scale for clear sky conditions show an R2 value of 0.82 and a root mean square error (RMSE) value of 27.71 W·m−2 (38.38%). These values are 0.64 and 42.97 W·m−2 (34.57%), respectively, for cloudy sky conditions. The monthly DSR estimates were also evaluated using ground measurements. The monthly DSR estimates have an overall R2 value of 0.92 and an RMSE of 15.40 W·m−2 (12.93%). Comparison of the DSR estimates with the reanalyzed and retrieved DSR measurements from satellite observations showed that the estimated DSR is reasonably accurate but has a higher spatial resolution. Moreover, the proposed GBRT method has good scalability and is easy to apply to other parameter inversion problems by changing the parameters and training data.
Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R
The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all Plaminitis (OR 40.5, Plaminitis. 'Presence of a flat/convex sole' also significantly enhanced clinical diagnosis discrimination (OR 15.5, Plaminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements. British Veterinary Association.
Jerez-Aragonés, José M; Gómez-Ruiz, José A; Ramos-Jiménez, Gonzalo; Muñoz-Pérez, José; Alba-Conejo, Emilio
The prediction of clinical outcome of patients after breast cancer surgery plays an important role in medical tasks such as diagnosis and treatment planning. Different prognostic factors for breast cancer outcome appear to be significant predictors for overall survival, but probably form part of a bigger picture comprising many factors. Survival estimations are currently performed by clinicians using the statistical techniques of survival analysis. In this sense, artificial neural networks are shown to be a powerful tool for analysing datasets where there are complicated non-linear interactions between the input data and the information to be predicted. This paper presents a decision support tool for the prognosis of breast cancer relapse that combines a novel TDIDT algorithm (control of induction by sample division method, CIDIM), which selects the most relevant prognostic factors for the accurate prognosis of breast cancer, with a system composed of different neural network topologies that takes the selected variables as input in order to reach a good probability of correct classification. In addition, a new method for estimating Bayes' optimal error using the neural network paradigm is proposed. Clinical-pathological data were obtained from the Medical Oncology Service of the Hospital Clinico Universitario of Málaga, Spain. The results show that the proposed system is a useful tool for clinicians to search through large datasets seeking subtle patterns in prognostic factors, and that it may further assist the selection of appropriate adjuvant treatments for the individual patient.
Background: Methicillin-resistant Staphylococcus aureus (MRSA) infections represent a serious challenge for health-care institutions. Rapid and precise identification of MRSA carriers can help to reduce both nosocomial transmissions and unnecessary isolations and associated costs. The practical details of MRSA screenings (who, how, when and where to screen) remain a controversial issue. Methods: The aim of this study was to determine which MRSA screening and management strategy causes the lowest expected cost for a hospital. For this cost analysis a decision analytic cost model was developed, primarily based on data from peer-reviewed literature. Single and multiplex sensitivity analyses of the parameters “costs per MRSA case per day”, “costs for pre-emptive isolation per day”, “MRSA rate of transmission not in isolation per day” and “MRSA prevalence” were conducted. Results: The omission of MRSA screening was identified as the alternative with the highest risk for the hospital. Universal MRSA screening strategies are by far more cost-intensive than targeted screening approaches. Culture confirmation of positive PCR results in combination with pre-emptive isolation generates the lowest costs for a hospital. This strategy minimizes the chance of false-positive results as well as the possibility of MRSA cross transmissions and therefore contains the costs for the hospital. These results were confirmed by multiplex and single sensitivity analyses. Single sensitivity analyses have shown that the parameters “MRSA prevalence” and the “rate of MRSA transmission per day of non-isolated patients” exert the greatest influence on the choice of the preferred screening strategy. Conclusions: It was shown that universal MRSA screening strategies are far more cost-intensive than the targeted screening approaches. In addition, it was demonstrated that all targeted screening strategies produce lower costs than not performing a screening at
A. I. Khader
Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water
Khader, A. I.; Rosenberg, D. E.; McKee, M.
Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs
Khader, A.; Rosenberg, D.; McKee, M.
Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system. At current
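The expected-cost and VOI calculation described in this abstract can be sketched as below. This is an illustration under stated assumptions: every probability and cost is a hypothetical placeholder, the alternative names are illustrative labels, and the sign convention (a positive VOI means monitoring lowers the expected cost) is an assumption rather than something the abstract specifies.

```python
def expected_cost(outcomes):
    """Weighted sum of outcome costs; outcomes is a list of (probability, cost)."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * c for p, c in outcomes)

# Hypothetical uninformed alternatives (i) and (ii) from the abstract.
uninformed = {
    "ignore_risk": [(0.25, 4000.0), (0.75, 0.0)],   # illness cost vs. no cost
    "bottled_water": [(1.0, 1500.0)],               # certain purchase cost
}
# Hypothetical outcomes for alternative (iii), the monitoring network.
monitoring = [(0.25, 1600.0), (0.75, 400.0)]

# Lowest-cost alternative available without monitoring information.
best_uninformed = min(expected_cost(o) for o in uninformed.values())

# Assumed sign convention: positive VOI means monitoring is worth having.
voi = best_uninformed - expected_cost(monitoring)
print(best_uninformed, expected_cost(monitoring), voi)
```

In a full decision tree each alternative would branch over several chained uncertainties (actual concentration, reported concentration, compliance, illness); the flat outcome lists here stand in for those branches after their probabilities have been multiplied through.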
Althuwaynee, Omar F; Pradhan, Biswajeet; Ahmad, Noordin
This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced landslide susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to find the best classification fit for each conditioning factor and then to combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best-fitting function that assesses the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data with 14 landslide conditioning factors, such as topographic derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations, respectively. This study demonstrated the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provided a valuable scientific basis for spatial decision making in planning and urban management studies.
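The Pearson chi-squared criterion mentioned above can be illustrated with a small sketch that scores alternative categorizations of one conditioning factor against landslide occurrence. This is a simplification under stated assumptions: full CHAID compares adjusted p-values and merges categories iteratively, whereas this sketch only compares the raw statistic for tables of equal size; the counts and the names `cut_a` and `cut_b` are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def best_grouping(candidates):
    """Return the name of the candidate table with the largest statistic."""
    return max(candidates, key=lambda name: chi_squared(candidates[name]))

# Hypothetical 2x2 tables: rows are factor bins, columns are
# (landslide cells, non-landslide cells).
candidates = {
    "cut_a": [[30, 10], [10, 30]],
    "cut_b": [[20, 20], [20, 20]],
}
print(best_grouping(candidates), chi_squared(candidates["cut_a"]))
```

Comparing raw statistics is only meaningful when the candidate tables share the same degrees of freedom, which is why both hypothetical tables are 2x2.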
Goodman, Katherine E; Lessler, Justin; Cosgrove, Sara E; Harris, Anthony D; Lautenbach, Ebbing; Han, Jennifer H; Milstone, Aaron M; Massey, Colin J; Tamma, Pranita D
Timely identification of extended-spectrum β-lactamase (ESBL) bacteremia can improve clinical outcomes while minimizing unnecessary use of broad-spectrum antibiotics, including carbapenems. However, most clinical microbiology laboratories currently require at least 24 additional hours from the time of microbial genus and species identification to confirm ESBL production. Our objective was to develop a user-friendly decision tree to predict which organisms are ESBL producing, to guide appropriate antibiotic therapy. We included patients ≥18 years of age with bacteremia due to Escherichia coli or Klebsiella species from October 2008 to March 2015 at Johns Hopkins Hospital. Isolates with ceftriaxone minimum inhibitory concentrations ≥2 µg/mL underwent ESBL confirmatory testing. Recursive partitioning was used to generate a decision tree to determine the likelihood that a bacteremic patient was infected with an ESBL producer. Discrimination of the original and cross-validated models was evaluated using receiver operating characteristic curves and by calculation of C-statistics. A total of 1288 patients with bacteremia met eligibility criteria. For 194 patients (15%), bacteremia was due to a confirmed ESBL producer. The final classification tree for predicting ESBL-positive bacteremia included 5 predictors: history of ESBL colonization/infection, chronic indwelling vascular hardware, age ≥43 years, recent hospitalization in an ESBL high-burden region, and ≥6 days of antibiotic exposure in the prior 6 months. The decision tree's positive and negative predictive values were 90.8% and 91.9%, respectively. Our findings suggest that a clinical decision tree can be used to estimate a bacteremic patient's likelihood of infection with ESBL-producing bacteria. Recursive partitioning offers a practical, user-friendly approach for addressing important diagnostic questions. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of
Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana
Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in Blood Transfusion Organizations could be useful for improving the performance of the blood donation service. The aim of this research is to predict the healthiness of blood donors in a Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, namely Decision Tree C4.5, the Naïve Bayesian classifier, and the Support Vector Machine (SVM), were chosen and applied to a real database of 11006 donors. Seven fields, namely sex, age, job, education, marital status, type of donor, and results of blood tests (doctors' comments and lab results about healthy or unhealthy blood donors), were selected as input to these algorithms. The results of the three algorithms were compared and an error cost analysis was performed. According to the obtained results, the best algorithm, with low error cost and high accuracy, is SVM. This research helps the BTO to build a model of blood donors in each area in order to predict whether donors' blood is healthy or unhealthy. It could be useful if used in parallel with laboratory tests to better separate unhealthy blood.
Zeng, Xiaoji; Liu, Zhifeng; He, Chunyang; Ma, Qun; Wu, Jianguo
Detecting surface coal mining areas (SCMAs) using remote sensing data in a timely and an accurate manner is necessary for coal industry management and environmental assessment. We developed an approach to effectively extract SCMAs from remote sensing imagery based on object-oriented decision trees (OODT). This OODT approach involves three main steps: object-oriented segmentation, calculation of spectral characteristics, and extraction of SCMAs. The advantage of this approach lies in its effective integration of the spectral and spatial characteristics of SCMAs so as to distinguish the mining areas (i.e., the extracting areas, stripped areas, and dumping areas) from other areas that exhibit similar spectral features (e.g., bare soils and built-up areas). We implemented this method to extract SCMAs in the eastern part of Ordos City in Inner Mongolia, China. Our results had an overall accuracy of 97.07% and a kappa coefficient of 0.80. As compared with three other spectral information-based methods, our OODT approach is more accurate in quantifying the amount and spatial pattern of SCMAs in dryland regions.
Senthil Kumar, A R; Goyal, Manish Kumar; Ojha, C S P; Singh, R D; Swamee, P K
The prediction of streamflow is required in many activities associated with the planning and operation of the components of a water resources system. Soft computing techniques have proven to be an efficient alternative to traditional methods for modelling qualitative and quantitative water resource variables such as streamflow. The focus of this paper is to present the development of models using multiple linear regression (MLR), artificial neural network (ANN), fuzzy logic and decision tree algorithms such as M5 and REPTree for predicting the streamflow at Kasol, located upstream of the Bhakra reservoir in the Sutlej basin in northern India. The input vector to the various models was derived by considering statistical properties of the time series such as the auto-correlation function, the partial auto-correlation function and the cross-correlation function. The REPTree model performed well compared with the other techniques investigated in this study (MLR, ANN, fuzzy logic, and M5P), and its results indicate that the entire range of streamflow values was simulated fairly well. The performance of the naïve persistence model was compared with that of the other models, and the need for developing the naïve persistence model was also analysed using the persistence index.
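A naïve persistence benchmark and a persistence index of the kind mentioned above can be sketched as follows. The abstract does not give the formula, so the definition used here, PI = 1 - SSE(model) / SSE(naïve persistence) with a one-step lag, is an assumption based on the commonly used form; the flow values are hypothetical.

```python
def persistence_index(observed, simulated, lag=1):
    """Assumed definition: 1 - SSE(model) / SSE(naive persistence).

    The naive persistence forecast for time t is the observation at t - lag.
    PI > 0 means the model beats persistence; PI = 0 means it only matches it.
    """
    obs = observed[lag:]
    sim = simulated[lag:]
    naive = observed[:-lag]
    sse_model = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sse_naive = sum((o - p) ** 2 for o, p in zip(obs, naive))
    return 1.0 - sse_model / sse_naive

# Hypothetical daily flows and hypothetical model output.
observed = [10.0, 12.0, 15.0, 11.0]
simulated = [10.0, 11.0, 14.0, 12.0]
pi_model = persistence_index(observed, simulated)

# Feeding the persistence forecast itself back in gives an index of zero.
pi_naive = persistence_index(observed, [0.0] + observed[:-1])
print(pi_model, pi_naive)
```

The index is a useful check for streamflow models because flows are strongly auto-correlated, so a model can score well on correlation while adding little beyond "tomorrow equals today".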
Teodoro, A. C.; Ferreira, D.; Gonçalves, H.
Evaluation of beach hydromorphological behaviour and its classification is highly complex. The available beach morphologic and classification models are mainly based on wave, tidal and sediment parameters. Since these parameters are usually unavailable for some regions - such as in the Portuguese coastal zone - a morphologic analysis using remotely sensed data seems to be a valid alternative. Data mining for spatial pattern recognition is the process of discovering useful information, such as patterns/forms, changes and significant structures from large amounts of data. This study focuses on the application of data mining techniques, particularly Decision Trees (DT), to an IKONOS-2 image in order to classify beach features/patterns, in a stretch of the northwest coast of Portugal. Based on the knowledge of the coastal features, five classes were defined: Sea, Suspended-Sediments, Breaking-Zone, Beachface and Beach. The dataset was randomly divided into training and validation subsets. Based on the analysis of several DT algorithms, the CART algorithm was found to be the most adequate and was thus applied. The performance of the DT algorithm was evaluated by the confusion matrix, overall accuracy, and Kappa coefficient. In the classification of beach features/patterns, the algorithm presented an overall accuracy of 98.2% and a kappa coefficient of 0.97. The DTs were compared with a neural network algorithm, and the results were in agreement. The methodology presented in this paper provides promising results and should be considered in further applications of beach forms/patterns classification.
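The three evaluation measures named in this abstract (confusion matrix, overall accuracy, and kappa coefficient) can be computed as in the sketch below. The labels reuse two of the class names from the abstract, but the pixel labels themselves are hypothetical and kept tiny for readability.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    n = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / n

def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    n = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical validation pixels.
labels = ["Sea", "Beach"]
actual = ["Sea"] * 5 + ["Beach"] * 5
predicted = ["Sea"] * 4 + ["Beach"] + ["Beach"] * 4 + ["Sea"]

matrix = confusion_matrix(actual, predicted, labels)
print(matrix, overall_accuracy(matrix), cohen_kappa(matrix))
```

Kappa is reported alongside accuracy because a classifier can reach high accuracy on imbalanced classes by chance agreement alone; kappa discounts that part.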
Chien, Ching-Wen; Lee, Yi-Chih; Ma, Tsochiang; Lee, Tian-Shyug; Lin, Yang-Chu; Wang, Weu; Lee, Wei-Jei
Gastric cancer remains a leading cause of death worldwide. Post-operative complication is one important factor in the mortality of gastric cancer patients after gastrectomy. Better prediction of post-operative complication before gastrectomy can significantly reduce post-operative mortality and morbidity. Therefore, three data mining techniques were applied in this study to improve the prediction of post-operative complication. A retrospective study was performed on 521 patients from three medical centers, each with more than 2,000 acute beds, in Taiwan from February 2002 to October 2004. Pre- and post-operative clinical data were collected and analyzed by applying three data mining techniques: Artificial Neural Networks (ANN), Decision Tree (DT) and Logistic Regression (LR). The results indicated that ANN was a better technique than DT and LR in predicting post-operative complication. Nutritional status, pathological characteristics and operational characteristics were important predictors of post-operative complication. Further study on predicting post-operative complication in gastric cancer patients is still important. How to combine different data mining techniques to improve prediction accuracy will be another important issue for clinicians and researchers.
Sentiment mining is a field of text mining that determines the attitude of people towards a particular product, topic, or politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions could be in different languages (English, Urdu, Arabic, etc.). Tackling each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions as labeled examples. A testing data set is supplied to three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.
Tsai, M.; Veziroglu, A.; Warren, S.; Que, Y.
According to innovation diffusion research, innovators, opinion leaders, and diffusion agents play vital roles in promoting the acceptance of innovation. Innovators and opinion leaders must be able to cope with a high degree of uncertainty about an innovation, and they usually have higher innovation-related media usage than the majority. Based on consumer behavior studies, lifestyle analysis could help researchers divide consumers into different lifestyle groups to understand and predict consumer behaviors. Lifestyle allows researchers to investigate consumers via their activities, interests and opinions instead of using demographic variables. The purpose of this research is to investigate how the different lifestyles of new energy innovators and opinion leaders affect their new energy product adoption and their media usage regarding new energy reports or promotion. To achieve these purposes, the researchers needed to locate and contact the potential innovators and opinion leaders in this field, and therefore cooperated with UNIDO-ICHET to launch this survey. This cross-discipline online survey ran from Aug 2005 to Oct 2006 and collected information from 2040 new energy innovators and opinion leaders. The researchers analyzed the data using SPSS statistics software and data mining decision tree analysis. They then divided new energy innovators into four groups: social-oriented, young modern, conservative, and show-off-oriented. They also analyzed which lifestyle groups are better targets for innovation agencies to launch innovation-related promotions or campaigns
Brown, Allen W; Malec, James F; McClelland, Robyn L; Diehl, Nancy N; Englander, Jeffrey; Cifu, David X
Traumatic brain injury (TBI) often presents clinicians with a complex combination of clinical elements that can confound treatment and make outcome prediction challenging. Predictive models have commonly used acute physiological variables and gross clinical measures to predict mortality and basic outcome endpoints. The primary goal of this study was to consider all clinical elements available concerning a survivor of TBI admitted for inpatient rehabilitation, and identify those factors that predict disability, need for supervision, and productive activity one year after injury. The Traumatic Brain Injury Model Systems (TBIMS) database was used for decision tree analysis using recursive partitioning (n = 3463). Outcome measures included the Functional Independence Measure, the Disability Rating Scale, the Supervision Rating Scale, and a measure of productive activity. Predictor variables included all physical examination elements, measures of injury severity (initial Glasgow Coma Scale score, duration of post-traumatic amnesia [PTA], length of coma, CT scan pathology), gender, age, and years of education. The duration of PTA, age, and most elements of the physical examination were predictive of early disability. The duration of PTA alone was selected to predict late disability and independent living. The duration of PTA, age, sitting balance, and limb strength were selected to predict productive activity at 1 year. The duration of PTA was the best predictor of outcome selected in this model for all endpoints and elements of the physical examination provided additional predictive value. Valid and reliable measures of PTA and physical impairment after TBI are important for accurate outcome prediction.
Xin, Zhong; Hua, Lin; Wang, Xu-Hong; Zhao, Dong; Yu, Cai-Guo; Ma, Ya-Hong; Zhao, Lei; Cao, Xi; Yang, Jin-Kui
We reanalyzed previous data to develop a more simplified decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. Then, the model was validated in another rural town. Only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) with fewer branches were used in the new model. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detecting diabetes were calculated. The AUC values in internal and external validation groups were 0.708 and 0.629, respectively. Subjects with high risk of diabetes had significantly higher HOMA-IR, but no significant difference in HOMA-B was observed. This simple tool will help general practitioners and residents assess the risk of diabetes quickly and easily. This study also validates the strong associations of insulin resistance and early stage of diabetes, suggesting that more attention should be paid to the current model in rural Chinese adult populations.
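The screening metrics named in this abstract can all be derived from a 2x2 confusion table. The following is a minimal sketch, not the authors' code: the function name `screening_metrics` and the counts in the usage note are hypothetical and chosen only for illustration, since the abstract does not report the underlying table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningMetrics:
    sensitivity: float  # share of true cases the tool flags
    specificity: float  # share of non-cases the tool clears
    ppv: float          # probability of disease given a positive screen
    npv: float          # probability of no disease given a negative screen


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> ScreeningMetrics:
    """Compute standard screening metrics from confusion-table counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. Each row and column total must be non-zero.
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each row and column total must be non-zero")
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    print(screening_metrics(tp=30, fp=70, fn=10, tn=190))
```

Unlike sensitivity and specificity, PPV and NPV depend on disease prevalence in the screened population, which is one reason a tool derived in one community can behave differently when validated in another.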
A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors' information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person's motion.
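C4.5 chooses splits by gain ratio, that is, information gain normalised by the entropy of the split itself. The sketch below illustrates that criterion for a single threshold on one continuous sensor reading. It is a simplified illustration, not the paper's implementation: the function names, the sensor values, and the two-class labels are hypothetical, and refinements of full C4.5 (such as pruning and its correction for continuous attributes) are omitted.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str],
               threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold`.

    Returns 0.0 when the threshold leaves one side empty, because such
    a split carries no information.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - (w_left * entropy(left)
                                   + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


if __name__ == "__main__":
    # Hypothetical normalised foot-pressure readings and gait labels.
    pressure = [0.1, 0.2, 0.8, 0.9]
    phase = ["swing", "swing", "stance", "stance"]
    for t in (0.15, 0.5):
        print(t, gain_ratio(pressure, phase, t))
```

A tree builder would evaluate candidate thresholds for every sensor channel, keep the split with the highest gain ratio, and recurse on each side.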
DeBari, Vincent A
It has been demonstrated that decision levels (DL) and their confidence intervals (CI) can be estimated from the second derivative, f''(P), of the logistic regression probability curve (LRPC). Although this method generally provides smooth curves from which DL and CI can be obtained, there are datasets that generate "noisy" curves making these measurements difficult. The purpose of this study was to develop a procedure to obviate this noise, thus allowing the more facile estimation of DL and CI. Data from two clinical studies were examined. Logistic regression analysis was performed and the first derivatives, f'(P), were fitted to Gaussian models. The derivatives of these surrogate f'(P) were generated to provide f''(P) and were compared with data from receiver operating characteristic (ROC) curves. For both sets of data, the surrogate curves demonstrated strong fits to the natural f'(P) with r² = 0.986 for one study and 0.832 for the second. The f''(P) generated from the surrogate curves demonstrated single maxima (M) and minima (m), compared with the f''(P) generated from the natural f'(P) in which multiple M and m were observed. Easily discernible DL and CI were observed for both datasets with differences from ROC-estimated DL of 1.7% for the first study and 4.8% for the second. The use of a surrogate Gaussian simulation of f'(P) may be a useful alternative to natural f'(P) when using the f''(P) of the LRPC to determine DL and CI.
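The relationship between the logistic curve, its derivatives, and a Gaussian surrogate can be written out for the single-predictor case. The notation below (predictor x, coefficients β₀ and β₁, Gaussian parameters A, μ, σ) is an assumption introduced for illustration. The abstract does not give the author's parameterisation, nor the rule by which DL and CI are read off the second-derivative curve.

```latex
% Logistic regression probability curve for one predictor x (assumed form)
P(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}

% Analytic first and second derivatives with respect to x
f'(P)  = \frac{dP}{dx}     = \beta_1 \, P (1 - P)
f''(P) = \frac{d^2P}{dx^2} = \beta_1^{2} \, P (1 - P)(1 - 2P)

% Gaussian surrogate fitted to the (possibly noisy) first derivative
g(x) = A \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

% Its derivative stands in for f''(P)
g'(x) = -\frac{x - \mu}{\sigma^{2}} \, g(x)

% g''(x) = 0 at x = \mu \pm \sigma, so for A > 0 the surrogate second
% derivative has exactly one maximum (at \mu - \sigma) and one minimum
% (at \mu + \sigma), with a zero crossing at x = \mu.
```

Under these assumptions a Gaussian surrogate necessarily yields a single maximum and a single minimum, which is consistent with the single M and m the abstract reports for the surrogate curves.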
Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando
This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
Scholz, Miklas; Uzomah, Vincent C
The retrofitting of sustainable drainage systems (SuDS) such as permeable pavements is currently undertaken ad hoc using expert experience supported by minimal guidance based predominantly on hard engineering variables. There is a lack of practical decision support tools useful for a rapid assessment of the potential of ecosystem services when retrofitting permeable pavements in urban areas that either feature existing trees or should be planted with trees in the near future. Thus the aim of this paper is to develop an innovative rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems close to trees. This unique tool proposes the retrofitting of permeable pavements that obtained the highest ecosystem service score for a specific urban site enhanced by the presence of trees. This approach is based on a novel ecosystem service philosophy adapted to permeable pavements rather than on traditional engineering judgement associated with variables based on quick community and environment assessments. For an example case study area such as Greater Manchester, which was dominated by Sycamore and Common Lime, a comparison with the traditional approach of determining community and environment variables indicates that permeable pavements are generally a preferred SuDS option. Permeable pavements combined with urban trees received relatively high scores, because of their great potential impact in terms of water and air quality improvement, and flood control, respectively. The outcomes of this paper are likely to lead to more combined permeable pavement and tree systems in the urban landscape, which are beneficial for humans and the environment. Copyright © 2013 Elsevier B.V. All rights reserved.
Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah
A decision tree (DT) machine learning algorithm was used to map the flood susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome weak points of the LR. The combined method of FR and LR was used to map the susceptible areas in Kelantan, Malaysia. Results of both methods were compared and their efficiency was assessed. The most influential conditioning factors on flooding were recognized.
Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula
Introduction: Crohn’s disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling disease, often requiring surgical interventions. This article describes a decision-tree based approach that defines the CD patients’ risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. Materials and methods: This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Results: Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50–4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09–0.25] and 0.50 [0.24–1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapy (1 months after initial surgery) had a higher proportion of reoperation. Conclusions: The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation. PMID:28225800
Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H
Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date, there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home healthcare electronic medical record called the Outcome and Assessment Information Set-C for 552 telemonitored heart failure patients. Bivariate analyses using SAS and a decision tree technique using Waikato Environment for Knowledge Analysis were used. From the decision tree technique, the presence of skin issues was identified as the top predictor of rehospitalization that could be identified during the start of care assessment, followed by patient's living situation, patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the Outcome and Assessment Information Set-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional supports.
Iron overload used to be considered rare among hemodialysis patients after the advent of erythropoiesis-stimulating agents, but recent MRI studies have challenged this view. The aim of this study, based on decision-tree learning and on MRI determination of hepatic iron content, was to identify a noxious pattern of parenteral iron administration in hemodialysis patients. We performed a prospective cross-sectional study from 31 January 2005 to 31 August 2013 in the dialysis centre of a French community-based private hospital. A cohort of 199 fit hemodialysis patients free of overt inflammation and malnutrition were treated for anemia with parenteral iron-sucrose and an erythropoiesis-stimulating agent (darbepoetin), in keeping with current clinical guidelines. Patients had blinded measurements of hepatic iron stores by means of T1 and T2* contrast MRI, without gadolinium, together with Chi-squared Automatic Interaction Detection (CHAID) analysis. The CHAID algorithm first split the patients according to their monthly infused iron dose, with a single cutoff of 250 mg/month. In the node comprising the 88 hemodialysis patients who received more than 250 mg/month of IV iron, 78 patients had iron overload on MRI (88.6%, 95% CI: 80% to 93%). The odds ratio for hepatic iron overload on MRI was 3.9 (95% CI: 1.81 to 8.4) with >250 mg/month of IV iron as compared to <250 mg/month. Age, gender (female sex) and the hepcidin level also influenced liver iron content on MRI. The standard maximal amount of iron infused per month should be lowered to 250 mg in order to lessen the risk of dialysis iron overload and to allow safer use of parenteral iron products.
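The proportion reported for the high-dose node (78 of 88 patients) can be paired with a confidence interval using a standard binomial method. The sketch below uses the Wilson score interval. This is an assumption, because the abstract does not say which interval method the authors used. The function name `wilson_interval` is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Counts taken from the abstract: 78 of 88 patients with iron overload.
    low, high = wilson_interval(78, 88)
    print(f"{78 / 88:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

With these counts the Wilson interval runs from roughly 80% to 94%, close to the interval quoted in the abstract. The small difference at the upper end may reflect rounding or a different interval method.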
Hosseini, Seyed Abolfazl; Afrakoti, Iman Esmaili Paeen
Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using the developed computational codes based on the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator has been simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The developed computational codes based on the GMDH and DT algorithms use some data for training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the simulated neutron pulse height distributions by MCNPX-ESUT for each energy spectrum are used as the output and input data. Since there is no need to solve the inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The 241Am-9Be and 252Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for the used fast neutron sources have an excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than those obtained from the DT. The results obtained in the present study have good accuracy in comparison with the previously published paper based on the logsig and tansig transfer functions.
Xu, Michael; Tam, Benjamin; Thabane, Lehana; Fox-Robichaud, Alison
Multiple early warning scores (EWS) have been developed and implemented to reduce cardiac arrests on hospital wards. Case-control observational studies that generate an area under the receiver operating characteristic curve (AUROC) are the usual validation method, but investigators have also generated EWS with algorithms with no prior clinical knowledge. We present a protocol for the validation and comparison of our local Hamilton Early Warning Score (HEWS) with that generated using decision tree (DT) methods. A database of electronically recorded vital signs from 4 medical and 4 surgical wards will be used to generate DT EWS (DT-HEWS). A third EWS will be generated using ensemble-based methods. Missing data will be multiply imputed. For a relative risk reduction of 50% in our composite outcome (cardiac or respiratory arrest, unanticipated intensive care unit (ICU) admission or hospital death) with a power of 80%, we calculated a sample size of 17,151 patient days based on our cardiac arrest rates in 2012. The performance of the National EWS, DT-HEWS and the ensemble EWS will be compared using AUROC. Ethics approval was received from the Hamilton Integrated Research Ethics Board (#13-724-C). The vital signs and associated outcomes are stored in a database on our secure hospital server. Preliminary dissemination of this protocol was presented in abstract form at an international critical care meeting. Final results of this analysis will be used to improve on the existing HEWS and will be shared through publication and presentation at critical care meetings. Published by the BMJ Publishing Group Limited.
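The AUROC used to compare the scores has a direct interpretation: it is the probability that a randomly chosen patient who had the outcome received a higher score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it by that pairwise definition. The function name `auroc` and the example scores are hypothetical, and the protocol does not state how the investigators compute the statistic.

```python
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve by pairwise comparison.

    `outcomes` holds 1 for patients with the event and 0 otherwise.
    This O(n_pos * n_neg) form is meant for clarity, not for large data.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # event patient correctly ranked higher
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical early warning scores and outcomes, NOT study data.
    ews = [1, 2, 3, 4, 5, 6]
    event = [0, 0, 1, 0, 1, 1]
    print(auroc(ews, event))
```

A value of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a score that ranks every event patient above every non-event patient.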
Chen, Z M; Ji, S B; Shi, X L; Zhao, Y Y; Zhang, X F; Jin, H
Objective: To evaluate the cost-utility of different hepatitis E vaccination strategies in women aged 15 to 49. Methods: A Markov-decision tree model was constructed to evaluate the cost-utility of three hepatitis E virus vaccination strategies. Parameters of the model were estimated on the basis of published studies and expert opinion. Sensitivity and threshold analyses were used to evaluate the uncertainties of the model. Results: Compared with the non-vaccination group, the post-screening vaccination strategy with a vaccination rate of 100% could save 0.10 quality-adjusted life years per capita in women from the societal perspective. For vaccination after implementation of a screening program and for vaccination at a rate of 100%, the incremental cost-utility ratio (ICUR) was 5 651.89 and 6 385.33 Yuan/QALY, respectively. Vaccination after implementation of a screening program showed better benefit than vaccination at a rate of 100%. The sensitivity analysis showed that both the cost of the hepatitis E vaccine and the inoculation compliance rate had significant effects. If the cost were lower than 191.56 Yuan (RMB) or the inoculation compliance rate lower than 0.23, the 100% vaccination rate strategy was better than the post-screening vaccination strategy; otherwise the post-screening vaccination strategy was the optimal strategy. Conclusion: From the societal perspective, post-screening vaccination for women aged 15 to 49 appeared to be the optimal strategy, but this depended on the vaccine cost and the rate of inoculation compliance.
Kelley, B.B.; Walker, G.D.; Miles, K.A.
The aim is to determine the cost-effectiveness of yttrium microsphere treatment of hepatic metastases from colorectal cancer, with and without FDG-PET for detection of extra-hepatic disease. A decision tree was created comparing two strategies for yttrium treatment with chemotherapy, one incorporating PET in addition to CT in the pre-treatment work-up, to a strategy of chemotherapy alone. The sensitivity and specificity of PET and CT were obtained from the Federal Government PET review. Imaging costs were obtained from the Medicare benefits schedule with an additional capital component added for PET (final cost $1200). The cost of yttrium treatment was determined by patient-tracking. Previously published reports indicated a mean gain in life-expectancy from treatment of 0.52 years. Patients with extra-hepatic metastases were assumed to receive no survival benefit. Cost effectiveness was expressed as incremental cost per life-year gained (ICER). Sensitivity analysis determined the effect of prior probability of extra-hepatic disease on cost-savings and cost-effectiveness. The cost of yttrium treatment including angiography, particle perfusion studies and bed-stays, was $10530. A baseline value for prior probability of extra-hepatic disease of 0.35 gave ICERs of $26,378 and $25,271 for the no-PET and PET strategies respectively. The PET strategy was less expensive if the prior probability of extra-hepatic metastases was greater than 0.16 and more cost-effective if above 0.28. Yttrium microsphere treatment is less cost-effective than other interventions for colon cancer but comparable to other accepted health interventions. Incorporating PET into the pre-treatment assessment is likely to save costs and improve cost-effectiveness. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
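The ICER in this abstract is the extra cost of a strategy divided by the extra life-years it yields relative to the comparator. The sketch below shows that ratio, together with the expected life-year gain when every patient is treated and those with extra-hepatic disease receive no benefit, as the abstract assumes. It does not attempt to reproduce the published ICERs, which depend on costs and test characteristics not given here. The function names and the incremental cost in the usage note are hypothetical.

```python
def expected_life_years_gained(prior_extrahepatic: float,
                               gain_if_benefit: float) -> float:
    """Mean life-years gained per treated patient.

    Assumes, as the abstract does, that patients with extra-hepatic
    disease gain nothing and all other patients gain `gain_if_benefit`.
    """
    if not 0.0 <= prior_extrahepatic <= 1.0:
        raise ValueError("prior probability must lie in [0, 1]")
    return (1.0 - prior_extrahepatic) * gain_if_benefit


def icer(incremental_cost: float, incremental_effect: float) -> float:
    """Incremental cost-effectiveness ratio: cost per life-year gained."""
    if incremental_effect <= 0:
        raise ValueError("incremental effect must be positive for a usable ICER")
    return incremental_cost / incremental_effect


if __name__ == "__main__":
    # Prior (0.35) and gain (0.52 years) are the abstract's baseline values.
    gain = expected_life_years_gained(0.35, 0.52)
    # The incremental cost below is a hypothetical placeholder.
    print(gain, icer(incremental_cost=10_000.0, incremental_effect=gain))
```

Screening out patients with extra-hepatic disease before treatment raises the average benefit per treated patient, which is the mechanism by which adding PET can lower cost per life-year even though the scan adds cost.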