WorldWideScience

Sample records for boosted decision trees

  1. Reweighting with Boosted Decision Trees

    CERN Document Server

    Rogozhnikov, A

    2016-01-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers. In most cases, these are classification models used to select the "signal" events from data. Monte Carlo simulated events typically take part in the training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting: assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of the reweighting step in analyses is also discussed.
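
The idea of reweighting (assigning weights to simulated events so that their distribution matches observed data) can be illustrated with a simple one-dimensional histogram ratio. This is only a minimal sketch of the basic binned approach, not the BDT-based method of the record above; the function names, bin edges, and sample values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(x, edges):
    """Index of the bin [edges[i], edges[i+1]) holding x, clipped to the valid range."""
    i = bisect_right(edges, x) - 1
    return min(max(i, 0), len(edges) - 2)


def histogram_weights(simulated, observed, edges):
    """Per-event weights that make the binned simulated distribution match the observed one."""
    sim_counts = Counter(bin_index(x, edges) for x in simulated)
    obs_counts = Counter(bin_index(x, edges) for x in observed)
    # Ratio of normalized bin contents: (obs / N_obs) / (sim / N_sim).
    scale = len(simulated) / len(observed)
    return [
        scale * obs_counts[bin_index(x, edges)] / sim_counts[bin_index(x, edges)]
        for x in simulated
    ]


# Hypothetical toy samples: simulation is concentrated at low values, data at high values.
simulated = [0.1, 0.2, 0.3, 0.7]
observed = [0.1, 0.6, 0.7, 0.8]
weights = histogram_weights(simulated, observed, edges=[0.0, 0.5, 1.0])
```

With these toy inputs, the three simulated events in the low bin are down-weighted and the single event in the high bin is up-weighted. A binned ratio like this becomes unreliable as the number of variables grows, because most bins end up nearly empty.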

  2. Supervised hashing using graph cuts and boosted decision trees.

    Science.gov (United States)

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data.
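
Once features have been mapped to compact binary codes, retrieval reduces to ranking database items by Hamming distance to the query code. The sketch below illustrates only that final lookup step, assuming the codes are already stored as Python integers; it does not implement the graph-cut inference or the boosted-tree hash functions of the record above, and all names and codes are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest_codes(query: int, database: list, k: int = 3) -> list:
    """Indices of the k database codes closest to the query in Hamming space."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: (hamming_distance(query, database[i]), i),  # ties broken by index
    )
    return ranked[:k]


# Hypothetical 4-bit codes for four database images.
database = [0b1010, 0b1011, 0b0101, 0b1110]
top = nearest_codes(0b1010, database, k=3)
```

Because the distance is an XOR followed by a bit count, the lookup uses only integer operations, which is a large part of why binary codes suit large-scale retrieval.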

  3. Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees

    Directory of Open Access Journals (Sweden)

    Chuan Ding

    2016-10-01

    Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort has been made to investigate short-term subway ridership prediction considering bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as the cases, the short-term subway ridership and alighting passengers from its adjacent bus stops are obtained based on transit smart card data. To optimize the model performance with different combinations of regularization parameters, a series of GBDT models are built with various learning rates and tree complexities by fitting a maximum of trees. The optimal model performance confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle the multicollinearity effect with high accuracy. In contrast to other machine learning methods (or “black-box” procedures), the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.
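
The roles of the learning rate and the number of trees can be seen in a small gradient boosting sketch. The code below assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps); it is an illustrative toy rather than the model used in the record above, and the function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares depth-1 tree on one feature: returns (threshold, left_value, right_value)."""
    best = None
    distinct = sorted(set(xs))
    for t in distinct[:-1]:  # split as x <= t versus x > t; both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x identical: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return float("inf"), mean, mean
    return best[1:]


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Each stump is fitted to the current residuals and added with a shrinkage factor."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


# Hypothetical step-shaped data: low values up to x = 3, high values afterwards.
xs, ys = [1, 2, 3, 4, 5, 6], [10, 10, 10, 20, 20, 20]
model = fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1)
small_model = fit_gbdt(xs, ys, n_trees=5, learning_rate=0.1)
```

A smaller learning rate needs more trees to reach the same fit, which is why the two parameters are usually tuned together.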

  4. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    Science.gov (United States)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or that do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.
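
The advantage of boosting over a single thresholding cut can be sketched with a minimal AdaBoost over decision stumps. The code assumes labels of +1 and -1 and numeric features; it is a toy illustration under those assumptions, not the configuration used in the record above, and the data are hypothetical.

```python
import math


def stump_predict(row, j, t, pol):
    """A stump is a single cut: predict pol if feature j is <= t, otherwise -pol."""
    return pol if row[j] <= t else -pol


def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best single cut."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best


def adaboost(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, pol))
        # Increase the weight of misclassified events, decrease the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def classify(ensemble, row):
    score = sum(alpha * stump_predict(row, j, t, pol) for alpha, j, t, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-feature sample that no single cut separates.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
one_cut = adaboost(X, y, n_rounds=1)
boosted = adaboost(X, y, n_rounds=3)
```

On this toy sample one cut necessarily misclassifies some events, while the weighted vote of three cuts reproduces all six labels.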

  5. Effect of training characteristics on object classification: an application using Boosted Decision Trees

    CERN Document Server

    Sevilla-Noarbe, Ignacio

    2015-01-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or that do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects tha...

  6. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Zhiyi [China Inst. of Atomic Energy (CIAE), Beijing (China)]

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element $V_{tb}$ and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb$^{-1}$ of Tevatron Run II data in p$\bar{p}$ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed in discriminating the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as $3.4^{+2.0}_{-1.8}$ pb. The result of the single top quark production in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the electron, muon and tau combined analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations in electron and muon alone. The measured cross section in the three combined final states is σ(p$\bar{p}$ → tb + X, tqb + X) = $3.84^{+0.89}_{-0.83}$ pb. A lower limit on $|V_{tb}|$ is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  7. Bagged Boosted Trees for Classification of Ecological Momentary Assessment Data

    OpenAIRE

    Spanakis, Gerasimos; Weiss, Gerhard; Roefs, Anne

    2016-01-01

    Ecological Momentary Assessment (EMA) data is organized in multiple levels (per-subject, per-day, etc.) and this particular structure should be taken into account in machine learning algorithms used in EMA, such as decision trees and their variants. We propose a new algorithm called BBT (standing for Bagged Boosted Trees) that is enhanced by an over/under sampling method and can provide better estimates for the conditional class probability function. Experimental results on a real-world dataset show...

  8. Measurement of the t-channel single top-quark production using boosted decision trees in ATLAS experiment at √(s)=7 TeV

    International Nuclear Information System (INIS)

    This thesis presents a measurement of the cross section of t-channel single top-quark production using 1.04 fb$^{-1}$ of data collected by the ATLAS detector at the LHC in proton-proton collisions at a center-of-mass energy of √(s) = 7 TeV. Selected events contain one lepton, missing transverse energy, and two or three jets, one of them b-tagged. The background model consists of multi-jets, W+jets and top-quark pair events, with smaller contributions from Z+jets and di-boson events. By using a selection based on the distribution of a multivariate discriminant constructed with boosted decision trees, the cross section of t-channel single top-quark production is measured: $\sigma_t = 97.3^{+30.7}_{-30.2}$ pb, which is in good agreement with the prediction of the Standard Model. Assuming that the top-quark-related CKM matrix elements obey the relation $|V_{tb}| \gg |V_{ts}|, |V_{td}|$, the coupling strength at the Wtb vertex is extracted from the measured cross section, $|V_{tb}| = 1.23^{+0.20}_{-0.19}$. If it is assumed that $|V_{tb}| \le 1$, a lower limit of $|V_{tb}| > 0.61$ is obtained at the 95% confidence level.

  9. Geometric Decision Tree

    CERN Document Server

    Manwani, Naresh

    2010-01-01

    In this paper we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy to assess the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both the classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node, are solutions of an interesting optimization problem and hence argue that this is a principled method of learning a decision tree.

  10. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Monica [Brown Univ., Providence, RI (United States)]

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the $V_{tb}$ element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to different theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb$^{-1}$ of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism σ(p$\bar{p}$ → tb + X, tqb + X) = $4.30^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak produced top quarks. Using this measured cross section and constraining $|V_{tb}| < 1$, the 95% confidence level (C.L.) lower limit is $|V_{tb}| > 0.78$. Additionally, a search is made for the production of W' using the same samples from the electroweak produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the

  11. Human decision error (HUMDEE) trees

    Energy Technology Data Exchange (ETDEWEB)

    Ostrom, L.T.

    1993-08-01

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees, or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of action were available and for training operators. The trees also have a consequence component, so that not only the decision but also the consequence of that decision can be explored.

  12. Algorithms for Decision Tree Construction

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms were based on the separation heuristic [13, 31], which at each step tries to divide the set of objects as evenly as possible. Later, Garey and Graham [28] showed that such an algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest [35] proved NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and uniform probability distribution. Cox et al. [22] showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.

  13. A Comparison of Boosting Tree and Gradient Treeboost Methods for Carpal Tunnel Syndrome

    Directory of Open Access Journals (Sweden)

    Gülhan OREKİCİ TEMEL

    2014-10-01

    Objective: Boosting is one of the most successful combining methods. The principal aim of these combining algorithms is to obtain a strong classifier with small estimation error from the combination of weak classifiers. Boosting based on combining trees has many advantages. Data sets can contain mixtures of nominal, ordinal and numerical variables. AdaBoost and Gradient TreeBoost are commonly used boosting procedures. Both methods are stagewise additive model fitting procedures. Our goal in this study is to explain both methods and to compare the algorithm results on a neurology data set for the purpose of classification. Material and Methods: The data set consists of 4076 incidences in total. The condition of being a patient with Carpal Tunnel Syndrome (CTS) or not was considered as the dependent variable. Boosting Tree and Gradient TreeBoost applications were conducted in Statistica 7.0 and Salford Predictive Modeler: TreeNet® trial version 6.6.0.091. Results: In the AdaBoost and Gradient TreeBoost algorithms, multiple trees are grown from the training data. 200 trees are produced for both models. 70 trees in the AdaBoost algorithm and 196 trees in the Gradient TreeBoost algorithm are chosen as the optimal trees. Conclusion: The high sensitivity and specificity values of Gradient TreeBoost on the test data indicate that it can be used as a successful method in CTS diagnosis. It is believed that boosting methods will become more and more popular in health science due to their easy implementation and high predictive performance.

  14. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    Directory of Open Access Journals (Sweden)

    Yoonseok Shin

    2015-01-01

    Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of the NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.

  15. Meta-learning in decision tree induction

    CERN Document Server

    Grąbczewski, Krzysztof

    2014-01-01

    The book focuses on different variants of decision tree induction but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. The book discusses different variants of decision tree induction and represents a useful source of information to readers wishing to review some of the techniques used in decision tree learning, as well as different ensemble methods that involve decision trees. It is shown that the knowledge of different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms with the aim of identifying the top-most performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...

  16. Representing Boolean Functions by Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram or branching program is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than a value table or even a formula [44]. Representing a function in the form of a decision tree allows graph algorithms to be applied for various transformations [10]. Decision trees and branching programs are used for effective hardware [15] and software [5] implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of a decision tree characterizes the expected computing time, and the number of nodes in a branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal in both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.

  17. A new decision tree learning algorithm

    Institute of Scientific and Technical Information of China (English)

    FANG Yong; QI Fei-hu

    2005-01-01

    In order to improve the generalization ability of binary decision trees, a new learning algorithm, the MMDT algorithm, is presented. Based on statistical learning theory, the generalization performance of binary decision trees is analyzed, and the assessment rule is proposed. Under the direction of the assessment rule, the MMDT algorithm is implemented. The algorithm maps training examples from an original space to a high-dimensional feature space, and constructs a decision tree in it. In the feature space, a new decision node splitting criterion, the max-min rule, is used, and the margin of each decision node is maximized using a support vector machine, to improve the generalization performance. Experimental results show that the new learning algorithm is much superior to others such as C4.5 and OC1.

  18. Measuring Intuition: Nonconscious Emotional Information Boosts Decision Accuracy and Confidence.

    Science.gov (United States)

    Lufityanto, Galang; Donkin, Chris; Pearson, Joel

    2016-05-01

    The long-held popular notion of intuition has garnered much attention both academically and popularly. Although most people agree that there is such a phenomenon as intuition, involving emotionally charged, rapid, unconscious processes, little compelling evidence supports this notion. Here, we introduce a technique in which subliminal emotional information is presented to subjects while they make fully conscious sensory decisions. Our behavioral and physiological data, along with evidence-accumulator models, show that nonconscious emotional information can boost accuracy and confidence in a concurrent emotion-free decision task, while also speeding up response times. Moreover, these effects were contingent on the specific predictive arrangement of the nonconscious emotional valence and motion direction in the decisional stimulus. A model that simultaneously accumulates evidence from both physiological skin conductance and conscious decisional information provides an accurate description of the data. These findings support the notion that nonconscious emotions can bias concurrent nonemotional behavior-a process of intuition.

  19. Measuring Intuition: Nonconscious Emotional Information Boosts Decision Accuracy and Confidence.

    Science.gov (United States)

    Lufityanto, Galang; Donkin, Chris; Pearson, Joel

    2016-05-01

    The long-held popular notion of intuition has garnered much attention both academically and popularly. Although most people agree that there is such a phenomenon as intuition, involving emotionally charged, rapid, unconscious processes, little compelling evidence supports this notion. Here, we introduce a technique in which subliminal emotional information is presented to subjects while they make fully conscious sensory decisions. Our behavioral and physiological data, along with evidence-accumulator models, show that nonconscious emotional information can boost accuracy and confidence in a concurrent emotion-free decision task, while also speeding up response times. Moreover, these effects were contingent on the specific predictive arrangement of the nonconscious emotional valence and motion direction in the decisional stimulus. A model that simultaneously accumulates evidence from both physiological skin conductance and conscious decisional information provides an accurate description of the data. These findings support the notion that nonconscious emotions can bias concurrent nonemotional behavior-a process of intuition. PMID:27052557

  20. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
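
The "integral image" transform mentioned above is what lets box-filter features be computed with a handful of integer additions. The following is a minimal sketch of the standard transform on a small list-of-lists image; the function names and pixel values are hypothetical, and it does not reproduce the texture features or the classifier of the record above.

```python
def integral_image(img):
    """ii[r][c] holds the sum of all pixels above and to the left of (r, c), exclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1, from four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Hypothetical 3x3 image with integer intensities.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
```

After the single pass that builds the table, the sum over any rectangle costs the same four lookups regardless of the rectangle's size, with no floating-point operations.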

  1. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions

    KAUST Repository

    Azad, Mohammad

    2014-09-13

    The paper is devoted to the analysis of greedy algorithms for the minimization of average depth of decision trees for decision tables in which each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. When we compare with the optimal result obtained from a dynamic programming algorithm, we find that some greedy algorithms produce results which are close to the optimal result for the minimization of average depth of decision trees.

  2. Algorithms for optimal dyadic decision trees

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  3. Using Decision Trees for Coreference Resolution

    CERN Document Server

    McCarthy, Joseph F.; Lehnert, Wendy G.

    1995-01-01

    This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.

  4. Diagnosis of Hepatitis using Decision tree algorithm

    Directory of Open Access Journals (Sweden)

    V. Shankar Sowmien

    2016-06-01

    This research paper proposes a prediction system for liver disease using machine learning. Researchers provided various data to identify the causes of hepatitis. Here, the decision tree method is used to determine the structural information of tissues. The algorithm used to construct the decision tree is C4.5, which concentrates on 19 attributes (age, sex, steroids, antivirals, spleen, fatigue, malaise, anorexia, liver big, liver firm, spiders, bilirubin, varices, ascites, ALK phosphate, SGOT, albumin, protime, and histology) for the diagnosis of the disease. These features helped in determining the abnormalities of the patient, which resulted in 85.81% accuracy.
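
C4.5 selects the attribute to split on using the gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes that criterion for nominal attributes only (C4.5 also handles numeric attributes through thresholds, which is omitted here). The attribute values and labels are hypothetical toy data, not taken from the hepatitis dataset.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on a nominal attribute, divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: one attribute separates the classes, the other does not.
labels = ["ill", "ill", "well", "well"]
fatigue = ["yes", "yes", "no", "no"]
sex = ["m", "f", "m", "f"]
informative = gain_ratio(fatigue, labels)
uninformative = gain_ratio(sex, labels)
```

Dividing by the split information penalizes attributes with many distinct values, which would otherwise look attractive simply because they fragment the data.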

  5. INDUCTION OF DECISION TREES BASED ON A FUZZY NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    Tang Bin; Hu Guangrui; Mao Xiaoquan

    2002-01-01

    Based on a fuzzy neural network, the letter presents an approach for the induction of decision trees. The approach makes use of the weights of fuzzy mappings in the fuzzy neural network which has been trained. It can realize the optimization of fuzzy decision trees by branch cutting, and improve the ratio of correctness and efficiency of the induction of decision trees.

  6. STUDY ON DECISION TREE COMPETENT DATA CLASSIFICATION

    OpenAIRE

    Vanitha, A.; Niraimathi, S.

    2013-01-01

    Data mining is a process where intelligent methods are applied in order to extract data patterns. This is used in cases of discovering patterns and trends among large datasets. Data classification involves categorization of data into different categories according to protocols. There are many classification algorithms available, and among them the decision tree is the most commonly used method. Classification of data objects based on a predefined knowledge of objects is a data mining task. This paper discussed...

  7. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    Decision tree is one of the famous classification methods in data mining. Many studies have been proposed that focus on improving the performance of decision trees. However, those algorithms are developed and run on traditional distributed systems. Without the help of new technology, latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes. In order to improve data processing latency in huge data mining, in this paper, we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), which is a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate system performance of CUDT and made a comparison with the traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and 18 times faster than SPRINT for large data sets.

  8. Optimizing Decision Tree Attack on CAS Scheme

    Directory of Open Access Journals (Sweden)

    PERKOVIC, T.

    2016-05-01

    In this paper we show a successful side-channel timing attack on a well-known high-complexity cognitive authentication (CAS) scheme. We exploit the weakness of CAS scheme that comes from the asymmetry of the virtual interface and graphical layout which results in nonuniform human behavior during the login procedure, leading to detectable variations in user's response times. We optimized a well-known probabilistic decision tree attack on CAS scheme by introducing this timing information into the attack. We show that the developed classifier could be used to significantly reduce the number of login sessions required to break the CAS scheme.

  9. Decision trees with minimum average depth for sorting eight elements

    KAUST Repository

    AbouEisha, Hassan

    2015-11-19

    We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We show also that each decision tree for sorting 8 elements, which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365), has also minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to make sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.
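
    The value 620160/8! quoted above can be put next to the standard information-theoretic lower bound log2(8!) on the average depth of a binary comparison tree with 8! equally likely outcomes. The following is a small arithmetic check only, not the dynamic programming method of the paper; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial, log2


def min_average_depth_sort8():
    """Exact value quoted in the abstract: 620160 / 8!."""
    return Fraction(620160, factorial(8))


def entropy_lower_bound(n):
    """log2(n!): lower bound on the average depth of a binary
    comparison tree that distinguishes n! equally likely orderings."""
    return log2(factorial(n))


avg = min_average_depth_sort8()
print("minimum average depth:", avg, "=", float(avg))
print("lower bound log2(8!): ", entropy_lower_bound(8))
```

    The exact fraction reduces to 323/21, which lies above the bound by less than one comparison.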

  10. Boosted Regression Trees in the H$\rightarrow \tau\tau$ decay channel

    CERN Document Server

    Hedrich, Natascha Sylvia

    2013-01-01

    This report examines the application of a multivariate analysis technique, known as Boosted Regression Trees (BRTs), to the reconstruction of the Higgs mass. BRTs are being evaluated as a competing method to the Missing Mass Calculator, which is currently being used in the H $\rightarrow \tau\tau$ channel. The effects of the regression target distribution, input variables and training parameters on the regression performance are also investigated. BRTs are a promising technique and further studies will aim to better understand potential biases.

  11. The identification of complex interactions in epidemiology and toxicology : a simulation study of Boosted Regression Trees

    OpenAIRE

    Lampa, Erik; Lind, Lars; Lind, Monica P.; Bornefalk-Hermansson, Anna

    2014-01-01

    Background: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. Methods: We simulate a continuous outcome from real data on 27 environment...

  12. A tool for study of optimal decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes a tool which allows us for relatively small decision tables to make consecutive optimization of decision trees relative to various complexity measures such as number of nodes, average depth, and depth, and to find parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.

  13. On algorithm for building of optimal α-decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relatively to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in [4] to constructing approximate decision trees. Adjustable approximation rate allows controlling algorithm complexity. The algorithm is applied to build optimal α-decision trees for two data sets from UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.

  14. Automatic design of decision-tree induction algorithms

    CERN Document Server

    Barros, Rodrigo C; Freitas, Alex A

    2015-01-01

    Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue on the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o

  15. Comparison of greedy algorithms for α-decision tree construction

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    A comparison among different heuristics used by greedy algorithms that construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from UCI Machine Learning Repository [2]. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.

  16. Statistical Decision-Tree Models for Parsing

    CERN Document Server

    Magerman, D M

    1995-01-01

    Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing {$n$}-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall ...

  17. Boosting bonsai trees for efficient features combination : application to speaker role identification

    OpenAIRE

    Laurent, Antoine; Camelin, Nathalie; Raymond, Christian

    2014-01-01

    In this article, we tackle the problem of speaker role detection from broadcast news shows. In the literature, many proposed solutions are based on the combination of various features coming from acoustic, lexical and semantic information with a machine learning algorithm. Many previous studies mention the use of boosting over decision stumps to combine efficiently these features. In this work, we propose a modification of this state-of-the-art machine learning algorithm changing the weak lea...

  18. Relationships for Cost and Uncertainty of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on dynamic programming approach and need the consideration of subtables of the initial decision table. So this approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and a cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions such as: depth and average depth of a decision tree and number of nodes (and number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions. The uncertainty of approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for such tools we also present experimental results applied to various datasets acquired from UCI ML Repository [4]. © Springer-Verlag Berlin Heidelberg 2013.
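
    The two uncertainty definitions in this abstract translate directly into code. The sketch below assumes a decision table is represented only by the list of decisions attached to its rows, and a tree by the list of subtables at its terminal nodes; both representations and the function names are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter


def table_uncertainty(decisions):
    """Number of unordered pairs of rows labeled with different decisions."""
    n = len(decisions)
    all_pairs = n * (n - 1) // 2
    # Pairs of rows that share the same decision do not count.
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(decisions).values())
    return all_pairs - same_pairs


def tree_uncertainty(leaf_subtables):
    """Maximum uncertainty over the subtables at the terminal nodes."""
    return max(table_uncertainty(rows) for rows in leaf_subtables)


print(table_uncertainty(["a", "a", "b", "c"]))
print(tree_uncertainty([["a", "a"], ["b", "c"]]))
```

    An exact decision tree is the special case in which every terminal subtable has uncertainty 0.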

  19. Application of portfolio theory in decision tree analysis.

    Science.gov (United States)

    Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J

    1991-07-01

    A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation.

  20. Greedy algorithm with weights for decision tree construction

    KAUST Repository

    Moshkov, Mikhail

    2010-12-01

    An approximate algorithm for minimization of weighted depth of decision trees is considered. A bound on accuracy of this algorithm is obtained which is unimprovable in general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to best polynomial approximate algorithms for minimization of weighted depth of decision trees.

  1. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  2. Decision-Tree Formulation With Order-1 Lateral Execution

    Science.gov (United States)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
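
    The general idea of trading a node-by-node walk for a single table lookup can be illustrated with a toy example. This is not the symbolic formulation described in the record: it simply enumerates every assignment of the Boolean decision variables once and stores the resulting leaf. The tree encoding, the labels, and the function names are all hypothetical.

```python
from itertools import product

# Hypothetical encoding: a tree is either a leaf label (str) or a
# tuple (variable_index, subtree_if_false, subtree_if_true).
TREE = (0,
        (1, "idle", "warn"),
        (1, "warn", "shutdown"))


def walk(tree, bits):
    """Ordinary execution: examine one non-leaf node per level."""
    while not isinstance(tree, str):
        var, if_false, if_true = tree
        tree = if_true if bits[var] else if_false
    return tree


def flatten(tree, n_vars):
    """Precompute the outcome for every assignment of the n_vars Booleans."""
    return {bits: walk(tree, bits)
            for bits in product((False, True), repeat=n_vars)}


TABLE = flatten(TREE, 2)
print(TABLE[(True, False)])  # one dictionary lookup instead of a tree walk
```

    The table has 2^n entries for n variables, so this naive version is only practical for small n.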

  3. Relationships among various parameters for decision tree optimization

    KAUST Repository

    Hussain, Shahid

    2014-01-14

    In this chapter, we study, in detail, the relationships between various pairs of cost functions and between uncertainty measure and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as provide experimental results on decision tables acquired from UCI ML Repository. The algorithms presented in this paper have already been implemented and are now a part of Dagger, which is a software system for construction/optimization of decision trees and decision rules. The main results presented in this chapter deal with two types of algorithms for computing relationships. First, we discuss the case where we construct approximate decision trees and are interested in relationships between a certain cost function, such as depth or number of nodes of a decision tree, and an uncertainty measure, such as misclassification error (accuracy) of a decision tree. Second, relationships between two different cost functions are discussed, for example, the number of misclassifications of a decision tree versus the number of nodes in a decision tree. The results of experiments, presented in the chapter, provide further insight. © 2014 Springer International Publishing Switzerland.

  4. Computational study of developing high-quality decision trees

    Science.gov (United States)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in dealing with data mining problems to find out valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in the implementation. In this paper, we propose an innovative machine learning approach (we call our approach GAIT), combining genetic algorithm, statistical sampling, and decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio- Olympic data, Westinghouse data, and FAA data) to test its performance against standard decision tree algorithm, neural network classifier, and statistical discriminant technique, respectively. The computational results show that our approach outperforms standard decision tree algorithm profoundly at lower sampling levels, and achieves significantly better results with less effort than both neural network and discriminant classifiers.

  5. Decision tree approach to power systems security assessment

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1993-01-01

    An overview of the general decision tree approach to power system security assessment is presented. The general decision tree methodology is outlined, modifications proposed in the context of transient stability assessment are embedded, and further refinements are considered. The approach is then suitably tailored to handle other specifics of power systems security, relating to both preventive and emergency voltage control, in addition to transient stability. Trees are accordingly built in th...

  6. Construction of α-decision trees for tables with many-valued decisions

    KAUST Repository

    Moshkov, Mikhail

    2011-01-01

    The paper is devoted to the study of greedy algorithm for construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider bound on the number of algorithm steps, and bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.

  7. Minimization of decision tree depth for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-10-01

    In this paper, we consider multi-label decision tables that have a set of decisions attached to each row. Our goal is to find one decision from the set of decisions for each row by using a decision tree as our tool. With the target of minimizing the depth of the decision tree, we devised various kinds of greedy algorithms as well as a dynamic programming algorithm. Comparing with the optimal results obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of depth of decision trees.

  8. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    OpenAIRE

    Bruno Carneiro da Rocha; Rafael Timóteo de Sousa Júnior

    2010-01-01

    This article aims to evaluate the use of techniques of decision trees, in conjunction with the management model CRISP-DM, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of t...

  9. Prediction Of Study Track Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Deepali Joshi

    2014-05-01

    One of the most important issues to succeed in academic life is to assign students to the right track when they arrive at the end of basic education stage. The education system is graded from 1st to 10th standard, where after finishing the 10th grade the students are distributed into different academic tracks or fields such as Science, Commerce, Arts depending on the marks that they have scored. In order to succeed in academic life the student should select the correct academic field. Many students fail to select the appropriate field. At one instant of time they prefer a certain type of career and at the next instant they consider another option. To improve the quality of education, data mining techniques can be utilized instead of the traditional process. The proposed system has many benefits as compared to the traditional system as the accuracy of results is better. The problems can be solved via the proposed system. The proposed system will predict the streams through the decision tree method. With each and every input the proposed system evolves with better accuracy.

  10. Decision tree methods: applications for classification and prediction

    Institute of Scientific and Technical Information of China (English)

    Yan-yan SONG; Ying LU

    2015-01-01

    Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. The training dataset is used to build a decision tree model and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.

  11. Ensemble of randomized soft decision trees for robust classification

    Indian Academy of Sciences (India)

    G KISHOR KUMAR; P VISWANATH; A ANANDA RAO

    2016-03-01

    For classification, decision trees have become very popular because of their simplicity, interpretability and good performance. To induce a decision tree classifier for data having continuous valued attributes, the most common approach is to split the continuous attribute range into a hard (crisp) partition having two or more blocks, using one or several crisp (sharp) cut points. But this can make the resulting decision tree very sensitive to noise. An existing solution to this problem is to split the continuous attribute into a fuzzy partition (soft partition) using soft or fuzzy cut points, which is based on fuzzy set theory, and to use fuzzy decisions at nodes of the tree. These are called soft decision trees in the literature, which are shown to perform better than conventional decision trees, especially in the presence of noise. The current paper first proposes to use an ensemble of soft decision trees for robust classification where the attribute, fuzzy cut point, etc. parameters are chosen randomly from a probability distribution of fuzzy information gain for various attributes and for their various cut points. Further, the paper proposes to use probability based information gain to achieve better results. The effectiveness of the proposed method is shown by experimental studies carried out using three standard data sets. It is found that an ensemble of randomized soft decision trees has outperformed the related existing soft decision tree. Robustness against the presence of noise is shown by injecting various levels of noise into the training set and a comparison is drawn with other related methods which favors the proposed method.

  12. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    Science.gov (United States)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses decision tree as base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two step process: first theoretical study of the five selected split measures is done and a comparison matrix is generated to understand pros and cons of each measure. These theoretical results are verified by performing empirical analysis. For empirical analysis, random forest is generated using each of the five selected split measures, chosen one at a time. i.e. random forest using information gain, random forest using gain ratio, etc. The next step is, based on this theoretical and empirical analysis, a new approach of hybrid decision tree model for random forest classifier is proposed. In this model, individual decision tree in Random Forest is generated using different split measures. This model is augmented by weighted voting based on the strength of individual tree. The new approach has shown notable increase in the accuracy of random forest.
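
    The difference between plain majority voting and the weighted voting mentioned in this abstract can be sketched as follows. The abstract does not say how the strength of an individual tree is measured, so the per-tree weights below are purely hypothetical placeholders (for example, an accuracy estimate per tree), as are the function names.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Each tree casts one equal vote; the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Each tree's vote counts in proportion to its (assumed) strength."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


tree_predictions = ["spam", "ham", "spam"]
tree_weights = [0.2, 0.9, 0.3]  # hypothetical per-tree strengths

print(majority_vote(tree_predictions))
print(weighted_vote(tree_predictions, tree_weights))
```

    With these made-up weights the two rules disagree: two weak trees outvote one strong tree under majority voting, but not under weighted voting.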

  13. Comparison of Greedy Algorithms for Decision Tree Optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.

  14. Automatic design of decision-tree algorithms with evolutionary algorithms.

    Science.gov (United States)

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  15. Decision Trees and Transient Stability of Electric Power Systems

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1991-01-01

    An inductive inference method for the automatic building of decision trees is investigated. Among its various tasks, the splitting and the stop splitting criteria successively applied to the nodes of a grown tree, are found to play a crucial role on its overall shape and performances. The application of this general method to transient stability is systematically explored. Parameters related to the stop splitting criterion, to the learning set and to the tree classes are thus considered, a...

  16. Confidence sets for split points in decision trees

    OpenAIRE

    Banerjee, Moulinath; McKeague, Ian W.

    2007-01-01

    We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others cali...

  17. Detection and Extraction of Videos using Decision Trees

    Directory of Open Access Journals (Sweden)

    Sk.Abdul Nabi

    2011-12-01

    This paper addresses a new multimedia data mining framework for the extraction of events in videos by using decision tree logic. The aim of our DEVDT (Detection and Extraction of Videos using Decision Trees) system is to improve the indexing and retrieval of multimedia information. The extracted events can be used to index the videos. In this system we have considered the C4.5 Decision tree algorithm [3], which is used for managing both continuous and discrete attributes. In this process, firstly we have adopted an advanced video event detection method to produce event boundaries and some important visual features. This rich multi-modal feature set is filtered by a pre-processing step to clean the noise as well as to reduce the irrelevant data. This will improve the performance of both Precision and Recall. After producing the cleaned data, it will be mined and classified by using a decision tree model. The learning and classification steps of this Decision tree are simple and fast. The Decision Tree has good accuracy. Subsequently, by using our system we will reach maximum Precision and Recall i.e. we will extract pure video events effectively and proficiently.

  18. Decision support and data warehousing tools boost competitive advantage.

    Science.gov (United States)

    Waldo, B H

    1998-01-01

    The ability to communicate across the care continuum is fast becoming an integral component of the successful health enterprise. As integrated delivery systems are formed and patient care delivery is restructured, health care professionals must be able to distribute, access, and evaluate information across departments and care settings. The Aberdeen Group, a computer and communications research and consulting organization, believes that "the single biggest challenge for next-generation health care providers is to improve on how they consolidate and manage information across the continuum of care. This involves building a strategic warehouse of clinical and financial information that can be shared and leveraged by health care professionals, regardless of the location or type of care setting" (Aberdeen Group, Inc., 1997). The value and importance of data and systems integration are growing. Organizations that create a strategy and implement DSS tools to provide decision-makers with the critical information they need to face the competition and maintain quality and costs will have the advantage. PMID:9592525

  19. Neural Networks with Decision Trees for Diagnosis Issues

    Directory of Open Access Journals (Sweden)

    Yahia Kourd

    2013-05-01

    This paper presents a new idea for a fault detection and isolation (FDI) technique which is applied to industrial systems. This technique is based on Neural Networks fault-free and Faulty behaviours Models (NNFMs). NNFMs are used for residual generation, while decision tree architecture is used for residual evaluation. The decision tree is realized with data collected from the NNFMs' outputs and is used to isolate detectable faults depending on computed threshold. Each part of the tree corresponds to a specific residual. With the decision tree, it becomes possible to take the appropriate decision regarding the actual process behaviour by evaluating a small number of residuals. In comparison to the usual systematic evaluation of all residuals, the proposed technique requires less computational effort and can be used for online diagnosis. An application example is presented to illustrate and confirm the effectiveness and the accuracy of the proposed approach.

  20. Generating Decision Trees Method Based on Improved ID3 Algorithm

    Institute of Scientific and Technical Information of China (English)

    Yang Ming; Guo Shuxu; Wang Jun

    2011-01-01

    The ID3 algorithm is a classical decision tree learning algorithm in data mining. The algorithm tends to choose attributes with more values, which affects the efficiency of classification and prediction when building a decision tree. This article proposes a new approach based on an improved ID3 algorithm. The new algorithm introduces an importance factor λ when calculating the information entropy. It can strengthen the label of important attributes of a tree and reduce the label of non-important attributes. The algorithm overcomes the flaw of the traditional ID3 algorithm, which tends to choose attributes with more values, and also improves the efficiency and flexibility of the process of generating decision trees.
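
    The bias that the abstract attributes to classical ID3 can be illustrated with a small sketch. The toy table and the attribute names ("id", "windy", "play") are hypothetical, and the code shows only the standard information gain criterion; the importance factor λ is not reproduced because the abstract does not give its formula.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting `rows` on attribute `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy table: "id" is unique per row, "windy" has only two values.
rows = [
    {"id": 1, "windy": "no", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "yes", "play": "no"},
    {"id": 4, "windy": "no", "play": "no"},
]

gain_id = information_gain(rows, "id", "play")        # many-valued attribute
gain_windy = information_gain(rows, "windy", "play")  # two-valued attribute
print(gain_id, gain_windy)
```

    In this sketch the unique "id" column receives the maximum possible gain although it cannot generalize to new rows, which is the kind of preference for many-valued attributes that a weighting scheme would have to counteract.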

  1. Ethnographic Decision Tree Modeling: A Research Method for Counseling Psychology.

    Science.gov (United States)

    Beck, Kirk A.

    2005-01-01

    This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…

  2. Bounds on Average Time Complexity of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory [1], coding theory [77], design and analysis of algorithms (e.g., sorting) [38]. For any diagnostic problem, the minimum average depth of decision tree is bounded from below by the entropy of probability distribution (with a multiplier 1/log_2 k for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building optimal prefix code [1] and a blood test study in assumption that exactly one patient is ill [23]). For such problems, the minimum average depth of decision tree exceeds the lower bound by at most one. The minimum average depth reaches the maximum on the problems in which each attribute is "indispensable" [44] (e.g., a diagnostic problem with n attributes and k^n pairwise different rows in the decision table and the problem of implementing the modulo 2 summation function). These problems have the minimum average depth of decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
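
    The optimal prefix code case mentioned in the abstract can be checked numerically. The following is a minimal sketch for the binary case (k = 2, so the multiplier 1/log_2 k equals 1); the probability vector is a made-up example, and Huffman coding is used here as the standard way to obtain an optimal binary prefix code, not as the chapter's own algorithm.

```python
import heapq
import math

def entropy_bits(probs):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_average_depth(probs):
    """Average leaf depth of an optimal binary prefix code (Huffman tree)."""
    # Heap entries are (subtree probability, tie-breaker id).
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    average_depth = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf below them one level deeper,
        # which adds (p1 + p2) to the expected depth.
        average_depth += p1 + p2
        heapq.heappush(heap, (p1 + p2, next_id))
        next_id += 1
    return average_depth

probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical distribution over four outcomes
h = entropy_bits(probs)
avg = huffman_average_depth(probs)
print(h, avg)
```

    For this distribution the average depth lies between the entropy and the entropy plus one, in line with the "at most one" statement in the abstract.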

  3. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    Directory of Open Access Journals (Sweden)

    Bruno Carneiro da Rocha

    2010-10-01

    This article aims to evaluate the use of decision tree techniques, in conjunction with the CRISP-DM management model, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of techniques and the CRISP-DM management model of data mining in large operational databases logged from internet bank transactions.

  4. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  5. Smart City Mobility Application—Gradient Boosting Trees for Mobility Prediction and Analysis Based on Crowdsourced Data

    Directory of Open Access Journals (Sweden)

    Ivana Semanjski

    2015-07-01

    Mobility management represents one of the most important parts of the smart city concept. The way we travel, at what time of the day, for what purposes and with what transportation modes, have a pertinent impact on the overall quality of life in cities. To manage this process, detailed and comprehensive information on individuals’ behaviour is needed as well as effective feedback/communication channels. In this article, we explore the applicability of crowdsourced data for this purpose. We apply a gradient boosting trees algorithm to model individuals’ mobility decision making processes (particularly concerning what transportation mode they are likely to use). To accomplish this we rely on data collected from three sources: a dedicated smartphone application, a geographic information systems-based web interface and weather forecast data collected over a period of six months. The applicability of the developed model is seen as a potential platform for personalized mobility management in smart cities and a communication tool between the city (to steer the users towards more sustainable behaviour by additionally weighting preferred suggestions) and users (who can give feedback on the acceptability of the provided suggestions, by accepting or rejecting them, providing an additional input to the learning process).

  6. Smart City Mobility Application--Gradient Boosting Trees for Mobility Prediction and Analysis Based on Crowdsourced Data.

    Science.gov (United States)

    Semanjski, Ivana; Gautama, Sidharta

    2015-01-01

    Mobility management represents one of the most important parts of the smart city concept. The way we travel, at what time of the day, for what purposes and with what transportation modes, have a pertinent impact on the overall quality of life in cities. To manage this process, detailed and comprehensive information on individuals' behaviour is needed as well as effective feedback/communication channels. In this article, we explore the applicability of crowdsourced data for this purpose. We apply a gradient boosting trees algorithm to model individuals' mobility decision making processes (particularly concerning what transportation mode they are likely to use). To accomplish this we rely on data collected from three sources: a dedicated smartphone application, a geographic information systems-based web interface and weather forecast data collected over a period of six months. The applicability of the developed model is seen as a potential platform for personalized mobility management in smart cities and a communication tool between the city (to steer the users towards more sustainable behaviour by additionally weighting preferred suggestions) and users (who can give feedback on the acceptability of the provided suggestions, by accepting or rejecting them, providing an additional input to the learning process). PMID:26151209

  7. Minimizing size of decision trees for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-09-29

    We used decision trees as a model to discover knowledge from multi-label decision tables, where each row has a set of decisions attached to it and the goal is to find one arbitrary decision from that set. The size of the decision tree can be small as well as very large. We study different greedy and dynamic programming algorithms to minimize the size of the decision trees. Comparing against the optimal results from the dynamic programming algorithm, we found that some greedy algorithms produce results close to optimal for the minimization of the number of nodes (at most 18.92% difference), the number of nonterminal nodes (at most 20.76% difference), and the number of terminal nodes (at most 18.71% difference).

  8. Automatic sleep staging using state machine-controlled decision trees.

    Science.gov (United States)

    Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2015-01-01

    Automatic sleep staging from a reduced number of channels is desirable to save time, reduce costs and make sleep monitoring more accessible by providing home-based polysomnography. This paper introduces a novel algorithm for automatic scoring of sleep stages using a combination of small decision trees driven by a state machine. The algorithm uses two channels of EEG for feature extraction and has a state machine that selects a suitable decision tree for classification based on the prevailing sleep stage. Its performance has been evaluated using the complete dataset of 61 recordings from PhysioNet Sleep EDF Expanded database achieving an overall accuracy of 82% and 79% on training and test sets respectively. The algorithm has been developed with a very small number of decision tree nodes that are active at any given time making it suitable for use in resource-constrained wearable systems.
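
    The control structure described above, a state machine that picks a small decision tree according to the prevailing sleep stage, can be sketched as follows. Everything concrete here is a hypothetical placeholder: the feature names ("alpha", "delta"), the thresholds, the reduced stage set (W, N2, N3) and the transitions are invented for illustration and are not taken from the paper.

```python
# Each "tree" is a tiny hand-written decision function over two made-up
# EEG features. Which tree is evaluated depends on the current state.

def tree_from_wake(f):
    return "W" if f["alpha"] > 0.5 else "N2"

def tree_from_n2(f):
    if f["delta"] > 0.6:
        return "N3"
    return "W" if f["alpha"] > 0.5 else "N2"

def tree_from_n3(f):
    # From deep sleep this tree can only stay in N3 or return to N2.
    return "N3" if f["delta"] > 0.4 else "N2"

TREES = {"W": tree_from_wake, "N2": tree_from_n2, "N3": tree_from_n3}

def stage_epochs(epochs, initial_state="W"):
    """Score a sequence of epochs; only one small tree is active per epoch."""
    state = initial_state
    stages = []
    for features in epochs:
        state = TREES[state](features)  # the state machine selects the tree
        stages.append(state)
    return stages

epochs = [
    {"alpha": 0.8, "delta": 0.1},
    {"alpha": 0.2, "delta": 0.3},
    {"alpha": 0.1, "delta": 0.7},
    {"alpha": 0.9, "delta": 0.1},
]
stages = stage_epochs(epochs)
print(stages)
```

    Because only the tree for the current state is evaluated, the number of active nodes per epoch stays small, which is the property the abstract links to resource-constrained wearable systems.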

  9. English BNP identification based on corpus-trained decision tree

    Institute of Scientific and Technical Information of China (English)

    孟遥; 赵铁军; 李生; 张晓光

    2001-01-01

    Finding simple, non-recursive, base noun phrase is an important step for many natural language processing applications. This paper presents a new corpus-based approach using decision tree for that purpose. In contrast to previous methods for Base NP identification, we adopt a decision tree trained from Penn Treebank to identify Base NP. And a self-learning mechanism is further integrated into our model. Experimental results show good performances using our method. The method can also be applied to processing of any other language.

  10. USING PRECEDENTS FOR REDUCTION OF DECISION TREE BY GRAPH SEARCH

    Directory of Open Access Journals (Sweden)

    I. A. Bessmertny

    2015-01-01

    The paper considers the problem of organizing mutual payments between business entities by means of clearing, which is solved by searching for paths in a graph. To reduce the decision tree complexity, a method of precedents is proposed that consists in saving intermediate solutions while moving along the decision tree. An algorithm and an example are presented, demonstrating solution complexity close to linear. Tests carried out in a civil aviation settlement system demonstrate an approximately 30 percent reduction in real money transfers. The proposed algorithm is also planned to be implemented in other clearing organizations of the Russian Federation.

  11. Matching in Vitro Bioaccessibility of Polyphenols and Antioxidant Capacity of Soluble Coffee by Boosted Regression Trees.

    Science.gov (United States)

    Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A

    2015-11-01

    The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods.

  12. Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees

    Science.gov (United States)

    Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.

    2015-12-01

    In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable for making predictions without in-depth a-priori knowledge about the functional dependencies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors which are derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means, such as first and second order derivatives of the DEM, but also more sophisticated classifications describing exposure and sheltering of the terrain to wind flux. In order to take into account the different scales which probably influence the streams and turbulences of wind flow over complex terrain, the predictors are computed at different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used for examination of the approach is Switzerland, a mountainous region in the heart of Europe, dominated by the Alps, but also covering large valleys. The full workflow is described in this paper, which consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model and application of the model to generate a wind speed map.
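
    The core idea of gradient boosted regression trees can be shown with a dependency-free sketch: under squared-error loss, each new tree is fitted to the residuals of the current ensemble. The code below uses depth-one trees (stumps) on a single predictor. The elevation and wind speed values, the number of trees and the learning rate are hypothetical and are not the paper's data or configuration.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gbrt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boosting loop: start from the mean, then repeatedly fit the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict_gbrt(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((y - predict_gbrt(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical training data: elevation in metres vs. maximum wind speed in m/s.
xs = [200, 500, 900, 1400, 2000, 2600]
ys = [12.0, 14.0, 19.0, 24.0, 31.0, 35.0]

baseline_mse = mean_squared_error(fit_gbrt(xs, ys, n_trees=0), xs, ys)
boosted_mse = mean_squared_error(fit_gbrt(xs, ys), xs, ys)
print(baseline_mse, boosted_mse)
```

    A real application like the one described would use many DEM-derived predictors and deeper trees; the sketch only shows why the training error falls as trees are added.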

  13. Matching in Vitro Bioaccessibility of Polyphenols and Antioxidant Capacity of Soluble Coffee by Boosted Regression Trees.

    Science.gov (United States)

    Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A

    2015-11-01

    The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods. PMID:26457815

  14. 'Misclassification error' greedy heuristic to construct decision trees for inconsistent decision tables

    KAUST Repository

    Azad, Mohammad

    2014-01-01

    A greedy algorithm has been presented in this paper to construct decision trees for three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. In this algorithm, a greedy heuristic ‘misclassification error’ is used which performs faster, and for some cost function, results are better than ‘number of boundary subtables’ heuristic in literature. Therefore, it can be used in the case of larger data sets and does not require huge amount of memory. Experimental results of depth, average depth and number of nodes of decision trees constructed by this algorithm are compared in the framework of each of the three approaches.

  15. Boosted Regression Trees for Early Yield Estimation: A Case Study on Winter Wheat in the North China Plain

    Science.gov (United States)

    Heremans, Stien; Dong, Qinghan; Bydekerke, Lieven; Van Orshoven, Jos

    2014-11-01

    Crop yield estimates should be both early and accurate to be useful in the context of food security. The combination of remotely sensed vegetation indices and measured climatic variables (temperature, rainfall,…) allows timely, cost efficient and spatially explicit yield estimation. Machine learning methods like boosted regression trees (BoRTs) have proven to be accurate in predicting a diverse range of biophysical parameters. Until now, they have however rarely been applied in crop yield estimation.

  16. Practical secure decision tree learning in a teletreatment application

    NARCIS (Netherlands)

    Hoogh, de Sebastiaan; Schoenmakers, Berry; Chen, Ping; Akker, op den Harm

    2014-01-01

    In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our approa

  17. Relationships between depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm the paper also presents the results of experiments with three datasets from UCI Machine Learning Repository [3]. © 2011 Springer-Verlag.

  18. Fingerprint Gender Classification using Univariate Decision Tree (J48)

    Directory of Open Access Journals (Sweden)

    S. F. Abdullah

    2016-09-01

    Data mining is the process of analyzing data from different categories. The data provide information, and data mining extracts new knowledge from it to create new, useful information. Decision tree learning is a method commonly used in data mining. A decision tree is a decision model that looks like a tree-like graph with nodes, branches and leaves. Each internal node denotes a test on an attribute and each branch represents an outcome of the test. A leaf node, which is a terminal node, holds a class label. The decision tree classifies an instance and helps in making predictions from the data used. This study focuses on the J48 algorithm for classifying gender by using fingerprint features. Four types of fingerprint features are used in this study: Ridge Count (RC), Ridge Density (RD), Ridge Thickness to Valley Thickness Ratio (RTVTR) and White Lines Count (WLC). Different cases were executed with the J48 algorithm, and a comparison of the knowledge gained from each test is shown. All experiments were run using Weka, and the classification rate achieved was 96.28%.

  19. Construction of a decision tree in linear programming problems

    International Nuclear Information System (INIS)

    The dependence of the solution of a linear programming problem on its parameter has been analyzed. An algorithm for the construction of a decision tree has been proposed with the use of the simplex method together with the validity support system

  20. Comparative Analysis of Serial Decision Tree Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Matthew Nwokejizie Anyanwu

    2009-09-01

    Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used in grouping similar data objects together. It can be defined as a supervised learning algorithm, as it assigns class labels to data objects based on the relationship between the data items and a pre-defined class label. Classification algorithms have a wide range of applications, such as churn prediction, fraud detection, artificial intelligence, and credit card rating. There are many classification algorithms available in the literature, but decision trees are the most commonly used because they are easier to implement and to understand than other classification algorithms. A decision tree classification algorithm can be implemented in a serial or parallel fashion based on the volume of data, the memory space available on the computer resource and the scalability of the algorithm. In this paper we review the serial implementations of the decision tree algorithms and identify those that are commonly used. We also use experimental analysis based on sample data records (Statlog data sets) to evaluate the performance of the commonly used serial decision tree algorithms.

  1. Soil Organic Matter Mapping by Decision Tree Modeling

    Institute of Scientific and Technical Information of China (English)

    ZHOU Bin; ZHANG Xing-Gang; WANG Fan; WANG Ren-Chao

    2005-01-01

    Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with some extensive easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape factor-associated regular system. This system could be used to predict continuous SOM spatial distribution. By analyzing factors such as elevation, geological unit, soil type, land use, remotely sensed data, upslope contributing area, slope, aspect, planform curvature, and profile curvature, the decision tree could predict distribution of soil organic matter levels. Among these factors, elevation, land use, aspect, soil type, the first principal component of bitemporal Landsat TM, and upslope contributing area were considered the most important variables for predicting SOM. Results of the prediction between SOM content and landscape types sorted by the decision tree showed a close relationship with an accuracy of 81.1%.

  2. An overview of decision tree applied to power systems

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, Decision Tree (DT), also named as Classification And Regression T...

  3. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
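
    The three ways of collapsing a group of equal rows can be made concrete with a short sketch. The toy table, the integer decision values and the tie-breaking rule for the most common decision (smallest value wins) are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict

def resolve_inconsistency(rows):
    """rows: list of (conditional_attribute_tuple, decision) pairs.

    Returns three dicts keyed by the conditional attribute tuple:
    many-valued decision, most common decision, generalized decision code.
    """
    groups = defaultdict(list)
    for attrs, decision in rows:
        groups[attrs].append(decision)

    many_valued, most_common, generalized = {}, {}, {}
    codes = {}  # maps each distinct decision set to a unique integer code
    for attrs, decisions in groups.items():
        decision_set = frozenset(decisions)
        # (i) many-valued decision: keep the whole set of decisions
        many_valued[attrs] = set(decision_set)
        # (ii) most common decision: ties broken by smallest value (assumption)
        counts = Counter(decisions)
        most_common[attrs] = min(counts, key=lambda d: (-counts[d], d))
        # (iii) generalized decision: one code per distinct decision set
        generalized[attrs] = codes.setdefault(decision_set, len(codes))
    return many_valued, most_common, generalized

# Hypothetical inconsistent table: rows (0, 1) and (1, 1) carry conflicting decisions.
table = [
    ((0, 1), 1), ((0, 1), 2), ((0, 1), 2),
    ((1, 0), 3),
    ((1, 1), 1), ((1, 1), 2),
]
mv, mc, gd = resolve_inconsistency(table)
print(mv, mc, gd)
```

    After this step each combination of conditional attribute values appears once, so a tree-building algorithm can be run on any of the three resulting tables.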

  4. Extensions of dynamic programming as a new tool for decision tree optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of tree) and decision trees (which allow arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.

  5. MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees

    Directory of Open Access Journals (Sweden)

    Vasile PURDILĂ

    2014-03-01

    Learning decision trees from very large amounts of data is not practical on single-node computers due to the huge amount of calculation required by this process. Apache Hadoop is a large-scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks on very large datasets. This work presents a parallel decision tree learning algorithm expressed in the MapReduce programming model that runs on the Apache Hadoop platform and scales very well with dataset size.
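
    One common way to express a single split-selection step in the MapReduce model is to have mappers emit class counts per attribute value and a reducer sum them, after which the split can be chosen from the aggregated counts alone. The sketch below simulates this in one Python process; it does not use the Hadoop API and is not the MR-Tree algorithm itself, and the partitions and attribute names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import chain

def map_phase(partition, attributes, target):
    """Mapper: emit ((attribute, value, class), 1) for every row of one partition."""
    for row in partition:
        for attr in attributes:
            yield (attr, row[attr], row[target]), 1

def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each key."""
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return counts

def weighted_entropy(counts, attr):
    """Weighted class entropy after splitting on `attr`, from aggregated counts only."""
    per_value = defaultdict(dict)
    for (a, value, label), n in counts.items():
        if a == attr:
            per_value[value][label] = n
    total = sum(sum(c.values()) for c in per_value.values())
    result = 0.0
    for class_counts in per_value.values():
        size = sum(class_counts.values())
        ent = -sum(n / size * math.log2(n / size) for n in class_counts.values())
        result += size / total * ent
    return result

# Two hypothetical partitions, standing in for data blocks on two cluster nodes.
partitions = [
    [{"outlook": "sunny", "windy": "no", "play": "yes"},
     {"outlook": "rain", "windy": "no", "play": "no"}],
    [{"outlook": "sunny", "windy": "yes", "play": "yes"},
     {"outlook": "rain", "windy": "yes", "play": "no"}],
]
attributes = ["outlook", "windy"]

emitted = chain.from_iterable(map_phase(p, attributes, "play") for p in partitions)
counts = reduce_phase(emitted)
best_attribute = min(attributes, key=lambda a: weighted_entropy(counts, a))
print(best_attribute)
```

    The raw rows never leave their partition; only small count records are combined, which is what lets this style of computation grow with the dataset.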

  6. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

    This paper proposes a decision tree model for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance and are usually represented by an easy-to-interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%) compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as the most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.

  7. Distributed Decision-Tree Induction in Peer-to-Peer Systems

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a scalable and robust distributed algorithm for decision-tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in...

  8. Emergent Linguistic Rules from Inducing Decision Trees Disambiguating Discourse Clue Words

    CERN Document Server

    Siegel, Eric V.; McKeown, Kathleen R.

    1994-01-01

    We apply decision tree induction to the problem of discourse clue word sense disambiguation with a genetic algorithm. The automatic partitioning of the training set which is intrinsic to decision tree induction gives rise to linguistically viable rules.

  9. Constructing an optimal decision tree for FAST corner point detection

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    In this paper, we consider a problem that originates in computer vision: determining an optimal testing strategy for the corner point detection problem that is a part of the FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare the performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.

  10. Rule Extraction in Transient Stability Study Using Linear Decision Trees

    Institute of Scientific and Technical Information of China (English)

    SUN Hongbin; WANG Kang; ZHANG Boming; ZHAO Feng

    2011-01-01

    Traditional operation rules depend on human experience, which are relatively fixed and difficult to fulfill the new demand of the modern power grid. In order to formulate suitable and quickly refreshed operation rules, a method of linear decision tree based on support samples is proposed for rule extraction in this paper. The operation rules extracted by this method have advantages of refinement and intelligence, which helps the dispatching center meet the requirement of smart grid construction.

  11. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

    The decision tree is a widely used technique for discovering patterns in a consistent data set. But if the data set is inconsistent, where there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge from the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we then compared them based on the optimization and classification results. It was found that some greedy algorithms (Mult_ws_entSort and Mult_ws_entML) are good for both optimization and classification.

  12. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2016-05-01

    The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  13. Fault diagnosis of induction motor based on decision trees and adaptive neuro-fuzzy inference

    OpenAIRE

    Tran, Tung; Yang, Bo-Suk; Oh, Myung-Suck; Tan, Andy Chit Chiow

    2009-01-01

    This paper presents a fault diagnosis method based on adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART) which is one of the decision tree methods is used as a feature selection procedure to select pertinent features from data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of ANFIS classifier. The hybrid of back-propagation and le...

  14. Decision Tree Classifiers for Star/Galaxy Separation

    CERN Document Server

    Vasconcellos, E C; Gal, R R; LaBarbera, F L; Capelato, H V; Velho, H F Campos; Trevisan, M; Ruiz, R S R

    2010-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of $884,126$ SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: $14\le r\le21$ ($85.2\%$) and $r\ge19$ ($82.1\%$). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier is comparable or better in completeness over the full magnitude range $15\le r\le21$, with m...

  15. PREDIKSI CALON MAHASISWA BARU MENGUNAKAN METODE KLASIFIKASI DECISION TREE

    Directory of Open Access Journals (Sweden)

    Mambang

    2015-02-01

    Full Text Available Before a health education institution begins a new academic year, its first step is to select new students from among graduates of general and vocational secondary schools. This study predicts new student admissions from multiple data attributes. The model is a decision tree classifier, which builds a tree consisting of a root node, internal nodes and terminal nodes. The root node and internal nodes are variables (features), while the terminal nodes are class labels. Based on the experimental results and evaluation, it can be concluded that the C4.5 algorithm with the Uncertainty criterion obtained 80.39% accuracy, 94.44% precision and 75.00% recall, while the C4.5 algorithm with the Information Gain Ratio criterion obtained 88.24% accuracy, 98.28% precision and 83.82% recall.
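
    The accuracy, precision and recall figures quoted in this abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are computed; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. The counts used below are illustrative only.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
accuracy, precision, recall = classification_metrics(tp=30, fp=10, fn=20, tn=40)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

    A high precision combined with a lower recall, as reported above, means that few admitted-class predictions are wrong but a noticeable share of true positives is missed.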

  16. Totally Optimal Decision Trees for Monotone Boolean Functions with at Most Five Variables

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.

  17. FINANCIAL PERFORMANCE INDICATORS OF TUNISIAN COMPANIES: DECISION TREE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ferdaws Ezzi

    2016-01-01

    Full Text Available The article at hand is an attempt to identify the various indicators that are more likely to explain the financial performance of Tunisian companies. In this respect, the emphasis is put on diversification, innovation, intrapersonal and interpersonal skills. Indeed, they are the appropriate strategies that can designate emotional intelligence, the level of indebtedness, the firm age and size as the proper variables that support the target variable. The "decision tree", as a new data analysis method, is utilized to analyze our work. The results involve the construction of a crucial model which is used to achieve a sound financial performance.

  18. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  19. Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

    Directory of Open Access Journals (Sweden)

    O.BENCHAREF

    2011-09-01

    Full Text Available The recognition of Tifinagh characters cannot be perfectly carried out using the conventional methods, which are based on invariance. This is due to the similarity between some characters, which differ from each other only by size or rotation; hence the need for new methods to remedy this shortcoming. In this paper we propose a direct method based on the calculation of what are called Geodesic Descriptors, which have shown significant reliability with respect to changes of scale, the presence of noise and geometric distortions. For classification, we have opted for a method based on the hybridization of decision trees and neural networks.

  20. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper is devoted to the consideration of the software system Dagger, created at KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees), and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  1. Distribution-Specific Agnostic Boosting

    CERN Document Server

    Feldman, Vitaly

    2009-01-01

    We consider the problem of boosting the accuracy of weak learning algorithms in the agnostic learning framework of Haussler (1992) and Kearns et al. (1992). Known algorithms for this problem (Ben-David et al., 2001; Gavinsky, 2002; Kalai et al., 2008) follow the same strategy as boosting algorithms in the PAC model: the weak learner is executed on the same target function but over different distributions on the domain. We demonstrate boosting algorithms for the agnostic learning framework that only modify the distribution on the labels of the points (or, equivalently, modify the target function). This allows boosting a distribution-specific weak agnostic learner to a strong agnostic learner with respect to the same distribution. When applied to the weak agnostic parity learning algorithm of Goldreich and Levin (1989) our algorithm yields a simple PAC learning algorithm for DNF and an agnostic learning algorithm for decision trees over the uniform distribution using membership queries. These results substantia...

  2. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  3. An Applied Research of Decision Tree Algorithm in Track and Field Equipment Training

    OpenAIRE

    Liu Shaoqing; Wang Kebin

    2015-01-01

    This paper has conducted a study on the applications of track and field equipment training based on ID3 algorithm of decision tree model. For the selection of the elements used by decision tree, this paper can be divided into track training equipment, field events training equipment and auxiliary training equipment according to the properties of track and field equipment. The decision tree that regards track training equipment as root nodes has been obtained under the conditions of lowering c...

  4. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    CERN Document Server

    Farid, Dewan Md; Rahman, Mohammad Zahidur; 10.5121/ijnsa.2010.2202

    2010-01-01

    In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented, which performs balanced detections and keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, there remain various issues that need to be examined in current intrusion detection systems (IDS). We tested the performance of our proposed algorithm with existing learn...

  5. Electronic Nose Odor Classification with Advanced Decision Tree Structures

    Directory of Open Access Journals (Sweden)

    S. Guney

    2013-09-01

    Full Text Available An electronic nose (e-nose) is an electronic device which can measure chemical compounds in air and consequently classify different odors. In this paper, an e-nose device consisting of 8 different gas sensors was designed and constructed. Using this device, 104 different experiments involving 11 different odor classes (moth, angelica root, rose, mint, polis, lemon, rotten egg, egg, garlic, grass, and acetone) were performed. The main contribution of this paper is the finding that, using chemical domain knowledge, it is possible to train an accurate odor classification system. The domain knowledge about chemical compounds is represented by a decision tree whose nodes are composed of classifiers such as Support Vector Machines and k-Nearest Neighbor. The overall accuracy achieved with the proposed algorithm and the constructed e-nose device was 97.18 %. Training and testing data sets used in this paper are published online.

  6. CLASSIFICATION OF LISS IV IMAGERY USING DECISION TREE METHODS

    Directory of Open Access Journals (Sweden)

    A. K. Verma

    2016-06-01

    Full Text Available Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; such an index describes the greenness, density and health of vegetation. Texture is also an important characteristic, used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach based on texture properties with different vegetation indices for single-date LISS IV sensor data with a high spatial resolution of 5.8 meters. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison was made between these two methods. The results indicate that including textural features with vegetation indices can produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.

  7. Classification of Liss IV Imagery Using Decision Tree Methods

    Science.gov (United States)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; such an index describes the greenness, density and health of vegetation. Texture is also an important characteristic, used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach based on texture properties with different vegetation indices for single-date LISS IV sensor data with a high spatial resolution of 5.8 meters. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison was made between these two methods. The results indicate that including textural features with vegetation indices can produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
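
    Several of the vegetation indices named in this abstract are simple band ratios. The sketch below computes three of them (NDVI, GNDVI and DVI) from per-pixel reflectances using their standard definitions, followed by a one-level decision rule. The reflectance values, the function names and the 0.3 threshold are assumptions for illustration and do not come from the paper.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """Green NDVI: (NIR - Green) / (NIR + Green)."""
    return normalized_difference(nir, green)


def dvi(nir, red):
    """Difference Vegetation Index: NIR - Red."""
    return nir - red


def is_vegetation(nir, red, threshold=0.3):
    """Single-split decision rule on NDVI; the threshold is hypothetical."""
    return ndvi(nir, red) > threshold


# Hypothetical reflectances for one pixel (green, red, near-infrared).
green, red, nir = 0.08, 0.10, 0.50
print(ndvi(nir, red), gndvi(nir, green), dvi(nir, red), is_vegetation(nir, red))
```

    A decision tree classifier of the kind described would learn such thresholds, over many indices and texture features at once, from labeled training pixels rather than fixing them by hand.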

  8. Extensions of Dynamic Programming: Decision Trees, Combinatorial Optimization, and Data Mining

    KAUST Repository

    Hussain, Shahid

    2016-07-10

    This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.

  9. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    Science.gov (United States)

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. PMID:27026589

  10. Binary Decision Tree Development for Probabilistic Safety Assessment Applications

    International Nuclear Information System (INIS)

    The aim of this article is to describe the state of development of a relatively new approach in probabilistic safety analysis (PSA). This approach is based on the application of the binary decision diagram (BDD) representation of logical functions to the quantitative and qualitative analysis of complex systems that are represented by fault trees and event trees in the PSA applied to nuclear power plant risk determination. Even though the BDD approach offers a full solution, compared to the partial one from the conventional quantification approach, there are still problems to be solved before the new approach can be fully implemented. The major problem with full application of BDD is the difficulty of obtaining any solution for PSA models of a certain complexity. This paper compares two approaches to PSA quantification. The major focus of the paper is the description of an in-house developed BDD application with implementation of original algorithms. The resulting number of nodes required to represent the BDD is extremely sensitive to the chosen order of variables (i.e., basic events in PSA). The problem of finding an optimal order of variables that form the BDD falls under the class of NP-complete complexity. This paper presents an original approach to the problem of finding the initial order of variables utilized for the BDD construction by various dynamical reordering schemes. The main advantage of this approach compared to the known methods of finding the initial order is better results with respect to the required working memory and the time needed to finish the BDD construction. The developed method is compared against results from well-known methods such as depth-first and breadth-first search procedures. The described method may be applied to finding an initial order for fault trees/event trees created from basic events by means of logical operations (e.g. negation, and, or, exclusive or). With some test models a significant reduction of used memory has been achieved, sometimes

  11. Computer Crime Forensics Based on Improved Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Ying Wang

    2014-04-01

    Full Text Available To find crime-related evidence and association rules among massive data, classic decision tree algorithms such as ID3 for classification analysis have appeared in related prototype systems. How to make such algorithms more suitable for computer forensics in variable environments has therefore become a hot issue. When selecting classification attributes, ID3 relies on the computation of information entropy, and the attributes with more values are then selected as classification nodes of the decision tree. Such classification is unrealistic in many cases. The ID3 algorithm also involves many logarithm computations, so it is complicated to handle datasets with various classification attributes. Therefore, targeting the special demands of computer crime forensics, the ID3 algorithm is improved and a novel classification attribute selection method based on the Maclaurin-Priority Value First method is proposed. It adopts the change-of-base formula and infinitesimal substitution to simplify the logarithms in ID3. To compensate for the errors generated in this process, the simplified formulas are multiplied by an appropriate constant. The idea of Priority Value First is introduced to solve the problem of value deviation. The performance of the improved method is strictly proved in theory. Finally, the experiments verify that our scheme has an advantage in computation time and classification accuracy compared to ID3 and two existing algorithms
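
    For context, the entropy computation that ID3 relies on, and that this abstract sets out to simplify, can be sketched as follows. This is plain ID3 information gain, with the base-2 logarithm written through the change-of-base identity log2(x) = ln(x) / ln(2); it is not the Maclaurin-based simplification proposed by the authors. The toy rows, attribute names and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits, using log2(x) = ln(x) / ln(2)."""
    n = len(labels)
    return -sum(
        (count / n) * (math.log(count / n) / math.log(2))
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute (ID3 criterion)."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy table: attribute "a" determines the label, "b" is irrelevant.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = ["x", "x", "y", "y"]
best = max(["a", "b"], key=lambda attr: information_gain(rows, labels, attr))
print(best)
```

    Every candidate attribute at every node requires one logarithm per class and per attribute value, which is the cost the abstract refers to when it mentions the many logarithms in ID3.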

  13. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    Science.gov (United States)

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
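
    The validation step in this abstract rests on two pieces of arithmetic: the roughly 70/30 split of the 864 springs (605 for modeling, 259 for validation) and the area under the ROC curve. A minimal sketch of both follows. The AUC is computed in its pairwise form, as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The scores are hypothetical and the function name is an assumption.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Split reported in the abstract: 605 training and 259 validation springs.
total, train, validation = 864, 605, 259
assert train + validation == total
print(f"train share = {train / total:.1%}, validation share = {validation / total:.1%}")

# Hypothetical model scores for spring (positive) and non-spring (negative) locations.
spring_scores = [0.9, 0.8, 0.4]
non_spring_scores = [0.7, 0.3]
print(f"AUC = {auc(spring_scores, non_spring_scores):.4f}")
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values for BRT, CART and RF are compared.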

  14. Searches for Supersymmetric Particles with the ATLAS Detector Using Boosted Decay Tree Topologies

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00399438; De, Kaushik; Hadavand, Haleh; Musielak, Zdzislaw; White, Andrew

    The existence of a scalar Higgs particle poses a challenge to the Standard Model through an unnatural hierarchy problem with quadratic divergence. A supersymmetric framework, proposing heavy partners to every Standard Model particle, can solve this problem by introducing new loop diagrams that involve a new fermion-boson symmetry. The LHC has the potential to probe the energy scale necessary for creation of these particles and the ATLAS experiment is poised for discovery. The detected particles are studied by reconstructing the detected events in boosted frames that approximate each decay frame of the interaction with pairs of heavy, invisible particles. This Razor method was used in the analysis of data from 2011 and 2012 and then generalized to the Recursive Jigsaw method in 2015.

  15. Decision Rules, Trees and Tests for Tables with Many-valued Decisions–comparative Study

    KAUST Repository

    Azad, Mohammad

    2013-10-04

    In this paper, we present three approaches for construction of decision rules for decision tables with many-valued decisions. We construct decision rules directly for rows of decision table, based on paths in decision tree, and based on attributes contained in a test (super-reduct). Experimental results for the data sets taken from UCI Machine Learning Repository, contain comparison of the maximum and the average length of rules for the mentioned approaches.

  16. Decision-tree induction from self-mapping space based on web

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shu-yu; ZHU Zhong-ying

    2007-01-01

    An improved decision tree method for web information retrieval with self-mapping attributes is proposed. The self-mapping tree has a value of a self-mapping attribute in its internal node, and information based on dissimilarity between a pair of mapping sequences. This method selects self-mappings which exist between data by exhaustive search based on relation and attribute information. Experimental results confirm that the improved method constructs comprehensive and accurate decision trees. Moreover, an example shows that the self-mapping decision tree is promising for data mining and knowledge discovery.

  17. Multi-pruning of decision trees for knowledge representation and classification

    KAUST Repository

    Azad, Mohammad

    2016-06-09

    We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increment in the number of misclassifications. Based on the created approach we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.
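
    The notion of a Pareto optimal point used in this abstract can be illustrated with a short sketch. Given candidate trees described by (number of nodes, number of misclassifications), a point is Pareto optimal if no other point is at least as good in both criteria and strictly better in one. The code below only filters a given list of points; it is not the dynamic programming construction described by the authors, and the candidate values are hypothetical.

```python
def pareto_front(points):
    """Return the Pareto optimal (nodes, misclassifications) pairs, minimizing both."""
    front = []
    fewest_errors_so_far = float("inf")
    # After sorting by node count (then errors), a point is non-dominated
    # exactly when it has strictly fewer errors than every smaller tree.
    for nodes, errors in sorted(set(points)):
        if errors < fewest_errors_so_far:
            front.append((nodes, errors))
            fewest_errors_so_far = errors
    return front


# Hypothetical candidate trees: (number of nodes, number of misclassifications).
candidates = [(1, 10), (3, 4), (3, 6), (5, 4), (7, 1), (9, 1)]
print(pareto_front(candidates))
```

    Choosing a point on this front is the trade-off the abstract describes: moving toward fewer nodes usually costs a small increase in misclassifications.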

  18. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    Science.gov (United States)

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  19. An Analysis on Performance of Decision Tree Algorithms using Student’s Qualitative Data

    Directory of Open Access Journals (Sweden)

    T.Miranda Lakshmi

    2013-06-01

    Full Text Available The decision tree is the most widely applied supervised classification technique. The learning and classification steps of decision tree induction are simple and fast, and it can be applied to any domain. In this research, student qualitative data has been taken from educational data mining, and the performance of the decision tree algorithms ID3, C4.5 and CART is compared. The comparison result shows that the Gini Index of CART influences the information Gain Ratio of ID3 and C4.5. The classification accuracy of CART is higher when compared to ID3 and C4.5. However, the difference in classification accuracy between the decision tree algorithms is not considerable. The experimental results of the decision trees indicate that students’ performance is also influenced by qualitative factors.
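
    The Gini Index mentioned in this abstract is the impurity measure that CART uses to score candidate splits, whereas ID3 and C4.5 rely on entropy-based criteria. A minimal sketch of the Gini computation follows; the labels and the split are hypothetical and are not the student data used in the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Size-weighted Gini impurity of the child groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)


# Hypothetical labels before a split and the two child groups after it.
parent = ["pass", "pass", "fail", "fail"]
children = [["pass", "pass"], ["fail", "fail"]]
print(gini(parent), weighted_gini(children))
```

    CART prefers the split with the lowest weighted impurity, so a split that separates the classes completely, as in this toy case, would be chosen over any alternative.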

  20. Greedy heuristics for minimization of number of terminal nodes in decision trees

    KAUST Repository

    Hussain, Shahid

    2014-10-01

    This paper describes, in detail, several greedy heuristics for construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related with the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from UCI ML Repository and randomly generated data) as well as compare with the optimal results obtained using dynamic programming method.

  1. A greedy algorithm for construction of decision trees for tables with many-valued decisions - A comparative study

    KAUST Repository

    Azad, Mohammad

    2013-11-25

    In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for proposed approach and approach based on generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.

  2. Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems

    CERN Document Server

    Gupta, Anupam; Nagarajan, Viswanath; Ravi, R

    2010-01-01

    We consider the problem of constructing optimal decision trees: given a collection of tests which can disambiguate between a set of $m$ possible diseases, each test having a cost, and the a-priori likelihood of the patient having any particular disease, what is a good adaptive strategy to perform these tests to minimize the expected cost to identify the disease? We settle the approximability of this problem by giving a tight $O(\log m)$-approximation algorithm. We also consider a more substantial generalization, the Adaptive TSP problem. Given an underlying metric space, a random subset $S$ of cities is drawn from a known distribution, but $S$ is initially unknown to us--we get information about whether any city is in $S$ only when we visit the city in question. What is a good adaptive way of visiting all the cities in the random subset $S$ while minimizing the expected distance traveled? For this problem, we give the first poly-logarithmic approximation, and show that this algorithm is best possible unless w...

  3. Discovering Patterns in Brain Signals Using Decision Trees

    Directory of Open Access Journals (Sweden)

    Narusci S. Bastos

    2016-01-01

    Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. We therefore propose to use a data mining technique to help in this task. As a case study, we analyzed the brain behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision by using the other senses. If an object is given to sighted people and they are asked to identify it, the sense of vision will probably be the most decisive one. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate the difference in how the brains of blind people and of people with vision react to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain’s behaviour.

  4. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Dewan Md. Farid

    2010-04-01

    In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and it eliminates redundant attributes as well as contradictory examples from the training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, various issues in current intrusion detection systems (IDS) remain to be examined. We compared the performance of our proposed algorithm with existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources.

  5. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

    Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category. Identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vivid but usually neglected aspect of software quality assurance in any project. If applied at all stages of software development, it can reduce the time, overheads and resources required to engineer a high quality product. In order to reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when a test case shows that the software is not executing properly. The proposed system classifies defects using a decision tree based defect classification technique, which is used to group the defects after identification. The classification can be done by employing algorithms such as ID3 or C4.5. After the classification, the defect patterns will be measured by employing a pattern mining technique. Finally, the quality will be assured by using various quality metrics such as defect density. The proposed system will be implemented in JAVA.

  6. Application of alternating decision trees in selecting sparse linear solvers

    KAUST Repository

    Bhowmick, Sanjukta

    2010-01-01

    The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in course of the same PDE-based simulation, thereby making solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stages of simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can resolve the problem of over-fitting, which occurs when limited amount of data is available. © 2010 Springer Science+Business Media LLC.

  7. Efficient OCR using simple features and decision trees with backtracking

    International Nuclear Information System (INIS)

    In this paper, it is shown that it is adequate to use simple and easy-to-compute features, such as those we call sliced horizontal and vertical projections, to solve the OCR problem for machine-printed documents. Recognition is achieved using a decision tree supported with backtracking, smoothing, row and column cropping, and other additions to increase the success rate. Symbols from the Times New Roman typeface are used to train our system. Activating backtracking, smoothing and cropping achieved a success rate of more than 98% with a recognition time below 30 ms per character. The recognition algorithm was subjected to a hard test by polluting the original dataset with additional artificial noise, and it maintained a high success rate and a low error rate for highly polluted images, which is a result of backtracking, smoothing, and row and column cropping. Results indicate that we can depend on simple features and hints to reliably recognize characters. The error rate can be decreased by increasing the size of the training dataset. The recognition time can be reduced by using programming optimization techniques and more powerful computers. (author)

  8. PERFORMANCE EVALUATION OF C-FUZZY DECISION TREE BASED IDS WITH DIFFERENT DISTANCE MEASURES

    Directory of Open Access Journals (Sweden)

    Vinayak Mantoor

    2012-01-01

    With the ever-increasing growth of computer networks and the emergence of electronic commerce in recent years, computer security has become a priority. An intrusion detection system (IDS) is often used as another wall of protection in addition to intrusion prevention techniques. This paper introduces a concept and design of decision trees based on fuzzy clustering. Fuzzy clustering is the core functional part of the overall decision tree development, and the developed trees are referred to as C-fuzzy decision trees. The distance measure plays an important role in clustering data points. Choosing the right distance measure for a given dataset is a non-trivial problem. In this paper, we study the performance of a C-fuzzy decision tree based IDS with different distance measures. We analyzed the results of our study using KDD Cup 1999 data and compared the accuracy of the classifier with different distance measures.

  9. Iron Supplementation and Altitude: Decision Making Using a Regression Tree

    Directory of Open Access Journals (Sweden)

    Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore

    2016-03-01

    Altitude exposure increases the body’s need for iron (Gassmann and Muckenthaler, 2015), primarily to support accelerated erythropoiesis, yet clear supplementation guidelines do not exist. Athletes are typically recommended to ingest a daily oral iron supplement to facilitate altitude adaptations, and to help maintain iron balance. However, there is some debate as to whether athletes with otherwise healthy iron stores should be supplemented, due in part to concerns of iron overload. Excess iron in vital organs is associated with an increased risk of a number of conditions including cancer, liver disease and heart failure. Therefore clear guidelines are warranted and athletes should be discouraged from ‘self-prescribing’ supplementation without medical advice. In the absence of prospective-controlled studies, decision tree analysis can be used to describe a data set, with the resultant regression tree serving as a guide for clinical decision making. Here, we present a regression tree in the context of iron supplementation during altitude exposure, to examine the association between pre-altitude ferritin (Ferritin-Pre) and the haemoglobin mass (Hbmass) response, based on daily iron supplement dose. De-identified ferritin and Hbmass data from 178 athletes engaged in altitude training were extracted from the Australian Institute of Sport (AIS) database. Altitude exposure was predominantly achieved via normobaric Live high: Train low (n = 147) at a simulated altitude of 3000 m for 2 to 4 weeks. The remaining athletes engaged in natural altitude training at venues ranging from 1350 to 2800 m for 3-4 weeks. Thus, the “hypoxic dose” ranged from ~890 km.h to ~1400 km.h. Ethical approval was granted by the AIS Human Ethics Committee, and athletes provided written informed consent. An in-depth description and traditional analysis of the complete data set is presented elsewhere (Govus et al., 2015). Iron supplementation was prescribed by a sports physician

  10. In vivo quantitative evaluation of vascular parameters for angiogenesis based on sparse principal component analysis and aggregated boosted trees

    Science.gov (United States)

    Zhao, Fengjun; Liu, Junting; Qu, Xiaochao; Xu, Xianhui; Chen, Xueli; Yang, Xiang; Cao, Feng; Liang, Jimin; Tian, Jie

    2014-12-01

    To solve the multicollinearity issue and the unequal contribution of vascular parameters for the quantification of angiogenesis, we developed a quantitative evaluation method of vascular parameters for angiogenesis based on in vivo micro-CT imaging of hindlimb ischemic model mice. Taking vascular volume as the ground truth parameter, nine vascular parameters were first assembled into sparse principal components (PCs) to reduce the multicollinearity issue. Aggregated boosted trees (ABTs) were then employed to analyze the importance of vascular parameters for the quantification of angiogenesis via the loadings of sparse PCs. The results demonstrated that vascular volume was mainly characterized by vascular area, vascular junction, connectivity density, segment number and vascular length, which indicated they were the key vascular parameters for the quantification of angiogenesis. The proposed quantitative evaluation method was compared with both the ABTs directly using the nine vascular parameters and Pearson correlation, and the results were consistent. In contrast to the ABTs directly using the vascular parameters, the proposed method can select all the key vascular parameters simultaneously, because all the key vascular parameters were assembled into the sparse PCs with the highest relative importance.

  11. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open loop mode selection module in the AMR-WB+.

  12. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  13. Minimum description length criterion based decision tree dynamic pruning method in speech recognition

    Institute of Scientific and Technical Information of China (English)

    XU Xianghua; HE lin

    2006-01-01

    In phonetic decision tree based state tying, decision trees with varying numbers of leaf nodes denote models of different complexity. By studying the influence of model complexity on system performance and speaker adaptation, a decision tree dynamic pruning method based on the Minimum Description Length (MDL) criterion is presented. In the method, a well-trained, large-sized phonetic decision tree is selected as an initial model set, and model complexity is computed by adding a penalty parameter which changes according to the amount of adaptation data. Largely owing to the reasonable selection of initial models and the integration of the stochastic and asymptotic properties of the MDL criterion, the proposed method achieves high performance when combined with speaker adaptation.

  14. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...

  15. One-year renal graft survival prediction using a weighted decision tree classifier

    OpenAIRE

    Dalia Atallah; Ali Eldesoky; Amira H.; Mohamed Ghoneim

    2014-01-01

    This study introduces a weighted decision tree algorithm for prediction of graft survival in renal transplantation using preoperative patient's data. The objective was to identify the preoperative attributes that affect the graft survival. Between the years 2000-2009, renal allotransplantation was carried out for 889 patients at Urology and Nephrology Center which is the subject matter of this study. The ID3 algorithm was chosen to build up the decision tree using the weka machine learning so...

  16. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    Science.gov (United States)

    Moisen, G.G.; Freeman, E.A.; Blackard, J.A.; Frescino, T.S.; Zimmermann, N.E.; Edwards, T.C.

    2006-01-01

    Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's See5 and Cubist (for binary and continuous responses, respectively) are the tools of choice in many of these applications. These tools are widely used in large remote sensing applications, but are not easily interpretable, do not have ties with survey estimation methods, and use proprietary unpublished algorithms. Consequently, three alternative modelling techniques were compared for mapping presence and basal area of 13 species located in the mountain ranges of Utah, USA. The modelling techniques compared included the widely used See5/Cubist, generalized additive models (GAMs), and stochastic gradient boosting (SGB). Model performance was evaluated using independent test data sets. Evaluation criteria for mapping species presence included specificity, sensitivity, Kappa, and area under the curve (AUC). Evaluation criteria for the continuous basal area variables included correlation and relative mean squared error. For predicting species presence (setting thresholds to maximize Kappa), SGB had higher values for the majority of the species for specificity and Kappa, while GAMs had higher values for the majority of the species for sensitivity. In evaluating resultant AUC values, GAM and/or SGB models had significantly better results than the See5 models where significant differences could be detected between models. For nine out of 13 species, basal area prediction results for all modelling techniques were poor (correlations less than 0.5 and relative mean squared errors greater than 0.8), but SGB provided the most stable predictions in these instances. SGB and Cubist performed equally well for modelling basal area for three species with moderate prediction success

  17. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees. In this particular case of total path length and number of terminal nodes, the relationships between these two cost functions are closely related to the space-time trade-off. In addition to an algorithm to compute the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier or Pareto set of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.
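
    To illustrate what a Pareto frontier of two tree cost functions looks like, the sketch below filters a set of (total path length, number of terminal nodes) pairs down to the non-dominated ones. It is not the algorithm of the paper, which works on decision tables; the function name and the numbers are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated (cost_a, cost_b) pairs, both costs minimised.

    After sorting by cost_a (ties broken by cost_b), a point is kept only
    if its cost_b is strictly smaller than that of every point kept so far.
    """
    front = []
    best_b = float("inf")
    for a, b in sorted(set(points)):
        if b < best_b:
            front.append((a, b))
            best_b = b
    return front


# Hypothetical (total path length, terminal nodes) pairs for candidate trees.
candidates = [(10, 4), (12, 3), (11, 5), (15, 3), (9, 6)]
print(pareto_front(candidates))

# A singleton frontier: one tree is optimal for both cost functions.
print(pareto_front([(5, 2), (6, 2), (5, 3)]))
```

    A singleton result corresponds to the "total optimality" case described in the abstract, where a single tree minimises both costs at once.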

  18. A Semi-Random Multiple Decision-Tree Algorithm for Mining Data Streams

    Institute of Scientific and Technical Information of China (English)

    Xue-Gang Hu; Pei-Pei Li; Xin-Dong Wu; Gong-Qing Wu

    2007-01-01

    Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have a relatively poor efficiency in both time and space due to the characteristics of streaming data. There are some advantages in time and space when using random decision trees. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has an improved performance in time, space, accuracy and the anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
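
    The abstract does not say exactly how SRMTDS applies the Hoeffding inequality, so the following is only a sketch of the standard bound as it is commonly used in stream decision trees: with probability at least 1 - delta, the mean of n observations of a variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of its true mean. The function and parameter names are assumptions made for illustration.

```python
from math import ceil, log, sqrt


def hoeffding_epsilon(value_range, delta, n):
    """Deviation bound after n observations, holding with probability 1 - delta."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))


def min_examples(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return ceil(value_range ** 2 * log(1.0 / delta) / (2.0 * epsilon ** 2))


# Hypothetical settings: information gain for two classes has range 1 bit.
n = min_examples(value_range=1.0, delta=1e-7, epsilon=0.05)
print(n, hoeffding_epsilon(1.0, 1e-7, n))
```

    In this reading, a leaf would wait until it has seen at least `n` examples before committing to a split, which bounds the chance of choosing a different split than a batch learner would.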

  19. Induction of hybrid decision tree based on post-discretization strategy

    Institute of Scientific and Technical Information of China (English)

    WANG Limin; YUAN Senmiao

    2004-01-01

    By redefining the test selection measure, we propose in this paper a new algorithm, Flexible NBTree, which induces a hybrid of decision tree and Naive Bayes. Flexible NBTree mitigates the negative effect of information loss on test selection by applying a post-discretization strategy: at each internal node in the tree, we first select the test which is the most useful for improving classification accuracy, then apply discretization of continuous tests. The final decision tree nodes contain univariate splits as in regular decision trees, but the leaves contain Naive Bayesian classifiers. To evaluate the performance of Flexible NBTree, we compare it with NBTree and C4.5, both applying pre-discretization of continuous attributes. Experimental results on a variety of natural domains indicate that the classification accuracy of Flexible NBTree is substantially improved.

  20. Visualization of Decision Tree State for the Classification of Parkinson's Disease

    NARCIS (Netherlands)

    Valentijn, E

    2016-01-01

    Decision trees have been shown to be effective at classifying subjects with Parkinson’s disease when provided with features (subject scores) derived from FDG-PET data. Such subject scores have strong discriminative power but are not intuitive to understand. We therefore augment each decision node wi

  1. A BOOSTING APPROACH FOR INTRUSION DETECTION

    Institute of Scientific and Technical Information of China (English)

    Zan Xin; Han Jiuqiang; Zhang Junjie; Zheng Qinghua; Han Chongzhao

    2007-01-01

    Intrusion detection can be essentially regarded as a classification problem, namely, distinguishing normal profiles from intrusive behaviors. This paper introduces the boosting classification algorithm into the area of intrusion detection to learn attack signatures. A decision tree algorithm is used as the simple base learner of the boosting algorithm. Furthermore, this paper employs the Principal Component Analysis (PCA) approach, an effective data reduction approach, to extract the key attribute set from the original high-dimensional network traffic data. The KDD CUP 99 data set is used in these experiments to demonstrate that the boosting algorithm can greatly improve the classification accuracy of weak learners by combining a number of simple "weak learners". In our experiments, the training error rate of the boosting algorithm is reduced from 30.2% to 8% after 10 iterations. In addition, this paper compares the boosting algorithm with the Support Vector Machine (SVM) algorithm and shows that the classification accuracy of the boosting algorithm is slightly better than that of the SVM algorithm. However, the generalization ability of the SVM algorithm is better than that of the boosting algorithm.
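
    The abstract does not name the boosting variant or the tree depth, so the sketch below assumes AdaBoost with one-level decision trees (stumps) to show how combining weak learners reduces training error. All function names and the toy data are hypothetical; labels are +1 and -1.

```python
from math import exp, log


def train_stump(X, y, w):
    """Find the (error, feature, threshold, polarity) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] <= t else -polarity


def adaboost(X, y, rounds=10):
    """Train an ensemble of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, then renormalise.
        w = [wi * exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate: the negatives sit in the middle.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
for rounds in (1, 3):
    model = adaboost(X, y, rounds=rounds)
    errors = sum(predict(model, row) != yi for row, yi in zip(X, y))
    print(rounds, "round(s):", errors, "training errors")
```

    On this toy set a single stump misclassifies two of the six points, while three boosted stumps fit the training data exactly, which mirrors in miniature the drop in training error reported above.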

  2. Decision trees are PAC-learnable from most product distributions: a smoothed analysis

    CERN Document Server

    Kalai, Adam Tauman

    2008-01-01

    We consider the problem of PAC-learning decision trees, i.e., learning a decision tree over the n-dimensional hypercube from independent random labeled examples. Despite significant effort, no polynomial-time algorithm is known for learning polynomial-sized decision trees (even trees of any super-constant size), even when examples are assumed to be drawn from the uniform distribution on {0,1}^n. We give an algorithm that learns arbitrary polynomial-sized decision trees for most product distributions. In particular, consider a random product distribution where the bias of each bit is chosen independently and uniformly from, say, [.49,.51]. Then with high probability over the parameters of the product distribution and the random examples drawn from it, the algorithm will learn any tree. More generally, in the spirit of smoothed analysis, we consider an arbitrary product distribution whose parameters are specified only up to a [-c,c] accuracy (perturbation), for an arbitrarily small positive constant c.

  3. CLOUD DETECTION BASED ON DECISION TREE OVER TIBETAN PLATEAU WITH MODIS DATA

    Directory of Open Access Journals (Sweden)

    L. Xu

    2012-07-01

    Snow cover area is a critical parameter for the hydrologic cycle of the Earth, and it is a key factor in the effects of climate change. A major obstacle in mapping snow cover is the presence of clouds. Clouds can usually be found easily in satellite images, because they are bright and white in the visible wavelengths. This is not the case when there is snow or ice in the background, because snow and clouds have a similar spectral appearance. Many cloud detection methods are built on decision trees, which were designed based on empirical studies and simulations. In this paper, classification trees were used to build the decision tree. Then, using many repeated scenes of the same area, each cloud pixel can be replaced by its real surface type, such as snow, vegetation or water. Clouds can be distinguished in the short wave infrared. The results show that most of the cloud coverage was removed. A validation was carried out for all subsequent steps, which led to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfactorily.

  4. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    Science.gov (United States)

    Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.

    2016-01-01

    Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607

  5. Using Decision Trees to Characterize Verbal Communication During Change and Stuck Episodes in the Therapeutic Process

    Directory of Open Access Journals (Sweden)

    Víctor Hugo Masías

    2015-04-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBtree, and REPtree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1,760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  6. Applying decision tree models to SMEs: A statistics-based model for customer relationship management

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-07-01

    Customer Relationship Management (CRM) has been an important part of enterprise decision-making and management. In this regard, Decision Tree (DT) models are the most common tools for investigating CRM and providing appropriate support for the implementation of CRM systems. Yet, this method does not yield any estimate of the degree of separation of the different subgroups involved in the analysis. In this research, we compute three decision-making models in SMEs, analyzing different decision tree methods (C&RT, C4.5 and ID3). The methods are then used to calculate the Mean Errors (ME) and Variance of Errors (VoE) estimates for the models in order to investigate the predictive power of these methods. These decision tree methods were used to analyze datasets from small- and medium-sized enterprises (SMEs). The paper proposes a powerful technical support for better directing market trends and mining in CRM. According to the findings, C&RT shows a better degree of separation. As a result, we recommend using decision tree methods together with ME and VoE to determine CRM factors.

  7. Effective Network Intrusion Detection using Classifiers Decision Trees and Decision rules

    Directory of Open Access Journals (Sweden)

    G.MeeraGandhi

    2010-11-01

    In the era of the information society, computer networks and their related applications are the emerging technologies. Network Intrusion Detection aims at distinguishing the behavior of the network. As network attacks have increased in huge numbers over the past few years, the Intrusion Detection System (IDS) is increasingly becoming a critical component to secure the network. Owing to large volumes of security audit data in a network in addition to intricate and vibrant properties of intrusion behaviors, optimizing performance of IDS becomes an important open problem which receives more and more attention from the research community. In this work, the field of machine learning attempts to characterize how such changes can occur by designing, implementing, running, and analyzing algorithms that can be run on computers. The discipline draws on ideas, with the goal of understanding the computational character of learning. Learning always occurs in the context of some performance task, and a learning method should always be coupled with a performance element that uses the knowledge acquired during learning. In this research, machine learning is being investigated as a technique for making the selection, using as training data and their outcome. In this paper, we evaluate the performance of a set of classifier algorithms of rules (JRIP, Decision Table, PART, and OneR) and trees (J48, RandomForest, REPTree, NBTree). Based on the evaluation results, the best algorithm for each attack category is chosen and two classifier algorithm selection models are proposed. The empirical simulation result shows the comparison between the noticeable performance improvements. The classification models were trained using the data collected from Knowledge Discovery Databases (KDD) for Intrusion Detection. The trained models were then used for predicting the risk of the attacks in a web server environment or by any network administrator or any Security Experts. The

  8. Post-event human decision errors: operator action tree/time reliability correlation

    Energy Technology Data Exchange (ETDEWEB)

    Hall, R E; Fragola, J; Wreathall, J

    1982-11-01

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations.

  9. An Efficient Method of Vibration Diagnostics For Rotating Machinery Using a Decision Tree

    Directory of Open Access Journals (Sweden)

    Bo Suk Yang

    2000-01-01

    Full Text Available This paper describes an efficient method to automate vibration diagnosis for rotating machinery using a decision tree, which is applicable to vibration diagnosis expert systems. The decision tree is a widely known formalism for expressing classification knowledge and has been used successfully in many diverse areas such as character recognition, medical diagnosis, and expert systems. In order to build a decision tree for vibration diagnosis, we have to define classes and attributes. A set of cases based on past experiences is also needed. This training set is inducted using a result-cause matrix newly developed in the present work instead of using a conventionally implemented cause-result matrix. This method was applied to diagnostics for various cases taken from published work. It is found that the present method predicts causes of the abnormal vibration for test cases with high reliability.

  10. One-year renal graft survival prediction using a weighted decision tree classifier

    Directory of Open Access Journals (Sweden)

    Dalia Atallah

    2014-06-01

    Full Text Available This study introduces a weighted decision tree algorithm for prediction of graft survival in renal transplantation using preoperative patient data. The objective was to identify the preoperative attributes that affect graft survival. Between 2000 and 2009, renal allotransplantation was carried out for 889 patients at the Urology and Nephrology Center, which is the subject of this study. The ID3 algorithm was chosen to build the decision tree using the Weka machine learning software. A modification was made to ID3 to refine the results: a weight vector was introduced. Each element of this vector represents the weight of one attribute, obtained by trial and error. The results indicated that the weighted algorithm was successful in predicting graft survival after one year and in identifying the attributes affecting graft survival. Keywords: Decision Tree, Data Mining, ID3 Algorithm, Graft Survival, Kidney Transplantation.
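
    The abstract does not say how the attribute weights enter ID3. As a minimal sketch only, the Python below assumes each attribute's information gain is multiplied by its weight before the split attribute is chosen. The attribute names, rows, labels, and weights are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Standard ID3 information gain of splitting on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, weights):
    """Pick the split attribute by weight * gain (the weighting rule is an assumption)."""
    return max(weights, key=lambda a: weights[a] * information_gain(rows, labels, a))


# Hypothetical toy data: two categorical attributes and a binary outcome.
rows = [
    {"attr_a": "x", "attr_b": "p"},
    {"attr_a": "x", "attr_b": "q"},
    {"attr_a": "y", "attr_b": "p"},
    {"attr_a": "y", "attr_b": "p"},
]
labels = ["survived", "survived", "failed", "failed"]

unweighted_choice = best_attribute(rows, labels, {"attr_a": 1.0, "attr_b": 1.0})
weighted_choice = best_attribute(rows, labels, {"attr_a": 0.2, "attr_b": 1.0})
print(unweighted_choice, weighted_choice)
```

    With equal weights the split follows plain ID3. Lowering the weight of attr_a lets attr_b win the split, which is the kind of effect a trial-and-error search over weights would explore.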

  11. Post-event human decision errors: operator action tree/time reliability correlation

    International Nuclear Information System (INIS)

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations

  12. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, we should first classify mobile users. Addressing the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm: we can classify mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also obtain some rules about mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and more simplicity. PMID:24688389

  13. Diagnosis of Constant Faults in Read-Once Contact Networks over Finite Bases using Decision Trees

    KAUST Repository

    Busbait, Monther I.

    2014-05-01

    We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. Also, we obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant faults depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients for the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we create, to obtain the sharp coefficients.

  14. Decision tree for the binding of dipeptides to the thermally fluctuating surface of cathepsin K

    Science.gov (United States)

    Nishiyama, Katsuhiko

    2016-03-01

    The behavior of 15 dipeptides on thermally fluctuating cathepsin K was investigated by molecular dynamics and docking simulations. Four dipeptides were distributed on sites near the active center, and the variations were small. Eleven dipeptides were distributed on sites far from the active center, and the variations were large for nine dipeptides and very large for the other two. The decision tree was constructed using genetic programming, and it accurately classified the 15 dipeptides. The decision tree would accurately estimate the behavior of various peptides, and should significantly contribute to the design of useful peptides.

  15. A similarity study between the query mass and retrieved masses using decision tree content-based image retrieval (DTCBIR) CADx system for characterization of ultrasound breast mass images

    Science.gov (United States)

    Cho, Hyun-Chong; Hadjiiski, Lubomir; Chan, Heang-Ping; Sahiner, Berkman; Helvie, Mark; Paramagul, Chintana; Nees, Alexis V.

    2012-03-01

    We are developing a Decision Tree Content-Based Image Retrieval (DTCBIR) CADx scheme to assist radiologists in characterization of breast masses on ultrasound (US) images. Three DTCBIR configurations, including decision tree with boosting (DTb), decision tree with full leaf features (DTL), and decision tree with selected leaf features (DTLs) were compared. For DTb, the features of a query mass were combined first into a merged feature score and then masses with similar scores were retrieved. For DTL and DTLs, similar masses were retrieved based on the Euclidean distance between the feature vector of the query and those of the selected references. For each DTCBIR configuration, we investigated the use of the full feature set and the subset of features selected by the stepwise linear discriminant analysis (LDA) and simplex optimization method, resulting in six retrieval methods. Among the six methods, we selected five, DTb-lda, DTL-lda, DTb-full, DTL-full and DTLs-full, for the observer study. For a query mass, three most similar masses were retrieved with each method and were presented to the radiologists in random order. Three MQSA radiologists rated the similarity between the query mass and the computer-retrieved masses using a nine-point similarity scale (1=very dissimilar, 9=very similar). For DTb-lda, DTL-lda, DTb-full, DTL-full and DTLs-full, the average Az values were 0.90+/-0.03, 0.85+/-0.04, 0.87+/-0.04, 0.79+/-0.05 and 0.71+/-0.06, respectively, and the average similarity ratings were 5.00, 5.41, 4.96, 5.33 and 5.13, respectively. Although the DTb measures had the best classification performance among the DTCBIRs studied, and DTLs had the worst performance, DTLs-full obtained higher similarity ratings than the DTb measures.

  16. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  17. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    OpenAIRE

    Oral, L. O.; V. Tecim

    2013-01-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from h...

  18. Assisting Sustainable Forest Management and Forest Policy Planning with the Sim4Tree Decision Support System

    OpenAIRE

    Floris Dalemans; Paul Jacxsens; Jos Van Orshoven; Vincent Kint; Pieter Moonen; Bart Muys

    2015-01-01

    As European forest policy increasingly focuses on multiple ecosystem services and participatory decision making, forest managers and policy planners have a need for integrated, user-friendly, broad spectrum decision support systems (DSS) that address risks and uncertainties, such as climate change, in a robust way and that provide credible advice in a transparent manner, enabling effective stakeholder involvement. The Sim4Tree DSS has been accordingly developed as a user-oriented, modular and...

  19. EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE

    Directory of Open Access Journals (Sweden)

    S. Anupama Kumar

    2011-07-01

    Full Text Available Educational data mining is used to study the data available in the educational field and bring out the hidden knowledge from it. Classification methods like decision trees, rule mining, Bayesian networks, etc. can be applied to educational data for predicting students' behavior, performance in examinations, etc. This prediction will help tutors to identify weak students and help them to score better marks. The C4.5 decision tree algorithm is applied to students' internal assessment data to predict their performance in the final exam. The outcome of the decision tree predicted the number of students who are likely to fail or pass. The result was given to the tutor and steps were taken to improve the performance of the students who were predicted to fail. After the declaration of the results in the final examination, the marks obtained by the students were fed into the system and the results were analyzed. The comparative analysis of the results states that the prediction has helped the weaker students to improve and has brought about an improvement in the result. To analyse the accuracy of the algorithm, it is compared with the ID3 algorithm and found to be more efficient in terms of accurately predicting the outcome of the student and the time taken to derive the tree.

  20. Relationships Between Average Depth and Number of Nodes for Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-07-24

    This paper presents a new tool for the study of relationships between total path length or average depth and number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from UCI ML Repository [1]. © Springer-Verlag Berlin Heidelberg 2014.
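
    To make the quantities named in this abstract concrete, the Python sketch below computes total path length, average depth, and number of nodes for a small hand-built tree. It assumes that the path length of a row is the number of attribute tests on the path to its leaf and that average depth is the unweighted mean over rows. The tree encoding and the four-row table are hypothetical; the paper's own definitions may differ.

```python
# A decision tree is either a leaf (a decision label, here a str) or an
# internal node: (attribute_index, {attribute_value: subtree}).

def path_length(tree, row):
    """Number of attribute tests made before `row` reaches a leaf."""
    depth = 0
    while not isinstance(tree, str):
        attribute, children = tree
        tree = children[row[attribute]]
        depth += 1
    return depth


def count_nodes(tree):
    """Total number of nodes, counting both internal nodes and leaves."""
    if isinstance(tree, str):
        return 1
    _, children = tree
    return 1 + sum(count_nodes(child) for child in children.values())


# Hypothetical decision table with two binary attributes.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Test attribute 0 first; only rows with value 1 need a second test.
tree = (0, {0: "a", 1: (1, {0: "b", 1: "c"})})

total_path_length = sum(path_length(tree, row) for row in rows)
average_depth = total_path_length / len(rows)
number_of_nodes = count_nodes(tree)
print(total_path_length, average_depth, number_of_nodes)
```

    Trading a shallower average depth against a larger node count (or the reverse) is the kind of relationship such a tool would chart across all trees for a given table.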

  1. A Data Mining Algorithm Based on Distributed Decision-Tree in Grid Computing Environments

    Institute of Scientific and Technical Information of China (English)

    Zhongda Lin; Yanfeng Hong; Kun Deng

    2006-01-01

    Research on distributed data mining that makes use of the grid has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on the grid.

  2. Relationships between average depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2014-02-14

    This paper presents a new tool for the study of relationships between the total path length or the average depth and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from UCI ML Repository [9] and datasets representing Boolean functions with 10 variables.

  3. Comparison of Attribute Reduction Methods for Coronary Heart Disease Data by Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    ZHENG Gang; HUANG Yalou; WANG Pengtao; SHU Guangfu

    2005-01-01

    Attribute reduction is necessary in decision-making systems, and selecting the right attribute reduction method is even more important. This paper studies the reduction effects of principal components analysis (PCA) and system reconstruction analysis (SRA) on coronary heart disease data. The data set contains 1723 records, with 71 attributes in each record. PCA and SRA are used to reduce the number of attributes (to fewer than 71) in the data set. Then the decision tree algorithms C4.5, classification and regression tree (CART), and chi-square automatic interaction detector (CHAID) are adopted to analyze the raw data and the attribute-reduced data. The parameters of the decision tree algorithms, including internal node number, maximum tree depth, leaves number, and correction rate, are analyzed. The results indicate that PCA and SRA can complete the attribute reduction work, and the decision-making rate on the reduced data is quicker than that on the raw data; the reduction effect of PCA is better than that of SRA, while the attribute assertion of SRA is better than that of PCA. PCA and SRA methods exhibit good performance in selecting and reducing attributes.

  4. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    Science.gov (United States)

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  5. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an ‘intelligent’ point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using... of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes ‘building’ (99%, 95% CI: 95%-100%) and ‘road and parking lot’ (90%, 95% CI: 83%-95%). Some...

  6. Ultrasonographic diagnosis of biliary atresia based on a decision-making tree model

    Energy Technology Data Exchange (ETDEWEB)

    Lee, So Mi; Cheon, Jung Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun Hye; Kim, In One; You, Sun Kyoung [Dept. of Radiology, Seoul National University College of Medicine, Seoul (Korea, Republic of)

    2015-12-15

    To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
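
    The reported sensitivity, specificity, and accuracy follow directly from the counts given in the abstract. The short Python check below reproduces them; the confusion-matrix cells are inferred from the fractions 46/46, 51/54 and 97/100, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts inferred from the abstract: 46 BA infants, 54 non-BA infants.
true_positive = 46   # BA correctly classified (46/46)
false_negative = 0   # BA missed
true_negative = 51   # non-BA correctly classified (51/54)
false_positive = 3   # non-BA classified as BA

sensitivity, specificity, accuracy = diagnostic_metrics(
    true_positive, false_negative, true_negative, false_positive
)
print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} accuracy={accuracy:.1%}")
```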

  7. Predicting metabolic syndrome using decision tree and support vector machine methods

    Science.gov (United States)

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-01-01

    BACKGROUND Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict the metabolic syndrome. METHODS This study aims to employ decision tree and support vector machine (SVM) to predict the 7-year incidence of metabolic syndrome. This research is a practical one in which data from 2107 participants of the Isfahan Cohort Study have been utilized. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features that have been used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria and two methods of decision tree and SVM were selected to predict the metabolic syndrome. The criteria of sensitivity, specificity and accuracy were used for validation. RESULTS SVM and decision tree methods were examined according to the criteria of sensitivity, specificity and accuracy. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) in SVM (decision tree) method. CONCLUSION The results show that the SVM method is more efficient than the decision tree in terms of sensitivity, specificity and accuracy. The results of decision tree method show that the TG is the most important feature in

  8. The Legacy of Past Tree Planting Decisions for a City Confronting Emerald Ash Borer (Agrilus planipennis) Invasion

    OpenAIRE

    Greene, Christopher S.; Millward, Andrew A.

    2016-01-01

    Management decisions grounded in ecological understanding are essential to the maintenance of a healthy urban forest. Decisions about where and what tree species to plant have both short and long-term consequences for the future function and resilience of city trees. Through the construction of a theoretical damage index, this study examines the legacy effects of a street tree planting program in a densely populated North American city confronting an invasion of emerald ash borer (Agrilus pla...

  9. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    Science.gov (United States)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy by the help of improving technology, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample with varying dimensions ranging between nanometer to centimeter. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument etc.) and the incident electron beam (accelerating voltage, beam current, spot size etc.) control the produced characteristic X-ray, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions and thereby it produces classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated and a preliminary 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, have been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  10. COMBINING DECISION TREES AND K-NN FOR CASE-BASED PLANNING

    Directory of Open Access Journals (Sweden)

    Sofia Benbelkacem

    2014-11-01

    Full Text Available In everyday life, we are often faced with similar problems which we resolve with our experience. Case-based reasoning is a paradigm of problem solving based on past experience. Thus, case-based reasoning is considered a valuable technique for the implementation of various tasks that involve solving planning problems. Planning is considered a decision support process designed to provide the resources and required services to achieve specific objectives, allowing the selection of a better solution among several alternatives. We propose to exploit a combination of decision trees and k-NN to choose the most appropriate solutions. In a previous work [1], we proposed a new planning approach guided by case-based reasoning and decision tree, called DTR, for case retrieval. In this paper, we use a classifier combination for similarity calculation in order to select the best solution to the target case. Thus, the use of the decision trees and k-NN combination improves the relevance of results and helps find the most relevant cases.

  11. Analisis Dan Perancangan Sistem Pendukung Keputusan Untuk Menghindari Kredit Macet (Non Performing Loan) Perbankan Menggunakan Algoritma Decision Tree

    OpenAIRE

    Sinuhaji, Andika Rafon

    2010-01-01

    A decision-making model, called a decision support system, is needed to help people make accurate, efficient, and effective decisions. The aim of a decision support system is to utilize the advantages of humans and electronic instruments for solving various unstructured problems. The objective of this study is to avoid non-performing loans in the process of granting credit facilities. Decisions in this study are made using the decision tree method. The solution method consist of...

  12. Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?

    Directory of Open Access Journals (Sweden)

    Robert Haworth

    2010-01-01

    Full Text Available The popularity and usage of digital games have increased in recent years, bringing further attention to their design. Some digital games require a significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.

  13. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    CERN Document Server

    Rajendran, P

    2010-01-01

    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are: pre-processing, feature extraction, association rule mining and hybrid classifier. The pre-processing step has been done using the median filtering process and edge features have been extracted using the Canny edge detection technique. Two image mining approaches combined in a hybrid manner are proposed in this paper. The frequent patterns from the CT scan images are generated by the frequent pattern tree (FP-Tree) algorithm that mines the association rules. The decision tree method has been used to classify the medical images for diagnosis. This system enhances the classification process to be more accurate. The hybrid method improves the efficiency of the proposed method compared with the traditional image mining methods. The experimental result on a prediagnosed database of brain images showed 97% sensitivity and 95% accuracy respectively. The ph...

  14. Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In Thai speech synthesis using a Hidden Markov model (HMM) based synthesis system, the tonal speech quality is degraded due to tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit. Since tone brings about the intelligibility of the synthesized speech, it is necessary to establish the tone questions and other phonetic questions in the tree-based context clustering process accordingly. Approach: This study describes the analysis of questions in the tree-based context clustering process of an HMM-based speech synthesis system for the Thai language. In the system, spectrum, pitch or F0 and state duration are modeled simultaneously in a unified framework of HMM, and their parameter distributions are clustered independently by using a decision-tree based context clustering technique. The contextual factors which affect spectrum, pitch and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type and tone type, are taken into account for constructing the questions of the decision tree. All in all, thirteen sets of questions are analyzed in comparison. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from those thirteen sets and by calculating the dominance score given to each question as the reciprocal of the distance from the root node to the question node. The highest number and dominance score are those of the set of phonetic type, while the second and third highest are those of the sets of part of speech and tone type. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. All in all, the analysis results bring about further development of Thai speech synthesis with efficient context clustering process in
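
    The counting and dominance-score analysis described in this abstract can be sketched in Python as below. The tree, its question-set labels, and the convention that the root sits at distance 1 (so that its reciprocal is defined) are assumptions made for illustration; the abstract does not state how the root is scored.

```python
from collections import defaultdict

# Hypothetical clustering tree: each internal node is
# (question_set_name, [child, child]); a leaf is None.
tree = (
    "phone type",
    [
        ("tone type", [None, ("part of speech", [None, None])]),
        ("phone type", [None, None]),
    ],
)


def score_question_sets(root):
    """Count question nodes per set and sum 1/distance as a dominance score."""
    counts = defaultdict(int)
    scores = defaultdict(float)
    stack = [(root, 1)]  # assumption: the root is at distance 1
    while stack:
        node, distance = stack.pop()
        if node is None:  # leaves carry no question
            continue
        question_set, children = node
        counts[question_set] += 1
        scores[question_set] += 1.0 / distance
        stack.extend((child, distance + 1) for child in children)
    return dict(counts), dict(scores)


counts, scores = score_question_sets(tree)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: count={counts[name]}, dominance={scores[name]:.3f}")
```

    Questions asked near the root contribute more to the score than questions asked deep in the tree, so a question set can rank highly either by appearing often or by appearing early.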

  15. A Noise Addition Scheme in Decision Tree for Privacy Preserving Data Mining

    CERN Document Server

    Kadampur, Mohammad Ali

    2010-01-01

    Data mining deals with automatic extraction of previously unknown patterns from large amounts of data. Organizations all over the world handle large amounts of data and depend on mining gigantic data sets to expand their enterprises. These data sets typically contain sensitive individual information, which consequently gets exposed to other parties. Though we cannot deny the benefits of knowledge discovery that come through data mining, we should also ensure that data privacy is maintained during data mining. Privacy preserving data mining is a specialized activity in which data privacy is ensured during data mining. Data privacy is as important as the extracted knowledge, and efforts that guarantee data privacy during data mining are encouraged. In this paper we propose a strategy that protects data privacy during the decision tree analysis step of the data mining process. We propose to add specific noise to the numeric attributes after exploring the decision tree of the original data. T...

  16. Decision tree approach for classification of remotely sensed satellite data using open source support

    Indian Academy of Sciences (India)

    Richa Sharma; Aniruddha Ghosh; P K Joshi

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, an open source data mining software package. The classified image is compared with images classified using the classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. The classification result of the DTC method provided a better visual depiction than the results produced by ISODATA clustering or MLC. The overall accuracy was 90% (kappa = 0.88) using DTC, 76.67% (kappa = 0.72) using MLC and 57.5% (kappa = 0.49) using ISODATA clustering. Based on the overall accuracy and kappa statistics, DTC was found to be the preferred classification approach.
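
    For readers unfamiliar with the two figures reported for each classifier, the following minimal sketch computes overall accuracy and Cohen's kappa from a confusion matrix. The matrix is a hypothetical two-class example and is not the study's data; the function name is likewise an assumption made for illustration.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of reference class i assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical confusion matrix (not taken from the paper).
matrix = [
    [45, 5],
    [5, 45],
]
accuracy, kappa = accuracy_and_kappa(matrix)
```

    Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy in every result quoted above.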

  17. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  18. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  19. Intrusion Preventing System using Intrusion Detection System Decision Tree Data Mining

    Directory of Open Access Journals (Sweden)

    Syurahbil

    2009-01-01

    Full Text Available Problem statement: Distinguishing intrusive from normal network traffic is difficult and time-consuming. An analyst must review large and wide-ranging data to find the sequence of an intrusion on a network connection. A method is therefore needed that can detect network intrusions in a way that reflects current network traffic. Approach: In this study, a novel method was proposed for finding intrusion characteristics for an intrusion detection system (IDS) using decision tree machine learning, a data mining technique. Rules are generated by classification with the ID3 decision tree algorithm. Results: These rules can determine intrusion characteristics, which are then implemented in the firewall policy rules as prevention. Conclusion: The combination of an IDS and a firewall, the so-called intrusion prevention system (IPS), not only detects the existence of an intrusion but can also deny it as prevention.
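
    ID3 chooses each split by maximizing information gain, i.e. the reduction in label entropy after partitioning on an attribute. The sketch below shows only that selection step on a toy data set. The attribute names (`protocol`, `flag`), their values and the labels are hypothetical and are not taken from the study, which does not list its features in the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """ID3 split criterion: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical connection records and labels.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
]
labels = ["intrusion", "normal", "normal", "intrusion"]
root_split = best_attribute(rows, labels, ["protocol", "flag"])
```

    In this toy case the `flag` attribute separates the classes perfectly, so a rule such as "flag = S0 implies intrusion" could be read off the tree. Turning such rules into firewall policy entries, as the abstract describes, is outside the scope of this sketch.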

  20. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    Science.gov (United States)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models to provide sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover the factors affecting mode choice, and they can be applied with a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice using decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.

  1. Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems

    International Nuclear Information System (INIS)

    In this thesis, I studied sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge because of its high dimension and the sacrifices in model accuracy that are needed to apply state-of-the-art methods. I investigated the applicability of Monte Carlo Tree Search methods to this problem and to other single-player, stochastic and continuous sequential decision making problems. In doing so, I obtained a consistent, anytime algorithm that can easily be combined with existing strong heuristic solvers. (author)

  2. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    Directory of Open Access Journals (Sweden)

    L. O. Oral

    2013-05-01

    Full Text Available Decision makers develop transportation plans and models to provide sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover the factors affecting mode choice, and they can be applied with a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice using decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.

  3. Hyper-Graph Based Documents Categorization on Knowledge from Decision Trees

    Directory of Open Access Journals (Sweden)

    Merjulah Roby

    2012-03-01

    Full Text Available This paper devises a novel representation that compactly captures hyper-graph partitioning and clustering of documents based on their weightages. The approach integrates data mining and decision making to improve effectiveness, and we also present NeC4.5 decision trees. The algorithm creates clusters and sub-clusters according to the user query, forming sub-clusters in the database. Some of the data in the database may be efficient, so we cluster the data according to this ability.

  4. Re-mining association mining results through visualization, data envelopment analysis, and decision trees

    OpenAIRE

    Ertek, Gürdal; Ertek, Gurdal; Tunç, Murat Mustafa; Tunc, Murat Mustafa

    2012-01-01

    Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding ...

  5. A Decision Tree Approach to Classify Web Services using Quality Parameters

    OpenAIRE

    Sonawani, Shilpa; Mukhopadhyay, Debajyoti

    2013-01-01

    With the increase in the number of web services, many services that provide the same functionality are available on the internet, making it difficult to choose the one that best fulfils all of a user's requirements. This problem can be solved by considering the quality of web services to distinguish functionally similar ones. Nine different quality parameters are considered. Web services can be classified and ranked using the decision tree approach since they do not require long training period and...

  6. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    OpenAIRE

    Yang Youjin; Gu Bokyung; Yoon Taeseon

    2016-01-01

    Zika virus is spread by mosquitoes and carries a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. The Apriori algorithm and a decision tree were therefore used to compare the polyprotein sequences of Zika virus with those of other flaviviruses: yellow fever virus, West Nile virus, dengue virus and tick-borne encephalitis virus. In this way, their dissimilarities and similarities were found.

  7. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    Directory of Open Access Journals (Sweden)

    Yang Youjin

    2016-01-01

    Full Text Available Zika virus is spread by mosquitoes and carries a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. The Apriori algorithm and a decision tree were therefore used to compare the polyprotein sequences of Zika virus with those of other flaviviruses: yellow fever virus, West Nile virus, dengue virus and tick-borne encephalitis virus. In this way, their dissimilarities and similarities were found.

  8. An Examination of Mathematically Gifted Students' Learning Styles by Decision Trees

    OpenAIRE

    Esra Aksoy; Serkan Narlı

    2015-01-01

    The aim of this study was to examine mathematically gifted students' learning styles through a data mining method. The ‘Learning Style Inventory’ and the ‘Multiple Intelligences Scale’ were used to collect data. The sample included 234 mathematically gifted middle school students. The constructed decision tree was examined for predicting mathematically gifted students’ learning styles according to their multiple intelligences, gender and grade level. Results showed that all t...

  9. Independent Component Analysis and Decision Trees for ECG Holter Recording De-Noising

    OpenAIRE

    Jakub Kuzilek; Vaclav Kremen; Filip Soucek; Lenka Lhotska

    2014-01-01

    We have developed a method focusing on ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. In order to test the efficiency of this method in comparison to standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between origin...

  10. A Decision Tree of Bigrams is an Accurate Predictor of Word Sense

    OpenAIRE

    Pedersen, Ted

    2001-01-01

    This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.

  11. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    OpenAIRE

    Tran Hoai Linh; Pham Van Nam; Vuong Hoang Nam

    2014-01-01

    The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, the three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to...

  12. Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees.

    Science.gov (United States)

    Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica

    2012-05-30

    The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablet types (hydrophilic/lipid), using formulation composition, the compression force used for tableting, and tablet porosity and tensile strength as input data. The potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively, whereas the selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by the direct compression method and tested for in vitro dissolution profiles. Optimization of the static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or a genetic algorithm optimizer. Decision trees were constructed following discretization of data. The difference (f1) and similarity (f2) factors calculated for predicted and experimentally obtained dissolution profiles of test matrix tablet formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablet dissolution profiles. Elman neural networks were compared to the most frequently used static network, the multi-layered perceptron, and the superiority of Elman networks has been demonstrated. The developed methods allow a simple, yet very precise, way of predicting drug release for both hydrophilic and lipid matrix tablets having controlled drug release.
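
    The difference and similarity factors mentioned in this abstract are commonly defined as f1 = 100 · Σ|R_t − T_t| / Σ R_t and f2 = 50 · log10(100 / sqrt(1 + (1/n) Σ (R_t − T_t)²)), where R_t and T_t are the percentages dissolved at time point t for the reference and test profiles. The sketch below assumes these standard definitions, with the experimental profile as reference and the predicted profile as test; the abstract does not spell out the exact variant used, and the profile values are hypothetical.

```python
import math

def difference_factor(reference, test):
    """f1: cumulative absolute difference as a percentage of the reference total."""
    return 100.0 * sum(abs(r - t) for r, t in zip(reference, test)) / sum(reference)

def similarity_factor(reference, test):
    """f2: logarithmic transform of the mean squared difference (100 = identical)."""
    n = len(reference)
    mean_squared_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_squared_diff))

# Hypothetical percentages dissolved at four time points.
experimental = [20.0, 40.0, 60.0, 80.0]
predicted = [22.0, 38.0, 63.0, 79.0]

f1 = difference_factor(experimental, predicted)
f2 = similarity_factor(experimental, predicted)
```

    A small f1 and an f2 close to 100 indicate that the predicted profile tracks the measured one closely.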

  13. Flood-type classification in mountainous catchments using crisp and fuzzy decision trees

    Science.gov (United States)

    Sikorska, Anna E.; Viviroli, Daniel; Seibert, Jan

    2015-10-01

    Floods are governed by largely varying processes and thus exhibit various behaviors. Classification of flood events into flood types and the determination of their respective frequency is therefore important for a better understanding and prediction of floods. This study presents a flood classification for identifying flood patterns at a catchment scale by means of a fuzzy decision tree. Hence, events are represented as a spectrum of six main possible flood types that are attributed with their degree of acceptance. Considered types are flash, short rainfall, long rainfall, snow-melt, rainfall on snow and, in high alpine catchments, glacier-melt floods. The fuzzy decision tree also makes it possible to acknowledge the uncertainty present in the identification of flood processes and thus allows for more reliable flood class estimates than using a crisp decision tree, which identifies one flood type per event. Based on the data set in nine Swiss mountainous catchments, it was demonstrated that this approach is less sensitive to uncertainties in the classification attributes than the classical crisp approach. These results show that the fuzzy approach bears additional potential for analyses of flood patterns at a catchment scale and thereby it provides more realistic representation of flood processes.

  14. Imitation learning of car driving skills with decision trees and random forests

    Directory of Open Access Journals (Sweden)

    Cichosz Paweł

    2014-09-01

    Full Text Available Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.

  15. Teratozoospermia Classification Based on the Shape of Sperm Head Using OTSU Threshold and Decision Tree

    Directory of Open Access Journals (Sweden)

    Masdiyasa I Gede Susrama

    2016-01-01

    Full Text Available Teratozoospermia is one of the results of expert analysis of male infertility, obtained by conducting microscopic lab tests to determine the morphology of spermatozoa, including the normal and abnormal forms of the sperm head. The laboratory test results take the form of a complete image of spermatozoa. In this study, the shapes of sperm heads were taken from a WHO standards book. The pictures had fairly clear imaging but still contained noise, so several processes are needed to differentiate between normal and abnormal sperm heads: a pre-processing or image adjustment step, a segmentation step using the Otsu threshold method, and a classification step using a decision tree. Training and test data are presented in stages, from 5 to 20 data. Tests using Otsu segmentation and a decision tree produced different errors at each level of training data: 70%, 75%, and 80% for training data of size 5×2, 10×2, and 20×2, respectively, with an average error of 75%. Thus, this study using Otsu threshold segmentation and a decision tree can classify the form of the sperm head as abnormal or normal.
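
    Otsu's method picks the grey-level threshold that maximizes the between-class variance of the resulting foreground and background pixel groups. The following is a minimal pure-Python sketch of that idea on a flat list of 8-bit grey values; it is an illustration under stated assumptions and not the authors' implementation. The function name and the toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        if weight_high == 0:
            break  # all pixels are in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t

# Hypothetical image: dark background (10) and bright object (200).
image = [10] * 50 + [200] * 50
threshold = otsu_threshold(image)
mask = [p > threshold for p in image]  # True marks the bright object
```

    Shape features computed from a binary mask like this one could then be passed to a decision tree classifier, which is the pipeline the abstract outlines.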

  16. A decision tree – based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Science.gov (United States)

    Pavlopoulos, Sotiris A; Stasis, Antonis CH; Loukis, Euripides N

    2004-01-01

    Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next a total of 100 features were determined for every heart sound signal and relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and Pruned decision tree classifiers were studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. 
    Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that describe S1, S2 and the systolic

  17. A decision tree – based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Directory of Open Access Journals (Sweden)

    Loukis Euripides N

    2004-06-01

    Full Text Available Abstract Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next a total of 100 features were determined for every heart sound signal and relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and pruned decision tree classifiers were studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that

  18. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variables of emission and 15 variables of transmission information from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and radiological residents. Diagnostic rules were demonstrated in tree topology, and diagnostic performances were compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Results A classification decision tree with the lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the value of the Target/Normal region of 99mTc-MIBI uptake in the delayed stage and in the early stage, age, cough and specula sign were the five most important contributors. The sensitivity and specificity were 93.33% and 78.57%, respectively, a little higher than those of the expert. The sensitivity and specificity of Grade one residents were 76.67% and 28.57%, respectively. The AUC of CART and expert was 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204), but greater than that of residents (P<0.001). Conclusion Our data mining technique using a classification decision tree has a much higher accuracy than residents. It suggests that the application of this algorithm will significantly improve the diagnostic performance of residents.
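
    The sensitivity and specificity quoted for the classification tree can be checked against the stated sample of 30 malignant and 14 benign nodules. The counts used below (28 of 30 malignant and 11 of 14 benign nodules classified correctly) are not given in the abstract; they are back-calculated here as the only whole-number counts that reproduce the reported percentages, and should be read as an assumption.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    sensitivity = true_pos / (true_pos + false_neg)   # malignant cases detected
    specificity = true_neg / (true_neg + false_pos)   # benign cases cleared
    return sensitivity, specificity

# Assumed counts, back-calculated from the reported percentages.
tree_sens, tree_spec = sensitivity_specificity(true_pos=28, false_neg=2,
                                               true_neg=11, false_pos=3)
```

    Under the same reasoning, the residents' figures of 76.67% and 28.57% would correspond to 23 of 30 and 4 of 14 correct classifications.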

  19. Integrating individual trip planning in energy efficiency – Building decision tree models for Danish fisheries

    DEFF Research Database (Denmark)

    Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard;

    2013-01-01

    integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy...... hypothetical conditions influencing their trip decisions, covering the duration of fishing time, choice of fishing ground(s), when to stop fishing and return to port, and the choice of the port for landing. Fleet-based energy and economy efficiency are linked to the decision (choice) dynamics. Larger fuel...... efficiency for the value of catch per unit of fuel consumed is analysed by merging the questionnaire, logbook and VMS (vessel monitoring system) information. Logic decision trees and conditional behaviour probabilities are established from the responses of fishermen regarding a range of sequential...

  20. Effective use of Fibro Test to generate decision trees in hepatitis C

    Institute of Scientific and Technical Information of China (English)

    Dana Lau-Corona; Luís Alberto Pineda; Héctor Hugo Aviés; Gabriela Gutiérrez-Reyes; Blanca Eugenia Farfan-Labonne; Rafael Núnez-Nateras; Alan Bonder; Rosalinda Martínez-García; Clara Corona-Lau; Marco Antonio Olivera-Martíanez; Maria Concepción Gutiérrez-Ruiz; Guillermo Robles-Díaz; David Kershenobich

    2009-01-01

    AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest's cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4) and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees and was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.
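
    In 10-fold cross validation the samples are divided into ten folds, and each fold serves once as the test set while a model is trained on the other nine. The sketch below only generates the index splits for 261 samples. It uses a simple round-robin assignment without shuffling or stratification, which is an assumption made for brevity; the abstract does not say how the folds were formed, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are assigned to folds round-robin; no shuffling or stratification.
    """
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 261 patients, as in the abstract; each appears in exactly one test fold.
splits = list(k_fold_indices(261, k=10))
fold_sizes = [len(test) for _, test in splits]
```

    With 261 samples one fold holds 27 patients and the other nine hold 26, so every patient is tested exactly once.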

  1. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    Science.gov (United States)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we used a data mining approach to assess groundwater pollution vulnerability at a Korean industrial site contaminated with TCE (trichloroethylene). The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the data mining methods tested, namely Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case-Based Reasoning (CBR), and Decision Tree (DT), the DT gave the best accuracy and consistency. According to subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  2. Performance Evaluation of Discriminant Analysis and Decision Tree, for Weed Classification of Potato Fields

    Directory of Open Access Journals (Sweden)

    Farshad Vesali

    2012-09-01

    Full Text Available In the present study we tried to recognize weeds in potato fields so that herbicides can be used effectively. Potato is cultivated widely all over the world and is a major food crop consumed by over one billion people worldwide, but it is threatened by weed invasion because of the row cropping system applied in potato tillage. Machine vision is used in this research for effective application of herbicides in the field. About 300 color images were acquired from 3 potato farms in Qorveh city and 2 farms of Urmia University, Iran. Images were acquired under different illumination conditions, from morning to evening, on sunny and cloudy days. Because of overlap and shading of plants in farm conditions, it is hard to use morphological parameters. In the method used for classifying weeds and potato plants, the primary color components of each plant were extracted and the relation between them was estimated in order to determine a discriminant function and classify plants using discriminant analysis. In addition, the decision tree method was used so that its results could be compared with those of discriminant analysis. Three different classifications were applied. First, classification was applied to discriminate potato plants from all other weeds (two groups); the rate of correct classification was 76.67% for discriminant analysis and 83.82% for the decision tree. Second, classification was applied to discriminate potato plants from separate groups of each weed (6 groups); the rate of correct classification was 87%. Third, potato plants were classified against weed species one by one. As the weeds were different, the classification results differed in this configuration. The decision tree gave better results than discriminant analysis in all conditions.

  3. Decision Tree Complexity of Graph Properties with Dimension at Most 5

    Institute of Scientific and Technical Information of China (English)

    高随祥; 林国辉

    2000-01-01

    A graph property is a set of graphs such that if the set contains some graph G then it also contains each isomorphic copy of G (with the same vertex set). A graph property P on n vertices is said to be elusive, if every decision tree algorithm recognizing P must examine all n(n - 1)/2 pairs of vertices in the worst case. Karp conjectured that every nontrivial monotone graph property is elusive. In this paper, this conjecture is proved for some cases. Especially, it is shown that if the abstract simplicial complex of a nontrivial monotone graph property P has dimension not exceeding 5, then P is elusive.
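
    The definition of elusiveness above can be checked by brute force on very small vertex sets. The following is a minimal sketch, not the paper's proof technique: it computes the worst-case number of vertex-pair queries an optimal decision tree needs, using connectivity as the example property (a classical nontrivial monotone property). The function names `is_connected` and `decision_tree_depth` are illustrative assumptions, and the search is only feasible for n up to about 4.

```python
from functools import lru_cache
from itertools import combinations


def is_connected(n, edge_list):
    """Return True if the graph on vertices 0..n-1 with these edges is connected."""
    adjacency = {v: set() for v in range(n)}
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def decision_tree_depth(n, prop):
    """Worst-case number of pair queries an optimal decision tree needs for `prop`."""
    pairs = list(combinations(range(n), 2))

    @lru_cache(maxsize=None)
    def depth(known):
        # known[i] is None (pair not yet queried), 0 (no edge) or 1 (edge).
        unknown = [i for i, v in enumerate(known) if v is None]
        outcomes = set()
        for bits in range(1 << len(unknown)):
            full = list(known)
            for j, i in enumerate(unknown):
                full[i] = (bits >> j) & 1
            outcomes.add(prop(n, [p for p, v in zip(pairs, full) if v]))
        if len(outcomes) == 1:
            return 0  # the answer is already determined, no further query needed
        # Choose the best next query; the adversary picks the worse answer.
        return min(
            1 + max(depth(known[:i] + (answer,) + known[i + 1:]) for answer in (0, 1))
            for i in unknown
        )

    return depth((None,) * len(pairs))


for n in (3, 4):
    print(n, decision_tree_depth(n, is_connected), n * (n - 1) // 2)
```

    For a property that is elusive on n vertices, the computed depth equals n(n - 1)/2, so the last two printed numbers agree.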

  4. A comprehensive decision approach for rubber tree planting management in Africa

    OpenAIRE

    Valognes, Fabrice; Ferrer, Hélène; Diaby, Moussa; Clément-Demange, André

    2011-01-01

    The main objective of this study is to establish a rigorous decision analysis framework for rubber tree clone selection. At present, there is no process based on a rigorous method for selecting the best clone to plant in order to get the highest return on investment. The only known selection method is to rely on the experience of the different protagonists acting in the plantation. So, we need a tool that takes into account very important criteria in order to achiev...

  5. Use of decision trees for evaluating severe accident management strategies in nuclear power plants

    Energy Technology Data Exchange (ETDEWEB)

    Jae, Moosung [Hanyang Univ., Seoul (Korea, Republic of). Dept. of Nuclear Engineering; Lee, Yongjin; Jerng, Dong Wook [Chung-Ang Univ., Seoul (Korea, Republic of). School of Energy Systems Engineering

    2016-07-15

    Accident management strategies are defined as innovative actions taken by plant operators to prevent core damage or to maintain sound containment integrity. Such actions minimize the chance of offsite radioactive substance leaks that lead to and intensify core damage under power plant accident conditions. Accident management extends the concept of Defense in Depth against core meltdown accidents. In pressurized water reactors, emergency operating procedures are performed to extend the core cooling time. The effectiveness of Severe Accident Management Guidance (SAMG) has become an important issue. Severe accident management strategies are evaluated with a methodology utilizing the decision tree technique.

  6. Preprocessing of Tandem Mass Spectrometric Data Based on Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    Jing-Fen Zhang; Si-Min He; Jin-Jin Cai; Xing-Jun Cao; Rui-Xiang Sun; Yan Fu; Rong Zeng; Wen Gao

    2005-01-01

    In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify the noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the following identification process. The experimental results show that this preprocessing method increases the search speed and the reliability of peptide identification.

  7. Dynamic Security Assessment of Western Danish Power System Based on Ensemble Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Bak, Claus Leth; Chen, Zhe;

    2014-01-01

    With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...... with outlier identification show high accuracy in the presence of variance and uncertainties due to wind power generation and other dispersed generation units. The performance of this approach is demonstrated on the operational model of western Danish power system with the scale of around 200 lines and 400...

  8. Improvement and analysis of ID3 algorithm in decision-making tree

    Science.gov (United States)

    Xie, Xiao-Lan; Long, Zhen; Liao, Wen-Qi

    2015-12-01

    The cooperative system under development needs spatial analysis and related data mining technology to detect subject conflict and redundancy, and the ID3 algorithm is an important data mining algorithm. Because the logarithmic part of the traditional ID3 algorithm for decision trees is rather complicated, this paper derives a new computational formula for information gain by optimizing the logarithmic computation. Experimental comparison and theoretical analysis show that the IID3 (Improved ID3) algorithm has higher computational efficiency and accuracy and is thus worth popularizing.
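
    For context, the quantity that ID3 maximizes at each split is the information gain, whose entropy terms contain the logarithms this abstract refers to. The sketch below shows only the standard textbook formula, not the improved IID3 formula, which the abstract does not give. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): one feature separates the classes, the other does not.
labels = ["yes", "yes", "no", "no"]
print(information_gain(["a", "a", "b", "b"], labels))  # perfectly informative feature
print(information_gain(["a", "b", "a", "b"], labels))  # uninformative feature
```

    ID3 would pick the first feature for the split, since it has the larger gain.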

  9. A Decision Tree-Structured Algorithm of Speaker Adaptation Based on Gaussian Similarity Analysis

    Institute of Scientific and Technical Information of China (English)

    WU Ji; WANG Zuoying

    2001-01-01

    The Gaussian Similarity Analysis (GSA) algorithm can be used to estimate the similarity between two Gaussian-distributed variables with full covariance matrices. Based on this algorithm, we propose a method for speaker adaptation of the covariance. It differs from traditional algorithms, which mainly focus on adapting the mean vector of the state observation probability density. A binary decision tree is constructed offline with the similarity measure, and the adaptation procedure is data-driven. Experiments show a significant further improvement over adaptation of the mean vectors alone.

  10. Decision support for mitigating the risk of tree induced transmission line failure in utility rights-of-way.

    Science.gov (United States)

    Poulos, H M; Camp, A E

    2010-02-01

    Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utility's (NU) transmission lines are a critical element of the nation's power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field using limited equipment to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics that differentiated potential risk trees (danger trees) from trees with a high probability of tree fall (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for future application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.

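  11. Sistem Pakar Untuk Diagnosa Penyakit Kehamilan Menggunakan Metode Dempster-Shafer Dan Decision Tree [Expert System for Diagnosing Pregnancy Diseases Using the Dempster-Shafer Method and Decision Tree]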

    Directory of Open Access Journals (Sweden)

    joko popo minardi

    2016-01-01

    Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probability theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete; with the Dempster-Shafer method and expert system rules, an incomplete set of symptoms can still be combined to reach an appropriate diagnosis, while the decision tree is used as a decision support tool for tracing disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method, which produces a trust value for a disease diagnosis. Based on the results of diagnostic testing of the Dempster-Shafer method and expert system, the resulting accuracy is 76%. Keywords: Expert system; Diseases of pregnancy; Dempster Shafer
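
    The way Dempster-Shafer theory combines separate pieces of evidence can be illustrated with Dempster's rule of combination. The sketch below is a generic implementation of that rule, not the expert system described in the abstract. The disease names and mass values are hypothetical, chosen only for illustration.

```python
def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Hypothetical frame of discernment with two candidate diagnoses.
anemia = frozenset({"anemia"})
preeclampsia = frozenset({"preeclampsia"})
either = anemia | preeclampsia  # "unknown": mass not committed to one diagnosis

symptom_1 = {anemia: 0.6, either: 0.4}        # hypothetical evidence from one symptom
symptom_2 = {preeclampsia: 0.7, either: 0.3}  # hypothetical evidence from another

for hypothesis, mass in combine(symptom_1, symptom_2).items():
    print(sorted(hypothesis), round(mass, 4))
```

    Mass left on the whole frame ("either") represents remaining ignorance, which is how the method tolerates incomplete symptom information.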

  12. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers

    Directory of Open Access Journals (Sweden)

    Malueka Rusdy

    2012-03-01

    Background: Duchenne muscular dystrophy (DMD), a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides (AOs) is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results: Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions: The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.

  13. Decision tree analysis of factors influencing rainfall-related building damage

    Directory of Open Access Journals (Sweden)

    M. H. Spekkers

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998–2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22–26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11–18% of

  14. Decision tree analysis of factors influencing rainfall-related building damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a

  15. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  16. Non-compliance with a postmastectomy radiotherapy guideline: Decision tree and cause analysis

    Directory of Open Access Journals (Sweden)

    Åhlfeldt Hans

    2008-09-01

    Background: The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance are crucial in maintaining the quality of care in oncology. Methods: Analysis of 759 patients with malignant breast cancer using decision tree induction (DTI) found patterns of non-compliance with the guideline. The PMRT guideline was used to separate cases according to the recommendation to receive or not receive PMRT. The two groups of patients were analyzed separately. The resulting patterns were transformed into rules that were then compared with the reasons extracted by manual inspection of records for the non-compliant cases. Results: Analyzing patients in the group who should receive PMRT according to the guideline did not result in a robust decision tree. However, classification of the other group, patients who should not receive PMRT according to the guideline, resulted in a tree with nine leaves, three of which represented non-compliance with the guideline. A comparison between the rules resulting from these three non-compliant patterns and manual inspection of patient records found the following. In the decision tree, presence of perigland growth is the most important variable, followed by the number of malignantly invaded lymph nodes and the level of progesterone receptor. DNA index, age, size of the tumor and level of estrogen receptor are also involved but with less importance. From manual inspection of the cases, the most frequent pattern for non-compliance is age above the threshold, followed by near cut-off values for risk factors and unknown reasons. Conclusion: Comparison of patterns of non-compliance acquired from data mining and manual inspection of patient records demonstrates that not all of the non-compliances are repetitive or important. There

  17. Decision trees for evaluating skin and respiratory sensitizing potential of chemicals in accordance with European regulations.

    Science.gov (United States)

    Selgrade, Maryjane K; Sullivan, Katherine S; Boyles, Rebecca R; Dederick, Elizabeth; Serex, Tessa L; Loveless, Scott E

    2012-08-01

    Guidance for determining the sensitizing potential of chemicals is available in EC Regulation No. 1272/2008 Classification, Labeling, and Packaging of Substances; REACH guidance from the European Chemicals Agency; and the United Nations Globally Harmonized System (GHS). We created decision trees for evaluating potential skin and respiratory sensitizers. Our approach (1) brings all the regulatory information into one brief document, providing a step-by-step method to evaluate evidence that individual chemicals or mixtures have sensitizing potential; (2) provides an efficient, uniform approach that promotes consistency when evaluations are done by different reviewers; (3) provides a standard way to convey the rationale and information used to classify chemicals. We applied this approach to more than 50 chemicals distributed among 11 evaluators with varying expertise. Evaluators found the decision trees easy to use and recipients (product stewards) of the analyses found that the resulting documentation was consistent across users and met their regulatory needs. Our approach allows for transparency, process management (e.g., documentation, change management, version control), as well as consistency in chemical hazard assessment for REACH, EC Regulation No. 1272/2008 Classification, Labeling, and Packaging of Substances and the GHS. PMID:22584521

  18. Snow event classification with a 2D video disdrometer - A decision tree approach

    Science.gov (United States)

    Bernauer, F.; Hürkamp, K.; Rühm, W.; Tschiersch, J.

    2016-05-01

    Snowfall classification according to crystal type or degree of riming of the snowflakes is important for many atmospheric processes, e.g. wet deposition of aerosol particles. 2D video disdrometers (2DVD) have recently proved their capability to measure microphysical parameters of snowfall. The present work has the aim of classifying snowfall according to microphysical properties of single hydrometeors (e.g. shape and fall velocity) measured by means of a 2DVD. The constraints for the shape and velocity parameters, which are used in a decision tree for classification of the 2DVD measurements, are derived from detailed on-site observations, combining automatic 2DVD classification with visual inspection. The developed decision tree algorithm subdivides the detected events into three classes of dominating crystal type (single crystals, complex crystals and pellets) and three classes of dominating degree of riming (weak, moderate and strong). The classification results for the crystal type were validated with an independent data set proving the unambiguousness of the classification. In addition, for three long-term events, good agreement of the classification results with independently measured maximum dimension of snowflakes, snowflake bulk density and surrounding temperature was found. The developed classification algorithm is applicable for wind speeds below 5.0 m s⁻¹ and has the advantage of being easily implemented by other users.

  19. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression possibly affected by the diverse major risk factors for ESCC across different areas.

  20. Diagnostic Features of Common Oral Ulcerative Lesions: An Updated Decision Tree

    Science.gov (United States)

    Safi, Yaser

    2016-01-01

    Diagnosis of oral ulcerative lesions might be quite challenging. This narrative review article aims to introduce an updated decision tree for diagnosing oral ulcerative lesions on the basis of their diagnostic features. Various general search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of MeSH keywords such as “oral ulcer,” “stomatitis,” and “mouth diseases.” Thereafter, English-language articles published from 1983 to 2015 in both medical and dental journals, including reviews, meta-analyses, original papers, and case reports, were appraised. Upon compilation of the relevant data, oral ulcerative lesions were categorized into three major groups (acute, chronic, and recurrent ulcers) and into five subgroups (solitary acute, multiple acute, solitary chronic, multiple chronic, and solitary/multiple recurrent), based on the number and duration of lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by stepwise progression. PMID:27781066

  1. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    Directory of Open Access Journals (Sweden)

    Wan-Yu Chang

    2015-09-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.

  2. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction

    Directory of Open Access Journals (Sweden)

    Jörg Huwyler

    2012-08-01

    Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo surface permeability product (logPS) values in rats as a quantitative parameter for BBB permeability. The open source Chemical Development Kit (CDK) was used to calculate physico-chemical properties and descriptors. Predictive computational models were implemented by machine learning paradigms (decision tree induction) on both descriptor sets. Models with a corrected classification rate (CCR) of 90% were established. Mechanistic insight into BBB transport was provided by an Ant Colony Optimization (ACO)-based binary classifier analysis to identify the most predictive chemical substructures. Decision trees revealed descriptors of lipophilicity (aLogP) and charge (polar surface area), which were also previously described in models of passive diffusion. However, measures of molecular geometry and connectivity were found to be related to an active drug transport component.

  3. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were used to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraud or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, fraud or theft behaviours and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraud or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  4. A Modular Approach Utilizing Decision Tree in Teaching Integration Techniques in Calculus

    Directory of Open Access Journals (Sweden)

    Edrian E. Gonzales

    2015-08-01

    This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought an answer to the question: Is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second-year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that modular and traditional instruction were equally effective in facilitating the learning of integration by parts. The other result revealed that the use of the modular approach utilizing a decision tree in teaching integration by trigonometric transformation was more effective than the traditional method.

  5. The Legacy of Past Tree Planting Decisions for a City Confronting Emerald Ash Borer (Agrilus planipennis) Invasion

    Directory of Open Access Journals (Sweden)

    Christopher Sean Greene

    2016-03-01

    Full Text Available Management decisions grounded in ecological understanding are essential to the maintenance of a healthy urban forest. Decisions about where and what tree species to plant have both short and long-term consequences for the future function and resilience of city trees. Through the construction of a theoretical damage index, this study examines the legacy effects of a street tree planting program in a densely populated North American city confronting an invasion of emerald ash borer (Agrilus planipennis). An investigation of spatial autocorrelation for locations of high damage potential across the City of Toronto, Canada was then conducted using Getis-Ord Gi*. Significant spatial clustering of high damage index values affirmed that past urban tree planting practices placing little emphasis on species diversity have created time-lagged consequences of enhanced vulnerability of trees to insect pests. Such consequences are observed at the geographically local scale, but can easily cascade to become multi-scalar in their spatial extent. The theoretical damage potential index developed in this study provides a framework for contextualizing historical urban tree planting decisions. Analysis of damage index values for Toronto reinforces the importance of urban forest management that prioritizes proactive tree planting strategies and considers species diversity in the context of planting location.
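
For readers unfamiliar with the hot-spot statistic named in this abstract, the standard form of the Getis-Ord Gi* statistic is reproduced below. The notation is an assumption made for this note and is not taken from the paper: x_j stands for the damage index value at location j, w_ij for the spatial weight between locations i and j (with location i included in its own neighbourhood), and n for the number of locations.

```latex
G_i^{*} =
\frac{\displaystyle\sum_{j=1}^{n} w_{ij}\,x_j \;-\; \bar{X}\sum_{j=1}^{n} w_{ij}}
     {S\,\sqrt{\dfrac{n\displaystyle\sum_{j=1}^{n} w_{ij}^{2} - \Bigl(\displaystyle\sum_{j=1}^{n} w_{ij}\Bigr)^{2}}{n-1}}},
\qquad
\bar{X} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
```

The statistic is a z-score: large positive values mark locations surrounded by high values (hot spots), and large negative values mark cold spots.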

  6. The effect of the fragmentation problem in decision tree learning applied to the search for single top quark production

    International Nuclear Information System (INIS)

    Decision tree learning constitutes a suitable approach to classification due to its ability to partition the variable space into regions of class-uniform events, while providing a structure amenable to interpretation, in contrast to other methods such as neural networks. But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system called DTFE, for Decision Tree Fragmentation Evaluator, that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as Spectral Clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies in the search for single top quark production, a challenging problem due to large and similar backgrounds, low-energy signals, and a low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of data fragmentation.

  7. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    Directory of Open Access Journals (Sweden)

    B. Gessesse

    2015-11-01

    Full Text Available Land degradation due to a lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decisions (Chi-square = 37.29, df = 15, P<0.001). In addition, the computed significance value of the model suggests that all the considered predictor variables jointly influenced the farmers' decision to plant trees as a land management strategy. In this regard, the findings of the study show that local land users' willingness to adopt tree growing is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In particular, household size, productive labour force availability, the disparity of schooling age, the level of perception of the process of deforestation and the current land tenure system have a positive and significant influence on tree growing investment decisions in the study watershed. The processes of land use conversion and land degradation are serious, and in turn have had adverse effects on agricultural productivity, local food security and the poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.

  8. Trees

    Science.gov (United States)

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  9. A decision-making framework for protecting process plants from flooding based on fault tree analysis

    International Nuclear Information System (INIS)

    The protection of process plants from external events is mandatory in the Seveso Directive. Among these events figures the possibility of inundation of a plant, which may cause a hazard by disabling technical components and obviating operator interventions. A methodological framework for dealing with hazards from potential flooding events is presented. It combines an extension of the fault tree method with generic properties of flooding events in rivers and of dikes, which should be adapted to site-specific characteristics in a concrete case. Thus, a rational basis for deciding whether upgrading is required or not and which of the components should be upgraded is provided. Both the deterministic and the probabilistic approaches are compared. Preference is given to the probabilistic one. The conclusions drawn naturally depend on the scope and detail of the model calculations and the decision criterion adopted. The latter has to be supplied from outside the analysis, e.g. by the analyst himself, the plant operator or the competent authority. It turns out that decision-making is only viable if the boundary conditions for both the procedure of analysis and the decision criterion are clear.

  10. Understanding how roadside concentrations of NOx are influenced by the background levels, traffic density, and meteorological conditions using Boosted Regression Trees

    Science.gov (United States)

    Sayegh, Arwa; Tate, James E.; Ropkins, Karl

    2016-02-01

    Oxides of Nitrogen (NOx) are a major component of photochemical smog and their constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. whether they were based on monitored data or modelled predictions. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C, with high concentrations at low ambient air temperatures, which could be associated with restricted atmospheric dispersion and/or with changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper
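
As a rough illustration of the boosting idea behind BRT models, here is a minimal sketch in plain Python. It fits one-split regression trees (stumps) to the residuals of the running prediction under squared-error loss. It is not the authors' implementation: the feature columns (background NOx, traffic density), all numbers, and the names `fit_stump`, `fit_brt` and `predict` are invented for this example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Return the one-split tree (feature, threshold, left, right) with the lowest squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted({row[j] for row in xs})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(xs, residuals) if row[j] <= t]
            right = [r for row, r in zip(xs, residuals) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_brt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps on residuals, shrinking each stump's contribution by learning_rate."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        j, t, lm, rm = fit_stump(xs, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)


# Made-up toy data: [background NOx, traffic density] -> roadside NOx.
xs = [[20, 10], [25, 30], [40, 15], [45, 40], [60, 20], [65, 50]]
ys = [30, 45, 55, 75, 80, 105]
model = fit_brt(xs, ys)
print(predict(model, [50, 25]))
```

Production BRT software typically adds deeper trees, row subsampling and cross-validated selection of the number of trees. The sketch keeps only the core residual-fitting loop.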

  11. Interpreting Tree Ensembles with inTrees

    OpenAIRE

    Deng, Houtao

    2014-01-01

    Tree ensembles such as random forests and boosted trees are accurate but difficult to understand, debug and deploy. In this work, we provide the inTrees (interpretable trees) framework that extracts, measures, prunes and selects rules from a tree ensemble, and calculates frequent variable interactions. A rule-based learner, referred to as the simplified tree ensemble learner (STEL), can also be formed and used for future prediction. The inTrees framework can be applied to both classification an...
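
The "extract" and "measure" steps mentioned in this abstract can be sketched as follows. This is not the inTrees package API. The nested-dict tree layout, the function names and the toy data are all assumptions made for illustration. Each root-to-leaf path becomes one rule, and each rule is then scored by how often it applies (frequency) and how often it is wrong when it applies (error).

```python
def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path of a tree into (conditions, prediction)."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    feat, thr = node["feature"], node["threshold"]
    rules = extract_rules(node["left"], conditions + ((feat, "<=", thr),))
    rules += extract_rules(node["right"], conditions + ((feat, ">", thr),))
    return rules


def rule_matches(conditions, row):
    """True when the row satisfies every condition of the rule."""
    return all(row[f] <= t if op == "<=" else row[f] > t for f, op, t in conditions)


def measure(rule, rows, labels):
    """Return (frequency, error) of a rule on a labelled data set."""
    conditions, prediction = rule
    covered = [y for row, y in zip(rows, labels) if rule_matches(conditions, row)]
    frequency = len(covered) / len(rows)
    error = sum(y != prediction for y in covered) / len(covered) if covered else 0.0
    return frequency, error


# Hypothetical tree (one member of an ensemble) and toy data.
tree = {
    "feature": "x1", "threshold": 0.5,
    "left": {"leaf": "A"},
    "right": {
        "feature": "x2", "threshold": 2.0,
        "left": {"leaf": "B"},
        "right": {"leaf": "A"},
    },
}
rows = [{"x1": 0.2, "x2": 1.0}, {"x1": 0.9, "x2": 1.0},
        {"x1": 0.9, "x2": 3.0}, {"x1": 0.1, "x2": 5.0}]
labels = ["A", "B", "B", "A"]

rules = extract_rules(tree)
for rule in rules:
    print(rule, measure(rule, rows, labels))
```

For an ensemble, the same extraction would be run over every tree and the resulting rules pooled before pruning and selection.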

  12. A hybrid model using decision tree and neural network for credit scoring problem

    Directory of Open Access Journals (Sweden)

    Amir Arzy Soltan

    2012-08-01

    Full Text Available Nowadays credit scoring is an important issue for financial and monetary organizations and has a substantial impact on reducing customer attraction risks. Identification of high-risk customers can reduce finished cost. Accurate classification of customers and low type 1 and type 2 errors have been investigated in many studies. The primary objective of this paper is to develop a new method, which chooses the best neural network architecture based on one column hidden layer MLP, multiple columns hidden layers MLP, RBFN and decision trees, and ensembles them with voting methods. The proposed method is run on Australian credit data and on data from a private bank in Iran called Export Development Bank of Iran, and the results are used for making solution in low customer attraction risks.

  13. Independent component analysis and decision trees for ECG holter recording de-noising.

    Directory of Open Access Journals (Sweden)

    Jakub Kuzilek

    Full Text Available We have developed a method for ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method in comparison with standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and the filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, base line wander, EMG), but significantly better results were achieved when uncommon noise (electrode cable movement artefact) was compared.
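
The RMSE criterion named in this abstract can be written in a few lines. This is a minimal sketch in which the sample values are made up and the function name `rmse` is chosen for this example.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally long signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference))


# Made-up samples: a clean reference and a de-noised estimate of it.
clean = [0.0, 0.5, 1.0, 0.5, 0.0]
denoised = [0.1, 0.4, 1.1, 0.5, -0.1]
print(rmse(clean, denoised))
```

A lower value means the filtered signal is closer to the original, so competing de-noising methods can be ranked on the same contaminated recordings.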

  14. Automated soil resources mapping based on decision tree and Bayesian predictive modeling

    Institute of Scientific and Technical Information of China (English)

    周斌; 张新刚; 王人潮

    2004-01-01

    This article presents two approaches for automated building of knowledge bases of soil resources mapping. These methods used decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than using the conventional knowledge acquisition approach. The knowledge bases built by these two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China using TM bi-temporal imageries and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared to existing soil map based on field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping distribution model of soil classes over the study area.

  15. A New Architecture for Making Moral Agents Based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Meisam Azad-Manjiri

    2014-04-01

    Full Text Available Given the influence of robots in various fields of life, the issue of trusting them is important, especially when a robot deals with people directly. One possible way to gain this confidence is to add a moral dimension to robots. Therefore, we present a new architecture for building moral agents that learn from demonstrations. The agent is based on Beauchamp and Childress’s principles of biomedical ethics (a type of deontological theory) and uses a decision tree algorithm to abstract relationships between ethical principles and the morality of actions. We apply this architecture to build an agent that provides guidance to health care workers faced with ethical dilemmas. Our results show that the agent is able to learn ethics well.

  16. Preventing KPI Violations in Business Processes based on Decision Tree Learning and Proactive Runtime Adaptation

    Directory of Open Access Journals (Sweden)

    Dimka Karastoyanova

    2012-01-01

    Full Text Available The performance of business processes is measured and monitored in terms of Key Performance Indicators (KPIs). If the monitoring results show that the KPI targets are violated, the underlying reasons have to be identified and the process should be adapted accordingly to address the violations. In this paper we propose an integrated monitoring, prediction and adaptation approach for preventing KPI violations of business process instances. KPIs are monitored continuously while the process is executed. Additionally, based on KPI measurements of historical process instances we use decision tree learning to construct classification models which are then used to predict the KPI value of an instance while it is still running. If a KPI violation is predicted, we identify adaptation requirements and adaptation strategies in order to prevent the violation.

  17. Simulation of human behavior elements in a virtual world using decision trees

    Directory of Open Access Journals (Sweden)

    Sandra Mercado Pérez

    2013-05-01

    Full Text Available Human behavior refers to the way an individual responds to certain events or occurrences. How an individual will act cannot be predicted directly, so computer simulation is used. This paper presents the development of a simulation of five possible human reactions within a virtual world, as well as the steps needed to create a decision tree that supports the selection of any of these reactions. For that decision tree, three types of attributes are proposed: the personality, the environment and the level of reaction. The virtual world Second Life was selected because of its internal programming language LSL (Linden Scripting Language), which allows the execution of predefined animation sequences or the creation of custom ones.

  18. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    DEFF Research Database (Denmark)

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe;

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, phasing out of central power plants and cross border power exchange on dynamic security of Danish Power System. Contingency based decision tree (DT) approach is used to assess the dynamic security of present and future... Danish Power System. Results from offline time domain simulation for large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of present and future power system. The mentioned approach is implemented... in DIgSILENT PowerFactory environment and applied to western Danish Power System which is passing through a phase of major transformation. The results have shown that phasing out of central power plants coupled with large scale wind energy integration and more dependence on international ties can have...

  20. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    Directory of Open Access Journals (Sweden)

    Tran Hoai Linh

    2014-09-01

    Full Text Available The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to-peak periods of the ECG signals will be used as features for the classifiers. Numerical experiments will be performed for the recognition of different types of arrhythmia in the ECG signals taken from the MIT-BIH (Massachusetts Institute of Technology and Boston’s Beth Israel Hospital) Arrhythmia Database. The results will be compared with individual base classifiers’ performances and with other integration methods to show the high quality of the proposed solution.

  1. A Decision Tree Based Pedometer and its Implementation on the Android Platform

    Directory of Open Access Journals (Sweden)

    Juanying Lin

    2015-02-01

    Full Text Available This paper describes a decision tree (DT) based pedometer algorithm and its implementation on Android. The DT-based pedometer can classify 3 gait patterns, including walking on level ground (WLG), up stairs (WUS) and down stairs (WDS). It can discard irrelevant motion and count the user’s steps accurately. The overall classification accuracy is 89.4%. Accelerometer, gyroscope and magnetic field sensors are used in the device. When the user puts his/her smart phone into a pocket, the pedometer can automatically count steps of different gait patterns. Two methods are tested to map the acceleration from the mobile phone’s reference frame to the direction of gravity. Two significant features are employed to classify different gait patterns.
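
The abstract does not say which two mapping methods were tested. One common way to express a phone-frame acceleration along the direction of gravity is a plain vector projection, sketched below as an assumption rather than as the authors' method. The function name and sample readings are hypothetical.

```python
import math


def vertical_component(accel, gravity):
    """Project a device-frame acceleration vector onto the gravity direction."""
    norm = math.sqrt(sum(g * g for g in gravity))
    if norm == 0:
        raise ValueError("gravity vector must be non-zero")
    return sum(a * g for a, g in zip(accel, gravity)) / norm


# Hypothetical readings in m/s^2: the phone is tilted, so gravity is spread over y and z.
gravity = (0.0, 6.94, 6.94)
accel = (0.3, 8.0, 7.5)
print(vertical_component(accel, gravity))
```

The resulting scalar no longer depends on how the phone is oriented in the pocket, which is what makes it usable as input to a step or gait classifier.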

  2. Decision tree method applied to computerized prediction of ternary intermetallic compounds

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Decision tree method and atomic parameters were used to find the regularities of the formation of ternary intermetallic compounds in alloy systems. The criteria of formation can be expressed by a group of inequalities with two kinds of atomic parameters Zl (number of valence electrons in the atom of constituent element) and Ri/Rj (ratio of the atomic radius of constituent element i and j) as independent variables. The data of 2238 known ternary alloy systems were used to extract the empirical rules governing the formation of ternary intermetallic compounds, and the facts of ternary compound formation of other 1334 alloy systems were used as samples to test the reliability of the empirical criteria found. The rate of correctness of prediction was found to be nearly 95%. An expert system for ternary intermetallic compound formation was built and some prediction results of the expert system were confirmed.

  3. K-D Decision Tree: An Accelerated and Memory Efficient Nearest Neighbor Classifier

    Science.gov (United States)

    Shibata, Tomoyuki; Wada, Toshikazu

    This paper presents a novel algorithm for Nearest Neighbor (NN) classifier. NN classification is a well-known method of pattern classification having the following properties: (1) it performs maximum-margin classification and achieves less than twice the ideal Bayesian error; (2) it does not require knowledge of pattern distributions, kernel functions or base classifiers; and (3) it can naturally be applied to multiclass classification problems. Among the drawbacks are (A) inefficient memory use and (B) ineffective pattern classification speed. This paper deals with problems A and B. In most cases, NN search algorithms, such as k-d tree, are employed as a pattern search engine of the NN classifier. However, NN classification does not always require the NN search. Based on this idea, we propose a novel algorithm named k-d decision tree (KDDT). Since KDDT uses Voronoi-condensed prototypes, it consumes less memory than naive NN classifiers. We have confirmed that KDDT is much faster than NN search-based classifier through a comparative experiment (from 9 to 369 times faster than NN search based classifier). Furthermore, in order to extend applicability of the KDDT algorithm to high-dimensional NN classification, we modified it by incorporating Gabriel editing or RNG editing instead of Voronoi condensing. Through experiments using simulated and real data, we have confirmed the modified KDDT algorithms are superior to the original one.

  4. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size.
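
The information gain that the abstract uses to rank bands can be computed as below. This is a minimal sketch with made-up reflectance values and hypothetical names. Note also that C4.5-style learners such as J48 normally select splits by gain ratio, whereas the code shows only the plain information gain of a single threshold split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Made-up near-infrared reflectances: water is dark in this band, land is bright.
nir = [0.02, 0.03, 0.04, 0.30, 0.35, 0.40]
labels = ["water", "water", "water", "land", "land", "land"]

print(information_gain(nir, labels, threshold=0.10))   # clean split
print(information_gain(nir, labels, threshold=0.025))  # poor split
```

Ranking bands then amounts to comparing the best achievable gain per band on the training pixels.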

  5. A Genetic Algorithm Optimized Decision Tree-SVM based Stock Market Trend Prediction System

    Directory of Open Access Journals (Sweden)

    Binoy B. Nair

    2010-12-01

    Full Text Available Prediction of stock market trends has been an area of great interest both to researchers attempting to uncover the information hidden in the stock market data and for those who wish to profit by trading stocks. The extremely nonlinear nature of the stock market data makes it very difficult to design a system that can predict the future direction of the stock market with sufficient accuracy. This work presents a data mining based stock market trend prediction system, which produces highly accurate stock market forecasts. The proposed system is a genetic algorithm optimized decision tree-support vector machine (SVM) hybrid, which can predict one-day-ahead trends in stock markets. The uniqueness of the proposed system lies in the use of the hybrid system, which can adapt itself to the changing market conditions, and in the fact that while most of the attempts at stock market trend prediction have approached it as a regression problem, the present study converts the trend prediction task into a classification problem, thus improving the prediction accuracy significantly. Performance of the proposed hybrid system is validated on the historical time series data from the Bombay stock exchange sensitive index (BSE-Sensex). The system performance is then compared to that of an artificial neural network (ANN) based system and a naïve Bayes based system. It is found that the trend prediction accuracy is highest for the hybrid system and the genetic algorithm optimized decision tree-SVM hybrid system outperforms both the artificial neural network and the naïve Bayes based trend prediction systems.
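
The step of recasting trend prediction as classification can be illustrated by turning a price series into one-day-ahead class labels. The abstract does not give the authors' trend definition, so the rule below (next close strictly higher means "up", otherwise "down") and the sample prices are assumptions made for this sketch.

```python
def trend_labels(closes):
    """Label each day by the direction of the next day's close."""
    return ["up" if nxt > cur else "down" for cur, nxt in zip(closes, closes[1:])]


# Made-up closing prices for five consecutive trading days.
closes = [100.0, 101.0, 99.0, 99.0, 102.0]
print(trend_labels(closes))
```

A classifier is then trained to predict these labels from features available on the current day, instead of regressing the next price itself.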

  6. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067

  8. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The quality of pellets was expressed by their shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. PMID:25835791

  9. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    Directory of Open Access Journals (Sweden)

    Trefz Florian M

    2012-12-01

    Full Text Available Abstract Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed

  10. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score

    Directory of Open Access Journals (Sweden)

    Andrej Kastrin

    2008-12-01

    Full Text Available Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from the domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of the knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of decision trees on a prediction model of sensation seeking. Prediction of Zuckerman's Sensation Seeking Scale (SSS-V) score was based on the bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of the Eysenck Personality Questionnaire (EPQ) and the Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision tree methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.

  11. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree

    DEFF Research Database (Denmark)

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva;

    2004-01-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study...

  12. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    Science.gov (United States)

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  13. Model-Independent Evaluation of Tumor Markers and a Logistic-Tree Approach to Diagnostic Decision Support

    Directory of Open Access Journals (Sweden)

    Weizeng Ni

    2014-01-01

    Full Text Available Sensitivity and specificity of individual tumor markers hardly meet the clinical requirement. This challenge gave rise to many efforts, e.g., combining multiple tumor markers and employing machine learning algorithms. However, results from different studies are often inconsistent, which is partially attributed to the use of different evaluation criteria. Also, the wide use of model-dependent validation leads to a high possibility of data overfitting when complex models are used for diagnosis. We propose two model-independent criteria, namely, area under the curve (AUC) and Relief, to evaluate the diagnostic values of individual and multiple tumor markers, respectively. For diagnostic decision support, we propose the use of a logistic-tree, which combines decision tree and logistic regression. Application to a colorectal cancer dataset shows that the proposed evaluation criteria produce results that are consistent with current knowledge. Furthermore, the simple and highly interpretable logistic-tree has diagnostic performance that is competitive with other complex models.

  14. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection

    Science.gov (United States)

    Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    Background WHO’s new classification in 2009, dengue with or without warning signs and severe dengue, has necessitated large numbers of hospital admissions of dengue patients, which in turn has imposed a huge economic and physical burden on many hospitals around the globe, particularly in South East Asia and Malaysia, where the disease has seen a rapid surge in numbers in recent years. Lack of a simple tool to differentiate mild from life-threatening infection has led to unnecessary hospitalization of dengue patients. Methods We conducted a single-centre, retrospective study involving serologically confirmed dengue fever patients admitted to a single ward in Hospital Kuala Lumpur, Malaysia. Data were collected for 4 months from February to May 2014. Sociodemographic data, co-morbidity, days of illness before admission, symptoms, warning signs, vital signs and laboratory results were all recorded. Descriptive statistics were tabulated, and simple and multiple logistic regression analyses were done to determine significant risk factors associated with severe dengue. Results 657 patients with confirmed dengue were analysed, of whom 59 (9.0%) had severe dengue. Overall, the commonest warning signs were vomiting (36.1%) and abdominal pain (32.1%). Previous co-morbidity, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were found to be significant risk factors for severe dengue using simple logistic regression. However, the significant risk factors for severe dengue with multiple logistic regression were only vomiting, pleural effusion, and low systolic blood pressure. Using those 3 risk factors, we plotted an algorithm for predicting severe dengue. When compared to the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, specificity of 0.54, positive predictive value of 0.16 and negative predictive value of 0.96. Conclusion The decision tree algorithm proposed

  15. Trees

    CERN Document Server

    Epstein, Henri

    2016-01-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.

  16. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to select new students more selectively. This research takes as its case the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative screening based on prospective students' high school records, without a written test. Currently, this kind of selection does not yet have a standard model and criteria. Selection is done only by comparing candidates' application files, so subjective assessment is likely because there are no standard criteria that differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes criteria meeting certain standards, such as area of origin, school status, average score and so on. These criteria are determined using rules derived from classifying the academic achievement (GPA) of students who entered the university through the same route in previous years. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average score above 75, and have at least one achievement during their high school studies.
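
    Read as a single conjunction, the priority rule reported above can be written as a small predicate. This is only a sketch of how such a decision-tree rule looks in code; the `Applicant` record and its field names are hypothetical, and the study's actual tree may combine the criteria differently.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical record layout; field names are assumptions for illustration.
    island_of_origin: str
    school_status: str      # e.g. "public" or "private"
    major: str              # e.g. "science"
    average_score: float
    achievements: int       # number of achievements during high school


def has_priority(a: Applicant) -> bool:
    """Conjunction of the five criteria listed in the abstract."""
    return (
        a.island_of_origin == "Java"
        and a.school_status == "public"
        and a.major == "science"
        and a.average_score > 75
        and a.achievements >= 1
    )


strong = Applicant("Java", "public", "science", 82.0, 2)
borderline = Applicant("Java", "public", "science", 75.0, 1)  # not strictly above 75
```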

  17. A Fuzzy Optimization Technique for the Prediction of Coronary Heart Disease Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Persi Pamela. I

    2013-06-01

    Full Text Available Data mining along with soft computing techniques helps to unravel hidden relationships and diagnose diseases efficiently even with uncertainties and inaccuracies. Coronary Heart Disease (CHD) is a killer disease leading to heart attack and sudden death. Since the diagnosis involves vague symptoms and tedious procedures, it is usually time-consuming and false diagnoses may occur. A fuzzy system, one of the soft computing methodologies, is proposed in this paper along with a data mining technique for efficient diagnosis of coronary heart disease. Though the database has 76 attributes, only 14 attributes are found to be efficient for CHD diagnosis as per all the published experiments and doctors’ opinion. So only the essential attributes are taken from the heart disease database. From these attributes crisp rules are obtained by employing the CART decision tree algorithm, and these rules are then applied to the fuzzy system. A Particle Swarm Optimization (PSO) technique is applied for the optimization of the fuzzy membership functions, where the parameters of the membership functions are altered to new positions. The result interpreted from the fuzzy system predicts the prevalence of coronary heart disease, and the system’s accuracy was found to be good.

  18. A reduction approach to improve the quantification of linked fault trees through binary decision diagrams

    Energy Technology Data Exchange (ETDEWEB)

    Ibanez-Llano, Cristina, E-mail: cristina.ibanez@iit.upcomillas.e [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain); Rauzy, Antoine, E-mail: Antoine.RAUZY@3ds.co [Dassault Systemes, 10 rue Marcel Dassault CS 40501, 78946 Velizy Villacoublay, Cedex (France); Melendez, Enrique, E-mail: ema@csn.e [Consejo de Seguridad Nuclear (CSN), C/Justo Dorado 11, 28040 Madrid (Spain); Nieto, Francisco, E-mail: nieto@iit.upcomillas.e [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain)

    2010-12-15

    Over the last two decades binary decision diagrams (BDDs) have been applied successfully to improve Boolean reliability models. In contrast to the classical approach based on the computation of the minimal cutsets (MCS), the BDD approach involves no approximation in the quantification of the model and is able to handle negative logic correctly. However, when models are sufficiently large and complex, as for example those coming from the probabilistic safety assessment (PSA) studies of the nuclear industry, it becomes infeasible to compute the BDD within a reasonable amount of time and computer memory. Therefore, simplification or reduction of the full model has to be considered in some way to adapt the application of the BDD technology to the assessment of such models in practice. This paper proposes a reduction process that uses information provided by the set of the most relevant minimal cutsets of the model in order to perform the reduction directly on it. This allows controlling the degree of reduction and therefore the impact of such simplification on the final quantification results. This reduction is integrated in an incremental procedure that is compatible with the dynamic generation of the event trees and therefore adaptable to the recent dynamic developments and extensions of the PSA studies. The proposed method has been applied to a real case study, and the results obtained confirm that the reduction enables the BDD computation while maintaining accuracy.

  19. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Full Text Available Entrepreneurial intentions of students are important to recognize during their studies in order to provide those students with an educational background that will support such intentions and lead them to successful entrepreneurship after their studies. The paper aims to develop a model that will classify students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university on a sample of students in the first year of study. Input variables described students’ demographics, importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Due to the large dimension of the input space, a feature selection method was used in the pre-processing stage. For comparison, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure for testing the generalization ability of the models was conducted. The models were compared according to their classification accuracy, as well as according to input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which universities could take into consideration when designing their academic programs.

  20. Decision tree for smart feature extraction from sleep HR in bipolar patients.

    Science.gov (United States)

    Migliorini, Matteo; Mariani, Sara; Bianchi, Anna M

    2013-01-01

    The aim of this work is the creation of a completely automatic method for the extraction of informative parameters from peripheral signals recorded through a sensorized T-shirt. The acquired data belong to patients affected by bipolar disorder, and consist of RR series, body movements and activity type. The extracted features, i.e. linear and non-linear HRV parameters in the time domain, HRV parameters in the frequency domain, and parameters indicative of sleep quality, profile and fragmentation, are of interest for the automatic classification of the clinical mood state. The analysis of this dataset, which is to be performed online and automatically, must address the problems related to the clinical protocol, which also includes a segment of recording in which the patient is awake, and to the nature of the device, which can be sensitive to movements and misplacement. Thus, the decision tree implemented in this study performs the detection and isolation of the sleep period, the elimination of corrupted recording segments and the checking of the minimum requirements of the signals for every parameter to be calculated. PMID:24110866

  1. A Decision Tree Based Classifier to Analyze Human Ovarian Cancer cDNA Microarray Datasets.

    Science.gov (United States)

    Tsai, Meng-Hsiun; Wang, Hsin-Chieh; Lee, Guan-Wei; Lin, Yi-Chen; Chiu, Sheng-Hsiung

    2016-01-01

    Ovarian cancer is the deadliest gynaecological disease because of its high mortality rate and the absence of symptoms in the early stage. Patients are often already at the terminal stage when ovarian cancer is diagnosed, so a good opportunity for treatment is lost. The current common method for detecting ovarian cancer is a blood test that analyzes the serum tumor marker CA-125. However, the specificity and sensitivity of CA-125 are insufficient for early detection. Therefore, it has become urgent to find an efficient method that precisely detects the tumor markers for ovarian cancer. This study aims to find the target genes of ovarian cancer using different algorithms from information science. Feature selection and decision trees were applied to analyze 9600 ovarian cancer-related genes. After screening the target genes, candidate genes are analyzed with Ingenuity Pathway Analysis (IPA) software to create a genetic pathway model and to understand the interactive relationships in the different pathological stages of ovarian cancer. Finally, this research found 9 oncogenes associated with ovarian cancer, some of which had not been discovered in previous studies. This system will assist medical staff in diagnosis and treatment at an early stage of cancer and improve patient survival. PMID:26531754

  2. Childhood Cancer-a Hospital based study using Decision Tree Techniques

    Directory of Open Access Journals (Sweden)

    K. Kalaivani

    2011-01-01

    Full Text Available Problem statement: Cancer is generally regarded as a disease of adults, but there is a higher proportion of childhood cancer (ALL, Acute Lymphoblastic Leukemia) in India. The incidence of childhood cancer has increased over the last 25 years, but the increase is much larger in females. The aim was to increase our understanding of the determinants of south Indian parental reactions and needs. This facilitates the development of care and follow-up routines for families, paying attention both to individual risk and resilience factors and to ways in which limitations related to treatment centre and organizational characteristics could be compensated. Approach: Decision trees may be used for classification, clustering, affinity grouping, prediction or estimation, and description. One useful medical application in India is the management of leukemia, as it accounts for about 33% of childhood malignancies. Results: Female survivors showed greater functional disability in comparison to male survivors, demonstrated by poorer overall health status. Family stress results from a perceived imbalance between the demands on the family and the resources available to meet such demands. Conclusion: The pattern and severity of health and functional outcomes differed significantly between survivors in diagnostic subgroups. Family impact was aggravated by patients’ lasting sequelae and by parent-perceived shortcomings of long-term follow-up. Female survivors were at greater risk for health-related late effects.

  3. Determinants of farmers' tree-planting investment decisions as a degraded landscape management strategy in the central highlands of Ethiopia

    Science.gov (United States)

    Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim

    2016-04-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  4. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    Science.gov (United States)

    Gessesse, B.; Bewket, W.; Bräuning, A.

    2015-11-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explores the major determinants of farm level tree planting decision as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decision (Chi-square = 37.29, df = 15, Pfood security and poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.

  5. Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    CERN Document Server

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-01-01

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness an...

  6. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As fraudulent financial statements of enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance, at 85.71%.

  7. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    Science.gov (United States)

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust, as it is able to find results that are discriminatory from a statistical perspective with logical interactions, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have already been identified in large genome-wide association studies in the literature as related to type II diabetes, lending additional confidence to the results. PMID:26577156

  8. Validation of probability equation and decision tree in predicting subsequent dengue hemorrhagic fever in adult dengue inpatients in Singapore.

    Science.gov (United States)

    Thein, Tun L; Leo, Yee-Sin; Lee, Vernon J; Sun, Yan; Lye, David C

    2011-11-01

    We developed a probability equation and a decision tree from 1,973 predominantly dengue serotype 1 hospitalized adult dengue patients in 2004 to predict progression to dengue hemorrhagic fever (DHF), applied in our clinic since March 2007. The parameters predicting DHF were clinical bleeding, high serum urea, low serum protein, and low lymphocyte proportion. This study validated these in a predominantly dengue serotype 2 cohort in 2007. The 1,017 adult dengue patients admitted to Tan Tock Seng Hospital, Singapore had a median age of 35 years. Of 933 patients without DHF on admission, 131 progressed to DHF. The probability equation predicted DHF with a sensitivity (Sn) of 94%, specificity (Sp) 17%, positive predictive value (PPV) 16%, and negative predictive value (NPV) 94%. The decision tree predicted DHF with a Sn of 99%, Sp 12%, PPV 16%, and NPV 99%. Both tools performed well despite a switch in predominant dengue serotypes.
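
    The predictive values quoted above follow from sensitivity, specificity and prevalence by Bayes' rule. The sketch below recomputes them, assuming prevalence is the 131 progressions among the 933 patients without DHF on admission; because the published sensitivity and specificity are rounded, the results should only be expected to land within about a percentage point of the reported PPV and NPV.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption: prevalence taken as 131 of the 933 patients without DHF on admission.
prevalence = 131 / 933

# Probability equation: Sn 94%, Sp 17% (values from the abstract).
ppv_equation, npv_equation = predictive_values(0.94, 0.17, prevalence)

# Decision tree: Sn 99%, Sp 12% (values from the abstract).
ppv_tree, npv_tree = predictive_values(0.99, 0.12, prevalence)
```

    With a prevalence this low and a specificity this low, most positive predictions are false alarms, which is why both tools show a low PPV despite a very high sensitivity.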

  9. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements of enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance, at 85.71%. PMID:25302338

  10. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    Institute of Scientific and Technical Information of China (English)

    Fang Ye; Zhi-Hua Chen; Jie Chen; Fang Liu; Yong Zhang; Qin-Ying Fan; Lin Wang

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.

  11. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    Science.gov (United States)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
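
    The Dice similarity coefficient used above to score each segmentation is DSC = 2|A ∩ B| / (|A| + |B|), where A is the segmented region and B the true contour. A minimal sketch on a toy 2D example follows; the voxel sets are invented for illustration and have no connection to the ATLAAS data.

```python
def dice_coefficient(segmented, reference):
    """Dice similarity coefficient between two collections of voxel indices."""
    a, b = set(segmented), set(reference)
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: a 4x4 reference region and a segmentation shifted by one column.
reference = {(x, y) for x in range(4) for y in range(4)}
segmented = {(x, y) for x in range(1, 5) for y in range(4)}

dsc = dice_coefficient(segmented, reference)
```

    A DSC of 1 means the two regions coincide exactly and 0 means they do not overlap at all, so the reported values of 0.881 and 0.819 indicate substantial but not perfect overlap.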

  12. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Rohulla Kosari Langari

    2014-02-01

    Information technology and the growth of the Internet have changed the world, created competitive knowledge in the field of electronic commerce, and increased the competitive potential of organizations. Under these conditions, the growing volume of commercial transactions, which must be completed quickly and reliably, depends on a dynamic electronic banking system that uses modern technology to facilitate electronic business processes. Internet banking is regarded as a potential opportunity and one of the fundamental pillars of e-banking, but in cyberspace it faces various obstacles and threats. One of these challenges is the uncertainty over the security of financial transactions, together with suspicious and unusual behavior such as mail fraud aimed at financial abuse. Various systems based on machine intelligence methods and data mining techniques have been designed to detect fraud in users’ behavior and have been applied in industries such as insurance, medicine and banking. The main aim of this article is to recognize unusual user behavior in an e-banking system, that is, to detect user behavior and categorize the emerging patterns in order to prepare the conditions for predicting unauthorized penetration and detecting suspicious behavior. Because the detection of user behavior in an Internet system is uncertain and transaction records can help in understanding these movements, and because the decision tree is a common machine learning tool for classification and prediction, this research first determines the effective banking variables and the weight of each in producing Internet behaviors, and then combines the various behavior patterns to extract a model of inductive rules able to recognize different behaviors. Finally, four algorithms, Chaid, ex_Chaid, C4.5 and C5.0, are compared and evaluated for classification and detection of exist

  13. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches.

    Science.gov (United States)

    Oksel, Ceyda; Winkler, David A; Ma, Cai Y; Wilkins, Terry; Wang, Xue Z

    2016-09-01

    The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, interpretation of the models they generate is often very difficult. New computational modelling tools or new ways of using existing tools are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree) to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent to or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generation of accurate nanoSAR models with important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models. PMID:26956430

  14. Nonparametric decision tree: The impact of ISO 9000 on certified and non certified companies

    Directory of Open Access Journals (Sweden)

    José António Figueiredo Almaça

    2013-09-01

    Purpose: This empirical study analyzes a questionnaire answered by a sample of ISO 9000 certified companies and a control sample of companies which have not been certified, using a multivariate predictive model. With this approach, we assess which quality practices are associated with the likelihood of the firm being certified. Design/methodology/approach: We implemented nonparametric decision trees in order to see which variables most influence whether or not a company is certified, i.e., the motivations that lead companies to seek certification. Findings: The results show that only four questionnaire items are sufficient to predict whether a firm is certified or not. Companies in which the respondent shows greater concern for customer relations, employee motivation and strategic planning have a higher likelihood of being certified. Research implications: The reader should note that this study is based on data from a single country and, of course, these results capture many idiosyncrasies of its economic and corporate environment. It would be of interest to understand whether this type of analysis reveals regularities across different countries. Practical implications: Companies should look for a set of practices congruent with total quality management and ISO 9000 certification. Originality/value: This study contributes to the literature on the internal motivation of companies to achieve certification under the ISO 9000 standard by performing a comparative analysis of questionnaires answered by a sample of certified companies and a control sample of companies which have not been certified. In particular, we assess how the managers’ perception of the intensity with which quality practices are deployed in their firms is associated with the likelihood of the firm being certified.

  15. Application of Random Forest Survival Models to Increase Generalizability of Decision Trees: A Case Study in Acute Myocardial Infarction

    Directory of Open Access Journals (Sweden)

    Iman Yosefian

    2015-01-01

    Background. Tree models provide an easily interpretable prognostic tool, but unstable results. Two approaches to enhance the generalizability of the results are pruning and random survival forest (RSF). The aim of this study is to assess the generalizability of saturated tree (ST), pruned tree (PT), and RSF. Methods. Data of 607 patients were randomly divided into training and test sets applying 10-fold cross-validation. Using the training sets, all three models were applied. Using the Log-Rank test, ST was constructed by searching for optimal cutoffs. PT was selected by plotting error rate versus minimum sample size in terminal nodes. In the construction of RSF, 1000 bootstrap samples were drawn from the training set. The C-index and integrated Brier score (IBS) statistic were used to compare models. Results. ST provides the most overoptimized statistics. The mean difference between the C-index in the training and test sets was 0.237. The corresponding figures for PT and RSF were 0.054 and 0.007. In terms of IBS, the difference was 0.136 in ST, 0.021 in PT, and 0.0003 in RSF. Conclusion. Pruning of the tree and assessment of its performance on a test set partially improve the generalizability of decision trees. RSF provides results that are highly generalizable.
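
    The comparison above rests on the gap between training and test C-index. As a minimal sketch, assuming the usual Harrell definition of the C-index for right-censored data (the abstract does not say which variant was used), that quantity could be computed like this; the function names are illustrative only.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before times[j]. The pair is concordant when
    the model gave the higher risk score to the subject who failed first.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def optimism(train_c: float, test_c: float) -> float:
    """Training-minus-test gap; a large gap suggests an over-fitted model."""
    return train_c - test_c
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a training-test gap near zero, as reported for RSF, indicates that the model ranks unseen patients about as well as the patients it was fitted on.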

  16. Autoencoder Trees

    OpenAIRE

    İrsoy, Ozan; Alpaydın, Ethem

    2014-01-01

    We discuss an autoencoder model in which the encoding and decoding functions are implemented by decision trees. We use the soft decision tree where internal nodes realize soft multivariate splits given by a gating function and the overall output is the average of all leaves weighted by the gating values on their path. The encoder tree takes the input and generates a lower dimensional representation in the leaves and the decoder tree takes this and reconstructs the original input. Exploiting t...

  17. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    Science.gov (United States)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soils are experiencing, especially in semi-arid, Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50.000 soil map covering the area under the direct control of the Republic of Cyprus (5.760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data coming from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors the usual variables were used: temperature and aridity index for climate; total loss on ignition, vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound location related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique where many trees, instead of a single one, are developed and compared to increase the stability and the reliability of the prediction. The model is trained and verified on areas where a published 1:25.000 soil map obtained from field work is available, and then it is applied for predictive mapping to the other areas. Preliminary results obtained in a small area in the plain around the city of Lefkosia, where eight different soil classes are present, show very good capacities of the method. The Random Forest approach leads to reproduce soil

  18. Skin autofluorescence based decision tree in detection of impaired glucose tolerance and diabetes.

    Directory of Open Access Journals (Sweden)

    Andries J Smit

    AIM: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE), which are considered to be a carrier of glycometabolic memory. We compared SAF and a SAF-based decision tree (SAF-DM) with fasting plasma glucose (FPG) and HbA1c, and additionally with the Finnish Diabetes Risk Score (FINDRISC) questionnaire±FPG, for detection of oral glucose tolerance test (OGTT)- or HbA1c-defined IGT and diabetes in intermediate risk persons. METHODS: Participants had ≥1 metabolic syndrome criteria. They underwent an OGTT, HbA1c, SAF and FINDRISC, in addition to SAF-DM, which includes SAF, age, BMI, and conditional questions on DM family history, antihypertensives, renal or cardiovascular disease events (CVE). RESULTS: 218 persons, age 56 yr, 128M/90F, 97 with previous CVE, participated. With OGTT 28 had DM, 46 IGT, 41 impaired fasting glucose, 103 normal glucose tolerance. SAF alone revealed 23 false positives (FP) and 34 false negatives (FN) (sensitivity (S) 68%; specificity (SP) 86%). With SAF-DM, FP were reduced to 18, FN to 16 (5 with DM) (S 82%; SP 89%). HbA1c scored 48 FP, 18 FN (S 80%; SP 75%). Using HbA1c-defined DM-IGT/suspicion ≥6%/42 mmol/mol, SAF-DM scored 33 FP, 24 FN (4 DM) (S 76%; SP 72%), FPG 29 FP, 41 FN (S 71%; SP 80%). FINDRISC≥10 points as detection of HbA1c-based diabetes/suspicion scored 79 FP, 23 FN (S 69%; SP 45%). CONCLUSION: SAF-DM is superior to FPG and non-inferior to HbA1c to detect diabetes/IGT in intermediate-risk persons. SAF-DM's value for diabetes/IGT screening is further supported by its established performance in predicting diabetic complications.
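
    The S and SP figures quoted above follow the standard definitions of sensitivity and specificity. A small sketch of those definitions is given here; the counts in the usage note are made-up round numbers chosen for readability, not values from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly affected persons that the test flags: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected persons that the test clears: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)
```

    For example, with hypothetical counts of 80 true positives, 20 false negatives, 90 true negatives and 10 false positives, `sensitivity(80, 20)` gives 0.8 and `specificity(90, 10)` gives 0.9. Reducing FN raises sensitivity and reducing FP raises specificity, which is the direction of change reported when moving from SAF alone to SAF-DM.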

  19. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    Science.gov (United States)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on the condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  20. A DATA MINING APPROACH TO PREDICT PROSPECTIVE BUSINESS SECTORS FOR LENDING IN RETAIL BANKING USING DECISION TREE

    Directory of Open Access Journals (Sweden)

    Md. Rafiqul Islam

    2015-03-01

    A potential objective of every financial organization is to retain existing customers and attain new prospective customers for the long term. The economic behaviour of the customer and the nature of the organization are controlled by a prescribed form called Know Your Customer (KYC) in manual banking. Depositor customers in some sectors (business of Jewellery/Gold, Arms, Money exchanger etc.) are high risk, those in some sectors (Transport Operators, Auto-dealer, religious) are medium risk, and those in the remaining sectors (Retail, Corporate, Service, Farmer etc.) are low risk. Presently, credit risk for a counterparty can be broadly categorized under quantitative and qualitative factors. Although many systems exist in banks for customer retention and customer attrition, these rigorous methods lack a clear and defined approach to disbursing loans in the business sector. In this paper, we use records of business customers of a retail commercial bank in a city including rural and urban areas (Tangail city, Bangladesh) to analyse the major transactional determinants of customers and to build a model for predicting prospective sectors in retail banking. To achieve this, a data mining approach is adopted for analysing the challenging issues: a pruned decision tree classification technique is used to develop the model, and its performance is finally tested against Weka results. Moreover, this paper attempts to build up a model to predict prospective business sectors in retail banking. KEYWORDS Data Mining, Decision Tree, Tree Pruning, Prospective Business Sector, Customer,

  1. Using decision trees to predict benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea.

    Science.gov (United States)

    Pesch, Roland; Pehlke, Hendrik; Jerosch, Kerstin; Schröder, Winfried; Schlüter, Michael

    2008-01-01

    In this article a concept is described in order to predict and map the occurrence of benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea. The approach consists of two work steps: (1) geostatistical analysis of abiotic measurement data and (2) calculation of benthic provinces by means of Classification and Regression Trees (CART) and GIS-techniques. From bottom water measurements on salinity, temperature, silicate and nutrients as well as from point data on grain size ranges (0-20, 20-63, 63-2,000 µm) raster maps were calculated by use of geostatistical methods. At first the autocorrelation structure was examined and modelled with the help of variogram analysis. The resulting variogram models were then used to calculate raster maps by applying ordinary kriging procedures. After intersecting these raster maps with point data on eight benthic communities a decision tree was derived to predict the occurrence of these communities within the study area. Since such a CART tree corresponds to a hierarchically ordered set of decision rules it was applied to the geostatistically estimated raster data to predict benthic habitats within and near the EEZ. PMID:17680336
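
    The first geostatistical step mentioned above, examining the autocorrelation structure, usually starts from an empirical semivariogram. The sketch below shows that computation only (model fitting and ordinary kriging are omitted); the function name and the simple equal-width lag binning are assumptions made for illustration, not details taken from the article.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def empirical_semivariogram(points: Sequence[Tuple[float, float]],
                            values: Sequence[float],
                            lag_width: float) -> Dict[int, float]:
    """Return {lag_bin: gamma}, where bin b covers distances
    [b * lag_width, (b + 1) * lag_width) and
    gamma = sum((z_i - z_j)^2) / (2 * number_of_pairs_in_bin).
    """
    sq_diff_sum: Dict[int, float] = defaultdict(float)
    pair_count: Dict[int, int] = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            distance = (dx * dx + dy * dy) ** 0.5
            lag_bin = int(distance // lag_width)
            sq_diff_sum[lag_bin] += (values[i] - values[j]) ** 2
            pair_count[lag_bin] += 1
    return {b: sq_diff_sum[b] / (2 * pair_count[b]) for b in sorted(pair_count)}
```

    A semivariance that rises with lag distance and then levels off indicates spatial autocorrelation up to that range, which is what justifies interpolating the point measurements to a raster before the tree is applied.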

  2. Condition monitoring on grinding wheel wear using wavelet analysis and decision tree C4.5 algorithm

    Directory of Open Access Journals (Sweden)

    S.Devendiran

    2013-10-01

    A new online grinding wheel wear monitoring approach to detect a worn-out wheel is proposed. It is based on acoustic emission (AE) signals processed by the discrete wavelet transform, with statistical features such as root mean square and standard deviation extracted for each wavelet decomposition level and classified using the decision tree C4.5 data mining technique, a tree-based knowledge representation methodology. The methodology was validated with AE signal data obtained from an aluminium oxide 99 A(38A) grinding wheel, a type used in three quarters of grinding operations, under different grinding conditions. The results of this scheme with respect to classification accuracy are discussed.
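
    The feature extraction described above (a discrete wavelet transform followed by root mean square and standard deviation per decomposition level) might look roughly like the following. The abstract does not name the mother wavelet, so the Haar wavelet is used here purely as an assumption because it is easy to write out by hand; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail).

    Samples are processed in pairs; a trailing unpaired sample is dropped.
    """
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[k] + signal[k + 1]) / s for k in pairs]
    detail = [(signal[k] - signal[k + 1]) / s for k in pairs]
    return approx, detail


def rms(x: Sequence[float]) -> float:
    """Root mean square of the coefficients."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def std(x: Sequence[float]) -> float:
    """Population standard deviation of the coefficients."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))


def wavelet_features(signal: Sequence[float], levels: int) -> List[float]:
    """Return [rms_1, std_1, rms_2, std_2, ...] of the detail coefficients.

    The signal needs at least 2**levels samples so that every level has
    at least one detail coefficient.
    """
    features: List[float] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend([rms(detail), std(detail)])
    return features
```

    The resulting feature vector, one per AE recording, is what a classifier such as C4.5 would take as input, with the wheel condition as the class label.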

  3. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time-domain simulation and the process of data mining, which is then implemented online as guidelines for preventive control schemes. An algorithm named Classification and Regression Trees (CART) is used to train the DT, and the key to this approach lies in the accuracy of the DT. This paper proposes contingency oriented DT... of western Danish power system, which is characterized by its large scale wind energy penetration and high proportion of distributed generation (DG). DIgSILENT PowerFactory is adopted for the power system simulation and Salford Predictive Modeler (SPM) is used for data mining.

  4. Determination of fetal state from cardiotocogram using LS-SVM with particle swarm optimization and binary decision tree.

    Science.gov (United States)

    Yılmaz, Ersen; Kılıkçıer, Cağlar

    2013-01-01

    We use a least squares support vector machine (LS-SVM) utilizing a binary decision tree for classification of cardiotocograms to determine the fetal state. The parameters of LS-SVM are optimized by particle swarm optimization. The robustness of the method is examined by running 10-fold cross-validation. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, receiver operating characteristic analysis and cobweb representation are presented in order to analyze and visualize the performance of the method. Experimental results demonstrate that the proposed method achieves a remarkable classification accuracy rate of 91.62%.

  5. Analysis of Human Papillomavirus Using Datamining - Apriori, Decision Tree, and Support Vector Machine (SVM) and its Application Field

    Directory of Open Access Journals (Sweden)

    Cho Younghoon

    2016-01-01

    Human Papillomavirus (HPV) has various types (compared to other viruses) and plays a key role in evoking diverse diseases, especially cervical cancer. In this study, we aim to distinguish the features of HPV types with different degrees of fatality by analyzing their DNA sequences. We used the Decision Tree Algorithm, the Apriori Algorithm, and the Support Vector Machine in our experiment. By analyzing their DNA sequences, we discovered some relationships between certain types of HPV, especially the most fatal types, 16 and 18. Moreover, we concluded that it would be possible for scientists to develop more potent HPV cures by applying these relationships and the features that HPV exhibits.

  6. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain;

    2014-01-01

    This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power......-effective algorithm is adopted in this proposed approach to optimize the trajectory of preventive control. The paper also proposes an importance sampling algorithm on database preparation for efficient DT training for power systems with high penetration of wind power and distributed generation. The performance of the approach is demonstrated on a 400-bus, 200-line operational model of western Danish power system.

  7. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    Directory of Open Access Journals (Sweden)

    Hedayetul Islam Shovon

    2012-08-01

    Improving students’ academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) in a decisive manner. Student evaluation factors such as class quizzes, mid-term and final exams, assignments and lab work are studied. It is recommended that all this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers to reduce the drop-out ratio to a significant level and improve the performance of students. In this paper, we present a hybrid procedure based on the decision tree data mining method and data clustering that enables academicians to predict a student’s GPA, on the basis of which the instructor can take the necessary steps to improve the student’s academic performance.
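
    The clustering half of the hybrid procedure can be illustrated with a plain k-means loop on a single score per student. This is a minimal sketch under simplifying assumptions (one-dimensional data, caller-supplied initial centroids so the result is deterministic); the paper's actual features and initialisation are not given in the abstract.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float],
              centroids: Sequence[float],
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Cluster scalar values with k-means, starting from the given centroids.

    Returns (final_centroids, labels), where labels[i] is the index of the
    centroid nearest to values[i].
    """
    centers = list(centroids)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members;
        # a centroid with no members stays where it is.
        new_centers = []
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_labels == labels and new_centers == centers:
            break  # converged: nothing changed in this iteration
        labels, centers = new_labels, new_centers
    return centers, labels
```

    For instance, `kmeans_1d([55, 60, 58, 90, 95, 92], [50, 100])` separates the hypothetical scores into a lower and a higher group; the cluster label could then serve as one input to a decision tree that predicts the GPA band.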

  8. An analysis and study of decision tree induction operating under adaptive mode to enhance accuracy and uptime in a dataset introduced to spontaneous variation in data attributes

    Directory of Open Access Journals (Sweden)

    Uttam Chauhan

    2011-01-01

    Many methods exist for classifying an unknown dataset. Decision tree induction is one of the well-known methods for classification. The decision tree method operates in two different modes: non-adaptive and adaptive. The non-adaptive mode is applied when the dataset is completely mature and available, or when the dataset is static and there will be no changes in its attributes. However, when the dataset is likely to have changes in its values and attributes, leading to fluctuation (i.e., monthly, quarterly or annually), the decision tree method operating in adaptive mode needs to be applied, because the conventional non-adaptive method fails: it must be applied once again from scratch on the augmented dataset. This makes things expensive in terms of time and space. Sometimes attributes are added to the dataset while the number of records also increases. This paper mainly studies the behavioral aspects of the classification model, particularly when the number of attributes in the dataset increases due to spontaneous changes in the value(s)/attribute(s). Our investigative studies have shown that the accuracy of the decision tree model can be maintained when the number of attributes, including the class, increases in the dataset, which increases the number of records as well. In addition, accuracy can also be maintained when the number of values of the class attribute increases. The adaptive-mode decision tree method reads data instance by instance and incorporates each instance into the model through absorption, updating the model according to the attribute values particular and specific to the instance. As the time required to update a decision tree can be less than that required to induce it from scratch, this eliminates the problem of inducing the decision tree repeatedly from scratch while saving memory and time.

  9. Use of decision trees to value investigation strategies for soil pollution problems

    NARCIS (Netherlands)

    Okx, J.P.; Stein, A.

    2000-01-01

    Remediation of a contaminated site usually requires costly actions, and several clean-up and sampling strategies may have to be compared by those involved in the decision-making process. In this paper several common environmental pollution problems have been addressed by using probabilistic decision

  10. Integration of health services in the care of people living with aids: an approach using a decision tree.

    Science.gov (United States)

    de Medeiros, Leidyanny Barbosa; Trigueiro, Débora Raquel Soares Guedes; da Silva, Daiane Medeiros; do Nascimento, João Agnaldo; Monroe, Aline Aparecida; Nogueira, Jordana de Almeida; Leadebal, Oriana Deyze Correia Paiva

    2016-02-01

    The care offered to people living with HIV/AIDS must transcend specialized outpatient services and include the participation of the Family Health Strategy. Recognizing the importance of integration between these two points in the care network, the study aimed to build a decision support model to assist professionals of specialized health services in identifying behavior patterns in the use of Family Health Strategy services by people living with HIV/AIDS attended in the outpatient clinic. To this end, a decision tree model was proposed, created from a database of 141 people with AIDS who were users of a specialized outpatient clinic. The decision-making variable was the use of Family Health Strategy services, evaluating the integration of care. The model enabled the establishment of 23 rules with an 80.1% hit percentage, which may support the decision-making of professionals in identifying situations in which it is necessary to stimulate the use of the Family Health Strategy by users. PMID:26910161

  11. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

    Directory of Open Access Journals (Sweden)

    Somaya Hashem

    2016-01-01

    Background/Aim. With the worldwide prevalence of chronic hepatitis C, the use of noninvasive methods as an alternative for staging chronic liver diseases, avoiding the drawbacks of biopsy, is increasing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0–F2) or advanced (F3-F4) fibrosis stages. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to the FIB-4 features except that alpha-fetoprotein is used instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C could be predicted with high accuracy using a decision tree learning algorithm, which could be used to reduce the need for liver biopsy.

  12. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using 10-fold cross-validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
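
    The two-threshold tree reported in this abstract is small enough to write out directly. The sketch below encodes only the thresholds stated above; the function and argument names are illustrative, and the code is a reading aid, not a clinical tool.

```python
def classify_febrile_episode(crp_mg_per_l: float, pct_ng_per_ml: float) -> str:
    """Apply the reported C4.5 rule to one CRP/PCT measurement pair.

    CRP <= 19.1 mg/L                      -> viral infection
    CRP >  19.1 mg/L and PCT >  0.65 ng/mL -> bacterial infection
    CRP >  19.1 mg/L and PCT <= 0.65 ng/mL -> PFAPA
    """
    if crp_mg_per_l <= 19.1:
        return "viral infection"
    if pct_ng_per_ml > 0.65:
        return "bacterial infection"
    return "PFAPA"
```

    Note that PCT is consulted only when CRP is elevated, so a low-CRP case is labelled viral regardless of its PCT value.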

  13. Remote Sensing Image Classification Based on Decision Tree in the Karst Rocky Desertification Areas: A Case Study of Kaizuo Township

    Institute of Scientific and Technical Information of China (English)

    Ma, Shuyong; Zhu, Xinglei; An, Yulun

    2014-01-01

    Karst rocky desertification is a form of land degradation that results from the interaction of natural and human factors. In the past, supervised and unsupervised classification were often used to classify remote sensing images of rocky desertification areas. However, they use only pixel brightness characteristics, so the classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a new technology for remote sensing image classification. In this study, we select the rocky desertification area of Kaizuo Township as a case study and use ASTER image data, DEM and lithology data; by extracting the normalized difference vegetation index, ratio vegetation index, terrain slope and other data, we establish classification rules to build decision trees. With the support of the ENVI software, we obtain the classified images. By calculating the classification accuracy and kappa coefficient, we find that better classification results can be obtained, that desertification information can be extracted automatically, and that classification accuracy can be improved further if more remote sensing image bands are used, a higher resolution DEM is employed and fewer data errors are introduced during processing.
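
    The two vegetation indices named above have standard definitions in terms of near-infrared (NIR) and red reflectance, and a rule-based tree then thresholds them together with terrain slope. The sketch below uses the standard index formulas, but the thresholds and class labels are invented for illustration; the abstract does not report the rules actually used.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).

    Assumes nir + red is non-zero.
    """
    return (nir - red) / (nir + red)


def rvi(nir: float, red: float) -> float:
    """Ratio vegetation index: NIR / Red. Assumes red is non-zero."""
    return nir / red


def classify_pixel(nir: float, red: float, slope_deg: float) -> str:
    """Toy decision tree over NDVI and slope (hypothetical thresholds)."""
    v = ndvi(nir, red)
    if v > 0.5:
        return "vegetated"
    if v > 0.2:
        # Sparse vegetation: steeper terrain is treated as more degraded.
        if slope_deg < 25.0:
            return "light rocky desertification"
        return "moderate rocky desertification"
    return "severe rocky desertification"
```

    Because each rule reads one raster layer at a time, additional layers such as lithology can be added as further branches without changing the existing ones.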

  14. Network Traffic Classification Using SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    邱婧; 夏靖波; 柏骏

    2012-01-01

    To address the unrecognizable regions and long training times that arise when the Support Vector Machine (SVM) method is used for network traffic classification, an SVM decision tree was applied, taking advantage of its strengths in multi-class classification. Authoritative flow data sets were tested. The experimental results show that the SVM decision tree method has a shorter training time and better classification performance than the ordinary "one-versus-one" and "one-versus-rest" SVM methods in network traffic classification, with a classification accuracy of up to 98.8%.

  15. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    Science.gov (United States)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  16. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Directory of Open Access Journals (Sweden)

    Barbara Kraszewska-Głomba

    2016-03-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90% respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context.

  17. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30), the rule's overall accuracy was 76.4% and 90% respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
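    The two-threshold rule reported in this abstract can be written out as a short function. This is a minimal sketch for illustration, not the authors' code and not a clinical tool: the function and parameter names are assumptions, while the thresholds (CRP 19.1 mg/L, PCT 0.65 ng/mL) are taken from the abstract.

```python
def classify_febrile_episode(crp_mg_per_l: float, pct_ng_per_ml: float) -> str:
    """Apply the decision tree reported in the abstract.

    The root node splits on CRP; only cases with CRP > 19.1 mg/L
    reach the second split on PCT.
    """
    if crp_mg_per_l <= 19.1:
        return "viral infection"
    if pct_ng_per_ml > 0.65:
        return "bacterial infection"
    return "PFAPA"
```

    For example, a hypothetical case with CRP of 50 mg/L and PCT of 0.3 ng/mL falls in the PFAPA leaf, while any case with CRP at or below 19.1 mg/L is labelled viral regardless of PCT.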

  18. Extracting impervious surfaces from multi-source satellite imagery based on unified conceptual model by decision tree algorithm

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Extraction of impervious surfaces is one of the necessary processes in urban change detection. This paper derived a unified conceptual model (UCM) from the vegetation-impervious surface-soil (VIS) model to make the extraction more effective and accurate. UCM uses the decision tree algorithm with indices of spectrum, texture, etc. In this model, we found both dependent and independent indices for multi-source satellite imagery according to their similarity and dissimilarity. The purpose of the indices is to remove the other land-use and land-cover types (e.g., vegetation and soil) from the imagery and delineate the impervious surfaces as the result. UCM follows the same steps, carried out by the decision tree algorithm. The Landsat-5 TM image (30 m) and the Satellite Probatoire d’Observation de la Terre (SPOT-4) image (20 m) from Chaoyang District (Beijing) in 2007 were used in this paper. The results show that the overall accuracy is 88% for the Landsat-5 TM image and 86.75% for the SPOT-4 image. It is an appropriate method to meet the demand of urban change detection.

  19. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    Directory of Open Access Journals (Sweden)

    Wided Khiari

    2013-09-01

    This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosed scores to examine corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 has been carried out and a disclosure index developed to determine the level of disclosure of the companies. The disclosure quality is appreciated through the quantity and also through the nature (type) of information disclosed. Applying the decision tree method, the obtained Tree diagrams provide ways to know the characteristics of a particular firm regardless of its level of disclosure. Obtained results show that the characteristics of corporate governance to achieve good quality of disclosure are not unique for all firms. These structures are not necessarily all of the recommendations of best practices, but converge towards the best combination. Indeed, in practice, there are companies which have a good quality of disclosure but are not well governed. However, we hope that by improving their governance system their level of disclosure may be better. These findings show, in a general way, a convergence towards the standards of corporate governance with a few exceptions related to the specificity of Tunisian listed firms and show the need for the adoption of a code for each context. These findings shed light on corporate governance features that enhance incentives for good disclosure. They allow identifying, for each firm and at any date, corporate governance determinants of disclosure quality. More specifically, and all else being equal, the obtained tree provides a decision rule for the company to know the level of disclosure based on certain characteristics of the governance strategy adopted by the latter.

  20. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    Science.gov (United States)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for a refined solar energy plant potential estimation on roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information of buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid increasing attention to the production of green electricity. The return on investment depends on the statutory price per Watt, the initial costs of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation received and on the size of the solar energy plant. In this context the exposition and slope of the roof area are as important as building parts like chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support a beneficial deployment of a solar energy plant. Sufficient data also has to be available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas. Because they carry no semantic information, even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities are on the way to providing CityGML datasets, e.g. Berlin. Here we present a decision tree that incorporates geometric as well as semantic demands for a refined estimation of the solar energy potential on roof areas. Based on CityGML's attribute lists we consider geometries of roofs and roof installations as well as global radiation which can be derived e. g. from the European Solar

  1. Decisions for others become less impulsive the further away they are on the family tree.

    Directory of Open Access Journals (Sweden)

    Fenja V Ziegler

    BACKGROUND: People tend to prefer a smaller immediate reward to a larger but delayed reward. Although this discounting of future rewards is often associated with impulsivity, it is not necessarily irrational. Instead it has been suggested that it reflects the decision maker's greater interest in the 'me now' than the 'me in 10 years', such that the concern for our future self is about the same as for someone else who is close to us. METHODOLOGY/PRINCIPAL FINDINGS: To investigate this we used a delay-discounting task to compare discount functions for choices that people would make for themselves against decisions that they think that other people should make, e.g. to accept $500 now or $1000 next week. The psychological distance of the hypothetical beneficiaries was manipulated in terms of the genetic coefficient of relatedness ranging from zero (e.g. a stranger, or unrelated close friend), .125 (e.g. a cousin), .25 (e.g. a nephew or niece), to .5 (parent or sibling). CONCLUSIONS/SIGNIFICANCE: The observed discount functions were steeper (i.e. more impulsive) for choices in which the decision-maker was the beneficiary than for all other beneficiaries. Impulsiveness of decisions declined systematically with the distance of the beneficiary from the decision-maker. The data are discussed with reference to the impulsivity and interpersonal empathy gaps in decision-making.

  2. Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model

    Directory of Open Access Journals (Sweden)

    Takada Masahiro

    2012-06-01

    Background: The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method, the alternating decision tree (ADTree). Methods: Clinical datasets for primary breast cancer patients who underwent sentinel lymph node biopsy or AxLN dissection without prior treatment were collected from three institutes (institute A, n = 148; institute B, n = 143; institute C, n = 174) and were used for variable selection, model training and external validation, respectively. The models were evaluated using area under the receiver operating characteristics (ROC) curve analysis to discriminate node-positive patients from node-negative patients. Results: The ADTree model selected 15 of 24 clinicopathological variables in the variable selection dataset. The resulting area under the ROC curve values were 0.770 [95% confidence interval (CI), 0.689–0.850] for the model training dataset and 0.772 (95% CI: 0.689–0.856) for the validation dataset, demonstrating high accuracy and generalization ability of the model. The bootstrap value of the validation dataset was 0.768 (95% CI: 0.763–0.774). Conclusions: Our prediction model showed high accuracy for predicting nodal metastasis in patients with breast cancer using commonly recorded clinical variables. Therefore, our model might help oncologists in the decision-making process for primary breast cancer patients before starting treatment.

  3. Application of decision trees to the analysis of soil radon data for earthquake prediction.

    Science.gov (United States)

    Zmazek, B; Todorovski, L; Dzeroski, S; Vaupotic, J; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.

  4. Application of decision trees to the analysis of soil radon data for earthquake prediction

    Energy Technology Data Exchange (ETDEWEB)

    Zmazek, B. E-mail: boris.zmazek@ijs.si; Todorovski, L.; Dzeroski, S.; Vaupotic, J.; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.

  5. Real-time Container Transport Planning with Decision Trees based on Offline Obtained Optimal Solutions

    NARCIS (Netherlands)

    B. van Riessen (Bart); R.R. Negenborn (Rudy); R. Dekker (Rommert)

    2016-01-01

    Hinterland networks for container transportation require planning methods in order to increase efficiency and reliability of the inland road, rail and waterway connections. In this paper we aim to derive real-time decision rules for suitable allocations of containers to inland services b

  6. Comparison between SARS CoV and MERS CoV Using Apriori Algorithm, Decision Tree, SVM

    Directory of Open Access Journals (Sweden)

    Jang Seongpil

    2016-01-01

    MERS (Middle East Respiratory Syndrome) is currently a worldwide disease. The number of infected people is 1038 (08/03/2015) in Saudi Arabia and 186 (08/03/2015) in South Korea. MERS has spread across the world, including Europe, East Asia and the Middle East, and the fatality rate is 38.8%. MERS is also known as a cousin of SARS (Severe Acute Respiratory Syndrome) because both diseases show similar symptoms such as high fever and difficulty in breathing. This is why we compared MERS with SARS. We used spike glycoprotein data from NCBI. To analyze the protein, the apriori algorithm, a decision tree and SVM were used; in particular, SVM was run with normal, polynomial, and sigmoid kernels. The results show that MERS and SARS are alike but also differ in some ways.

  7. FPGA-Based Network Traffic Security: Design and Implementation Using C5.0 Decision Tree Classifier

    Institute of Scientific and Technical Information of China (English)

    Tarek Salah Sobh; Mohamed Ibrahiem Amer

    2013-01-01

    In this work, a hardware intrusion detection system (IDS) model and its implementation are introduced to perform online real-time traffic monitoring and analysis. The introduced system combines the advantages of several IDS types: it is hardware based in terms of implementation, network based in terms of system type, and anomaly based in terms of detection approach. In addition, in terms of detection behavior it can detect most network attacks, such as denial of service (DoS), leakage, etc., and in terms of intruder type it can detect both internal and external intruders. Combining these features in one IDS gives the work many strengths. The system is implemented using a field programmable gate array (FPGA), which gives it further advantages. A C5.0 decision tree classifier is used as the system's inference engine and gives a high detection ratio of 99.93%.

  8. Nitrogen removal influence factors in A/O process and decision trees for nitrification/denitrification system

    Institute of Scientific and Technical Information of China (English)

    MA Yong; PENG Yong-zhen; WANG Shu-ying; WANG Xiao-lian

    2004-01-01

    In order to improve nitrogen removal in the anoxic/oxic (A/O) process for treating domestic wastewater, the following influence factors were studied: DO (dissolved oxygen), nitrate recirculation, sludge recycle, SRT (solids residence time), influent COD/TN and HRT (hydraulic retention time). Results indicated that it was possible to increase nitrogen removal by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervision of the nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.

  9. An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

    Science.gov (United States)

    Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

    2014-03-01

    Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, a new feature extraction method based on boundary analysis in phase space is proposed for diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset. PMID:24296116

  10. A decision tree-based on-line preventive control strategy for power system transient instability prevention

    Science.gov (United States)

    Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po

    2014-02-01

    Maintaining transient stability is a basic requirement for secure power system operations. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention of power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in control mechanism, on-line computation compatible and convenient to deal with multi-contingency. The effectiveness and efficiency of the method has been verified on New England 10-machine 39-bus test system.

  11. Decision tree algorithm based on classification matrix

    Institute of Scientific and Technical Information of China (English)

    陶道强; 马良荔; 彭超

    2012-01-01

    To improve the classification speed and accuracy of the decision tree algorithm, a new scheme is proposed based on a classification matrix. First, the basic theory of the ID3 algorithm is introduced and a classification matrix is defined. The variety bias of the ID3 algorithm is then pointed out and proved using the classification matrix. On this basis, a weighting factor is introduced to suppress the variety bias of the ID3 algorithm, with a corresponding proof given using the classification matrix. According to the characteristics of the gain based on the classification matrix, a new decision tree scheme is proposed, aiming to optimize computing speed. Finally, the scheme is compared with the ID3 algorithm through experiments. The experimental results show that the optimized scheme performs clearly better than the original one.
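    The bias of ID3 mentioned in this abstract can be illustrated with plain information gain, which tends to favour attributes that take many distinct values. The sketch below is an assumption-laden illustration: it uses the textbook entropy-based gain, not the paper's classification matrix or weighting factor (neither is specified in the abstract), and the toy data and names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 information gain of splitting `labels` by the attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: six records with a binary class label.
labels = ["yes", "yes", "no", "no", "yes", "no"]
record_id = [1, 2, 3, 4, 5, 6]  # unique per record, no real predictive value
weather = ["sun", "sun", "rain", "rain", "sun", "sun"]

gain_id = information_gain(record_id, labels)
gain_weather = information_gain(weather, labels)
```

    Because every value of `record_id` isolates a single record, its gain equals the full entropy of the labels, so ID3 would prefer it over `weather` even though it cannot generalize. A weighting factor such as the one the abstract describes is one way to counteract this preference.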

  12. Irrelevant variability normalization in learning HMM state tying from data based on phonetic decision-tree

    OpenAIRE

    Huo, Q.; Ma, B.

    1999-01-01

    We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because of the problems of a diversified training data set and/or possible acoustic mismatches between training and testing conditions, the structure learned from the training data by using a maximum likelihood training method will not necessarily generalize well on mismatched tasks. We apply the above concept to the structural learning problem of phonetic decision-t...

  13. Decision tree learning for detecting turning points in business process orientation: a case of Croatian companies

    Directory of Open Access Journals (Sweden)

    Ljubica Milanović Glavan

    2015-03-01

    Companies worldwide are embracing Business Process Orientation (BPO) in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods in detecting key turning points in BPO. Based on survey results obtained in 2013, the selected data mining technique of classification and regression trees (C&RT) was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.

  14. Analytical solutions of linked fault tree probabilistic risk assessments using binary decision diagrams with emphasis on nuclear safety applications

    International Nuclear Information System (INIS)

    This study is concerned with the quantification of Probabilistic Risk Assessment (PRA) using linked Fault Tree (FT) models. Probabilistic Risk assessment (PRA) of Nuclear Power Plants (NPPs) complements traditional deterministic analysis; it is widely recognized as a comprehensive and structured approach to identify accident scenarios and to derive numerical estimates of the associated risk levels. PRA models as found in the nuclear industry have evolved rapidly. Increasingly, they have been broadly applied to support numerous applications on various operational and regulatory matters. Regulatory bodies in many countries require that a PRA be performed for licensing purposes. PRA has reached the point where it can considerably influence the design and operation of nuclear power plants. However, most of the tools available for quantifying large PRA models are unable to produce analytically correct results. The algorithms of such quantifiers are designed to neglect sequences when their likelihood decreases below a predefined cutoff limit. In addition, the rare event approximation (e.g. Moivre's equation) is typically implemented for the first order, ignoring the success paths and the possibility that two or more events can occur simultaneously. This is only justified in assessments where the probabilities of the basic events are low. When the events in question are failures, the first order rare event approximation is always conservative, resulting in wrong interpretation of risk importance measures. Advanced NPP PRA models typically include human errors, common cause failure groups, seismic and phenomenological basic events, where the failure probabilities may approach unity, leading to questionable results. It is accepted that current quantification tools have reached their limits, and that new quantification techniques should be investigated. 
    A novel approach using the mathematical concept of Binary Decision Diagram (BDD) is proposed to overcome these deficiencies
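    The concern about the first-order rare event approximation can be seen in a small numerical comparison. The sketch below is a simplified illustration under stated assumptions: it treats a single OR gate over independent basic events rather than a full linked fault tree or a BDD, and the function names and probabilities are hypothetical.

```python
from math import prod


def exact_union(probabilities):
    """Exact probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def rare_event_approximation(probabilities):
    """First-order rare event approximation: the sum of the event probabilities."""
    return sum(probabilities)


low = [1e-3, 2e-3]   # typical small failure probabilities
high = [0.6, 0.7]    # e.g. seismic or human-error events approaching unity

low_exact, low_approx = exact_union(low), rare_event_approximation(low)
high_exact, high_approx = exact_union(high), rare_event_approximation(high)
```

    For the small probabilities the two values agree closely, with the approximation slightly conservative. For the large probabilities the approximation exceeds 1, which is not a valid probability, while the exact result stays at 0.88. An exact quantification method such as a BDD avoids this kind of overestimate.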

  15. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Science.gov (United States)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    The technologies used in traumatology today are a combination of mechanical, electronic, computing and programming tools. The development of mobile applications for the rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is becoming increasingly relevant. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  16. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Energy Technology Data Exchange (ETDEWEB)

    Kupriyanov, M. S., E-mail: mikhail.kupriyanov@gmail.com; Shukeilo, E. Y., E-mail: eyshukeylo@gmail.com; Shichkina, J. A., E-mail: strange.y@mail.ru [Saint Petersburg Electrotechnical University “LETI” (Russian Federation)

    2015-11-17

    The technologies used in traumatology today are a combination of mechanical, electronic, computing and programming tools. The development of mobile applications for the rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is becoming increasingly relevant. This article considers the use of a mathematical method of building decision trees to assess a patient’s health condition using data from a wearable device.

  17. An Evaluation of Different Training Sample Allocation Schemes for Discrete and Continuous Land Cover Classification Using Decision Tree-Based Algorithms

    Directory of Open Access Journals (Sweden)

    René Roland Colditz

    2015-07-01

    Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for discrete and continuous land cover mapping using classification and regression tree algorithms. In addition to measures of discrete and continuous map accuracy, the correct estimation of the area is another important criterion. A subset of the 30 m national land cover dataset of 2006 (NLCD2006) of the United States was used as reference set to classify NADIR BRDF-adjusted surface reflectance time series of MODIS at 900 m spatial resolution. Results show that sampling of heterogeneous pixels and sample allocation according to the expected area of each class is best for classification trees. Regression trees for continuous land cover mapping should be trained with random allocation, and predictions should be normalized with a linear scaling function to correctly estimate the total area. From the tested algorithms random forest classification yields lower errors than boosted trees of C5.0, and Cubist shows higher accuracies than random forest regression.

  18. CLOUD DETECTION BASED ON DECISION TREE OVER TIBETAN PLATEAU WITH MODIS DATA

    OpenAIRE

    Xu, L.; Fang, S; Niu, R.; Li, J

    2012-01-01

    Snow cover area is a critical parameter for the hydrologic cycle of the Earth. Furthermore, it is a key factor for the effect of climate change. An unavoidable problem in mapping snow cover is the existence of clouds. Clouds can easily be found in any satellite image, because clouds are bright and white in the visible wavelengths. But this is not the case when there is snow or ice in the background, because snow and clouds have a similar spectral appearance. Many cloud decision meth...

  19. Forest or the trees: At what scale do elephants make foraging decisions?

    Science.gov (United States)

    Shrader, Adrian M.; Bell, Caroline; Bertolli, Liandra; Ward, David

    2012-07-01

    For herbivores, food is distributed spatially in a hierarchical manner ranging from plant parts to regions. Ultimately, utilisation of food is dependent on the scale at which herbivores make foraging decisions. A key factor that influences these decisions is body size, because selection inversely relates to body size. As a result, large animals can be less selective than small herbivores. Savanna elephants (Loxodonta africana) are the largest terrestrial herbivore. Thus, they represent a potential extreme with respect to unselective feeding. However, several studies have indicated that elephants prefer specific habitats and certain woody plant species. Thus, it is unclear at which scale elephants focus their foraging decisions. To determine this, we recorded the seasonal selection of habitats and woody plant species by elephants in the Ithala Game Reserve, South Africa. We expected that during the wet season, when both food quality and availability were high, elephants would select primarily for habitats. This, however, does not mean that they would utilise plant species within these habitats in proportion to availability, but rather would show a stronger selection for habitats compared to plants. In contrast, during the dry season when food quality and availability declined, we expected that elephants would shift and select for the remaining high quality woody species across all habitats. Consistent with our predictions, elephants selected for the larger spatial scale (i.e. habitats) during the wet season. However, elephants did not increase their selection of woody species during the dry season, but rather increased their selection of habitats relative to woody plant selection. Unlike a number of earlier studies, we found that neither palatability (i.e. crude protein, digestibility, and energy) alone nor tannin concentrations had a significant effect for determining the elephants' selection of woody species. However, the palatability:tannin ratio was

  20. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
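    The smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 0/1 score encoding, the window length `history`, and the 0.5 threshold are all assumptions made for the example.

```python
def smooth_decisions(scores, history=2):
    """Average each segment's score with up to `history` past scores.

    `scores` holds one value per segment (assumed encoding: 1.0 = speech,
    0.0 = music). The averaged value is thresholded at 0.5 (an assumption).
    """
    labels = []
    for i in range(len(scores)):
        # Current segment plus at most `history` preceding segments.
        window = scores[max(0, i - history): i + 1]
        mean = sum(window) / len(window)
        labels.append("speech" if mean >= 0.5 else "music")
    return labels


# Hypothetical per-segment decisions with one isolated glitch at index 3.
scores = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
smoothed = smooth_decisions(scores, history=2)
print(smoothed)
```

    With these toy inputs the isolated "music" decision at index 3 is suppressed, at the cost of a one-segment delay before the later switch to music is reported.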

  1. Fuzzy Decision Trees with Possibility Distributions as Output

    Institute of Scientific and Technical Information of China (English)

    袁修久; 张文修

    2003-01-01

    More than one possible classification is assumed for a given instance. A possibility distribution is assigned to each terminal node of a fuzzy decision tree. The possibility distribution of a given instance with known attribute values is determined by simple fuzzy reasoning. This reduces the inconsistency that arises when a single class must be determined for a given instance.

  2. Nosocomial infections in Brazilian pediatric patients: using a decision tree to identify high mortality groups.

    Science.gov (United States)

    Lopes, Julia M M; Goulart, Eugenio M A; Siqueira, Arminda L; Fonseca, Inara K; Brito, Marcus V S de; Starling, Carlos E F

    2009-04-01

    Nosocomial infections (NI) are frequent events with potentially lethal outcomes. We identified predictive factors for mortality related to NI and developed an algorithm for predicting that risk in order to improve hospital epidemiology and healthcare quality programs. We made a prospective cohort NI surveillance of all acute-care patients according to the National Nosocomial Infections Surveillance System guidelines since 1992, applying the Centers for Disease Control and Prevention 1988 definitions adapted to a Brazilian pediatric hospital. Thirty-eight deaths considered to be related to NI were analyzed as the outcome variable for 754 patients with NI, whose survival time was taken into consideration. The predictive factors for mortality related to NI (p < 0.05 in the Cox regression model) were: invasive procedures and use of two or more antibiotics. The mean survival time was significantly shorter (p < 0.05 with the Kaplan-Meier method) for patients who suffered invasive procedures and for those who received two or more antibiotics. Applying a tree-structured survival analysis (TSSA), two groups with high mortality rates were identified: one group with time from admission to the first NI less than 11 days, received two or more antibiotics and suffered invasive procedures; the other group had the first NI between 12 and 22 days after admission and was subjected to invasive procedures. The possible modifiable factors to prevent mortality involve invasive devices and antibiotics. The TSSA approach is helpful to identify combinations of predictors and to guide protective actions to be taken in continuous-quality-improvement programs. PMID:20140354

  3. An Improved ID3 Decision Tree Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    潘大胜; 屈迟文

    2016-01-01

    The problems of the classic ID3 decision tree mining algorithm are analyzed, its entropy calculation process is improved, and an improved ID3 decision tree mining algorithm is built. The entropy calculation used in decision tree construction is redesigned in order to obtain globally optimal mining results. Mining experiments are carried out on six data sets from the UCI collection. The experimental results show that the improved algorithm is clearly better than the ID3 decision tree mining algorithm in both the compactness of the constructed tree and the mining accuracy.
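    The record does not say how the entropy calculation was modified, so the sketch below shows only the classic ID3 criterion that such work starts from: Shannon entropy of the class labels and the information gain of a categorical split. The toy data and attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Classic ID3 choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data set: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(best_attribute(rows, labels, ["outlook", "windy"]))
```

    ID3 applies this choice recursively to each subset until the labels in a node are pure or no attributes remain.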

  4. Nosocomial infections in brazilian pediatric patients: using a decision tree to identify high mortality groups

    Directory of Open Access Journals (Sweden)

    Julia M.M. Lopes

    2009-04-01

    Full Text Available Nosocomial infections (NI) are frequent events with potentially lethal outcomes. We identified predictive factors for mortality related to NI and developed an algorithm for predicting that risk in order to improve hospital epidemiology and healthcare quality programs. We made a prospective cohort NI surveillance of all acute-care patients according to the National Nosocomial Infections Surveillance System guidelines since 1992, applying the Centers for Disease Control and Prevention 1988 definitions adapted to a Brazilian pediatric hospital. Thirty-eight deaths considered to be related to NI were analyzed as the outcome variable for 754 patients with NI, whose survival time was taken into consideration. The predictive factors for mortality related to NI (p < 0.05 in the Cox regression model) were: invasive procedures and use of two or more antibiotics. The mean survival time was significantly shorter (p < 0.05 with the Kaplan-Meier method) for patients who suffered invasive procedures and for those who received two or more antibiotics. Applying a tree-structured survival analysis (TSSA), two groups with high mortality rates were identified: one group with time from admission to the first NI less than 11 days, received two or more antibiotics and suffered invasive procedures; the other group had the first NI between 12 and 22 days after admission and was subjected to invasive procedures. The possible modifiable factors to prevent mortality involve invasive devices and antibiotics. The TSSA approach is helpful to identify combinations of predictors and to guide protective actions to be taken in continuous-quality-improvement programs.

  5. Decision-tree analysis of clinical data to aid diagnostic reasoning for equine laminitis: a cross-sectional study.

    Science.gov (United States)

    Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R

    2016-04-23

    The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all Pdiscriminator; 92 per cent of animals with this clinical sign had laminitis (OR 40.5, Pdiscrimination (OR 15.5, P<0.001). This is the first epidemiological laminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements. PMID:26969668

  6. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs

  7. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. Khader

    2012-12-01

    Full Text Available Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system.
At current
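    The expected-cost and VOI calculation described in this record can be sketched as below. All probabilities and costs are hypothetical placeholders, not values from the study, and the sign convention (a positive VOI means monitoring lowers the expected cost) is an assumption of this example.

```python
def expected_cost(outcomes):
    """Weighted sum of outcome costs; `outcomes` is a list of (probability, cost)."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * c for p, c in outcomes)


def value_of_information(uninformed, informed):
    """Lowest uninformed expected cost minus the informed expected cost."""
    best_uninformed = min(expected_cost(o) for o in uninformed.values())
    return best_uninformed - expected_cost(informed)


# Hypothetical (probability, cost) pairs for each alternative.
uninformed = {
    "ignore_risk": [(0.25, 400.0), (0.75, 0.0)],  # illness cost vs. no cost
    "bottled_water": [(1.0, 120.0)],              # certain purchase cost
}
monitoring = [(0.25, 160.0), (0.75, 40.0)]        # includes network cost

voi = value_of_information(uninformed, monitoring)
print(voi)
```

    In a full decision tree each outcome would itself branch on further uncertain events (reported concentration, compliance, illness); the same weighted sum is then applied from the leaves back to the root.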

  8. Boosting foundations and algorithms

    CERN Document Server

    Schapire, Robert E

    2012-01-01

    Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate "rules of thumb." A remarkably rich theory has evolved around boosting, with connections to a range of topics, including statistics, game theory, convex optimization, and information geometry. Boosting algorithms have also enjoyed practical success in such fields as biology, vision, and speech processing. At various times in its history, boosting has been perceived as mysterious, controversial, even paradoxical.
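    The idea of combining weak "rules of thumb" can be illustrated with AdaBoost, one widely known boosting algorithm, using one-dimensional threshold stumps as the weak learners. This is a minimal sketch on hypothetical toy data; the record itself does not single out a particular algorithm, and the function names are chosen for this example only.

```python
from math import exp, log


def stump_predict(x, threshold, polarity):
    """Weak rule: predict `polarity` at or above the threshold, else its opposite."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting the examples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * log((1 - error) / error)     # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [
            w * exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy labels (+ + - - + +) that no single stump can fit.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

    No single stump classifies more than four of the six points correctly, yet the weighted vote of three stumps fits all of them.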

  9. Predicting future trends in stock market by decision tree rough-set based hybrid system with HHMM

    Directory of Open Access Journals (Sweden)

    Shweta Tiwari

    2012-06-01

    Full Text Available Around the world, trading in the stock market has gained huge attractiveness as a means through which one can obtain vast profits. Attempting to profitably and precisely predict the financial market has long engrossed the interests and attention of bankers, economists and scientists alike. Stock market prediction is the act of trying to determine the future value of a company’s stock or other financial instrument traded on a financial exchange. Accurate stock market predictions are important for many reasons. Chief among them is the need for investors to hedge against potential market risks, and the opportunities for arbitrators and speculators to make profits by trading indexes. The stock market is a place where shares are issued and traded. These shares are traded either through stock exchanges or over the counter, in physical or electronic form. Data mining, as a process of discovering useful patterns and correlations, has its own role in financial modeling. Data mining is a discipline in computational intelligence that deals with knowledge discovery, data analysis, and fully and semi-autonomous decision making. Prediction of the stock market by data mining techniques has been receiving a lot of attention recently. This paper presents a hybrid system based on decision trees and rough sets, combined with a Hierarchical Hidden Markov Model, for predicting trends in the Bombay Stock Exchange (BSE SENSEX). In this paper we present future trends on the basis of price earnings and dividends. The data on accounting earnings, when averaged over many years, help to predict the present value of future dividends.

  10. Method for Walking Gait Identification in a Lower Extremity Exoskeleton based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Qing Guo

    2015-04-01

    Full Text Available A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors’ information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person’s motion.

  11. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    Science.gov (United States)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring through informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al, 2006). At this field scale, previous classifications of agricultural land in Tanzania using MODIS coarse resolution data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used in the study, with representative training areas collected for agriculture and no agriculture using appropriate indices to separate these classes (Hansen et al, 2013). Validation was done using random samples and high resolution satellite images to compare agriculture and no agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security, market price, and inform agricultural policy.

  12. New energy opinion leaders' lifestyles and media usage - applying data mining decision tree analysis for UNIDO - ICHET web site users

    International Nuclear Information System (INIS)

    According to the innovation diffusion research, the innovators, opinion leaders, and diffusion agents play vital roles in promoting the acceptance of innovation. The innovators and opinion leaders must be able to cope with the high degree of uncertainty about an innovation and usually they have higher innovation-related media usage than the majority. Based on consumer behavior studies, lifestyle analysis could help researchers divide consumers into different lifestyle groups to understand and predict consumer behaviors. Lifestyle allows researchers to investigate consumers via their activities, interests and opinions instead of using demographic variables. The purpose of this research is to investigate how new energy innovators and opinion leaders' different lifestyles affect their new energy product adoption, and their media usage regarding new energy reports or promotion. In order to achieve the purposes listed above, the researchers need to locate and contact the potential innovators and opinion leaders in this field. Thus the researchers cooperate with UNIDO-ICHET to launch this survey. This cross-discipline online survey was formally launched from Aug 2005 to Oct 2006. The result of this survey successfully collected 2040 new energy innovators and opinion leaders' information. The researchers analyzed the data using SPSS statistics software and Data Mining decision tree analysis. Then the researchers divided new energy innovators into four groups: social-oriented, young modern, conservative, and show-off-oriented. They also analyzed which lifestyle groups are better targets for innovation agencies to launch innovation-related promotions or campaigns

  13. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India.

    Science.gov (United States)

    Senthil Kumar, A R; Goyal, Manish Kumar; Ojha, C S P; Singh, R D; Swamee, P K

    2013-01-01

    The prediction of streamflow is required in many activities associated with the planning and operation of the components of a water resources system. Soft computing techniques have proven to be an efficient alternative to traditional methods for modelling qualitative and quantitative water resource variables such as streamflow, etc. The focus of this paper is to present the development of models using multiple linear regression (MLR), artificial neural network (ANN), fuzzy logic and decision tree algorithms such as M5 and REPTree for predicting the streamflow at Kasol located at the upstream of Bhakra reservoir in Sutlej basin in northern India. The input vector to the various models using different algorithms was derived considering statistical properties such as auto-correlation function, partial auto-correlation and cross-correlation function of the time series. It was found that the REPTree model performed well compared to the other techniques investigated in this study (MLR, ANN, fuzzy logic, and M5P), and the results of the REPTree model indicate that the entire range of streamflow values was simulated fairly well. The performance of the naïve persistence model was compared with other models and the requirement of the development of the naïve persistence model was also analysed by persistence index.

  14. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining that determines the attitude of people about a particular product, topic, or politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions could be in different languages (English, Urdu, Arabic, etc.). To tackle each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions, as labeled examples. A testing data set is supplied to three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.

  15. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    Science.gov (United States)

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), ratio of median particle diameter to hydraulic radius (d/R) and volumetric sediment concentration (C_V) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R² = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers. PMID:27386995

  16. Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System

    Directory of Open Access Journals (Sweden)

    Jashan Koshal

    2012-08-01

    Full Text Available The main reason attacks are introduced into systems is the popularity of the internet. Information security has now become a vital subject. Hence, there is an immediate need to recognize and detect attacks. Intrusion detection is defined as a method of diagnosing attacks and signs of malicious activity in a computer network by evaluating the system continuously. The software that performs this task is called an Intrusion Detection System (IDS). Systems developed with individual algorithms such as classification, neural networks, clustering, etc. give a good detection rate and a low false alarm rate. Recent studies show that cascading multiple algorithms yields much better performance than a system developed with a single algorithm. In intrusion detection systems that use a single algorithm, the accuracy and detection rate were not up to the mark, and a rise in the false alarm rate was also encountered. Cascading of algorithms is performed to solve this problem. This paper presents two hybrid algorithms for developing the intrusion detection system. C4.5 decision tree and Support Vector Machine (SVM) are combined to maximize the accuracy, which is the advantage of C4.5, and to diminish the false alarm rate, which is the advantage of SVM. Results show an increase in the accuracy and detection rate and a lower false alarm rate.

  17. The Application of TreeAge Pro Software in Medicine & Health Decision Analysis

    Institute of Scientific and Technical Information of China (English)

    李倩; 马爱霞

    2014-01-01

    TreeAge Pro software is widely used in the field of medical decision making. Most of the decision analysis literature that applies decision tree models uses this software. However, there are very few introductory articles about the software, which hinders its application. This article gives a basic introduction to the software in order to meet the needs of beginners and make it easier for potential users to adopt.

  18. MALDI-TOF MS Combined With Magnetic Beads for Detecting Serum Protein Biomarkers and Establishment of Boosting Decision Tree Model for Diagnosis of Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Chibo Liu, Chunqin Pan, Jianmin Shen, Haibao Wang, Liang Yong

    2011-01-01

    Full Text Available The aim of the present study is to study the serum protein fingerprint of patients with colorectal cancer (CRC) and to screen protein molecules that are closely related to colorectal cancer during the onset and progression of the disease with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Serum samples from 144 patients with CRC and 120 healthy volunteers were used in the present study. Weak cation exchange (WCX) magnetic beads and a PBSII-C protein chip reader (Ciphergen Biosystems Inc.) were used. The protein fingerprint expression of all the serum samples and the resulting profiles of the cancer and normal groups were analyzed with the Biomarker Wizard system. Several proteomic peaks were detected and four potential biomarkers with different expression profiles were identified, with relative molecular weights of 2870.7Da, 3084Da, 9180.5Da, and 13748.8Da, respectively. Among the four proteins, two proteins with m/z 2870.7 and 3084 were down-regulated, and the other two with m/z 9180.5 and 13748.8 were up-regulated in serum samples from CRC patients. The present diagnostic model could distinguish CRC from healthy controls with a sensitivity of 92.85% and a specificity of 91.25%. Blind test data indicated a sensitivity of 86.95% and a specificity of 85%. The result suggested that MALDI technology could be used to screen critical proteins with differential expression in the serum of CRC patients. These differentially regulated proteins were considered potential serum biomarkers for patients with CRC and of potential value for further investigation.

  19. The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

    Science.gov (United States)

    Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

    2014-09-01

    This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation.

  20. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    OpenAIRE

    R. Bou Kheir; P. K. Bøcher; M. B. Greve; M. H. Greve

    2010-01-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow directio...

  1. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and to focus on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of a small sample size. Methods All 274 patients (137 with type 2 diabetes mellitus and diabetic peripheral neuropathy and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables such as sex, age, glycosylated hemoglobin, etc. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 times 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied by the best subset method with the Akaike Information Criterion (AIC) to make the best use of information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results The AUC of DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. Since a larger AUC indicates a higher diagnostic ability of a prediction model, MLP performed optimally, and then

  2. Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness.

    Directory of Open Access Journals (Sweden)

    Lukas Tanner

    Full Text Available BACKGROUND: Dengue is re-emerging throughout the tropical world, causing frequent recurrent epidemics. The initial clinical manifestation of dengue often is confused with other febrile states, confounding both clinical management and disease surveillance. Evidence-based triage strategies that identify individuals likely to be in the early stages of dengue illness can direct patient stratification for clinical investigations, management, and virological surveillance. Here we report the identification of algorithms that differentiate dengue from other febrile illnesses in the primary care setting and predict severe disease in adults. METHODS AND FINDINGS: A total of 1,200 patients presenting in the first 72 hours of acute febrile illness were recruited and followed up for up to a 4-week period prospectively; 1,012 of these were recruited from Singapore and 188 from Vietnam. Of these, 364 were dengue RT-PCR positive; 173 had dengue fever, 171 had dengue hemorrhagic fever, and 20 had dengue shock syndrome as final diagnosis. Using a C4.5 decision tree classifier for analysis of all clinical, haematological, and virological data, we obtained a diagnostic algorithm that differentiates dengue from non-dengue febrile illness with an accuracy of 84.7%. The algorithm can be used differently in different disease prevalence to yield clinically useful positive and negative predictive values. Furthermore, an algorithm using platelet count, crossover threshold value of a real-time RT-PCR for dengue viral RNA, and presence of pre-existing anti-dengue IgG antibodies in sequential order identified cases with sensitivity and specificity of 78.2% and 80.2%, respectively, that eventually developed thrombocytopenia of 50,000 platelets/mm³ or less, a level previously shown to be associated with haemorrhage and shock in adults with dengue fever. CONCLUSION: This study shows a proof-of-concept that decision algorithms using simple clinical and haematological parameters

  3. Decision tree sensitivity analysis for cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma)

    International Nuclear Information System (INIS)

    Decision tree analysis was used to assess cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma, ≤Stage IIIB), based on the data of the current decision tree. Decision tree models were constructed with two competing strategies (CT alone and CT plus chest FDG-PET) in 1,000 patient population with 71.4% prevalence. Baselines of FDG-PET sensitivity and specificity on detection of lung cancer and lymph node metastasis, and mortality and life expectancy were available from references. Chest CT plus chest FDG-PET strategy increased a total cost by 10.5% when a chest FDG-PET study costs 0.1 million yen, since it increased the number of mediastinoscopy and curative thoracotomy despite reducing the number of bronchofiberscopy to half. However, the strategy resulted in a remarkable increase by 115 patients with curable thoracotomy and decrease by 51 patients with non-curable thoracotomy. In addition, an average life expectancy increased by 0.607 year/patient, which means increase in medical cost is approximately 218,080 yen/year/patient when a chest FDG-PET study costs 0.1 million yen. In conclusion, chest CT plus chest FDG-PET strategy might not be cost-effective in Japan, but we are convinced that the strategy is useful in cost-benefit analysis. (author)

  4. Decision Tree Algorithm by Gene Expression Programming Based on Differential Evolution

    Institute of Scientific and Technical Information of China (English)

    王卫红; 阮薇; 李曲

    2011-01-01

    The decision tree evolved by Gene Expression Programming (GEP) with uniformly distributed constants is a classifier with fairly high accuracy, but its performance on multi-attribute data classification is not satisfactory. This paper presents a GEP decision tree algorithm based on Differential Evolution (DE). The new algorithm uses the differential evolution method to improve the additional threshold, so that the uniform constant array retains both uniformity and diversity. Experiments on benchmark datasets show that it performs better on multi-attribute classification problems than the basic GEP decision tree.

  5. Utilizing Home Healthcare Electronic Health Records for Telehomecare Patients With Heart Failure: A Decision Tree Approach to Detect Associations With Rehospitalizations.

    Science.gov (United States)

    Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H

    2016-04-01

    Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date, there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home healthcare electronic medical record called the Outcome and Assessment Information Set-C for 552 telemonitored heart failure patients. Bivariate analyses using SAS and a decision tree technique using Waikato Environment for Knowledge Analysis were used. From the decision tree technique, the presence of skin issues was identified as the top predictor of rehospitalization that could be identified during the start of care assessment, followed by patient's living situation, patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the Outcome and Assessment Information Set-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional supports. PMID:26848645

  6. Decision tree analysis to assess the cost-effectiveness of yttrium microspheres for treatment of hepatic metastases from colorectal cancer

    International Nuclear Information System (INIS)

    Full text: The aim is to determine the cost-effectiveness of yttrium microsphere treatment of hepatic metastases from colorectal cancer, with and without FDG-PET for detection of extra-hepatic disease. A decision tree was created comparing two strategies for yttrium treatment with chemotherapy, one incorporating PET in addition to CT in the pre-treatment work-up, to a strategy of chemotherapy alone. The sensitivity and specificity of PET and CT were obtained from the Federal Government PET review. Imaging costs were obtained from the Medicare benefits schedule with an additional capital component added for PET (final cost $1200). The cost of yttrium treatment was determined by patient-tracking. Previously published reports indicated a mean gain in life-expectancy from treatment of 0.52 years. Patients with extra-hepatic metastases were assumed to receive no survival benefit. Cost effectiveness was expressed as incremental cost per life-year gained (ICER). Sensitivity analysis determined the effect of prior probability of extra-hepatic disease on cost-savings and cost-effectiveness. The cost of yttrium treatment including angiography, particle perfusion studies and bed-stays, was $10530. A baseline value for prior probability of extra-hepatic disease of 0.35 gave ICERs of $26,378 and $25,271 for the no-PET and PET strategies respectively. The PET strategy was less expensive if the prior probability of extra-hepatic metastases was greater than 0.16 and more cost-effective if above 0.28. Yttrium microsphere treatment is less cost-effective than other interventions for colon cancer but comparable to other accepted health interventions. Incorporating PET into the pre-treatment assessment is likely to save costs and improve cost-effectiveness. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
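
    The incremental cost-effectiveness ratio (ICER) named in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' decision-tree model: the function name `icer` and every input figure are hypothetical, and only the definition (incremental cost divided by life-years gained) is taken from the abstract.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost per unit of effect (e.g. per life-year gained)."""
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        # With no difference in effect the ratio is undefined.
        raise ValueError("no difference in effect; ICER is undefined")
    return (cost_new - cost_ref) / delta_effect


# Hypothetical inputs, chosen only to illustrate the arithmetic.
print(icer(cost_new=30_000.0, cost_ref=20_000.0, effect_new=1.5, effect_ref=1.0))
```

    A full analysis of this kind would average costs and effects over the branches of the decision tree before taking the ratio; the sketch covers only the final step.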

  7. Maximal standard dose of parenteral iron for hemodialysis patients: an MRI-based decision tree learning analysis.

    Directory of Open Access Journals (Sweden)

    Guy Rostoker

    Full Text Available Iron overload used to be considered rare among hemodialysis patients after the advent of erythropoiesis-stimulating agents, but recent MRI studies have challenged this view. The aim of this study, based on decision-tree learning and on MRI determination of hepatic iron content, was to identify a noxious pattern of parenteral iron administration in hemodialysis patients. We performed a prospective cross-sectional study from 31 January 2005 to 31 August 2013 in the dialysis centre of a French community-based private hospital. A cohort of 199 fit hemodialysis patients free of overt inflammation and malnutrition were treated for anemia with parenteral iron-sucrose and an erythropoiesis-stimulating agent (darbepoetin), in keeping with current clinical guidelines. Patients had blinded measurements of hepatic iron stores by means of T1 and T2* contrast MRI, without gadolinium, together with CHi-squared Automatic Interaction Detection (CHAID) analysis. The CHAID algorithm first split the patients according to their monthly infused iron dose, with a single cutoff of 250 mg/month. In the node comprising the 88 hemodialysis patients who received more than 250 mg/month of IV iron, 78 patients had iron overload on MRI (88.6%, 95% CI: 80% to 93%). The odds ratio for hepatic iron overload on MRI was 3.9 (95% CI: 1.81 to 8.4) with >250 mg/month of IV iron as compared to <250 mg/month. Age, gender (female sex) and the hepcidin level also influenced liver iron content on MRI. The standard maximal amount of iron infused per month should be lowered to 250 mg in order to lessen the risk of dialysis iron overload and to allow safer use of parenteral iron products.
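
    The proportion and confidence interval quoted for the high-dose node (78 of 88 patients) can be checked with a short calculation. The abstract does not say which interval method was used; the Wilson score interval below is an assumption, and the function name `wilson_interval` is illustrative only.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# Counts taken from the abstract: 78 of 88 patients with iron overload on MRI.
low, high = wilson_interval(78, 88)
print(f"{78 / 88:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

    Under this assumption the bounds fall between 80% and 81% and between 93% and 94%, which is close to the reported 80% to 93%; a different interval method would shift them slightly.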

  8. DECISION TREE CONSTRUCTION AND COST-EFFECTIVENESS ANALYSIS OF TREATMENT OF ULCERATIVE COLITIS WITH PENTASA® MESALAZINE 2 G SACHET

    Directory of Open Access Journals (Sweden)

    Alvaro Mitsunori NISHIKAWA

    2013-12-01

    Full Text Available Context Unspecified Ulcerative Rectocolitis is a chronic disease that affects between 0.5 and 24.5/10^5 inhabitants in the world. National and international clinical guidelines recommend the use of aminosalicylates (including mesalazine) as first-line therapy for induction of remission of unspecified ulcerative rectocolitis, and recommend the maintenance of these agents after remission is achieved. However, multiple daily doses required for the maintenance of disease remission compromise compliance with treatment, which is very low (between 45% and 65%). Use of mesalazine in granules (2 g sachet once daily - Pentasa® sachets 2 g) can enhance treatment adherence, resulting in an improvement in patients' outcomes. Objective To evaluate the evidence on the use of mesalazine for the maintenance of remission in patients with unspecified ulcerative rectocolitis and its effectiveness when taken once versus more than once a day. From an economic standpoint, to analyze the impact of the adoption of this dosage in Brazil's public health system, considering patients' adherence to treatment. Methods A decision tree was developed based on the Clinical Protocol and Therapeutic Guidelines for Ulcerative Colitis, published by the Ministry of Health in Ordinance (Portaria) SAS/MS n° 861 of November 4th, 2002, and on the algorithms published by the Associação Brasileira de Colite Ulcerativa e Doença de Crohn, aiming to estimate the cost-effectiveness of mesalazine once daily in granules compared with mesalazine twice daily in tablets. Results The use of mesalazine increases the chances of remission induction and maintenance when compared to placebo, and higher doses are associated with a greater chance of success without increasing the risk of adverse events. Conclusion The use of a single daily dose in the maintenance of remission is effective and related to higher patient compliance when compared to the multiple daily dose regimens, with lower costs.

  9. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions

    OpenAIRE

    Mekala Sundaram; Willoughby, Janna R; Nathanael I Lichti; Michael A Steele; Swihart, Robert K.

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or c...

  10. Decisiveness

    OpenAIRE

    Junichiro Ishida

    2008-01-01

    This paper investigates how the presence of strong leadership influences an organization's ability to acquire and process information. The key concept is the leader's decisiveness. A decisive leader can make a bold move in response to a large change in the underlying landscape, whereas an indecisive leader biases her position excessively towards the status quo. An organization led by an indecisive leader needs to accumulate unrealistically strong evidence before it changes the course of actio...

  11. EVFDT: An Enhanced Very Fast Decision Tree Algorithm for Detecting Distributed Denial of Service Attack in Cloud-Assisted Wireless Body Area Network

    Directory of Open Access Journals (Sweden)

    Rabia Latif

    2015-01-01

    Full Text Available Due to the scattered nature of DDoS attacks and the advancement of new technologies such as cloud-assisted WBAN, it becomes challenging to detect malicious activities by relying on conventional security mechanisms. The detection of such attacks demands an adaptive and incremental learning classifier capable of accurate decision making with less computation. Existing machine learning techniques for DDoS attack detection require the full data set to be stored in memory and are not appropriate for real-time network traffic. To overcome these shortcomings, the Very Fast Decision Tree (VFDT) algorithm, which can handle high speed streaming data efficiently, has been proposed in the past. For data generated by WBAN sensors, noise is an obvious factor that severely affects accuracy and increases false alarms. In this paper, an enhanced VFDT (EVFDT) is proposed to efficiently detect the occurrence of DDoS attacks in cloud-assisted WBAN. EVFDT uses an adaptive tie-breaking threshold for node splitting. To resolve the tree size expansion under extreme noise, a lightweight iterative pruning technique is proposed. To analyze the performance of EVFDT, four metrics are evaluated: classification accuracy, tree size, time, and memory. Simulation results show that EVFDT attains significantly high detection accuracy with fewer false alarms.
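
    The node-splitting test that a tie-breaking threshold modifies can be sketched as below. This shows the standard VFDT rule based on the Hoeffding bound with a fixed tie threshold; the abstract does not describe how EVFDT adapts that threshold, so the adaptive rule is not reproduced here. The function names and all numeric inputs are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound after n observations, with confidence 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold):
    """Standard VFDT test: split when the best attribute is clearly ahead,
    or when the bound is so small that the two candidates are a tie."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


# Hypothetical gains: a clear winner splits early, a near-tie needs more data.
print(should_split(0.30, 0.10, 1.0, 1e-7, 1000, 0.05))
print(should_split(0.30, 0.28, 1.0, 1e-7, 1000, 0.05))
print(should_split(0.30, 0.28, 1.0, 1e-7, 10000, 0.05))
```

    With a fixed threshold, noisy streams can trigger many tie-driven splits and inflate the tree, which is the motivation the abstract gives for making the threshold adaptive and adding pruning.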

  12. The management of an endodontically abscessed tooth: patient health state utility, decision-tree and economic analysis

    Directory of Open Access Journals (Sweden)

    Shepperd Sasha

    2007-12-01

    Full Text Available Abstract Background A frequent encounter in clinical practice is the middle-aged adult patient complaining of a toothache caused by the spread of a carious infection into the tooth's endodontic complex. Decisions about the range of treatment options (conventional crown with a post and core technique (CC), a single tooth implant (STI), a conventional dental bridge (CDB), and a partial removable denture (RPD)) have to balance the prognosis, utility and cost. Little is known about the utility patients attach to the different treatment options for an endodontically abscessed mandibular molar and maxillary incisor. We measured patients' dental-health-state utilities and ranking preferences of the treatment options for these dental problems. Methods Forty school teachers ranked their preferences for conventional crown with a post and core technique, a single tooth implant, a conventional dental bridge, and a partial removable denture using a standard gamble and willingness to pay. Data previously reported on treatment prognosis and direct "out-of-pocket" costs were used in a decision-tree and economic analysis. Results The Standard Gamble utilities for the restoration of a mandibular 1st molar with either the conventional crown (CC), single-tooth-implant (STI), conventional dental bridge (CDB) or removable-partial-denture (RPD) were 74.47 [± 6.91], 78.60 [± 5.19], 76.22 [± 5.78], 64.80 [± 8.1] respectively (p The standard gamble utilities for the restoration of a maxillary central incisor with a CC, STI, CDB and RPD were 88.50 [± 6.12], 90.68 [± 3.41], 89.78 [± 3.81] and 91.10 [± 3.57] respectively (p > 0.05). Their respective willingness-to-pay ($CDN) were: 1,782.05 [± 361.42], 1,871.79 [± 349.44], 1,605.13 [± 348.10] and 1,351.28 [± 368.62]. A statistical difference was found between the utility of treating a maxillary central incisor and mandibular 1st-molar (p The expected-utility-value for a 5-year prosthetic survival was highest for the CDB and the

  13. Learning Boost C++ libraries

    CERN Document Server

    Mukherjee, Arindam

    2015-01-01

    If you are a C++ programmer who has never used Boost libraries before, this book will get you up-to-speed with using them. Whether you are developing new C++ software or maintaining existing code written using Boost libraries, this hands-on introduction will help you decide on the right library and techniques to solve your practical programming problems.

  14. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    Directory of Open Access Journals (Sweden)

    Mekala Sundaram

    Full Text Available The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is

  15. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    Science.gov (United States)

    Sundaram, Mekala; Willoughby, Janna R; Lichti, Nathanael I; Steele, Michael A; Swihart, Robert K

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. 
Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is consistent with a weak

  16. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions

    Science.gov (United States)

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops to protect against NO3- contamination in groundwater, dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3- pollution activities via an unsupervised learning algorithm based on δ15N- and δ18O-NO3- and physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3- contamination via a decision tree model. When a combination of δ15N-, δ18O-NO3- and physico-chemical properties of groundwater was used as an input for the k-means clustering algorithm, it allowed for a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well irrigated agriculture (28 sampling locations), sewage irrigated agriculture (16 sampling locations), a combination of sewage irrigated agriculture, farm and industry (5 sampling locations) and a combination of well irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on the SO42- and Cl- variables. The NO3- and the δ15N- and δ18O-NO3- variables showed limitations in developing a decision tree model, as multiple N sources and fractionation processes both made it difficult to discriminate NO3- concentrations and isotopic values. Although only SO42- and Cl- were selected as important discriminating variables, concentration data alone could not identify the specific NO3- sources responsible for groundwater contamination. This is a result of comprehensive analysis. To further reduce NO3- contamination, an integrated approach should be set up by combining N and O isotopes of NO3- with land uses and physico-chemical properties, especially in areas with complex agricultural activities.

  17. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-06-01

    Full Text Available Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled areas in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in organic/mineral field measurements. The overall accuracy of the predictive organic/inorganic landscapes' map produced (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%. The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to facilitate the implementation of pedological/hydrological plans for conservation

  18. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-01-01

    Full Text Available Accurate information about soil organic carbon (SOC), presented in a spatial form, is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims to investigate the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled areas in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to statistically explain SOC field measurements in hydromorphic landscapes of the chosen Danish area. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in field SOC measurements. The overall accuracy of the produced predictive SOC map (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%. The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to help with the implementation of pedological/hydrological plans for conservation and sustainable

  19. Entanglement asymmetry for boosted black branes

    CERN Document Server

    Mishra, Rohit

    2016-01-01

    We study the effects of asymmetry in entanglement thermodynamics of the CFT subsystems. It is found that `boosted' $p$-brane backgrounds give rise to the first law of the entanglement thermodynamics where the CFT pressure plays a decisive role in the entanglement. Two different strip-like subsystems, one parallel to the boost and the other perpendicular, are studied in the perturbative regime, where $T_{thermal} \ll T_E$. We also discuss the AdS-wave backgrounds where some universal bounds can be obtained.

  20. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    Energy Technology Data Exchange (ETDEWEB)

    Bou Kheir, Rania, E-mail: rania.boukheir@agrsci.d [Lebanese University, Faculty of Letters and Human Sciences, Department of Geography, GIS Research Laboratory, P.O. Box 90-1065, Fanar (Lebanon); Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Greve, Mogens H. [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Abdallah, Chadi [National Council for Scientific Research, Remote Sensing Center, P.O. Box 11-8281, Beirut (Lebanon); Dalgaard, Tommy [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark)

    2010-02-15

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.

  2. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  3. Intrusion Rule Generation Based on Fuzzy Decision Tree

    Institute of Scientific and Technical Information of China (English)

    郭洪荣

    2013-01-01

    The class MC Agent in the computer immune system model GECISM can make effective use of the fuzzy decision tree algorithm Fuzzy-Id3. Treating the system calls made by application programs as the data set, it constructs a decision tree and thereby generates intrusion detection rules for the computer immune system. Analysis and comparison of the experimental results show that the rules generated by Fuzzy-Id3 classify previously unseen data with a low false-alarm rate and a low miss rate.

  4. Facial Expression Recognition Based on LBP and SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李扬; 郭海礁

    2014-01-01

    To improve the recognition rate of facial expression recognition, this paper proposes an algorithm that combines LBP with an SVM decision tree. First, the LBP algorithm converts the facial expression image into an LBP feature spectrum. The LBP feature spectrum is then converted into an LBP histogram feature sequence. Finally, the SVM decision tree algorithm classifies and recognizes the facial expression. Recognition experiments on the JAFFE facial expression database demonstrate the effectiveness of the algorithm.

  6. Application of the Decision Tree in Tennis Training

    Institute of Scientific and Technical Information of China (English)

    冯能山; 龙超; 熊金志; 廖国君

    2014-01-01

    Data mining is still relatively rare in the field of sports. Making good use of sports training data and extracting useful information from it is an important task for data mining in this field. The decision tree method is a commonly used data mining technique. This paper applies it to tennis training: the relevant data are mined to form a decision tree for tennis training, which helps sports staff design more rational training programs and improves the efficiency of tennis training.

  7. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    Science.gov (United States)

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays. PMID:26796566

  8. Reconstruction of boosted $W^{\pm}$ and $Z^{0}$ bosons from fat jets

    CERN Document Server

    Heinrich, Jochen Jens; Petersen, Troels Christian

    We present the reconstruction of heavily boosted $W^{\pm}$ and $Z^{0}$ bosons from large R-parameter jets (fat jets) in all-hadronic proton-proton collisions at $\sqrt{s} = 8$ TeV at the LHC. The electroweak gauge bosons are boosted to a degree at which their hadronic decay products are collimated enough to be reconstructed as a single fat jet. A mass-drop filtering procedure, validated in studies on Monte Carlo (MC) samples, is then applied to the fat jets with $p_{T} > 420$ GeV to suppress pileup and soft radiation. $W^{\pm}$ and $Z^{0}$ bosons are identified based on their filtered jet mass. The efficiency of common substructure observables and event shape variables in distinguishing between signal and QCD background is evaluated on MC, and the optimized observable selection is used for the training of two boosted decision trees (BDT), in order to reduce the dijet background not originating from the decay of an electroweak gauge boson. For the first BDT, signal MC has been trained against background MC...

  9. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    Science.gov (United States)

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education which is involving subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  10. Risk Assessment of Firefighting and Rescue Command Decision Schemes Based on the Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    赵勇; 贾定守

    2012-01-01

    Using the decision tree method together with real cases, this paper analyzes the factors that influence command decisions in firefighting and rescue, establishes an assessment index system, draws a risk-assessment decision tree, calculates expected values, assesses the risk of the candidate decision schemes, and gives a decision scheme suited to the actual conditions at the fire and rescue scene. Assessing the risk of command decisions with the decision tree method helps improve the commander's decision quality and command capability, and improves the operating mechanism of firefighting and rescue command.

  11. Under which conditions, additional monitoring data are worth gathering for improving decision making? Application of the VOI theory in the Bayesian Event Tree eruption forecasting framework

    Science.gov (United States)

    Loschetter, Annick; Rohmer, Jérémy

    2016-04-01

    Standard and new-generation monitoring observations provide important information, in almost real time, about the evolution of a volcanic system. These observations are used to update the model, contributing to better hazard assessment and supporting decisions about potential evacuation. The BET_EF framework (based on the Bayesian Event Tree), developed by INGV, makes it possible to integrate monitoring information with a view to decision making. Using this framework, the objectives of the present work are (i) to propose a method for assessing the added value of information from monitoring, within Value of Information (VOI) theory, and (ii) to perform a sensitivity analysis on the parameters that influence the VOI from monitoring. VOI assesses the possible increase in expected value provided by gathering information, for instance through monitoring. In a cost-benefit approach, the VOI is the difference between the value with information and the value without additional information. This theory is well suited to situations that can be represented as a decision tree, such as the BET_EF tool. Reference values and ranges of variation (for the sensitivity analysis) were defined for the input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods of sensitivity analysis were implemented: local, global (using Sobol' indices) and regional (using Contribution to Sample Mean and Variance plots). The results, which are specific to the case considered, are in good agreement across the different techniques and answer the following questions: (i) Which characteristics of monitoring are important for early warning (reliability)? (ii) How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the more influential parameters are the means rather than the variances for the case considered
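
    The definition of VOI quoted in this abstract (value with information minus value without it) can be illustrated on a two-branch decision tree. The following is a minimal sketch only: the action names, the cost figures and the single binary "alarm" signal are hypothetical, and it does not reproduce the BET_EF model or the MESIMEX parameters.

```python
def best_expected_value(p_event, actions):
    """Return the best expected value over all actions.

    `actions` maps an action name to (value_if_event, value_if_no_event).
    """
    return max(p_event * v_e + (1.0 - p_event) * v_n
               for v_e, v_n in actions.values())


def value_of_information(p_prior, p_alarm_given_event, p_alarm_given_no_event, actions):
    """VOI = expected value with the monitoring signal - expected value without it."""
    value_without = best_expected_value(p_prior, actions)

    # Probability of each monitoring outcome (alarm / no alarm).
    p_alarm = p_alarm_given_event * p_prior + p_alarm_given_no_event * (1.0 - p_prior)
    p_quiet = 1.0 - p_alarm

    # Posterior probability of the event after each outcome (Bayes' rule).
    post_alarm = p_alarm_given_event * p_prior / p_alarm if p_alarm > 0 else 0.0
    post_quiet = (1.0 - p_alarm_given_event) * p_prior / p_quiet if p_quiet > 0 else 0.0

    # The decision maker picks the best action after seeing the outcome.
    value_with = (p_alarm * best_expected_value(post_alarm, actions)
                  + p_quiet * best_expected_value(post_quiet, actions))
    return value_with - value_without


# Hypothetical costs (negative values): evacuating always costs 10,
# staying costs 100 only if the eruption occurs.
ACTIONS = {"evacuate": (-10.0, -10.0), "stay": (-100.0, 0.0)}

if __name__ == "__main__":
    print(value_of_information(0.05, 1.0, 0.0, ACTIONS))   # perfectly reliable signal
    print(value_of_information(0.05, 0.5, 0.5, ACTIONS))   # uninformative signal
```

    In this toy setting the VOI shrinks to zero as the signal becomes uninformative, which is the kind of dependence on monitoring reliability that a sensitivity analysis would explore.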

  12. Classification of Patient Consultation Questions Using a Decision Tree

    Institute of Scientific and Technical Information of China (English)

    吴东东; 刘锋; 于鸿飞; 黄昊

    2016-01-01

    Objective: To classify the questions that patients ask, identify the blind spots patients encounter when seeking care, and resolve the invalid, overlapping and cumbersome categories that result from classifying by medical specialty. Methods: Using the CLS (Concept Learning System) algorithm, an initial decision tree was built and tested with test samples, yielding the final classification decision tree and the categories of consultation information. Results: The categories obtained from the decision tree correspond to the individual steps of the care process. The final categories cover all data samples, and no two categories overlap. Conclusion: Patients' questions involve 27 steps of the care process. Hospitals need to target their explanations and patient education at the care-process details that patients ask about, and can publish this information on platforms such as websites, apps and micro-sites so that patients can look it up easily.

  13. The Use of Decision Tree Flowchart in Stomatology Education

    Institute of Scientific and Technical Information of China (English)

    周敏; 刘宏伟; 何园

    2013-01-01

    Objective: To investigate the feasibility of applying the decision tree flowchart model to clinical teaching in stomatology. Methods: Between August 2010 and December 2012, the decision tree flowchart method was used for clinical theory teaching with 20 residents rotating through the periodontics department. A clinical problem was selected as the target. The students were then guided to list the classifications of the problem, the different clinical conditions, and the indications and contraindications of each corresponding treatment method, and from these to construct a decision tree flowchart. Results: This teaching mode gave full play to the students' initiative and enthusiasm, helped them classify, organize and summarize multiple knowledge points, and exercised their divergent and logical thinking. It was well received by the students and achieved satisfactory teaching results. Conclusion: Teaching with decision tree flowcharts can make clinical teaching in stomatology more active and effective.

  14. Internet Traffic Classification Using C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    徐鹏; 林森

    2009-01-01

    In recent years, Internet traffic classification using machine learning has become an emerging research direction in network measurement. Naive Bayes and its improved variants have been widely used in existing work because they are simple to implement and classify efficiently. However, these methods depend too heavily on the distribution of samples in the sample space, which makes them potentially unstable. To address this, the paper introduces the C4.5 decision tree method for traffic classification. The method builds a classification model using the information entropy of the training data set, and classifies unknown network flow samples by a simple lookup in the model. Both theoretical analysis and experimental results show that the C4.5 decision tree has a clear advantage in classification stability for Internet traffic classification.
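
    The entropy-based split selection that C4.5 relies on can be sketched as follows. This is a minimal illustration of entropy and gain ratio for a single categorical attribute, not the paper's implementation; the attribute values ("tcp", "udp") and class labels ("web", "dns") are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    n = len(labels)

    # Group class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    # Information gain = entropy before the split - weighted entropy after it.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional

    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    protocol = ["tcp", "tcp", "udp", "udp"]     # hypothetical flow attribute
    app_class = ["web", "web", "dns", "dns"]    # hypothetical traffic class
    print(gain_ratio(protocol, app_class))
```

    A tree builder would evaluate this score for every candidate attribute at a node and split on the one with the highest gain ratio.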

  15. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

    We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques, using an Artificial Neural Network and a Decision Tree. Existing works on fish classification propose descriptors without analyzing their individual impact on the overall classification task, and do not combine feature selection, image segmentation and geometrical parameters. In contrast, we propose a general set of features, extracted using robust feature selection, image segmentation and geometrical parameters, together with their corresponding weights, to be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the structure of the classifier itself, we treat it as a black box and focus our research on determining which input information yields robust fish discrimination. The main contribution of this paper is to enhance the recognition and classification of fishes...

  16. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    Science.gov (United States)

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, along with the number of users and the rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time. PMID:24794073

  17. Design of a new hybrid artificial neural network method based on decision trees for calculating the Froude number in rigid rectangular channels

    Directory of Open Access Journals (Sweden)

    Ebtehaj Isa

    2016-09-01

    A vital topic regarding the optimum and economical design of rigid boundary open channels such as sewers and drainage systems is determining the movement of sediment particles. In this study, the incipient motion of sediment is estimated using three datasets from the literature, covering a wide range of hydraulic parameters. Because existing equations do not consider the effect of sediment bed thickness on incipient motion estimation, this parameter is applied in this study along with the multilayer perceptron (MLP) and a hybrid method based on decision trees (DT), MLP-DT, to estimate incipient motion. According to a comparison with the observed experimental outcome, the proposed method performs well (MARE = 0.048, RMSE = 0.134, SI = 0.06, BIAS = -0.036). The performance of MLP and MLP-DT is compared with that of existing regression-based equations, and significantly higher performance over existing models is observed. Finally, an explicit expression for practical engineering is also provided.
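
    The error statistics reported in this abstract can be computed along the following lines. This is a sketch under assumed definitions, since the abstract does not state them: MARE as the mean absolute relative error, BIAS as the mean of (predicted - observed), and SI (scatter index) as RMSE divided by the mean observed value. The paper's own formulas may differ, and the sample values are invented.

```python
import math


def error_metrics(observed, predicted):
    """Return MARE, RMSE, BIAS and SI under the assumed definitions above."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    mare = sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n                  # assumed sign convention: predicted - observed
    si = rmse / (sum(observed) / n)         # assumed: RMSE normalised by the observed mean

    return {"MARE": mare, "RMSE": rmse, "BIAS": bias, "SI": si}


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(error_metrics([2.0, 4.0], [1.0, 5.0]))
```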

  18. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    Science.gov (United States)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

    Identification of zeolite group minerals is complicated by their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), identifying zeolites from their mineral chemical data is a more challenging and problematic process. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36·15H2O)), EDS data alone do not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM-based crystal morphology data, XRD spectra and re-calculated cationic distribution, obtained by EDS, have been used for the

  19. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

    Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the whole of MG and another that included classification errors. The resulting map was 62.9% accurate.

  20. Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-10-01

    We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with rresources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r~18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r~20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
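
    The idea of cutting in class-probability space to reach a given completeness and efficiency can be sketched as below. The definitions used are the usual ones and are assumed here rather than taken from the paper: completeness is the fraction of true members of a class that pass the cut, and efficiency is the fraction of selected objects that truly belong to the class. The probabilities and labels are invented.

```python
def completeness_and_efficiency(p_class, true_labels, target, threshold):
    """Select objects with P(target) >= threshold and score the selection.

    Returns (completeness, efficiency) under the assumed definitions above.
    """
    selected = [label for p, label in zip(p_class, true_labels) if p >= threshold]
    n_true = sum(1 for label in true_labels if label == target)
    n_correct = sum(1 for label in selected if label == target)

    completeness = n_correct / n_true if n_true else 0.0
    efficiency = n_correct / len(selected) if selected else 0.0
    return completeness, efficiency


if __name__ == "__main__":
    # Invented galaxy probabilities and spectroscopic labels.
    p_galaxy = [0.9, 0.8, 0.4, 0.7]
    truth = ["galaxy", "galaxy", "galaxy", "star"]
    for cut in (0.3, 0.6, 0.85):
        print(cut, completeness_and_efficiency(p_galaxy, truth, "galaxy", cut))
```

    Raising the threshold typically trades completeness for efficiency, so a cut is chosen to match the needs of the downstream analysis.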

  1. Virus Detection Algorithm Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    朱俚治

    2015-01-01

    Computer viruses are becoming increasingly intelligent. Viruses that use modern intelligent techniques can evade detection by some antivirus software, so certain viruses are difficult to find with traditional detection algorithms. To detect viruses that use such new techniques effectively, detection algorithms themselves need new intelligence. The MMTD algorithm and the decision tree algorithm are two intelligent algorithms, and applying them to virus detection can help make detection algorithms more intelligent. Based on the characteristics that viruses exhibit during the detection process, this paper combines the MMTD algorithm with the decision tree algorithm and proposes a new virus detection algorithm.

  2. Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods

    OpenAIRE

    Khan, Saqib Hussain

    2010-01-01

    Data mining can be used in the healthcare industry to “mine” clinical data to discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes unexploited, yet advanced data mining techniques can help remedy this. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms, which are applied to predict chronic renal disease. Data from the databas...

  3. Cost effectiveness of community-based therapeutic care for children with severe acute malnutrition in Zambia: decision tree model

    OpenAIRE

    Bachmann Max O

    2009-01-01

    Background: Children aged under five years with severe acute malnutrition (SAM) in Africa and Asia have high mortality rates without effective treatment. Primary care-based treatment of SAM can have good outcomes but its cost effectiveness is largely unknown. Method: This study estimated the cost effectiveness of community-based therapeutic care (CTC) for children with severe acute malnutrition in government primary health care centres in Lusaka, Zambia, compared to no care. A decision...

  4. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    OpenAIRE

    Hedayetul Islam Shovon; Mahfuza Haque

    2012-01-01

    Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) in a decisive manner. Student evaluation factors such as class quizzes, mid-term and final exams, assignments and lab work are studied. It is recommended that all these correlated information sho...

  5. Analytical solutions of linked fault tree probabilistic risk assessments using binary decision diagrams with emphasis on nuclear safety applications[Dissertation 17286

    Energy Technology Data Exchange (ETDEWEB)

    Nusbaumer, O. P. M

    2007-07-01

    This study is concerned with the quantification of Probabilistic Risk Assessment (PRA) using linked Fault Tree (FT) models. Probabilistic Risk assessment (PRA) of Nuclear Power Plants (NPPs) complements traditional deterministic analysis; it is widely recognized as a comprehensive and structured approach to identify accident scenarios and to derive numerical estimates of the associated risk levels. PRA models as found in the nuclear industry have evolved rapidly. Increasingly, they have been broadly applied to support numerous applications on various operational and regulatory matters. Regulatory bodies in many countries require that a PRA be performed for licensing purposes. PRA has reached the point where it can considerably influence the design and operation of nuclear power plants. However, most of the tools available for quantifying large PRA models are unable to produce analytically correct results. The algorithms of such quantifiers are designed to neglect sequences when their likelihood decreases below a predefined cutoff limit. In addition, the rare event approximation (e.g. Moivre's equation) is typically implemented for the first order, ignoring the success paths and the possibility that two or more events can occur simultaneously. This is only justified in assessments where the probabilities of the basic events are low. When the events in question are failures, the first order rare event approximation is always conservative, resulting in wrong interpretation of risk importance measures. Advanced NPP PRA models typically include human errors, common cause failure groups, seismic and phenomenological basic events, where the failure probabilities may approach unity, leading to questionable results. It is accepted that current quantification tools have reached their limits, and that new quantification techniques should be investigated. A novel approach using the mathematical concept of Binary Decision Diagram (BDD) is proposed to overcome these
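
    The point about the first-order rare event approximation can be made concrete for a single OR gate. The sketch below assumes independent basic events, which is a simplification of my own and not the dissertation's BDD method: the approximation sums the probabilities, while the exact value is one minus the product of the success probabilities.

```python
def exact_or(probabilities):
    """Exact probability that at least one of several independent events occurs."""
    all_succeed = 1.0
    for p in probabilities:
        all_succeed *= (1.0 - p)
    return 1.0 - all_succeed


def rare_event_or(probabilities):
    """First-order rare event approximation: ignore simultaneous occurrences."""
    return sum(probabilities)


if __name__ == "__main__":
    for probs in ([1e-3, 1e-3], [0.5, 0.5], [0.9, 0.9]):
        print(probs, exact_or(probs), rare_event_or(probs))
```

    For small probabilities the two values are nearly identical. As the probabilities approach unity the approximation overestimates the result and can even exceed 1, which is the regime the abstract flags for human errors, common cause failures and seismic events.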

  7. Competitive Decision Algorithm for the Steiner Minimal Tree Problem in Graphs

    Institute of Scientific and Technical Information of China (English)

    熊小华; 刘艳芳; 宁爱兵

    2012-01-01

    The Steiner minimal tree problem in graphs (GSTP) is a well-known NP-hard problem with important applications in areas such as telecommunication network design and VLSI design. A competitive decision algorithm was developed to solve the GSTP. The mathematical properties of the GSTP were analysed; they can be used to reduce the size of the original problem and accelerate the algorithm. To assess the efficiency of the proposed competitive decision algorithm, it was applied to a set of benchmark problems in the OR-Library. In terms of computation time, the algorithm clearly outperforms other heuristics for the Steiner problem in graphs, while obtaining better or comparable solutions.

  8. A decision-tree model to detect post-calving diseases based on rumination, activity, milk yield, BW and voluntary visits to the milking robot.

    Science.gov (United States)

    Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I

    2016-09-01

    Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value. PMID:27221983
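
    The three figures quoted above are related through the confusion matrix of the model against the veterinarian's binary reference. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: sick cows flagged as sick       fn: sick cows missed
    tn: healthy cows flagged healthy    fp: healthy cows flagged as sick
    """
    sensitivity = tp / (tp + fn)          # share of sick cows detected
    specificity = tn / (tn + fp)          # share of healthy cows cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts for illustration only.
acc, sens, spec = binary_metrics(tp=18, fn=6, tn=70, fp=10)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

    Because accuracy is a prevalence-weighted mix of sensitivity and specificity, it lies between the two, as it does in the reported results.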

  9. A best-first soft/hard decision tree searching MIMO decoder for a 4 × 4 64-QAM system

    KAUST Repository

    Shen, Chungan

    2012-08-01

    This paper presents the algorithm and VLSI architecture of a configurable tree-searching approach that combines the features of classical depth-first and breadth-first methods. Based on this approach, techniques to reduce complexity while providing both hard- and soft-output decoding are presented. Furthermore, a single programmable parameter allows the user to trade off throughput against BER performance. The proposed multiple-input-multiple-output decoder supports a 4 × 4 64-QAM system and was synthesized with 65-nm CMOS technology at 333 MHz clock frequency. For the hard output scheme the design can achieve an average throughput of 257.8 Mbps at 24 dB signal-to-noise ratio (SNR) with area equivalent to 54.2 Kgates and a power consumption of 7.26 mW. For the soft output scheme it achieves an average throughput of 83.3 Mbps across the SNR range of interest with an area equivalent to 64 Kgates and a power consumption of 11.5 mW. © 2011 IEEE.

  10. Breast boost - why, how, when...?

    International Nuclear Information System (INIS)

    Background: Breast conservation management including tumorectomy or quadrantectomy and external beam radiotherapy with a dose of 45 to 50 Gy in the treatment of small breast carcinomas is generally accepted. The use of a radiation boost - in particular for specific subgroups - has not been clarified. With regard to the boost technique there is some controversy between groups emphasizing the value of electron boost treatment and groups pointing out the value of interstitial boost treatment. This controversy has become even more complicated as there is an increasing number of institutions reporting the use of HDR interstitial brachytherapy for boost treatment. The most critical issue with regard to interstitial HDR brachytherapy is the assumed serious long-term morbidity after a high single radiation dose as used in HDR-treatments. Methods and Results: This article gives a perspective and recommendations on some aspects of this issue (indication, timing, target volume, dose and dose rate). Conclusion: More information about the indication for a boost is to be expected from the EORTC trial 22881/10882. Careful selection of treatment procedures for specific subgroups of patients and refinement in surgical procedures and radiotherapy techniques may be useful in improving the clinical and cosmetic results in breast conservation therapy. Prospective trials comparing on the one hand different boost techniques and on the other hand particular morphologic criteria in treatments with boost and without boost are needed to give more detailed recommendations for boost indication and for boost techniques. (orig.)

  11. Diversity-Based Boosting Algorithm

    Directory of Open Access Journals (Sweden)

    Jafar A. Alzubi

    2016-05-01

    Boosting is a well-known and efficient technique for constructing a classifier ensemble. An ensemble is built incrementally by altering the distribution of the training data set and forcing learners to focus on misclassification errors. In this paper, an improvement to the Boosting algorithm, called DivBoosting, is proposed and studied. Experiments on several data sets are conducted with both Boosting and DivBoosting. The experimental results show that DivBoosting is a promising method for ensemble pruning. We believe that it has many advantages over the traditional boosting method because its mechanism is based not solely on selecting the most accurate base classifiers but also on selecting the most diverse set of classifiers.
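
    The reweighting mechanism described above can be sketched with plain discrete AdaBoost on one-dimensional data. This illustrates standard boosting only, not the DivBoosting method of the paper; the stump learner, function names, and toy data are assumptions made for the example.

```python
import math

def best_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best decision stump.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if (pol if x >= thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                      # start from a uniform distribution
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified points gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * y * (pol if x >= thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

    On this toy set the best single stump misclassifies three points, while the weighted vote of three stumps fits all ten.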

  12. The AdaBoost Flow

    CERN Document Server

    Lykov, A; Vaninsky, K

    2011-01-01

    We introduce a dynamical system which we call the AdaBoost flow. The flow is defined by a system of ODEs with control. We show how, by a suitable choice of control, the AdaBoost algorithm of Schapire and Freund and the arc-gv algorithm of Breiman can be embedded in the AdaBoost flow. We also show how the confidence-rated prediction previously studied by Schapire and Singer can be obtained from our continuous-time approach. We introduce a new continuous-time algorithm which we call superBoost and describe its properties. The AdaBoost flow equations coincide with the equations of dynamics of the nonperiodic Toda system written in terms of spectral variables. This establishes a connection between two seemingly unrelated fields: boosting algorithms and classical integrable models. Finally, we explain the similarity of the AdaBoost flow to Perelman's ideas for controlling Ricci flow.

  13. Boosting Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Elkin Eduardo García Díaz

    2006-11-01

    In this paper we present a binary classification algorithm based on Support Vector Machines, appropriately combined with a modified Boosting algorithm. It runs faster than the original SVM algorithm, with a similar generalization error and a model of equal complexity but a more compact representation.

  14. Analytic Boosted Boson Discrimination

    CERN Document Server

    Larkoski, Andrew J; Neill, Duff

    2015-01-01

    Observables which discriminate boosted topologies from massive QCD jets are of great importance for the success of the jet substructure program at the Large Hadron Collider. Such observables, while both widely and successfully used, have been studied almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders factorization theorem for a two-prong discriminant based on a jet shape variable, $D_2$, valid for both signal and background jets. Our factorization theorem simultaneously describes the production of both collinear and soft subjets, and we introduce a novel zero-bin procedure to correctly describe the transition region between these limits. By proving an all orders factorization theorem, we enable a systematically improvable description, and allow for precision comparisons between data, Monte Carlo, and first principles QCD calculations for jet substructure observables. Using our factorization theorem, we present numerical results for the discrimination of a boosted $Z...

  15. SUSY using boosted techniques

    CERN Document Server

    Stark, Giordon; The ATLAS collaboration

    2016-01-01

    In this talk, I present a discussion of techniques used in supersymmetry searches in papers published by the ATLAS Collaboration from late Run 1 to early Run 2. The goal is to highlight concepts the analyses have in common, why/how they work, and possible SUSY searches that could benefit from boosted studies. Theoretical background will be provided for reference to encourage participants to explore in depth on their own time.

  16. Analytic boosted boson discrimination

    OpenAIRE

    Andrew J. Larkoski; Moult, Ian; Neill, Duff

    2015-01-01

    Observables which discriminate boosted topologies from massive QCD jets are of great importance for the success of the jet substructure program at the Large Hadron Collider. Such observables, while both widely and successfully used, have been studied almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders factorization theorem for a two-prong discriminant based on a jet shape variable, $D_2$, valid for both signal and background jets. Our factorization t...

  17. Competitive decision algorithm for minimum ratio spanning tree

    Institute of Scientific and Technical Information of China (English)

    熊小华; 宁爱兵

    2012-01-01

    The minimum ratio spanning tree (MRST) problem is to find a spanning tree that minimizes an objective function given by the ratio of two linear functions (for example, the ratio of total cost to total benefit). The MRST problem is NP-hard when the denominator is unrestricted in sign. Based on the mathematical properties of MRST, a competitive decision algorithm for MRST is presented. An edge_exchange operation is used to widen the search range and keep the algorithm from becoming trapped in a local optimum. The algorithm is implemented in Delphi 7.0 and tested on a series of typical instances, which are generated using two strategies, one uncorrelated and one correlated.

  18. Optimized algorithm of decision tree based on weighting factor

    Institute of Scientific and Technical Information of China (English)

    董跃华; 刘力

    2015-01-01

    By analysing the multivalue bias of the ID3 algorithm and the subjectivity of earlier improved ID3 variants, an optimized decision tree algorithm based on a weighting factor is proposed. The new algorithm introduces a weighting factor that reflects the interdependence between attributes and uses it to re-weigh the split weight of the attribute with the most values. Worked examples and experiments on standard UCI data sets show that, when the attributes in a data set take different numbers of values, the optimized ID3 algorithm overcomes the multivalue bias. It both improves the average classification accuracy and reduces the average number of leaf nodes in the constructed decision tree.
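
    The multivalue bias mentioned above comes from the information gain criterion that ID3 uses to choose split attributes. The sketch below shows the bias itself on a toy data set; it does not implement the weighting factor of the paper, and the attribute names and values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """ID3 split criterion: entropy reduction from partitioning on an attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: six samples, two classes.
labels = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sun", "sun", "rain", "rain", "rain", "sun"]   # two distinct values
row_id = ["a", "b", "c", "d", "e", "f"]                   # one value per sample

print("gain(weather):", information_gain(weather, labels))
print("gain(row_id): ", information_gain(row_id, labels))
```

    The identifier-like attribute splits the data into pure single-sample groups and so receives the maximum possible gain, although it has no predictive value for new samples. This is the kind of preference for many-valued attributes that corrections to ID3 aim to remove.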

  19. Structural Equation Model Trees

    Science.gov (United States)

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  20. Short-Time Fourier Transform and Decision Tree-Based Pattern Recognition for Gas Identification Using Temperature Modulated Microhotplate Gas Sensors

    Directory of Open Access Journals (Sweden)

    Aixiang He

    2016-01-01

    Because the sensor response is dependent on its operating temperature, modulated temperature operation is usually applied in gas sensors for the identification of different gases. In this paper, the modulated operating temperature of microhotplate gas sensors combined with a feature extraction method based on Short-Time Fourier Transform (STFT) is introduced. Because the gas concentration in the ambient air usually fluctuates strongly, STFT is applied to extract transient features from the time-frequency domain, and the relationship between the STFT spectrum and sensor response is further explored. Because of the low thermal time constant, sufficient discriminatory information about different gases is preserved in the envelope of the response curve. Feature information tends to be contained in the lower frequencies rather than the higher frequencies. Therefore, features are extracted from the STFT amplitude values at frequencies ranging from 0 Hz to the fundamental frequency to accomplish the identification task. These lower frequency features are extracted and further processed by decision tree-based pattern recognition. The proposed method shows high classification capability in the analysis of different concentrations of carbon monoxide, methane, and ethanol.

  1. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    Directory of Open Access Journals (Sweden)

    Kai-Wei Chiang

    2015-12-01

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  2. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment.

    Science.gov (United States)

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-12-28

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.
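
    Why errors accumulate in plain PDR can be seen from its position update, which adds one step vector per detected step. The sketch below assumes a fixed step length and a heading measured clockwise from north; the function names, step length, and the 2-degree heading bias are hypothetical, and the map-aided FDT correction of the paper is not modelled.

```python
import math

def dead_reckon(start, steps):
    """Integrate (step_length_m, heading_rad) pairs into a list of positions.

    Heading is measured clockwise from north; positions are (east, north).
    """
    east, north = start
    track = [(east, north)]
    for length, heading in steps:
        east += length * math.sin(heading)
        north += length * math.cos(heading)
        track.append((east, north))
    return track

def drift(track_a, track_b, step_index):
    """Distance between two tracks after a given number of steps."""
    (ea, na), (eb, nb) = track_a[step_index], track_b[step_index]
    return math.hypot(ea - eb, na - nb)

# Hypothetical walk: 100 steps of 0.7 m due north, versus the same walk
# reconstructed with a constant 2-degree heading bias.
true_track = dead_reckon((0.0, 0.0), [(0.7, 0.0)] * 100)
biased_track = dead_reckon((0.0, 0.0), [(0.7, math.radians(2.0))] * 100)

for k in (10, 50, 100):
    print(k, "steps, drift (m):", round(drift(true_track, biased_track, k), 3))
```

    With nothing external to reset the position, even a small constant heading error produces a drift that grows with every step, which is the motivation for map or beacon aiding.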

  3. Research of H5N6 Treatment by Comparing with H6N1 and H10N8 by Using Decision Tree and Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Kim Sunghyun

    2016-01-01

    Since 2003, 608 people in 15 countries have been infected with human-infectious avian influenza (AI) viruses, and 359 of them have died. In China in particular, the H6N1 and H10N8 viruses spread widely, and many people were infected and died. Recently, the H5N6 virus emerged in China, and the number of patients has been increasing gradually. This research therefore compared the amino acid sequences of the Matrix Protein, Hemagglutinin, Neuraminidase and Nucleoprotein of H5N6, H6N1 and H10N8, using a decision tree and the Apriori algorithm, to determine their similarity and devise a treatment. The Matrix Protein and Nucleoprotein sequences of H5N6 were found to be similar to those of H6N1 and H10N8. The research therefore concluded that treatments targeting those proteins in H6N1 and H10N8 will also be effective against H5N6.

  4. Substructure of Boosted Jets

    CERN Document Server

    Duchovni, Ehud

    2013-01-01

    Jets with transverse energy of a few TeV are now becoming common in LHC data. Most of these jets are produced by QCD processes, and some come from the collimated decay of highly boosted objects such as the W, Z, H0 and top quark. The study of such QCD jets may shed light on QCD showering processes, while the identification of jets coming from decays may test the Standard Model under extreme conditions and may also provide the first hints of physics beyond the Standard Model. Jet algorithms, correction procedures for pile-up effects, and commonly used substructure observables are briefly reviewed.

  5. The Application of R-C4.5 Decision Tree Model in Higher Vocational Employment

    Institute of Scientific and Technical Information of China (English)

    张继美; 桂红兵

    2011-01-01

    This paper describes decision tree classification and the R-C4.5 decision tree model. Taking the personal, educational, and employment information of recent graduating classes at a higher vocational college as the research object, the experimental data are preprocessed and then mined with the R-C4.5 decision tree classification technique. The mining identifies factors that affect the employment quality of higher vocational graduates, providing a basis for decisions by government and schools on measures and reforms to improve employment quality.

  6. Tree sets

    OpenAIRE

    Diestel, Reinhard

    2015-01-01

    We study an abstract notion of tree structure which generalizes tree-decompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked to graph-theoretical trees, these `tree sets' can provide a suitable formalization of tree structure also for infinite graphs, matroids, or set partitions, as well as for other discrete structures, such as order trees. In this first of two papers we introduce tree sets, establish their relation to graph and order trees, and show h...

  7. Boost C++ application development cookbook

    CERN Document Server

    Polukhin, Antony

    2013-01-01

    This book follows a cookbook approach, with detailed and practical recipes that use Boost libraries. This book is great for developers new to Boost who are looking to improve their knowledge of Boost and see some undocumented details or tricks. It's assumed that you will have some experience in C++ already, as well as being familiar with the basics of STL. A few chapters will require some previous knowledge of multithreading and networking. You are expected to have at least one good C++ compiler and compiled version of Boost (1.53.0 or later is recommended), which will be used during the exer

  8. Gradient boosting machines, a tutorial.

    Science.gov (United States)

    Natekin, Alexey; Knoll, Alois

    2013-01-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling. Theoretical information is complemented with descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling the model complexity are discussed. Three practical examples of gradient boosting applications are presented and comprehensively analyzed. PMID:24409142
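
    As a rough illustration of the idea, the sketch below fits a gradient boosting model for squared-error loss using regression stumps on one-dimensional data. It is a minimal example under stated assumptions, not the tutorial's own code: the function names, learning rate, number of rounds, and toy data are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump; returns (threshold, left_value, right_value)."""
    best = None
    for thr in sorted(set(xs))[1:]:        # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]

def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)               # initial constant model
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + learning_rate * (lv if x < thr else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x < thr else rv for thr, lv, rv in stumps)

def mse(values, targets):
    return sum((v - t) ** 2 for v, t in zip(values, targets)) / len(targets)

# Hypothetical toy data: a quadratic target on ten points.
xs = list(range(10))
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
baseline = mse([sum(ys) / len(ys)] * len(ys), ys)
boosted = mse([predict(model, x) for x in xs], ys)
print("baseline MSE:", baseline, "boosted MSE:", boosted)
```

    Swapping the residual computation for the negative gradient of another differentiable loss is what makes the method adaptable to different loss functions.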

  9. Gradient Boosting Machines, A Tutorial

    Directory of Open Access Journals (Sweden)

    Alexey Natekin

    2013-12-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods. Theoretical information is complemented with many descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling the model complexity are discussed. A set of practical examples of gradient boosting applications is presented and comprehensively analyzed.

  10. Analytic boosted boson discrimination

    Science.gov (United States)

    Larkoski, Andrew J.; Moult, Ian; Neill, Duff

    2016-05-01

    Observables which discriminate boosted topologies from massive QCD jets are of great importance for the success of the jet substructure program at the Large Hadron Collider. Such observables, while both widely and successfully used, have been studied almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders factorization theorem for a two-prong discriminant based on a jet shape variable, $D_2$, valid for both signal and background jets. Our factorization theorem simultaneously describes the production of both collinear and soft subjets, and we introduce a novel zero-bin procedure to correctly describe the transition region between these limits. By proving an all orders factorization theorem, we enable a systematically improvable description, and allow for precision comparisons between data, Monte Carlo, and first principles QCD calculations for jet substructure observables. Using our factorization theorem, we present numerical results for the discrimination of a boosted Z boson from massive QCD background jets. We compare our results with Monte Carlo predictions which allows for a detailed understanding of the extent to which these generators accurately describe the formation of two-prong QCD jets, and informs their usage in substructure analyses. Our calculation also provides considerable insight into the discrimination power and calculability of jet substructure observables in general.

  11. Cost Effectiveness of Imiquimod 5% Cream Compared with Methyl Aminolevulinate-Based Photodynamic Therapy in the Treatment of Non-Hyperkeratotic, Non-Hypertrophic Actinic (Solar) Keratoses: A Decision Tree Model

    OpenAIRE

    Wilson, Edward C F

    2010-01-01

    Background: Actinic keratosis (AK) is caused by chronic exposure to UV radiation (sunlight). First-line treatments are cryosurgery, topical 5-fluorouracil (5-FU) and topical diclofenac. Where these are contraindicated or less appropriate, alternatives are imiquimod and photodynamic therapy (PDT). Objective: To compare the cost effectiveness of imiquimod and methyl aminolevulinate-based PDT (MAL-PDT) from the perspective of the UK NHS. Methods: A decision tree model was populated with data fro...

  12. Ultrarelativistic boost with scalar field

    Science.gov (United States)

    Svítek, O.; Tahamtan, T.

    2016-02-01

    We present the ultrarelativistic boost of the general global monopole solution which is parametrized by mass and deficit solid angle. The problem is addressed from two different perspectives. In the first one the primary object for performing the boost is the metric tensor while in the second one the energy momentum tensor is used. Since the solution is sourced by a triplet of scalar fields that effectively vanish in the boosting limit we investigate the behavior of a scalar field in a simpler setup. Namely, we perform the boosting study of the spherically symmetric solution with a free scalar field given by Janis, Newman and Winicour. The scalar field is again vanishing in the limit pointing to a broader pattern of scalar field behaviour during an ultrarelativistic boost in highly symmetric situations.

  13. Boosted Higgs Shapes

    CERN Document Server

    Schlaffer, Matthias; Takeuchi, Michihisa; Weiler, Andreas; Wymant, Chris

    2014-01-01

    The inclusive Higgs production rate through gluon fusion has been measured to be in agreement with the Standard Model (SM). We show that even if the inclusive Higgs production rate is very SM-like, a precise determination of the boosted Higgs transverse momentum shape offers the opportunity to see effects of natural new physics. These measurements are generically motivated by effective field theory arguments and specifically in extensions of the SM with a natural weak scale, like composite Higgs models and natural supersymmetry. We show in detail how a measurement at high transverse momentum of $H\to 2\ell+\mathbf{p}\!\!/_T$ via $H\to \tau\tau$ and $H\to WW^*$ could be performed and demonstrate that it offers a compelling alternative to the $t\bar t H$ channel. We discuss the sensitivity to new physics in the most challenging scenario of an exactly SM-like inclusive Higgs cross-section.

  14. Boosted Higgs shapes

    Energy Technology Data Exchange (ETDEWEB)

    Schlaffer, Matthias [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany); Spannowsky, Michael [Durham Univ. (United Kingdom). Inst. for Particle Physics Phenomenology; Takeuchi, Michihisa [King's College London (United Kingdom). Theoretical Physics and Cosmology Group; Weiler, Andreas [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany); European Organization for Nuclear Research (CERN), Geneva (Switzerland); Wymant, Chris [Durham Univ. (United Kingdom). Inst. for Particle Physics Phenomenology; Laboratoire d'Annecy-le-Vieux de Physique Theorique, Annecy-le-Vieux (France)

    2014-05-15

    The inclusive Higgs production rate through gluon fusion has been measured to be in agreement with the Standard Model (SM). We show that even if the inclusive Higgs production rate is very SM-like, a precise determination of the boosted Higgs transverse momentum shape offers the opportunity to see effects of natural new physics. These measurements are generically motivated by effective field theory arguments and specifically in extensions of the SM with a natural weak scale, like composite Higgs models and natural supersymmetry. We show in detail how a measurement at high transverse momentum of $H\to 2\ell+\mathbf{p}\!\!/_T$ via $H\to \tau\tau$ and $H\to WW^*$ could be performed and demonstrate that it offers a compelling alternative to the $t\bar t H$ channel. We discuss the sensitivity to new physics in the most challenging scenario of an exactly SM-like inclusive Higgs cross-section.

  15. Boosted Higgs shapes

    Energy Technology Data Exchange (ETDEWEB)

    Schlaffer, Matthias [DESY, Hamburg (Germany); Spannowsky, Michael [Durham University, Department of Physics, Institute for Particle Physics Phenomenology, Durham (United Kingdom); Takeuchi, Michihisa [King's College London, Theoretical Physics and Cosmology Group, Department of Physics, London (United Kingdom); Weiler, Andreas [DESY, Hamburg (Germany); CERN, Theory Division, Physics Department, Geneva 23 (Switzerland); Wymant, Chris [Durham University, Department of Physics, Institute for Particle Physics Phenomenology, Durham (United Kingdom); Laboratoire d'Annecy-le-Vieux de Physique Theorique, 9 Chemin de Bellevue, 74940, Annecy-le-Vieux (France); Imperial College London, Department of Infectious Disease Epidemiology, London (United Kingdom)

    2014-10-15

    The inclusive Higgs production rate through gluon fusion has been measured to be in agreement with the Standard Model (SM). We show that even if the inclusive Higgs production rate is very SM-like, a precise determination of the boosted Higgs transverse momentum shape offers the opportunity to see effects of natural new physics. These measurements are generically motivated by effective field theory arguments and specifically in extensions of the SM with a natural weak scale, like composite Higgs models and natural supersymmetry. We show in detail how a measurement at high transverse momentum of $H\to 2\ell+\mathbf{p}\!\!/_T$ via $H\to \tau\tau$ and $H\to WW^*$ could be performed and demonstrate that it offers a compelling alternative to the $t\bar t H$ channel. We discuss the sensitivity to new physics in the most challenging scenario of an exactly SM-like inclusive Higgs cross section. (orig.)

  16. Application of Data Mining in the Student Information System Based on Decision Tree Algorithm

    Institute of Scientific and Technical Information of China (English)

    侯海霞

    2012-01-01

    There are many data mining methods, and the decision tree is one of them. The decision tree method requires no assumptions about the data; it classifies large volumes of data directly and, following certain rules, uncovers hidden and valuable information. This paper selects the representative C4.5 algorithm and builds a decision tree from the large volume of graduate employment records in a university student information management system, mining potential rules and factors that favour graduate employment in order to guide university education and management.

  17. Self-adaptive Transfer for Decision Trees Based on Similarity Metric

    Institute of Scientific and Technical Information of China (English)

    王雪松; 潘杰; 程玉虎; 曹戈

    2013-01-01

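    How to resolve the negative-transfer problem in transfer learning, and how to choose the right timing and method of transfer, are key to the wide application of transfer learning. To address this problem, a self-adaptive transfer method for decision trees based on a similarity metric (STDT) is proposed. First, depending on whether the source-task datasets are accessible, the similarity between decision trees is judged adaptively using either component prediction probabilities or path prediction probabilities, and the resulting affinity coefficient serves as the basis for quantifying how similar related tasks are. Then, a multi-source criterion determines whether multi-source ensemble transfer is adopted, and the normalized similarities are assigned in turn to the source decision trees to be transferred as their transfer weights. Finally, the source decision trees are transferred as an ensemble to assist the target task in making decisions. Simulation results on the UCI machine learning repository show that, compared with the multi-source weighted sum rule (WSR) algorithm and MS-TrAdaBoost, STDT achieves faster transfer while maintaining decision accuracy.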
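
    The weighting step in this abstract (normalize the similarities, use them as transfer weights, combine the source trees as an ensemble) can be illustrated with a minimal Python sketch. It assumes the similarity scores are already available; the function names, the source-tree labels, and the scores are hypothetical, and the similarity computation from prediction probabilities is not reproduced here.

```python
def normalize_weights(similarities):
    """Scale similarity scores so that they sum to 1 (hypothetical helper)."""
    total = sum(similarities.values())
    return {name: s / total for name, s in similarities.items()}


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the source trees."""
    scores = {}
    for name, label in predictions.items():
        scores[label] = scores.get(label, 0.0) + weights[name]
    return max(scores, key=scores.get)


# Hypothetical similarity of three source trees to the target task.
similarities = {"source_a": 3.0, "source_b": 1.0, "source_c": 1.0}
weights = normalize_weights(similarities)

# Each source tree's prediction for one target example (made-up labels).
predictions = {"source_a": "pos", "source_b": "neg", "source_c": "neg"}
decision = weighted_vote(predictions, weights)
```

    In this toy case the single most similar source tree outweighs the two less similar ones, which is the intended effect of similarity-based weights.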

  18. Detection of Illegitimate Emails using Boosting Algorithm

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    spam email detection. For this task, we apply a boosting technique. Boosting can raise the accuracy of traditional classification algorithms, but one has to choose a suitable weak learner as well as the number of boosting iterations. In this paper, we ... propose a Naive Bayes classifier as a suitable weak learner for the boosting algorithm. It achieves maximum performance with very few boosting iterations...
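
    A minimal sketch of boosting with a Naive Bayes weak learner follows. It is an illustration under stated assumptions, not the authors' implementation: it uses discrete AdaBoost with a sample-weighted Bernoulli Naive Bayes on binary word-presence features, and the toy messages, smoothing constants, and function names are all hypothetical.

```python
import math


def train_nb(X, y, w):
    """Weighted Bernoulli Naive Bayes on 0/1 features; labels are -1 or +1."""
    model = {}
    for label in (-1, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        total = sum(w[i] for i in idx)
        # Smoothed, weight-aware estimate of P(feature present | class).
        probs = [(sum(w[i] * X[i][j] for i in idx) + 1e-3) / (total + 2e-3)
                 for j in range(len(X[0]))]
        model[label] = (math.log(total + 1e-12), probs)
    return model


def predict_nb(model, x):
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if xj else 1.0 - p)
                               for xj, p in zip(x, probs))
    return 1 if score(1) >= score(-1) else -1


def adaboost(X, y, rounds=5):
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        model = train_nb(X, y, w)
        preds = [predict_nb(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)   # avoid division by zero on a perfect round
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, model))
        if err <= 1e-10:        # training set already fit perfectly
            break
        # Raise the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, x):
    return 1 if sum(a * predict_nb(m, x) for a, m in ensemble) >= 0 else -1


# Toy data: columns mark presence of "free", "meeting", "winner"; +1 = spam.
X = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 0]]
y = [1, 1, 1, -1, -1, -1]
ensemble = adaboost(X, y)
```

    On data this small the first Naive Bayes round already separates the classes, so the loop stops early; on harder data the reweighting step is what lets later rounds focus on the messages earlier rounds got wrong.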

  19. Application of improved decision tree on the hot rolling process

    Institute of Scientific and Technical Information of China (English)

    钟蜜; 刘斌

    2011-01-01

    Decision tree classification is a very effective machine learning method, offering high classification accuracy, good robustness to noisy data, and a tree-shaped model. Optimization of decision tree algorithms has focused mainly on the criteria for choosing splitting attributes, on tree pruning, and on the introduction of fuzzy theory, rough set theory, genetic algorithms, and neural network algorithms. This article uses the attribute-importance principle from rough set theory to optimize the decision tree: the importance of each condition attribute to the classification is computed first, and the sample set is then filtered according to that importance, which reduces the size of the tree without harming classification accuracy. The algorithm is implemented in the Visual C++ 6.0 programming environment and applied to a hot rolling process model; processing hot rolling data verifies the validity of the algorithm.
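
    One common way to quantify attribute importance in rough set theory is the drop in the dependency degree when an attribute is removed. The sketch below illustrates that definition in Python; whether the paper uses exactly this measure is an assumption, and the hot rolling attributes and records are invented for the example.

```python
from collections import defaultdict


def dependency(rows, cond_attrs, decision):
    """Fraction of rows whose condition values determine the decision uniquely."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in cond_attrs)].append(row[decision])
    consistent = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return consistent / len(rows)


def significance(rows, cond_attrs, decision):
    """Importance of each attribute: dependency lost when it is dropped."""
    full = dependency(rows, cond_attrs, decision)
    return {a: full - dependency(rows, [c for c in cond_attrs if c != a], decision)
            for a in cond_attrs}


# Hypothetical records: a defect occurs only at low temperature and fast speed.
rows = [
    {"temp": "high", "speed": "fast", "shift": "day",   "quality": "ok"},
    {"temp": "high", "speed": "slow", "shift": "night", "quality": "ok"},
    {"temp": "low",  "speed": "fast", "shift": "day",   "quality": "defect"},
    {"temp": "low",  "speed": "slow", "shift": "night", "quality": "ok"},
    {"temp": "low",  "speed": "fast", "shift": "night", "quality": "defect"},
    {"temp": "high", "speed": "fast", "shift": "night", "quality": "ok"},
]
sig = significance(rows, ["temp", "speed", "shift"], "quality")
```

    Here `shift` gets an importance of zero because dropping it loses no information about quality, so it could be filtered out before the tree is grown.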

  20. Physics with boosted top quarks

    CERN Document Server

    Kuutmann, Elin Bergeaas

    2014-01-01

    The production at the LHC of boosted top quarks (top quarks with a transverse momentum that greatly exceeds their rest mass) is a promising process to search for phenomena beyond the Standard Model. In this contribution several examples are discussed of new techniques to reconstruct and identify (tag) the collimated decay topology of the boosted hadronic decays of top quarks. Boosted top reconstruction techniques have been utilized in searches for new physical phenomena. An overview is given of searches by ATLAS, CDF and CMS for heavy new particles decaying into a top and an anti-top quark, vector-like quarks and supersymmetric partners to the top quark.

  1. The potential impact of improving appropriate treatment for fever on malaria and non-malarial febrile illness management in under-5s: a decision-tree modelling approach.

    Directory of Open Access Journals (Sweden)

    V Bhargavi Rao

    BACKGROUND: As international funding for malaria programmes plateaus, limited resources must be rationally managed for malaria and non-malarial febrile illnesses (NMFI). Given widespread unnecessary treatment of NMFI with first-line antimalarial Artemisinin Combination Therapies (ACTs), our aim was to estimate the effect of health-systems factors on rates of appropriate treatment for fever and on use of ACTs. METHODS: A decision-tree tool was developed to investigate the impact of improving aspects of the fever care-pathway and also evaluate the impact in Tanzania of the revised WHO malaria guidelines advocating diagnostic-led management. RESULTS: Model outputs using baseline parameters suggest 49% of malaria cases attending a clinic would receive ACTs (95% Uncertainty Interval: 40.6-59.2%) but that 44% (95% UI: 35-54.8%) of NMFI cases would also receive ACTs. Provision of 100% ACT stock predicted a 28.9% increase in malaria cases treated with ACT, but also an increase in overtreatment of NMFI, with 70% of NMFI cases (95% UI: 56.4-79.2%) projected to receive ACTs, and thus an overall 13% reduction (95% UI: 5-21.6%) in correct management of febrile cases. Modelling increased availability or use of diagnostics had little effect on malaria management outputs, but may significantly reduce NMFI overtreatment. The model predicts the early rollout of revised WHO guidelines in Tanzania may have led to a 35% decrease (95% UI: 31.2-39.8%) in NMFI overtreatment, but also a 19.5% reduction (95% UI: 11-27.2%) in malaria cases receiving ACTs, due to a potential fourfold decrease in cases that were untested or tested false-negative (42.5% vs. 8.9%) and so untreated. DISCUSSION: Modelling multi-pronged intervention strategies proved most effective to improve malaria treatment without increasing NMFI overtreatment. As malaria transmission declines, health system interventions must be guided by whether the management priority is an increase in malaria cases receiving ACTs (reducing the

  2. Gradient boosting machines, a tutorial

    OpenAIRE

    Natekin, Alexey; Knoll, Alois

    2013-01-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling. Theoretical information is complemented with de...
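
    The idea of learning with respect to different loss functions can be made concrete with a small Python sketch: each round fits a weak learner to the negative gradient of the chosen loss. This is a minimal illustration, not the tutorial's code; it assumes one-dimensional inputs, regression stumps as weak learners, a fixed learning rate, and hypothetical function names and data.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on one feature (needs 2+ distinct xs)."""
    best = None
    for t in sorted(set(xs))[:-1]:          # split: x <= t versus x > t
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


# Negative gradient of each loss with respect to the current prediction f.
NEG_GRADIENTS = {
    "squared": lambda y, f: y - f,                 # loss 0.5 * (y - f)^2
    "absolute": lambda y, f: (y > f) - (y < f),    # loss |y - f|
}


def gradient_boost(xs, ys, loss="squared", rounds=50, lr=0.1):
    neg_grad = NEG_GRADIENTS[loss]
    init = sum(ys) / len(ys)                       # constant starting model
    preds = [init] * len(xs)
    stumps = []
    for _ in range(rounds):
        pseudo_residuals = [neg_grad(y, f) for y, f in zip(ys, preds)]
        stump = fit_stump(xs, pseudo_residuals)
        stumps.append(stump)
        preds = [f + lr * stump(x) for x, f in zip(xs, preds)]
    return lambda x: init + lr * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model_sq = gradient_boost(xs, ys, loss="squared")
model_abs = gradient_boost(xs, ys, loss="absolute")
baseline = lambda x: sum(ys) / len(ys)
```

    Swapping the loss only changes the entry looked up in `NEG_GRADIENTS`; the boosting loop itself is untouched, which is the customizability the abstract refers to.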

  3. Gradient Boosting Machines, A Tutorial

    OpenAIRE

    Alexey Natekin; Alois Knoll

    2013-01-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods. Theoretical information is complemented with many descriptive examples and illustrations which cover all th...

  4. Efficient-cutting packet classification algorithm based on the statistical decision tree

    Institute of Scientific and Technical Information of China (English)

    陈立南; 刘阳; 马严; 黄小红; 赵庆聪; 魏伟

    2014-01-01

    Packet classification algorithms based on decision trees are easy to implement and widely employed in high-speed packet classification. The primary objective in constructing a decision tree is to minimize storage and search time complexity. An improved decision-tree algorithm, HyperEC, is proposed based on the statistics of filter sets and on evaluation metrics. HyperEC is a multi-dimensional packet classification algorithm that allows a tradeoff between storage and throughput while the decision tree is being constructed, which avoids excessive tree height and storage blow-up. Because it is not sensitive to IP address length, it is suitable for IPv6 packet classification as well as IPv4. The algorithm applies a natural, performance-guided decision-making process: the storage budget is preset and the best throughput is then achieved. The results show that HyperEC performs about the same as HyperCuts when the rule set is small, and that as the number of rules grows it outperforms the HiCuts and HyperCuts algorithms in storage and throughput and scales to large filter sets.

  5. Research on Decision Tree for Food Safety Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    鄂旭; 任骏原; 毕嘉娜; 沈德海

    2014-01-01

    Food safety decision making is an important part of food safety research. Based on the variable precision rough set model, a method of building a decision tree whose rules carry a definite confidence is proposed for food safety analysis. It improves on the traditional weighted decision tree induction approach. The new algorithm constructs the decision tree using the variable precision weighted mean roughness as the attribute selection criterion, and uses the variable precision approximation accuracy in place of the approximation accuracy. Noisy data in the training set are fully taken into account, and a limited amount of inconsistency is allowed among the examples of the positive regions. The decision tree is thereby simplified, generalizes better, and is more comprehensible. Experiments show that the algorithm is feasible and effective.

  6. Application of Sales Volume Forecast of Group Purchase Based on Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    费斐; 叶枫

    2013-01-01

    Online group purchasing is a form of online shopping in which consumers who do not know one another buy the same product on the same website within a specific period in order to obtain the best price. At present, group purchase websites, as platforms, face a large number of products applying for group purchase; the review process requires a great deal of manpower and depends too heavily on experience. This paper uses a decision tree algorithm to analyze the variables that influence the sales level of group purchase products and generates a readable decision tree to support decision making and select high-quality products.

  7. Application of decision tree and logistic regression on the health literacy prediction of hypertension patients

    Institute of Scientific and Technical Information of China (English)

    李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬

    2012-01-01

    Objective: To study and evaluate the feasibility and accuracy of decision tree methods and logistic regression for predicting the health literacy of hypertension patients. Method: Two health literacy prediction models were built, one with logistic regression analysis and one with the Answer Tree software. The receiver operating characteristic (ROC) curve was used to compare the two prediction models. Result: The sensitivity (82.5%) and Youden index (50.9%) of the logistic regression model were higher than those of the decision tree model (77.9%, 48.0%); the specificity of the decision tree model (70.1%) was higher than that of the logistic regression model (68.4%), and its error rate (29.9%) was lower than that of the logistic regression model (31.6%). The areas under the ROC curve were comparable (0.813 for the decision tree vs 0.847 for logistic regression). Conclusion: The decision tree model predicted health literacy about as well as the logistic regression model. A health literacy screening strategy can be derived from the decision tree model, which suggests that data mining methods are feasible for health literacy prediction in patients with chronic disease.
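
    The reported figures can be cross-checked with a short Python calculation. The Youden index is sensitivity + specificity - 1; the reported error rates match 1 - specificity, which is an inference from the numbers rather than something the abstract states. Function names are hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def false_positive_rate(specificity):
    """Share of true negatives that are misclassified: 1 - specificity."""
    return 1.0 - specificity


# (sensitivity, specificity) as reported in the abstract.
models = {
    "logistic regression": (0.825, 0.684),
    "decision tree": (0.779, 0.701),
}
summary = {
    name: (youden_index(sens, spec), false_positive_rate(spec))
    for name, (sens, spec) in models.items()
}
for name, (j, fpr) in summary.items():
    print(f"{name}: J = {j:.3f}, 1 - specificity = {fpr:.3f}")
```

    The results reproduce the abstract's 50.9% and 48.0% Youden indices and its 31.6% and 29.9% error rates.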

  8. Fuzzy Decision Tree Model for Driver Behavior Confronting Yellow Signal at Signalized Intersection

    Institute of Scientific and Technical Information of China (English)

    龙科军; 赵文秀; 肖向良

    2011-01-01

    A driver's decision to go or stop during the yellow interval is a case of decision making under uncertainty. This paper collects driver behavior data at four similar intersections and applies a fuzzy decision tree (FDT) to model driver behavior at signalized intersections. Taking vehicle location, velocity, and the countdown timer as the influencing factors, each with its own membership function, the FDT model is constructed with the FID3 algorithm, using fuzzy information entropy as the heuristic, and decision rules are generated. Test samples are used to check the FDT model, and the results indicate that it predicts drivers' decisions with an overall accuracy of 84.8%.

  9. Decision tree classification of orchard information extraction from TM imagery in Jiaodong Peninsula of China

    Institute of Scientific and Technical Information of China (English)

    于新洋; 张安定; 侯西勇

    2012-01-01

    Decision tree classification is a classification model that applies a sequence of classification rules to progressively refine the image under study. It has been widely used for information extraction from remote sensing images because it is intuitive and efficient. Jiaodong Peninsula is one of the best-known fruit-producing areas in China, so monitoring the distribution of orchards there is of considerable significance. In this paper, decision tree classification was used to extract the orchard area in Jiaodong Peninsula. Specifically, a Landsat5 TM image (path 120, row 034, October 24, 2005) was used, and the five most representative cities (Penglai, Longkou, Laizhou, Qixia, Zhaoyuan) were selected as the study area. The classification combined auxiliary information such as NDVI, terrain, and the tasseled cap transformation, and exploited the spectral difference between orchards and background land cover; SPOT imagery and field survey data served as test samples for accuracy verification. The decision tree classification performed satisfactorily: the classification results were acceptable and could be used as the original inputs for related research.

  10. Research and Application of Bank Customer Relationship Management based on the Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    李明辉

    2012-01-01

    Decision tree algorithms from data mining are of considerable value to the banking industry. Applied in banking, decision tree techniques analyze the background information of a given customer to predict the category the customer belongs to, so that an appropriate business strategy can be adopted. This improves the level of banking services, develops customer resources, and avoids customer loss, while conserving resources and obtaining a larger return from a minimal investment. In bank lending, judging whether a borrower is risky, deciding whether a loan proposal is feasible, and classifying customers according to the bank's actual needs are all problems that decision tree algorithms can address.

  11. Analysis Of The Lending Data With Decision Tree Model Based On Clustering

    Institute of Scientific and Technical Information of China (English)

    翟剑锋

    2012-01-01

    After selecting, cleaning, and transforming the lending data provided by a university library, this paper studies decision tree classification supported by clustering and its application to library lending data. Clustering is used to obtain the training samples for the decision tree, with the aim of producing a high-quality decision tree and further improving the accuracy of book recommendations. Taking the lending data of one university library as an example, the paper applies these results to analyze that library's lending data. The results of the analysis are provided to library managers as a reference for collection policy, book recommendation, and library management.

  12. Implementation of Fuzzy Logic controller in Photovoltaic Power generation using Boost Converter and Boost Inverter

    OpenAIRE

    Abubakkar Siddik A; Shangeetha M

    2012-01-01

    With power demand increasing and conventional energy sources in short supply, researchers are focusing on renewable energy. The proposed solar power generation circuit consists of a solar array, a boost converter, and a boost inverter. The low voltage of the photovoltaic array is boosted by a dc-dc boost converter to charge the battery, and the boost inverter converts this battery voltage to a high-quality sinusoidal ac voltage. The output of the boost inverter feeds an autonomous load without any inter...
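
    For orientation, the step-up action of a dc-dc boost converter follows the textbook ideal relation V_out = V_in / (1 - D) in continuous conduction, where D is the switch duty cycle. The Python sketch below only evaluates that relation; it assumes an ideal lossless converter, does not model the paper's fuzzy logic controller, and uses hypothetical function names and voltages.

```python
def boost_output_voltage(v_in, duty):
    """Ideal continuous-conduction boost converter: V_out = V_in / (1 - D)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    return v_in / (1.0 - duty)


def duty_for_target(v_in, v_out):
    """Duty cycle an ideal boost converter needs to reach v_out from v_in."""
    if v_out < v_in:
        raise ValueError("a boost converter cannot step the voltage down")
    return 1.0 - v_in / v_out


# Hypothetical example: a 12 V panel output raised to a 48 V bus.
required_duty = duty_for_target(12.0, 48.0)
achieved_voltage = boost_output_voltage(12.0, required_duty)
```

    A controller, fuzzy or otherwise, would adjust the duty cycle around such a nominal value as the panel voltage varies.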

  13. Planting Trees

    OpenAIRE

    Relf, Diane

    2009-01-01

    The key aspects in planning a tree planting are determining the function of the tree, the site conditions, that the tree is suited to site conditions and space, and if you are better served by a container-grown. After the tree is planted according to the prescribed steps, you must irrigate as needed and mulch the root zone area.

  14. Gobi information extraction based on decision tree classification method

    Institute of Scientific and Technical Information of China (English)

    冯益明; 智长贵; 姚爱冬

    2013-01-01

    Gobi is one of the main landscape types of the earth's surface in the arid region of northwestern China, with a total area of 458 000-757 000 km², accounting for 4.8%-7.9% of China's total land area. The gobi holds abundant natural resources such as minerals, wind energy, and solar power. Many modern cities and towns and some important traffic routes have also been built in the gobi region, which plays an important role in the development of the western economy. It is therefore important to launch gobi research under current social and economic conditions, and accurately determining the distribution and area of the gobi is the basis and premise of that research. At present, fieldwork is difficult because of the harsh natural conditions and sparse population of the gobi region, which has led to a scarcity of research documents on the status, distribution, type classification, transformation, and utilization of the gobi. The study region of this paper is a typical gobi distribution region located in Ejina County, Inner Mongolia, China; its climate is characterized by little rain, high evaporation, abundant sunshine, large temperature differences, and frequent windy and sandy weather. Landsat TM5 and TM7 remote sensing imagery from the plant growing seasons of 2005-2010, a DEM with 30 m spatial resolution, an administrative map, a present land use map, field investigation data, and related documents served as the basic data sources. First, the non-gobi distribution regions were extracted in GIS software by analyzing the DEM. Then, based on an analysis of the spectral characteristics of different typical ground objects, a knowledge-based decision tree information extraction model was constructed to classify the remote sensing imagery, and eroded gobi and cumulated gobi were separated relatively accurately. The overall accuracy of the extracted gobi information reached 91.57%.
There were few materials in China on using

  15. Resolving Boosted Jets with XCone

    CERN Document Server

    Thaler, Jesse

    2015-01-01

    We show how the recently proposed XCone jet algorithm smoothly interpolates between resolved and boosted kinematics. When using standard jet algorithms to reconstruct the decays of hadronic resonances like top quarks and Higgs bosons, one typically needs separate analysis strategies to handle the resolved regime of well-separated jets and the boosted regime of fat jets with substructure. XCone, by contrast, is an exclusive cone jet algorithm that always returns a fixed number of jets, so jet regions remain resolved even when (sub)jets are overlapping in the boosted regime. In this paper, we perform three LHC case studies (dijet resonances, Higgs decays to bottom quarks, and all-hadronic top pairs) that demonstrate the physics applications of XCone over a wide kinematic range.

  16. Boost.Asio C++ network programming

    CERN Document Server

    Torjo, John

    2013-01-01

    What you want is an easy level of abstraction, which is just what this book provides in conjunction with Boost.Asio. Switching to Boost.Asio is just a few extra #include directives away, with the help of this practical and engaging guide. This book is great for developers who need to do network programming but don't want to delve into the complicated issues of a raw networking API. You should be familiar with core Boost concepts, such as smart pointers and shared_from_this, resource classes (noncopyable), functors and boost::bind, boost mutexes, and the boost date/time library. Readers should

  17. Boosted Horizon of a Boosted Space-Time Geometry

    CERN Document Server

    Battista, Emmanuele; Scudellaro, Paolo; Tramontano, Francesco

    2015-01-01

    We apply the ultrarelativistic boosting procedure to map the metric of Schwarzschild-de Sitter spacetime into a metric describing de Sitter spacetime plus a shock-wave singularity located on a null hypersurface, by exploiting the picture of the embedding of a hyperboloid in a five-dimensional Minkowski spacetime. After reverting to the usual four-dimensional formalism, we also solve the geodesic equation and evaluate the Riemann curvature tensor of the boosted Schwarzschild-de Sitter metric by means of numerical calculations, which make it possible to reach the ultrarelativistic regime gradually by letting the boost velocity approach the speed of light. Eventually, the analysis of the Kretschmann invariant (and of the geodesic equation) shows the global structure of spacetime, as we demonstrate the presence of a "scalar curvature singularity" within a 3-sphere and find that it is also possible to define what we have called a "boosted horizon", a sort of elastic wall where all particles are surprisingly pushe...

  18. Machine learning approximation techniques using dual trees

    OpenAIRE

    Ergashbaev, Denis

    2015-01-01

    This master's thesis explores a dual-tree framework as applied to a particular class of machine learning problems that are collectively referred to as generalized n-body problems. It builds a new algorithm on top of the framework and improves the existing Boosted OGE classifier.

  19. Hi-trees and their layout.

    Science.gov (United States)

    Marriott, Kim; Sbarski, Peter; van Gelder, Tim; Prager, Daniel; Bulka, Andy

    2011-03-01

    We introduce hi-trees, a new visual representation for hierarchical data in which, depending on the kind of parent node, the child relationship is represented using either containment or links. We give a drawing convention for hi-trees based on the standard layered drawing convention for rooted trees, then show how to extend standard bottom-up tree layout algorithms to draw hi-trees in this convention. We also explore a number of other more compact layout styles for layout of larger hi-trees and give algorithms for computing these. Finally, we describe two applications of hi-trees: argument mapping and business decision support.

  20. Representing Arbitrary Boosts for Undergraduates.

    Science.gov (United States)

    Frahm, Charles P.

    1979-01-01

    Presented is a derivation for the matrix representation of an arbitrary boost, a Lorentz transformation without rotation, suitable for undergraduate students with modest backgrounds in mathematics and relativity. The derivation uses standard vector and matrix techniques along with the well-known form for a special Lorentz transformation. (BT)
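
    For reference, the standard textbook form that such a derivation arrives at can be written as below. This is a sketch under stated assumptions, not the article's own notation: units with c = 1, a velocity vector β with magnitude β, the Lorentz factor γ, the 3×3 identity I_3, and the matrix acting on column vectors (t, x).

```latex
\[
\Lambda(\boldsymbol{\beta}) =
\begin{pmatrix}
\gamma & -\gamma\,\boldsymbol{\beta}^{T} \\[4pt]
-\gamma\,\boldsymbol{\beta} &
I_{3} + (\gamma - 1)\,\dfrac{\boldsymbol{\beta}\,\boldsymbol{\beta}^{T}}{\beta^{2}}
\end{pmatrix},
\qquad
\gamma = \frac{1}{\sqrt{1 - \beta^{2}}},
\qquad
\beta^{2} = \boldsymbol{\beta}\cdot\boldsymbol{\beta}.
\]
```

    Setting β = (β, 0, 0) reduces the spatial block to diag(γ, 1, 1) and recovers the familiar special Lorentz transformation along the x axis.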

  1. BIM-Boost in Nederland

    NARCIS (Netherlands)

    Berlo, L.A.H.M.

    2012-01-01

    TNO recently concluded a cooperation agreement with trade organizations in the construction supply chain, including Bouwend Nederland and BNA. The aim of the agreement: to bring about a BIM boost in the Netherlands. A conversation with Leon van Berlo of TNO about this and other current BIM topics.

  2. The Research of Fault Diagnosis Knowledge Representation of Track Circuit Based on Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    刘扬

    2014-01-01

    To address the unclear fault logic mechanism of the ZPW-2000A jointless track circuit, this paper adopts a decision-tree-based knowledge representation method for a track circuit expert system. The method first builds a fault decision table from samples of the feature vectors that most strongly influence track circuit faults, then discretizes the attribute values with a minimum information entropy algorithm. Exploiting the fast learning and classification of the decision tree algorithm, it trains on the discretized data samples, generates a fault decision tree, and extracts knowledge rules, which are stored as production rules in the knowledge base of the expert system. A case analysis of the ZPW-2000A jointless track circuit verifies the validity and practicability of the method for knowledge representation and acquisition in a track circuit expert system.

  3. Seeing the forest through the trees: Improving decision making on the Iowa gambling task by shifting focus from short- to long-term outcomes

    Directory of Open Access Journals (Sweden)

    Melissa T Buelow

    2013-10-01

    Introduction: The present study sought to examine two methods by which to improve decision making on the Iowa Gambling Task (IGT): inducing a negative mood and providing additional learning trials. Method: In the first study, 194 undergraduate students (74 male; M_age = 19.44 [SD = 3.69]) were randomly assigned to view a series of pictures to induce a positive, negative, or neutral mood immediately prior to the IGT. In the second study, 276 undergraduate students (111 male; M_age = 19.18 [SD = 2.58]) completed a delay discounting task and back-to-back administrations of the IGT. Results: Participants in an induced negative mood selected more from Deck C during the final trials than those in an induced positive mood. Providing additional learning trials resulted in better decision making: participants shifted their focus from the frequency of immediate gains/losses (i.e., a preference for Decks B and D) to long-term outcomes (i.e., a preference for Deck D). In addition, disadvantageous decision making on the additional learning trials was associated with larger delay discounting (i.e., a preference for more immediate but smaller rewards). Conclusions: The present results indicate that decision making is affected by negative mood state, and that decision making can be improved by increasing the number of learning trials. In addition, the current results provide evidence of a relationship between performance on the IGT and on a separate measure of decision making, the delay discounting task. Moreover, the present results indicate that improved decision making on the IGT can be attributed to shifting focus towards long-term outcomes, as evidenced by increased selections from advantageous decks as well as correlations between the IGT and delay discounting task. Implications for the assessment of decision making using the IGT are discussed.

  4. Extensions and applications of ensemble-of-trees methods in machine learning

    Science.gov (United States)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  5. Boosting Applied to Word Sense Disambiguation

    OpenAIRE

    Escudero, Gerard; Marquez, Lluis; Rigau, German

    2000-01-01

    In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are s...

  6. Decision Trees in the Analysis of the Intensity of Damage to Portal Frame Buildings in Mining Areas

    Science.gov (United States)

    Firek, Karol; Rusek, Janusz; Wodyński, Aleksander

    2015-09-01

    The article presents a preliminary analysis, using the methodology of decision trees, of a database created by the authors on the technical condition and damage of 94 portal frame buildings located in the mining area of the Legnica-Głogów Copper District (LGOM). The aim was to test whether the decision tree method can yield information on the contribution of damage to the technical wear of buildings in a mining area. The analysis was divided into two stages. The first stage created a decision tree with the standard CART (Classification & Regression Tree) method, giving a model that approximates the technical wear of the buildings, and determined the importance of individual damage indices for that wear (Figs. 3 and 4). The second stage verified the resulting tree and the importance of these indices by simulating individual tree models with the random forest method (Tab. 2). The results confirmed the usefulness of decision trees in the early stage of data analysis. This methodology makes it possible to build an initial model that describes the interaction between variables and to draw inferences about the importance of individual input variables.

  7. A Cost-Sensitive Decision Tree Learning Model—An Application to Customer-Value Based Segmentation%基于代价敏感决策树的客户价值细分

    Institute of Scientific and Technical Information of China (English)

    邹鹏; 莫佳卉; 江亦华; 叶强

    2011-01-01

    The objective of this research is to extend the current decision tree learning model to handle data sets with unequal misclassification costs. Because misclassification costs differ and customers of different value are unevenly distributed, data mining methods based on overall accuracy cannot reflect the effect of differing customer value on classification performance. Cost-sensitive learning is therefore used to extend the existing decision tree model, and the method is applied to customer-value based segmentation: a misclassification cost matrix based on customer value is built, minimisation of classification cost is used as the criterion for splitting tree branches, and an expected loss function is established as the criterion for evaluating classification performance. The model is validated on empirical data from one of the largest credit card issuing banks in China, comprising attributes from a customer satisfaction survey and credit card transaction history. The results show that, compared with the original decision tree learning model, the cost-sensitive decision tree classifies better on the customer-value segmentation problem, controls cost sensitivity and the distribution of the different types of classification error more precisely, lowers the overall misclassification cost, makes the model reflect classification cost more accurately, and effectively identifies customer value
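
    A minimal sketch of the cost-minimisation idea in the abstract above, assuming a two-class problem. The cost values, the class names and the helpers `expected_cost`, `min_cost_label` and `split_cost` are hypothetical illustrations, not the authors' model.

```python
from collections import Counter

# Hypothetical misclassification costs, indexed as COST[true][predicted].
# Mistaking a high-value customer for a low-value one is assumed costlier.
COST = {
    "high": {"high": 0.0, "low": 5.0},
    "low": {"high": 1.0, "low": 0.0},
}


def expected_cost(labels, predicted, cost=COST):
    """Average cost of predicting `predicted` for every sample in a node."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum(n * cost[true][predicted] for true, n in counts.items()) / total


def min_cost_label(labels, cost=COST):
    """Label that minimises the expected misclassification cost in a node."""
    return min(sorted(cost), key=lambda c: expected_cost(labels, c, cost))


def split_cost(left, right, cost=COST):
    """Size-weighted expected cost after a split; lower is better."""
    n = len(left) + len(right)
    return sum(
        len(part) / n * expected_cost(part, min_cost_label(part, cost), cost)
        for part in (left, right)
        if part
    )


node = ["low"] * 7 + ["high"] * 3
print(min_cost_label(node))  # cost-based label; the majority label is "low"
```

    With these assumed costs the node is labelled "high" although most of its samples are "low", which is the behaviour a purely accuracy-based tree cannot produce.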

  8. Top-down induction of clustering trees

    OpenAIRE

    Blockeel, Hendrik; De Raedt, Luc; Ramon, Jan

    2000-01-01

    An approach to clustering is presented that adapts the basic top-down induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first order clustering. The TIC system employs the first order logical decision tree representation of the inductive logic programming system Tilde. Various experiments with TIC are presented, in both ...

  9. Totally Corrective Boosting for Regularized Risk Minimization

    CERN Document Server

    Shen, Chunhua; Barnes, Nick

    2010-01-01

    Consideration of the primal and dual problems together leads to important new insights into the characteristics of boosting algorithms. In this work, we propose a general framework that can be used to design new boosting algorithms. A wide variety of machine learning problems essentially minimize a regularized risk functional. We show that the proposed boosting framework, termed CGBoost, can accommodate various loss functions and different regularizers in a totally-corrective optimization fashion. We show that, by solving the primal rather than the dual, a large body of totally-corrective boosting algorithms can actually be efficiently solved and no sophisticated convex optimization solvers are needed. We also demonstrate that some boosting algorithms like AdaBoost can be interpreted in our framework, even though their optimization is not totally corrective. We empirically show that various boosting algorithms based on the proposed framework perform similarly on the UC Irvine machine learning datasets [1] that we hav...

  10. Tree compression with top trees

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Landau, Gad M.;

    2013-01-01

    We introduce a new compression scheme for labeled trees based on top trees [3]. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...

  11. Tree compression with top trees

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Landau, Gad M.;

    2015-01-01

    We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...

  12. Class Evolution Tree: A Graphical Tool to Support Decisions on the Number of Classes in Exploratory Categorical Latent Variable Modeling for Rehabilitation Research

    Science.gov (United States)

    Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa

    2011-01-01

    The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…

  13. Application of analyzing influencing factors of life pressure in college students by decision tree%决策树分析在高校大学生生活压力影响因素分析中的应用

    Institute of Scientific and Technical Information of China (English)

    陈新林; 包生耿; 颜伟红; 王小广; 万建成; 吴丹桂

    2013-01-01

    Objective: To understand the distribution and influencing factors of life pressure among college students in Guangzhou, in order to provide a scientific basis for mental health education. Methods: Students at five colleges in the Guangzhou area were surveyed using the "Youth Life Event Scale" and basic demographic data. A logistic model was built in SPSS 13.0 (forward selection of variables) to explore the factors influencing the total pressure score. Decision trees for the total pressure score were built with the C5.0 algorithm of the Clementine software and the CHAID algorithm of the Answer Tree software. Results: The factors influencing life pressure among college students were economic conditions, interpersonal relationships, the number of children in the family, and part-time work. The C5.0 decision tree branched on interpersonal relationships, economic conditions and the number of children in the family. The CHAID decision tree branched on economic conditions, interpersonal relationships, the number of children in the family and part-time work. Students with both poor economic conditions and poor interpersonal relationships had the largest proportion of life pressure (68.84%). Conclusions: Mental health education and guidance should be tailored to the characteristics of the different subgroups, with particular attention to students who have poor interpersonal relationships, have poor economic conditions, or are only children.

  15. A New Fuzzy-Rough Decision Tree Algorithm Based on Conceptual Hierarchy%一种新的基于粗糙集的概念模糊化决策树算法

    Institute of Scientific and Technical Information of China (English)

    吴晓明

    2014-01-01

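    A new method that combines a fuzzy-rough decision tree with a conceptual hierarchy is proposed. The algorithm uses attribute induction and concept fuzzification to remove attributes that cannot reflect generalised information and, combined with the fuzzy-rough decision tree algorithm, extracts knowledge and rules of potential value for decision making. It can be used to solve fuzzy-semantic problems.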

  16. 基于决策树技术分析动态图形数据的研究与实现%Research and implementation of dynamic graph data based on decision trees technology

    Institute of Scientific and Technical Information of China (English)

    雷炜; 叶东毅

    2011-01-01

    Traditional dynamic data analysis approaches such as time series analysis are cumbersome when applied to dynamic graphs. This paper studies a method and process for dynamic graph data analysis based on decision tree techniques and implements it by applying the SLIQ algorithm to collected electrocardiogram data. The experimental results show that the resulting model has an accuracy of about 73%.

  17. MODELLING AND IMPLEMENTATION OF DECISION TREE-BASED CONSUMPTION BEHAVIOUR FACTORS%基于决策树的消费行为因素建模与实现

    Institute of Scientific and Technical Information of China (English)

    黎旭; 李国和; 吴卫江; 洪云峰; 刘智渊; 程远

    2015-01-01

    The analysis of consumption behaviour factors plays an important guiding role in the production and sale of products. To model and analyse consumption behaviour from consumers' consumption data, the data are first given a formal representation, yielding consumer transaction data sets and an expression of transaction statistics. The information gain ratio is then defined on the consumer transaction data sets to reflect the classification ability of the consumption factors. Building on the C4.5 algorithm, binary splitting is extended to multi-way splitting, continuous attributes (factors) are discretised, and a decision tree is constructed. Each branch of the decision tree forms a decision rule that reflects the dependencies among a consumer's consumption factors, and the statistical information attached to each rule expresses the uncertainty of that rule. A consumption behaviour modelling and analysis system is implemented on a Web architecture with Oracle as the database; the system analyses consumption behaviour models with high accuracy and is also efficient and user-friendly.
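
    The gain ratio that C4.5 uses to rank attributes can be sketched as below for an attribute that has already been discretised into several bins. The function names and the toy data are hypothetical; this is an illustration of the standard C4.5 criterion, not the system described in the abstract.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of a discrete attribute.

    values[i] is the attribute value (bin) of sample i, labels[i] its class.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Class entropy remaining after a multi-way split on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical data: a spending level discretised into three bins.
spend = ["low", "low", "mid", "mid", "high", "high"]
buys = ["no", "no", "yes", "no", "yes", "yes"]
print(gain_ratio(spend, buys))
```

    Dividing the gain by the split information penalises attributes that fragment the data into many small bins, which matters once binary splits are replaced by multi-way ones.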

  19. 基于C4.5决策树的嵌入型恶意代码检测方法%Detection of Embedded Malware Based on C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    张福勇; 齐德昱; 胡镜林

    2011-01-01

    Embedded malware has become a new computer security threat because it is highly concealed and hard to detect. Existing statistical analysis methods are ineffective because they do not fully account for two characteristics of embedded malware: the small number of malicious bytes and their high information gain. To solve this problem, a new detection method based on the C4.5 decision tree is proposed. The 500 3-grams with the highest information gain are extracted from the training samples and used as attributes to build a decision tree, which is then used to detect unknown embedded malware. Experimental results show that the proposed method is superior to existing methods in terms of detection rate and classification accuracy, achieving a detection rate of 99.80% for Word documents infected with embedded malware.
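
    The feature-selection step described above can be sketched as follows, assuming each 3-gram is treated as a binary present/absent attribute and labels are 1 for infected and 0 for clean. The helper names and the toy byte strings are hypothetical; the paper's exact extraction procedure is not given in the abstract.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 3) -> set:
    """Set of distinct byte n-grams occurring in one sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def binary_entropy(pos: int, neg: int) -> float:
    """Entropy (bits) of a two-class count; 0 for an empty or pure group."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            h -= (c / total) * math.log2(c / total)
    return h


def top_ngrams(samples, labels, k=500, n=3):
    """The k n-grams whose presence/absence has the highest information gain."""
    total = len(labels)
    pos_total = sum(labels)
    neg_total = total - pos_total
    base = binary_entropy(pos_total, neg_total)
    seen, seen_pos = Counter(), Counter()
    for sample, y in zip(samples, labels):
        for gram in byte_ngrams(sample, n):
            seen[gram] += 1
            seen_pos[gram] += y
    scores = {}
    for gram, cnt in seen.items():
        p_pos, p_neg = seen_pos[gram], cnt - seen_pos[gram]
        a_pos, a_neg = pos_total - p_pos, neg_total - p_neg
        scores[gram] = (
            base
            - cnt / total * binary_entropy(p_pos, p_neg)
            - (total - cnt) / total * binary_entropy(a_pos, a_neg)
        )
    # Highest gain first; ties broken by byte order so the result is stable.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


samples = [b"AAAxyz", b"BBBxyz", b"AAAqrs", b"BBBqrs"]
labels = [1, 0, 1, 0]
print(top_ngrams(samples, labels, k=2))
```

    The selected 3-grams would then serve as the attribute set passed to a decision tree learner.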

  20. 基于决策树和链接相似的DeepWeb查询接口判定%Deep Web query interface identification based on decision tree and link-similar

    Institute of Scientific and Technical Information of China (English)

    李雪玲; 施化吉; 兰均; 李星毅

    2011-01-01

    Existing Deep Web query interface identification methods produce many false positives and cannot effectively distinguish search-engine-type interfaces. To address these shortcomings, this paper proposes a Deep Web query interface identification method based on decision trees and link similarity. The method uses the information gain ratio to select important attributes and builds a decision tree to pre-classify interface forms, identifying the interfaces that have relatively distinct features. A link-similarity-based method then makes a second judgement on the interfaces that were not identified, accurately recognising genuine query interfaces and excluding search-engine-type interfaces. The experimental results show that the method effectively distinguishes search-engine-type interfaces and improves classification accuracy and recall.

  1. Research on Recognition and Determination on Effective Technology Innovation Based on Decision Tree%基于决策树法的有效技术创新识别认定研究

    Institute of Scientific and Technical Information of China (English)

    吴红; 李玉平; 常飞; 耿霞

    2012-01-01

    This paper first discusses the necessity of demonstrating the feasibility of a technological idea, the factors that influence feasibility, and the basic process of the demonstration. Then, after a brief analysis of the standards and methods for recognising and determining effective technology innovation, it takes a cost-benefit perspective and uses the decision tree method to construct a recognition and determination model for effective technology innovation. The model analyses information by backward induction on the decision tree, calculates the difference between the benefits and costs of the different actions, and recognises and determines effective technology innovation according to how closely this difference matches the enterprise's expected profit.
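
    Backward induction on a decision-analysis tree can be sketched as below. The node encoding, the payoff figures and the rule in `is_effective` (net value at least equal to the expected profit) are assumptions made for illustration; the abstract does not specify how the match with expected profit is scored.

```python
def rollback(node):
    """Net value (benefit minus cost) of a node by backward induction.

    A node is one of:
      ("payoff", benefit, cost)
      ("chance", [(probability, child), ...])
      ("decision", {action_name: child, ...})
    """
    kind = node[0]
    if kind == "payoff":
        _, benefit, cost = node
        return benefit - cost
    if kind == "chance":
        # Probability-weighted average of the children.
        return sum(p * rollback(child) for p, child in node[1])
    if kind == "decision":
        # The decision maker picks the most valuable action.
        return max(rollback(child) for child in node[1].values())
    raise ValueError(f"unknown node kind: {kind!r}")


def best_action(decision):
    """Name of the action with the highest rolled-back value."""
    return max(decision[1], key=lambda a: rollback(decision[1][a]))


def is_effective(decision, expected_profit):
    """Assumed rule: effective if the net value reaches the expected profit."""
    return rollback(decision) >= expected_profit


# Hypothetical innovation project: develop (uncertain outcome) or abandon.
project = ("decision", {
    "develop": ("chance", [
        (0.6, ("payoff", 500, 200)),
        (0.4, ("payoff", 100, 200)),
    ]),
    "abandon": ("payoff", 0, 0),
})
print(best_action(project), rollback(project))
```

    Here "develop" rolls back to 0.6 × 300 + 0.4 × (−100) = 140, so it is preferred to abandoning the project.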

  2. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Bøcher, Peder Klith; Greve, Mette Balslev;

    2010-01-01

    distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index......) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field...... measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding...

  3. 基于改进决策树算法的Web数据库查询结果自动分类方法%A Categorization Approach Based on Adapted Decision Tree Algorithm for Web Databases Query Results

    Institute of Scientific and Technical Information of China (English)

    孟祥福; 马宗民; 张霄雁; 王星

    2012-01-01

    To deal with the problem that too many results are returned from a Web database in response to a user query, this paper proposes a novel approach, based on an adapted decision tree algorithm, for automatically categorizing Web database query results. The query history of all users in the system is analyzed offline, and semantically similar queries are merged into the same cluster. Next, a set of tuple clusters over the original data is generated according to the query clusters, each tuple cluster corresponding to one type of user preference. When a query arrives, a labeled, hierarchical categorization tree is built over the query results with the adapted decision tree algorithm, based on the tuple clusters generated offline; the tree lets the user quickly select and locate the information he or she needs by inspecting the labels. Experimental results demonstrate that the categorization approach has a lower navigational cost and better categorization effectiveness, and can effectively meet the personalized query needs of different types of users.

  4. Where boosted significances come from

    Science.gov (United States)

    Plehn, Tilman; Schichtel, Peter; Wiegand, Daniel

    2014-03-01

    In an era of increasingly advanced experimental analysis techniques it is crucial to understand which phase space regions contribute to a signal extraction from backgrounds. Based on the Neyman-Pearson lemma we compute the maximum significance for a signal extraction as an integral over phase space regions. We then study to what degree boosted Higgs strategies benefit ZH and tt̄H searches and which transverse momenta of the Higgs are most promising. We find that Higgs and top taggers are the appropriate tools, but would profit from a targeted optimization towards smaller transverse momenta. MadMax is available as an add-on to MadGraph 5.

  5. Positive Semidefinite Metric Learning with Boosting

    OpenAIRE

    Shen, Chunhua; Kim, Junae; Wang, Lei; Hengel, Anton van den

    2009-01-01

    The learning of appropriate distance metrics is a critical problem in image classification and retrieval. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. BoostMetric is instead based on a key observat...

  6. Adaptive Sampling for Large Scale Boosting

    OpenAIRE

    Dubout, Charles; Fleuret, Francois

    2014-01-01

    Classical Boosting algorithms, such as AdaBoost, build a strong classifier without concern for the computational cost. Some applications, in particular in computer vision, may involve millions of training examples and very large feature spaces. In such contexts, the training time of off-the-shelf Boosting algorithms may become prohibitive. Several methods exist to accelerate training, typically either by sampling the features or the examples used to train the weak learners. Even if some of th...

  7. Recursive bias estimation and L2 boosting

    Energy Technology Data Exchange (ETDEWEB)

    Hengartner, Nicolas W [Los Alamos National Laboratory; Cornillon, Pierre - Andre [INRA, FRANCE; Matzner - Lober, Eric [RENNE, FRANCE

    2009-01-01

    This paper presents a general iterative bias correction procedure for regression smoothers. This bias reduction scheme is shown to correspond operationally to the L2 Boosting algorithm and provides a new statistical interpretation for L2 Boosting. We analyze the behavior of the Boosting algorithm applied to common smoothers S, which we show depends on the spectrum of I - S. We present examples of common smoothers for which Boosting generates a divergent sequence. The statistical interpretation suggests combining the algorithm with an appropriate stopping rule for the iterative procedure. Finally we illustrate the practical finite sample performance of the iterative smoother via a simulation study.
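
    The correspondence between iterative bias correction and L2 Boosting can be sketched for a linear smoother written as a matrix S: each step smooths the current residuals and adds the result to the fit, so the residual after k corrections is (I - S)^(k+1) y. The three-point moving average used here is a hypothetical choice of smoother, not one taken from the paper.

```python
def matvec(S, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def moving_average_smoother(n):
    """Hypothetical linear smoother: 3-point moving average, cut at the ends."""
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        window = range(max(0, i - 1), min(n, i + 2))
        for j in window:
            S[i][j] = 1.0 / len(window)
    return S


def l2_boost(S, y, steps):
    """Start from the plain smooth S y, then add smoothed residuals `steps` times."""
    fit = matvec(S, y)
    for _ in range(steps):
        residual = [yi - fi for yi, fi in zip(y, fit)]
        correction = matvec(S, residual)  # estimate of the remaining bias
        fit = [fi + ci for fi, ci in zip(fit, correction)]
    return fit


y = [0.0, 1.0, 4.0, 9.0, 16.0]
S = moving_average_smoother(len(y))
print(l2_boost(S, y, steps=3))
```

    Because the residual is a power of I - S applied to y, whether the iteration settles or diverges is governed by the spectrum of I - S, which is why a stopping rule is needed in practice.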

  8. Talking Trees

    Science.gov (United States)

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  9. 基于决策树分类的云南省迪庆地区景观类型研究%Exploring Landscapes Based on Decision Tree Classification in the Diqin Region, Yunnan Province

    Institute of Scientific and Technical Information of China (English)

    李亚飞; 刘高焕; 黄翀

    2011-01-01

    Decision tree classification is a type of supervised classification method based on spatial data mining and knowledge discovery. In this paper, the authors examined the landscape pattern of the Diqing region of Yunnan province by building a classification decision tree from Landsat TM imagery and digital elevation models (DEMs). Subsequently, a landscape distribution map was made. To assess the reliability and robustness of the decision tree classification method, traditional supervised classification was also used to derive a landscape distribution map of the region. A large number of field sampling points covering the whole Diqing region, each recording geographic coordinates, elevation and a description of the major landscape type, were used to evaluate the accuracy of the two classification methods. Results indicate that the overall classification accuracies of the decision tree classification and the traditional supervised classification were 85.5% and 67.4%, respectively. The landscape distribution map derived by the decision tree classification method appears reliable in terms of the accuracy achieved. Several conclusions could be drawn from the derived landscape distribution map. Landscape types in the Diqing region primarily included valley shrub, coniferous forest, subalpine shrub meadow, alpine snow and ice, bare land, and water body, accounting for 5.5%, 36.16%, 3.4%, 3.7%, 25.4%, and 4.4% of the area of the region, respectively. Except for bare land and water body, the landscape types varied essentially with the elevation and aspect of the mountains. The landscape with the largest area was coniferous forest, which is consistent with the alpine and canyon landform. Coniferous forest was the major landscape in the region and was distributed above 3000 m above sea level. In terms of different elevations, the coniferous forest could be conceptually divided into three

  10. Characterization of African Bush Mango trees with emphasis on the differences between sweet and bitter trees in the Dahomey Gap (West Africa)

    NARCIS (Netherlands)

    Vihotogbe, R.

    2012-01-01

     African bush mango trees (ABMTs) are economically the most important species within the family of Irvingiaceae. They are priority trees producing non-timber forest products (NTFPs) and widely distributed in the humid lowland forests of West and Central Africa. To boost their production and dev

  11. The Research of Reliability of Trash E-mail Identifier Based on Decision Tree of Continuous Attributes%连续属性决策树所建立的垃圾邮件识别器的稳定性研究

    Institute of Scientific and Technical Information of China (English)

    王星; 谢邦昌

    2005-01-01

    Avoiding spam mail is one of the most critical problems in Internet technology. Finding the most important attribute, or combination of attributes, for telling normal email from spam is the bottleneck in spam discrimination. In recent years decision trees have become popular because they are easy to interpret and can output rules, and they have become a core technique for predicting spam. However, many well-known decision tree algorithms, such as C4.5 and CART, are not very robust, so their output is unstable, which disturbs the construction of the classifier. In this paper we study the robustness of the CART algorithm and point out the robustness problem that arises when a decision tree classifier is used to separate spam from normal email with interval attributes. We then use the bagging algorithm to obtain a more robust model while also improving on the performance of the initial models.
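
    Bagging stabilises an unstable tree learner by training it on bootstrap resamples and taking a majority vote. The sketch below uses a depth-one tree (a stump) on interval attributes as a small stand-in for CART; the function names, the toy data and the choice of 25 models are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Best single-threshold rule over all attributes (a depth-1 tree)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[j] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    return best[1:]


def predict_stump(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right


def bagging(X, y, n_models=25, seed=0):
    """Fit one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Majority vote over the individual stumps."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: one interval attribute, label 1 = spam, 0 = normal.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
models = bagging(X, y)
print(predict_bagged(models, [0.05]), predict_bagged(models, [0.95]))
```

    Individual stumps place their thresholds differently from one resample to the next; the vote averages out that variation.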

  12. 小波分析和决策树在低饱和度气层识别中的应用%Applying the wavelet analysis and decision tree to identify low-saturation natural gas

    Institute of Scientific and Technical Information of China (English)

    贺旭; 李雄炎; 周金煜; 于红岩

    2011-01-01

    Particular reservoir-forming conditions and low-amplitude structural traps have produced abundant low-saturation natural gas reservoirs in the Quaternary of the Sanhu area of the Qaidam Basin. Reservoirs are difficult to delineate accurately because of poor reservoir properties, thin reservoir thickness, and the limits imposed by surrounding rocks and logging instrument resolution. High shale content, high irreducible water saturation, high formation water salinity and clay minerals make the log curves ambiguous in low-saturation gas layers, so the identification of low-saturation natural gas is particularly difficult. To solve this problem, this work uses wavelet analysis to reconstruct the log curves and improve their vertical resolution, compares the result with imaging logging data, and uses the improved log curves to delineate reservoirs accurately. A decision tree is then used to build a predictive model of low-saturation natural gas, chosen for the transparency of its learning process and the intelligibility of its results. The predictive model is amended according to the actual characteristics of the reservoirs, achieving accurate identification of low-saturation natural gas. Practical application shows that wavelet analysis and decision trees can effectively solve the problems of reservoir delineation and low-saturation natural gas identification in the research area.

  13. 决策树在居民就诊影响因素研究中的应用%Application of decision tree in study of factors affecting residential medical treatment service

    Institute of Scientific and Technical Information of China (English)

    刘海霞; 钟晓妮; 周燕荣; 田考聪

    2011-01-01

    Objective: To understand the main factors affecting residents' use of medical treatment services in the Chongqing area, in order to meet the health service needs of more residents and improve health service utilisation. Methods: Considering the factors that affect different population groups in the Chongqing area and the different health policies they call for, a decision tree model of the factors affecting the rate at which residents seek medical care was constructed. Results: Among the 11,570 residents surveyed there were 2,447 consultations in total, with an average of 2.1 visits; the two-week consultation rate was 21.15% (12.58% in the city and 29.19% in rural areas), higher than the national average. Consultation rates by age group were high at both ends and low in the middle, and the differences between age groups were statistically significant (P<0.05). The decision tree had 17 nodes, corresponding to 17 classification rules. Its root node was occupation type, the variable with the greatest influence on the consultation rate. Occupation type, age, resident type, insurance status and annual household income had relatively large effects on consultation, and the selected factors affected different groups differently. Conclusion: Utilisation of medical treatment services by residents of the Chongqing area is relatively high, and the influencing factors differ between population groups; health service planning should therefore propose health policies targeted at the different groups.

  14. Research on Urban Water Body Extraction Using Knowledge-based Decision Tree%基于知识决策树的城市水体提取方法研究

    Institute of Scientific and Technical Information of China (English)

    陈静波; 刘顺喜; 汪承义; 尤淑撑; 王忠武

    2013-01-01

    To address the spectral confusion between urban water bodies and dark objects such as building shadows, asphalt roads and dense vegetation, a knowledge-based decision tree combining spectral and spatial features is constructed for urban water body extraction. First, dark objects in the urban environment are extracted using a reflectance threshold in the shortwave infrared (SWIR) band. Second, dense vegetation and asphalt roads are eliminated according to their reflectance in the near-infrared (NIR) and red bands respectively. Third, differences in spatial density are used to eliminate building shadows. Finally, an area threshold is used for supplementary recognition of water bodies. Compared with existing methods, the proposed method identifies the dark object types that need attention in urban water body extraction and analyses their features specifically, and it uses spatial density features described by density-based spatial clustering of applications with noise (DBSCAN) to discriminate urban water bodies from building shadows. An experiment on SPOT 5 multispectral imagery of urban Beijing gave a detection rate of 86.18% and a false alarm rate of 13.82%, indicating that the method is effective for extracting urban water bodies from medium-resolution multispectral imagery.

  15. Holy Trees

    OpenAIRE

    Elosua, Miguel

    2013-01-01

    Puxi's streets are lined with plane trees, especially in the former French Concession (and particularly in the Luwan and Xuhui districts). There are a few different varieties of plane tree, but the one found in Shanghai is the hybrid platane hispanica. In China they are called French Plane trees (faguo wutong - 法国梧桐), for they were first planted along the Avenue Joffre (now Huai Hai lu - 淮海路) in 1902 by the French. Their life span is long, over a thousand years, and they may grow as high as ...

  16. Face Alignment Using Boosting and Evolutionary Search

    NARCIS (Netherlands)

    Zhang, Hua; Liu, Duanduan; Poel, Mannes; Nijholt, Anton; Zha, H.; Taniguchi, R.-I.; Maybank, S.

    2010-01-01

    In this paper, we present a face alignment approach using granular features, boosting, and an evolutionary search algorithm. Active Appearance Models (AAM) integrate a shape-texture-combined morphable face model into an efficient fitting strategy, then Boosting Appearance Models (BAM) consider the f

  17. Fuzzy Decision Tree Based Inference Techniques for Network Forensic Analysis

    Institute of Scientific and Technical Information of China (English)

    刘在强; 林东岱; 冯登国

    2007-01-01

    Network forensics is an important extension to the existing security infrastructure and is becoming a research focus for forensic investigators and network security researchers. However, many challenges remain in conducting network forensics: the sheer amount of data generated by the network, the comprehensibility of evidence extracted from collected data, and the effectiveness of evidence analysis methods. To address these challenges, the authors exploit both the strong learning capability of fuzzy decision tree technology and the comprehensibility of its results to develop a fuzzy decision tree based network forensics system that helps an investigator analyze computer crime in network environments and automatically extract digital evidence. Experimental results for the method, and a comparison with existing methods, are presented. They show that the system can classify most kinds of network events (91.16% correct classification rate on average), provide comprehensible information to a forensic expert, and automate or semi-automate the process of forensic analysis.

  18. Redundant Data Mining Based on Residual Data Merging in Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王倩

    2014-01-01

    An optimized redundant data mining algorithm based on residual data merging is proposed. The training set is used to build a decision tree model, and the C4.5 decision tree model is used to model the main features of the redundant data. Under the principal-component feature decision tree, residual data merging is introduced and an accompanying tracking mode for residual features is set, so that information that traditional methods filter out is spliced together for tracking and positioning, which realizes optimized mining of redundant data features. The method is applied to network traffic time series data for network anomaly detection. Simulation results show that the new algorithm can extract redundant data features as useful detection features, that data mining efficiency is greatly improved, and that the approach promotes the mining and use of hidden features in massive data. The network traffic monitoring software designed on this basis can improve the effectiveness of network management and monitoring.

  19. Disturbance Risk Measure of Wind Farm Based on Decision Trees under Contingency

    Institute of Scientific and Technical Information of China (English)

    卓毅鑫; 徐铝洋; 张伟; 林湘宁; 李正天

    2015-01-01

    With the development of wind power and the growing scale of wind farms, the differences in spatial distribution between wind turbines are also increasing. In addition, wind turbine trip-off and damage accidents have occurred frequently because of severe wind conditions, with adverse impacts on the stable and safe operation of the power grid. It is therefore necessary to study online risk assessment methods for power systems with wind energy. Considering the spatial distribution of wind turbines, this paper proposes an online disturbance risk measure for wind farms based on decision trees, which can perform data mining on online information and make fast judgements on voltage violation and wind turbine trip-off under a contingency set. Based on the judgement of the decision trees, disturbance risk measure indices are proposed, which are visualized and provide supporting information for wind farm and power system operators. A wind farm case study verifies the effectiveness of the proposed method.

  20. ATLAS boosted object tagging 2

    CERN Document Server

    Caudron, Julien; The ATLAS collaboration

    2015-01-01

    A detailed study into the optimal techniques for identifying boosted hadronically decaying W or Z bosons is presented. Various algorithms for reconstructing, grooming and tagging bosonic jets are compared for W bosons with a wide range of transverse momenta using 8 TeV data and 8 TeV and 13 TeV MC simulations. In addition, given that a hadronic jet has been identified as resulting from the hadronic decay of a W or Z, a technique is developed to discriminate between W and Z bosons. The modeling of the tagging variables used in this technique is studied using 8 TeV pp collision data and systematic uncertainties for the tagger efficiency and fake rates are evaluated.

  1. Implementation of Fuzzy Logic controller in Photovoltaic Power generation using Boost Converter and Boost Inverter

    Directory of Open Access Journals (Sweden)

    Abubakkar Siddik A

    2012-06-01

    With increasing power demand and a shortage of conventional energy sources, researchers are focusing on renewable energy. The proposed solar power generation circuit consists of a solar array, a boost converter and a boost inverter. The low voltage of the photovoltaic array is boosted by a dc-dc boost converter to charge the battery, and the boost inverter converts this battery voltage to a high-quality sinusoidal ac voltage. The output of the boost inverter feeds an autonomous load without any intermediate conversion stage or filter. For boost converter operation, the duty cycle is varied through a fuzzy logic controller and a PWM block to regulate the converter output voltage. The total harmonic distortion (THD) of the ac voltage obtained with this configuration is quite acceptable. The proposed power generation system has several desirable features, such as low cost and compact size, because the number of switches is limited to four, against the six switches used in classical two-stage inverters.
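    The abstract does not give the converter equations. As a rough illustration of why varying the duty cycle regulates the output, the sketch below uses the textbook relation for an ideal, lossless boost converter in continuous conduction, V_out = V_in / (1 - D). The function names are hypothetical, and the fuzzy logic controller described in the paper is not modelled.

```python
def boost_output_voltage(v_in: float, duty: float) -> float:
    """Ideal boost converter output for input voltage v_in and duty cycle 0 <= duty < 1."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    return v_in / (1.0 - duty)


def duty_for_target(v_in: float, v_target: float) -> float:
    """Duty cycle an ideal boost converter needs to reach v_target (v_target >= v_in > 0)."""
    if v_in <= 0 or v_target < v_in:
        raise ValueError("a boost converter can only step the voltage up")
    return 1.0 - v_in / v_target
```

    A controller, fuzzy or otherwise, would adjust the duty cycle around the value returned by `duty_for_target` to compensate for losses and load changes that this ideal relation ignores.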

  2. Electron Tree

    DEFF Research Database (Denmark)

    Appelt, Ane L; Rønde, Heidi S

    2013-01-01

    The photo shows a close-up of a Lichtenberg figure – popularly called an “electron tree” – produced in a cylinder of polymethyl methacrylate (PMMA). Electron trees are created by irradiating a suitable insulating material, in this case PMMA, with an intense high energy electron beam. Upon discharge......, during dielectric breakdown in the material, the electrons generate branching chains of fractures on leaving the PMMA, producing the tree pattern seen. To be able to create electron trees with a clinical linear accelerator, one needs to access the primary electron beam used for photon treatments. We...... appropriated a linac that was being decommissioned in our department and dismantled the head to circumvent the target and ion chambers. This is one of 24 electron trees produced before we had to stop the fun and allow the rest of the accelerator to be disassembled....

  3. Feedback of trees on nitrogen mineralization to restrict the advance of trees in C4 savannahs.

    Science.gov (United States)

    Higgins, Steven I; Keretetse, Moagi; February, Edmund C

    2015-08-01

    Remote sensing studies suggest that savannahs are transforming into more tree-dominated states; however, progressive nitrogen limitation could potentially retard this putatively CO2-driven invasion. We analysed controls on nitrogen mineralization rates in savannah by manipulating rainfall and the cover of grass and tree elements against the backdrop of the seasonal temperature and rainfall variation. We found that the seasonal pattern of nitrogen mineralization was strongly influenced by rainfall, and that manipulative increases in rainfall could boost mineralization rates. Additionally, mineralization rates were considerably higher on plots with grasses and lower on plots with trees. Our findings suggest that shifting a savannah from a grass to a tree-dominated state can substantially reduce nitrogen mineralization rates, thereby potentially creating a negative feedback on the CO2-induced invasion of savannahs by trees.

  4. Soil surface roughness measuring method based on neural network and decision tree

    Institute of Scientific and Technical Information of China (English)

    李俐; 王荻; 潘彩霞; 王鹏新

    2015-01-01

    Soil surface roughness is one of the important indices commonly used to describe soil hydrological characteristics and Lambert characteristic. In microwave quantitative remote sensing application, it affects the microwave scattering values and therefore impacts the accuracy of soil moisture retrieved using microwave sensing data. Therefore, measuring soil surface roughness has become one of the research hotspots in the field of microwave remote sensing. Two kinds of techniques are used to calculate soil surface roughness, including contact method, such as the pin meter and profile meter, and non-contact method, such as ultrasonic measurement, laser scanning, three-dimensional photography, infrared measurement and radar measurement method. All these methods need some special device. The development of image processing technology and the popularization of digital camera provide a simple measuring method which only needs a reference whiteboard and a camera. However, the detailed scale information commonly used on the reference whiteboard increases the requirements for data acquisition and data processing. The purpose of this study is to provide a method to obtain the soil surface image with a simplified reference whiteboard and then to measure soil surface roughness in the presence of field environmental noise. Therefore, a simple image acquisition method is introduced and then an image processing method combining the neural network and the decision tree is proposed. The neural network is built to detect image edge points. To reduce the environmental noise effect, the input characteristic parameters of the neural network are selected carefully, which include not only gradient information, but also image direction and neighborhood consistency information. The cutting of the background section on the original image based on image edge detection result improves the computing speed effectively. A decision tree model is introduced to divide image segments into 4 classes

  5. Game tree algorithms and solution trees

    NARCIS (Netherlands)

    W.H.L.M. Pijls (Wim); A. de Bruin (Arie)

    1998-01-01

    In this paper, a theory of game tree algorithms is presented, entirely based upon the concept of solution tree. Two types of solution trees are distinguished: max and min trees. Every game tree algorithm tries to prune as many nodes as possible from the game tree. A cut-off criterion in
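    The truncated abstract does not name a specific algorithm. As a familiar example of pruning nodes from a game tree with a cut-off criterion, here is a minimal alpha-beta sketch. The nested-list tree encoding and the `visited` argument are assumptions made for illustration, not the paper's formulation in terms of solution trees.

```python
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf"), visited=None):
    """Minimax value of a game tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:     # cut-off: the min player will avoid this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:         # cut-off: the max player already has a better option
            break
    return value


seen = []
best = alphabeta([[3, 5], [2, 9]], visited=seen)
```

    In this small tree the leaf 9 is never evaluated: once the second min node sees 2, it cannot beat the 3 already guaranteed to the max player.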

  6. Stock Data Mining Based on C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王领; 胡扬

    2015-01-01

    Using data mining algorithms to analyze and forecast stocks still faces problems with data volume and technical indicators. Based on an analysis of stock market data, this paper selects certain indicators as decision attributes and uses the C4.5 decision tree classification algorithm to classify and forecast stocks. The paper mainly introduces and optimizes stock technical indicators and improves the efficiency of the C4.5 algorithm. The improved algorithm, combined with the optimized indicators, not only increases the execution efficiency of data mining but also yields better returns in stock forecasting.
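    C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes it for one categorical attribute. The indicator values and labels in the usage lines are made-up toy data, not the paper's, and the paper's efficiency improvements are not reproduced.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """C4.5 gain ratio of a categorical attribute with respect to class labels."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a trend indicator and a buy/sell label.
trend = ["up", "up", "down", "down"]
noise = ["a", "b", "a", "b"]
label = ["buy", "buy", "sell", "sell"]
informative = gain_ratio(trend, label)
uninformative = gain_ratio(noise, label)
```

    Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favour.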

  7. Orthodontics Align Crooked Teeth and Boost Self-Esteem

    Science.gov (United States)


  8. The Decision Tree-based Path for Selecting Virtual Consulting Team Members

    Institute of Scientific and Technical Information of China (English)

    尚珊; 胡贵玲; 崔洁

    2012-01-01

    This paper explains the important role of the virtual consulting team in the development of the virtual consulting enterprise. Through a comparative analysis of virtual consulting enterprises with conventional consulting enterprises and with virtual enterprises, it discusses the problems that virtual consulting enterprises currently face and argues that virtual team cooperation is an important way to solve them. The paper describes the selection process for virtual consulting team cooperation and, for the first time, proposes a specific practice for using a decision tree to select team members.

  9. Application of decision tree classification to rubber plantations extraction with remote sensing

    Institute of Scientific and Technical Information of China (English)

    刘晓娜; 封志明; 姜鲁光

    2013-01-01

    Based on Landsat remote sensing image data and MODIS-NDVI data, rubber plantations were extracted by the decision tree classification method in BRCLM using spectral features and texture characteristics. The results showed that: (1) On account of spectral differences between rubber forests at different growth stages, rubber plantations could be extracted separately as young rubber forest (<10 years) and mature rubber forest (≥10 years). The optimum temporal window for discriminating rubber plantations was from early January to late March, which is especially appropriate for mature rubber forest. Mature rubber forest, dry land with high vegetation cover, and forest land were prone to misclassification. Meanwhile, young rubber forest, tea plantation, shrubland and grassland were confused with one another in their spectral characteristics according to the NDVI index. (2) Based on the original spectral characteristics, normalized indices, K-T transform indices, and texture features, decision tree classification models were established for young rubber forest and mature rubber forest respectively. The overall accuracy exceeded 90% for mature rubber forest and 75% for young rubber forest, which means that the decision tree method was better suited to mature rubber forest extraction. Rubber plantation distribution maps for 1980, 1990, and 2000 were obtained using the established decision tree models with high classification accuracy, which indicates that the models are simple and efficient for extracting rubber plantations in tropical areas. This is an effective method for perennial vegetation extraction and classification accuracy verification. (3) From 1980 to 2010, the area of rubber plantations in BRCLM increased nearly ninefold, from 705 km² to 6 014 km², and the young rubber forest expanded faster than the mature rubber forest. National differences in rubber plantations in BRCLM were significant, and the cross-border planting

  10. Application of decision tree ID3 algorithm in classification of customer information

    Institute of Scientific and Technical Information of China (English)

    吴建源

    2014-01-01

    In modern enterprises, how to retain customers is an important research direction in customer management. This paper uses the decision tree ID3 algorithm to analyze customer attributes, classify customer information, identify the characteristics of each class of customer, and improve customer relationships in a targeted way, so as to avoid customer loss and increase market share.

  11. Bank Customer Churn Decision Tree Prediction Algorithm under Data Mining Technology

    Institute of Scientific and Technical Information of China (English)

    石杨; 岳嘉佳

    2014-01-01

    In bank customer churn prediction systems, customer data are often used to predict service information for unknown customers, in order to provide a basis for the bank's future business strategy. In making such predictions, it is often necessary to mine classification rules for one of the customers' categorical attributes. This paper mainly discusses the use of decision trees, a common and effective method, to mine classification rules from customer data.

  12. Tax Inspection Research Based on C5.0 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    陈仕鸿; 刘晓庆

    2011-01-01

    The principle of the C5.0 decision tree is briefly analyzed and applied to tax inspection. Using a C5.0 decision tree model, the financial statements and tax declarations of 80 commercial enterprises are analyzed, and the results are compared with binary logistic regression. The results show that the model can assist in selecting cases for inspection and improve the efficiency and effectiveness of case selection.

  13. Big Fish and Prized Trees Gain Protection

    Institute of Scientific and Technical Information of China (English)

    Fred Pearce; 吴敏

    2004-01-01

    Decisions made at a key conservation meeting are good news for big and quirky fish and commercially prized trees. Several species will enjoy extra protection against trade following rulings made at the Convention on International Trade in Endangered Species (CITES).

  14. Riemann curvature of a boosted spacetime geometry

    Science.gov (United States)

    Battista, Emmanuele; Esposito, Giampiero; Scudellaro, Paolo; Tramontano, Francesco

    2016-10-01

    The ultrarelativistic boosting procedure has been applied in the literature to map the metric of Schwarzschild-de Sitter spacetime into a metric describing de Sitter spacetime plus a shock-wave singularity located on a null hypersurface. This paper evaluates the Riemann curvature tensor of the boosted Schwarzschild-de Sitter metric by means of numerical calculations, which make it possible to reach the ultrarelativistic regime gradually by letting the boost velocity approach the speed of light. Thus, for the first time in the literature, the singular limit of curvature, through Dirac’s δ distribution and its derivatives, is numerically evaluated for this class of spacetimes. Moreover, the analysis of the Kretschmann invariant and the geodesic equation shows that the spacetime possesses a “scalar curvature singularity” within a 3-sphere and it is possible to define what we here call “boosted horizon”, a sort of elastic wall where all particles are surprisingly pushed away, as numerical analysis demonstrates. This seems to suggest that such “boosted geometries” are ruled by a sort of “antigravity effect” since all geodesics seem to refuse to enter the “boosted horizon” and are “reflected” by it, even though their initial conditions are aimed at driving the particles toward the “boosted horizon” itself. Eventually, the equivalence with the coordinate shift method is invoked in order to demonstrate that all δ² terms appearing in the Riemann curvature tensor give vanishing contribution in distributional sense.

  15. (In)direct Detection of Boosted Dark Matter

    CERN Document Server

    Agashe, Kaustubh; Necib, Lina; Thaler, Jesse

    2014-01-01

    We initiate the study of novel thermal dark matter (DM) scenarios where present-day annihilation of DM in the galactic center produces boosted stable particles in the dark sector. These stable particles are typically a subdominant DM component, but because they are produced with a large Lorentz boost in this process, they can be detected in large volume terrestrial experiments via neutral-current-like interactions with electrons or nuclei. This novel DM signal thus combines the production mechanism associated with indirect detection experiments (i.e. galactic DM annihilation) with the detection mechanism associated with direct detection experiments (i.e. DM scattering off terrestrial targets). Such processes are generically present in multi-component DM scenarios or those with non-minimal DM stabilization symmetries. As a proof of concept, we present a model of two-component thermal relic DM, where the dominant heavy DM species has no tree-level interactions with the standard model and thus largely evades dir...

  16. Doubly robust survival trees.

    Science.gov (United States)

    Steingrimsson, Jon Arni; Diao, Liqun; Molinaro, Annette M; Strawderman, Robert L

    2016-09-10

    Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing 'loss based' recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new 'doubly robust' extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27037609

  17. Boosting Wigner's nj-symbols

    CERN Document Server

    Speziale, Simone

    2016-01-01

    We study the SL(2,C) Clebsch-Gordan coefficients appearing in the Lorentzian EPRL spin foam amplitudes for loop quantum gravity. We show how the amplitudes decompose into SU(2) nj-symbols at the vertices and integrals over boosts at the edges. The integrals define edge amplitudes that can be evaluated analytically using and adapting results in the literature, leading to a pure state sum model formulation. This procedure introduces virtual representations which, in a manner reminiscent of virtual momenta in Feynman amplitudes, are off-shell of the simplicity constraints present in the theory, but with integrands that peak at the on-shell values. We point out some properties of the edge amplitudes which are helpful for numerical and analytical evaluations of spin foam amplitudes, and suggest among other things a simpler model useful for calculations of certain lowest order amplitudes. As an application, we estimate the large spin scaling behaviour of the simpler model, on a closed foam with all 4-valent edg...

  18. Modeling of asymmetrical boost converters

    Directory of Open Access Journals (Sweden)

    Eliana Isabel Arango Zuluaga

    2014-03-01

    The asymmetrical interleaved dual boost (AIDB) is a fifth-order DC/DC converter designed to interface photovoltaic (PV) panels. The AIDB injects small current harmonics into the PV panels, reducing the power losses caused by the converter operation. Moreover, the AIDB provides a large voltage conversion ratio, which is required to step up the PV voltage to the large dc-link voltage used in grid-connected inverters. To reject irradiance and load disturbances, the AIDB must be operated in closed loop, and a dynamic model is required. Given that the AIDB converter operates in Discontinuous Conduction Mode (DCM), classical modeling approaches based on Continuous Conduction Mode (CCM) are not valid. Moreover, classical DCM modeling techniques are not suitable for the AIDB converter. Therefore, this paper develops a novel mathematical model for the AIDB converter, which is suitable for control purposes. The proposed model is based on the calculation of a diode current that is typically disregarded. Moreover, because the traditional correction to the second duty cycle reported in the literature is not effective, a new equation is designed. The model accuracy is compared against circuit simulations in the time and frequency domains, with satisfactory results. Finally, the usefulness of the model in control applications is illustrated with an application example.

  19. Avoiding Anemia: Boost Your Red Blood Cells

    Science.gov (United States)

    If you’re ... and sluggish, you might have a condition called anemia. Anemia is a common blood disorder that many ...

  20. Anemia Boosts Stroke Death Risk, Study Finds

    Science.gov (United States)

    ... page: https://medlineplus.gov/news/fullstory_160476.html Anemia Boosts Stroke Death Risk, Study Finds Blood condition ... 2016 (HealthDay News) -- Older stroke victims suffering from anemia -- a lack of red blood cells -- may have ...