WorldWideScience

Sample records for boosted decision trees

  1. Reweighting with Boosted Decision Trees

    CERN Document Server

    Rogozhnikov, A

    2016-01-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in software triggers. In most cases, these are classification models used to select the "signal" events from data. Monte Carlo simulated events typically take part in the training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use the available simulation in training, corrections must be introduced to the generated data. One common approach is reweighting: assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of the reweighting step in analyses is also discussed.
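
    A common classifier-based variant of this idea can be sketched in a few lines: train a BDT to separate simulation from data, then use the odds ratio p/(1-p) as per-event weights. This is a generic density-ratio sketch with scikit-learn on toy Gaussian samples, not the paper's specific algorithm (which boosts the reweighting rule directly); all numbers are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Toy setup: "simulation" and "data" differ by a shift in one feature.
sim = rng.normal(0.0, 1.0, size=(5000, 1))
data = rng.normal(0.5, 1.0, size=(5000, 1))

# Train a BDT to distinguish simulated events (0) from data events (1).
X = np.vstack([sim, data])
y = np.concatenate([np.zeros(len(sim)), np.ones(len(data))])
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X, y)

# The odds ratio p/(1-p) estimates the density ratio data/simulation,
# i.e. the per-event weight that corrects the simulation.
p = np.clip(clf.predict_proba(sim)[:, 1], 1e-6, 1 - 1e-6)
weights = p / (1.0 - p)
weights *= len(sim) / weights.sum()   # keep the total event yield fixed

# The reweighted simulation mean should move toward the data mean (0.5).
print(sim.mean(), np.average(sim.ravel(), weights=weights))
```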

  2. Studies of Boosted Decision Trees for MiniBooNE Particle Identification

    OpenAIRE

    Yang, Hai-Jun; Roe, Byron P.; Zhu, Ji

    2005-01-01

    Boosted decision trees are applied to particle identification in the MiniBooNE experiment, operated at Fermi National Accelerator Laboratory (Fermilab) to search for neutrino oscillations. Numerous attempts are made to tune the boosted decision trees, to compare the performance of various boosting algorithms, and to select input variables for optimal performance.

  3. Supervised hashing using graph cuts and boosted decision trees.

    Science.gov (United States)

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data. PMID:26440270
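
    The two-step decomposition can be illustrated with a minimal, hypothetical sketch: pretend the target binary codes from step one (which the paper infers with graph cuts) are already available, and train one boosted-tree classifier per hash bit for step two. The fake codes, toy data, and settings below are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 10))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)

# Step 1 (placeholder): target binary codes. The paper infers these by graph
# cuts on a submodular binary quadratic problem; here we fake a 4-bit code
# from the label plus random sign flips, just to exercise step 2.
n_bits = 4
codes = np.tile(labels[:, None], (1, n_bits))
flip = rng.random(codes.shape) < 0.05
codes = np.where(flip, 1 - codes, codes)

# Step 2: learn one boosted-tree hash function per bit (a standard
# binary classification problem for each bit).
hash_fns = []
for b in range(n_bits):
    clf = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=b)
    clf.fit(X, codes[:, b])
    hash_fns.append(clf)

def hash_codes(Xq):
    """Map raw features to compact binary codes, one column per bit."""
    return np.column_stack([h.predict(Xq) for h in hash_fns]).astype(int)

binary = hash_codes(X)
print(binary.shape)
```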

  4. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    Science.gov (United States)

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of the Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of the proposed method, it is implemented on 1 microarray dataset and 5 different medical datasets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest Neighbor). Repeated five-fold cross-validation was used to assess the performance of the classifiers. Experimental results show that the proposed method not only improves on the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods. PMID:26737960
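
    A stripped-down sketch of the idea, binary particle-swarm search over feature subsets scored by the cross-validated accuracy of a boosted tree, might look as follows. scikit-learn's gradient boosting stands in for the proprietary Boosted C5.0, and the toy data, swarm size, and constants are all illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, d = 200, 8
X = rng.normal(size=(n, d))
y = (X[:, 0] - X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

def fitness(mask):
    """Cross-validated accuracy of a boosted tree on the selected features."""
    if not mask.any():
        return 0.0
    clf = GradientBoostingClassifier(n_estimators=30, max_depth=2, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Minimal binary PSO: a particle position holds one probability per feature;
# thresholding at 0.5 gives the feature subset it represents.
n_particles, n_iter = 10, 5
pos = rng.random((n_particles, d))
vel = np.zeros((n_particles, d))
pbest = pos.copy()
pbest_fit = np.array([fitness(p > 0.5) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, d))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 1)
    fits = np.array([fitness(p > 0.5) for p in pos])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

selected = gbest > 0.5
print(selected, fitness(selected))
```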

  5. Effect of training characteristics on object classification: an application using Boosted Decision Trees

    CERN Document Server

    Sevilla-Noarbe, Ignacio

    2015-01-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same selection efficiency. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance.

  6. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    Science.gov (United States)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same selection efficiency. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.
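
    A toy version of such a BDT-based star/galaxy separation, with made-up catalog features standing in for the actual SDSS DR9 quantities, could look like this:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 4000
# Fabricated catalog features standing in for e.g. size, concentration, magnitude.
is_galaxy = rng.random(n) < 0.5
size = np.where(is_galaxy, rng.normal(2.0, 0.8, n), rng.normal(1.0, 0.5, n))
conc = np.where(is_galaxy, rng.normal(3.0, 1.0, n), rng.normal(2.2, 1.0, n))
mag = rng.normal(20.0, 1.5, n)          # uninformative here, by construction
X = np.column_stack([size, conc, mag])
y = is_galaxy.astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
bdt = AdaBoostClassifier(n_estimators=200, random_state=0)
bdt.fit(Xtr, ytr)

# Impurity of the galaxy sample: fraction of selected "galaxies" that are stars.
sel = bdt.predict(Xte) == 1
impurity = 1.0 - yte[sel].mean()
print(round(impurity, 3))
```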

  7. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Zhiyi [China Inst. of Atomic Energy (CIAE), Beijing (China)]

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element Vtb and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb-1 of Tevatron Run II data in p$\bar{p}$ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed to discriminate the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as $3.4^{+2.0}_{-1.8}$ pb. The result in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the combined electron, muon and tau analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations for electron and muon alone. The measured cross section in the three combined final states is σ(p$\bar{p}$ → tb + X, tqb + X) = $3.84^{+0.89}_{-0.83}$ pb. A lower limit on |Vtb| is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  8. Bagged Boosted Trees for Classification of Ecological Momentary Assessment Data

    OpenAIRE

    Spanakis, Gerasimos; Weiss, Gerhard; Roefs, Anne

    2016-01-01

    Ecological Momentary Assessment (EMA) data is organized in multiple levels (per-subject, per-day, etc.) and this particular structure should be taken into account in machine learning algorithms used in EMA, like decision trees and their variants. We propose a new algorithm called BBT (standing for Bagged Boosted Trees) that is enhanced by an over/under-sampling method and can provide better estimates for the conditional class probability function. Experimental results on a real-world dataset show...

  9. Measurement of the t-channel single top-quark production using boosted decision trees in ATLAS experiment at √(s)=7 TeV

    International Nuclear Information System (INIS)

    This thesis presents a measurement of the cross section of t-channel single top-quark production using 1.04 fb-1 of data collected by the ATLAS detector at the LHC in proton-proton collisions at a center-of-mass energy of √(s)=7 TeV. Selected events contain one lepton, missing transverse energy, and two or three jets, one of them b-tagged. The background model consists of multi-jet, W+jets and top-quark pair events, with smaller contributions from Z+jets and di-boson events. Using a selection based on the distribution of a multivariate discriminant constructed with boosted decision trees, the cross section of t-channel single top-quark production is measured: σt = $97.3^{+30.7}_{-30.2}$ pb, in good agreement with the prediction of the Standard Model. Assuming that the top-quark-related CKM matrix elements obey the relation |Vtb| >> |Vts|, |Vtd|, the coupling strength at the Wtb vertex is extracted from the measured cross section: |Vtb| = $1.23^{+0.20}_{-0.19}$. If it is assumed that |Vtb| ≤ 1, a lower limit of |Vtb| > 0.61 is obtained at the 95% confidence level. (author)

  10. Boosting bonsai trees for handwritten/printed text discrimination

    Science.gov (United States)

    Ricquebourg, Yann; Raymond, Christian; Poirriez, Baptiste; Lemaitre, Aurélie; Coüasnon, Bertrand

    2013-12-01

    Boosting over decision stumps has proved its efficiency in Natural Language Processing, essentially with symbolic features, and its good properties (fast; few, non-critical parameters; not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigate the use of boosting over small decision trees for image classification, specifically for the discrimination of handwritten versus printed text. We conducted experiments comparing it to the usual SVM-based classification, revealing convincing results: very close performance, but with faster predictions and far less black-box behavior. These promising results encourage the use of this classifier in more complex recognition tasks such as multiclass problems.
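
    To see why boosting small trees rather than bare stumps can matter, here is a hand-rolled discrete AdaBoost (a sketch, not the authors' implementation) on a toy XOR-style problem: an ensemble of axis-aligned stumps is an additive model and cannot represent the interaction, while slightly deeper "bonsai" trees can.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 5))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)   # XOR-like: needs an interaction

def boost(X, y, depth, rounds=40):
    """Discrete AdaBoost over depth-limited trees ("bonsai" trees)."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(rounds):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X, y, sample_weight=w)
        pred = tree.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)       # up-weight the mistakes
        w /= w.sum()
        learners.append(tree)
        alphas.append(alpha)
    def predict(Xq):
        score = sum(a * t.predict(Xq) for a, t in zip(alphas, learners))
        return np.where(score >= 0, 1, -1)
    return predict

acc_stumps = (boost(X, y, depth=1)(X) == y).mean()
acc_bonsai = (boost(X, y, depth=3)(X) == y).mean()
print(acc_stumps, acc_bonsai)
```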

  11. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Monica [Brown Univ., Providence, RI (United States)

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb-1 of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism σ(p$\bar{p}$ → tb + X, tqb + X) = $4.30^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of W' using the same samples as the electroweak produced top quark analysis. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found, and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using the decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the...

  12. Human decision error (HUMDEE) trees

    International Nuclear Information System (INIS)

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees, or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of action were available, and for training operators. The trees also have a consequence component, so that not only the decision but also the consequence of that decision can be explored.

  13. Human decision error (HUMDEE) trees

    Energy Technology Data Exchange (ETDEWEB)

    Ostrom, L.T.

    1993-08-01

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees, or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of action were available, and for training operators. The trees also have a consequence component, so that not only the decision but also the consequence of that decision can be explored.

  14. Objective consensus from decision trees

    International Nuclear Information System (INIS)

    Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information: objective consensus based on recommendations in decision tree format from multiple sources. Based on nine sample recommendations in decision tree format, a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format, and analyzed in order to determine the objective consensus. Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage), resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.
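
    The mode-recommendation procedure can be sketched as follows, with entirely hypothetical trees and treatment labels, and binary parameters instead of the paper's three-level ones for brevity: enumerate every parameter combination, query each centre's tree, and keep the most common answer.

```python
from collections import Counter
from itertools import product

# Three hypothetical decision trees, each mapping (gleason_high, psa_high,
# t_stage_high) to a treatment recommendation. Names are illustrative only.
def tree_a(g, p, t): return "RT+ADT" if g or p else "RT"
def tree_b(g, p, t): return "RT+ADT" if (g and p) or t else "RT"
def tree_c(g, p, t): return "RT+ADT" if g else "RT"

trees = [tree_a, tree_b, tree_c]

# For every permutation of parameters, collect one vote per tree and
# record the mode recommendation together with its vote count.
consensus = {}
for combo in product([False, True], repeat=3):
    votes = Counter(tree(*combo) for tree in trees)
    rec, count = votes.most_common(1)[0]
    consensus[combo] = (rec, count)

print(consensus[(True, True, True)])
```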

  15. Algorithms for Decision Tree Construction

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms are based on the separation heuristic [13, 31], which at each step tries to divide the set of objects as evenly as possible. Later, Garey and Graham [28] showed that such an algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest [35] proved the NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and uniform probability distribution. Cox et al. [22] showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.

  16. A Comparison of Boosting Tree and Gradient Treeboost Methods for Carpal Tunnel Syndrome

    Directory of Open Access Journals (Sweden)

    Gülhan OREKİCİ TEMEL

    2014-10-01

    Objective: Boosting is one of the most successful combining methods. The principal aim of these combining algorithms is to obtain a strong classifier with small estimation error from a combination of weak classifiers. Boosting based on combining trees has many advantages: data sets can contain mixtures of nominal, ordinal and numerical variables. AdaBoost and Gradient TreeBoost are commonly used boosting procedures. Both methods are stage-wise additive model-fitting procedures. Our goal in this study is to explain both methods and to compare the algorithms' results on a neurology data set for the purpose of classification. Material and Methods: The data set consists of 4076 incidences in total. The condition of being a patient with Carpal Tunnel Syndrome (CTS) or not was considered as the dependent variable. Boosting Tree and Gradient TreeBoost applications were conducted in Statistica 7.0 and Salford Predictive Modeler: TreeNet (trial version 6.6.0.091). Results: In the AdaBoost and Gradient TreeBoost algorithms, multiple trees are grown on the training data. 200 trees were produced for both models. 70 trees in the AdaBoost algorithm and 196 trees in the Gradient TreeBoost algorithm were chosen as the optimal trees. Conclusion: The high sensitivity and specificity values of Gradient TreeBoost on the test data indicate that it can be used as a successful method in CTS diagnosis. It is believed that boosting methods will become more and more popular in health science due to their easy implementation and high predictive performance.
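
    A comparable head-to-head comparison can be reproduced with open-source tooling; a minimal sketch on synthetic data (standing in for the CTS dataset, which is not public) might be:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the CTS data: a noisy binary diagnosis problem.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=0)

# AdaBoost (discrete boosting of stumps) vs. gradient tree boosting,
# both scored by cross-validated accuracy.
ada = AdaBoostClassifier(n_estimators=200, random_state=0)
gbt = GradientBoostingClassifier(n_estimators=200, random_state=0)

ada_acc = cross_val_score(ada, X, y, cv=5).mean()
gbt_acc = cross_val_score(gbt, X, y, cv=5).mean()
print(round(ada_acc, 3), round(gbt_acc, 3))
```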

  17. Meta-learning in decision tree induction

    CERN Document Server

    Grąbczewski, Krzysztof

    2014-01-01

    The book focuses on different variants of decision tree induction, but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. It represents a useful source of information for readers wishing to review some of the techniques used in decision tree learning, as well as the different ensemble methods that involve decision trees. It is shown that the knowledge of the different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms, with the aim of identifying the top performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...

  18. Representing Boolean Functions by Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram, or branching program, is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than a table of values or even a formula [44]. Representing a function in the form of a decision tree allows applying graph algorithms for various transformations [10]. Decision trees and branching programs are used for effective hardware [15] and software [5] implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of a decision tree characterizes the expected computing time, and the number of nodes in a branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal on both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.

  19. Prediction of fishing effort distributions using boosted regression trees.

    Science.gov (United States)

    Soykan, Candan U; Eguchi, Tomoharu; Kohin, Suzanne; Dewar, Heidi

    2014-01-01

    Concerns about bycatch of protected species have become a dominant factor shaping fisheries management. However, efforts to mitigate bycatch are often hindered by a lack of data on the distributions of fishing effort and protected species. One approach to overcoming this problem has been to overlay the distribution of past fishing effort with known locations of protected species, often obtained through satellite telemetry and occurrence data, to identify potential bycatch hotspots. This approach, however, generates static bycatch risk maps, calling into question their ability to forecast into the future, particularly when dealing with spatiotemporally dynamic fisheries and highly migratory bycatch species. In this study, we use boosted regression trees to model the spatiotemporal distribution of fishing effort for two distinct fisheries in the North Pacific Ocean, the albacore (Thunnus alalunga) troll fishery and the California drift gillnet fishery that targets swordfish (Xiphias gladius). Our results suggest that it is possible to accurately predict fishing effort using < 10 readily available predictor variables (cross-validated correlations between model predictions and observed data of ~0.6). Although the two fisheries are quite different in their gears and fishing areas, their respective models had high predictive ability, even when input data sets were restricted to a fraction of the full time series. The implications for conservation and management are encouraging: across a range of target species, fishing methods, and spatial scales, even a relatively short time series of fisheries data may suffice to accurately predict the location of fishing effort into the future. In combination with species distribution modeling of bycatch species, this approach holds promise as a mitigation tool when observer data are limited. Even in data-rich regions, modeling fishing effort and bycatch may provide more accurate estimates of bycatch risk than partial observer coverage.
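
    A minimal sketch of the modeling approach, fitting boosted regression trees to synthetic "effort" data and checking the correlation between predictions and held-out observations, might look like this (all predictors and the response are fabricated for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 3000
# Toy predictors standing in for e.g. sea-surface temperature, chlorophyll, month.
sst = rng.uniform(10, 25, n)
chl = rng.uniform(0, 2, n)
month = rng.integers(1, 13, n)
# Synthetic "effort" peaking at mid-range SST and in summer months, plus noise.
effort = (np.exp(-((sst - 18.0) ** 2) / 8.0) * (1 + chl)
          * (1 + np.sin(2 * np.pi * month / 12)) + rng.normal(0, 0.1, n))

X = np.column_stack([sst, chl, month])
Xtr, Xte, ytr, yte = train_test_split(X, effort, random_state=0)

brt = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                learning_rate=0.05, random_state=0)
brt.fit(Xtr, ytr)

# Cross-validated-style check: correlation on held-out data.
r = np.corrcoef(brt.predict(Xte), yte)[0, 1]
print(round(r, 2))
```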

  20. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
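
    The "integral image" trick behind those cheap features is simple to state: precompute cumulative sums once, after which any rectangular box sum costs at most four lookups. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[r, c] = sum of img[:r+1, :c+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) using the integral image."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
# Box sum over img[1:3, 1:3] via four lookups vs. the direct sum.
print(box_sum(ii, 1, 1, 3, 3), img[1:3, 1:3].sum())
```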

  1. Measuring Intuition: Nonconscious Emotional Information Boosts Decision Accuracy and Confidence.

    Science.gov (United States)

    Lufityanto, Galang; Donkin, Chris; Pearson, Joel

    2016-05-01

    The long-held popular notion of intuition has garnered much attention both academically and popularly. Although most people agree that there is such a phenomenon as intuition, involving emotionally charged, rapid, unconscious processes, little compelling evidence supports this notion. Here, we introduce a technique in which subliminal emotional information is presented to subjects while they make fully conscious sensory decisions. Our behavioral and physiological data, along with evidence-accumulator models, show that nonconscious emotional information can boost accuracy and confidence in a concurrent emotion-free decision task, while also speeding up response times. Moreover, these effects were contingent on the specific predictive arrangement of the nonconscious emotional valence and motion direction in the decisional stimulus. A model that simultaneously accumulates evidence from both physiological skin conductance and conscious decisional information provides an accurate description of the data. These findings support the notion that nonconscious emotions can bias concurrent nonemotional behavior, a process of intuition. PMID:27052557

  2. Algorithms for optimal dyadic decision trees

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low-dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  3. Using Decision Trees for Coreference Resolution

    CERN Document Server

    McCarthy, Joseph F.; Lehnert, Wendy G.

    1995-01-01

    This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.

  4. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions

    KAUST Repository

    Azad, Mohammad

    2014-09-13

    The paper is devoted to the analysis of greedy algorithms for the minimization of the average depth of decision trees for decision tables in which each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. When compared with the optimal result obtained from a dynamic programming algorithm, some greedy algorithms produce results that are close to optimal for the minimization of the average depth of decision trees.

  5. Decision Tree Based Algorithm for Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Kajal Rai

    2016-01-01

    An Intrusion Detection System (IDS) is a defense measure that supervises activities of the computer network and reports malicious activities to the network administrator. Intruders make many attempts to gain access to the network and try to harm the organization's data. Security is thus the most important aspect for any type of organization. For these reasons, intrusion detection has been an important research issue. An IDS can be broadly classified as signature-based or anomaly-based. In our proposed work, the decision tree algorithm is developed based on the C4.5 decision tree approach. Feature selection and split value are important issues for constructing a decision tree. In this paper, the algorithm is designed to address these two issues. The most relevant features are selected using information gain, and the split value is selected in such a way that it makes the classifier unbiased towards the most frequent values. Experimentation is performed on the NSL-KDD (Network Security Laboratory Knowledge Discovery and Data Mining) dataset for varying numbers of features. The time taken by the classifier to construct the model and the accuracy achieved are analyzed. It is concluded that the proposed Decision Tree Split (DTS) algorithm can be used for signature-based intrusion detection.
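
    The information-gain criterion used for feature selection can be sketched directly; the toy feature and labels below are illustrative, not NSL-KDD attributes:

```python
import numpy as np

def entropy(y):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(x, y, threshold):
    """Gain from splitting labels y on feature x at the given threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return entropy(y) - weighted

# Toy feature ("bytes transferred") and labels: 0 = normal, 1 = attack.
x = np.array([10, 12, 11, 13, 90, 95, 88, 92], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A split at 50 separates the classes perfectly; a split at 11.5 does not.
print(information_gain(x, y, 50.0), information_gain(x, y, 11.5))
```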

  6. Fuzzy Decision Tree based Effective IMine Indexing

    Directory of Open Access Journals (Sweden)

    Peer Fatima,

    2012-02-01

Full Text Available A database management system is a set of programs that allows storing, modifying, and retrieving information from a database. With the huge increase in the amount of information, managing these databases has become very difficult, so there is a need for an effective indexing technique; the advantage of an index is that it makes search operations very fast. This paper proposes the IMine index (a common and compressed structure), which provides close integration of the itemset mining structure by using a Fuzzy Decision Tree (FDT) and an I-Tree. Previous approaches have used the Prefix Hash Tree (PHT) and the FP-Bonsai Tree, but these exhibit long delays and unnecessary use of the available memory. The FDT uses rules to generate the tree structure, so the index is easy to read based on the rules, and the FDT allows selective reading of the I-Tree. The experimental results prove that the use of the FDT in IMine provides low reading cost, very low utilization of the available memory, and hence very low computation time.

  7. On the Complexity of Decision Making in Possibilistic Decision Trees

    CERN Document Server

    Fargier, Helene; Guezguez, Wided

    2012-01-01

When information about uncertainty cannot be quantified in a simple, probabilistic way, possibilistic decision theory is often a natural framework to consider. Its development has led to a series of possibilistic criteria, e.g., pessimistic possibilistic qualitative utility, possibilistic likely dominance, binary possibilistic utility, and possibilistic Choquet integrals. This paper focuses on sequential decision making in possibilistic decision trees. It presents a complexity study of the problem of finding an optimal strategy, depending on whether the optimization criterion satisfies the monotonicity property that allows the application of dynamic programming, which offers a polytime reduction of the decision problem. It also shows that possibilistic Choquet integrals do not satisfy this property, and that in this case the optimization problem is NP-hard.
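The pessimistic possibilistic qualitative utility mentioned above has a compact closed form, U = min over states s of max(1 − π(s), u(s)); a minimal sketch with made-up possibility and utility values:

```python
def pessimistic_utility(possibility, utility):
    """Pessimistic possibilistic qualitative utility of one decision:
    U = min over states of max(1 - possibility(state), utility(state))."""
    return min(max(1.0 - p, u) for p, u in zip(possibility, utility))

# Two states: one fully possible, one only weakly possible (made-up values).
print(pessimistic_utility([1.0, 0.4], [0.7, 0.2]))  # → 0.6
```

The criterion is cautious: a bad outcome only drags the utility down to the extent that the state producing it is possible.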

  8. STUDY ON DECISION TREE COMPETENT DATA CLASSIFICATION

    OpenAIRE

    Vanitha, A.; S.Niraimathi

    2013-01-01

Data mining is a process in which intelligent methods are applied in order to extract data patterns; it is used to discover patterns and trends among large datasets. Data classification involves the categorization of data into different categories according to protocols. Many classification algorithms are available, and among them the decision tree is the most commonly used method. Classifying data objects based on predefined knowledge of the objects is a data mining task. This paper discusses...

  9. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

Full Text Available The decision tree is one of the best-known classification methods in data mining, and much research has focused on improving its performance. However, those algorithms were developed to run on traditional distributed systems, where latency cannot be improved enough to process the huge data generated by ubiquitous sensing nodes without the help of new technology. In order to improve data processing latency for huge data mining tasks, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5-55 times faster than Weka-j48 and achieves an 18-times speedup over SPRINT for large data sets.

  10. Decision trees with minimum average depth for sorting eight elements

    KAUST Repository

    AbouEisha, Hassan

    2015-11-19

We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We also show that each decision tree for sorting 8 elements which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365) also has minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to perform sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.

  11. A tool for study of optimal decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

The paper describes a tool which, for relatively small decision tables, allows consecutive optimization of decision trees relative to various complexity measures such as number of nodes, average depth, and depth, and allows finding the parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.

  12. Boosted Regression Trees in the H$\\rightarrow \\tau\\tau$ decay channel

    CERN Document Server

    Hedrich, Natascha Sylvia

    2013-01-01

This report examines the application of a multivariate analysis technique known as Boosted Regression Trees (BRTs) to the reconstruction of the Higgs mass. BRTs are being evaluated as a competing method to the Missing Mass Calculator, which is currently used in the H $\rightarrow \tau\tau$ channel. The effects of the regression target distribution, input variables, and training parameters on the regression performance are also investigated. BRTs are a promising technique, and further studies will aim to better understand potential biases.
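Boosted regression trees can be sketched in plain Python as squared-loss gradient boosting over one-split stumps; this is a generic toy of the technique, not the analysis code used in the report, and the step-function target is invented:

```python
def fit_stump(xs, ys):
    """Best single-split regression stump on 1-D data (squared error)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue  # degenerate split: everything on one side
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (ml if x <= t else mr)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x, t=t, ml=ml, mr=mr: ml if x <= t else mr

def boost(xs, ys, rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    pred = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        s = fit_stump(xs, resid)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Toy 1-D regression: the ensemble learns a step function.
model = boost([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
print(model(0.0), model(3.0))
```

Each round shrinks the residuals geometrically, so after a few hundred rounds the ensemble reproduces the training targets almost exactly.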

  13. Comparison of greedy algorithms for α-decision tree construction

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

A comparison among different heuristics used by greedy algorithms that construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from the UCI Machine Learning Repository [2]. The complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. The costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of the experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.

  14. On algorithm for building of optimal α-decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

The paper describes an algorithm that constructs approximate decision trees (α-decision trees) which are optimal relative to one of the following complexity measures: depth, total path length, or number of nodes. The algorithm uses dynamic programming and extends the methods described in [4] to the construction of approximate decision trees. An adjustable approximation rate allows control of the algorithm's complexity. The algorithm is applied to build optimal α-decision trees for two data sets from the UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.

  15. Automatic design of decision-tree induction algorithms

    CERN Document Server

    Barros, Rodrigo C; Freitas, Alex A

    2015-01-01

Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue for the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o

  16. Statistical Decision-Tree Models for Parsing

    CERN Document Server

    Magerman, D M

    1995-01-01

    Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing {$n$}-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall ...

  17. Identification of individuals with ADHD using the Dean-Woodcock sensory motor battery and a boosted tree algorithm.

    Science.gov (United States)

    Finch, Holmes W; Davis, Andrew; Dean, Raymond S

    2015-03-01

    The accurate and early identification of individuals with pervasive conditions such as attention deficit hyperactivity disorder (ADHD) is crucial to ensuring that they receive appropriate and timely assistance and treatment. Heretofore, identification of such individuals has proven somewhat difficult, typically involving clinical decision making based on descriptions and observations of behavior, in conjunction with the administration of cognitive assessments. The present study reports on the use of a sensory motor battery in conjunction with a recursive partitioning computer algorithm, boosted trees, to develop a prediction heuristic for identifying individuals with ADHD. Results of the study demonstrate that this method is able to do so with accuracy rates of over 95 %, much higher than the popular logistic regression model against which it was compared. Implications of these results for practice are provided. PMID:24771321

  18. Extracting decision rules from police accident reports through decision trees.

    Science.gov (United States)

    de Oña, Juan; López, Griselda; Abellán, Joaquín

    2013-01-01

    Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper shows an application that applies some of the methods most commonly used to build decision trees (DTs), which have not been applied to the road safety field before. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve our purpose and may even be complementary. Applying these methods has enabled potentially useful decision rules to be extracted that could be used by road safety analysts. For instance, some of the rules may indicate that women, contrary to men, increase their risk of severity under bad lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by types (depending on their severity). However, the primary importance of this proposal is that other databases not used here (i.e. other infrastructure, roads and countries) could be used to identify unconventional problems in a manner easy for road safety managers to understand, as decision rules. PMID:23021419

  19. Relationships for Cost and Uncertainty of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-01-01

This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on a dynamic programming approach and require the consideration of subtables of the initial decision table, so the approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions, such as the depth and average depth of a decision tree and the number of nodes (as well as the number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions; the uncertainty of an approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for these tools, we also present experimental results on various datasets acquired from the UCI ML Repository [4]. © Springer-Verlag Berlin Heidelberg 2013.

  20. Application of portfolio theory in decision tree analysis.

    Science.gov (United States)

    Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J

    1991-07-01

    A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation. PMID:1894809
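The portfolio calculation underlying such an analysis, the expected return and variance of a weighted mix of two decision options, can be sketched as follows; the return, variance, covariance, and weight values are illustrative, not figures from the study:

```python
def portfolio_stats(returns, variances, covariance, w):
    """Expected return and variance of a two-decision portfolio with
    weight w on the first option and (1 - w) on the second."""
    exp_ret = w * returns[0] + (1 - w) * returns[1]
    var = (w ** 2 * variances[0] + (1 - w) ** 2 * variances[1]
           + 2 * w * (1 - w) * covariance)
    return exp_ret, var

# Hypothetical per-cow returns/variances for two breeding decisions.
r, v = portfolio_stats((40.0, 55.0), (100.0, 400.0), 50.0, 0.6)
print(r, v)
```

Sweeping the weight w and keeping, for each variance level, the mix with the highest expected return traces out the efficient frontier of decision combinations described in the abstract.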

  1. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  2. Greedy algorithm with weights for decision tree construction

    KAUST Repository

    Moshkov, Mikhail

    2010-12-01

An approximate algorithm for minimization of the weighted depth of decision trees is considered. A bound on the accuracy of this algorithm is obtained which is unimprovable in the general case. Under some natural assumptions about the class NP, the considered algorithm is close, from the point of view of accuracy, to the best polynomial-time approximate algorithms for minimization of the weighted depth of decision trees.

  3. Decision-Tree Formulation With Order-1 Lateral Execution

    Science.gov (United States)

    James, Mark

    2007-01-01

A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision tree to which this formulation applies is known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that executing them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into the tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multidimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
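The order-1 lookup idea can be sketched as follows: each Boolean test contributes one bit of an index, and the whole tree collapses to a single table access. The three-test majority rule below is a made-up stand-in for a real balanced Boolean decision tree:

```python
def build_table(leaf_fn, n_vars):
    """Precompute the leaf reached for every combination of test outcomes."""
    return [leaf_fn(tuple((i >> b) & 1 for b in range(n_vars)))
            for i in range(2 ** n_vars)]

def classify(table, outcomes):
    """Execute the 'tree' as an O(1) lookup: pack outcomes into an index."""
    index = sum(bit << pos for pos, bit in enumerate(outcomes))
    return table[index]

# Toy rule standing in for a decision tree: majority vote of three tests.
majority = lambda bits: int(sum(bits) >= 2)
table = build_table(majority, 3)
print(classify(table, (1, 1, 0)))  # → 1
```

The table costs 2^n entries of precomputation, which is exactly the trade the abstract describes: memory and setup time in exchange for constant-time execution.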

  4. Boosting bonsai trees for efficient features combination : application to speaker role identification

    OpenAIRE

    Laurent, Antoine; Camelin, Nathalie; Raymond, Christian

    2014-01-01

    In this article, we tackle the problem of speaker role detection from broadcast news shows. In the literature, many proposed solutions are based on the combination of various features coming from acoustic, lexical and semantic information with a machine learning algorithm. Many previous studies mention the use of boosting over decision stumps to combine efficiently these features. In this work, we propose a modification of this state-of-the-art machine learning algorithm changing the weak lea...

  5. Decision tree approach to power systems security assessment

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1993-01-01

    An overview of the general decision tree approach to power system security assessment is presented. The general decision tree methodology is outlined, modifications proposed in the context of transient stability assessment are embedded, and further refinements are considered. The approach is then suitably tailored to handle other specifics of power systems security, relating to both preventive and emergency voltage control, in addition to transient stability. Trees are accordingly built in th...

  6. Computational study of developing high-quality decision trees

    Science.gov (United States)

    Fu, Zhiwei

    2002-03-01

Recently, decision tree algorithms have been widely used for data mining problems, to find valuable rules and patterns. However, scalability, accuracy, and efficiency are significant concerns in effectively dealing with large and complex data sets. In this paper, we propose an innovative machine learning approach (which we call GAIT) that combines a genetic algorithm, statistical sampling, and decision trees to develop intelligent decision trees that can alleviate some of these problems. We design computational experiments and run GAIT on three different data sets (Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.

  7. Relationships among various parameters for decision tree optimization

    KAUST Repository

    Hussain, Shahid

    2014-01-14

In this chapter, we study in detail the relationships between various pairs of cost functions, and between uncertainty measures and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as experimental results on decision tables acquired from the UCI ML Repository. The algorithms presented in this chapter have already been implemented and are now a part of Dagger, a software system for construction and optimization of decision trees and decision rules. The main results presented in this chapter deal with two types of algorithms for computing relationships. First, we discuss the case where we construct approximate decision trees and are interested in the relationship between a certain cost function, such as the depth or number of nodes of a decision tree, and an uncertainty measure, such as the misclassification error (accuracy) of the decision tree. Second, relationships between two different cost functions are discussed, for example, the number of misclassifications of a decision tree versus its number of nodes. The results of the experiments presented in the chapter provide further insight. © 2014 Springer International Publishing Switzerland.

  8. Construction of α-decision trees for tables with many-valued decisions

    KAUST Repository

    Moshkov, Mikhail

    2011-01-01

The paper is devoted to the study of a greedy algorithm for the construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions, where each row is labeled with a set of decisions; for a given row, we should find one decision from the set attached to that row. We consider a bound on the number of algorithm steps, and a bound on the algorithm's accuracy relative to the depth of the decision trees. © 2011 Springer-Verlag.

  9. Minimization of decision tree depth for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-10-01

In this paper, we consider multi-label decision tables in which a set of decisions is attached to each row. Our goal is to find one decision from the set for each row, using a decision tree as our tool. With the aim of minimizing the depth of the decision tree, we devised various greedy algorithms as well as a dynamic programming algorithm. When comparing with the optimal result obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results close to optimal for the minimization of the depth of decision trees.

  10. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    OpenAIRE

    Bruno Carneiro da Rocha; Rafael Timóteo de Sousa Júnior

    2010-01-01

This article aims to evaluate the use of decision tree techniques, in conjunction with the CRISP-DM management model, to help in the prevention of bank fraud. It offers a study on decision trees, an important concept in the field of artificial intelligence, focused on discussing how these trees are able to assist in the decision-making process of identifying frauds through the analysis of information regarding bank transactions. This information is captured with the use of t...

  11. Decision tree methods: applications for classification and prediction

    Institute of Scientific and Technical Information of China (English)

    Yan-yan SONG; Ying LU

    2015-01-01

Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, the study data can be divided into training and validation datasets: the training dataset is used to build the decision tree model, and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.

  12. Ensemble of randomized soft decision trees for robust classification

    Indian Academy of Sciences (India)

    G KISHOR KUMAR; P VISWANATH; A ANANDA RAO

    2016-03-01

For classification, decision trees have become very popular because of their simplicity, interpretability, and good performance. To induce a decision tree classifier for data having continuous-valued attributes, the most common approach is to split the continuous attribute range into a hard (crisp) partition of two or more blocks, using one or several crisp (sharp) cut points. But this can make the resulting decision tree very sensitive to noise. An existing solution to this problem is to split the continuous attribute into a fuzzy partition (soft partition) using soft or fuzzy cut points, based on fuzzy set theory, and to use fuzzy decisions at the nodes of the tree. These are called soft decision trees in the literature and have been shown to perform better than conventional decision trees, especially in the presence of noise. The current paper first proposes to use an ensemble of soft decision trees for robust classification, where the attribute, fuzzy cut point, and other parameters are chosen randomly from a probability distribution of fuzzy information gain over the various attributes and their various cut points. Further, the paper proposes to use probability-based information gain to achieve better results. The effectiveness of the proposed method is shown by experimental studies carried out using three standard data sets. It is found that an ensemble of randomized soft decision trees outperforms the related existing soft decision tree. Robustness against the presence of noise is shown by injecting various levels of noise into the training set; a comparison with other related methods favors the proposed method.
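A soft (fuzzy) cut point can be sketched as a membership ramp around the crisp threshold; the linear ramp below is one simple choice of membership function, an illustrative assumption rather than the exact form used in the paper:

```python
def soft_split(x, cut, width):
    """Fuzzy membership of the 'left' branch around a soft cut point.
    Returns a value in [0, 1]; exactly 0.5 at the cut itself."""
    if x <= cut - width:
        return 1.0
    if x >= cut + width:
        return 0.0
    # Linear ramp across the fuzzy region around the cut point.
    return (cut + width - x) / (2 * width)

def soft_classify(x, cut, width, left_label, right_label):
    """Fuzzy decision at a node: each branch contributes by its membership."""
    mu = soft_split(x, cut, width)
    return {left_label: mu, right_label: 1.0 - mu}

print(soft_classify(5.0, 5.0, 1.0, "a", "b"))  # → {'a': 0.5, 'b': 0.5}
```

A value sitting just past a crisp threshold flips the decision entirely; with the ramp it merely shifts the memberships slightly, which is the noise robustness the abstract describes.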

  13. Generating Optimized Decision Tree Based on Discrete Wavelet Transform

    Directory of Open Access Journals (Sweden)

    Kiran Kumar Reddi

    2010-03-01

Full Text Available The growth of functionality in current IT systems has driven decision-making operations toward mass data mining techniques, yet there is still a need for further efficiency and optimization. The problem of constructing optimized decision trees is now an active research area, and generating an efficient, optimized decision tree from a multi-attribute data source is considered one of its shortcomings. This paper proposes applying a multivariate statistical method, the Discrete Wavelet Transform, to multi-attribute data to reduce dimensionality, and transforming the traditional decision tree algorithm to form a new algorithmic model. The experimental results show that this method not only optimizes the structure of the decision tree, but also alleviates the problems existing in pruning and mines a better rule set, without affecting prediction accuracy.
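One level of the Haar transform, the simplest discrete wavelet transform, illustrates the dimensionality reduction step; the abstract does not name a specific wavelet, so this is only a representative sketch with made-up attribute values:

```python
import math

def haar_step(values):
    """One level of the Haar discrete wavelet transform: orthonormal
    pairwise averages (approximation) and differences (detail)."""
    pairs = list(zip(values[::2], values[1::2]))
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    return approx, detail

# A 4-attribute row becomes 2 coarse attributes plus 2 detail values;
# keeping only the approximation halves the dimensionality.
approx, detail = haar_step([4.0, 6.0, 10.0, 12.0])
print(approx)
print(detail)
```

Feeding only the approximation coefficients to the tree inducer is the usual way a DWT front-end shrinks the attribute space before induction.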

  14. Decision Tree Classifiers to determine the patient’s Post-operative Recovery Decision

    OpenAIRE

    D.Shanth; Dr.G.Sahoo; Dr.N.Saravanan

    2011-01-01

Machine learning aims to generate classifying expressions simple enough to be understood easily by humans. There are many machine learning approaches available for classification, among which decision tree learning is one of the most popular classification algorithms. In this paper we propose a systematic approach based on decision trees which is used to automatically determine a patient's post-operative recovery status. Decision tree structures are constructed, using data mining methods ...

  15. Data mining with decision trees theory and applications

    CERN Document Server

    Rokach, Lior

    2014-01-01

    Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining; it is the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time. Existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining; to cover all aspects of this important technique, as well as improved or new methods and techniques developed after the publication of our first edition. In this new

  16. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    Science.gov (United States)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting; random forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper relates to attribute split measures and is a two-step process. First, a theoretical study of the five selected split measures is done and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are then verified by empirical analysis, in which a random forest is generated using each of the five selected split measures, chosen one at a time, i.e. random forest using information gain, random forest using gain ratio, etc. Based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed, in which individual decision trees in the random forest are generated using different split measures. This model is augmented by weighted voting based on the strength of the individual trees. The new approach has shown a notable increase in the accuracy of the random forest.
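The weighted-voting augmentation can be sketched as follows; the per-tree weights below are hypothetical stand-ins for the tree-strength estimates the paper describes:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine tree predictions by weighted voting: each tree's vote
    counts in proportion to its strength weight (e.g. its accuracy)."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

# Three hypothetical trees: two weaker trees outvote one strong tree.
print(weighted_vote(["spam", "spam", "ham"], [0.55, 0.52, 0.95]))  # → spam
```

With equal weights this reduces to plain majority voting; unequal weights let a strong tree override several weak ones, or vice versa, as here.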

  17. Comparison of Greedy Algorithms for Decision Tree Optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.

  18. Decision Trees and Transient Stability of Electric Power Systems

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1991-01-01

An inductive inference method for the automatic building of decision trees is investigated. Among its various tasks, the splitting and stop-splitting criteria successively applied to the nodes of a grown tree are found to play a crucial role in its overall shape and performance. The application of this general method to transient stability is systematically explored. Parameters related to the stop-splitting criterion, to the learning set and to the tree classes are thus considered, a...

  19. Confidence sets for split points in decision trees

    OpenAIRE

    Banerjee, Moulinath; McKeague, Ian W.

    2007-01-01

    We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others cali...

  20. Detection and Extraction of Videos using Decision Trees

    Directory of Open Access Journals (Sweden)

    Sk.Abdul Nabi

    2011-12-01

    Full Text Available This paper addresses a new multimedia data mining framework for the extraction of events in videos by using decision tree logic. The aim of our DEVDT (Detection and Extraction of Videos using Decision Trees) system is to improve the indexing and retrieval of multimedia information. The extracted events can be used to index the videos. In this system we have considered the C4.5 decision tree algorithm [3], which handles both continuous and discrete attributes. In this process, we first adopt an advanced video event detection method to produce event boundaries and some important visual features. This rich multi-modal feature set is filtered by a pre-processing step to remove noise and reduce irrelevant data, which improves both precision and recall. After the data is cleaned, it is mined and classified using a decision tree model. The learning and classification steps of this decision tree are simple and fast, and the tree has good accuracy. Our system can therefore extract video events effectively and efficiently, with high precision and recall.
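
    C4.5 handles a continuous attribute by scanning candidate thresholds between sorted values and keeping the one with the highest information gain. A minimal sketch of that single step, assuming one numeric feature and invented shot-boundary labels (the data and names are illustrative, not from DEVDT):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) maximizing information gain for a numeric attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, -1.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no boundary between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs[:i]]
        right = [lab for v, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (thr, gain)
    return best

# Toy shot-boundary feature: frame-difference scores labelled 'cut' / 'no-cut'.
scores = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
labels = ["no-cut"] * 3 + ["cut"] * 3
print(best_threshold(scores, labels))     # a threshold near 0.5 separates the labels
```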

  1. Development and Test of Fixed Average K-means Base Decision Trees Grouping Method by Improving Decision Tree Clustering Method

    OpenAIRE

    Jai-Houng Leu; Chih-Yao Lo; Chi-Hau Liu

    2009-01-01

    New analytical methods and tools which were called FAKDT (Fixed Average K-means base Decision Trees) on human performance have been developed and they make us look at the Enterprise in different aspects in this study. Decision Tree Clustering Method is one of the data mining methods that have been applied widely in different fields to analyze a large amount of data in recent years. Generally speaking, in the human resource incubation of an enterprise, if employees of high learning poten...

  2. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    Energy Technology Data Exchange (ETDEWEB)

    Rauzy, Antoine B., E-mail: Antoine.Rauzy@lix.polytechnique.f [LIX-CNRS, Computer Science, Ecole Polytechnique, 91128 Palaiseau Cedex (France)

    2011-07-15

    Much attention has been focused on Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. recently proposed an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras, which can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired by Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.

  3. Decision Tree Classifiers to determine the patient’s Post-operative Recovery Decision

    Directory of Open Access Journals (Sweden)

    D.Shanthi

    2010-12-01

    Full Text Available Machine Learning aims to generate classifying expressions simple enough to be understood easily by humans. There are many machine learning approaches available for classification, among which decision tree learning is one of the most popular. In this paper we propose a systematic approach based on decision trees to automatically determine the patient’s post-operative recovery status. Decision tree structures are constructed using data mining methods and then used to classify discharge decisions.

  5. Bounds on Average Time Complexity of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory [1], coding theory [77], and the design and analysis of algorithms (e.g., sorting) [38]. For any diagnostic problem, the minimum average depth of a decision tree is bounded from below by the entropy of the probability distribution (with a multiplier 1/log2 k for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building an optimal prefix code [1] and a blood test study under the assumption that exactly one patient is ill [23]). For such problems, the minimum average depth of the decision tree exceeds the lower bound by at most one. The minimum average depth reaches its maximum on the problems in which each attribute is "indispensable" [44] (e.g., a diagnostic problem with n attributes and k^n pairwise different rows in the decision table, and the problem of implementing the modulo 2 summation function). These problems have the minimum average depth of decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
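
    The entropy lower bound, and the claim that for prefix-code problems the minimum average depth exceeds it by at most one, can be checked numerically. The sketch below builds an optimal binary prefix code with Huffman's algorithm and compares its average depth to the entropy of the distribution (an illustration of the bound, not the chapter's machinery):

```python
import heapq
import itertools
import math

def huffman_depths(probs):
    """Code lengths (tree depths) of an optimal binary prefix code, by symbol index."""
    counter = itertools.count()          # tie-breaker so heap tuples always compare
    heap = [(p, next(counter), {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {k: v + 1 for k, v in {**d1, **d2}.items()}   # one level deeper
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = [0.4, 0.3, 0.2, 0.1]
depths = huffman_depths(probs)
avg_depth = sum(probs[i] * d for i, d in depths.items())
H = -sum(p * math.log2(p) for p in probs)
print(f"entropy = {H:.3f}, minimum average depth = {avg_depth:.3f}")
```

    For this distribution the average depth is 1.9 against an entropy of about 1.846, so the bound holds and is exceeded by less than one, as the chapter states for problems with a complete set of attributes.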

  6. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    Directory of Open Access Journals (Sweden)

    Bruno Carneiro da Rocha

    2010-10-01

    Full Text Available This article aims to evaluate the use of techniques of decision trees, in conjunction with the management model CRISP-DM, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of techniques and the CRISP-DM management model of data mining in large operational databases logged from internet bank transactions.

  7. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  8. Decision support and data warehousing tools boost competitive advantage.

    Science.gov (United States)

    Waldo, B H

    1998-01-01

    The ability to communicate across the care continuum is fast becoming an integral component of the successful health enterprise. As integrated delivery systems are formed and patient care delivery is restructured, health care professionals must be able to distribute, access, and evaluate information across departments and care settings. The Aberdeen Group, a computer and communications research and consulting organization, believes that "the single biggest challenge for next-generation health care providers is to improve on how they consolidate and manage information across the continuum of care. This involves building a strategic warehouse of clinical and financial information that can be shared and leveraged by health care professionals, regardless of the location or type of care setting" (Aberdeen Group, Inc., 1997). The value and importance of data and systems integration are growing. Organizations that create a strategy and implement DSS tools to provide decision-makers with the critical information they need to face the competition and maintain quality and costs will have the advantage. PMID:9592525

  9. Smart City Mobility Application—Gradient Boosting Trees for Mobility Prediction and Analysis Based on Crowdsourced Data

    Science.gov (United States)

    Semanjski, Ivana; Gautama, Sidharta

    2015-01-01

    Mobility management represents one of the most important parts of the smart city concept. The way we travel, at what time of the day, for what purposes and with what transportation modes, have a pertinent impact on the overall quality of life in cities. To manage this process, detailed and comprehensive information on individuals’ behaviour is needed as well as effective feedback/communication channels. In this article, we explore the applicability of crowdsourced data for this purpose. We apply a gradient boosting trees algorithm to model individuals’ mobility decision making processes (particularly concerning what transportation mode they are likely to use). To accomplish this we rely on data collected from three sources: a dedicated smartphone application, a geographic information systems-based web interface and weather forecast data collected over a period of six months. The applicability of the developed model is seen as a potential platform for personalized mobility management in smart cities and a communication tool between the city (to steer the users towards more sustainable behaviour by additionally weighting preferred suggestions) and users (who can give feedback on the acceptability of the provided suggestions, by accepting or rejecting them, providing an additional input to the learning process). PMID:26151209
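
    The mobility model itself is built with an off-the-shelf gradient boosting implementation; as a rough illustration of the underlying technique, here is a tiny hand-rolled gradient-boosting classifier on logistic loss, where each round fits a regression stump to the gradient y − sigmoid(F). The trip-distance data and the binary car/bicycle labels are invented for the sketch and are not the study's data:

```python
import math

def fit_stump(xs, residuals):
    """Least-squares regression stump on a single numeric feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for cut in range(1, len(xs)):
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue                      # no boundary between equal values
        thr = (xs[order[cut - 1]] + xs[order[cut]]) / 2
        left = [residuals[i] for i in order[:cut]]
        right = [residuals[i] for i in order[cut:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x < thr else rv

def boost(xs, ys, rounds=20, lr=0.5):
    """Gradient boosting on logistic loss: each stump fits the gradient y - sigmoid(F)."""
    F = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - 1 / (1 + math.exp(-f)) for y, f in zip(ys, F)]
        stump = fit_stump(xs, resid)
        stumps.append(stump)
        F = [f + lr * stump(x) for f, x in zip(F, xs)]
    return lambda x: 1 / (1 + math.exp(-sum(lr * s(x) for s in stumps)))

# Hypothetical mode-choice data: trip distance (km) -> 1 = car, 0 = bicycle.
dist = [0.5, 1.0, 2.0, 3.0, 8.0, 12.0, 20.0, 30.0]
mode = [0, 0, 0, 0, 1, 1, 1, 1]
predict = boost(dist, mode)
print(round(predict(1.5), 2), round(predict(25.0), 2))  # low for short trips, high for long
```

    A production model would use multi-feature trees and many more predictors (time of day, weather, purpose), but the additive round-by-round structure is the same.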

  12. Minimizing size of decision trees for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-09-29

    We used decision trees as a model to discover knowledge from multi-label decision tables, where each row has a set of decisions attached to it and the goal is to find one arbitrary decision from that set. The size of the decision tree can be small as well as very large. We study different greedy as well as dynamic programming algorithms to minimize the size of the decision trees. When comparing against the optimal result from the dynamic programming algorithm, we found that some greedy algorithms produce results close to the optimal for the minimization of the number of nodes (at most 18.92% difference), number of nonterminal nodes (at most 20.76% difference), and number of terminal nodes (at most 18.71% difference).

  13. Decision tree ensembles for online operation of large smart grids

    International Nuclear Information System (INIS)

    Highlights: ► We present a new technique for the online control of large smart grids. ► We use a Decision Tree Ensemble in a Receding Horizon Controller. ► Decision Trees can approximate online optimisation approaches. ► Decision Trees can make adjustments to their output in real time. ► The new technique outperforms heuristic online optimisation approaches. - Abstract: Smart grids utilise omnidirectional data transfer to operate a network of energy resources. Associated technologies present operators with greater control over system elements and more detailed information on the system state. While these features may improve the theoretical optimal operating performance, determining the optimal operating strategy becomes more difficult. In this paper, we show how a decision tree ensemble or ‘forest’ can produce a near-optimal control strategy in real time. The approach substitutes the decision forest for the simulation–optimisation sub-routine commonly employed in receding horizon controllers. The method is demonstrated on a small and a large network, and compared to controllers employing particle swarm optimisation and evolutionary strategies. For the smaller network the proposed method performs comparably in terms of total energy usage, but delivers a greater demand deficit. On the larger network the proposed method is superior with respect to all measures. We conclude that the method is useful when the time required to evaluate possible strategies via simulation is high.

  14. An automated approach to the design of decision tree classifiers

    Science.gov (United States)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.

  15. Visualization method and tool for interactive learning of large decision trees

    Science.gov (United States)

    Nguyen, Trong Dung; Ho, TuBao

    2002-03-01

    When learning from large datasets, decision tree induction programs often produce very large trees. How to visualize such trees efficiently during the learning process remains an open question and requires efficient tools. This paper presents a visualization method and tool for interactive learning of large decision trees, including a new visualization technique called T2.5D (which stands for Trees 2.5 Dimensions). After a brief discussion of requirements for tree visualizers and related work, the paper focuses on techniques for two issues: (1) how to visualize large decision trees efficiently; and (2) how to visualize decision trees in the learning process.

  16. Matching in Vitro Bioaccessibility of Polyphenols and Antioxidant Capacity of Soluble Coffee by Boosted Regression Trees.

    Science.gov (United States)

    Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A

    2015-11-01

    The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods. PMID:26457815

  17. Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees

    Science.gov (United States)

    Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.

    2015-12-01

    In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable for making predictions without in-depth a priori knowledge about the functional dependencies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors which are derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means, such as first- and second-order derivatives of the DEM, but also more sophisticated classifications describing the exposure and shelter of the terrain with respect to wind flux. In order to take into account the different scales which probably influence the streams and turbulences of wind flow over complex terrain, the predictors are computed at different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used to examine the approach is Switzerland, a mountainous region in the heart of Europe, dominated by the Alps but also covering large valleys. The full workflow is described in this paper; it consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model, and application of the model to generate a wind speed map.

  18. Prioritizing Highway Safety Manual's crash prediction variables using boosted regression trees.

    Science.gov (United States)

    Saha, Dibakar; Alluri, Priyanka; Gan, Albert

    2015-06-01

    The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, the data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in the states' databases. Moreover, the process of collecting and maintaining all the HSM data variables is cost-prohibitive. Prioritization of the variables based on their impact on crash predictions would, therefore, help to identify influential variables for which data could be collected and maintained for continued updates. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear association among variables, and compute variable importance. Five years of crash data from 2008 to 2012 on two urban and suburban facility types, two-lane undivided arterials and four-lane divided arterials, were analyzed for estimating the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationship to predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data. PMID:25823903
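
    The variable importance that BRT reports can be sketched from first principles: in squared-error gradient boosting, a feature's importance can be accumulated as the total error reduction of the splits that use it. A self-contained toy version follows; the two features and the response are invented stand-ins, not the study's roadway variables:

```python
def fit_stump(X, residuals):
    """Best least-squares stump over all features: (feature, threshold, left, right, error)."""
    best = None
    for j in range(len(X[0])):
        xs = [row[j] for row in X]
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        for cut in range(1, len(xs)):
            if xs[order[cut - 1]] == xs[order[cut]]:
                continue
            thr = (xs[order[cut - 1]] + xs[order[cut]]) / 2
            left = [residuals[i] for i in order[:cut]]
            right = [residuals[i] for i in order[cut:]]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[4]:
                best = (j, thr, lv, rv, err)
    return best

def boost_importance(X, y, rounds=30, lr=0.3):
    """Squared-error gradient boosting; a feature's importance is its total error reduction."""
    F = [0.0] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        resid = [yi - fi for yi, fi in zip(y, F)]
        base = sum(r * r for r in resid)
        j, thr, lv, rv, err = fit_stump(X, resid)
        importance[j] += base - err
        F = [fi + lr * (lv if row[j] < thr else rv) for fi, row in zip(F, X)]
    return importance

# Invented data: feature 0 (a traffic-volume stand-in) drives the response,
# feature 1 (a shoulder-width stand-in) is pure noise.
X = [[a, s] for a, s in zip([1, 2, 3, 4, 5, 6, 7, 8], [5, 3, 8, 1, 9, 2, 7, 4])]
y = [2.0, 2.1, 2.2, 2.0, 6.0, 6.2, 5.9, 6.1]
imp = boost_importance(X, y)
print([round(v, 1) for v in imp])  # feature 0 dominates the importance ranking
```

    This is the same "only a few variables explain most of the variation" effect reported in the abstract: the noise feature collects almost none of the total error reduction.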

  19. Relationships between depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm the paper also presents the results of experiments with three datasets from UCI Machine Learning Repository [3]. © 2011 Springer-Verlag.

  20. Soil Organic Matter Mapping by Decision Tree Modeling

    Institute of Scientific and Technical Information of China (English)

    ZHOU Bin; ZHANG Xing-Gang; WANG Fan; WANG Ren-Chao

    2005-01-01

    Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with extensive, easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape factor-associated rule system. This system could be used to predict continuous SOM spatial distribution. By analyzing factors such as elevation, geological unit, soil type, land use, remotely sensed data, upslope contributing area, slope, aspect, planform curvature, and profile curvature, the decision tree could predict the distribution of soil organic matter levels. Among these factors, elevation, land use, aspect, soil type, the first principal component of bitemporal Landsat TM, and upslope contributing area were considered the most important variables for predicting SOM. Predictions relating SOM content to the landscape types identified by the decision tree showed a close relationship, with an accuracy of 81.1%.

  1. Construction of a decision tree in linear programming problems

    International Nuclear Information System (INIS)

    The dependence of the solution of a linear programming problem on its parameter has been analyzed. An algorithm for the construction of a decision tree has been proposed with the use of the simplex method together with the validity support system

  2. Practical secure decision tree learning in a teletreatment application

    NARCIS (Netherlands)

    Hoogh, de Sebastiaan; Schoenmakers, Berry; Chen, Ping; Akker, op den Harm

    2014-01-01

    In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our approa

  3. Comparative Analysis of Serial Decision Tree Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Matthew Nwokejizie Anyanwu

    2009-09-01

    Full Text Available Classification of data objects based on a predefined knowledge of the objects is a data mining and knowledge management technique used in grouping similar data objects together. It is a supervised learning technique, as it assigns class labels to data objects based on the relationship between the data items and a pre-defined class label. Classification algorithms have a wide range of applications, such as churn prediction, fraud detection, artificial intelligence, and credit card rating. Although many classification algorithms are available in the literature, decision trees are the most commonly used because they are easy to implement and easier to understand than other classification algorithms. A decision tree classification algorithm can be implemented in a serial or parallel fashion based on the volume of data, the memory space available on the computer resource, and the scalability of the algorithm. In this paper we review the serial implementations of decision tree algorithms and identify those that are commonly used. We also use experimental analysis based on sample data records (Statlog data sets) to evaluate the performance of the commonly used serial decision tree algorithms.

  4. 'Misclassification error' greedy heuristic to construct decision trees for inconsistent decision tables

    KAUST Repository

    Azad, Mohammad

    2014-01-01

    A greedy algorithm is presented in this paper to construct decision trees for three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. In this algorithm, the greedy heuristic ‘misclassification error’ is used, which runs faster and, for some cost functions, gives better results than the ‘number of boundary subtables’ heuristic from the literature. It can therefore be used on larger data sets and does not require a huge amount of memory. Experimental results on the depth, average depth and number of nodes of decision trees constructed by this algorithm are compared in the framework of each of the three approaches.
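
    The ‘misclassification error’ heuristic is simple to state: the error of a set of rows is the number of rows not carrying the most common decision, and the greedy step picks the attribute whose induced partition has the least total error. A minimal sketch on a toy decision table of our own (not the paper's experiments):

```python
from collections import Counter

def misclassification_error(decisions):
    """Number of rows minus the count of the most common decision."""
    return len(decisions) - max(Counter(decisions).values())

def best_attribute(rows, decisions):
    """Greedy choice: attribute whose partition has the least total misclassification error."""
    best = None
    for a in range(len(rows[0])):
        groups = {}
        for row, d in zip(rows, decisions):
            groups.setdefault(row[a], []).append(d)
        err = sum(misclassification_error(g) for g in groups.values())
        if best is None or err < best[1]:
            best = (a, err)
    return best

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = ["no", "no", "yes", "yes"]   # decided entirely by attribute 0
print(best_attribute(rows, decisions))   # -> (0, 0): splitting on attribute 0 leaves no error
```

    The greedy tree builder would recurse on each group until its misclassification error reaches zero (or another stop condition fires).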

  5. Extensions of dynamic programming as a new tool for decision tree optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of tree) and decision trees (which allow arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.

  6. An overview of decision tree applied to power systems

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, Decision Tree (DT), also named Classification And Regression Tree (CART), has gained increasing interest because of its high performance in terms of computational efficiency, uncertainty manageability, and interpretability. This paper presents an overview of a variety of DT applications to power systems for better interfacing of power systems with data analytics. The fundamental knowledge of the CART algorithm is also introduced, followed by examples of both classification trees and regression trees, with the help of a case study on security assessment of the Danish power system.

  7. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
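
    The three attachments can be illustrated on a small inconsistent table. The sketch below collapses equal rows and attaches, per distinct row, the decision set (many-valued), the most common decision, and a unique code for the set (generalized); the data and dictionary keys are illustrative:

```python
from collections import Counter, defaultdict

def transform(rows, decisions):
    """Collapse equal rows; return the three decision attachments per distinct row."""
    groups = defaultdict(list)
    for row, d in zip(rows, decisions):
        groups[tuple(row)].append(d)
    code = {}     # unique code per decision set (generalized decision approach)
    result = {}
    for row, ds in groups.items():
        dset = frozenset(ds)
        result[row] = {
            "many_valued": set(dset),                         # (i) set of all decisions
            "most_common": Counter(ds).most_common(1)[0][0],  # (ii) most common decision
            "generalized": code.setdefault(dset, len(code)),  # (iii) code of the set
        }
    return result

rows = [(1, 0), (1, 0), (1, 0), (0, 1)]
decisions = ["a", "a", "b", "a"]         # rows (1, 0) are inconsistent: decisions a and b
t = transform(rows, decisions)
print(t[(1, 0)])                         # the group's three attachments
```

    Each attachment turns the inconsistent table into a consistent one, after which an ordinary greedy tree builder can be run, as the abstract's comparison does.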

  8. MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees

    Directory of Open Access Journals (Sweden)

    Vasile PURDILĂ

    2014-03-01

Full Text Available Learning decision trees against very large amounts of data is not practical on single-node computers due to the huge amount of calculation required by this process. Apache Hadoop is a large-scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks against very large datasets. This work presents a parallel decision tree learning algorithm expressed in the MapReduce programming model that runs on the Apache Hadoop platform and scales very well with dataset size.
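As a rough sketch of how split selection decomposes into MapReduce, mappers can emit per-(feature, value, class) counts and a reducer can aggregate them before computing information gain. The toy data and function names below are assumptions, not the paper's MR-Tree implementation:

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled records: feature dict plus a class label.
data = [({"x": 0, "y": 1}, 1), ({"x": 0, "y": 0}, 0),
        ({"x": 1, "y": 1}, 1), ({"x": 1, "y": 0}, 0)]

def mapper(record):
    # Emit one count per (feature, value, class) key.
    feats, label = record
    for f, v in feats.items():
        yield (f, v, label), 1

def reducer(pairs):
    # Sum the counts for each key (the shuffle is simulated by the caller).
    counts = defaultdict(int)
    for key, c in pairs:
        counts[key] += c
    return counts

def entropy(counter):
    n = sum(counter.values())
    return -sum(c / n * math.log2(c / n) for c in counter.values() if c)

def best_split(counts, n_rows, base_counter):
    # Choose the feature with the highest information gain.
    base = entropy(base_counter)
    per_feat = defaultdict(lambda: defaultdict(Counter))
    for (f, v, label), c in counts.items():
        per_feat[f][v][label] += c
    gains = {}
    for f, by_val in per_feat.items():
        rem = sum(sum(cc.values()) / n_rows * entropy(cc)
                  for cc in by_val.values())
        gains[f] = base - rem
    return max(gains, key=gains.get), gains

counts = reducer(p for rec in data for p in mapper(rec))
feature, gains = best_split(counts, len(data), Counter(l for _, l in data))
print(feature)  # y
```

Because each mapper only sees its shard and the reducer only sees counts, this step parallelizes naturally over a cluster; the toy version here runs everything in one process.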

  9. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

Full Text Available This paper proposes a decision tree model for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance and are usually represented by an easy-to-interpret tree-like structure. Four models were created using the Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using the Exhaustive CHAID (82.0%) model, compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as the most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.

  10. Distributed Decision-Tree Induction in Peer-to-Peer Systems

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a scalable and robust distributed algorithm for decision-tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in...

  11. Web People Search Using Ontology Based Decision Tree

    Directory of Open Access Journals (Sweden)

    Mrunal Patil

    2012-09-01

    Full Text Available Nowadays, searching for people on web is the most common activity done by most of the users. When we give a query for person search, it returns a set of web pages related to distinct person of given name. For such type of search the job of finding the web page of interest is left on the user. In this paper, we develop a technique for web people search which clusters the web pages based on semantic information and maps them using ontology based decision tree making the user to access the information in more easy way. This technique uses the concept of ontology thus reducing the number of inconsistencies. The result proves that ontology based decision tree and clustering helps in increasing the efficiency of the overall search.

  13. Constructing an optimal decision tree for FAST corner point detection

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    In this paper, we consider a problem that is originated in computer vision: determining an optimal testing strategy for the corner point detection problem that is a part of FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.
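The underlying optimization can be sketched as a memoized dynamic program that minimizes total (hence average) depth over subtables. This toy version over a hypothetical 3-attribute table is illustrative, not the exact algorithm compared in the paper:

```python
from functools import lru_cache

# Toy consistent decision table: (attribute values, decision).
# Attribute 2 happens to determine the decision exactly.
table = (((0, 0, 0), "no"), ((0, 1, 1), "yes"),
         ((1, 0, 1), "yes"), ((1, 1, 0), "no"))
n_attrs = 3

@lru_cache(maxsize=None)
def min_total_depth(rows):
    # DP over subtables: minimum sum of leaf depths over all rows.
    decisions = {d for _, d in rows}
    if len(decisions) == 1:
        return 0                      # leaf: no further tests needed
    best = float("inf")
    for a in range(n_attrs):
        vals = {attrs[a] for attrs, _ in rows}
        if len(vals) == 1:
            continue                  # attribute does not partition the rows
        cost = len(rows)              # every row pays one extra test here
        for v in vals:
            sub = tuple(r for r in rows if r[0][a] == v)
            cost += min_total_depth(sub)
        best = min(best, cost)
    return best

avg_depth = min_total_depth(table) / len(table)
print(avg_depth)  # 1.0 (one test on attribute 2 resolves every row)
```

A greedy algorithm picks one attribute per node by a local criterion; the DP above instead searches all attribute choices, which is what makes the exact comparison possible for small tables.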

  14. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    OpenAIRE

    Ali Idri; Sanaa Elyassami

    2011-01-01

    Web Effort Estimation is a process of predicting the efforts and cost in terms of money, schedule and staff for any software project system. Many estimation models have been proposed over the last three decades and it is believed that it is a must for the purpose of: Budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the...

  15. Rule Extraction in Transient Stability Study Using Linear Decision Trees

    Institute of Scientific and Technical Information of China (English)

    SUN Hongbin; WANG Kang; ZHANG Boming; ZHAO Feng

    2011-01-01

Traditional operation rules depend on human experience; they are relatively fixed and difficult to adapt to the new demands of the modern power grid. In order to formulate suitable and quickly refreshed operation rules, a method of linear decision trees based on support samples is proposed for rule extraction in this paper. The operation rules extracted by this method have the advantages of refinement and intelligence, which helps the dispatching center meet the requirements of smart grid construction.

  16. DECISION TREE ANALYSIS OF THE PREDICTORS OF INTERNET AFFINITY

    OpenAIRE

    BUBAŠ, Goran; Kliček, Božidar; Hutinski, Željko

    2001-01-01

    A recently developed model of Internet affinity was used for survey design and data collection on variables that have potential influence on affinity for Internet use. A total of 600 Croatian students with access to the Internet at their college participated in this survey. The collected data were used for investigation of the relation between decision tree analysis and regression analysis of predictor variables of Internet affinity. Different predictors were found to influence two distinct c...

  17. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    Directory of Open Access Journals (Sweden)

    Ali Idri

    2011-07-01

    Full Text Available Web Effort Estimation is a process of predicting the efforts and cost in terms of money, schedule and staff for any software project system. Many estimation models have been proposed over the last three decades and it is believed that it is a must for the purpose of: Budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the principles of ID3 decision tree and the fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing the software projects, which can improve greatly the accuracy of obtained estimates. MMRE and Pred are used as measures of prediction accuracy for this study. A series of experiments is reported using two different software projects datasets namely, Tukutuku and COCOMO'81 datasets. The results are compared with those produced by the crisp version of the ID3 decision tree.
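MMRE and Pred are standard accuracy measures for effort estimation; a minimal sketch with made-up effort values (not from the Tukutuku or COCOMO'81 datasets) follows:

```python
# MRE: magnitude of relative error for one project.
def mre(actual, predicted):
    return abs(actual - predicted) / actual

# MMRE: mean MRE over all projects.
def mmre(actuals, preds):
    return sum(mre(a, p) for a, p in zip(actuals, preds)) / len(actuals)

# Pred(l): fraction of projects whose MRE is within level l (commonly 0.25).
def pred(actuals, preds, level=0.25):
    hits = sum(1 for a, p in zip(actuals, preds) if mre(a, p) <= level)
    return hits / len(actuals)

actual = [100, 200, 400]     # hypothetical actual efforts (person-hours)
estimate = [110, 140, 390]   # hypothetical model estimates

print(round(mmre(actual, estimate), 3))  # 0.142
print(round(pred(actual, estimate), 3))  # 0.667 (2 of 3 within 25%)
```

Lower MMRE and higher Pred(25) indicate better estimates, which is how the fuzzy and crisp ID3 variants are compared.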

  18. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    CERN Document Server

    Elyassami, Sanaa

    2011-01-01

    Web Effort Estimation is a process of predicting the efforts and cost in terms of money, schedule and staff for any software project system. Many estimation models have been proposed over the last three decades and it is believed that it is a must for the purpose of: Budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the principles of ID3 decision tree and the fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing the software projects, which can improve greatly the accuracy of obtained estimates. MMRE and Pred are used as measures of prediction accuracy for this study. A series of experiments is reported using two different software projects datasets namely, Tukutuku and COCOMO'81 datasets. The results are compared with those produced by the crisp version of the ID3 decision tree.

  19. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

Decision tree is a widely used technique to discover patterns from a consistent data set. But if the data set is inconsistent, where there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge from the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we then compared them based on the optimization and classification results. It was found that the greedy algorithms Mult_ws_entSort and Mult_ws_entML are good for both optimization and classification.

  20. Development and Test of Fixed Average K-means Base Decision Trees Grouping Method by Improving Decision Tree Clustering Method

    Directory of Open Access Journals (Sweden)

    Jai-Houng Leu

    2009-01-01

Full Text Available New analytical methods and tools, called FAKDT (Fixed Average K-means base Decision Trees), for human performance have been developed in this study, and they let us look at the enterprise from different aspects. The Decision Tree Clustering Method is one of the data mining methods that have been applied widely in different fields to analyze large amounts of data in recent years. Generally speaking, in the human resource incubation of an enterprise, if employees of high learning potential, high stability and high emotional quotient are selected, the return on investment in human resources will be more apparent. If employees with the above-mentioned traits can be well utilized and incubated, the industry competitiveness of the enterprise will be enhanced effectively. From the personality specialty point of view, its function is to predict the efficiency of personal achievement in correlation to some implied personality specialties (blood group, constellation, etc.). The main purpose of this research is to extract useful information and important messages about human performance from historical records with this method. The Decision Tree Clustering Method data mining skills were improved and applied to obtain the critical factors that affect human traits, demonstrating the method's feasibility in this study.

  1. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2016-05-01

    The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  2. Fault diagnosis of induction motor based on decision trees and adaptive neuro-fuzzy inference

    OpenAIRE

    Tran, Tung; Yang, Bo-Suk; Oh, Myung-Suck; Tan, Andy Chit Chiow

    2009-01-01

    This paper presents a fault diagnosis method based on adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART) which is one of the decision tree methods is used as a feature selection procedure to select pertinent features from data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of ANFIS classifier. The hybrid of back-propagation and le...

  3. Decision Tree Classifiers for Star/Galaxy Separation

    CERN Document Server

    Vasconcellos, E C; Gal, R R; LaBarbera, F L; Capelato, H V; Velho, H F Campos; Trevisan, M; Ruiz, R S R

    2010-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of $884,126$ SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: $14\\le r\\le21$ ($85.2%$) and $r\\ge19$ ($82.1%$). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier is comparable or better in completeness over the full magnitude range $15\\le r\\le21$, with m...
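Completeness, the evaluation measure used here, is the fraction of true members of a class that the classifier recovers. A minimal sketch with hypothetical counts (the paper computes it per magnitude bin):

```python
# Completeness for one class in one magnitude bin:
# the share of true members the classifier labels correctly.
def completeness(true_positives, false_negatives):
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts for the "star" class in a single bin.
print(round(completeness(852, 148), 3))  # 0.852
```

Comparing mean completeness across magnitude intervals, as the paper does, rewards classifiers that stay reliable at faint magnitudes rather than only on bright, easy objects.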

  4. Influence diagrams and decision trees for severe accident management

    International Nuclear Information System (INIS)

    A review of relevant methodologies based on Influence Diagrams (IDs), Decision Trees (DTs), and Containment Event Trees (CETs) was conducted to assess the practicality of these methods for the selection of effective strategies for Severe Accident Management (SAM). The review included an evaluation of some software packages for these methods. The emphasis was on possible pitfalls of using IDs and on practical aspects, the latter by performance of a case study that was based on an existing Level 2 Probabilistic Safety Assessment (PSA). The study showed that the use of a combined ID/DT model has advantages over CET models, in particular when conservatisms in the Level 2 PSA have been identified and replaced by fair assessments of the uncertainties involved. It is recommended to use ID/DT models complementary to CET models. (orig.)

  5. Totally Optimal Decision Trees for Monotone Boolean Functions with at Most Five Variables

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.

  6. Data acquisition in modeling using neural networks and decision trees

    Directory of Open Access Journals (Sweden)

    R. Sika

    2011-04-01

Full Text Available The paper presents a comparison of selected models from the area of artificial neural networks and decision trees in relation to actual conditions of foundry processes. The work contains short descriptions of the used algorithms, their destination and the method of data preparation, which is a domain of work of Data Mining systems. The first part concerns data acquisition realized in a selected iron foundry, indicating problems to solve in the aspect of casting process modeling. The second part is a comparison of selected algorithms: a decision tree and an artificial neural network, that is, the CART (Classification And Regression Trees) and BP (Backpropagation) in MLP (Multilayer Perceptron) networks algorithms. The aim of the paper is to show an aspect of selecting data for modeling, cleaning it and reducing it, for example due to too strong a correlation between some of the recorded process parameters. It has also been shown what results can be obtained using two different approaches: first, modeling with available commercial software such as Statistica; second, modeling step by step in an Excel spreadsheet based on the same algorithm, like BP-MLP. The discrepancy of results obtained from these two approaches originates from a priori made assumptions. The earlier-mentioned Statistica universal software package, when used without awareness of the relations of technological parameters, i.e. without the user having experience in foundry and without ranking particular parameters based on acquisition, cannot give a credible basis to predict the quality of the castings. Also, a decisive influence of the data acquisition method has been clearly indicated; the acquisition should be conducted according to repetitive measurement and control procedures. This paper is based on about 250 records of actual data, for one assortment over a 6-month period, where only 12 data sets were complete (including two that were used for validation of the neural network) and useful for creating a model. It is definitely too

  7. Fault trees for decision making in systems analysis

    International Nuclear Information System (INIS)

    The application of fault tree analysis (FTA) to system safety and reliability is presented within the framework of system safety analysis. The concepts and techniques involved in manual and automated fault tree construction are described and their differences noted. The theory of mathematical reliability pertinent to FTA is presented with emphasis on engineering applications. An outline of the quantitative reliability techniques of the Reactor Safety Study is given. Concepts of probabilistic importance are presented within the fault tree framework and applied to the areas of system design, diagnosis and simulation. The computer code IMPORTANCE ranks basic events and cut sets according to a sensitivity analysis. A useful feature of the IMPORTANCE code is that it can accept relative failure data as input. The output of the IMPORTANCE code can assist an analyst in finding weaknesses in system design and operation, suggest the most optimal course of system upgrade, and determine the optimal location of sensors within a system. A general simulation model of system failure in terms of fault tree logic is described. The model is intended for efficient diagnosis of the causes of system failure in the event of a system breakdown. It can also be used to assist an operator in making decisions under a time constraint regarding the future course of operations. The model is well suited for computer implementation. New results incorporated in the simulation model include an algorithm to generate repair checklists on the basis of fault tree logic and a one-step-ahead optimization procedure that minimizes the expected time to diagnose system failure. (80 figures, 20 tables)
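The quantitative side can be sketched as follows: given minimal cut sets and basic-event failure probabilities, the top-event probability follows from the rare-event approximation, and Birnbaum importance ranks basic events by sensitivity. The event names and numbers below are hypothetical, and this is not the IMPORTANCE code itself:

```python
# Hypothetical basic-event failure probabilities and minimal cut sets:
# the top event occurs if (pump AND valve) fail, or sensor fails.
failure_prob = {"pump": 0.01, "valve": 0.02, "sensor": 0.05}
min_cut_sets = [{"pump", "valve"}, {"sensor"}]

def cut_set_prob(cut, p):
    # A cut set fails only if all of its basic events fail (independence assumed).
    prob = 1.0
    for event in cut:
        prob *= p[event]
    return prob

def top_event_prob(cuts, p):
    # Rare-event approximation: sum of the cut-set probabilities.
    return sum(cut_set_prob(c, p) for c in cuts)

def birnbaum(event, cuts, p):
    # Sensitivity of the top event to one basic event:
    # top-event prob with the event failed minus with it working.
    failed = top_event_prob(cuts, {**p, event: 1.0})
    working = top_event_prob(cuts, {**p, event: 0.0})
    return failed - working

print(round(top_event_prob(min_cut_sets, failure_prob), 6))  # 0.0502
```

Ranking events by `birnbaum` is one way a sensitivity analysis can point at the most effective system upgrade or the best sensor location, in the spirit of the importance measures described above.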

  8. FINANCIAL PERFORMANCE INDICATORS OF TUNISIAN COMPANIES: DECISION TREE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ferdaws Ezzi

    2016-01-01

Full Text Available The article at hand is an attempt to identify the various indicators that are most likely to explain the financial performance of Tunisian companies. In this respect, the emphasis is put on diversification, innovation, and intrapersonal and interpersonal skills. Indeed, they are the appropriate strategies that can designate emotional intelligence, the level of indebtedness, and the firm's age and size as the proper variables that support the target variable. The "decision tree", as a new data analysis method, is utilized in our work. The results involve the construction of a crucial model which is used to achieve sound financial performance.

  9. Porting Decision Tree Algorithms to Multicore using FastFlow

    CERN Document Server

    aldinucci, Marco; Torquati, Massimo

    2010-01-01

    The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.

  10. Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

    Directory of Open Access Journals (Sweden)

    O.BENCHAREF

    2011-09-01

Full Text Available The recognition of Tifinagh characters cannot be perfectly carried out using the conventional methods which are based on invariance; this is due to the similarity that exists between some characters which differ from each other only by size or rotation, hence the need to come up with new methods to remedy this shortage. In this paper we propose a direct method based on the calculation of so-called Geodesic Descriptors, which have shown significant reliability vis-à-vis changes of scale, the presence of noise and geometric distortions. For classification, we have opted for a method based on the hybridization of decision trees and neural networks.

  11. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

This paper is devoted to the consideration of the software system Dagger created at KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between the number of misclassifications and the depth of decision trees), and between cost and uncertainty of decision trees. We describe the features of Dagger and consider examples of this system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  12. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the performance of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  13. Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data

    OpenAIRE

    De Rosa, Rocco

    2016-01-01

    Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have b...
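The split criterion at the heart of Hoeffding tree algorithms can be sketched directly from the Hoeffding bound: a leaf is split once the gain gap between the two best attributes exceeds the bound. This is a generic VFDT-style sketch, not the confidence decision tree method proposed in the paper:

```python
import math

def hoeffding_eps(R, delta, n):
    # After n samples, an observed mean of a statistic with range R is
    # within eps of its true mean with probability at least 1 - delta.
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, R, delta, n):
    # Split only when the gain gap between the two best attributes
    # exceeds eps, i.e. the ranking is statistically settled.
    return (best_gain - second_gain) > hoeffding_eps(R, delta, n)

# With 5000 samples and delta = 1e-7, eps is about 0.040,
# so a gap of 0.08 justifies splitting while 0.02 does not.
print(should_split(0.30, 0.22, R=1.0, delta=1e-7, n=5000))  # True
print(should_split(0.30, 0.28, R=1.0, delta=1e-7, n=5000))  # False
```

Because `eps` shrinks like `1/sqrt(n)`, the leaf simply waits for more stream examples when the two candidate splits are still too close to call.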

  14. An Applied Research of Decision Tree Algorithm in Track and Field Equipment Training

    OpenAIRE

    Liu Shaoqing; Wang Kebin

    2015-01-01

    This paper has conducted a study on the applications of track and field equipment training based on ID3 algorithm of decision tree model. For the selection of the elements used by decision tree, this paper can be divided into track training equipment, field events training equipment and auxiliary training equipment according to the properties of track and field equipment. The decision tree that regards track training equipment as root nodes has been obtained under the conditions of lowering c...

  15. Distribution-Specific Agnostic Boosting

    CERN Document Server

    Feldman, Vitaly

    2009-01-01

    We consider the problem of boosting the accuracy of weak learning algorithms in the agnostic learning framework of Haussler (1992) and Kearns et al. (1992). Known algorithms for this problem (Ben-David et al., 2001; Gavinsky, 2002; Kalai et al., 2008) follow the same strategy as boosting algorithms in the PAC model: the weak learner is executed on the same target function but over different distributions on the domain. We demonstrate boosting algorithms for the agnostic learning framework that only modify the distribution on the labels of the points (or, equivalently, modify the target function). This allows boosting a distribution-specific weak agnostic learner to a strong agnostic learner with respect to the same distribution. When applied to the weak agnostic parity learning algorithm of Goldreich and Levin (1989) our algorithm yields a simple PAC learning algorithm for DNF and an agnostic learning algorithm for decision trees over the uniform distribution using membership queries. These results substantia...

  16. Electronic Nose Odor Classification with Advanced Decision Tree Structures

    Directory of Open Access Journals (Sweden)

    S. Guney

    2013-09-01

Full Text Available Electronic nose (e-nose) is an electronic device which can measure chemical compounds in air and consequently classify different odors. In this paper, an e-nose device consisting of 8 different gas sensors was designed and constructed. Using this device, 104 different experiments involving 11 different odor classes (moth, angelica root, rose, mint, polis, lemon, rotten egg, egg, garlic, grass, and acetone) were performed. The main contribution of this paper is the finding that using chemical domain knowledge it is possible to train an accurate odor classification system. The domain knowledge about chemical compounds is represented by a decision tree whose nodes are composed of classifiers such as Support Vector Machines and k-Nearest Neighbor. The overall accuracy achieved with the proposed algorithm and the constructed e-nose device was 97.18%. Training and testing data sets used in this paper are published online.

  17. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    CERN Document Server

    Farid, Dewan Md; Rahman, Mohammad Zahidur; 10.5121/ijnsa.2010.2202

    2010-01-01

In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and decision tree is presented, which performs balanced detections and keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, there remain various issues that need to be examined in current intrusion detection systems (IDS). We tested the performance of our proposed algorithm with existing learn...
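One half of such a hybrid, a categorical naive Bayes classifier with simple add-one smoothing, can be sketched as follows (toy traffic records and class names are made up; the paper's decision-tree attribute selection is not reproduced):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace-style) smoothing."""

    def fit(self, X, y):
        self.classes = Counter(y)
        # (feature index, class) -> counts of attribute values
        self.cond = defaultdict(Counter)
        for row, label in zip(X, y):
            for i, v in enumerate(row):
                self.cond[(i, label)][v] += 1
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.classes[c] / sum(self.classes.values()))
            for i, v in enumerate(row):
                counts = self.cond[(i, c)]
                # Add-one smoothing so unseen values never zero out a class.
                lp += math.log((counts[v] + 1)
                               / (sum(counts.values()) + len(counts) + 1))
            return lp
        return max(self.classes, key=log_posterior)

# Hypothetical connection records: (protocol, service) -> label.
X = [["tcp", "http"], ["tcp", "http"], ["udp", "dns"], ["udp", "dns"]]
y = ["normal", "normal", "attack", "attack"]

nb = NaiveBayes().fit(X, y)
print(nb.predict(["udp", "dns"]))  # attack
```

In the hybrid described above, a decision tree would first prune redundant attributes; the Bayesian model then scores only the attributes that survive.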

  18. BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES

    Directory of Open Access Journals (Sweden)

    Amaranatha Reddy P

    2015-11-01

Full Text Available This research paper proposes an algorithm to find association rules in incremental databases. Most transaction databases are dynamic. Consider, for example, supermarket customers' daily purchase transactions: customers' purchasing behaviour may change from day to day, and new products replace old products. In this scenario, static data mining algorithms are of little use. If an algorithm learns continuously day to day, we can obtain the most up-to-date knowledge, which is very helpful in today's fast-changing world. Famous and benchmarked algorithms for association rule mining are Apriori and FP-Growth. However, their major drawback is that they must be rebuilt all over again once the original database is changed. Therefore, in this paper we introduce an efficient algorithm called Binary Decision Tree (BDT) to process incremental data. Processing continuously arriving data demands substantial processing and storage resources. In this algorithm we scan the database only once, constructing a dynamically growing binary tree to find association rules with better performance and optimum storage. The algorithm can also be applied to static data, but our main intention is to give an optimum solution for incremental data.
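The single-scan, batch-by-batch support counting idea can be sketched as follows (an illustrative incremental counter with made-up transactions, not the BDT algorithm itself):

```python
from collections import Counter
from itertools import combinations

class IncrementalSupport:
    """Maintains itemset counts across batches so the database
    is never rescanned from the start (1- and 2-itemsets only)."""

    def __init__(self):
        self.counts = Counter()
        self.n = 0

    def add_batch(self, transactions):
        for t in transactions:
            self.n += 1
            items = sorted(set(t))
            for size in (1, 2):
                for combo in combinations(items, size):
                    self.counts[combo] += 1

    def support(self, itemset):
        return self.counts[tuple(sorted(itemset))] / self.n

s = IncrementalSupport()
s.add_batch([["milk", "bread"], ["milk"], ["bread", "eggs"]])
s.add_batch([["milk", "bread"]])       # a new day's transactions
print(s.support({"milk", "bread"}))    # 0.5
```

Each new batch only touches its own transactions, which is the property that lets an incremental miner keep up with daily updates where Apriori and FP-Growth would recompute from scratch.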

  19. Classification of Liss IV Imagery Using Decision Tree Methods

    Science.gov (United States)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation: they describe the greenness, density and health of vegetation. Texture is also an important characteristic used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties and different vegetation indices for single-date LISS IV sensor data with 5.8 m high spatial resolution. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) have been generated using the green, red and NIR bands, and the image is then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison between the two methods indicates that including textural features with vegetation indices produces classified maps with 8.33% higher accuracy for Indian satellite IRS-P6 LISS IV sensor images.
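    As a minimal sketch, two of the listed indices can be computed directly from the band arrays; the reflectance values and the 0.4 crop threshold below are invented for illustration, standing in for one test at a decision-tree node:

```python
import numpy as np

# Toy red and NIR reflectance bands (e.g. from a LISS IV scene);
# the values are made up for illustration.
red = np.array([[0.10, 0.30], [0.25, 0.05]])
nir = np.array([[0.60, 0.35], [0.30, 0.55]])

# NDVI = (NIR - Red) / (NIR + Red): one of the eleven indices used.
ndvi = (nir - red) / (nir + red)

# DVI (Difference Vegetation Index) = NIR - Red.
dvi = nir - red

# A crude crop / non-crop split on an assumed NDVI threshold.
crop_mask = ndvi > 0.4
```

    The real classifier combines many such index (and texture) tests into a full decision tree rather than a single threshold.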

  20. Extensions of Dynamic Programming: Decision Trees, Combinatorial Optimization, and Data Mining

    KAUST Repository

    Hussain, Shahid

    2016-07-10

    This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.
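    One of the combinatorial problems the thesis generalizes its multi-stage optimization to, matrix chain multiplication, illustrates the underlying dynamic programming style; a standard textbook sketch (not the thesis's extended formulation):

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications needed to multiply a
    chain of matrices, where matrix i has shape dims[i] x dims[i+1]."""
    n = len(dims) - 1                      # number of matrices
    cost = [[0] * n for _ in range(n)]     # cost[i][j]: best for chain i..j
    for length in range(2, n + 1):         # increasing chain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

# Three matrices of shapes 10x30, 30x5, 5x60:
best = matrix_chain_cost([10, 30, 5, 60])   # 4500, via (AB)C
```

    The extensions described above enrich exactly this kind of table-filling recursion, e.g. by optimizing over a sequence of cost functions or counting the optimal solutions.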

  1. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    Science.gov (United States)

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. PMID:27026589

  2. Binary Decision Tree Development for Probabilistic Safety Assessment Applications

    International Nuclear Information System (INIS)

    The aim of this article is to describe the state of development of a relatively new approach in probabilistic safety analysis (PSA). This approach is based on the binary decision diagram (BDD) representation of logical functions in the quantitative and qualitative analysis of complex systems that are presented by fault trees and event trees in the PSA applied for nuclear power plant risk determination. Even though the BDD approach offers a full solution, compared to the partial one from the conventional quantification approach, there are still problems to be solved before the new approach can be fully implemented. The major problem with full application of BDDs is the difficulty of getting any solution at all for PSA models beyond a certain complexity. This paper compares two approaches to PSA quantification, with its major focus on an in-house developed BDD application implementing original algorithms. The number of nodes required to represent a BDD is extremely sensitive to the chosen order of variables (i.e., basic events in PSA), and finding an optimal order of variables for the BDD falls in the class of NP-complete problems. This paper presents an original approach to finding the initial order of variables used for BDD construction by various dynamic reordering schemes. The main advantage of this approach over known methods of finding the initial order is its better results with respect to the working memory and time needed to finish the BDD construction. The developed method is compared against results from well-known methods such as depth-first and breadth-first search procedures. The described method may be applied to finding an initial order for fault trees/event trees created from basic events by means of logical operations (e.g. negation, and, or, exclusive or). With some testing models a significant reduction of used memory has been achieved, sometimes
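    The order sensitivity described above can be demonstrated with a toy reduced-BDD builder; this is a simplified sketch, not the in-house application from the article, and `build_bdd` with its example function are illustrative only:

```python
def build_bdd(f, order, assignment, unique):
    """Build a reduced ordered BDD for boolean function f by Shannon
    expansion along `order`. Returns a node (or terminal True/False)
    and fills `unique`, the shared-node table; len(unique) is the
    number of internal BDD nodes."""
    if not order:
        return f(assignment)                      # terminal value
    var, rest = order[0], order[1:]
    lo = build_bdd(f, rest, {**assignment, var: False}, unique)
    hi = build_bdd(f, rest, {**assignment, var: True}, unique)
    if lo == hi:                                  # redundant test: skip node
        return lo
    key = (var, lo, hi)                           # structural sharing
    return unique.setdefault(key, key)

# f = (x1 & y1) | (x2 & y2): a function whose BDD size is famously
# order-sensitive (with n pairs the gap grows exponentially).
f = lambda a: (a["x1"] and a["y1"]) or (a["x2"] and a["y2"])

u_good, u_bad = {}, {}
build_bdd(f, ["x1", "y1", "x2", "y2"], {}, u_good)  # interleaved order
build_bdd(f, ["x1", "x2", "y1", "y2"], {}, u_bad)   # grouped order
```

    Here the interleaved order yields 4 internal nodes and the grouped order 6; for larger fault trees the same effect is what makes initial-order and reordering heuristics decisive.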

  3. Computer Crime Forensics Based on Improved Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Ying Wang

    2014-04-01

    Full Text Available To find evidence of crime and association rules among massive data, classic decision tree algorithms for classification analysis such as ID3 have appeared in related prototype systems, so making them more suitable for computer forensics in variable environments has become a hot issue. When selecting classification attributes, ID3 relies on the computation of information entropy, which biases selection towards attributes with many values as classification nodes of the decision tree; such a classification is unrealistic in many cases. The ID3 algorithm also involves many logarithm computations, making it expensive to handle datasets with numerous classification attributes. Therefore, addressing the special demands of computer crime forensics, the ID3 algorithm is improved and a novel classification attribute selection method based on a Maclaurin-Priority Value First method is proposed. It adopts the change-of-base formula and equivalent-infinitesimal substitution to simplify the logarithms in ID3; for the errors generated in this process, an appropriate constant is introduced and multiplied by the simplified formulas as compensation. The idea of Priority Value First is introduced to solve the problem of value deviation. The performance of the improved method is strictly proved in theory. Finally, experiments verify that our scheme has an advantage in computation time and classification accuracy compared to ID3 and two existing algorithms.
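    For reference, the entropy and information gain that ID3 computes repeatedly, and that the proposed method approximates, can be sketched as follows (the forensic attribute names and records are hypothetical):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list: the quantity ID3 computes
    repeatedly and the paper simplifies via Maclaurin expansion."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` on `attr`."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(
        len(part) / n * entropy(part) for part in split.values())

# Toy forensics-style records (hypothetical attributes and values).
rows = [
    {"login_hour": "night", "failed_attempts": "many"},
    {"login_hour": "night", "failed_attempts": "few"},
    {"login_hour": "day",   "failed_attempts": "few"},
    {"login_hour": "day",   "failed_attempts": "few"},
]
labels = ["suspicious", "suspicious", "normal", "normal"]

gain_hour = information_gain(rows, labels, "login_hour")      # perfect split
gain_fail = information_gain(rows, labels, "failed_attempts")
```

    ID3 would pick `login_hour` here (gain 1.0 bit); the paper's contribution lies in computing such gains more cheaply and correcting the bias towards many-valued attributes.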

  4. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    Science.gov (United States)

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
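    The modelling-and-validation pipeline can be sketched with scikit-learn stand-ins for BRT, CART and RF; the synthetic features below replace the 13 HGP factors, and the library implementations only approximate the models used in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 13 conditioning factors and the
# spring / non-spring labels (864 mapped spring locations).
X, y = make_classification(n_samples=864, n_features=13, n_informative=8,
                           random_state=0)
# ~70/30 split, mirroring the 605/259 training/validation partition.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "BRT": GradientBoostingClassifier(random_state=0),   # boosted regression trees
    "CART": DecisionTreeClassifier(random_state=0),      # single tree
    "RF": RandomForestClassifier(random_state=0),        # random forest
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

    Ranking the models by held-out AUC is exactly the comparison the study performs (BRT 0.8103, CART 0.7870, RF 0.7119 on its real data).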

  5. Decision-tree induction from self-mapping space based on web

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shu-yu; ZHU Zhong-ying

    2007-01-01

    An improved decision tree method for web information retrieval with self-mapping attributes is proposed. The self-mapping tree holds a value of a self-mapping attribute in each internal node, together with information based on the dissimilarity between a pair of mapping sequences. This method selects the self-mapping that exists between data by exhaustive search based on relation and attribute information. Experimental results confirm that the improved method constructs comprehensive and accurate decision trees. Moreover, an example shows that the self-mapping decision tree is promising for data mining and knowledge discovery.

  6. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    Science.gov (United States)

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  7. Case Study on High Dimensional Data Analysis Using Decision Tree Model

    OpenAIRE

    Smitha.T; Sundaram, V.

    2012-01-01

    The major aim of this paper is to build a model to predict the chances of occurrence of disease in an area. The paper concentrates on the decision tree model, a data mining technique, to identify the significant parameters for the prediction process. The decision tree model was created with the help of the ID3 algorithm.

  8. Decision Rules, Trees and Tests for Tables with Many-valued Decisions–comparative Study

    KAUST Repository

    Azad, Mohammad

    2013-10-04

    In this paper, we present three approaches for the construction of decision rules for decision tables with many-valued decisions. We construct decision rules directly for the rows of a decision table, based on paths in a decision tree, and based on attributes contained in a test (super-reduct). Experimental results for data sets taken from the UCI Machine Learning Repository contain a comparison of the maximum and average rule length for the mentioned approaches.

  9. Greedy heuristics for minimization of number of terminal nodes in decision trees

    KAUST Repository

    Hussain, Shahid

    2014-10-01

    This paper describes, in detail, several greedy heuristics for the construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related to the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from the UCI ML Repository and randomly generated data) and also compare them with the optimal results obtained using the dynamic programming method.

  10. Searches for Supersymmetric Particles with the ATLAS Detector Using Boosted Decay Tree Topologies

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00399438; De, Kaushik; Hadavand, Haleh; Musielak, Zdzislaw; White, Andrew

    The existence of a scalar Higgs particle poses a challenge to the Standard Model through an unnatural hierarchy problem with quadratic divergence. A supersymmetric framework, proposing heavy partners to every Standard Model particle, can solve this problem by introducing new loop diagrams that involve a new fermion-boson symmetry. The LHC has the potential to probe the energy scale necessary for creation of these particles and the ATLAS experiment is poised for discovery. The detected particles are studied by reconstructing the detected events in boosted frames that approximate each decay frame of the interaction with pairs of heavy, invisible particles. This Razor method was used in the analysis of data from 2011 and 2012 and then generalized to the Recursive Jigsaw method in 2015.

  11. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Dewan Md. Farid

    2010-04-01

    Full Text Available In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and it eliminates redundant attributes as well as contradictory examples from the training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data over the last decades. However, various issues remain to be examined in current intrusion detection systems (IDS). We tested the performance of our proposed algorithm against existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources.
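    The abstract does not specify how the two learners are combined, so the following is only one plausible pairing, shown as an illustrative sketch: a decision tree prunes redundant attributes (echoing the attribute-elimination step above), then a naive Bayes classifier runs on what remains.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for audit records; the real work uses KDD99.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=1)

# Step 1 (assumed): a shallow tree ranks attributes; keep the top 5.
tree = DecisionTreeClassifier(max_depth=5, random_state=1).fit(X, y)
keep = np.argsort(tree.feature_importances_)[-5:]

# Step 2: naive Bayes on the reduced attribute set.
nb_all = GaussianNB().fit(X, y)
nb_reduced = GaussianNB().fit(X[:, keep], y)

acc_all = nb_all.score(X, y)
acc_reduced = nb_reduced.score(X[:, keep], y)
```

    Dropping uninformative attributes keeps the Bayesian model small, which is the "limited computational resources" aspect the paper emphasises; the actual hybrid in the paper may differ.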

  12. Social Impact on Android Applications using Decision Tree

    Directory of Open Access Journals (Sweden)

    Waseem Iqbal

    2015-11-01

    Full Text Available Mobile phones have evolved very rapidly from black-and-white handsets to smart phones. Google has launched the Android operating system (OS), based on Linux and targeting smart phones, and people have become attached to these devices due to the facilities they provide. However, the security leaks present in Android are a big hurdle to using it securely. The Android operating system is widely used because it is open source/freeware and most of its applications are freely available on different online application stores. To install any application, we must accept the terms and conditions regarding access to multiple parts of the device and to personal information; otherwise we are unable to install these free or paid applications. The main problem is that when we allow access to multiple parts of our device and our personal information, the inherent security leaks become more exposed to threats. A simple and handy solution is to install only applications that are positively reviewed by other users who have already installed and are still using them. We implement the decision tree, a machine learning technique, to analyse these positively reviewed applications and recommend whether or not to install them on the device.

  13. Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems

    CERN Document Server

    Gupta, Anupam; Nagarajan, Viswanath; Ravi, R

    2010-01-01

    We consider the problem of constructing optimal decision trees: given a collection of tests which can disambiguate between a set of $m$ possible diseases, each test having a cost, and the a-priori likelihood of the patient having any particular disease, what is a good adaptive strategy to perform these tests to minimize the expected cost to identify the disease? We settle the approximability of this problem by giving a tight $O(\\log m)$-approximation algorithm. We also consider a more substantial generalization, the Adaptive TSP problem. Given an underlying metric space, a random subset $S$ of cities is drawn from a known distribution, but $S$ is initially unknown to us--we get information about whether any city is in $S$ only when we visit the city in question. What is a good adaptive way of visiting all the cities in the random subset $S$ while minimizing the expected distance traveled? For this problem, we give the first poly-logarithmic approximation, and show that this algorithm is best possible unless w...

  14. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

    Full Text Available Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category, and identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vital but usually neglected aspect of software quality assurance in any project; if practised at all stages of software development, it can reduce the time, overhead and resources required to engineer a high-quality product. In order to reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when test cases show that the software is not executing properly. The proposed system classifies various defects using a decision-tree-based defect classification technique, which is used to group the defects after identification. The classification can be done by employing algorithms such as ID3 or C4.5. After the classification, the defect patterns are measured by employing a pattern mining technique. Finally, quality is assured using various quality metrics such as defect density. The proposed system will be implemented in JAVA.

  15. Efficient OCR using simple features and decision trees with backtracking

    International Nuclear Information System (INIS)

    In this paper, it is shown that it is adequate to use simple and easy-to-compute features, such as what we call sliced horizontal and vertical projections, to solve the OCR problem for machine-printed documents. Recognition is achieved using a decision tree supported by backtracking, smoothing, row and column cropping, and other additions to increase the success rate. Symbols from the Times New Roman typeface are used to train our system. Activating backtracking, smoothing and cropping achieved more than a 98% success rate with a recognition time below 30 ms per character. The recognition algorithm was exposed to a hard test by polluting the original dataset with additional artificial noise, and it maintained a high success rate and low error rate for highly polluted images as a result of the backtracking, smoothing, and row and column cropping. The results indicate that we can depend on simple features and hints to reliably recognize characters. The error rate can be decreased by increasing the size of the training dataset, and the recognition time can be reduced by using programming optimization techniques and more powerful computers. (author)
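    The sliced projection features can be sketched in a few lines; the glyph below and the slice count are toy values, not the paper's training data:

```python
import numpy as np

def sliced_projections(glyph, slices=2):
    """Sliced horizontal and vertical projections of a binary glyph:
    ink counts per row/column, summed within equal slices. These are
    the simple features the recogniser's decision tree branches on."""
    h = glyph.sum(axis=1)                         # horizontal projection
    v = glyph.sum(axis=0)                         # vertical projection
    hs = [seg.sum() for seg in np.array_split(h, slices)]
    vs = [seg.sum() for seg in np.array_split(v, slices)]
    return hs + vs

# A crude 4x4 "L" shape as a stand-in for a scanned character.
L = np.array([[1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 1, 1, 4 // 4],
              ])[:4, :4] * 0 + np.array([[1, 0, 0, 0],
                                         [1, 0, 0, 0],
                                         [1, 0, 0, 0],
                                         [1, 1, 1, 1]])
features = sliced_projections(L, slices=2)        # [2, 5, 5, 2]
```

    Each character class ends up with a characteristic feature vector, and the decision tree (with backtracking on near-ties) routes a glyph to its class by thresholding these sums.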

  16. Application of alternating decision trees in selecting sparse linear solvers

    KAUST Repository

    Bhowmick, Sanjukta

    2010-01-01

    The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to the characteristics of the task can significantly enhance performance, for example by reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in the course of the same PDE-based simulation, making solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stage of the simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can mitigate the problem of over-fitting, which occurs when a limited amount of data is available. © 2010 Springer Science+Business Media LLC.

  17. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    Science.gov (United States)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  18. A greedy algorithm for construction of decision trees for tables with many-valued decisions - A comparative study

    KAUST Repository

    Azad, Mohammad

    2013-11-25

    In this paper, we study a greedy algorithm for the construction of decision trees. The algorithm is applicable to decision tables with many-valued decisions, where each row is labeled with a set of decisions; for a given row, we should find a decision from the set attached to this row. Experimental results for data sets from the UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the decision trees constructed by the proposed approach and by the approach based on the generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.

  19. Iron Supplementation and Altitude: Decision Making Using a Regression Tree

    Directory of Open Access Journals (Sweden)

    Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore

    2016-03-01

    Full Text Available Altitude exposure increases the body's need for iron (Gassmann and Muckenthaler, 2015), primarily to support accelerated erythropoiesis, yet clear supplementation guidelines do not exist. Athletes are typically recommended to ingest a daily oral iron supplement to facilitate altitude adaptations and to help maintain iron balance. However, there is some debate as to whether athletes with otherwise healthy iron stores should be supplemented, due in part to concerns of iron overload. Excess iron in vital organs is associated with an increased risk of a number of conditions including cancer, liver disease and heart failure. Therefore clear guidelines are warranted, and athletes should be discouraged from 'self-prescribing' supplementation without medical advice. In the absence of prospective controlled studies, decision tree analysis can be used to describe a data set, with the resultant regression tree serving as a guide for clinical decision making. Here, we present a regression tree in the context of iron supplementation during altitude exposure, to examine the association between pre-altitude ferritin (Ferritin-Pre) and the haemoglobin mass (Hbmass) response, based on daily iron supplement dose. De-identified ferritin and Hbmass data from 178 athletes engaged in altitude training were extracted from the Australian Institute of Sport (AIS) database. Altitude exposure was predominantly achieved via normobaric Live High: Train Low (n = 147) at a simulated altitude of 3000 m for 2 to 4 weeks. The remaining athletes engaged in natural altitude training at venues ranging from 1350 to 2800 m for 3-4 weeks. Thus, the "hypoxic dose" ranged from ~890 km·h to ~1400 km·h. Ethical approval was granted by the AIS Human Ethics Committee, and athletes provided written informed consent. An in-depth description and traditional analysis of the complete data set is presented elsewhere (Govus et al., 2015). Iron supplementation was prescribed by a sports physician
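    Fitting such a regression tree can be sketched as below; the real AIS data are not public, so every number here is a synthetic stand-in, and the fitted splits carry no clinical meaning:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for pre-altitude ferritin (ug/L), daily iron
# dose (mg) and the %Hbmass change after altitude exposure.
ferritin = rng.uniform(10, 150, 200)
dose = rng.choice([0.0, 105.0, 210.0], 200)
hbmass_change = (3.0 + 0.01 * dose - 0.02 * ferritin
                 + rng.normal(0, 0.5, 200))

X = np.column_stack([ferritin, dose])
# A shallow tree keeps the leaves interpretable as guideline groups.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=20)
tree.fit(X, hbmass_change)

# Predicted response for a (hypothetical) low-ferritin athlete on
# 105 mg/day: the leaf mean would be read as the expected benefit.
pred = tree.predict([[25.0, 105.0]])[0]
```

    Each leaf of the fitted tree corresponds to a ferritin/dose subgroup whose mean Hbmass response can be reported, which is how a regression tree becomes a clinical decision aid.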

  20. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music material, the proposed method is found to achieve much better mode selection accuracy than the open-loop mode selection module in AMR-WB+.

  1. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  2. In vivo quantitative evaluation of vascular parameters for angiogenesis based on sparse principal component analysis and aggregated boosted trees

    International Nuclear Information System (INIS)

    To address the multicollinearity issue and the unequal contributions of vascular parameters in the quantification of angiogenesis, we developed a quantitative evaluation method of vascular parameters for angiogenesis based on in vivo micro-CT imaging of hindlimb ischemia model mice. Taking vascular volume as the ground truth parameter, nine vascular parameters were first assembled into sparse principal components (PCs) to reduce the multicollinearity issue. Aggregated boosted trees (ABTs) were then employed to analyse the importance of the vascular parameters for the quantification of angiogenesis via the loadings of the sparse PCs. The results demonstrated that vascular volume was mainly characterized by vascular area, vascular junctions, connectivity density, segment number and vascular length, indicating that these are the key vascular parameters for the quantification of angiogenesis. The proposed evaluation method was compared with both ABTs applied directly to the nine vascular parameters and Pearson correlation, and the results were consistent. In contrast to ABTs applied directly to the vascular parameters, the proposed method can select all the key vascular parameters simultaneously, because they were all assembled into the sparse PCs with the highest relative importance. (paper)

  3. Reconstructing palaeoclimatic variables from fossil pollen using boosted regression trees: comparison and synthesis with other quantitative reconstruction methods

    Science.gov (United States)

    Salonen, J. Sakari; Luoto, Miska; Alenius, Teija; Heikkilä, Maija; Seppä, Heikki; Telford, Richard J.; Birks, H. John B.

    2014-03-01

    We test and analyse a new calibration method, boosted regression trees (BRTs) in palaeoclimatic reconstructions based on fossil pollen assemblages. We apply BRTs to multiple Holocene and Lateglacial pollen sequences from northern Europe, and compare their performance with two commonly-used calibration methods: weighted averaging regression (WA) and the modern-analogue technique (MAT). Using these calibration methods and fossil pollen data, we present synthetic reconstructions of Holocene summer temperature, winter temperature, and water balance changes in northern Europe. Highly consistent trends are found for summer temperature, with a distinct Holocene thermal maximum at ca 8000-4000 cal. a BP, with a mean Tjja anomaly of ca +0.7 °C at 6 ka compared to 0.5 ka. We were unable to reconstruct reliably winter temperature or water balance, due to the confounding effects of summer temperature and the great between-reconstruction variability. We find BRTs to be a promising tool for quantitative reconstructions from palaeoenvironmental proxy data. BRTs show good performance in cross-validations compared with WA and MAT, can model a variety of taxon response types, find relevant predictors and incorporate interactions between predictors, and show some robustness with non-analogue fossil assemblages.
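    The calibration-and-validation step can be sketched with scikit-learn's gradient boosting as a stand-in for the paper's BRT implementation; the pollen percentages and temperatures below are synthetic, not real surface samples:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic "modern training set": taxon abundances at calibration
# sites with known summer temperature (stand-in for real samples).
n_sites, n_taxa = 120, 10
pollen = rng.random((n_sites, n_taxa))
temperature = (10 + 5 * pollen[:, 0] - 3 * pollen[:, 1]
               + rng.normal(0, 0.3, n_sites))

# Boosted regression trees as the calibration function; shallow trees
# plus a small learning rate is the usual BRT configuration.
brt = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=2, random_state=0)

# Cross-validated skill: the kind of score the paper compares
# against WA and MAT before reconstructing from fossil assemblages.
r2_scores = cross_val_score(brt, pollen, temperature, cv=5, scoring="r2")
```

    After validation, the model fitted on the full training set would be applied to fossil pollen spectra to yield the down-core temperature reconstruction.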

  4. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...

  5. Constructing decision trees for user behavior prediction in the online consumer market

    OpenAIRE

    Fokin, Dennis; Hagrot, Joel

    2016-01-01

    This thesis intends to investigate the usefulness of various aspects of product data for user behavior prediction in the online shopping market. Specifically, a data set from BestBuy was used, containing information regarding what product a user clicked on given their search query. Decision trees are machine learning algorithms used for making predictions. The decision tree algorithm ID3 was used because of its simplicity and interpretability. It uses information gain to measure how different...

  6. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    OpenAIRE

    Dong-sheng Liu; Shu-jiang Fan

    2014-01-01

    In order to offer mobile customers better service, we should first classify the mobile users. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as classification attributes for the mobile user, and we classify the context into public context and private context cla...

  7. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of decision tree optimization. In this particular case, the relationships between these two cost functions are closely related to the space-time trade-off. In addition to an algorithm to compute the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier (Pareto set) of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.

  8. A Semi-Random Multiple Decision-Tree Algorithm for Mining Data Streams

    Institute of Scientific and Technical Information of China (English)

    Xue-Gang Hu; Pei-Pei Li; Xin-Dong Wu; Gong-Qing Wu

    2007-01-01

    Mining streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have relatively poor efficiency in both time and space due to the characteristics of streaming data. Random decision trees offer advantages in both time and space. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has improved performance in time, space, accuracy and anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
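
    The Hoeffding bound referenced above states that after n observations of a variable with range R, the true mean lies within ε = sqrt(R²·ln(1/δ) / 2n) of the sample mean with probability 1 − δ; a streaming tree therefore splits once the observed gain gap between the two best attributes exceeds ε. A sketch (the parameter values are illustrative, not SRMTDS defaults):

```python
# Hoeffding bound used by streaming decision trees to decide when enough
# examples have been seen to commit to a split.
from math import log, sqrt

def hoeffding_epsilon(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))

def min_examples(value_range, delta, gain_gap):
    """Smallest n whose epsilon drops below the observed gain gap
    between the best and second-best attributes."""
    n = 1
    while hoeffding_epsilon(value_range, delta, n) >= gain_gap:
        n += 1
    return n
```

    For information gain in bits (range 1.0), δ = 1e-7 and a gain gap of 0.1, the bound is satisfied after 806 examples, so the tree need not buffer the whole stream before splitting.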

  9. Production of diagnostic rules from a neurotologic database with decision trees.

    Science.gov (United States)

    Kentala, E; Viikki, K; Pyykkö, I; Juhola, M

    2000-02-01

    A decision tree is an artificial intelligence program that is adaptive and is closely related to a neural network, but can handle missing or nondecisive data in decision-making. Data on patients with Meniere's disease, vestibular schwannoma, traumatic vertigo, sudden deafness, benign paroxysmal positional vertigo, and vestibular neuritis were retrieved from the database of the otoneurologic expert system ONE for the development and testing of the accuracy of decision trees in the diagnostic workup. Decision trees were constructed separately for each disease. The accuracies of the best decision trees were 94%, 95%, 99%, 99%, 100%, and 100% for the respective diseases. The most important questions concerned the presence of vertigo, hearing loss, and tinnitus; duration of vertigo; frequency of vertigo attacks; severity of rotational vertigo; onset and type of hearing loss; and occurrence of head injury in relation to the timing of onset of vertigo. Meniere's disease was the most difficult to classify correctly. The validity and structure of the decision trees are easily comprehended and can be used outside the expert system. PMID:10685569

  10. Using Decision Trees to Characterize Verbal Communication During Change and Stuck Episodes in the Therapeutic Process

    Directory of Open Access Journals (Sweden)

    Víctor Hugo Masías

    2015-04-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBtree, and REPtree) are applied to the problem of classifying verbal responses into change and stuck episodes in the therapeutic process. The data for the problem are derived from a corpus of 8 successful individual therapy sessions with 1,760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm; it delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  11. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    Science.gov (United States)

    Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.

    2016-01-01

    Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45). Conclusions: The resulting algorithm can support clinical decision making and sheds light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607
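
    Recursive partitioning chooses, at each node, the split that makes the child subgroups most homogeneous in the dichotomized outcome. A minimal one-step sketch using the Gini impurity (the ages and outcomes below are invented, not Tirilazad data):

```python
# One CART-style partitioning step: pick the threshold on a single
# predictor that minimises weighted Gini impurity of the two children.
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of favourable outcomes
    return 2 * p * (1 - p)

def best_split(ages, outcomes):
    """Return the age threshold with the lowest weighted child impurity."""
    best = None
    for t in sorted(set(ages)):
        left = [o for a, o in zip(ages, outcomes) if a <= t]
        right = [o for a, o in zip(ages, outcomes) if a > t]
        if not left or not right:
            continue
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(outcomes)
        if best is None or w < best[0]:
            best = (w, t)
    return best[1]

# Hypothetical cohort: younger patients with favourable (1) outcomes.
ages = [35, 40, 45, 50, 65, 70, 75, 80]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
```

    Applying the same step recursively to each child subgroup yields the prognostic subgroups of the decision tree.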

  12. Outsourcing the Portal: Another Branch in the Decision Tree.

    Science.gov (United States)

    McMahon, Tim

    2000-01-01

    Discussion of the management of information resources in organizations focuses on the use of portal technologies to update intranet capabilities. Considers application outsourcing decisions, reviews benefits (including reducing costs) as well as concerns, and describes application service providers (ASPs). (LRW)

  13. Applying decision tree models to SMEs: A statistics-based model for customer relationship management

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-07-01

    Customer Relationship Management (CRM) has become an important part of enterprise decision making and management. In this regard, Decision Tree (DT) models are the most common tools for investigating CRM and providing appropriate support for the implementation of CRM systems. Yet, this method does not by itself yield any estimate of the degree of separation of the different subgroups involved in the analysis. In this research, we build three decision-making models for SMEs using different decision tree methods (C&RT, C4.5, and ID3), and calculate Mean Error (ME) and Variance of Error (VoE) estimates to investigate the predictive power of these methods. The decision tree methods were used to analyze small- and medium-sized enterprise (SME) datasets. The paper proposes a powerful technical support for better directing market trends and mining in CRM. According to the findings, C&RT shows a better degree of separation. As a result, we recommend using decision tree methods together with ME and VoE to determine CRM factors.

  14. Normal form backward induction for decision trees with coherent lower previsions.

    OpenAIRE

    Huntley, Nathan; Troffaes, Matthias C. M.

    2012-01-01

    We examine normal form solutions of decision trees under typical choice functions induced by lower previsions. For large trees, finding such solutions is hard as very many strategies must be considered. In an earlier paper, we extended backward induction to arbitrary choice functions, yielding far more efficient solutions, and we identified simple necessary and sufficient conditions for this to work. In this paper, we show that backward induction works for maximality and E-admissibility, but ...

  15. Diagnosis of Constant Faults in Read-Once Contact Networks over Finite Bases using Decision Trees

    KAUST Repository

    Busbait, Monther I.

    2014-05-01

    We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of a decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. We also obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant fault depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients of the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we created, to obtain the sharp coefficients.

  16. Post-event human decision errors: operator action tree/time reliability correlation

    Energy Technology Data Exchange (ETDEWEB)

    Hall, R E; Fragola, J; Wreathall, J

    1982-11-01

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations.

  17. Post-event human decision errors: operator action tree/time reliability correlation

    International Nuclear Information System (INIS)

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations

  18. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should first classify the mobile users. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as classification attributes for the mobile user, and we classify the context into public context and private context classes. We then analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile user data with the algorithm; we can classify the mobile users into Basic service, E-service, Plus service, and Total service user classes, and we can also derive some rules about the mobile users. Compared with the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389

  19. Effective Network Intrusion Detection using Classifiers Decision Trees and Decision rules

    Directory of Open Access Journals (Sweden)

    G.MeeraGandhi

    2010-11-01

    In the era of the information society, computer networks and their related applications are emerging technologies. Network intrusion detection aims at distinguishing the behavior of the network. As network attacks have increased in huge numbers over the past few years, the Intrusion Detection System (IDS) is increasingly becoming a critical component in securing the network. Owing to the large volumes of security audit data in a network, in addition to the intricate and dynamic properties of intrusion behavior, optimizing IDS performance becomes an important open problem that receives more and more attention from the research community. Machine learning attempts to characterize how such distinctions can be learned by designing, implementing, running, and analyzing algorithms. Learning always occurs in the context of some performance task, and a learning method should be coupled with a performance element that uses the knowledge acquired during learning. In this research, machine learning is investigated as a technique for making the selection, using training data and their outcomes. In this paper, we evaluate the performance of a set of classifier algorithms based on rules (JRip, Decision Table, PART, and OneR) and trees (J48, RandomForest, REPTree, NBTree). Based on the evaluation results, the best algorithms for each attack category are chosen and two classifier algorithm selection models are proposed. The empirical simulation results show noticeable performance improvements. The classification models were trained using data collected from the Knowledge Discovery in Databases (KDD) datasets for intrusion detection. The trained models were then used for predicting the risk of attacks in a web server environment by a network administrator or security expert.

  20. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models

    Directory of Open Access Journals (Sweden)

    Hu Yun-tao

    2009-09-01

    Background: In recent years, the artificial neural network has been advocated for modeling complex multivariable relationships due to its fault tolerance, while the decision tree, a data mining technique, has been recommended for its rich classification rules and visual appeal. The aim of our research was to compare the performance of artificial neural network (ANN) and decision tree models in predicting hospital charges for gastric cancer patients. Methods: Data about hospital charges for 1008 gastric cancer patients and related demographic information were collected from the First Affiliated Hospital of Anhui Medical University from 2005 to 2007 and preprocessed to select pertinent input variables. ANN and decision tree models, using the same hospital charge output variable and the same input variables, were then applied to compare predictive ability in terms of mean absolute errors and linear correlation coefficients for the training and test datasets. The transfer function in the ANN model was sigmoid, with one hidden layer of three nodes. Results: After preprocessing, 12 variables were selected and used as input variables in both types of models. For both the training and test datasets, the mean absolute errors of the ANN model were lower than those of the decision tree model (1819.197 vs. 2782.423, and 1162.279 vs. 3424.608), and the linear correlation coefficients of the former were higher than those of the latter (0.955 vs. 0.866, and 0.987 vs. 0.806). The predictive ability and adaptive capacity of the ANN model were better than those of the decision tree model. Conclusion: The ANN model performed better than the decision tree model in predicting hospital charges for gastric cancer patients in China.
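
    The two figures of merit used in the comparison, mean absolute error and the linear (Pearson) correlation coefficient, can be computed as follows (a generic sketch, not the paper's code):

```python
# Mean absolute error and Pearson correlation between predictions and
# observed charges, in pure Python.
from math import sqrt

def mae(pred, actual):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

    A lower MAE and a correlation coefficient closer to 1 both indicate that the model's predicted charges track the observed charges more closely.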

  1. Decision tree based knowledge acquisition and failure diagnosis using a PWR loop vibration model

    International Nuclear Information System (INIS)

    An analytical vibration model of the primary system of a 1300 MW PWR was used for simulating mechanical faults. Deviations in the calculated power density spectra and coherence functions are determined and classified. The decision tree technique is then used for a personal computer supported knowledge presentation and for optimizing the logical relationships between the simulated faults and the observed symptoms. The optimized decision tree forms the knowledge base and can be used to diagnose known cases as well as to include new data into the knowledge base if new faults occur. (author)

  2. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    OpenAIRE

    Oral, L. O.; V. Tecim

    2013-01-01

    Decision makers develop transportation plans and models to provide sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover the factors affecting mode choice, and these techniques can be applied with a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, considering individual trip behaviours from h...

  3. Assisting Sustainable Forest Management and Forest Policy Planning with the Sim4Tree Decision Support System

    OpenAIRE

    Floris Dalemans; Paul Jacxsens; Jos Van Orshoven; Vincent Kint; Pieter Moonen; Bart Muys

    2015-01-01

    As European forest policy increasingly focuses on multiple ecosystem services and participatory decision making, forest managers and policy planners have a need for integrated, user-friendly, broad spectrum decision support systems (DSS) that address risks and uncertainties, such as climate change, in a robust way and that provide credible advice in a transparent manner, enabling effective stakeholder involvement. The Sim4Tree DSS has been accordingly developed as a user-oriented, modular and...

  4. Fuzzy decision trees for planning and autonomous control of a coordinated team of UAVs

    Science.gov (United States)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A fuzzy logic resource manager that enables a collection of unmanned aerial vehicles (UAVs) to automatically cooperate to make meteorological measurements will be discussed. Once in flight no human intervention is required. Planning and real-time control algorithms determine the optimal trajectory and points each UAV will sample, while taking into account the UAVs' risk, risk tolerance, reliability, mission priority, fuel limitations, mission cost, and related uncertainties. The control algorithm permits newly obtained information about weather and other events to be introduced to allow the UAVs to be more effective. The approach is illustrated by a discussion of the fuzzy decision tree for UAV path assignment and related simulation. The different fuzzy membership functions on the tree are described in mathematical detail. The different methods by which this tree is obtained are summarized including a method based on using a genetic program as a data mining function. A second fuzzy decision tree that allows the UAVs to automatically collaborate without human intervention is discussed. This tree permits three different types of collaborative behavior between the UAVs. Simulations illustrating how the tree allows the different types of collaboration to be automated are provided. Simulations also show the ability of the control algorithm to allow UAVs to effectively cooperate to increase the UAV team's likelihood of success.
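
    At each node of a fuzzy decision tree, a crisp input is graded by a membership function rather than a hard threshold. A minimal sketch of the triangular membership functions such a tree might use (the shape and the "risk tolerance" parameters are illustrative assumptions, not taken from the paper):

```python
# Triangular fuzzy membership function: 0 outside [a, c], rising linearly
# to 1 at b, then falling linearly back to 0.
def triangular(a, b, c):
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical grade of membership in "low risk tolerance" for a UAV.
low_risk = triangular(0.0, 0.2, 0.5)
```

    Because membership is graded, a UAV with a risk tolerance of 0.35 is partially "low risk", letting the tree blend the recommendations of several branches instead of committing to one.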

  5. A Data Mining Algorithm Based on Distributed Decision-Tree in Grid Computing Environments

    Institute of Scientific and Technical Information of China (English)

    Zhongda Lin; Yanfeng Hong; Kun Deng

    2006-01-01

    Recently, research on distributed data mining using grids has been a growing trend. This paper introduces a data mining algorithm based on distributed decision trees, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification on the grid.

  6. Comparison of Attribute Reduction Methods for Coronary Heart Disease Data by Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    ZHENG Gang; HUANG Yalou; WANG Pengtao; SHU Guangfu

    2005-01-01

    Attribute reduction is necessary in decision-making systems, and selecting the right attribute reduction method is even more important. This paper studies the reduction effects of principal components analysis (PCA) and system reconstruction analysis (SRA) on coronary heart disease data. The data set contains 1723 records, with 71 attributes in each record. PCA and SRA are used to reduce the number of attributes (to fewer than 71) in the data set. Decision tree algorithms, C4.5, classification and regression tree (CART), and chi-square automatic interaction detector (CHAID), are then adopted to analyze the raw data and the attribute-reduced data. The parameters of the decision tree algorithms, including internal node number, maximum tree depth, number of leaves, and correction rate, are analyzed. The results indicate that PCA and SRA can both accomplish attribute reduction, and the decision-making rate on the reduced data is quicker than on the raw data; the reduction effect of PCA is better than that of SRA, while the attribute assertion of SRA is better than that of PCA. Both methods exhibit good performance in selecting and reducing attributes.

  7. Intrusion Detection System using Memtic Algorithm Supporting with Genetic and Decision Tree Algorithms

    OpenAIRE

    K. P. Kaliyamurthie; D. Parameswari; R.M.Suresh

    2012-01-01

    This paper proposes a technique combining Decision Tree (DT), Genetic Algorithm (GA), the DT-GA hybrid, and a Memetic algorithm to find more accurate models for fitting the behavior of a network intrusion detection system. We simulate this integrated algorithm, and the results obtained are encouraging for further work.

  8. Relationships between average depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2014-02-14

    This paper presents a new tool for the study of relationships between the total path length (or average depth) and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [9] and datasets representing Boolean functions with 10 variables.

  9. Relationships Between Average Depth and Number of Nodes for Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-07-24

    This paper presents a new tool for the study of relationships between the total path length (or average depth) and the number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [1]. © Springer-Verlag Berlin Heidelberg 2014.

  10. The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in Construction of Decision Tree Models for Credit Scoring

    Directory of Open Access Journals (Sweden)

    Mohammad Khanbabaei

    2013-11-01

    Decision tree modelling, as a data mining technique, is used for credit scoring of bank customers. The main problem is the construction of decision trees that can classify customers optimally. This study presents a new hybrid mining approach for the design of an effective and appropriate credit scoring model, based on a genetic algorithm for credit scoring of bank customers in order to offer credit facilities to each class of customers. A genetic algorithm can help banks in credit scoring by selecting appropriate features and building optimal decision trees. The proposed hybrid classification model is established on a combination of clustering, feature selection, decision tree, and genetic algorithm techniques. Clustering and feature selection are used to pre-process the input samples before constructing the decision trees in the credit scoring model. The proposed hybrid model chooses and combines the best decision trees based on optimality criteria and constructs the final decision tree for credit scoring of customers. Using one credit dataset, results confirm that the classification accuracy of the proposed hybrid classification model exceeds that of almost all the classification models compared in this paper. Furthermore, the number of leaves and the size of the constructed decision tree (i.e., its complexity) are smaller than those of other decision tree models. One financial dataset, the Bank Mellat credit dataset, was chosen for the experiments.

  11. An ordering heuristic for building Binary Decision Diagrams for fault-trees

    Energy Technology Data Exchange (ETDEWEB)

    Bouissou, M. [Electricite de France (EDF), 75 - Paris (France)

    1997-12-31

    Binary Decision Diagrams (BDD) have recently made a noticeable entry into the RAMS field. This kind of representation for boolean functions makes possible the assessment of complex fault-trees, both qualitatively (minimal cut-set search) and quantitatively (exact calculation of the top event probability). The object of the paper is to present a pre-processing of the fault-tree which ensures that the results given by different heuristics on the 'optimized' fault-tree are not too sensitive to the way the tree is written. This property is based on a theoretical proof. In contrast with some well-known heuristics, the proposed method is not based only on intuition and practical experiments. (author) 12 refs.
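
    A BDD evaluates the top-event probability by Shannon decomposition: condition on one basic event at a time, in a fixed variable order, and weight the two branches by that event's failure probability. A generic sketch of that computation (the fault tree, ordering, and probabilities below are illustrative assumptions, not the paper's heuristic):

```python
# Exact top-event probability of a fault tree by Shannon expansion over a
# given variable ordering -- the computation a BDD performs node by node.
def top_event_probability(formula, probs, order):
    """formula: predicate on a dict of event states; probs: event -> P(failure)."""
    def recurse(i, assigned):
        if i == len(order):
            return 1.0 if formula(assigned) else 0.0
        e = order[i]
        hi = recurse(i + 1, {**assigned, e: True})   # event e fails
        lo = recurse(i + 1, {**assigned, e: False})  # event e works
        return probs[e] * hi + (1.0 - probs[e]) * lo
    return recurse(0, {})

# Toy fault tree: TOP = A OR (B AND C), with assumed failure probabilities.
probs = {"A": 0.1, "B": 0.2, "C": 0.5}
top = top_event_probability(lambda v: v["A"] or (v["B"] and v["C"]),
                            probs, ("A", "B", "C"))
```

    This naive recursion enumerates all assignments; a BDD merges isomorphic subtrees so the cost depends on the diagram size, which is exactly why the variable ordering chosen by the heuristic matters.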

  12. An ordering heuristic for building Binary Decision Diagrams for fault-trees

    International Nuclear Information System (INIS)

    Binary Decision Diagrams (BDD) have recently made a noticeable entry in the RAMS field. This kind of representation for boolean functions makes possible the assessment of complex fault-trees, both qualitatively (minimal cut-sets search) and quantitatively (exact calculation of top event probability). The object of the paper is to present a pre-processing of the fault-tree which ensures that the results given by different heuristics on the 'optimized' fault-tree are not too sensitive to the way the tree is written. This property is based on a theoretical proof. In contrast with some well known heuristics, the method proposed is not based only on intuition and practical experiments. (author)

  13. The Legacy of Past Tree Planting Decisions for a City Confronting Emerald Ash Borer (Agrilus planipennis) Invasion

    OpenAIRE

    Greene, Christopher S.; Millward, Andrew A.

    2016-01-01

    Management decisions grounded in ecological understanding are essential to the maintenance of a healthy urban forest. Decisions about where and what tree species to plant have both short and long-term consequences for the future function and resilience of city trees. Through the construction of a theoretical damage index, this study examines the legacy effects of a street tree planting program in a densely populated North American city confronting an invasion of emerald ash borer (Agrilus pla...

  14. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    Science.gov (United States)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy with the help of improving technology, it is rather challenging to perform accurate and rapid identification given the large quantity of minerals in a rock sample, with dimensions ranging from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument, etc.) and of the incident electron beam (accelerating voltage, beam current, spot size, etc.) control the produced characteristic X-rays, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions, thereby producing a classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated, and a preliminary set of 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, has been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  15. Analisis Dan Perancangan Sistem Pendukung Keputusan Untuk Menghindari Kredit Macet (Non Performing Loan) Perbankan Menggunakan Algoritma Decision Tree

    OpenAIRE

    Sinuhaji, Andika Rafon

    2010-01-01

    A decision-making model is needed to help people make accurate, efficient, and effective decisions; such a model is called a decision support system. The aim of a decision support system is to utilize the advantages of humans and electronic instruments for solving various unstructured problems. The objective of this study is to avoid non-performing loans in the process of granting credit facilities. The study makes its decisions using the decision tree method. The solution method consists of...

  16. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    CERN Document Server

    Rajendran, P

    2010-01-01

    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are: pre-processing, feature extraction, association rule mining, and hybrid classification. The pre-processing step is performed using median filtering, and edge features are extracted using the Canny edge detection technique. Two image mining approaches combined in a hybrid manner are proposed in this paper. The frequent patterns from the CT scan images are generated by the frequent pattern tree (FP-Tree) algorithm, which mines the association rules. The decision tree method is used to classify the medical images for diagnosis. This system makes the classification process more accurate, and the hybrid method improves the efficiency of the proposed method over traditional image mining methods. The experimental results on a prediagnosed database of brain images showed 97% sensitivity and 95% accuracy. The ph...

  17. Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?

    Directory of Open Access Journals (Sweden)

    Robert Haworth

    2010-01-01

    Full Text Available The popularity and usage of digital games has increased in recent years, bringing further attention to their design. Some digital games require a significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.

  18. Decision tree approach for classification of remotely sensed satellite data using open source support

    Indian Academy of Sciences (India)

    Richa Sharma; Aniruddha Ghosh; P K Joshi

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with the image classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. The classification result based on the DTC method provided a better visual depiction than the results produced by the ISODATA clustering or MLC algorithms. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood and 57.5% (kappa = 0.49) using the ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be preferable to the other classification approaches.
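
    The overall accuracy and kappa statistics reported above can be reproduced from a confusion matrix. A minimal sketch (the matrix values below are illustrative, not the study's data):

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(k)) / total  # observed agreement
    pe = sum((sum(cm[i]) / total) * (sum(r[i] for r in cm) / total)
             for i in range(k))                   # agreement expected by chance
    return po, (po - pe) / (1 - pe)

# Hypothetical two-class confusion matrix
acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])  # acc = 0.85, kappa = 0.7
```

    Kappa discounts the agreement a random classifier would achieve from the class marginals, which is why it is reported alongside raw accuracy.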

  19. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  20. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    Science.gov (United States)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover the factors affecting mode choice, and these techniques can be applied within a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, by considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.

  1. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    Directory of Open Access Journals (Sweden)

    L. O. Oral

    2013-05-01

    Full Text Available Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover the factors affecting mode choice, and these techniques can be applied within a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, by considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.
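
    The core of such a decision tree analysis is a greedy search for the attribute split that best separates the observed modes. A minimal sketch of a single Gini-impurity split on one numeric attribute (the trip distances and modes below are invented for illustration, not survey data):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Choose the threshold on one numeric feature that minimizes the
    weighted Gini impurity of the two resulting child nodes."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best

# Invented trip distances (km) and chosen modes
threshold, impurity = best_split([1, 2, 3, 10, 12, 15],
                                 ["walk", "walk", "walk", "car", "car", "car"])
```

    A full tree applies this split recursively to each child node; the attributes chosen near the root are the "factors affecting mode choice" the study looks for.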

  2. Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems

    International Nuclear Information System (INIS)

    In this thesis, I studied sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made in the accuracy of the model in order to apply state of the art methods. I investigated the applicability of Monte Carlo Tree Search methods to this problem, and to other single-player, stochastic and continuous sequential decision making problems. In doing so, I obtained a consistent and anytime algorithm that can easily be combined with existing strong heuristic solvers. (author)

  3. Hyper-Graph Based Documents Categorization on Knowledge from Decision Trees

    Directory of Open Access Journals (Sweden)

    Merjulah Roby

    2012-03-01

    Full Text Available This paper devises a novel representation that compactly captures a hyper-graph partitioning and clustering of documents based on their weights. The approach we take integrates data mining and decision making to improve effectiveness; we also present NeC4.5 decision trees. The algorithm creates clusters and sub-clusters according to the user query, forming the sub-clusters in the database. Since some of the data in the database may be more useful than the rest, the data are clustered according to this ability.

  4. Quantifying human and organizational factors in accident management using decision trees: the HORAAM method

    International Nuclear Information System (INIS)

    In the framework of the level 2 Probabilistic Safety Study (PSA 2) project, the Institute for Nuclear Safety and Protection (IPSN) has developed a method for taking into account Human and Organizational Reliability Aspects during accident management. Actions are taken during very degraded installation operations by teams of experts in the French framework of Crisis Organization (ONC). After describing the background of the framework of the Level 2 PSA, the French specific Crisis Organization and the characteristics of human actions in the Accident Progression Event Tree, this paper describes the method developed to introduce in PSA the Human and Organizational Reliability Analysis in Accident Management (HORAAM). This method is based on the Decision Tree method and has gone through a number of steps in its development. The first one was the observation of crisis center exercises, in order to identify the main influence factors (IFs) which affect human and organizational reliability. These IFs were used as headings in the Decision Tree method. Expert judgment was used in order to verify the IFs, to rank them, and to estimate the value of the aggregated factors to simplify the quantification of the tree. A tool based on Mathematica was developed to increase the flexibility and the efficiency of the study.

  5. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    International Nuclear Information System (INIS)

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. 
As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar

  6. Re-mining association mining results through visualization, data envelopment analysis, and decision trees

    OpenAIRE

    Ertek, Gürdal; Ertek, Gurdal; Tunç, Murat Mustafa; Tunc, Murat Mustafa

    2012-01-01

    Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding ...

  7. An Examination of Mathematically Gifted Students' Learning Styles by Decision Trees

    OpenAIRE

    Esra Aksoy; Serkan Narlı

    2015-01-01

    The aim of this study was to examine mathematically gifted students' learning styles through a data mining method. A ‘Learning Style Inventory’ and a ‘Multiple Intelligences Scale’ were used to collect data. The sample included 234 mathematically gifted middle school students. The constructed decision tree was examined for predicting mathematically gifted students’ learning styles according to their multiple intelligences, gender, and grade level. Results showed that all t...

  8. Comparison of the Bayesian and Randomised Decision Tree Ensembles within an Uncertainty Envelope Technique

    OpenAIRE

    Schetinin, Vitaly; Fieldsend, Jonathan E.; Partridge, Derek; Krzanowski, Wojtek J.; Everson, Richard M.; Bailey, Trevor C; Hernandez, Adolfo

    2005-01-01

    Multiple Classifier Systems (MCSs) allow evaluation of the uncertainty of classification outcomes that is of crucial importance for safety critical applications. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the classifier diversity and the required performance. The interpretability of MCSs can also give useful information for experts responsible for making reliable classifications. For this reason Decision Trees (DTs) seem t...

  9. Independent Component Analysis and Decision Trees for ECG Holter Recording De-Noising

    OpenAIRE

    Jakub Kuzilek; Vaclav Kremen; Filip Soucek; Lenka Lhotska

    2014-01-01

    We have developed a method for ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. In order to test the efficiency of this method, a wavelet-based de-noising method was used for comparison with standard filtering. Freely available data from the PhysioNet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between origin...

  10. A Fuzzy Decision Tree to Estimate Development Effort for Web Applications

    OpenAIRE

    Ali Idri; Sanaa Elyassami

    2011-01-01

    Web Effort Estimation is a process of predicting the efforts and cost in terms of money, schedule and staff for any software project system. Many estimation models have been proposed over the last three decades and it is believed that it is a must for the purpose of: Budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of Fuzzy ID3 decision tree for software cost estimation, it is designed by integrating the...

  11. A DECISION TREE-BASED CLASSIFICATION APPROACH TO RULE EXTRACTION FOR SECURITY ANALYSIS

    OpenAIRE

    Ren, N.; M. ZARGHAM; Rahimi, S.

    2006-01-01

    Stock selection rules are extensively utilized as guidelines to construct high performance stock portfolios. However, the predictive performance of the rules developed by some economic experts in the past has decreased dramatically for the current stock market. In this paper, the C4.5 decision tree classification method was adopted to construct a model for stock prediction based on fundamental stock data, from which a set of stock selection rules was derived. The experimental results showe...

  12. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    OpenAIRE

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe; Bak, Claus Leth; Thøgersen, Paul

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, phasing out of central power plants and cross border power exchange on dynamic security of the Danish Power System. A contingency based decision tree (DT) approach is used to assess the dynamic security of the present and future Danish Power System. Results from offline time domain simulation for a large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then ...

  13. A Decision Tree of Bigrams is an Accurate Predictor of Word Sense

    OpenAIRE

    Pedersen, Ted

    2001-01-01

    This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.
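
    In its simplest form, such a classifier reduces to a single test on the most informative bigram near the ambiguous word, with majority-sense leaves. A toy sketch (the corpus below is invented; the actual approach grows a full decision tree over many bigram features):

```python
from collections import Counter

def bigrams(tokens):
    """Set of adjacent word pairs in a tokenized context."""
    return set(zip(tokens, tokens[1:]))

def train_stump(contexts, senses):
    """One-level 'decision tree of bigrams': pick the single bigram whose
    presence best predicts the sense, with majority-sense leaves."""
    feats = set().union(*(bigrams(c) for c in contexts))
    default = Counter(senses).most_common(1)[0][0]
    best = (None, -1.0, default, default)
    for f in feats:
        present = [s for c, s in zip(contexts, senses) if f in bigrams(c)]
        absent = [s for c, s in zip(contexts, senses) if f not in bigrams(c)]
        yes = Counter(present).most_common(1)[0][0] if present else default
        no = Counter(absent).most_common(1)[0][0] if absent else default
        acc = (sum(s == yes for s in present)
               + sum(s == no for s in absent)) / len(senses)
        if acc > best[1]:
            best = (f, acc, yes, no)
    return best  # (bigram, training accuracy, leaf if present, leaf if absent)
```

    A recursive learner such as C4.5 repeats this feature selection inside each leaf until the senses are pure or no informative bigram remains.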

  14. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    Directory of Open Access Journals (Sweden)

    Yang Youjin

    2016-01-01

    Full Text Available Zika virus is spread by mosquitoes, and infection carries a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. Therefore, the Apriori algorithm and a decision tree were used to compare polyprotein sequences of Zika virus with those of other flaviviruses: yellow fever, West Nile virus, dengue virus, and tick-borne encephalitis. In this way, their similarities and dissimilarities were found.

  15. Scenario Analysis, Decision Trees and Simulation for Cost Benefit Analysis of the Cargo Screening Process

    OpenAIRE

    Sherman, Galina; Siebers, Peer-Olaf; Aickelin, Uwe; Menachof, David

    2013-01-01

    In this paper we present our ideas for conducting a cost benefit analysis by using three different methods: scenario analysis, decision trees and simulation. Then we introduce our case study and examine these methods in a real world situation. We show how these tools can be used and what the results are for each of them. Our aim is to conduct a comparison of these different probabilistic methods of estimating costs for port security risk assessment studies. Methodologically, we are trying ...

  16. A Decision Tree Approach to Classify Web Services using Quality Parameters

    OpenAIRE

    Sonawani, Shilpa; Mukhopadhyay, Debajyoti

    2013-01-01

    With the increase in the number of web services, many web services are available on the internet providing the same functionality, making it difficult to choose the best one that fulfills all of a user's requirements. This problem can be solved by considering the quality of web services to distinguish functionally similar web services. Nine different quality parameters are considered. Web services can be classified and ranked using the decision tree approach, since they do not require a long training period and...

  17. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    OpenAIRE

    Tran Hoai Linh; Pham Van Nam; Vuong Hoang Nam

    2014-01-01

    The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, the three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to...

  18. Flood-type classification in mountainous catchments using crisp and fuzzy decision trees

    Science.gov (United States)

    Sikorska, Anna E.; Viviroli, Daniel; Seibert, Jan

    2015-10-01

    Floods are governed by largely varying processes and thus exhibit various behaviors. Classification of flood events into flood types and the determination of their respective frequency is therefore important for a better understanding and prediction of floods. This study presents a flood classification for identifying flood patterns at a catchment scale by means of a fuzzy decision tree. Here, events are represented as a spectrum of six main possible flood types, each attributed with its degree of acceptance. Considered types are flash, short rainfall, long rainfall, snow-melt, rainfall on snow and, in high alpine catchments, glacier-melt floods. The fuzzy decision tree also makes it possible to acknowledge the uncertainty present in the identification of flood processes and thus allows for more reliable flood class estimates than using a crisp decision tree, which identifies one flood type per event. Based on the data set from nine Swiss mountainous catchments, it was demonstrated that this approach is less sensitive to uncertainties in the classification attributes than the classical crisp approach. These results show that the fuzzy approach bears additional potential for analyses of flood patterns at a catchment scale and thereby provides a more realistic representation of flood processes.
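
    The fuzzy tree's degrees of acceptance are typically produced by membership functions on the classification attributes. A minimal sketch of the idea (the attribute, flood types, and breakpoints below are invented for illustration; the study's actual attributes and membership shapes may differ):

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def flood_type_degrees(rain_duration_h):
    """Degree of acceptance of two (hypothetical) flood types,
    judged from rainfall duration in hours."""
    return {
        "flash": triangular(rain_duration_h, 0, 1, 4),
        "short rainfall": triangular(rain_duration_h, 2, 6, 12),
    }

# A 3-hour event belongs partially to both types, rather than being
# forced into exactly one as in a crisp tree.
degrees = flood_type_degrees(3)
```

    A crisp tree would put the 3-hour event entirely into one class; the fuzzy spectrum keeps the ambiguity visible, which is the robustness the abstract describes.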

  19. A Fuzzy Decision Tree to Estimate Development Effort for Web Applications

    Directory of Open Access Journals (Sweden)

    Ali Idri

    2011-09-01

    Full Text Available Web Effort Estimation is the process of predicting the effort and cost, in terms of money, schedule, and staff, for any software project. Many estimation models have been proposed over the last three decades, and estimation is believed to be a must for the purposes of budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation. It is designed by integrating the principles of the ID3 decision tree and fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing software projects, which can greatly improve the accuracy of the obtained estimates. MMRE and Pred are used as measures of prediction accuracy for this study. A series of experiments is reported using the Tukutuku software projects dataset. The results are compared with those produced by three crisp versions of decision trees: ID3, C4.5 and CART.
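
    The two accuracy measures used in this record are easy to state precisely: MMRE averages the magnitude of relative error, and Pred(l) counts the fraction of estimates within a relative-error level l (commonly 0.25). A minimal sketch (the effort values are illustrative, not from the Tukutuku dataset):

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error over all projects."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def pred(actual, predicted, level=0.25):
    """Pred(l): fraction of estimates whose relative error is within `level`."""
    return sum(abs(a - p) / a <= level
               for a, p in zip(actual, predicted)) / len(actual)

# Illustrative actual vs. estimated effort (person-hours)
actual, estimated = [100, 200, 400], [110, 150, 400]
print(mmre(actual, estimated))   # about 0.117
print(pred(actual, estimated))   # 1.0 (all estimates within 25%)
```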

  20. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    Science.gov (United States)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.

  1. Imitation learning of car driving skills with decision trees and random forests

    Directory of Open Access Journals (Sweden)

    Cichosz Paweł

    2014-09-01

    Full Text Available Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human-readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.

  2. Teratozoospermia Classification Based on the Shape of Sperm Head Using OTSU Threshold and Decision Tree

    Directory of Open Access Journals (Sweden)

    Masdiyasa I Gede Susrama

    2016-01-01

    Full Text Available Teratozoospermia is one possible result of expert analysis of male infertility, performed through microscopic lab tests that determine the morphology of spermatozoa, including the normal or abnormal form of the spermatozoon head. The laboratory test results take the form of a complete image of spermatozoa. In this study, the shapes of spermatozoon heads were taken from a WHO standards book. The pictures taken had fairly clear imaging but still contained noise; thus, to differentiate between normal and abnormal spermatozoon heads, several processes need to be performed: a pre-processing or image-adjusting step, a threshold segmentation process using the Otsu threshold method, and a classification process using a decision tree. Training and test data are presented in stages, from 5 to 20 samples. Test results using Otsu segmentation and a decision tree produced different error rates at each level of training data, which were 70%, 75%, and 80% for training data of size 5×2, 10×2, and 20×2, respectively, with an average error of 75%. Thus, this study shows that Otsu threshold segmentation and a decision tree can classify the form of the spermatozoon head as abnormal or normal.
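
    The Otsu step chooses a grayscale threshold automatically by maximizing the between-class variance of the image histogram. A self-contained sketch on a list of 8-bit pixel values (independent of any particular imaging library):

```python
def otsu_threshold(pixels):
    """Otsu's method: return the threshold t (1..255) that maximizes the
    between-class variance of the 8-bit grayscale histogram."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    n = len(pixels)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = sum(hist[:t])          # pixels below the threshold
        w1 = n - w0                 # pixels at or above it
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(i * hist[i] for i in range(t)) / w0
        mu1 = sum(i * hist[i] for i in range(t, 256)) / w1
        var = (w0 / n) * (w1 / n) * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# A bimodal image (foreground vs. background) separates between its modes
t = otsu_threshold([50] * 100 + [200] * 100)
```

    Shape features of the segmented head region would then feed the decision tree classifier.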

  3. A decision tree – based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Directory of Open Access Journals (Sweden)

    Loukis Euripides N

    2004-06-01

    Full Text Available Abstract Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex, and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore, the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation, and junior clinicians are not adequately trained in this field. Therefore, efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnoses. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals, including 41 heart sound signals with a "clear" AS systolic murmur and 43 with a "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next, a total of 100 features were determined for every heart sound signal and their relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and pruned decision tree classifiers was studied based on various training and test datasets. In order to build a generalized decision support system for heart sound diagnosis, we divided the problem into sub-problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that

  4. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variables of emission and 15 variables of transmission information from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and by radiological residents. Diagnostic rules were demonstrated in tree topology, and diagnostic performances were compared using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. Results A classification decision tree with a lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the Target/Normal-region ratio of 99mTc-MIBI uptake in the delayed stage and in the early stage, age, cough and specula sign were the five most important contributors. The sensitivity and specificity were 93.33% and 78.57%, respectively, a little higher than those of the expert. The sensitivity and specificity achieved by Grade-one residents were 76.67% and 28.57%, respectively. The AUC of CART and of the expert was 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of the residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204), but greater than that of the residents (P<0.001). Conclusion Our data mining technique using a classification decision tree has a much higher accuracy than residents. It suggests that the application of this algorithm will significantly improve the diagnostic performance of residents.
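
    The AUC values compared above can be computed directly as the probability that a randomly chosen positive (malignant) case receives a higher score than a randomly chosen negative (benign) one, i.e. the normalized Mann-Whitney statistic. A sketch with illustrative labels and scores (not the study's data):

```python
def roc_auc(labels, scores):
    """AUC = P(score of a positive > score of a negative), ties counting half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels (1 = malignant, 0 = benign) and classifier scores
auc = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])  # 1.0: perfect ranking
```

    This rank-based formulation is equivalent to integrating the ROC curve and is how single-number comparisons such as 0.886 vs. 0.566 are obtained.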

  5. Integrating individual trip planning in energy efficiency – Building decision tree models for Danish fisheries

    DEFF Research Database (Denmark)

    Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard;

    2013-01-01

    integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy...... hypothetical conditions influencing their trip decisions, covering the duration of fishing time, choice of fishing ground(s), when to stop fishing and return to port, and the choice of the port for landing. Fleet-based energy and economy efficiency are linked to the decision (choice) dynamics. Larger fuel...... efficiency for the value of catch per unit of fuel consumed is analysed by merging the questionnaire, logbook and VMS (vessel monitoring system) information. Logic decision trees and conditional behaviour probabilities are established from the responses of fishermen regarding a range of sequential...

  6. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients

    OpenAIRE

    Freitas, Alex. A.; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Background Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug’s distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been...

  7. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    Directory of Open Access Journals (Sweden)

    Christensen Helen

    2009-11-01

    Full Text Available Abstract Background Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
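The model comparison described above, a decision tree against a logistic regression with identical predictors judged by ROC curves, can be sketched with scikit-learn. The dataset, tree depth and solver settings below are illustrative placeholders, not those of the PATH through Life study:

```python
# Hypothetical sketch: compare a decision tree with logistic regression
# by ROC AUC on synthetic binary-outcome data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cohort data (10 risk-factor predictors).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Compare discriminative power via area under the ROC curve.
auc_tree = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
auc_logit = roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1])
print(f"tree AUC:  {auc_tree:.3f}")
print(f"logit AUC: {auc_logit:.3f}")
```

Which model wins depends on the data; the study's finding that the tree had better sensitivity and specificity is an empirical result, not a general rule.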

  8. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    Science.gov (United States)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the different data mining methods tested, Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Based Reasoning (CBR) and Decision Tree (DT), the accuracy and consistency of the Decision Tree were the best. According to tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  9. Effective use of Fibro Test to generate decision trees in hepatitis C

    Institute of Scientific and Technical Information of China (English)

    Dana Lau-Corona; Luís Alberto Pineda; Héctor Hugo Avilés; Gabriela Gutiérrez-Reyes; Blanca Eugenia Farfan-Labonne; Rafael Núñez-Nateras; Alan Bonder; Rosalinda Martínez-García; Clara Corona-Lau; Marco Antonio Olivera-Martínez; Maria Concepción Gutiérrez-Ruiz; Guillermo Robles-Díaz; David Kershenobich

    2009-01-01

    AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4), and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees, which was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.

  10. Normal form backward induction for decision trees with coherent lower previsions

    CERN Document Server

    Huntley, Nathan

    2011-01-01

    We examine normal form solutions of decision trees under typical choice functions induced by lower previsions. For large trees, finding such solutions is hard as very many strategies must be considered. In an earlier paper, we extended backward induction to arbitrary choice functions, yielding far more efficient solutions, and we identified simple necessary and sufficient conditions for this to work. In this paper, we show that backward induction works for maximality and E-admissibility, but not for interval dominance and Gamma-maximin. We also show that, in some situations, a computationally cheap approximation of a choice function can be used, even if the approximation violates the conditions for backward induction; for instance, interval dominance with backward induction will yield at least all maximal normal form solutions.

  11. Antibiogram-Derived Radial Decision Trees: An Innovative Approach to Susceptibility Data Display

    Directory of Open Access Journals (Sweden)

    Rocco J. Perla

    2005-01-01

    Full Text Available Hospital antibiograms (ABGMs) are often presented in the form of large 2-factor (single organism vs. single antimicrobial) tables. Presenting susceptibility data in this fashion, although of value, does have limitations relative to drug-resistant subpopulations. As the crisis of antimicrobial drug resistance continues to escalate globally, clinicians need (1) to have access to susceptibility data that, for isolates resistant to first-line drugs, indicate susceptibility to second-line drugs, and (2) to understand the probabilities of encountering such organisms in a particular institution. This article describes a strategy used to transform data in a hospital ABGM into a probability-based radial decision tree (RDT) that can be used as a guide to empiric antimicrobial therapy. Presenting ABGM data in the form of a radial decision tree rather than a table makes it easier to visually organize complex data and to demonstrate different levels of therapeutic decision-making. The RDT model discussed here may also serve as a more effective tool for understanding the prevalence of different resistant subpopulations in a given institution compared to the traditional ABGM.

  12. Performance Evaluation of Discriminant Analysis and Decision Tree, for Weed Classification of Potato Fields

    Directory of Open Access Journals (Sweden)

    Farshad Vesali

    2012-09-01

    Full Text Available In the present study we tried to recognize weeds in potato fields for effective use of herbicides. Potato is one of the crops cultivated most widely all over the world; it is a major world food crop consumed by over one billion people, but it is threatened by weed invasion because of the row cropping system applied in potato tillage. Machine vision is used in this research for effective application of herbicides in the field. About 300 color images from 3 potato farms of Qorveh city and 2 farms of Urmia University, Iran, were acquired. Images were acquired in different illumination conditions, from morning to evening, on sunny and cloudy days. Because of overlap and shading of plants in farm conditions it is hard to use morphologic parameters. In the method used for classifying weeds and potato plants, primary color components of each plant were extracted and the relation between them was estimated for determining a discriminant function and classifying plants using discriminant analysis. In addition, the decision tree method was used to compare results with discriminant analysis. Three different classifications were applied. First, classification was applied to discriminate potato plants from all other weeds (two groups); the rate of correct classification was 76.67% for discriminant analysis and 83.82% for the decision tree. Second, classification was applied to discriminate potato plants from separate groups of each weed (6 groups); the rate of correct classification was 87%. Third, potato plants were classified versus weed species one by one; as the weeds were different, the results of classification differed in this composition. The decision tree showed better results than discriminant analysis in all conditions.

  13. COMPARING THE PERFORMANCE OF SEMANTIC IMAGE RETRIEVAL USING SPARQL QUERY, DECISION TREE ALGORITHM AND LIRE

    Directory of Open Access Journals (Sweden)

    Magesh

    2013-01-01

    Full Text Available The ontology-based framework is developed for representing the image domain. The textual features of images are extracted and annotated as part of the ontology. The ontology is represented in Web Ontology Language (OWL) format, which is based on the Resource Description Framework (RDF) and Resource Description Framework Schema (RDFS). Internally, the RDF statements represent an RDF graph, which provides the way to represent the image data in a semantic manner. Various tools and languages are used to retrieve the semantically relevant textual data from the ontology model. The SPARQL query language is one of the more popular methods to retrieve the textual data stored in the ontology. The text- or keyword-based search is not adequate for retrieving images, as end users are not able to convey the visual features of an image in SPARQL query form. Moreover, the SPARQL query provides more accurate results by traversing the RDF graph, but the relevant images cannot be retrieved by a one-to-one mapping; the relevancy can instead be provided by some kind of onto mapping, which is achieved here by applying a decision tree algorithm. This study proposes methods to retrieve the images from the ontology and compares the image retrieval performance of the SPARQL query language, a decision tree algorithm and Lire, an open-source image search engine. The SPARQL query language is used to retrieve the semantically relevant images using keyword-based annotation, and the decision tree algorithms are used to retrieve the relevant images using visual features of an image. Lastly, the image retrieval efficiency is compared and a graph is plotted to indicate the efficiency of the system.

  14. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    Energy Technology Data Exchange (ETDEWEB)

    Elmer, John; Butherus, Michael [S.M. Stoller Corporation (United States); Barr, Deborah L. [U.S. Department of Energy Office of Legacy Management (United States)

    2013-07-01

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. As additional sites are transferred to LM in the future, the decision tree will assist in determining

  15. Preprocessing of Tandem Mass Spectrometric Data Based on Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    Jing-Fen Zhang; Si-Min He; Jin-Jin Cai; Xing-Jun Cao; Rui-Xiang Sun; Yan Fu; Rong Zeng; Wen Gao

    2005-01-01

    In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify the noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the following identification process. The experimental results show that this preprocessing method increases the search speed and the reliability of peptide identification.

  16. Improvement and analysis of ID3 algorithm in decision-making tree

    Science.gov (United States)

    Xie, Xiao-Lan; Long, Zhen; Liao, Wen-Qi

    2015-12-01

    For the cooperative system under development, spatial analysis and related data mining techniques are needed to detect subject conflict and redundancy, and ID3 is an important data mining algorithm. Because the logarithm computation in the traditional ID3 decision-tree algorithm is rather complicated, this paper derives a new computational formula for information gain by optimizing the logarithmic part of the algorithm. Experimental comparison and theoretical analysis show that the IID3 (Improved ID3) algorithm achieves higher computational efficiency and accuracy and is thus worth popularizing.
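The information-gain criterion that ID3 computes at each split (and that IID3 reworks) can be written out directly. This is a minimal sketch of the standard formula, entropy of the parent minus the weighted entropies of the children, not the paper's optimized variant:

```python
# Minimal sketch of ID3's attribute-selection criterion:
# gain(attr) = H(labels) - sum_v |S_v|/|S| * H(S_v).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from partitioning `rows` on attribute `attr`."""
    parts = {}
    for row, lab in zip(rows, labels):
        parts.setdefault(row[attr], []).append(lab)
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

# Toy data: a perfectly predictive attribute has gain == entropy(labels).
rows = [{"windy": "y"}, {"windy": "y"}, {"windy": "n"}, {"windy": "n"}]
labels = ["play", "play", "stay", "stay"]
print(information_gain(rows, labels, "windy"))  # -> 1.0
```

ID3 selects, at each node, the attribute with the largest gain; IID3's contribution is a cheaper way to evaluate the logarithmic terms, not a different criterion.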

  17. A comprehensive decision approach for rubber tree planting management in Africa

    OpenAIRE

    Valognes, Fabrice; Ferrer, Hélène; Diaby, Moussa; Clément-Demange, André

    2011-01-01

    The main objective of this study is to establish a rigorous decision-analysis framework for rubber tree clone selection. Nowadays, there does not exist any process based upon a rigorous method for selecting the best clone to be planted in order to get the highest return on investment; the only known selection method is to use the experience of the different protagonists acting in the plantation. So, we need a tool that takes into account very important criteria in order to achiev...

  18. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    DEFF Research Database (Denmark)

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe;

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, phasing out of central power plants and cross border power exchange on dynamic security of Danish Power System. Contingency based decision tree (DT) approach is used to assess the dynamic security of present and future...... Danish Power System. Results from offline time domain simulation for large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of present and future power system. The mentioned approach is implemented...... have significant impact on dynamic security of Danish power system in future, if alternative measures are not considered seriously....

  19. A comparison of student academic achievement using decision trees techniques: Reflection from University Malaysia Perlis

    Science.gov (United States)

    Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy

    2015-05-01

    A decision tree is one of the techniques in data mining for prediction. Using this method, hidden information can be extracted from an abundance of data and interpreted into useful knowledge. In this paper the academic performance of students from 2002 to 2012 is examined for two faculties, the Faculty of Manufacturing Engineering and the Faculty of Microelectronic Engineering, at University Malaysia Perlis (UniMAP). The objectives of this study are to determine and compare the factors that affect the students' academic achievement between the two faculties. The prediction results show that five attributes can be considered factors that influence the students' academic performance.

  20. Sistem Pakar Untuk Diagnosa Penyakit Kehamilan Menggunakan Metode Dempster-Shafer Dan Decision Tree

    Directory of Open Access Journals (Sweden)

    joko popo minardi

    2016-01-01

    Full Text Available Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information; it is an alternative to traditional probabilistic theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete; with the Dempster-Shafer method and expert system rules, an incomplete combination of symptoms can still yield an appropriate diagnosis, while the decision tree is used as a decision support tool for tracking disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method, which can produce a belief value for a disease diagnosis. Based on the results of diagnostic testing of the Dempster-Shafer method and expert system rules, the resulting accuracy is 76%.   Keywords: Expert system; Diseases of pregnancy; Dempster-Shafer
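Dempster's rule of combination, which such an expert system uses to merge evidence from separate symptoms, can be sketched as follows. The disease hypotheses and mass values below are invented for illustration, not taken from the paper's rule base:

```python
# Hedged sketch of Dempster's rule of combination for two basic
# probability assignments (mass functions) over a frame of discernment.
from itertools import product

def combine(m1, m2):
    """Combine two mass functions whose keys are frozensets of hypotheses."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    # Normalize by 1 - K, where K is the total conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Illustrative hypotheses only (not the system's actual disease set).
A, B = frozenset({"preeclampsia"}), frozenset({"anemia"})
theta = A | B  # frame of discernment
m1 = {A: 0.6, theta: 0.4}          # evidence from symptom 1
m2 = {A: 0.5, B: 0.3, theta: 0.2}  # evidence from symptom 2
m12 = combine(m1, m2)
print(round(m12[A], 3))  # -> 0.756
```

The normalized masses again sum to one, and the combined belief in a hypothesis grows as independent symptoms support it, which is what lets the system diagnose from incomplete symptom sets.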

  1. Decision tree analysis of factors influencing rainfall-related building damage

    Directory of Open Access Journals (Sweden)

    M. H. Spekkers

    2014-04-01

    Full Text Available Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998–2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22–26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11–18% of variance explained).

  2. Decision tree analysis of factors influencing rainfall-related building damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggest that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). 
Still, a

  3. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), fraction of homeowners (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for this failure; these include the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  4. The Legacy of Past Tree Planting Decisions for a City Confronting Emerald Ash Borer (Agrilus planipennis) Invasion

    Directory of Open Access Journals (Sweden)

    Christopher Sean Greene

    2016-03-01

    Full Text Available Management decisions grounded in ecological understanding are essential to the maintenance of a healthy urban forest. Decisions about where and what tree species to plant have both short and long-term consequences for the future function and resilience of city trees. Through the construction of a theoretical damage index, this study examines the legacy effects of a street tree planting program in a densely populated North American city confronting an invasion of emerald ash borer (Agrilus planipennis. An investigation of spatial autocorrelation for locations of high damage potential across the City of Toronto, Canada was then conducted using Getis-Ord Gi*. Significant spatial clustering of high damage index values affirmed that past urban tree planting practices placing little emphasis on species diversity have created time-lagged consequences of enhanced vulnerability of trees to insect pests. Such consequences are observed at the geographically local scale, but can easily cascade to become multi-scalar in their spatial extent. The theoretical damage potential index developed in this study provides a framework for contextualizing historical urban tree planting decisions where analysis of damage index values for Toronto reinforces the importance of urban forest management that prioritizes proactive tree planting strategies that consider species diversity in the context of planting location.

  5. Snow event classification with a 2D video disdrometer - A decision tree approach

    Science.gov (United States)

    Bernauer, F.; Hürkamp, K.; Rühm, W.; Tschiersch, J.

    2016-05-01

    Snowfall classification according to crystal type or degree of riming of the snowflakes is important for many atmospheric processes, e.g. wet deposition of aerosol particles. 2D video disdrometers (2DVD) have recently proved their capability to measure microphysical parameters of snowfall. The present work has the aim of classifying snowfall according to microphysical properties of single hydrometeors (e.g. shape and fall velocity) measured by means of a 2DVD. The constraints for the shape and velocity parameters, which are used in a decision tree for classification of the 2DVD measurements, are derived from detailed on-site observations, combining automatic 2DVD classification with visual inspection. The developed decision tree algorithm subdivides the detected events into three classes of dominating crystal type (single crystals, complex crystals and pellets) and three classes of dominating degree of riming (weak, moderate and strong). The classification results for the crystal type were validated with an independent data set, proving the unambiguousness of the classification. In addition, for three long-term events, good agreement of the classification results with independently measured maximum dimension of snowflakes, snowflake bulk density and surrounding temperature was found. The developed classification algorithm is applicable for wind speeds below 5.0 m s-1 and has the advantage of being easily implemented by other users.
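A decision tree of this kind reduces to a cascade of threshold tests on the measured shape and fall-velocity parameters. The sketch below is a toy hand-built tree with invented thresholds and parameter names, not the constraints derived in the paper:

```python
# Illustrative hand-built decision tree in the spirit of the 2DVD
# crystal-type classifier; thresholds are hypothetical, for the sketch only.
def classify_crystal(axis_ratio, fall_velocity):
    """Classify a hydrometeor from a shape parameter and fall velocity (m/s)."""
    if fall_velocity > 2.0:   # fast, dense particles
        return "pellets"
    if axis_ratio > 0.8:      # near-round, aggregated outline
        return "complex crystals"
    return "single crystals"  # slow, elongated particles

print(classify_crystal(axis_ratio=0.5, fall_velocity=1.0))  # -> single crystals
```

In the actual algorithm such per-hydrometeor decisions are aggregated over an event to find the dominating class; the point of the sketch is only that the classifier is a short, easily reimplemented rule cascade.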

  6. A Modular Approach Utilizing Decision Tree in Teaching Integration Techniques in Calculus

    Directory of Open Access Journals (Sweden)

    Edrian E. Gonzales

    2015-08-01

    Full Text Available This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought to answer the question: Is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second-year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that modular and traditional instruction were equally effective in facilitating the learning of integration by parts, while the use of the modular approach utilizing a decision tree in teaching integration by trigonometric transformation was more effective than the traditional method.

  7. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression possibly affected by the diverse major risk factors for ESCC across different areas.
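    As a rough sketch of the pipeline (not the authors' code), the following combines greedy sequential forward feature selection with a decision tree in scikit-learn; the synthetic "expression" matrix and all parameter choices are illustrative assumptions.

```python
# Hedged sketch: sequential forward feature selection wrapped around a
# decision tree, with invented data standing in for expression profiles.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 34 samples standing in for 17 tumour/normal pairs, 100 candidate "genes"
X, y = make_classification(n_samples=34, n_features=100, n_informative=5,
                           random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
# Greedily add features until two are selected, mirroring the two-biomarker goal
selector = SequentialFeatureSelector(tree, n_features_to_select=2,
                                     direction="forward", cv=5)
selector.fit(X, y)
selected = selector.get_support(indices=True)
score = cross_val_score(tree, X[:, selected], y, cv=5).mean()
print(selected, score)
```

The shallow tree over the two selected features stays interpretable, which is the point of reducing to a small biomarker set.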

  8. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    Directory of Open Access Journals (Sweden)

    Wan-Yu Chang

    2015-09-01

    Full Text Available In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.
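    The paper's block-based decision-tree labeller is not reproduced here; as a generic stand-in for the task it accelerates, the sketch below labels the connected components of a small binary image with scipy.ndimage, using an 8-connectivity mask like the neighbourhoods scanned by typical two-pass labelling algorithms.

```python
# Generic connected-component labeling of a binary image (stand-in for the
# paper's optimized block-based algorithm).
import numpy as np
from scipy import ndimage

image = np.array([[1, 1, 0, 0, 0],
                  [1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1],
                  [0, 1, 0, 0, 0]])

# 8-connectivity structuring element
structure = np.ones((3, 3), dtype=int)
labels, n_components = ndimage.label(image, structure=structure)
print(n_components)  # → 3
```

The block-based algorithm in the paper produces the same labeling but minimizes the per-pixel memory accesses that dominate the runtime of this operation.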

  9. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
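    The modelling step can be illustrated as follows. The predictors, data, and effect sizes are invented; only the pairing of logistic regression (read through odds ratios) with a decision tree follows the study.

```python
# Hedged sketch: logistic regression odds ratios plus a decision tree on the
# same invented binary predictors (e.g. legal substance use, theft, access).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.integers(0, 2, size=(n, 3))  # three invented binary risk factors
# Invented data-generating process with all three factors raising the odds
logit = -2.0 + 1.5 * X[:, 0] + 1.0 * X[:, 1] + 2.0 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

lr = LogisticRegression().fit(X, y)
odds_ratios = np.exp(lr.coef_[0])  # >1 means the factor raises the odds
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(odds_ratios, tree.score(X, y))
```

The tree's top splits typically pick out the same dominant factors that show the largest odds ratios, which is why the two methods are reported side by side.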

  10. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction

    Directory of Open Access Journals (Sweden)

    Jörg Huwyler

    2012-08-01

    Full Text Available Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo surface permeability product (logPS) values in rats as a quantitative parameter for BBB permeability. The open source Chemistry Development Kit (CDK) was used to calculate physico-chemical properties and descriptors. Predictive computational models were implemented by machine learning paradigms (decision tree induction) on both descriptor sets. Models with a corrected classification rate (CCR) of 90% were established. Mechanistic insight into BBB transport was provided by an Ant Colony Optimization (ACO)-based binary classifier analysis to identify the most predictive chemical substructures. Decision trees revealed descriptors of lipophilicity (aLogP) and charge (polar surface area), which were also previously described in models of passive diffusion. However, measures of molecular geometry and connectivity were found to be related to an active drug transport component.

  11. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers

    Directory of Open Access Journals (Sweden)

    Malueka Rusdy

    2012-03-01

    Full Text Available Abstract Background Duchenne muscular dystrophy (DMD), a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides (AOs) is attracting much attention as the most plausible way to express dystrophin in DMD. AOs have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.

  12. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    Directory of Open Access Journals (Sweden)

    B. Gessesse

    2015-11-01

    Full Text Available Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decision (Chi-square = 37.29, df = 15, P<0.001). In addition, the significance of the model suggests that all the considered predictor variables jointly influenced the farmers' decision to plant trees as a land management strategy. In this regard, the findings of the study show that local land-users' willingness to adopt tree growing is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors; household size, productive labour force availability, the disparity of schooling age, the level of perception of the process of deforestation and the current land tenure system have a positive and significant influence on tree growing investment decisions in the study watershed. Eventually, the processes of land use conversion and land degradation are serious, which in turn has had adverse effects on the agricultural productivity, local food security and poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.

  13. The effect of the fragmentation problem in decision tree learning applied to the search for single top quark production

    International Nuclear Information System (INIS)

    Decision tree learning constitutes a suitable approach to classification due to its ability to partition the variable space into regions of class-uniform events, while providing a structure amenable to interpretation, in contrast to other methods such as neural networks. But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system called DTFE, for Decision Tree Fragmentation Evaluator, that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as spectral clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies in the search for single top quark production, a challenging problem due to large and similar backgrounds, low-energy signals, and a low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of data fragmentation.

  14. Development of a computer code AFTC for fault tree construction using decision table method and super component concept

    International Nuclear Information System (INIS)

    In system reliability analysis, the construction of fault trees is necessary but has remained a manual task which usually consumes the bulk of the analysis time. In this paper, the computer code AFTC is developed to automatically generate fault trees, saving the time and effort needed to construct them. In the AFTC, components are modeled using decision tables and a system is modeled using flow diagrams. A decision table describes the relations between the inputs, internals and outputs of a component, and a flow diagram describes the connections between the components of a system. The super component concept is introduced to model a small subsystem as one component. For common cause failure modeling, the Basic Parameter Method or the Binomial Failure Rate Method can be used. The final fault tree is generated using modularization techniques. (author)
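    A decision table in this sense can be sketched as a mapping from a component's input state and failure mode to its output state. The valve component below and its states are invented for illustration; they are not taken from the AFTC code.

```python
# Hedged sketch of the decision-table idea: rows map (input state, internal
# failure mode) to an output state for a hypothetical valve component.
def valve_decision_table(input_flow, stuck_closed):
    """Toy decision table: output flow given input state and failure mode."""
    table = {
        ("flow", False): "flow",
        ("flow", True): "no-flow",
        ("no-flow", False): "no-flow",
        ("no-flow", True): "no-flow",
    }
    return table[(input_flow, stuck_closed)]

# Tracing the undesired output "no-flow" back through the table yields the
# fault-tree causes: either no input flow, or the valve stuck closed.
print(valve_decision_table("flow", True))  # → no-flow
```

Automatic fault tree construction amounts to chaining such tables along the system flow diagram, expanding each undesired output into the input states and failure modes that can produce it.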

  15. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    OpenAIRE

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe; Bak, Claus Leth; Thøgersen, Paul

    2013-01-01

    Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time-domain simulation and the process of data mining, which is then implemented online as guidelines for preventive control schemes. An algorithm named Classification and Regression Trees (CART) is used t...

  16. A decision-making framework for protecting process plants from flooding based on fault tree analysis

    International Nuclear Information System (INIS)

    The protection of process plants from external events is mandatory under the Seveso Directive. Among these events is the possible inundation of a plant, which may cause a hazard by disabling technical components and obviating operator interventions. A methodological framework for dealing with hazards from potential flooding events is presented. It combines an extension of the fault tree method with generic properties of flooding events in rivers and of dikes, which should be adapted to site-specific characteristics in a concrete case. Thus, a rational basis is provided for deciding whether upgrading is required and, if so, which of the components should be upgraded. Both the deterministic and the probabilistic approaches are compared. Preference is given to the probabilistic one. The conclusions drawn naturally depend on the scope and detail of the model calculations and the decision criterion adopted. The latter has to be supplied from outside the analysis, e.g. by the analyst himself, the plant operator or the competent authority. It turns out that decision-making is only viable if the boundary conditions for both the procedure of analysis and the decision criterion are clear.
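    The fault-tree arithmetic underlying such a framework reduces, for independent basic events, to products and complements through AND and OR gates. The event names and probabilities below are invented, not taken from the paper.

```python
# Hedged illustration of basic fault-tree gate algebra for independent events.
def and_gate(*probs):
    """AND gate: the output event occurs only if all inputs occur."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(*probs):
    """OR gate: 1 minus the probability that no input occurs."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

# Invented example: flooding disables the pump directly, OR it disables
# both the power supply AND its backup.
p_flood_pump = 1e-2
p_power, p_backup = 5e-2, 1e-1
p_top = or_gate(p_flood_pump, and_gate(p_power, p_backup))
print(p_top)
```

A probabilistic decision criterion would then compare p_top against a tolerable-frequency threshold to decide whether upgrading is required.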

  17. Decision tree-based learning to predict patient controlled analgesia consumption and readjustment

    Directory of Open Access Journals (Sweden)

    Hu Yuh-Jyh

    2012-11-01

    Full Text Available Abstract Background Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. The undertreatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), which is a delivery system for pain medication. This study proposes and demonstrates how to use machine learning and data mining techniques to predict analgesic requirements and PCA readjustment. Methods The sample in this study included 1099 patients. Every patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medications. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. Results The prediction accuracies of total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study of PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous works. Conclusion This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA

  18. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    Science.gov (United States)

    Gessesse, B.; Bewket, W.; Bräuning, A.

    2015-11-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decision (Chi-square = 37.29, df = 15, P<0.001). The findings show that willingness to plant trees is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors, and that the processes of land use conversion and land degradation have had adverse effects on the agricultural productivity, local food security and poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.

  19. StructBoost: Boosting Methods for Predicting Structured Output Variables.

    Science.gov (United States)

    Chunhua Shen; Guosheng Lin; van den Hengel, Anton

    2014-10-01

    Boosting is a method for learning a single accurate predictor by linearly combining a set of less accurate weak learners. Recently, structured learning has found many applications in computer vision. Inspired by structured support vector machines (SSVM), here we propose a new boosting algorithm for structured output prediction, which we refer to as StructBoost. StructBoost supports nonlinear structured learning by combining a set of weak structured learners. As SSVM generalizes SVM, our StructBoost generalizes standard boosting approaches such as AdaBoost or LPBoost to structured learning. The resulting optimization problem of StructBoost is more challenging than SSVM in the sense that it may involve exponentially many variables and constraints. In contrast, for SSVM one usually has an exponential number of constraints and a cutting-plane method is used. In order to efficiently solve StructBoost, we formulate an equivalent 1-slack formulation and solve it using a combination of cutting planes and column generation. We show the versatility and usefulness of StructBoost on a range of problems such as optimizing the tree loss for hierarchical multi-class classification, optimizing the Pascal overlap criterion for robust visual tracking and learning conditional random field parameters for image segmentation. PMID:26352637

  20. Understanding how roadside concentrations of NOx are influenced by the background levels, traffic density, and meteorological conditions using Boosted Regression Trees

    Science.gov (United States)

    Sayegh, Arwa; Tate, James E.; Ropkins, Karl

    2016-02-01

    Oxides of nitrogen (NOx) are a major component of photochemical smog, and their constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach of Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. whether they were based on monitored data or modelled predictions. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C, with high concentrations at low ambient air temperatures, which could be associated with restricted atmospheric dispersion and/or with changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper
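    A minimal sketch of fitting Boosted Regression Trees and ranking predictors, in the spirit of the study: the synthetic data (a background-dominated response with a temperature minimum near 22 °C) and all hyperparameters are illustrative assumptions, and scikit-learn's GradientBoostingRegressor stands in for whatever BRT implementation the authors used.

```python
# Hedged sketch: boosted regression trees predicting a roadside concentration
# from invented background, traffic-density, and temperature data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 2000
background = rng.uniform(5, 60, n)    # background NOx level
density = rng.uniform(0, 120, n)      # vehicles per kilometre
temperature = rng.uniform(-5, 30, n)  # ambient air temperature, °C

# Invented response: background dominates, with a mild minimum near 22 °C
roadside = (1.2 * background + 0.4 * density
            + 0.02 * (temperature - 22) ** 2 + rng.normal(0, 2, n))

X = np.column_stack([background, density, temperature])
brt = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                learning_rate=0.05,
                                random_state=0).fit(X, roadside)
ranking = brt.feature_importances_  # background should rank first here
print(ranking)
```

Partial-dependence plots over the fitted model are what "conditioning out" the other influences refers to: they trace the marginal response to one predictor, such as the temperature curve with its minimum.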

  1. Preventing KPI Violations in Business Processes based on Decision Tree Learning and Proactive Runtime Adaptation

    Directory of Open Access Journals (Sweden)

    Dimka Karastoyanova

    2012-01-01

    Full Text Available The performance of business processes is measured and monitored in terms of Key Performance Indicators (KPIs). If the monitoring results show that the KPI targets are violated, the underlying reasons have to be identified and the process should be adapted accordingly to address the violations. In this paper we propose an integrated monitoring, prediction and adaptation approach for preventing KPI violations of business process instances. KPIs are monitored continuously while the process is executed. Additionally, based on KPI measurements of historical process instances we use decision tree learning to construct classification models which are then used to predict the KPI value of an instance while it is still running. If a KPI violation is predicted, we identify adaptation requirements and adaptation strategies in order to prevent the violation.

  2. Independent component analysis and decision trees for ECG holter recording de-noising.

    Directory of Open Access Journals (Sweden)

    Jakub Kuzilek

    Full Text Available We have developed a method for ECG signal de-noising using Independent Component Analysis (ICA). The approach combines JADE source separation with a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method, it was compared to a standard wavelet-based de-noising method. Freely available data from the PhysioNet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, baseline wander, EMG), but noticeably better results when uncommon noise (electrode cable movement artefacts) was compared.
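    The source-separation step can be illustrated with FastICA (the paper uses JADE, and a decision tree rather than manual inspection to pick the noise component); the two synthetic sources and mixing weights below are assumptions for illustration.

```python
# Hedged sketch of the ICA step only: unmixing a spiky ECG-like source from
# synthetic 50 Hz power-line interference across two observed channels.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
ecg_like = np.sin(2 * np.pi * 1.2 * t) ** 15  # spiky, ECG-like source
noise = np.sin(2 * np.pi * 50 * t)            # power-line interference

# Two observed channels, each a different mixture of the two sources
mixtures = np.column_stack([ecg_like + 0.5 * noise,
                            0.4 * ecg_like + noise])

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(mixtures)  # estimated independent components
print(sources.shape)
```

After separation, a classifier (the paper's decision tree) labels which estimated component is noise; reconstructing the signal with that component zeroed yields the de-noised ECG.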

  3. A hybrid model using decision tree and neural network for credit scoring problem

    Directory of Open Access Journals (Sweden)

    Amir Arzy Soltan

    2012-08-01

    Full Text Available Nowadays credit scoring is an important issue for financial and monetary organizations that has a substantial impact on the reduction of customer attraction risks. Identification of high-risk customers can reduce costs. Accurate classification of customers with low type I and type II errors has been investigated in many studies. The primary objective of this paper is to develop a new method which chooses the best neural network architecture from among a single-hidden-layer MLP, multi-hidden-layer MLPs, an RBFN and decision trees, and ensembles them with voting methods. The proposed method of this paper is run on Australian credit data and data from a private bank in Iran called the Export Development Bank of Iran, and the results are used for decision making with low customer attraction risks.

  4. Decision tree method applied to computerized prediction of ternary intermetallic compounds

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Decision tree method and atomic parameters were used to find the regularities of the formation of ternary intermetallic compounds in alloy systems. The criteria of formation can be expressed by a group of inequalities with two kinds of atomic parameters, Zl (number of valence electrons in the atom of a constituent element) and Ri/Rj (ratio of the atomic radii of constituent elements i and j), as independent variables. The data of 2238 known ternary alloy systems were used to extract the empirical rules governing the formation of ternary intermetallic compounds, and the facts of ternary compound formation of another 1334 alloy systems were used as samples to test the reliability of the empirical criteria found. The rate of correctness of prediction was found to be nearly 95%. An expert system for ternary intermetallic compound formation was built and some prediction results of the expert system were confirmed.

  5. A Decision Tree Based Pedometer and its Implementation on the Android Platform

    Directory of Open Access Journals (Sweden)

    Juanying Lin

    2015-02-01

    Full Text Available This paper describes a decision tree (DT) based pedometer algorithm and its implementation on Android. The DT-based pedometer can classify 3 gait patterns, including walking on level ground (WLG), up stairs (WUS) and down stairs (WDS). It can discard irrelevant motion and count the user’s steps accurately. The overall classification accuracy is 89.4%. Accelerometer, gyroscope and magnetic field sensors are used in the device. When the user puts his/her smart phone into the pocket, the pedometer can automatically count steps of different gait patterns. Two methods are tested to map the acceleration from the mobile phone’s reference frame to the direction of gravity. Two significant features are employed to classify different gait patterns.

  6. Multi-output decision trees for lesion segmentation in multiple sclerosis

    Science.gov (United States)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value of 0.36.
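    scikit-learn decision trees accept multi-output targets directly, which is the mechanism such multi-output segmentation builds on; the toy features and paired labels below are invented for illustration and are not the paper's pipeline.

```python
# Hedged sketch: one decision tree predicting two correlated labels at once.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4))  # e.g. intensities from 4 MR channels per voxel
# Two correlated outputs per sample, e.g. "lesion" and "bright tissue" labels
y1 = (X[:, 0] + X[:, 1] > 1.0).astype(int)
y2 = (X[:, 0] > 0.5).astype(int)
Y = np.column_stack([y1, y2])

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, Y)
pred = tree.predict(X[:5])  # one row per sample, one column per output
print(pred.shape)  # → (5, 2)
```

Sharing one tree across outputs lets correlated labels constrain each other, which is the motivation for multi-output trees over training independent per-label classifiers.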

  7. Fault diagnosis method for nuclear power plant based on decision tree and neighborhood rough sets

    International Nuclear Information System (INIS)

    Nuclear power plants (NPPs) are very complex systems that require the collection and monitoring of vast numbers of parameters, which makes fault diagnosis difficult. A parameter reduction method based on neighborhood rough sets was proposed to address this problem. Granular computing is realized in a real space, so numerical parameters can be processed directly. On this basis, a decision tree was applied to learn from training samples representing typical faults of a nuclear power plant, i.e., loss of coolant accident, feed water pipe rupture, steam generator tube rupture, and main steam pipe rupture, and to diagnose using the acquired knowledge. The diagnostic results were then compared with the results of a support vector machine. The simulation results show that this method can rapidly and accurately diagnose the above-mentioned faults of the NPP. (authors)

  8. Simulation of human behavior elements in a virtual world using decision trees

    Directory of Open Access Journals (Sweden)

    Sandra Mercado Pérez

    2013-05-01

    Full Text Available Human behavior refers to the way an individual responds to certain events or occurrences; naturally, one cannot predict exactly how an individual will act, and for this computer simulation is used. This paper presents the development of a simulation of five possible human reactions within a virtual world, as well as the steps needed to create a decision tree that supports the selection of any of these reactions. For that purpose it proposes three types of attributes: the personality, the environment and the level of reaction. The virtual world Second Life was selected because of its internal programming language LSL (Linden Scripting Language), which allows the execution of predefined animation sequences or the creation of new ones.

  9. A New Architecture for Making Moral Agents Based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Meisam Azad-Manjiri

    2014-04-01

    Full Text Available Given the influence of robots in various fields of life, the issue of trusting them is important, especially when a robot deals with people directly. One possible way to earn this confidence is to add a moral dimension to robots. Therefore, we present a new architecture for building moral agents that learn from demonstrations. This agent is based on Beauchamp and Childress’s principles of biomedical ethics (a type of deontological theory) and uses the C4.5 decision tree algorithm to abstract relationships between ethical principles and the morality of actions. We apply this architecture to build an agent that provides guidance to health care workers faced with ethical dilemmas. Our results show that the agent is able to learn ethics well.

  10. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    Directory of Open Access Journals (Sweden)

    Tran Hoai Linh

    2014-09-01

    Full Text Available The paper presents a new system for ECG (electrocardiography) signal recognition using different neural classifiers and a binary decision tree that provides one more processing stage to give the final recognition result. As the base classifiers, three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in the ECG signal decomposition using Hermite basis functions and the peak-to-peak periods of the ECG signals will be used as features for the classifiers. Numerical experiments will be performed for the recognition of different types of arrhythmia in the ECG signals taken from the MIT-BIH (Massachusetts Institute of Technology and Boston’s Beth Israel Hospital) Arrhythmia Database. The results will be compared with individual base classifiers’ performances and with other integration methods to show the high quality of the proposed solution.

  11. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of the new Operational Land Imager (OLI) sensor on Landsat 8, with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067
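
    J48 is the open-source implementation of C4.5, which selects split attributes by information gain (the abstract reports the deep blue band as having the third highest gain). A minimal sketch of that criterion, using hypothetical reflectance values rather than real OLI data:

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(values, labels, threshold):
    """Gain of splitting a numeric band at `threshold`, as C4.5/J48 does."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Hypothetical reflectance values for one OLI band, labelled water / land
band = [0.02, 0.03, 0.05, 0.21, 0.25, 0.30]
cls = ['water', 'water', 'water', 'land', 'land', 'land']
```

    Ranking the candidate bands by this gain is how the tree induction would single out the most informative reflectance bands.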

  12. A Genetic Algorithm Optimized Decision Tree-SVM based Stock Market Trend Prediction System

    Directory of Open Access Journals (Sweden)

    Binoy B. Nair

    2010-12-01

    Full Text Available Prediction of stock market trends has been an area of great interest both to researchers attempting to uncover the information hidden in the stock market data and to those who wish to profit by trading stocks. The extremely nonlinear nature of stock market data makes it very difficult to design a system that can predict the future direction of the stock market with sufficient accuracy. This work presents a data mining based stock market trend prediction system, which produces highly accurate stock market forecasts. The proposed system is a genetic algorithm optimized decision tree-support vector machine (SVM) hybrid, which can predict one-day-ahead trends in stock markets. The uniqueness of the proposed system lies in the use of the hybrid system, which can adapt itself to changing market conditions, and in the fact that while most attempts at stock market trend prediction have approached it as a regression problem, the present study converts the trend prediction task into a classification problem, thus improving the prediction accuracy significantly. Performance of the proposed hybrid system is validated on historical time series data from the Bombay stock exchange sensitive index (BSE-Sensex). The system performance is then compared to that of an artificial neural network (ANN) based system and a naïve Bayes based system. It is found that the trend prediction accuracy is highest for the hybrid system, and the genetic algorithm optimized decision tree-SVM hybrid outperforms both the artificial neural network and naïve Bayes based trend prediction systems.
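
    The conversion from regression to classification that the abstract credits for the accuracy gain can be illustrated simply: each trading day is labelled by the direction of the next day's close. The prices below are made up for illustration:

```python
def trend_labels(closes):
    """Convert a price series into one-day-ahead trend classes:
    'up' if the next close is strictly higher than the current one, else 'down'."""
    return ['up' if nxt > cur else 'down'
            for cur, nxt in zip(closes, closes[1:])]

# Hypothetical closing prices of an index such as the BSE-Sensex
closes = [100.0, 101.5, 101.0, 103.2, 102.8]
print(trend_labels(closes))  # ['up', 'down', 'up', 'down']
```

    A classifier trained on such labels answers the question traders actually ask (which way will the market move?) instead of estimating an exact price level.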

  14. A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree Induction and BPN

    Directory of Open Access Journals (Sweden)

    S. Pitchumani Angayarkanni, V. Saravanan

    2011-02-01

    Full Text Available An intelligent computer-aided diagnosis system can be very helpful for radiologists in detecting and diagnosing microcalcification patterns earlier and faster than typical screening programs. In this paper, we present a system based on fuzzy C-means clustering and feature extraction techniques using texture based segmentation and a genetic algorithm for detecting and diagnosing microcalcification patterns in digital mammograms. We have investigated and analyzed a number of feature extraction techniques and found that a combination of three features, such as entropy, standard deviation, and number of pixels, is the best combination to distinguish a benign microcalcification pattern from one that is malignant. A fuzzy C-means technique in conjunction with the three features was used to detect a microcalcification pattern and a neural network to classify it as benign/malignant. The system was developed on a Windows platform. It is an easy to use intelligent system that gives the user options to diagnose, detect, enlarge, zoom, and measure distances of areas in digital mammograms. The present study focused on the investigation of the application of artificial intelligence and data mining techniques to prediction models of breast cancer. The artificial neural network, decision tree, fuzzy C-means, and genetic algorithm were used for the comparative studies, and the accuracy and positive predictive value of each algorithm were used as the evaluation indicators. 699 records acquired from the breast cancer patients in the MIAS database, 9 predictor variables, and 1 outcome variable were incorporated for the data analysis, followed by 10-fold cross-validation. The results revealed that the accuracy of fuzzy C-means was 0.9534 (sensitivity 0.98716 and specificity 0.9582), the decision tree model 0.9634 (sensitivity 0.98615, specificity 0.9305), the neural network model 0.96502 (sensitivity 0.98628, specificity 0.9473), the genetic algorithm model 0.9878 (sensitivity 1

  15. Learning Dispatching Rules for Scheduling: A Synergistic View Comprising Decision Trees, Tabu Search and Simulation

    Directory of Open Access Journals (Sweden)

    Atif Shahzad

    2016-02-01

    Full Text Available A promising approach to effective shop scheduling that synergizes the benefits of combinatorial optimization, supervised learning and discrete-event simulation is presented. Though dispatching rules are widely used by shop scheduling practitioners, only rules of ordinary performance are known; hence, dynamic generation of dispatching rules is desired to make them more effective under changing shop conditions. Meta-heuristics perform quite well and carry more knowledge of the problem domain, but at the cost of prohibitive computational effort in real time. The primary purpose of this research lies in an offline extraction of this domain knowledge using decision trees to generate simple if-then rules that subsequently act as dispatching rules for scheduling in an online manner. We used a similarity index to identify parametric and structural similarity in problem instances in order to implicitly support the learning algorithm for effective rule generation, and a quality index for relative ranking of the dispatching decisions. Maximum lateness is used as the scheduling objective in a job shop scheduling environment.
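
    An extracted if-then dispatching rule of the kind described might look as follows. The rule itself (shortest processing time under heavy load, earliest due date otherwise) and the load threshold are hypothetical stand-ins for what the learned decision tree would actually produce:

```python
def dispatch(jobs, queue_length, heavy_load_threshold=5):
    """Apply a hypothetical learned if-then rule: under heavy load prefer
    SPT (shortest processing time), otherwise EDD (earliest due date),
    which is a natural choice when minimizing maximum lateness.
    Each job is a dict with 'id', 'proc_time' and 'due_date'."""
    if queue_length > heavy_load_threshold:
        key = lambda j: j['proc_time']   # SPT branch of the learned tree
    else:
        key = lambda j: j['due_date']    # EDD branch of the learned tree
    return min(jobs, key=key)['id']

# Two waiting jobs with invented processing times and due dates
jobs = [{'id': 'J1', 'proc_time': 4, 'due_date': 10},
        {'id': 'J2', 'proc_time': 2, 'due_date': 15}]
```

    At run time such a rule is evaluated in constant time per dispatching decision, which is what makes the offline-learned tree usable online.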

  16. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on a thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The quality of the pellets was expressed by their shape, using the aspect ratio value. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of the formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of the extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer time of spheronization. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellet technology. PMID:25835791
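
    The regression-tree criterion underlying such an analysis is variance reduction: a split on a process variable is good if it makes the aspect-ratio values within each branch more homogeneous. A sketch with invented speed/aspect-ratio data (not the study's 224 formulations):

```python
def variance(ys):
    """Population variance of a list of target values."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def variance_reduction(xs, ys, threshold):
    """Drop in aspect-ratio variance when splitting on a process
    variable (e.g. spheronizer speed) at `threshold`."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return (variance(ys)
            - (len(left) / n) * variance(left)
            - (len(right) / n) * variance(right))

# Hypothetical data: low speeds give elongated pellets (high aspect ratio)
speed = [300, 400, 500, 900, 1000, 1100]
aspect = [1.4, 1.35, 1.3, 1.1, 1.08, 1.05]
```

    A regression tree grows by choosing, at every node, the variable and threshold with the largest such reduction; the resulting root-to-leaf paths read off directly as decision rules.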

  17. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    Directory of Open Access Journals (Sweden)

    Trefz Florian M

    2012-12-01

    Full Text Available Abstract Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully, as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed
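
    The guideline amounts quoted above lend themselves to a simple lookup-style decision tree. The sketch below only illustrates that structure: the 500 mmol dose for insecurely standing calves is taken from the abstract, while the remaining category assignments are assumptions for illustration, not clinical guidance:

```python
def bicarbonate_dose(posture):
    """Sodium bicarbonate dose (mmol) keyed to posture. The 250-750 mmol
    range comes from the abstract; the exact mapping shown here is a
    simplified assumption, not the study's validated guideline."""
    doses = {'standing securely': 250,
             'standing insecurely': 500,
             'recumbent': 750}
    return doses[posture]

def infusion_volume(dehydration):
    """Infusion volume (litres) keyed to the assessed degree of dehydration,
    spanning the 1-6.25 litre range reported in the abstract."""
    volumes = {'none': 1.0, 'moderate': 3.0, 'severe': 6.25}
    return volumes[dehydration]
```

    The point of such a tree is exactly what the study tests: two bedside observations replace a blood-gas analyzer in choosing the treatment.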

  18. Decision Tree and Texture Analysis for Mapping Debris-Covered Glaciers in the Kangchenjunga Area, Eastern Himalaya

    Directory of Open Access Journals (Sweden)

    Adina Racoviteanu

    2012-10-01

    Full Text Available In this study we use visible, short-wave infrared and thermal Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data validated with high-resolution Quickbird (QB) and Worldview2 (WV2) for mapping debris cover in the eastern Himalaya using two independent approaches: (a) a decision tree algorithm, and (b) texture analysis. The decision tree algorithm was based on multi-spectral and topographic variables, such as band ratios, surface reflectance, kinetic temperature from ASTER bands 10 and 12, slope angle, and elevation. The decision tree algorithm resulted in 64 km2 classified as debris-covered ice, which represents 11% of the glacierized area. Overall, for ten glacier tongues in the Kangchenjunga area, there was an area difference of 16.2 km2 (25%) between the ASTER and the QB areas, with mapping errors mainly due to clouds and shadows. Texture analysis techniques included co-occurrence measures, geostatistics and filtering in the spatial/frequency domain. Debris cover had the highest variance of all terrain classes, the highest entropy and the lowest homogeneity compared to the other classes, for example a mean variance of 15.27 compared to 0 for clouds and 0.06 for clean ice. Results of the texture image for debris-covered areas were comparable with those from the decision tree algorithm, with an 8% area difference between the two techniques.

  19. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score

    Directory of Open Access Journals (Sweden)

    Andrej Kastrin

    2008-12-01

    Full Text Available Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from the domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of the knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of decision trees on a prediction model of sensation seeking. Prediction of the Zuckerman Sensation Seeking Scale (SSS-V) score was based on a bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of the Eysenck Personality Questionnaire (EPQ) and the Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision tree methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.

  20. Classification of Parkinsonian Syndromes from FDG-PET Brain Data Using Decision Trees with SSM/PCA Features

    Directory of Open Access Journals (Sweden)

    D. Mudali

    2015-01-01

    Full Text Available Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy for humans to understand. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested, based on enlarging the training data set, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.
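
    Leave-one-out cross validation, as used here, fits the classifier once per subject, each time holding that single subject out for testing. A self-contained sketch with a trivial 1-nearest-neighbour classifier and invented one-dimensional subject scores (real SSM/PCA scores are multi-dimensional):

```python
def nearest_neighbour(train, query):
    """1-NN on one-dimensional feature scores: return the label of the
    training point closest to the query."""
    return min(train, key=lambda p: abs(p[0] - query))[1]

def leave_one_out_accuracy(data):
    """Leave-one-out cross validation: hold out each subject in turn,
    train on the rest, and count correct predictions."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_neighbour(train, x) == label
    return hits / len(data)

# Hypothetical SSM/PCA subject scores for patients vs healthy controls
data = [(0.9, 'patient'), (1.1, 'patient'), (1.0, 'patient'),
        (-0.8, 'control'), (-1.2, 'control'), (-1.0, 'control')]
```

    With only tens of scans per group, this n-fold scheme wastes no subject on a fixed test split, which is why it is common in such imaging studies.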

  1. VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction.

    Science.gov (United States)

    Golzari, Fahimeh; Jalili, Saeed

    2015-07-21

    In the protein function prediction (PFP) problem, the goal is to predict the function of numerous well-sequenced known proteins whose function is still not known precisely. PFP is one of the special and complex problems in the machine learning domain, in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or directed acyclic graph. One of the common learning methods proposed for solving this problem is decision trees, in which, because data is partitioned into sets with sharp boundaries, small changes in the attribute values of a new instance may cause an incorrect change in the predicted label of the instance and, finally, misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict the functions of proteins. The algorithm fuzzifies only the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It has the ability to assign multiple functions to each protein simultaneously and preserves the hierarchy consistency between functional classes. It uses label variance reduction as the splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising. PMID:25865524
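
    Fuzzifying a decision boundary, as VR-BFDT does, replaces the crisp test x > threshold with a smooth membership in each branch, so a small perturbation of an attribute value no longer flips the prediction outright. The sigmoid form below is an illustrative choice, not necessarily the membership function used in the paper:

```python
import math

def soft_split(x, threshold, width=1.0):
    """Fuzzy membership of the two branches at a decision node.
    A sigmoid around `threshold` replaces the crisp x > threshold test,
    so membership changes smoothly near the boundary."""
    left = 1.0 / (1.0 + math.exp((x - threshold) / width))
    return {'left': left, 'right': 1.0 - left}

# The crisp test a classical decision tree would use, for comparison
crisp = lambda x, t: {'left': float(x <= t), 'right': float(x > t)}
```

    Far from the threshold the soft split behaves like the crisp one; right at the boundary an instance belongs half to each branch, and the leaf predictions can be blended accordingly.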

  2. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree

    DEFF Research Database (Denmark)

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva;

    2004-01-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study to...

  3. Trees

    CERN Document Server

    Epstein, Henri

    2016-01-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated with internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.

  4. An application of the value tree analysis methodology within the integrated risk informed decision making for the nuclear facilities

    International Nuclear Information System (INIS)

    A new framework of integrated risk informed decision making (IRIDM) has recently been developed in order to improve the risk management of nuclear facilities. IRIDM is a process in which qualitatively different inputs, corresponding to different types of risk, are jointly taken into account. However, the relative importance of the IRIDM inputs and their influence on the decision to be made are difficult to determine quantitatively. This situation can be improved by applying Value Tree Analysis (VTA) methods. The aim of this article is to present the VTA methodology in the context of its potential use in decision making on nuclear facilities. The benefits of applying VTA within the IRIDM process were identified while making the decision on fuel conversion of the research reactor MARIA. - Highlights: • New approach to risk informed decision making on nuclear facilities was postulated. • Value tree diagram was developed for decision processes on nuclear installations. • An experiment was performed to compare the new approach with the standard one. • Benefits of the new approach were reached in fuel conversion of a research reactor. • The new approach makes the decision making process more transparent and auditable

  5. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's admission routes is an administrative selection based on the records of prospective students at high school, without a paper test. Currently, that kind of selection has neither a standard model nor standard criteria. Selection is done only by comparing candidates' application files, so subjective assessment is very likely because of the lack of standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes criteria with certain standards, such as the region of origin, the status of the school, the average grade and so on. These criteria are determined by using rules that emerge from the classification of the academic achievement (GPA) of students in previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those that meet the following criteria: they come from the island of Java, from a public school, majoring in science, with an average grade above 75, and have at least one achievement during their study in high school.

  6. A Fuzzy Optimization Technique for the Prediction of Coronary Heart Disease Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Persi Pamela. I

    2013-06-01

    Full Text Available Data mining along with soft computing techniques helps to unravel hidden relationships and diagnose diseases efficiently even with uncertainties and inaccuracies. Coronary Heart Disease (CHD) is a killer disease leading to heart attacks and sudden deaths. Since the diagnosis involves vague symptoms and tedious procedures, it is usually time-consuming and false diagnoses may occur. A fuzzy system, one of the soft computing methodologies, is proposed in this paper along with a data mining technique for efficient diagnosis of coronary heart disease. Though the database has 76 attributes, only 14 attributes are found to be efficient for CHD diagnosis as per all the published experiments and doctors' opinion, so only the essential attributes are taken from the heart disease database. From these attributes, crisp rules are obtained by employing the CART decision tree algorithm, and these rules are then applied to the fuzzy system. A Particle Swarm Optimization (PSO) technique is applied for the optimization of the fuzzy membership functions, where the parameters of the membership functions are altered to new positions. The result interpreted from the fuzzy system predicts the prevalence of coronary heart disease, and the system's accuracy was found to be good.
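
    A minimal particle swarm of the kind used to tune membership-function parameters can be written in a few lines. The objective below (pulling a membership-function centre toward 3.0) and all swarm coefficients are illustrative assumptions, not the paper's configuration:

```python
import random

def pso_minimize(f, lo, hi, n_particles=20, iters=60, seed=42):
    """Minimal one-dimensional particle swarm: each particle remembers its
    personal best position, and all particles are pulled toward the
    swarm's global best."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]
    gbest = min(xs, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # inertia + pull toward personal best + pull toward global best
            vs[i] = 0.7 * vs[i] + 1.5 * r1 * (pbest[i] - xs[i]) + 1.5 * r2 * (gbest - xs[i])
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest

# Hypothetical objective: move a fuzzy membership-function centre toward 3.0
centre = pso_minimize(lambda c: (c - 3.0) ** 2, 0.0, 10.0)
```

    In the CHD system the objective would instead score diagnostic accuracy of the fuzzy rule base for a candidate set of membership parameters.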

  7. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Full Text Available Entrepreneurial intentions of students are important to recognize during their studies in order to provide those students with an educational background that will support such intentions and lead them to successful entrepreneurship after their studies. The paper aims to develop a model that classifies students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university on a sample of students in the first year of study. Input variables described students' demographics, the importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Due to the large dimension of the input space, a feature selection method was used in the pre-processing stage. For comparison purposes, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure for testing the generalization ability of the models was conducted. The models were compared according to their classification accuracy, as well as input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which universities can take into consideration while designing their academic programs.

  8. Using Hybrid Decision Tree-Hough Transform Approach For Automatic Bank Check Processing

    Directory of Open Access Journals (Sweden)

    Heba A. Elnemr

    2012-05-01

    Full Text Available One of the first steps in the realization of an automatic system for bank check processing is the automatic classification of checks and extraction of the handwritten area. This paper presents a new hybrid method which couples together the statistical color histogram features, the entropy, the energy and the Hough transform to achieve the automatic classification of checks as well as the segmentation and recognition of the various information on the check. The proposed method relies on two stages. First, a two-step classification algorithm is implemented. In the first step, a decision classification tree is built using the entropy, the energy, the logo location and histogram features of colored bank checks. These features are used to classify checks into several groups. Each group may contain one or more types of checks. Therefore, in the second step the bank logo or bank name is matched against its stored template to identify the correct prototype. Second, the Hough transform is utilized to detect lines in the classified checks. These lines are used as indicators of the bank check fields. A group of experiments is performed showing that the proposed technique is promising as regards classifying bank checks and extracting the important fields in a check.
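
    The Hough transform step works by letting every foreground pixel vote for all (rho, theta) line parameters consistent with it; the strongest accumulator cell identifies a line. A toy sketch on a synthetic horizontal line (not actual check imagery), with a deliberately coarse set of candidate angles:

```python
import math

def hough_lines(points, thetas):
    """Accumulate votes in (rho, theta) space: each foreground pixel (x, y)
    votes for every line rho = x*cos(theta) + y*sin(theta) through it.
    Returns the (rho, theta) cell with the most votes."""
    acc = {}
    for x, y in points:
        for theta in thetas:
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, theta)] = acc.get((rho, theta), 0) + 1
    return max(acc, key=acc.get)

# Pixels of a horizontal line y = 3 (e.g. a printed baseline on a check)
points = [(x, 3) for x in range(10)]
rho, theta = hough_lines(points, [0.0, math.pi / 2])
```

    All ten pixels vote for the same cell at theta = pi/2, so the horizontal baseline dominates the accumulator; real systems quantize theta far more finely.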

  9. Quantitative analysis of dynamic fault trees using improved Sequential Binary Decision Diagrams

    International Nuclear Information System (INIS)

    Dynamic fault trees (DFTs) are powerful in modeling systems with sequence- and function-dependent failure behaviors. The key point lies in how to quantify complex DFTs analytically and efficiently. Unfortunately, the existing methods for analyzing DFTs all have their own disadvantages: they either suffer from combinatorial explosion or need a long computation time to obtain an accurate solution. Sequential Binary Decision Diagrams (SBDDs) are regarded as novel and efficient approaches for dealing with DFTs, but two apparent shortcomings remain to be handled: SBDDs may generate invalid nodes when given an unsuitable variable index, and the scale of the resultant cut sequences relies greatly on the chosen variable index. An improved SBDD method is proposed in this paper to deal with these two problems. It uses an improved ite (If-Then-Else) algorithm to avoid generating invalid nodes when building SBDDs, and a heuristic variable index to keep the scale of the resultant cut sequences as small as possible. To confirm the applicability and merits of the proposed method, several benchmark examples are demonstrated, and the results indicate that this approach is efficient as well as reasonable. - Highlights: • New ITE method. • Linear complexity-based finding algorithm. • Heuristic variable index
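
    The exactness that BDD-based methods offer comes from Shannon decomposition, f = x·f|x=1 + x̄·f|x=0, applied recursively over the basic events. The sketch below evaluates a static toy fault tree this way; it illustrates only that principle, not the paper's improved ite algorithm or the sequence-dependent behavior that SBDDs capture:

```python
def shannon_probability(f, var_probs, vars_left=None, assign=None):
    """Exact top-event probability by Shannon decomposition over the basic
    events: condition on each event being failed/working in turn, mirroring
    how a BDD path enumerates the Boolean function."""
    assign = assign or {}
    vars_left = list(var_probs) if vars_left is None else vars_left
    if not vars_left:
        return float(f(assign))
    v, rest = vars_left[0], vars_left[1:]
    p = var_probs[v]
    hi = shannon_probability(f, var_probs, rest, {**assign, v: True})
    lo = shannon_probability(f, var_probs, rest, {**assign, v: False})
    return p * hi + (1.0 - p) * lo

# Top event: (A AND B) OR C, with hypothetical basic-event probabilities
top = lambda a: (a['A'] and a['B']) or a['C']
p = shannon_probability(top, {'A': 0.1, 'B': 0.2, 'C': 0.05})
```

    Because the decomposition enumerates disjoint cases, the result is exact (here 0.02 + 0.05 - 0.001), with no minimal-cut-set approximation; a real BDD additionally shares and caches the subproblems that this naive recursion recomputes.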

  10. Childhood Cancer-a Hospital based study using Decision Tree Techniques

    Directory of Open Access Journals (Sweden)

    K. Kalaivani

    2011-01-01

    Full Text Available Problem statement: Cancer is generally regarded as a disease of adults, but there is a higher proportion of childhood cancer (ALL, Acute Lymphoblastic Leukemia) in India. The incidence of childhood cancer has increased over the last 25 years, but the increase is much larger in females. The aim was to increase our understanding of the determinants of south Indian parental reactions and needs. This facilitates the development of care and follow-up routines for families, paying attention both to individual risk and resilience factors and to ways in which limitations related to treatment centre and organizational characteristics could be compensated for. Approach: Decision trees may be used for classification, clustering, affinity grouping, prediction or estimation, and description. One useful medical application in India is the management of leukemia, as it accounts for about 33% of childhood malignancies. Results: Female survivors showed greater functional disability in comparison to male survivors, demonstrated by poorer overall health status. Family stress results from a perceived imbalance between the demands on the family and the resources available to meet such demands. Conclusion: The pattern and severity of health and functional outcomes differed significantly between survivors in diagnostic subgroups. Family impact was aggravated by patients' lasting sequelae and by parent-perceived shortcomings of long-term follow-up. Female survivors were at greater risk for health-related late effects.

  11. A decision-tree-based model for evaluating the thermal comfort of horses

    Directory of Open Access Journals (Sweden)

    Ana Paula de Assis Maia

    2013-12-01

    Full Text Available Thermal comfort is of great importance in preserving body temperature homeostasis during thermal stress conditions. Although the thermal comfort of horses has been widely studied, there is no report of its relationship with surface temperature (TS). This study aimed to assess the potential of data mining techniques as a tool to associate surface temperature with thermal comfort of horses. TS was obtained using infrared thermography image processing. Physiological and environmental variables were used to define the predicted class, which classified thermal comfort as "comfort" and "discomfort". The variables of armpit, croup, breast and groin TS of horses and the predicted classes were then subjected to a machine learning process. All variables in the dataset were considered relevant for the classification problem and the decision-tree model yielded an accuracy rate of 74%. The feature selection methods used to reduce computational cost and simplify predictive learning decreased model accuracy to 70%; however, the model became simpler, with easily interpretable rules. For both these selection methods and for the classification using all attributes, armpit and breast TS had a higher power rating for predicting thermal comfort. Data mining techniques show promise in the discovery of new variables associated with the thermal comfort of horses.
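The modelling step described above can be sketched in a few lines of scikit-learn. This is not the study's data or code: the four feature columns follow the abstract's TS sites, but the values and the labelling rule are synthetic stand-ins.

```python
# Illustrative sketch: a decision tree mapping four surface-temperature
# readings (armpit, croup, breast, groin; degrees C, synthetic) to a
# "comfort"/"discomfort" label. The labelling rule below is invented
# purely so the example is self-contained.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
X = rng.uniform([28, 26, 27, 29], [38, 36, 37, 39], size=(n, 4))
# toy rule: discomfort when armpit and breast TS are both high
y = np.where((X[:, 0] > 33) & (X[:, 2] > 32), "discomfort", "comfort")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
acc = tree.score(X_te, y_te)
```

A shallow `max_depth` keeps the fitted rules easy to read, which mirrors the interpretability trade-off the abstract reports after feature selection.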

  12. A reduction approach to improve the quantification of linked fault trees through binary decision diagrams

    International Nuclear Information System (INIS)

    Over the last two decades, binary decision diagrams (BDDs) have been applied successfully to improve Boolean reliability models. In contrast to the classical approach based on the computation of the minimal cutsets (MCS), the BDD approach involves no approximation in the quantification of the model and is able to handle negative logic correctly. However, when models are sufficiently large and complex, as for example the ones coming from the PSA studies of the nuclear industry, it becomes infeasible to compute the BDD within a reasonable amount of time and computer memory. Therefore, simplification or reduction of the full model has to be considered in some way to adapt the application of BDD technology to the assessment of such models in practice. This paper proposes a reduction process that uses information provided by the set of the most relevant minimal cutsets of the model to perform the reduction directly on it. This allows controlling the degree of reduction and therefore the impact of such simplification on the final quantification results. The reduction is integrated in an incremental procedure that is compatible with the dynamic generation of the event trees and therefore adaptable to the recent dynamic developments and extensions of the PSA studies. The proposed method has been applied to a real case study, and the results obtained confirm that the reduction enables the BDD computation while maintaining accuracy.

  13. Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    CERN Document Server

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-01-01

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness an...

  14. Determinants of farmers' tree-planting investment decisions as a degraded landscape management strategy in the central highlands of Ethiopia

    Science.gov (United States)

    Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim

    2016-04-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  15. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As fraudulent financial statements of enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study fits logistic regression, support vector machine, and decision tree classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research sample comprises companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree achieves the best classification accuracy, 85.71%.

  16. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements of enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study fits logistic regression, support vector machine, and decision tree classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research sample comprises companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree achieves the best classification accuracy, 85.71%. PMID:25302338
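The two-stage design of this study (variable screening, then a bake-off between classifiers) can be sketched as follows. This is not the authors' pipeline: stepwise regression is approximated here by univariate F-score selection, and the data are synthetic rather than the fraud dataset.

```python
# Sketch of "screen variables first, then compare classifiers on the
# retained features". SelectKBest stands in for stepwise regression;
# make_classification stands in for the financial/nonfinancial data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)  # variable screening

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X_sel, y, cv=5).mean()
          for name, m in models.items()}
```

Cross-validated accuracy gives each model the same evaluation protocol, which is what makes the "which classifier is best" comparison in the paper meaningful.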

  17. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    Science.gov (United States)

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust, as it is able to find results that are statistically discriminatory with logical-interaction, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have already been identified in large genome-wide association studies as related to type II diabetes, lending additional confidence to the results. PMID:26577156

  18. Autoencoder Trees

    OpenAIRE

    İrsoy, Ozan; Alpaydın, Ethem

    2014-01-01

    We discuss an autoencoder model in which the encoding and decoding functions are implemented by decision trees. We use the soft decision tree where internal nodes realize soft multivariate splits given by a gating function and the overall output is the average of all leaves weighted by the gating values on their path. The encoder tree takes the input and generates a lower dimensional representation in the leaves and the decoder tree takes this and reconstructs the original input. Exploiting t...

  19. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    Science.gov (United States)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
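The Dice similarity coefficient (DSC) that ATLAAS predicts is a standard overlap measure, DSC = 2|A ∩ B| / (|A| + |B|). A minimal implementation for binary masks represented as sets of voxel indices (a representational choice made here for brevity, not taken from the paper):

```python
# Dice similarity coefficient between two segmentations given as sets
# of voxel coordinates; 1.0 means perfect overlap, 0.0 means disjoint.
def dice(a, b):
    a, b = set(a), set(b)
    if not a and not b:      # convention: two empty masks agree perfectly
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
segmentation = {(0, 1), (1, 0), (1, 1), (2, 1)}
score = dice(truth, segmentation)  # 2*3 / (4+4) = 0.75
```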

  20. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    OpenAIRE

    B. Gessesse; W. Bewket; A. Bräuning

    2015-01-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explores the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and bin...

  1. Landslide Susceptibility Assessment in Vietnam Using Support Vector Machines, Decision Tree, and Naïve Bayes Models

    OpenAIRE

    Dieu Tien Bui; Biswajeet Pradhan; Owe Lofman; Inge Revhaug

    2012-01-01

    The objective of this study is to investigate and compare the results of three data mining approaches, the support vector machines (SVM), decision tree (DT), and Naïve Bayes (NB) models for spatial prediction of landslide hazards in the Hoa Binh province (Vietnam). First, a landslide inventory map showing the locations of 118 landslides was constructed from various sources. The landslide inventory was then randomly partitioned into 70% for training the models and 30% for the model validation....

  2. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Rohulla Kosari Langari

    2014-02-01

    Full Text Available The transformation of commerce by information technology and the development of the Internet have intensified competition among organizations engaged in electronic commerce, making dynamic electronic banking systems essential to fast, high-quality commercial transactions. Internet banking is a fundamental pillar of e-banking, but in cyberspace it faces various obstacles and threats. One challenge is that the security of financial transactions cannot be fully guaranteed in the presence of suspicious and unusual behaviour, including mail fraud aimed at financial abuse. Various systems based on machine intelligence and data mining techniques have been designed to detect fraud in user behaviour and applied in industries such as insurance, medicine and banking. The aim of this article is to recognize unusual user behaviour in e-banking systems: detecting user behaviour and categorizing the emerging patterns so as to predict unauthorized penetration and flag suspicious activity. Since user behaviour on Internet systems is uncertain, transaction records are useful for understanding these movements, and among machine learning methods the decision tree is a common tool for classification and prediction. This research therefore first determines the effective banking variables and the weight of each in characterising Internet behaviour, and then combines the observed behaviour patterns into a model of inductive rules able to recognize different behaviours. Finally, four algorithms (CHAID, exhaustive CHAID, C4.5 and C5.0) are compared and evaluated for classifying and detecting suspicious behaviour.

  3. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches.

    Science.gov (United States)

    Oksel, Ceyda; Winkler, David A; Ma, Cai Y; Wilkins, Terry; Wang, Xue Z

    2016-09-01

    The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, interpretation of the models they generate is often very difficult. New computational modelling tools or new ways of using existing tools are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree) to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent to or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generation of accurate nanoSAR models with important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models. PMID:26956430

  4. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    Science.gov (United States)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soils are experiencing, especially in semi-arid Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50,000 soil map covering the area under the direct control of the Republic of Cyprus (5,760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data coming from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors, the usual variables were used: temperature and aridity index for climate; total loss on ignition and vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound location related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision-tree classification technique in which many trees, instead of a single one, are developed and compared to increase the stability and reliability of the prediction. The model is trained and verified on areas where a 1:25,000 published soil map obtained from field work is available, and is then applied for predictive mapping to the other areas. Preliminary results obtained in a small area of the plain around the city of Lefkosia, where eight different soil classes are present, show very good capabilities of the method. 
The Random Forest approach leads to reproduce soil
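The study uses R's Random Forests package; the same core idea — that an ensemble of many decision trees is typically more stable and accurate than a single tree — can be sketched in scikit-learn on synthetic data standing in for the soil classes:

```python
# Single decision tree vs. a random forest on a synthetic multi-class
# problem (a stand-in for soil classes, not the Cyprus data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=12, n_informative=6,
                           n_classes=4, random_state=1)
single = cross_val_score(DecisionTreeClassifier(random_state=1),
                         X, y, cv=5).mean()
forest = cross_val_score(RandomForestClassifier(n_estimators=200,
                                                random_state=1),
                         X, y, cv=5).mean()
```

On most datasets of this shape the forest's cross-validated accuracy is noticeably higher than the single tree's, which is the stability/reliability gain the abstract refers to.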

  5. A DATA MINING APPROACH TO PREDICT PROSPECTIVE BUSINESS SECTORS FOR LENDING IN RETAIL BANKING USING DECISION TREE

    Directory of Open Access Journals (Sweden)

    Md. Rafiqul Islam

    2015-03-01

    Full Text Available A potential objective of every financial organization is to retain existing customers and attract new prospective customers for the long term. The economic behaviour of the customer and the nature of the organization are captured in a prescribed form called Know Your Customer (KYC) in manual banking. Depositor customers in some sectors (businesses dealing in jewellery/gold, arms, money exchange, etc.) carry high risk; some sectors (transport operators, auto-dealers, religious organizations) carry medium risk; and the remaining sectors (retail, corporate, service, farmers, etc.) carry low risk. Presently, counterparty credit risk can be broadly categorized under quantitative and qualitative factors. Although banks have many existing systems for customer retention as well as customer attrition, these methods lack a clear and defined approach for disbursing loans to business sectors. In this paper, we use records of business customers of a retail commercial bank covering the rural and urban areas of Tangail city, Bangladesh, to analyse the major transactional determinants of customers and to build a model predicting prospective sectors in retail banking. To achieve this, a data mining approach is adopted for analysing the challenging issues, where a pruned decision tree classification technique is used to develop the model, whose performance is finally tested against Weka results. KEYWORDS: Data Mining, Decision Tree, Tree Pruning, Prospective Business Sector, Customer

  6. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    Science.gov (United States)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. Protecting conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on the condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use-related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail subjected to changes in type or level of use or to extreme weather events; (4) monitoring of the dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  7. Skin autofluorescence based decision tree in detection of impaired glucose tolerance and diabetes.

    Directory of Open Access Journals (Sweden)

    Andries J Smit

    Full Text Available AIM: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGEs), which are considered to be a carrier of glycometabolic memory. We compared SAF and a SAF-based decision tree (SAF-DM) with fasting plasma glucose (FPG) and HbA1c, and additionally with the Finnish Diabetes Risk Score (FINDRISC) questionnaire±FPG, for detection of oral glucose tolerance test (OGTT)- or HbA1c-defined IGT and diabetes in intermediate-risk persons. METHODS: Participants had ≥1 metabolic syndrome criteria. They underwent an OGTT, HbA1c, SAF and FINDRISC, in addition to SAF-DM, which includes SAF, age, BMI, and conditional questions on DM family history, antihypertensives, and renal or cardiovascular disease events (CVE). RESULTS: 218 persons, age 56 yr, 128M/90F, 97 with previous CVE, participated. With OGTT, 28 had DM, 46 IGT, 41 impaired fasting glucose, and 103 normal glucose tolerance. SAF alone revealed 23 false positives (FP) and 34 false negatives (FN) (sensitivity (S) 68%; specificity (SP) 86%). With SAF-DM, FP were reduced to 18, FN to 16 (5 with DM) (S 82%; SP 89%). HbA1c scored 48 FP, 18 FN (S 80%; SP 75%). Using HbA1c-defined DM-IGT/suspicion ≥6%/42 mmol/mol, SAF-DM scored 33 FP, 24 FN (4 DM) (S 76%; SP 72%), and FPG 29 FP, 41 FN (S 71%; SP 80%). FINDRISC ≥10 points as detection of HbA1c-based diabetes/suspicion scored 79 FP, 23 FN (S 69%; SP 45%). CONCLUSION: SAF-DM is superior to FPG and non-inferior to HbA1c for detecting diabetes/IGT in intermediate-risk persons. SAF-DM's value for diabetes/IGT screening is further supported by its established performance in predicting diabetic complications.
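The sensitivity and specificity figures above are derived mechanically from the false-positive and false-negative counts once the numbers of truly positive and truly negative subjects are fixed. A generic helper (not the study's code) makes the arithmetic explicit:

```python
# Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP),
# computed from FP/FN counts and the true class sizes.
def sens_spec(n_pos, n_neg, fp, fn):
    tp = n_pos - fn     # positives correctly detected
    tn = n_neg - fp     # negatives correctly rejected
    return tp / n_pos, tn / n_neg

# e.g. 100 truly positive, 100 truly negative, 20 FP, 15 FN:
sens, spec = sens_spec(100, 100, fp=20, fn=15)  # (0.85, 0.8)
```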

  8. An Empirical Comparison of Boosting and Bagging Algorithms

    Directory of Open Access Journals (Sweden)

    R. Kalaichelvi Chandrahasan

    2011-11-01

    Full Text Available Classification is one of the data mining techniques that analyses a given data set and induces a model for each class based on the features present in the data. Bagging and boosting are heuristic approaches to developing classification models. These techniques generate a diverse ensemble of classifiers by manipulating the training data given to a base learning algorithm, and they are very successful in improving the accuracy of some algorithms on artificial and real-world datasets. We review the algorithms AdaBoost, Bagging, ADTree, and Random Forest in conjunction with the Meta classifier and the Decision Tree classifier. We also describe a large empirical study comparing several variants. The algorithms are analyzed on accuracy, precision, error rate and execution time.
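A minimal empirical comparison in the spirit of this article can be run with scikit-learn's ensemble implementations. This is an illustrative sketch on synthetic data, not the article's Weka experiment, and ADTree has no direct scikit-learn equivalent, so it is omitted:

```python
# AdaBoost vs. Bagging vs. Random Forest over decision-tree base
# learners, compared by 5-fold cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=15, n_informative=5,
                           random_state=2)
learners = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=2),
    "Bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=2),
    "RandomForest": RandomForestClassifier(n_estimators=100,
                                           random_state=2),
}
accuracy = {name: cross_val_score(clf, X, y, cv=5).mean()
            for name, clf in learners.items()}
```

Execution time, which the article also measures, could be added by wrapping each `cross_val_score` call in a timer.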

  9. Genetic program based data mining of fuzzy decision trees and methods of improving convergence and reducing bloat

    Science.gov (United States)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user. These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons between trees created using a genetic program and those constructed solely by interviewing experts are made. A new co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have been used to validate the data mining algorithm are referenced.

  10. Condition monitoring on grinding wheel wear using wavelet analysis and decision tree C4.5 algorithm

    Directory of Open Access Journals (Sweden)

    S.Devendiran

    2013-10-01

    Full Text Available A new online grinding wheel wear monitoring approach is proposed to detect a worn-out wheel. It is based on acoustic emission (AE) signals processed by the discrete wavelet transform, with statistical features such as root mean square and standard deviation extracted for each wavelet decomposition level, and classification by the decision tree C4.5 data mining technique, a tree-based knowledge representation methodology. The methodology was validated with AE signal data obtained on an aluminium oxide 99A (38A) grinding wheel, which is used in the majority of grinding operations, under different grinding conditions. The results of this scheme with respect to classification accuracy are discussed.
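The feature-extraction stage described above can be sketched with a hand-rolled Haar wavelet decomposition (the simplest discrete wavelet; the study does not specify its mother wavelet), computing RMS and standard deviation of the detail coefficients at each level. The signal here is synthetic, not AE data:

```python
# Multi-level Haar DWT of a 1-D signal, with (RMS, std) of the detail
# coefficients per level - the kind of feature vector that would feed
# a C4.5 decision tree in the monitoring scheme.
import numpy as np

def haar_level(signal):
    """One Haar DWT level: pairwise scaled sums (approx) and differences (detail)."""
    s = np.asarray(signal, dtype=float)
    even, odd = s[0::2], s[1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    return approx, detail

def wavelet_features(signal, levels=3):
    feats = []
    approx = signal
    for _ in range(levels):
        approx, detail = haar_level(approx)
        feats.append((float(np.sqrt(np.mean(detail ** 2))),   # RMS
                      float(np.std(detail))))                 # std
    return feats

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 40 * np.pi, 1024)) + 0.1 * rng.standard_normal(1024)
features = wavelet_features(signal, levels=3)
```

In a full pipeline these per-level statistics, labelled "fresh" or "worn", would be the training rows for the classifier.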

  11. Combined prediction model for supply risk in nuclear power equipment manufacturing industry based on support vector machine and decision tree

    International Nuclear Information System (INIS)

    The prediction index for supply risk is developed based on factor identification in the nuclear power equipment manufacturing industry. The supply risk prediction model is established with support vector machine and decision tree methods, based on an investigation of 3 important nuclear power equipment manufacturing enterprises and 60 suppliers. A final case study demonstrates that the combined model is better than either single prediction model, and demonstrates the feasibility and reliability of this model, which provides a method to evaluate suppliers and measure supply risk. (authors)

  12. Analysis of Human Papillomavirus Using Datamining - Apriori, Decision Tree, and Support Vector Machine (SVM) and its Application Field

    Directory of Open Access Journals (Sweden)

    Cho Younghoon

    2016-01-01

    Full Text Available Human Papillomavirus (HPV) has various types (compared to other viruses) and plays a key role in evoking diverse diseases, especially cervical cancer. In this study, we aim to distinguish the features of HPV types of different degrees of fatality by analyzing their DNA sequences. We used the Decision Tree algorithm, the Apriori algorithm, and Support Vector Machines in our experiment. By analyzing the DNA sequences, we discovered relationships between certain types of HPV, especially the most fatal types, 16 and 18. Moreover, we conclude that it may be possible for scientists to develop more potent HPV cures by applying these relationships and the features that the HPV virus exhibits.
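Of the three methods named, Apriori is the least commonly shown in code. Below is a compact, self-contained frequent-itemset miner of the kind applied to sequence-derived features in studies like this one; the transactions here are toy items, not DNA-derived features:

```python
# Level-wise Apriori: keep itemsets whose support (fraction of
# transactions containing them) meets min_support, then grow candidates
# one item at a time from the survivors.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        survivors = [s for s in k_sets if support(s) >= min_support]
        frequent.update({s: support(s) for s in survivors})
        size = len(k_sets[0]) + 1
        # candidates: unions of survivor pairs that are one item larger
        k_sets = list({a | b for a, b in combinations(survivors, 2)
                       if len(a | b) == size})
    return frequent

tx = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
freq = apriori(tx, min_support=0.6)
```

With these five transactions every single item and every pair is frequent at support 0.6, but the triple {A, B, C} (support 0.4) is pruned.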

  13. An analysis and study of decision tree induction operating under adaptive mode to enhance accuracy and uptime in a dataset introduced to spontaneous variation in data attributes

    Directory of Open Access Journals (Sweden)

    Uttam Chauhan

    2011-01-01

    Full Text Available Many methods exist for classifying an unknown dataset, and decision tree induction is one of the best known. The decision tree method operates in two different modes: non-adaptive and adaptive. The non-adaptive mode is applied when the dataset is completely mature and available, or when the dataset is static and there will be no changes in its attributes. However, when the dataset is likely to have changes in values and attributes, fluctuating monthly, quarterly or annually, a decision tree method operating in adaptive mode must be applied, as the conventional non-adaptive method fails: it would need to be applied again from scratch on the augmented dataset, which is expensive in terms of time and space. Sometimes attributes are added to the dataset while the number of records also increases. This paper mainly studies the behaviour of the classification model when the number of attributes in the dataset increases due to spontaneous changes in values or attributes. Our investigations show that the accuracy of the decision tree model can be maintained when the number of attributes, including the class attribute, increases in the dataset along with the number of records, and also when the number of values in the class attribute increases. The adaptive-mode decision tree method reads data instance by instance, absorbs each instance into the model, and updates the model according to the attribute values specific to that instance. As updating a decision tree can take less time than inducing it from scratch, this eliminates the problem of repeatedly rebuilding the tree from scratch while saving both memory and time.

  14. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    Directory of Open Access Journals (Sweden)

    Hedayetul Islam Shovon

    2012-08-01

Full Text Available Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) in a decisive manner. Student evaluation factors such as class quizzes, mid-term and final exams, assignments and lab work are studied. It is recommended that all this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers reduce the dropout ratio to a significant level and improve student performance. In this paper, we present a hybrid procedure, based on the decision tree data mining method and data clustering, that enables academicians to predict students' GPA so that instructors can take the necessary steps to improve student academic performance.

  15. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise, for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
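The C4.5-derived rule quoted in the abstract is simple enough to express directly in code. The thresholds come from the abstract; the function name is our own, and the sketch carries no clinical authority.

```python
def classify_episode(crp_mg_l, pct_ng_ml):
    """Decision tree from the abstract:
    CRP <= 19.1 mg/L        -> viral infection
    CRP > 19.1 & PCT > 0.65 -> bacterial infection
    CRP > 19.1 & PCT <= 0.65 -> PFAPA
    """
    if crp_mg_l <= 19.1:
        return "viral infection"
    return "bacterial infection" if pct_ng_ml > 0.65 else "PFAPA"

print(classify_episode(10.0, 0.2))   # -> viral infection
print(classify_episode(50.0, 1.2))   # -> bacterial infection
print(classify_episode(50.0, 0.3))   # -> PFAPA
```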

  16. Network Traffic Classification Using SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    邱婧; 夏靖波; 柏骏

    2012-01-01

In order to solve the unrecognized-region and long training time problems that arise when the support vector machine (SVM) method is used for network traffic classification, an SVM decision tree was applied to network traffic classification, exploiting its advantages in multi-class classification. Authoritative flow data sets were tested. The experimental results show that, for network traffic classification, the SVM decision tree method has a shorter training time and better classification performance than the ordinary one-versus-one and one-versus-rest SVM methods, with a classification accuracy rate of up to 98.8%.
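The multi-class advantage claimed above is visible in the number of binary SVMs each scheme trains: a one-versus-one scheme needs a classifier per class pair, one-versus-rest needs one per class, and a binary decision tree over classes needs one per internal node. These are the standard combinatorial counts, not figures from the paper.

```python
def binary_classifier_counts(k):
    """Number of binary SVMs trained for k classes under each scheme."""
    return {
        "one-vs-one": k * (k - 1) // 2,   # every unordered pair of classes
        "one-vs-rest": k,                 # one classifier per class
        "svm-decision-tree": k - 1,       # one internal node per binary split
    }

print(binary_classifier_counts(10))
```

For 10 traffic classes this gives 45, 10 and 9 classifiers respectively, which is one reason training time drops; the tree also avoids the ambiguous "unrecognized" regions that one-vs-one voting can produce.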

  17. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

    Directory of Open Access Journals (Sweden)

    Somaya Hashem

    2016-01-01

Full Text Available Background/Aim. With the increasing prevalence of chronic hepatitis C worldwide, the use of noninvasive methods as an alternative to biopsy for staging chronic liver diseases, avoiding the drawbacks of biopsy, is growing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via the METAVIR score; patients were categorized as mild to moderate (F0–F2) or advanced (F3–F4) fibrosis stages. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to the FIB-4 features except that alpha-fetoprotein replaces alanine aminotransferase. Sensitivity and receiver operating characteristic (ROC) curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved an 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C can be predicted with high accuracy using the decision tree learning algorithm, which could reduce the need for liver biopsy.

  18. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise, for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  19. Cost-effectiveness of exercise ²⁰¹Tl myocardial SPECT in patients with chest pain assessed by decision-tree analysis

    International Nuclear Information System (INIS)

To evaluate the potential cost-effectiveness of exercise ²⁰¹Tl myocardial SPECT in outpatients with angina-like chest pain, we developed a decision-tree model comprising three 1000-patient groups, i.e., a coronary arteriography (CAG) group, a follow-up group, and a SPECT group, and calculated the total cost and cardiac events, including cardiac deaths. Variables used for the decision-tree analysis were obtained from the literature and from data available at our hospital. The sensitivity and specificity of ²⁰¹Tl SPECT for diagnosing angina pectoris, and its prevalence, were assumed to be 95%, 85%, and 33%, respectively. The mean costs were 84.9 × 10⁴ yen/patient in the CAG group, 30.2 × 10⁴ yen/patient in the follow-up group, and 71.0 × 10⁴ yen/patient in the SPECT group. The numbers of cardiac events and cardiac deaths were 56 and 15, respectively, in the CAG group, 264 and 81 in the follow-up group, and 65 and 17 in the SPECT group. Compared with the CAG group, SPECT increases cardiac events and cardiac deaths by 0.9% and 0.2%, but it reduces the number of CAG studies by 50.3% and saves 13.8 × 10⁴ yen/patient. In conclusion, the exercise ²⁰¹Tl myocardial SPECT strategy for patients with chest pain has the potential to reduce health care costs in Japan. (author)
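The event and death differences quoted above follow directly from the per-1000-patient counts in the abstract; the quick check below is our own arithmetic, not the study's model.

```python
# Per-1000-patient outcomes from the abstract (costs in units of 10^4 yen).
strategies = {
    "CAG":       {"cost_man_yen": 84.9, "events": 56,  "deaths": 15},
    "follow-up": {"cost_man_yen": 30.2, "events": 264, "deaths": 81},
    "SPECT":     {"cost_man_yen": 71.0, "events": 65,  "deaths": 17},
}

n = 1000  # patients per decision-tree group
extra_events = (strategies["SPECT"]["events"] - strategies["CAG"]["events"]) / n
extra_deaths = (strategies["SPECT"]["deaths"] - strategies["CAG"]["deaths"]) / n
print(f"SPECT vs CAG: +{extra_events:.1%} events, +{extra_deaths:.1%} deaths")
# -> SPECT vs CAG: +0.9% events, +0.2% deaths, matching the abstract
```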

  20. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients.

    Science.gov (United States)

    Hashem, Somaya; Esmat, Gamal; Elakel, Wafaa; Habashy, Shahira; Abdel Raouf, Safaa; Darweesh, Samar; Soliman, Mohamad; Elhefnawi, Mohamed; El-Adawy, Mohamed; ElHefnawi, Mahmoud

    2016-01-01

Background/Aim. With the increasing prevalence of chronic hepatitis C worldwide, the use of noninvasive methods as an alternative to biopsy for staging chronic liver diseases, avoiding the drawbacks of biopsy, is growing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via the METAVIR score; patients were categorized as mild to moderate (F0-F2) or advanced (F3-F4) fibrosis stages. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to the FIB-4 features except that alpha-fetoprotein replaces alanine aminotransferase. Sensitivity and receiver operating characteristic (ROC) curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved an 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C can be predicted with high accuracy using the decision tree learning algorithm, which could reduce the need for liver biopsy. PMID:26880886

  1. Remote Sensing Image Classification Based on Decision Tree in the Karst Rocky Desertification Areas: A Case Study of Kaizuo Township

    Institute of Scientific and Technical Information of China (English)

    Shuyong; MA; Xinglei; ZHU; Yulun; AN

    2014-01-01

Karst rocky desertification is a phenomenon of land degradation resulting from the interaction of natural and human factors. In the past, supervised and unsupervised classification were often used to classify remote sensing images of rocky desertification areas. But these methods use only pixel brightness characteristics, so the classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a newer technology for remote sensing image classification. In this study, we select Kaizuo Township, a rocky desertification area, as a case study. Using ASTER image data, DEM and lithology data, we extract the normalized difference vegetation index, ratio vegetation index, terrain slope and other variables to establish classification rules and build decision trees. With the support of the ENVI software, we obtain the classified images. By calculating the classification accuracy and kappa coefficient, we find that better classification results can be obtained and desertification information can be extracted automatically; if more remote sensing image bands are used, higher-resolution DEMs are employed and processing errors are reduced, the classification accuracy can be improved further.
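The vegetation indices named above are standard band combinations (NDVI = (NIR − Red)/(NIR + Red), RVI = NIR/Red). The sketch below shows how such per-pixel features can feed a decision-tree rule; the thresholds and class labels are purely illustrative, not taken from the study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def rvi(nir, red):
    """Ratio Vegetation Index."""
    return nir / red

def classify_pixel(nir, red, slope_deg):
    """Illustrative decision-tree rule for rocky-desertification mapping
    (thresholds are hypothetical examples)."""
    if ndvi(nir, red) < 0.2 and slope_deg > 15:
        return "severe rocky desertification"
    if ndvi(nir, red) < 0.4:
        return "moderate rocky desertification"
    return "vegetated"

print(classify_pixel(nir=0.30, red=0.25, slope_deg=20))
# -> severe rocky desertification
```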

  2. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    Science.gov (United States)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  3. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    Directory of Open Access Journals (Sweden)

    Wided Khiari

    2013-09-01

Full Text Available This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosure scores to examine the corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 was carried out, and a disclosure index was developed to determine the companies' level of disclosure. Disclosure quality is assessed through the quantity and also the nature (type) of information disclosed. Applying the decision tree method, the obtained tree diagrams reveal the characteristics of a particular firm whatever its level of disclosure. The results show that the corporate governance characteristics required to achieve good disclosure quality are not the same for all firms. These structures do not necessarily follow all of the recommendations of best practice, but converge towards the best combination. Indeed, in practice there are companies that are not well governed yet have good disclosure quality; we expect that by improving their governance systems their level of disclosure may become better. These findings show, in a general way, a convergence towards corporate governance standards, with a few exceptions related to the specificity of Tunisian listed firms, and show the need to adopt a code for each context. They shed light on the corporate governance features that enhance incentives for good disclosure, and allow identifying, for each firm at any date, the corporate governance determinants of disclosure quality. More specifically, all else being equal, the obtained tree provides a decision rule by which a company can know its level of disclosure from certain characteristics of its adopted governance strategy.

  4. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    Science.gov (United States)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

We present a decision tree for refined estimation of solar energy plant potential on roof areas using the CityGML exchange format. Compared with raster datasets, CityGML data hold geometric and semantic information about buildings and roof areas in greater detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid attention to the production of green electricity. The return on investment depends on the statutory price per watt, the initial cost of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation received and the size of the solar energy plant. In this context the exposition and slope of the roof area are as important as building parts such as chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support the beneficial deployment of a solar energy plant, provided sufficient data are available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas: they carry no semantic information, and even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities are on the way to providing CityGML datasets, e.g. Berlin. Here we present a decision tree that incorporates geometric as well as semantic requirements for a refined estimation of the solar energy potential on roof areas. Based on CityGML's attribute lists, we consider the geometries of roofs and roof installations as well as global radiation, which can be derived e.g. from the European Solar

  5. Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model

    Directory of Open Access Journals (Sweden)

    Takada Masahiro

    2012-06-01

Full Text Available Abstract Background The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method, the alternating decision tree (ADTree). Methods Clinical datasets for primary breast cancer patients who underwent sentinel lymph node biopsy or AxLN dissection without prior treatment were collected from three institutes (institute A, n = 148; institute B, n = 143; institute C, n = 174) and were used for variable selection, model training and external validation, respectively. The models were evaluated using area under the receiver operating characteristic (ROC) curve analysis to discriminate node-positive patients from node-negative patients. Results The ADTree model selected 15 of 24 clinicopathological variables in the variable selection dataset. The resulting area under the ROC curve values were 0.770 [95% confidence interval (CI): 0.689–0.850] for the model training dataset and 0.772 (95% CI: 0.689–0.856) for the validation dataset, demonstrating the high accuracy and generalization ability of the model. The bootstrap value of the validation dataset was 0.768 (95% CI: 0.763–0.774). Conclusions Our prediction model showed high accuracy for predicting nodal metastasis in patients with breast cancer using commonly recorded clinical variables. Therefore, our model might help oncologists in the decision-making process for primary breast cancer patients before starting treatment.
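The area under the ROC curve used to evaluate the ADTree can be computed without plotting, via the rank-sum (Mann-Whitney) identity: AUC equals the probability that a randomly chosen positive case scores above a randomly chosen negative one, counting ties as one half. This generic helper is our own sketch, not the study's code.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney identity: fraction of (positive, negative)
    pairs where the positive case scores higher, ties counted as 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Node-positive patients should receive higher predicted risk scores.
print(roc_auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]))  # -> 0.888...
```

The O(n_pos x n_neg) double loop is fine for cohorts of a few hundred patients, as here; large datasets would use a sort-based O(n log n) formulation instead.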

  6. Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree.

    Science.gov (United States)

    Chao, Cheng-Min; Yu, Ya-Wen; Cheng, Bor-Wen; Kuo, Yao-Lung

    2014-10-01

The aim of this paper is to use data mining technology to establish a classification of breast cancer survival patterns and to offer a treatment decision-making reference for the survival of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer at a specific hospital in central Taiwan, obtaining 1,340 data sets. We employed a support vector machine, logistic regression, and a C5.0 decision tree to construct classification models of breast cancer patients' survival rates, and used a 10-fold cross-validation approach to validate the models. The results show that the classification tools yielded average accuracy rates of more than 90%, with the SVM providing the best method for constructing the three-category classification system for survival mode. The experimental results show that the three methods used to create the classification system achieved high accuracy rates, predicted the survival of women diagnosed with breast cancer more accurately, and could be used as a reference when creating a medical decision-making framework. PMID:25119239
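The 10-fold cross-validation used to validate the models partitions the data into ten disjoint test folds, training on the remaining nine each time. A minimal stdlib sketch of the fold bookkeeping (generic, not the paper's code):

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation;
    fold sizes differ by at most one when k does not divide n_samples."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(1340, k=10))   # 1,340 patients as in the study
print(len(folds), len(folds[0][1]))        # -> 10 134
```

Each of the ten models is fitted on 1,206 records and scored on the held-out 134; averaging the ten accuracies gives the reported cross-validated accuracy.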

  7. Assessing soil Cu content and anthropogenic influences using decision tree analysis

    International Nuclear Information System (INIS)

Recent rapid urbanization and industrialization in China have greatly influenced soil Cu content. To better understand the magnitude of Cu contamination in soil, it is essential to understand its spatial distribution and to estimate its values at unsampled points. However, Kriging often cannot achieve satisfactory estimates when soil Cu data have weak spatial dependence. The proposed classification and regression tree method (CART) simulated Cu content using environmental variables and had no special data requirements. The Cu concentration classes estimated by CART were attributed to the right classes with 80.5% accuracy, 29.3% better than the ordinary Kriging method. Moreover, CART provides some insight into the sources of current soil Cu content. In our study, low soil Cu accumulation was driven by terrain characteristics, agricultural land uses, and soil properties, while high Cu concentrations resulted from industrial and agricultural land uses. - Classification and regression tree (CART) analysis provides insight into the sources of soil Cu and correctly predicted Cu concentration classes for 80.5% of the test data

  8. Nitrogen removal influence factors in A/O process and decision trees for nitrification/denitrification system

    Institute of Scientific and Technical Information of China (English)

    MA Yong; PENG Yong-zhen; WANG Shu-ying; WANG Xiao-lian

    2004-01-01

In order to improve nitrogen removal in the anoxic/oxic (A/O) process for treating domestic wastewater, the influencing factors DO (dissolved oxygen), nitrate recirculation, sludge recycle, SRT (solids residence time), influent COD/TN and HRT (hydraulic retention time) were studied. Results indicated that it was possible to increase nitrogen removal by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervision of nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.

  9. A decision tree-based on-line preventive control strategy for power system transient instability prevention

    Science.gov (United States)

    Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po

    2014-02-01

Maintaining transient stability is a basic requirement for secure power system operation. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention in power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of the DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in its control mechanism, compatible with on-line computation and convenient for handling multiple contingencies. The effectiveness and efficiency of the method have been verified on the New England 10-machine 39-bus test system.

  10. FPGA-Based Network Traffic Security:Design and Implementation Using C5.0 Decision Tree Classifier

    Institute of Scientific and Technical Information of China (English)

    Tarek Salah Sobh; Mohamed Ibrahiem Amer

    2013-01-01

In this work, a hardware intrusion detection system (IDS) model and its implementation are introduced to perform online real-time traffic monitoring and analysis. The introduced system combines the advantages of several kinds of IDS: it is hardware based from the implementation point of view, network based from the system-type point of view, and anomaly detecting from the detection-approach point of view. In addition, it can detect most network attacks, such as denial of service (DoS) and leakage, from the detection-behavior point of view, and can detect both internal and external intruders from the intruder-type point of view. Gathering these features in one IDS gives the work many strengths and advantages. The system is implemented using a field programmable gate array (FPGA), giving further advantages to the system. A C5.0 decision tree classifier is used as the inference engine and gives a high detection ratio of 99.93%.

  11. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Science.gov (United States)

    Wang, Hongcui; Kawahara, Tatsuya

CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach easily falls into a trade-off between coverage of errors and increased perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, achieving both better coverage of errors and smaller perplexity, and resulting in a significant improvement in ASR accuracy.

  12. Analytical solutions of linked fault tree probabilistic risk assessments using binary decision diagrams with emphasis on nuclear safety applications

    International Nuclear Information System (INIS)

This study is concerned with the quantification of Probabilistic Risk Assessment (PRA) using linked Fault Tree (FT) models. PRA of Nuclear Power Plants (NPPs) complements traditional deterministic analysis; it is widely recognized as a comprehensive and structured approach to identify accident scenarios and to derive numerical estimates of the associated risk levels. PRA models as found in the nuclear industry have evolved rapidly and have increasingly been applied to support numerous applications on various operational and regulatory matters. Regulatory bodies in many countries require that a PRA be performed for licensing purposes. PRA has reached the point where it can considerably influence the design and operation of nuclear power plants. However, most of the tools available for quantifying large PRA models are unable to produce analytically correct results. The algorithms of such quantifiers are designed to neglect sequences when their likelihood decreases below a predefined cutoff limit. In addition, the rare-event approximation (e.g. de Moivre's equation) is typically implemented to first order, ignoring the success paths and the possibility that two or more events can occur simultaneously. This is only justified in assessments where the probabilities of the basic events are low. When the events in question are failures, the first-order rare-event approximation is always conservative, resulting in wrong interpretation of risk importance measures. Advanced NPP PRA models typically include human errors, common cause failure groups, and seismic and phenomenological basic events, where the failure probabilities may approach unity, leading to questionable results. It is accepted that current quantification tools have reached their limits, and that new quantification techniques should be investigated. A novel approach using the mathematical concept of the Binary Decision Diagram (BDD) is proposed to overcome these deficiencies
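The weakness of the first-order rare-event approximation described above is easy to demonstrate numerically: for an OR gate over independent basic events, the exact probability (what a BDD quantification computes) and the sum-of-probabilities approximation agree for small probabilities but diverge badly as probabilities approach unity. A minimal sketch, ours rather than the study's:

```python
def exact_or(probs):
    """Exact probability of the OR of independent basic events,
    as a BDD-style quantification would compute it: 1 - prod(1 - p)."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def rare_event_or(probs):
    """First-order rare-event approximation: sum of event probabilities."""
    return sum(probs)

low = [1e-4, 2e-4, 5e-5]   # typical low-probability hardware failures
high = [0.9, 0.8]          # e.g. human errors or CCF groups near unity

print(exact_or(low), rare_event_or(low))    # nearly identical
print(exact_or(high), rare_event_or(high))  # exact ~0.98 vs a "probability" of ~1.7
```

For the high-probability case the approximation even exceeds 1, which is why models with human errors and common cause failure groups need exact BDD evaluation.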

  13. Estimating Classification Uncertainty of Bayesian Decision Tree Technique on Financial Data

    OpenAIRE

    Schetinin, Vitaly; Fieldsend, Jonathan E.; Partridge, Derek; Krzanowski, Wojtek J.; Everson, Richard M.; Bailey, Trevor C; Hernandez, Adolfo

    2005-01-01

Bayesian averaging over classification models allows the uncertainty of classification outcomes to be evaluated, which is of crucial importance for making reliable decisions in applications, such as finance, in which risks have to be estimated. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the diversity of the classifier ensemble and the required performance. The interpretability of classification models can also give useful in...

  14. Irrelevant variability normalization in learning HMM state tying from data based on phonetic decision-tree

    OpenAIRE

    Huo, Q.; Ma, B.

    1999-01-01

    We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because of the problems of a diversified training data set and/or possible acoustic mismatches between training and testing conditions, the structure learned from the training data by using a maximum likelihood training method will not necessarily generalize well on mismatched tasks. We apply the above concept to the structural learning problem of phonetic decision-t...

  15. Decision tree learning for detecting turning points in business process orientation: a case of Croatian companies

    Directory of Open Access Journals (Sweden)

    Ljubica Milanović Glavan

    2015-03-01

    Full Text Available Companies worldwide are embracing Business Process Orientation (BPO in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods in detecting key turning points in BPO. Based on survey results obtained in 2013, the selected data mining technique of classification and regression trees (C&RT was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.

  16. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Energy Technology Data Exchange (ETDEWEB)

    Kupriyanov, M. S., E-mail: mikhail.kupriyanov@gmail.com; Shukeilo, E. Y., E-mail: eyshukeylo@gmail.com; Shichkina, J. A., E-mail: strange.y@mail.ru [Saint Petersburg Electrotechnical University “LETI” (Russian Federation)

    2015-11-17

Nowadays, the technologies used in traumatology combine mechanical, electronic, computational and programming tools. The relevance of developing mobile applications for rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  17. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    International Nuclear Information System (INIS)

    Technologies used in traumatology today are a combination of mechanical, electronic, computational, and software tools. The relevance of developing mobile applications for rapid processing of data received from medical devices (in particular, wearable devices), and for formulating management decisions, is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition from wearable-device data.

  18. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Science.gov (United States)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    Technologies used in traumatology today are a combination of mechanical, electronic, computational, and software tools. The relevance of developing mobile applications for rapid processing of data received from medical devices (in particular, wearable devices), and for formulating management decisions, is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition from wearable-device data.

  19. Forest or the trees: At what scale do elephants make foraging decisions?

    Science.gov (United States)

    Shrader, Adrian M.; Bell, Caroline; Bertolli, Liandra; Ward, David

    2012-07-01

    For herbivores, food is distributed spatially in a hierarchical manner ranging from plant parts to regions. Ultimately, utilisation of food depends on the scale at which herbivores make foraging decisions. A key factor that influences these decisions is body size, because selectivity is inversely related to body size. As a result, large animals can be less selective than small herbivores. Savanna elephants (Loxodonta africana) are the largest terrestrial herbivores; thus, they represent a potential extreme with respect to unselective feeding. However, several studies have indicated that elephants prefer specific habitats and certain woody plant species, so it is unclear at which scale elephants focus their foraging decisions. To determine this, we recorded the seasonal selection of habitats and woody plant species by elephants in the Ithala Game Reserve, South Africa. We expected that during the wet season, when both food quality and availability were high, elephants would select primarily for habitats. This does not mean that they would utilise plant species within these habitats in proportion to availability, but rather that they would show stronger selection for habitats than for plants. In contrast, during the dry season, when food quality and availability declined, we expected elephants to shift and select for the remaining high-quality woody species across all habitats. Consistent with our predictions, elephants selected for the larger spatial scale (i.e. habitats) during the wet season. However, elephants did not increase their selection of woody species during the dry season, but rather increased their selection of habitats relative to woody plant selection. Unlike a number of earlier studies, we found that neither palatability (i.e. crude protein, digestibility, and energy) alone nor tannin concentrations had a significant effect in determining the elephants' selection of woody species. However, the palatability:tannin ratio was

  20. To boost or not boost in radiotherapy

    International Nuclear Information System (INIS)

    The aim of this paper is to analyse and discuss the standard definition of the 'boost' procedure in relation to clinical results and to new forms of boost designed on physical and radiobiological bases. Seventeen sets of clinical data, comprising over 5000 cancer cases with different tumour stages and locations and treated with various forms of the 'boost' method, were extracted from the literature. The effectiveness of the boost is analysed with regard to its place in combined treatment, its timing, and the subvolume involved. The radiobiological parameter D10 and a normalization method for biologically equivalent doses and dose intensity are used to simulate hot and cold subvolumes ('hills' and 'dales') and their influence on the effectiveness of boost delivery. Sequential and concomitant boosts using external irradiation, although commonly used, offer an LTC benefit lower than expected. Brachytherapy, intraoperative irradiation and concurrent-chemotherapy boost methods appear more effective. Conformal radiotherapy, with or without dose-intensity modulation, allows a heterogeneous increase in dose intensity within the target volume and can be used to integrate the 'boost dose' into the baseline treatment (Simultaneous Integrated Boost, SIB). Analysis of the interrelationships between boost dose, boost volume and timing shows that a TCP benefit from boosting can be expected when a relatively large part of the target volume is involved. Increasing the boost dose above 1.2-1.3 times the baseline dose using 'standard' methods does not substantially increase the achieved TCP benefit further unless hypoxic cells are a problem. Even small uncertainties in treatment planning can ruin the entire potential benefit of the boost. For example, a 50% dose deficit in a very small (e.g. 1%) volume of the target can decrease TCP to zero. Therefore boost benefits should be carefully weighed against any risk of cold spots in the target volume. Pros and cons in the discussion of the role of the boost in radiotherapy lead to the important

  1. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and for estimating the optimal speech/music thresholds based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the classification phase, initial classification is performed for each segment of the audio signal using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music, showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can easily be adapted to different audio types, and is suitable for real-time operation.
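
The smoothing step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the window size, threshold, and score convention are invented for the example.

```python
# Illustrative sketch (not the paper's implementation): smoothing per-segment
# speech/music decisions by averaging each raw decision with past decisions,
# so that a single-segment outlier does not flip the classification.

def smooth_decisions(raw, window=5, threshold=0.5):
    """raw: per-segment scores in [0, 1] (1 = music, 0 = speech).
    Each segment's final label averages the current score with up to
    `window` previous scores, suppressing spurious rapid alternations."""
    smoothed = []
    for i, score in enumerate(raw):
        past = raw[max(0, i - window):i + 1]
        avg = sum(past) / len(past)
        smoothed.append(1 if avg >= threshold else 0)
    return smoothed

# The single outlier segment (0.1) no longer flips the label:
print(smooth_decisions([0.9, 0.9, 0.1, 0.9, 0.9]))  # -> [1, 1, 1, 1, 1]
```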

  2. Tailored approach in inguinal hernia repair – Decision tree based on the guidelines

    Directory of Open Access Journals (Sweden)

    Ferdinand Köckerling

    2014-06-01

    Full Text Available The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch, and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society and the European Association of Endoscopic Surgery. 82% of experienced hernia surgeons use the "tailored approach": the differentiated use of the several inguinal hernia repair techniques depending on the findings of the patient, trying to minimize the risks. The following differential therapeutic situations must be distinguished in inguinal hernia repair: unilateral in men, unilateral in women, bilateral, scrotal, after previous pelvic and lower abdominal surgery, no general anaesthesia possible, recurrence, and emergency surgery. Evidence-based guidelines and consensus conferences of experts give recommendations for the best approach in the individual situation of a patient. This review attempts to summarize the recommendations of the various guidelines and to transfer them into a practical decision tree for the daily work of surgeons performing inguinal hernia repair.

  3. An Improved ID3 Decision Tree Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    潘大胜; 屈迟文

    2016-01-01

    By analyzing problems in the classic ID3 decision tree mining algorithm, the entropy calculation process is improved and an improved ID3 decision tree mining algorithm is built. The entropy calculation used in decision tree construction is redesigned in order to obtain globally optimal mining results. Mining experiments were carried out on six data sets from the UCI repository. Experimental results show that the improved mining algorithm clearly outperforms the classic ID3 decision tree mining algorithm in both the compactness of the constructed decision tree and mining accuracy.
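
For reference, the standard ID3 quantities the abstract builds on, Shannon entropy and information gain, can be sketched as follows. This is the textbook formulation, not the paper's improved variant.

```python
# Minimal sketch of the entropy and information-gain computations at the
# heart of ID3-style decision-tree induction (textbook version).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H = -sum(p_i * log2(p_i)) over class frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split):
    """Reduction in entropy after partitioning `labels` by the values of
    an attribute (`split` is a parallel list of attribute values). ID3
    picks the attribute with the largest gain at each node."""
    n = len(labels)
    groups = {}
    for lab, val in zip(labels, split):
        groups.setdefault(val, []).append(lab)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ['yes', 'yes', 'no', 'no']
attr   = ['a', 'a', 'b', 'b']          # perfectly separates the classes
print(information_gain(labels, attr))  # -> 1.0
```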

  4. Detecting subcanopy invasive plant species in tropical rainforest by integrating optical and microwave (InSAR/PolInSAR) remote sensing data, and a decision tree algorithm

    Science.gov (United States)

    Ghulam, Abduwasit; Porton, Ingrid; Freeman, Karen

    2014-02-01

    In this paper, we propose a decision tree algorithm to characterize the spatial extent and spectral features of invasive plant species (i.e., guava, Madagascar cardamom, and Molucca raspberry) in tropical rainforests by integrating datasets from passive and active remote sensing sensors. The decision tree algorithm is based on a number of input variables including matching-score and infeasibility images from Mixture Tuned Matched Filtering (MTMF), land-cover maps, tree height information derived from high-resolution stereo imagery, polarimetric feature images, the Radar Forest Degradation Index (RFDI), and polarimetric and InSAR coherence and phase-difference images. Spatial distributions of the study organisms are mapped using a pixel-based Winner-Takes-All (WTA) algorithm, object-oriented feature extraction, and spectral unmixing, and compared with the newly developed decision tree approach. Our results show that the InSAR phase-difference and PolInSAR HH-VV coherence images of L-band PALSAR data are the most important variables after the MTMF outputs in mapping subcanopy invasive plant species in tropical rainforest. We also show that the three types of invasive plants alone occupy about 17.6% of the Betampona Nature Reserve (BNR), while mixed forest, shrubland and grassland areas sum to 11.9% of the reserve. This work presents the first systematic attempt to evaluate forest degradation, habitat quality and invasive plant statistics in the BNR, and provides significant insights into management strategies for the control of invasive plants and conservation in the reserve.

  5. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. Khader

    2012-12-01

    Full Text Available Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system.
At current

  6. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs
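
The expected-cost comparison described in these two abstracts can be illustrated with a toy calculation. All probabilities and costs below are invented for illustration and are not the study's figures.

```python
# Hypothetical-numbers sketch of the VOI calculation the abstracts describe:
# each alternative's expected cost is a probability-weighted sum of outcome
# costs, and VOI is the saving the monitoring network provides over the
# cheapest uninformed alternative.

def expected_cost(outcomes):
    """outcomes: list of (probability, cost) pairs; probabilities sum to 1."""
    return sum(p * c for p, c in outcomes)

# (i) ignore the risk: assumed 30% chance of methemoglobinemia healthcare costs
ignore = expected_cost([(0.3, 5000.0), (0.7, 0.0)])
# (ii) switch to bottled water: certain purchase cost
bottled = expected_cost([(1.0, 2000.0)])
# (iii) monitoring network: fixed cost plus a residual 5% health risk
monitor = 800.0 + expected_cost([(0.05, 5000.0), (0.95, 0.0)])

voi = min(ignore, bottled) - monitor
print(voi)  # -> 450.0
```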

  7. Diagnosis of pulmonary hypertension from magnetic resonance imaging–based computational models and decision tree analysis

    Science.gov (United States)

    Swift, Andrew J.; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M.

    2016-01-01

    Abstract Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH.
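
Leave-one-out cross-validation, used above to validate the classifier, can be sketched generically as follows. The toy 1-nearest-neighbour classifier and the two-feature 'PH'/'no_PH' samples are stand-ins, not the study's model or data.

```python
# Stdlib-only sketch of leave-one-out cross-validation: hold out each
# sample in turn, fit on the rest, and score the held-out sample.

def one_nn_predict(train, query):
    """Label of the training point closest to `query` (squared distance)."""
    return min(train, key=lambda xy: sum((a - b) ** 2
               for a, b in zip(xy[0], query)))[1]

def loo_accuracy(data):
    """Fraction of samples classified correctly when each is held out."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += (one_nn_predict(rest, x) == y)
    return hits / len(data)

# Two well-separated toy classes: every held-out point is classified correctly.
data = [((0.0, 0.0), 'no_PH'), ((0.1, 0.2), 'no_PH'),
        ((5.0, 5.0), 'PH'),    ((5.2, 4.9), 'PH')]
print(loo_accuracy(data))  # -> 1.0
```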

  8. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    International Nuclear Information System (INIS)

    This article uses methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced landslide susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to perform the best classification fit for each conditioning factor and then to combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best-fitting function that assesses the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbour index (NNI), which was then used to identify the clustered landslide location range. Clustered locations were used as model training data with 14 landslide conditioning factors such as topographic derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model's reliability and prediction capability with the training and validation landslide locations, respectively. This study proved the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping, and provided a valuable scientific basis for spatial decision making in planning and urban management studies.
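
The Pearson chi-squared score that CHAID uses to rate the association between a categorical conditioning factor and the outcome can be computed as below; the slope/landslide counts are invented toy data, not the study's.

```python
# Illustrative Pearson chi-squared statistic on a contingency table, the
# quantity CHAID uses to score factor/outcome association (toy counts).

def chi_squared(table):
    """table[i][j]: observed count for factor category i, outcome j.
    Returns sum((O - E)^2 / E) against the independence expectation."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    return sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(len(row)) for j in range(len(col)))

# Slope class vs. landslide occurrence: steep slopes fail far more often.
observed = [[30, 10],   # steep:  30 landslide, 10 stable
            [5,  55]]   # gentle:  5 landslide, 55 stable
print(round(chi_squared(observed), 2))  # -> 46.89
```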

  9. Predicting future trends in stock market by decision tree rough-set based hybrid system with HHMM

    Directory of Open Access Journals (Sweden)

    Shweta Tiwari

    2012-06-01

    Full Text Available Around the world, trading in the stock market has gained huge attractiveness as a means through which one can obtain vast profits. Attempting to profitably and precisely predict the financial market has long engrossed the interest and attention of bankers, economists and scientists alike. Stock market prediction is the act of trying to determine the future value of a company's stock or other financial instrument traded on a financial exchange. Accurate stock market predictions are important for many reasons, chief among them the need for investors to hedge against potential market risks, and the opportunities for arbitrageurs and speculators to make profits by trading indexes. A stock market is a place where shares are issued and traded, either through stock exchanges or over-the-counter, in physical or electronic form. Data mining, as a process of discovering useful patterns and correlations, has its own role in financial modeling. Data mining is a discipline in computational intelligence that deals with knowledge discovery, data analysis, and full and semi-autonomous decision making. Prediction of the stock market by data mining techniques has been receiving a lot of attention recently. This paper presents a hybrid system based on a decision tree and rough sets for predicting trends in the Bombay Stock Exchange (BSE SENSEX), in combination with a Hierarchical Hidden Markov Model. In this paper we present future trends on the basis of price earnings and dividends. Data on accounting earnings, when averaged over many years, help to predict the present value of future dividends.

  10. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    Science.gov (United States)

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), the ratio of median particle diameter to hydraulic radius (d/R) and the volumetric sediment concentration (Cv) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R² = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers. PMID:27386995

  11. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    Science.gov (United States)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al., 2006). At this field scale, previous classifications of agricultural land in Tanzania using coarse-resolution MODIS data are insufficient to inform a working monitoring system. The nationwide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used, with representative training areas collected for agriculture and non-agriculture and appropriate indices used to separate these classes (Hansen et al., 2013). Validation was done using random samples and high-resolution satellite images to compare agriculture and non-agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security and market prices, and to inform agricultural policy.

  12. Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System

    Directory of Open Access Journals (Sweden)

    Jashan Koshal

    2012-08-01

    Full Text Available The main reason attacks are introduced to systems is the popularity of the internet, and information security has therefore become a vital subject; hence, there is an immediate need to recognize and detect attacks. Intrusion detection is defined as a method of diagnosing attacks and signs of malicious activity in a computer network by evaluating the system continuously; software that performs this task is an Intrusion Detection System (IDS). Systems developed with individual algorithms such as classification, neural networks, or clustering give good detection rates and low false-alarm rates, but recent studies show that cascading multiple algorithms yields much better performance than a system built on a single algorithm. In intrusion detection systems that use a single algorithm, accuracy and detection rate were not up to the mark, and a rise in the false-alarm rate was also encountered; cascading of algorithms is performed to solve this problem. This paper presents two hybrid algorithms for developing an intrusion detection system: a C4.5 decision tree and a Support Vector Machine (SVM) are combined to maximize accuracy, the advantage of C4.5, and to diminish the false-alarm rate, the advantage of SVM. Results show an increase in accuracy and detection rate and a lower false-alarm rate.
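
The cascade logic can be sketched schematically as follows. The two stage "models" are toy hand-written rules standing in for trained C4.5 and SVM classifiers, and the feature names and confidence threshold are invented for the example.

```python
# Schematic sketch of a two-stage cascade: a first-stage tree handles the
# samples it is confident about, and the remainder falls through to a
# second-stage classifier. Toy rules stand in for trained models.

def tree_stage(x):
    """Stand-in for C4.5: returns (label, confidence in [0, 1])."""
    score = x['failed_logins'] / 10.0
    label = 'attack' if score > 0.5 else 'normal'
    return label, abs(score - 0.5) * 2

def svm_stage(x):
    """Stand-in for the SVM that resolves low-confidence cases."""
    return 'attack' if x['bytes_sent'] > 1e6 else 'normal'

def cascade(x, threshold=0.6):
    label, conf = tree_stage(x)
    return label if conf >= threshold else svm_stage(x)

print(cascade({'failed_logins': 9, 'bytes_sent': 10}))         # -> attack (tree)
print(cascade({'failed_logins': 5, 'bytes_sent': 2_000_000}))  # -> attack (SVM)
```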

  13. The use of decision trees in the classification of beach forms/patterns on IKONOS-2 data

    Science.gov (United States)

    Teodoro, A. C.; Ferreira, D.; Gonçalves, H.

    2013-10-01

    Evaluation of beach hydromorphological behaviour and its classification is highly complex. The available beach morphologic and classification models are mainly based on wave, tidal and sediment parameters. Since these parameters are usually unavailable for some regions - such as in the Portuguese coastal zone - a morphologic analysis using remotely sensed data seems to be a valid alternative. Data mining for spatial pattern recognition is the process of discovering useful information, such as patterns/forms, changes and significant structures from large amounts of data. This study focuses on the application of data mining techniques, particularly Decision Trees (DT), to an IKONOS-2 image in order to classify beach features/patterns, in a stretch of the northwest coast of Portugal. Based on the knowledge of the coastal features, five classes were defined: Sea, Suspended-Sediments, Breaking-Zone, Beachface and Beach. The dataset was randomly divided into training and validation subsets. Based on the analysis of several DT algorithms, the CART algorithm was found to be the most adequate and was thus applied. The performance of the DT algorithm was evaluated by the confusion matrix, overall accuracy, and Kappa coefficient. In the classification of beach features/patterns, the algorithm presented an overall accuracy of 98.2% and a kappa coefficient of 0.97. The DTs were compared with a neural network algorithm, and the results were in agreement. The methodology presented in this paper provides promising results and should be considered in further applications of beach forms/patterns classification.
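
The two metrics reported above, overall accuracy and the Kappa coefficient, are computed from the confusion matrix as below; the matrix counts are invented toy values, not the study's results.

```python
# Sketch of overall accuracy and Cohen's kappa from a confusion matrix.

def overall_accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    n = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / n

def kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is the chance
    agreement implied by the row and column marginals."""
    n = sum(map(sum, cm))
    p_o = overall_accuracy(cm)
    p_e = sum(sum(cm[i]) * sum(r[i] for r in cm) for i in range(len(cm))) / n**2
    return (p_o - p_e) / (1 - p_e)

# Rows: reference class, columns: predicted class (e.g. Sea, Beachface, Beach)
cm = [[48, 1, 1],
      [2, 46, 2],
      [0, 1, 49]]
print(round(overall_accuracy(cm), 3), round(kappa(cm), 3))  # -> 0.953 0.93
```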

  14. Method for Walking Gait Identification in a Lower Extremity Exoskeleton based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Qing Guo

    2015-04-01

    Full Text Available A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors’ information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person’s motion.

  15. New energy opinion leaders' lifestyles and media usage - applying data mining decision tree analysis for UNIDO-ICHET web site users

    International Nuclear Information System (INIS)

    According to innovation diffusion research, innovators, opinion leaders, and diffusion agents play vital roles in promoting the acceptance of an innovation. Innovators and opinion leaders must be able to cope with a high degree of uncertainty about an innovation, and they usually have higher innovation-related media usage than the majority. Based on consumer behavior studies, lifestyle analysis can help researchers divide consumers into different lifestyle groups to understand and predict consumer behavior; lifestyle allows researchers to investigate consumers via their activities, interests and opinions instead of using demographic variables. The purpose of this research is to investigate how new energy innovators' and opinion leaders' different lifestyles affect their new energy product adoption and their media usage regarding new energy reports or promotion. In order to achieve these purposes, the researchers needed to locate and contact potential innovators and opinion leaders in this field, and therefore cooperated with UNIDO-ICHET to launch this survey. The cross-discipline online survey ran from August 2005 to October 2006 and successfully collected information on 2040 new energy innovators and opinion leaders. The researchers analyzed the data using SPSS statistics software and data mining decision tree analysis, divided the new energy innovators into four groups (social-oriented, young modern, conservative, and show-off-oriented), and analyzed which lifestyle groups are better targets for innovation agencies launching innovation-related promotions or campaigns.

  16. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining concerned with determining the attitude of people about a particular product, topic, or politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions can be in different languages (English, Urdu, Arabic, etc.), and tackling each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in English; currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions as labeled examples. A testing dataset is supplied to the three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.
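
A minimal multinomial Naive Bayes classifier of the kind that performed best above can be sketched as follows. The tiny mixed Roman-Urdu/English corpus is invented for illustration and is not the paper's dataset.

```python
# Minimal multinomial Naive Bayes with Laplace smoothing: word counts per
# sentiment class, class priors, and a log-probability argmax.
from collections import Counter
from math import log

def train(docs):
    """docs: list of (word_list, label). Returns (counts, prior, vocab)."""
    counts, prior, vocab = {}, Counter(), set()
    for words, label in docs:
        prior[label] += 1
        counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return counts, prior, vocab

def classify(words, model):
    counts, prior, vocab = model
    n = sum(prior.values())
    def logp(label):
        total = sum(counts[label].values())
        return log(prior[label] / n) + sum(
            log((counts[label][w] + 1) / (total + len(vocab))) for w in words)
    return max(prior, key=logp)

docs = [("bohat acha mobile".split(), "positive"),
        ("great camera acha".split(), "positive"),
        ("bohat bura battery".split(), "negative"),
        ("bad service bura".split(), "negative")]
model = train(docs)
print(classify("acha camera".split(), model))  # -> positive
```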

  17. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Baysian and SVM approaches

    Science.gov (United States)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in blood transfusion organizations can be useful for improving the performance of the blood donation service. The aim of this research is the prediction of the healthiness of blood donors in a Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, Decision Tree C4.5, the Naïve Bayesian classifier, and the Support Vector Machine, were chosen and applied to a real database of 11,006 donors. Seven fields (sex, age, job, education, marital status, type of donor, and results of blood tests, i.e. doctors' comments and lab results about healthy or unhealthy blood donors) were selected as input to these algorithms. The results of the three algorithms were compared and an error cost analysis was performed. According to this research and the obtained results, the best algorithm, with low error cost and high accuracy, is the SVM. This research helps the BTO build a model of blood donors in each area in order to predict healthy or unhealthy donor blood, and could be useful when used in parallel with laboratory tests to better separate unhealthy blood.
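    The error cost analysis mentioned above can be sketched as follows. All confusion counts and cost weights below are invented; the (hypothetical) assumption is that passing an unhealthy donation as healthy is five times costlier than rejecting a healthy one.

```python
def error_cost(fp, fn, cost_fp=5.0, cost_fn=1.0):
    """Weighted error cost: passing unhealthy blood as healthy (fp) is
    assumed 5x costlier than flagging healthy blood as unhealthy (fn)."""
    return cost_fp * fp + cost_fn * fn

# hypothetical confusion counts per classifier: (tp, fp, tn, fn)
results = {
    "C4.5":       (900, 40, 9000, 66),
    "NaiveBayes": (880, 55, 8985, 86),
    "SVM":        (920, 25, 9015, 46),
}
accuracy, cost = {}, {}
for name, (tp, fp, tn, fn) in results.items():
    accuracy[name] = (tp + tn) / (tp + fp + tn + fn)
    cost[name] = error_cost(fp, fn)

best = min(cost, key=cost.get)  # classifier with the lowest weighted cost
```

    Ranking by weighted cost rather than raw accuracy is what lets a study conclude that one model is "best with low error cost and high accuracy" even when accuracies are close.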

  18. TreeAge Pro软件在医药卫生决策分析中的应用%The Application of TreeAge Pro software in the Medicine & Health Decision Analysis

    Institute of Scientific and Technical Information of China (English)

    李倩; 马爱霞

    2014-01-01

    TreeAge Pro software is widely used in the field of medical decision making; most decision-analysis publications that apply a decision tree model use this software. However, introductory articles about the software are scarce, which hinders its wider application. The aim of this essay is to meet the needs of beginners and potential users through a basic introduction to the software.

  19. Boosting foundations and algorithms

    CERN Document Server

    Schapire, Robert E

    2012-01-01

    Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate "rules of thumb." A remarkably rich theory has evolved around boosting, with connections to a range of topics, including statistics, game theory, convex optimization, and information geometry. Boosting algorithms have also enjoyed practical success in such fields as biology, vision, and speech processing. At various times in its history, boosting has been perceived as mysterious, controversial, even paradoxical.
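    The "weak rules of thumb" idea can be made concrete with a minimal AdaBoost sketch: decision stumps on a toy 1-D dataset, combined by a weighted vote. The data and number of rounds below are chosen purely for illustration.

```python
# Plain-NumPy AdaBoost sketch: each round picks the best 1-D decision stump
# under the current weights, then up-weights the examples it got wrong.
import numpy as np

X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, 1])   # no single stump fits this

def best_stump(X, y, w):
    """Return (threshold, polarity, weighted_error) of the best 1-D stump."""
    best = (None, 1, np.inf)
    for thr in np.unique(X) - 0.5:
        for pol in (1, -1):
            pred = pol * np.where(X > thr, 1, -1)
            err = w[pred != y].sum()
            if err < best[2]:
                best = (thr, pol, err)
    return best

w = np.ones(len(y)) / len(y)
stumps = []
for _ in range(5):                        # 5 boosting rounds
    thr, pol, err = best_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    pred = pol * np.where(X > thr, 1, -1)
    w *= np.exp(-alpha * y * pred)        # up-weight the mistakes
    w /= w.sum()
    stumps.append((thr, pol, alpha))

def ensemble_predict(X):
    score = sum(a * p * np.where(X > t, 1, -1) for t, p, a in stumps)
    return np.sign(score)

train_acc = (ensemble_predict(X) == y).mean()
```

    Although no individual stump classifies this interval-shaped pattern correctly, the weighted combination does, which is exactly the phenomenon the boosting theory explains.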

  20. Rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems in the presence of trees.

    Science.gov (United States)

    Scholz, Miklas; Uzomah, Vincent C

    2013-08-01

    The retrofitting of sustainable drainage systems (SuDS) such as permeable pavements is currently undertaken ad hoc, guided by expert experience and minimal guidance based predominantly on hard engineering variables. There is a lack of practical decision support tools for rapidly assessing the potential ecosystem services when retrofitting permeable pavements in urban areas that either feature existing trees or should be planted with trees in the near future. Thus, the aim of this paper is to develop an innovative rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems close to trees. This unique tool proposes the retrofitted permeable pavement that obtains the highest ecosystem service score for a specific urban site enhanced by the presence of trees. The approach is based on a novel ecosystem service philosophy adapted to permeable pavements rather than on traditional engineering judgement associated with variables from quick community and environment assessments. For an example case study area such as Greater Manchester, which was dominated by Sycamore and Common Lime, a comparison with the traditional approach of determining community and environment variables indicates that permeable pavements are generally a preferred SuDS option. Permeable pavements combined with urban trees received relatively high scores because of their great potential impact in terms of water and air quality improvement and flood control. The outcomes of this paper are likely to lead to more combined permeable pavement and tree systems in the urban landscape, which are beneficial for humans and the environment. PMID:23697848

  1. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    OpenAIRE

    R. Bou Kheir; P. K. Bøcher; M. B. Greve; M. H. Greve

    2010-01-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow directio...

  2. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    Science.gov (United States)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    A decision tree (DT) machine learning algorithm was used to map the flood-susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome the weak points of LR. The combined FR and LR method was used to map the susceptible areas in Kelantan, Malaysia. The results of both methods were compared and their efficiency was assessed. The conditioning factors with the most influence on flooding were identified.

  3. On the relationship between the prices of oil and the precious metals: Revisiting with a multivariate regime-switching decision tree

    International Nuclear Information System (INIS)

    This study examines volatility and correlation, and the relationships between them, among the euro/US dollar exchange rate, the S&P 500 equity index, and the prices of WTI crude oil and the precious metals (gold, silver, and platinum) over the period 2005 to 2012. Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. The ensuing Hidden Markov Decision Tree (HMDT) model is in fact an extension of the Hidden Markov Model (HMM) introduced by Jordan et al. (1997). The architecture of this model is the opposite of that of the classical deterministic approach based on a binary decision tree, and it allows a probabilistic view of the relationship between univariate volatility and correlation. Our results are categorized into three groups, namely (1) exchange rates and oil, (2) the S&P 500 index, and (3) precious metals. A switching dynamics is seen to characterize the volatilities, while, in the case of the correlations, the series switch from one regime to another, this movement peaking during the Subprime crisis in the US and again during the days following the Tohoku earthquake in Japan. Our findings show that the relationships between volatility and correlation depend upon the nature of the series considered, sometimes corresponding to those found in econometric studies, according to which correlation increases in bear markets, and at other times differing from them. - Highlights: • This study examines volatility and correlation, and their relationships, for precious metals and crude oil. • Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. • This model allows a probabilistic view of the relationship between univariate volatility and correlation. • Results show the relationships between volatility and correlation depend upon the nature of the series considered

  4. Decision-tree and rule-induction approach to integration of remotely sensed and GIS data in mapping vegetation in disturbed or hilly environments

    Science.gov (United States)

    Lees, Brian G.; Ritman, Kim

    1991-11-01

    The integration of Landsat TM and environmental GIS data sets using artificial intelligence rule-induction and decision-tree analysis is shown to facilitate the production of vegetation maps with both floristic and structural information. This technique is particularly suited to vegetation mapping in disturbed or hilly environments that are unsuited to either conventional remote sensing methods or GIS modeling using environmental data bases.

  5. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and focuses on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of a small sample size. Methods All 274 patients (137 with type 2 diabetes mellitus and diabetic peripheral neuropathy and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables such as sex, age, and glycosylated hemoglobin. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 times 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied with the best subset method and the Akaike Information Criterion (AIC) to make the best use of the information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of the DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results The AUC of the DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. As a larger AUC indicates higher diagnostic ability, MLP performed optimally, and then
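    The cross-validated AUC comparison described above can be sketched with scikit-learn. The data below are synthetic (matching only the study's 274-by-30 shape), and the model settings are illustrative rather than the paper's tuned configurations.

```python
# Hedged sketch: compare LR, DT and MLP generalization via stratified
# cross-validated AUC on synthetic data, not the Tianjin cohort.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=274, n_features=30, n_informative=8,
                           random_state=0)
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(max_depth=4, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                         random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = {name: cross_val_score(m, X, y, cv=cv, scoring="roc_auc").mean()
       for name, m in models.items()}
```

    Repeating the cross-validation with different random seeds, as the study's 10-to-100-repetition scheme does, is what turns a single noisy AUC estimate into a stable comparison on a small sample.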

  6. Utilizing Home Healthcare Electronic Health Records for Telehomecare Patients With Heart Failure: A Decision Tree Approach to Detect Associations With Rehospitalizations.

    Science.gov (United States)

    Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H

    2016-04-01

    Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date, there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home healthcare electronic medical record called the Outcome and Assessment Information Set-C for 552 telemonitored heart failure patients. Bivariate analyses using SAS and a decision tree technique using Waikato Environment for Knowledge Analysis were used. From the decision tree technique, the presence of skin issues was identified as the top predictor of rehospitalization that could be identified during the start of care assessment, followed by patient's living situation, patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the Outcome and Assessment Information Set-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional supports. PMID:26848645
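    The idea of reading a "top predictor" out of a fitted decision tree, as the study does with the skin-issue variable, can be illustrated with a small sketch. The cohort below is synthetic, not OASIS-C data, and the variable names are invented.

```python
# Minimal illustration: fit a decision tree and report the feature tested
# at the root split as the strongest single predictor.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300
skin_issue = rng.integers(0, 2, n)
lives_alone = rng.integers(0, 2, n)
pain_freq = rng.integers(0, 4, n)
# rehospitalization driven mostly by skin issues in this synthetic cohort
y = (skin_issue | (rng.random(n) < 0.1)).astype(int)

X = np.column_stack([skin_issue, lives_alone, pain_freq])
features = ["skin_issue", "lives_alone", "pain_freq"]
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
top_predictor = features[tree.tree_.feature[0]]  # feature at the root node
```

    Because tree induction greedily picks the split with the largest impurity reduction, the root split is a natural (if simplistic) summary of the single most informative variable.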

  7. Decision tree sensitivity analysis for cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma)

    International Nuclear Information System (INIS)

    Decision tree analysis was used to assess the cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma, ≤Stage IIIB), based on the data of the current decision tree. Decision tree models were constructed with two competing strategies (CT alone and CT plus chest FDG-PET) in a population of 1,000 patients with 71.4% prevalence. Baseline values for FDG-PET sensitivity and specificity in the detection of lung cancer and lymph node metastasis, and for mortality and life expectancy, were available from references. The chest CT plus chest FDG-PET strategy increased the total cost by 10.5% when a chest FDG-PET study costs 0.1 million yen, since it increased the number of mediastinoscopies and curative thoracotomies despite halving the number of bronchofiberscopies. However, the strategy resulted in a remarkable increase of 115 patients with curable thoracotomy and a decrease of 51 patients with non-curable thoracotomy. In addition, average life expectancy increased by 0.607 year/patient, which means the increase in medical cost is approximately 218,080 yen/year/patient when a chest FDG-PET study costs 0.1 million yen. In conclusion, the chest CT plus chest FDG-PET strategy might not be cost-effective in Japan, but we are convinced that the strategy is useful in cost-benefit analysis. (author)

  8. Decision tree sensitivity analysis for cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma)

    Energy Technology Data Exchange (ETDEWEB)

    Kosuda, Shigeru; Watanabe, Masumi; Kobayashi, Hideo; Kusano, Shoichi [National Defence Medical College, Tokorozawa, Saitama (Japan); Ichihara, Kiyoshi

    1998-07-01

    Decision tree analysis was used to assess the cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma, ≤Stage IIIB), based on the data of the current decision tree. Decision tree models were constructed with two competing strategies (CT alone and CT plus chest FDG-PET) in a population of 1,000 patients with 71.4% prevalence. Baseline values for FDG-PET sensitivity and specificity in the detection of lung cancer and lymph node metastasis, and for mortality and life expectancy, were available from references. The chest CT plus chest FDG-PET strategy increased the total cost by 10.5% when a chest FDG-PET study costs 0.1 million yen, since it increased the number of mediastinoscopies and curative thoracotomies despite halving the number of bronchofiberscopies. However, the strategy resulted in a remarkable increase of 115 patients with curable thoracotomy and a decrease of 51 patients with non-curable thoracotomy. In addition, average life expectancy increased by 0.607 year/patient, which means the increase in medical cost is approximately 218,080 yen/year/patient when a chest FDG-PET study costs 0.1 million yen. In conclusion, the chest CT plus chest FDG-PET strategy might not be cost-effective in Japan, but we are convinced that the strategy is useful in cost-benefit analysis. (author)
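    The kind of strategy comparison described above reduces to probability-weighted sums over a decision tree's leaves. The sketch below shows the arithmetic only; every probability, cost and life expectancy in it is invented for illustration, not taken from the study.

```python
# Toy expected-value calculation over a two-branch decision tree, in the
# spirit of the CT-alone vs CT-plus-FDG-PET comparison.
def expected(branches):
    """branches: list of (probability, cost, life_years) outcome leaves."""
    assert abs(sum(p for p, _, _ in branches) - 1.0) < 1e-9
    cost = sum(p * c for p, c, _ in branches)
    ly = sum(p * l for p, _, l in branches)
    return cost, ly

# hypothetical leaves (probability, cost in yen, life expectancy in years)
ct_alone = [(0.6, 1_000_000, 4.0),    # curative thoracotomy
            (0.4, 1_500_000, 1.5)]    # non-curative course
ct_plus_pet = [(0.7, 1_200_000, 4.0),
               (0.3, 1_500_000, 1.5)]

cost_a, ly_a = expected(ct_alone)
cost_b, ly_b = expected(ct_plus_pet)
# incremental cost per life-year gained (an ICER)
icer = (cost_b - cost_a) / (ly_b - ly_a)
```

    Folding a real tree back from its leaves works the same way, one chance node at a time; the study's "218,080 yen/year/patient" figure is exactly this incremental-cost-per-life-year quantity.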

  9. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions

    OpenAIRE

    Mekala Sundaram; Willoughby, Janna R; Nathanael I Lichti; Michael A Steele; Swihart, Robert K.

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or c...

  10. Decision tree analysis to assess the cost-effectiveness of yttrium microspheres for treatment of hepatic metastases from colorectal cancer

    International Nuclear Information System (INIS)

    Full text: The aim is to determine the cost-effectiveness of yttrium microsphere treatment of hepatic metastases from colorectal cancer, with and without FDG-PET for detection of extra-hepatic disease. A decision tree was created comparing two strategies for yttrium treatment with chemotherapy, one incorporating PET in addition to CT in the pre-treatment work-up, to a strategy of chemotherapy alone. The sensitivity and specificity of PET and CT were obtained from the Federal Government PET review. Imaging costs were obtained from the Medicare benefits schedule, with an additional capital component added for PET (final cost $1,200). The cost of yttrium treatment was determined by patient tracking. Previously published reports indicated a mean gain in life expectancy from treatment of 0.52 years. Patients with extra-hepatic metastases were assumed to receive no survival benefit. Cost-effectiveness was expressed as incremental cost per life-year gained (ICER). Sensitivity analysis determined the effect of the prior probability of extra-hepatic disease on cost savings and cost-effectiveness. The cost of yttrium treatment, including angiography, particle perfusion studies and bed-stays, was $10,530. A baseline value for the prior probability of extra-hepatic disease of 0.35 gave ICERs of $26,378 and $25,271 for the no-PET and PET strategies respectively. The PET strategy was less expensive if the prior probability of extra-hepatic metastases was greater than 0.16 and more cost-effective if above 0.28. Yttrium microsphere treatment is less cost-effective than other interventions for colon cancer but comparable to other accepted health interventions. Incorporating PET into the pre-treatment assessment is likely to save costs and improve cost-effectiveness. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
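    The prior-probability sensitivity analysis above has a simple structure: adding PET saves money once enough patients are spared a futile treatment. The sketch below uses the abstract's PET ($1,200) and treatment ($10,530) costs but is otherwise a deliberately simplified model, with an assumed PET sensitivity of 0.9; it is not the study's model, whose break-even prior was 0.16.

```python
# Simplified break-even analysis: at what prior probability of extra-hepatic
# disease does a pre-treatment PET scan pay for itself?
PET_COST, TX_COST, SENS = 1200.0, 10530.0, 0.9   # SENS is an assumed value

def mean_costs(prior):
    """Per-patient cost of each work-up, assuming PET-detected extra-hepatic
    disease averts a futile yttrium treatment."""
    no_pet = TX_COST                                   # everyone is treated
    with_pet = PET_COST + (1 - prior * SENS) * TX_COST
    return no_pet, with_pet

# prior above which adding PET is the cheaper strategy:
# PET_COST = prior * SENS * TX_COST  =>  prior = PET_COST / (SENS * TX_COST)
break_even = PET_COST / (SENS * TX_COST)
no_pet_cost, pet_cost = mean_costs(0.35)   # baseline prior from the abstract
```

    Under these assumptions the break-even prior is about 0.13, the same order as the study's 0.16; the gap reflects the costs and probabilities the toy model leaves out.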

  11. Nonparametric decision tree: The impact of ISO 9000 on certified and non certified companies

    Directory of Open Access Journals (Sweden)

    Joaquín Texeira Quirós

    2013-09-01

    Full Text Available Purpose: This empirical study analyzes a questionnaire answered by a sample of ISO 9000 certified companies and a control sample of companies which have not been certified, using a multivariate predictive model. With this approach, we assess which quality practices are associated with the likelihood of the firm being certified. Design/methodology/approach: We implemented nonparametric decision trees in order to see which variables most influence whether a company is certified, i.e., the motivations that lead companies to seek certification. Findings: The results show that only four questionnaire items are sufficient to predict whether a firm is certified. Companies in which the respondent manifests greater concern for customer relations, employee motivation and strategic planning have a higher likelihood of being certified. Research implications: the reader should note that this study is based on data from a single country and, of course, these results capture many idiosyncrasies of its economic and corporate environment. It would be of interest to understand whether this type of analysis reveals similar regularities across different countries. Practical implications: companies should look for a set of practices congruent with total quality management and ISO 9000 certification. Originality/value: This study contributes to the literature on the internal motivation of companies to achieve certification under the ISO 9000 standard by performing a comparative analysis of questionnaires answered by a sample of certified companies and a control sample of companies which have not been certified. In particular, we assess how the manager's perception of the intensity with which quality practices are deployed in their firms is associated with the likelihood of the firm being certified.

  12. Robust LogitBoost and Adaptive Base Class (ABC) LogitBoost

    CERN Document Server

    Li, Ping

    2012-01-01

    LogitBoost is an influential boosting algorithm for classification. In this paper, we develop robust logitboost to provide an explicit formulation of the tree-split criterion for building the weak learners (regression trees) for logitboost. This formulation leads to a numerically stable implementation of logitboost. We then propose abc-logitboost for multi-class classification, by combining robust logitboost with the prior work of abc-boost. Previously, abc-boost was implemented as abc-mart using the mart algorithm. Our extensive experiments on multi-class classification compare four algorithms: mart, abc-mart, (robust) logitboost, and abc-logitboost, and demonstrate the superiority of abc-logitboost. Comparisons with other learning methods, including SVM and deep learning, are also available through prior publications.
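    The general recipe these algorithms share, fitting regression trees as weak learners to the gradient of the logistic loss, can be demonstrated with scikit-learn's gradient boosting classifier. This is a hedged stand-in from the same family of methods, not the paper's robust logitboost or abc-logitboost implementation.

```python
# Gradient boosting with shallow regression trees on a 3-class problem,
# illustrating the boosted-tree family that logitboost belongs to.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # a small 3-class benchmark
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      random_state=0, stratify=y)
gbc = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                 random_state=0)
gbc.fit(Xtr, ytr)
test_acc = gbc.score(Xte, yte)
```

    For K classes this fits K regression trees per boosting round, one per class; abc-boost's "adaptive base class" idea modifies exactly that per-class structure.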

  13. The management of an endodontically abscessed tooth: patient health state utility, decision-tree and economic analysis

    Directory of Open Access Journals (Sweden)

    Shepperd Sasha

    2007-12-01

    Full Text Available Abstract Background A frequent encounter in clinical practice is the middle-aged adult patient complaining of a toothache caused by the spread of a carious infection into the tooth's endodontic complex. Decisions about the range of treatment options (a conventional crown with a post and core technique (CC), a single tooth implant (STI), a conventional dental bridge (CDB), and a partial removable denture (RPD)) have to balance prognosis, utility and cost. Little is known about the utility patients attach to the different treatment options for an endodontically abscessed mandibular molar and maxillary incisor. We measured patients' dental-health-state utilities and ranking preferences of the treatment options for these dental problems. Methods Forty school teachers ranked their preferences for a conventional crown with a post and core technique, a single tooth implant, a conventional dental bridge, and a partial removable denture using a standard gamble and willingness to pay. Data previously reported on treatment prognosis and direct "out-of-pocket" costs were used in a decision-tree and economic analysis. Results The standard gamble utilities for the restoration of a mandibular 1st molar with either the conventional crown (CC), single tooth implant (STI), conventional dental bridge (CDB) or removable partial denture (RPD) were 74.47 [± 6.91], 78.60 [± 5.19], 76.22 [± 5.78], and 64.80 [± 8.1] respectively (p The standard gamble utilities for the restoration of a maxillary central incisor with a CC, STI, CDB and RPD were 88.50 [± 6.12], 90.68 [± 3.41], 89.78 [± 3.81] and 91.10 [± 3.57] respectively (p > 0.05). Their respective willingness-to-pay values ($CDN) were: 1,782.05 [± 361.42], 1,871.79 [± 349.44], 1,605.13 [± 348.10] and 1,351.28 [± 368.62]. A statistical difference was found between the utility of treating a maxillary central incisor and a mandibular 1st molar (p The expected utility value for a 5-year prosthetic survival was highest for the CDB and the

  14. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    Directory of Open Access Journals (Sweden)

    Mekala Sundaram

    Full Text Available The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and to the combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is

  15. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    Science.gov (United States)

    Sundaram, Mekala; Willoughby, Janna R; Lichti, Nathanael I; Steele, Michael A; Swihart, Robert K

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. 
Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is consistent with a weak

  16. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions

    Science.gov (United States)

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops that protect groundwater against NO3- contamination, the dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3- pollution activities via an unsupervised learning algorithm based on δ15N- and δ18O-NO3- and the physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3- contamination via a decision tree model. When a combination of δ15N-, δ18O-NO3- and physico-chemical properties of groundwater was used as input for the k-means clustering algorithm, it allowed a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well-irrigated agriculture (28 sampling locations), sewage-irrigated agriculture (16 sampling locations), a combination of sewage-irrigated agriculture, farm and industry (5 sampling locations) and a combination of well-irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on the SO4^2- and Cl- variables. The NO3- concentration and the δ15N- and δ18O-NO3- variables proved of limited use in developing a decision tree model, as multiple N sources and fractionation processes both made it difficult to discriminate NO3- concentrations and isotopic values. Although only SO4^2- and Cl- were selected as important discriminating variables, concentration data alone could not identify the specific NO3- sources responsible for groundwater contamination; this conclusion follows from the comprehensive analysis. To further reduce NO3- contamination, an integrated approach should be set up by combining the N and O isotopes of NO3- with land use and physico-chemical properties, especially in areas with complex agricultural activities.
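    The two-step analysis described above, unsupervised clustering followed by a decision tree that recovers the clusters from a few chemistry variables, can be sketched with scikit-learn. The water-chemistry values below are synthetic stand-ins for two of the four source groups, not the study's measurements.

```python
# Hedged sketch: k-means clusters samples, then a shallow decision tree
# learns to reproduce the cluster labels from SO4^2- and Cl- alone.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# columns: SO4^2- and Cl- concentrations (mg/L), two synthetic source groups
well = rng.normal([40, 20], 5, size=(30, 2))     # well-irrigated-like samples
sewage = rng.normal([120, 80], 5, size=(25, 2))  # sewage-irrigated-like samples
X = np.vstack([well, sewage])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, clusters)
classification_success = tree.score(X, clusters)  # fraction the tree reproduces
```

    The tree's resubstitution score plays the role of the study's "classification success"; it quantifies how well a handful of interpretable split rules explain an unsupervised grouping.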

  17. Application of breast MRI for prediction of lymph node metastases - systematic approach using 17 individual descriptors and a dedicated decision tree

    International Nuclear Information System (INIS)

    Background: The presence of lymph node metastases (LNMs) is one of the most important prognostic factors in breast cancer. Purpose: To correlate a detailed catalog of 17 descriptors in breast MRI (bMRI) with the presence of LNMs and to identify useful combinations of such descriptors for the prediction of LNMs using a dedicated decision tree. Material and Methods: A standardized protocol and study design was applied in this IRB-approved study (T1-weighted FLASH; 0.1 mmol/kg body weight Gd-DTPA; T2-weighted TSE; histological verification after bMRI). Two experienced radiologists performed prospective evaluation of the previously acquired examinations in consensus. In every lesion, 17 previously published descriptors were assessed. Subgroups of primary breast cancers with (N+: 97) and without (N-: 253) LNM were created. The prevalence and diagnostic accuracy of each descriptor were correlated with the presence of LNM (chi-square test; diagnostic odds ratio/DOR). To identify useful combinations of descriptors for the prediction of LNM, a chi-squared automatic interaction detection (CHAID) decision tree was applied. Results: Seven of the 17 descriptors were significantly associated with LNMs. The most accurate were 'Skin thickening' (P < 0.001; DOR = 5.9) and 'Internal enhancement' (P < 0.001; DOR = 13.7). The CHAID decision tree identified useful combinations of descriptors: 'Skin thickening' plus 'Destruction of nipple line' raised the probability of N+ by 40% (P < 0.05). In the case of absence of 'Skin thickening', 'Edema', and 'Irregular margins', the likelihood of N+ was 0% (P < 0.05). Conclusion: Our data demonstrate the close association of selected breast MRI descriptors with nodal status. If present, such descriptors can be used, alone or in combination, to accurately predict LNM and to stratify the patient's prognosis
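    The per-descriptor statistics used above, a chi-square test plus a diagnostic odds ratio from a 2x2 descriptor-vs-LNM table, can be computed directly. The counts below are invented for illustration (only the 97/253 group sizes mirror the study).

```python
# Chi-square association test and diagnostic odds ratio (DOR) for one
# hypothetical descriptor against nodal status.
from scipy.stats import chi2_contingency

#                  LNM+  LNM-
table = [[40,   57],    # descriptor present
         [57,  196]]    # descriptor absent
chi2, p, dof, _ = chi2_contingency(table)

(a, b), (c, d) = table
dor = (a * d) / (b * c)   # odds of LNM+ given descriptor present vs absent
significant = p < 0.05
```

    A DOR above 1 means the descriptor raises the odds of nodal metastasis; CHAID then searches for the descriptor combinations whose chi-square statistics best partition the patients.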

  18. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-06-01

Full Text Available Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in an unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in organic/mineral field measurements. The overall accuracy of the predictive organic/inorganic landscapes map produced (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%.
The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to facilitate the implementation of pedological/hydrological plans for conservation.
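The secondary parameter retained by the best tree, the steady-state topographic wetness index, is commonly defined as TWI = ln(a / tan β), with a the specific catchment area and β the local slope angle. A minimal sketch assuming that standard definition (not the authors' exact implementation):

```python
import math

def topographic_wetness_index(specific_catchment_area, slope_deg):
    """Steady-state TWI = ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour width (m^2/m)
    slope_deg: local slope angle in degrees (must be > 0)
    """
    beta = math.radians(slope_deg)
    return math.log(specific_catchment_area / math.tan(beta))
```

Flat, convergent cells (large a, small β) receive high TWI values, which is why the index flags potentially hydromorphic, organic-rich positions in the landscape.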

  19. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-01-01

Full Text Available Accurate information about soil organic carbon (SOC), presented in spatial form, is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims to investigate the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in an unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to statistically explain SOC field measurements in hydromorphic landscapes of the chosen Danish area. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in field SOC measurements. The overall accuracy of the produced predictive SOC map (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%.
The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to help with the implementation of pedological/hydrological plans for conservation and sustainable

  20. Decision Tree, Bagging and Random Forest methods detect TEC seismo-ionospheric anomalies around the time of the Chile, (Mw = 8.8) earthquake of 27 February 2010

    Science.gov (United States)

    Akhoondzadeh, Mehdi

    2016-06-01

In this paper, for the first time, ensemble methods including Decision Tree, Bagging and Random Forest are proposed in the field of earthquake precursors to detect GPS-TEC (Total Electron Content) seismo-ionospheric anomalies around the time and location of the Chile earthquake of 27 February 2010. All of the implemented ensemble methods detected a striking anomaly in the time series of TEC data, 1 day after the earthquake at 14:00 UTC. The results indicate that, owing to their performance, speed and simplicity, the proposed methods are quite promising and deserve serious attention as new predictive tools for the detection of seismo-ionospheric anomalies.
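The paper's ensemble detectors are not reproduced here; as a much simpler illustration of flagging an outlier in a TEC time series, the sketch below uses a robust median/IQR rule (a baseline for comparison, not the record's method; the threshold k is an assumption):

```python
def tec_anomalies(series, k=1.5):
    """Return indices whose value lies outside median +/- k*IQR of the series."""
    s = sorted(series)
    n = len(s)

    def quantile(q):
        # linear-interpolation quantile on the sorted values
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    med, iqr = quantile(0.5), q3 - q1
    return [i for i, v in enumerate(series) if abs(v - med) > k * iqr]
```

On a mostly quiet series with one spike, only the spike's index is returned, mimicking the "striking anomaly" behaviour described above at a far lower level of sophistication.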

  1. Learning Boost C++ libraries

    CERN Document Server

    Mukherjee, Arindam

    2015-01-01

If you are a C++ programmer who has never used Boost libraries before, this book will get you up to speed with using them. Whether you are developing new C++ software or maintaining existing code written using Boost libraries, this hands-on introduction will help you decide on the right library and techniques to solve your practical programming problems.

  2. A more robust boosting algorithm

    OpenAIRE

    Freund, Yoav

    2009-01-01

We present a new boosting algorithm, motivated by the large margins theory for boosting. We give experimental evidence that the new algorithm is significantly more robust against label noise than existing boosting algorithms.

  3. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    Energy Technology Data Exchange (ETDEWEB)

    Bou Kheir, Rania, E-mail: rania.boukheir@agrsci.d [Lebanese University, Faculty of Letters and Human Sciences, Department of Geography, GIS Research Laboratory, P.O. Box 90-1065, Fanar (Lebanon); Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Greve, Mogens H. [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Abdallah, Chadi [National Council for Scientific Research, Remote Sensing Center, P.O. Box 11-8281, Beirut (Lebanon); Dalgaard, Tommy [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark)

    2010-02-15

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.
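Regression trees of the kind used in this record rank candidate splits by how much they reduce the variance of the target (here, zinc concentration). A minimal single-split sketch (illustrative only, not the study's model):

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def best_split(values, targets):
    """Return (threshold, weighted child variance) for the binary split of one
    predictor that leaves the least target variance in the two children."""
    order = sorted(zip(values, targets))
    best = None
    for i in range(1, len(order)):
        left = [t for _, t in order[:i]]
        right = [t for _, t in order[i:]]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(order)
        threshold = (order[i - 1][0] + order[i][0]) / 2
        if best is None or score < best[1]:
            best = (threshold, score)
    return best
```

Repeating this search over all predictors at every node, and comparing how much variance each predictor removes, is what yields relative-importance figures like the 100%/90%/80% ranking quoted in the abstract.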

  4. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    International Nuclear Information System (INIS)

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.

  5. Application of the Decision Tree Method in Tennis Training

    Institute of Scientific and Technical Information of China (English)

    冯能山; 龙超; 熊金志; 廖国君

    2014-01-01

Applications of data mining in the field of sports are still relatively rare. Making good use of sports training data by mining it for useful information is an important task for data mining technology in this field. The decision tree is a commonly used data mining technique. In this paper, the decision tree method is applied to tennis training: the relevant data are mined to form a decision tree for tennis training, which helps sports staff devise more rational tennis training programs and improves the efficiency of tennis training.

  6. Facial Expression Recognition Based on LBP and SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李扬; 郭海礁

    2014-01-01

To improve the recognition rate of facial expression recognition, a facial expression recognition algorithm combining LBP and an SVM decision tree is proposed. First, the facial expression image is converted into an LBP feature spectrum using the LBP algorithm; the LBP feature spectrum is then converted into an LBP histogram feature sequence; finally, classification and recognition of facial expressions are completed with the SVM decision tree algorithm. Experiments on the JAFFE facial expression database demonstrate the effectiveness of the proposed method.
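The LBP feature spectrum mentioned in this record assigns each pixel an 8-neighbour binary code, and the histogram of those codes is the feature sequence fed to the classifier. A minimal sketch (bit-ordering conventions vary between implementations):

```python
def lbp_code(img, r, c):
    """8-neighbour local binary pattern code of pixel (r, c); img is a 2-D list."""
    center = img[r][c]
    # neighbours clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

In practice the histogram is computed per image block and the concatenated blocks form the feature vector for the SVM stage.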

  7. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    Science.gov (United States)

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays. PMID:26796566
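The published decision tree itself is not reproduced here; the sketch below is a hypothetical illustration of the general ITS idea described above (one in silico call plus at most two assays, resolved by agreement or majority). The rule, the labels and the 'inconclusive' fallback are assumptions, not the paper's tree:

```python
def its_prediction(in_silico, assay1, assay2=None):
    """Hypothetical integrated-testing-strategy rule; inputs are 'pos'/'neg' calls.

    If the in silico call and the first assay agree, accept that call;
    otherwise consult a second assay and take the 2-of-3 majority.
    """
    if in_silico == assay1:
        return in_silico
    if assay2 is None:
        return 'inconclusive'
    calls = [in_silico, assay1, assay2]
    return 'pos' if calls.count('pos') >= 2 else 'neg'
```

A real ITS such as the one evaluated in this record would branch on which assays are applicable to the chemical, which is precisely where the in silico model covers compounds outside the assays' applicability domain.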

  8. Entanglement asymmetry for boosted black branes

    CERN Document Server

    Mishra, Rohit

    2016-01-01

We study the effects of asymmetry in the entanglement thermodynamics of CFT subsystems. It is found that `boosted' $p$-brane backgrounds give rise to a first law of entanglement thermodynamics in which the CFT pressure plays a decisive role in the entanglement. Two different strip-like subsystems, one parallel to the boost and the other perpendicular, are studied in the perturbative regime, where $T_{thermal}\ll T_E$. We also discuss the AdS-wave backgrounds, where some universal bounds can be obtained.

  9. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    Science.gov (United States)

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, a task which involves subjectivity, imprecision and fuzziness. First, the appropriate evaluation index is selected depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  10. Online Gradient Boosting

    OpenAIRE

    Beygelzimer, Alina; Hazan, Elad; Kale, Satyen; Luo, Haipeng

    2015-01-01

    We extend the theory of boosting for regression problems to the online learning setting. Generalizing from the batch setting for boosting, the notion of a weak learning algorithm is modeled as an online learning algorithm with linear loss functions that competes with a base class of regression functions, while a strong learning algorithm is an online learning algorithm with convex loss functions that competes with a larger class of regression functions. Our main result is an online gradient b...

  11. Under which conditions, additional monitoring data are worth gathering for improving decision making? Application of the VOI theory in the Bayesian Event Tree eruption forecasting framework

    Science.gov (United States)

    Loschetter, Annick; Rohmer, Jérémy

    2016-04-01

Standard and new-generation monitoring observations provide, in almost real time, important information about the evolution of a volcanic system. These observations are used to update the model and contribute to a better hazard assessment and to support decision making concerning potential evacuation. The framework BET_EF (based on a Bayesian Event Tree) developed by INGV enables dealing with the integration of information from monitoring with the prospect of decision making. Using this framework, the objectives of the present work are: i. to propose a method to assess the added value of information (within the Value Of Information (VOI) theory) from monitoring; ii. to perform sensitivity analysis on the different parameters that influence the VOI from monitoring. VOI consists in assessing the possible increase in expected value provided by gathering information, for instance through monitoring. Basically, the VOI is the difference between the value with information and the value without additional information in a Cost-Benefit approach. This theory is well suited to situations that can be represented in the form of a decision tree, such as the BET_EF tool. Reference values and ranges of variation (for sensitivity analysis) were defined for input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods for sensitivity analyses were implemented: local, global using Sobol' indices, and regional using Contribution to Sample Mean and Variance plots. The results (specific to the case considered) obtained with the different techniques are in good agreement and enable answering the following questions: i. Which characteristics of monitoring are important for early warning (reliability)? ii. How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the most influential parameters are the means rather than the variances for the case considered
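The core VOI calculation described above can be sketched for a two-branch evacuate/do-nothing decision. The cost structure and numbers are illustrative assumptions, not those of the MESIMEX exercise, and "with information" is taken in the limiting case of perfect monitoring:

```python
def expected_cost_no_info(p, evac_cost, loss):
    """Best expected cost acting on the prior alone: always evacuate, or never."""
    return min(evac_cost, p * loss)

def value_of_perfect_info(p, evac_cost, loss):
    """VOI = (expected cost without information) - (expected cost when monitoring
    reveals the eruption state before the decision is taken)."""
    with_info = p * min(evac_cost, loss)  # evacuate only when an eruption is coming
    return expected_cost_no_info(p, evac_cost, loss) - with_info
```

VOI is always non-negative, and comparing it with the cost of gathering the monitoring data answers the question in the record's title: additional data are worth gathering when VOI exceeds their cost.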

  12. Classification of Questions Consulted by Patients with a Decision-Making Tree

    Institute of Scientific and Technical Information of China (English)

    吴东东; 刘锋; 于鸿飞; 黄昊

    2016-01-01

Objective: To classify the questions asked by patients, identify the blind spots patients encounter during the care process, and solve the problems of ineffective, overlapping and overly complicated classification caused by classifying along medical-specialty lines. Methods: Using the CLS (Concept Learning System) algorithm, an initial decision tree was built and refined with test samples, finally yielding a classification decision tree and categories for consultation information. Results: The categories of patient consultation information derived from the decision tree correspond to the individual steps of the care process; the final categories cover all data samples, with no overlap between categories. Conclusion: The questions patients ask involve 27 steps of the care process. Hospitals need to target their explanations and education at the process details raised in patients' questions, and can publish the information involved on websites, apps, micro-sites and other information platforms for easy access by patients.
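The CLS algorithm referenced in this record grows a tree by choosing, at each node, the attribute that best separates the classes; its ID3 descendants formalise this choice with information gain. A minimal sketch of that criterion (illustrative; not the paper's implementation):

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting rows (list of dicts) on attribute attr."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

Selecting the attribute with the highest gain at each node, and recursing on the resulting subsets, yields a tree whose leaves are the consultation categories.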

  13. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain;

    2014-01-01

    system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost...

  14. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques using an Artificial Neural Network and a Decision Tree. Unlike existing works on fish classification, which propose descriptors without analyzing their individual impact on the whole classification task and without combining feature selection, image segmentation and geometrical parameters, we propose a general set of features extracted using robust feature selection, image segmentation and geometrical parameter techniques, together with their corresponding weights, to be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the classifier structure itself, we consider it as a black box and focus our research on determining which input information must be provided for robust fish discrimination. The main contribution of this paper is enhanced recognition and classification of fishes...

  15. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    Science.gov (United States)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), identifying zeolites from their mineral chemical data is a relatively more challenging and problematic process. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite, (Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O, and Erionite, (Na2,K2,Ca)2Al4Si14O36·15H2O), EDS data alone do not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM-based crystal morphology data, XRD spectra and the re-calculated cationic distribution obtained by EDS have been used for

  16. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

Full Text Available Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons, and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one with the risk classification for the whole of MG and another with the classification errors. The resulting map was 62.9% accurate.

  17. Design of a new hybrid artificial neural network method based on decision trees for calculating the Froude number in rigid rectangular channels

    Directory of Open Access Journals (Sweden)

    Ebtehaj Isa

    2016-09-01

Full Text Available A vital topic regarding the optimum and economical design of rigid boundary open channels such as sewers and drainage systems is determining the movement of sediment particles. In this study, the incipient motion of sediment is estimated using three datasets from the literature, covering a wide range of hydraulic parameters. Because existing equations do not consider the effect of sediment bed thickness on incipient motion estimation, this parameter is applied in this study along with the multilayer perceptron (MLP) and a hybrid method based on decision trees (MLP-DT) to estimate incipient motion. According to a comparison with the observed experimental outcome, the proposed method performs well (MARE = 0.048, RMSE = 0.134, SI = 0.06, BIAS = -0.036). The performance of MLP and MLP-DT is compared with that of existing regression-based equations, and significantly higher performance over existing models is observed. Finally, an explicit expression for practical engineering is also provided.
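The error measures quoted in this record (MARE, RMSE, SI, BIAS) have common definitions, sketched below; sign and normalisation conventions can differ between papers, so treat these as the usual forms rather than the authors' exact ones:

```python
import math

def mare(obs, pred):
    """Mean absolute relative error: mean of |obs - pred| / obs."""
    return sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def scatter_index(obs, pred):
    """Scatter index: RMSE normalised by the mean of the observations."""
    return rmse(obs, pred) / (sum(obs) / len(obs))

def bias(obs, pred):
    """Mean signed error, predicted minus observed."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)
```

A negative BIAS, as reported in the abstract, indicates the model underpredicts on average under this sign convention.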

  18. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    Science.gov (United States)

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised, using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time. PMID:24794073

  19. Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-10-01

We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects. The machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r~18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r~20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.

  20. Analytical solutions of linked fault tree probabilistic risk assessments using binary decision diagrams with emphasis on nuclear safety applications [Dissertation 17286]

    Energy Technology Data Exchange (ETDEWEB)

    Nusbaumer, O. P. M

    2007-07-01

This study is concerned with the quantification of Probabilistic Risk Assessment (PRA) using linked Fault Tree (FT) models. PRA of Nuclear Power Plants (NPPs) complements traditional deterministic analysis; it is widely recognized as a comprehensive and structured approach to identify accident scenarios and to derive numerical estimates of the associated risk levels. PRA models as found in the nuclear industry have evolved rapidly. Increasingly, they have been broadly applied to support numerous applications on various operational and regulatory matters. Regulatory bodies in many countries require that a PRA be performed for licensing purposes. PRA has reached the point where it can considerably influence the design and operation of nuclear power plants. However, most of the tools available for quantifying large PRA models are unable to produce analytically correct results. The algorithms of such quantifiers are designed to neglect sequences whose likelihood decreases below a predefined cutoff limit. In addition, the rare event approximation (e.g. Moivre's equation) is typically implemented to first order, ignoring the success paths and the possibility that two or more events can occur simultaneously. This is only justified in assessments where the probabilities of the basic events are low. When the events in question are failures, the first-order rare event approximation is always conservative, resulting in wrong interpretation of risk importance measures. Advanced NPP PRA models typically include human errors, common cause failure groups, and seismic and phenomenological basic events, where the failure probabilities may approach unity, leading to questionable results. It is accepted that current quantification tools have reached their limits, and that new quantification techniques should be investigated. A novel approach using the mathematical concept of the Binary Decision Diagram (BDD) is proposed to overcome these
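The conservatism described in this record is easy to see numerically: for independent basic events, the first-order rare-event approximation simply sums probabilities, while an exact evaluation of the union (which is what a BDD encoding delivers for arbitrary Boolean structures) does not overshoot. The sketch below only illustrates the numerical point for independent events; the BDD machinery itself is not implemented:

```python
from functools import reduce

def rare_event_union(probs):
    """First-order rare-event approximation: P(A1 or ... or An) ~ sum(p_i)."""
    return sum(probs)

def exact_union(probs):
    """Exact union probability for independent events: 1 - prod(1 - p_i)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)
```

With small probabilities the two agree closely; as probabilities approach unity (human errors, seismic events) the approximation can even exceed 1, which is the failure mode motivating exact BDD-based quantification.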

  1. Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods

    OpenAIRE

    Khan, Saqib Hussain

    2010-01-01

Data mining can be used in the healthcare industry to "mine" clinical data to discover hidden information for intelligent and effective decision making. Discovery of hidden patterns and relationships often goes unexploited, yet advanced data mining techniques can be a helpful remedy in this scenario. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms applied to predict chronic renal disease. Data from the databas...

  2. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    OpenAIRE

    Hedayetul Islam Shovon; Mahfuza Haque

    2012-01-01

Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually encroaches on their Grade Point Average (GPA) in a decisive manner. Student evaluation factors such as class quizzes, mid and final exams, assignments and lab work are studied. It is recommended that all these correlated information sho...

  3. Cost effectiveness of community-based therapeutic care for children with severe acute malnutrition in Zambia: decision tree model

    OpenAIRE

    Bachmann Max O

    2009-01-01

    Abstract Background Children aged under five years with severe acute malnutrition (SAM) in Africa and Asia have high mortality rates without effective treatment. Primary care-based treatment of SAM can have good outcomes but its cost effectiveness is largely unknown. Method This study estimated the cost effectiveness of community-based therapeutic care (CTC) for children with severe acute malnutrition in government primary health care centres in Lusaka, Zambia, compared to no care. A decision...

  4. A decision-tree model to detect post-calving diseases based on rumination, activity, milk yield, BW and voluntary visits to the milking robot.

    Science.gov (United States)

    Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I

    2016-09-01

    Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value. PMID:27221983
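The reported figures follow directly from confusion-matrix counts. The counts below are hypothetical round numbers chosen only to reproduce the stated 78% / 69% / 87% rates; they are not the study's actual herd data:

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of sick cows flagged by the model
    specificity = tn / (tn + fp)   # fraction of healthy cows left alone
    return accuracy, sensitivity, specificity

# hypothetical counts for illustration only
acc, sens, spec = confusion_metrics(tp=69, fn=31, tn=87, fp=13)
print(acc, sens, spec)  # → 0.78 0.69 0.87
```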

  5. Remote Sensing Data Binary Classification Using Boosting with Simple Classifiers

    Science.gov (United States)

    Nowakowski, Artur

    2015-10-01

    Boosting is a classification method which has been proven useful in non-satellite image processing, while it is still new to satellite remote sensing. It is a meta-algorithm which builds a strong classifier from many weak ones in an iterative way. We adapt the AdaBoost.M1 boosting algorithm to a new land cover classification scenario based on very simple threshold classifiers employing spectral and contextual information. Thresholds for the classifiers are calculated automatically and adaptively to the data statistics. The proposed method is employed for the exemplary problem of artificial area identification. Classification of IKONOS multispectral data results in short computational time and an overall accuracy of 94.4%, compared to 94.0% obtained by using AdaBoost.M1 with trees and 93.8% achieved using Random Forest. The influence of a manipulation of the final threshold of the strong classifier on classification results is reported.
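A minimal sketch of the idea the record describes: building a strong classifier iteratively from simple threshold ("stump") weak classifiers. This uses the standard binary AdaBoost weight update rather than the exact AdaBoost.M1 bookkeeping, and all data and parameters are illustrative:

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=10):
    """Binary AdaBoost with single-feature threshold classifiers ('stumps').
    X: (n, d) array, y: labels in {-1, +1}. Returns [(alpha, feat, thr, pol), ...]."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                       # example weights
    ensemble = []
    for _ in range(n_rounds):
        best = (np.inf, None)
        for j in range(d):                        # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if err < best[0]:
                        best = (err, (j, thr, pol, pred))
        err, (j, thr, pol, pred) = best
        if err >= 0.5:                            # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - max(err, 1e-10)) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def ensemble_predict(ensemble, X):
    """Weighted vote of the stumps (the 'strong' classifier)."""
    score = np.zeros(len(X))
    for alpha, j, thr, pol in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

Manipulating the final threshold of the strong classifier, as the record mentions, corresponds to replacing `score >= 0` with `score >= t` for some tunable t.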

  6. A best-first soft/hard decision tree searching MIMO decoder for a 4 × 4 64-QAM system

    KAUST Repository

    Shen, Chungan

    2012-08-01

    This paper presents the algorithm and VLSI architecture of a configurable tree-searching approach that combines the features of classical depth-first and breadth-first methods. Based on this approach, techniques to reduce complexity while providing both hard- and soft-output decoding are presented. Furthermore, a single programmable parameter allows the user to trade off throughput versus BER performance. The proposed multiple-input multiple-output decoder supports a 4 × 4 64-QAM system and was synthesized with 65-nm CMOS technology at a 333 MHz clock frequency. For the hard-output scheme the design achieves an average throughput of 257.8 Mbps at 24 dB signal-to-noise ratio (SNR) with an area equivalent to 54.2 Kgates and a power consumption of 7.26 mW. For the soft-output scheme it achieves an average throughput of 83.3 Mbps across the SNR range of interest with an area equivalent to 64 Kgates and a power consumption of 11.5 mW. © 2011 IEEE.

  7. Boost IORT in Breast Cancer: Body of Evidence

    OpenAIRE

    Felix Sedlmayer; Roland Reitsamer; Christoph Fussl; Ingrid Ziegler; Franz Zehentmayr; Heinz Deutschmann; Peter Kopp; Gerd Fastner

    2014-01-01

    The term IORT (intraoperative radiotherapy) is currently used for various techniques that show decisive differences in dose delivery. The largest body of evidence for boost IORT preceding whole breast irradiation (WBI) originates from intraoperative electron treatments with single doses around 10 Gy, which provide outstandingly low local recurrence rates in every risk constellation, also in long-term analyses. Compared to other boost methods, an intraoperative treatment has evident advantages as follows. Pr...

  8. A boost for KAON

    International Nuclear Information System (INIS)

    Earlier this year, a report by a specially-formed subcommittee of the US Nuclear Science Advisory Committee gave an important boost to the proposal to build a high intensity particle beam 'factory' at the Canadian TRIUMF laboratory in Vancouver. (orig./HSI).

  9. Tree sets

    OpenAIRE

    Diestel, Reinhard

    2015-01-01

    We study an abstract notion of tree structure which generalizes tree-decompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked to graph-theoretical trees, these `tree sets' can provide a suitable formalization of tree structure also for infinite graphs, matroids, or set partitions, as well as for other discrete structures, such as order trees. In this first of two papers we introduce tree sets, establish their relation to graph and order trees, and show h...

  10. Breast boost - why, how, when...?

    International Nuclear Information System (INIS)

    Background: Breast conservation management including tumorectomy or quadrantectomy and external beam radiotherapy with a dose of 45 to 50 Gy in the treatment of small breast carcinomas is generally accepted. The use of a radiation boost - in particular for specific subgroups - has not been clarified. With regard to the boost technique there is some controversy between groups emphasizing the value of electron boost treatment and groups pointing out the value of interstitial boost treatment. This controversy has become even more complicated as there is an increasing number of institutions reporting the use of HDR interstitial brachytherapy for boost treatment. The most critical issue with regard to interstitial HDR brachytherapy is the assumed serious long-term morbidity after a high single radiation dose as used in HDR-treatments. Methods and Results: This article gives a perspective and recommendations on some aspects of this issue (indication, timing, target volume, dose and dose rate). Conclusion: More information about the indication for a boost is to be expected from the EORTC trial 22881/10882. Careful selection of treatment procedures for specific subgroups of patients and refinement in surgical procedures and radiotherapy techniques may be useful in improving the clinical and cosmetic results in breast conservation therapy. Prospective trials comparing on the one hand different boost techniques and on the other hand particular morphologic criteria in treatments with boost and without boost are needed to give more detailed recommendations for boost indication and for boost techniques. (orig.)

  11. Diversity-Based Boosting Algorithm

    Directory of Open Access Journals (Sweden)

    Jafar A. Alzubi

    2016-05-01

    Boosting is a well known and efficient technique for constructing a classifier ensemble. An ensemble is built incrementally by altering the distribution of the training data set and forcing learners to focus on misclassification errors. In this paper, an improvement to the Boosting algorithm called DivBoosting is proposed and studied. Experiments on several data sets are conducted on both Boosting and DivBoosting. The experimental results show that DivBoosting is a promising method for ensemble pruning. We believe that it has many advantages over the traditional Boosting method because its mechanism is based not solely on selecting the most accurate base classifiers but also on selecting the most diverse set of classifiers.

  12. Boosting Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Elkin Eduardo García Díaz

    2006-11-01

    In this paper we present a binary classification algorithm based on Support Vector Machines, combined appropriately with a modified Boosting algorithm. It trains faster than the original SVM algorithm while preserving similar generalization characteristics, with a model of equal complexity but more compact representation.

  13. Analytic Boosted Boson Discrimination

    CERN Document Server

    Larkoski, Andrew J; Neill, Duff

    2015-01-01

    Observables which discriminate boosted topologies from massive QCD jets are of great importance for the success of the jet substructure program at the Large Hadron Collider. Such observables, while both widely and successfully used, have been studied almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders factorization theorem for a two-prong discriminant based on a jet shape variable, $D_2$, valid for both signal and background jets. Our factorization theorem simultaneously describes the production of both collinear and soft subjets, and we introduce a novel zero-bin procedure to correctly describe the transition region between these limits. By proving an all orders factorization theorem, we enable a systematically improvable description, and allow for precision comparisons between data, Monte Carlo, and first principles QCD calculations for jet substructure observables. Using our factorization theorem, we present numerical results for the discrimination of a boosted $Z...

  14. Analytic boosted boson discrimination

    OpenAIRE

    Andrew J. Larkoski; Moult, Ian; Neill, Duff

    2015-01-01

    Observables which discriminate boosted topologies from massive QCD jets are of great importance for the success of the jet substructure program at the Large Hadron Collider. Such observables, while both widely and successfully used, have been studied almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders factorization theorem for a two-prong discriminant based on a jet shape variable, $D_2$, valid for both signal and background jets. Our factorization t...

  15. SUSY using boosted techniques

    CERN Document Server

    Stark, Giordon; The ATLAS collaboration

    2016-01-01

    In this talk, I present a discussion of techniques used in supersymmetry searches in papers published by the ATLAS Collaboration from late Run 1 to early Run 2. The goal is to highlight concepts the analyses have in common, why/how they work, and possible SUSY searches that could benefit from boosted studies. Theoretical background will be provided for reference to encourage participants to explore in depth on their own time.

  16. Short-Time Fourier Transform and Decision Tree-Based Pattern Recognition for Gas Identification Using Temperature Modulated Microhotplate Gas Sensors

    Directory of Open Access Journals (Sweden)

    Aixiang He

    2016-01-01

    Because a sensor's response depends on its operating temperature, modulated temperature operation is usually applied in gas sensors for the identification of different gases. In this paper, modulated operating temperature for microhotplate gas sensors is combined with a feature extraction method based on the Short-Time Fourier Transform (STFT). Because the gas concentration in ambient air usually fluctuates strongly, STFT is applied to extract transient features in the time-frequency domain, and the relationship between the STFT spectrum and the sensor response is further explored. Owing to the low thermal time constant, sufficient discriminatory information about the different gases is preserved in the envelope of the response curve; feature information tends to be contained at the lower frequencies rather than at higher frequencies. Therefore, features are extracted from the STFT amplitude values at frequencies ranging from 0 Hz to the fundamental frequency, and these lower-frequency features are further processed by decision tree-based pattern recognition. The proposed method shows high classification capability in the analysis of different concentrations of carbon monoxide, methane, and ethanol.
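The feature-extraction step can be sketched as follows: window the sensor response, take a frame-by-frame FFT, and keep only the amplitudes of the lowest frequency bins (from 0 Hz upward). The window length, hop size, and number of retained bins below are illustrative assumptions:

```python
import numpy as np

def stft_low_freq_features(signal, win=64, hop=32, n_low=5):
    """Mean STFT amplitude of the n_low lowest frequency bins (0 Hz upward).
    win, hop, and n_low are illustrative choices, not the paper's settings."""
    frames = [signal[i:i + win] * np.hanning(win)           # windowed segments
              for i in range(0, len(signal) - win + 1, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=1))              # frame-by-frame spectrum
    return mags[:, :n_low].mean(axis=0)                     # keep only the low bins
```

The resulting low-frequency feature vectors would then be fed to a decision-tree classifier to discriminate the gases.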

  17. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    Directory of Open Access Journals (Sweden)

    Kai-Wei Chiang

    2015-12-01

    The hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of its sensors, the imprecise predictability of pedestrian motion, and the inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable initialization from any position. Finally, the map-aided FDT estimates navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The PDR system used for comparison demonstrates low stability in each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  18. The use of a decision tree based on the rabies diagnosis scenario, to assist the implementation of alternatives to laboratory animals.

    Science.gov (United States)

    Bones, Vanessa C; Molento, Carla Forte Maiolino

    2016-05-01

    Brazilian federal legislation makes the use of alternatives mandatory, when there are validated methods to replace the use of laboratory animals. The objective of this paper is to introduce a novel decision tree (DT)-based approach, which can be used to assist the replacement of laboratory animal procedures in Brazil. This project is based on a previous analysis of the rabies diagnosis scenario, in which we identified certain barriers that hinder replacement, such as: a) the perceived higher costs of alternative methods; b) the availability of staff qualified in these methods; c) resistance to change by laboratory staff; d) regulatory obstacles, including incompatibilities between the Federal Environmental Crimes Act and specific norms and working practices relating to the use of laboratory animals; and e) the lack of government incentives. The DT represents a highly promising means to overcome these reported barriers to the replacement of laboratory animal use in Brazil. It provides guidance to address the main obstacles, and, followed step-by-step, would lead to the implementation of validated alternative methods (VAMs), or their development when such alternatives do not exist. The DT appears suitable for application to laboratory animal use scenarios where alternative methods already exist, such as in the case of rabies diagnosis, and could contribute to increase compliance with the Three Rs principles in science and with the current legal requirements in Brazil. PMID:27256454

  19. Cost Effectiveness of Imiquimod 5% Cream Compared with Methyl Aminolevulinate-Based Photodynamic Therapy in the Treatment of Non-Hyperkeratotic, Non-Hypertrophic Actinic (Solar) Keratoses: A Decision Tree Model

    OpenAIRE

    Wilson, Edward C F

    2010-01-01

    Background: Actinic keratosis (AK) is caused by chronic exposure to UV radiation (sunlight). First-line treatments are cryosurgery, topical 5-fluorouracil (5-FU) and topical diclofenac. Where these are contraindicated or less appropriate, alternatives are imiquimod and photodynamic therapy (PDT). Objective: To compare the cost effectiveness of imiquimod and methyl aminolevulinate-based PDT (MAL-PDT) from the perspective of the UK NHS. Methods: A decision tree model was populated with data fro...

  20. Boost C++ application development cookbook

    CERN Document Server

    Polukhin, Antony

    2013-01-01

    This book follows a cookbook approach, with detailed and practical recipes that use Boost libraries. This book is great for developers new to Boost who are looking to improve their knowledge of Boost and see some undocumented details or tricks. It's assumed that you will have some experience in C++ already, as well as being familiar with the basics of the STL. A few chapters will require some previous knowledge of multithreading and networking. You are expected to have at least one good C++ compiler and a compiled version of Boost (1.53.0 or later is recommended), which will be used during the exer

  1. Gradient boosting machines, a tutorial.

    Science.gov (United States)

    Natekin, Alexey; Knoll, Alois

    2013-01-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on the machine learning aspects of modeling. The theoretical information is complemented with descriptive examples and illustrations which cover all the stages of gradient boosting model design. Considerations on handling model complexity are discussed. Three practical examples of gradient boosting applications are presented and comprehensively analyzed. PMID:24409142
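The core loop of gradient boosting with squared loss can be sketched in a few lines: each round fits a weak learner (here a one-split regression stump) to the current residuals, which are the negative gradient of the loss, and adds it with a small learning rate. This is an illustrative sketch, not the tutorial's code:

```python
import numpy as np

def fit_stump(X, r):
    """One-split regression stump minimizing squared error on residuals r."""
    best = (np.inf, None)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:       # keep both sides non-empty
            left = X[:, j] <= thr
            lm, rm = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
            if sse < best[0]:
                best = (sse, (j, thr, lm, rm))
    return best[1]

def stump_predict(stump, X):
    j, thr, lm, rm = stump
    return np.where(X[:, j] <= thr, lm, rm)

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    """Squared-loss gradient boosting: each round fits a stump to the residuals
    (the negative gradient) and adds it with a small learning rate."""
    base = y.mean()                               # initial constant model
    f = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        r = y - f                                 # residuals = -d/df of 1/2 (y-f)^2
        s = fit_stump(X, r)
        f += lr * stump_predict(s, X)
        stumps.append(s)
    return base, stumps

def gb_predict(base, stumps, X, lr=0.1):
    f = np.full(len(X), base)
    for s in stumps:
        f += lr * stump_predict(s, X)
    return f
```

Swapping the residual computation for the gradient of a different loss is exactly the customizability the tutorial emphasizes.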

  2. Gradient Boosting Machines, A Tutorial

    Directory of Open Access Journals (Sweden)

    Alexey eNatekin

    2013-12-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods. The theoretical information is complemented with many descriptive examples and illustrations which cover all the stages of gradient boosting model design. Considerations on handling model complexity are discussed. A set of practical examples of gradient boosting applications are presented and comprehensively analyzed.

  3. Analytic boosted boson discrimination

    Science.gov (United States)

    Larkoski, Andrew J.; Moult, Ian; Neill, Duff

    2016-05-01

    Observables which discriminate boosted topologies from massive QCD jets are of great importance for the success of the jet substructure program at the Large Hadron Collider. Such observables, while both widely and successfully used, have been studied almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders factorization theorem for a two-prong discriminant based on a jet shape variable, D 2, valid for both signal and background jets. Our factorization theorem simultaneously describes the production of both collinear and soft subjets, and we introduce a novel zero-bin procedure to correctly describe the transition region between these limits. By proving an all orders factorization theorem, we enable a systematically improvable description, and allow for precision comparisons between data, Monte Carlo, and first principles QCD calculations for jet substructure observables. Using our factorization theorem, we present numerical results for the discrimination of a boosted Z boson from massive QCD background jets. We compare our results with Monte Carlo predictions which allows for a detailed understanding of the extent to which these generators accurately describe the formation of two-prong QCD jets, and informs their usage in substructure analyses. Our calculation also provides considerable insight into the discrimination power and calculability of jet substructure observables in general.

  4. Application of an improved decision tree algorithm in the hot rolling process

    Institute of Scientific and Technical Information of China (English)

    钟蜜; 刘斌

    2011-01-01

    The decision tree is a very effective machine-learning method for classification, with high classification accuracy, good robustness to noisy data, and the advantage of producing a readable tree-shaped model. Optimizations of decision tree algorithms mainly address the criterion for selecting branch attributes, the pruning of the tree, and the introduction of fuzzy theory, rough set theory, genetic algorithms, and neural networks. This article uses the attribute-significance principle of rough set theory to optimize the decision tree: the significance of each condition attribute for classification is computed first, and the sample set is then filtered by significance, reducing the size of the tree without harming classification accuracy. The algorithm was implemented in the Visual C++ 6.0 programming environment and applied to a hot rolling model; processing of hot rolling data verified the validity of the algorithm.
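The rough-set attribute-significance computation that such methods build on can be sketched as follows: an attribute's significance is the drop in the dependency degree gamma (the fraction of examples in the positive region) when that attribute is removed. The toy decision table below is hypothetical:

```python
from collections import defaultdict

def positive_region_size(rows, cond_idx, dec_idx):
    """Size of the positive region: rows whose condition-attribute class is
    consistent on the decision attribute (classical rough-set dependency)."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[i] for i in cond_idx)].append(r[dec_idx])
    return sum(len(v) for v in groups.values() if len(set(v)) == 1)

def attribute_significance(rows, cond_idx, dec_idx):
    """sig(a) = gamma(C, D) - gamma(C \\ {a}, D) for each condition attribute a."""
    n = len(rows)
    full = positive_region_size(rows, cond_idx, dec_idx) / n
    return {a: full - positive_region_size(
                rows, [b for b in cond_idx if b != a], dec_idx) / n
            for a in cond_idx}
```

Attributes with near-zero significance are candidates for removal before tree construction, which is the size reduction the abstract describes.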

  5. Ultrarelativistic boost with scalar field

    Science.gov (United States)

    Svítek, O.; Tahamtan, T.

    2016-02-01

    We present the ultrarelativistic boost of the general global monopole solution which is parametrized by mass and deficit solid angle. The problem is addressed from two different perspectives. In the first one the primary object for performing the boost is the metric tensor while in the second one the energy momentum tensor is used. Since the solution is sourced by a triplet of scalar fields that effectively vanish in the boosting limit we investigate the behavior of a scalar field in a simpler setup. Namely, we perform the boosting study of the spherically symmetric solution with a free scalar field given by Janis, Newman and Winicour. The scalar field is again vanishing in the limit pointing to a broader pattern of scalar field behaviour during an ultrarelativistic boost in highly symmetric situations.

  6. Boosted Higgs channels

    International Nuclear Information System (INIS)

    In gluon fusion both a modified top Yukawa and new colored particles can alter the cross section. However in a large set of composite Higgs models and in realistic areas of the MSSM parameter space, these two effects can conspire and hide new physics in a Standard Model-like inclusive cross section. We first show that it is possible to break this degeneracy in the couplings by demanding a boosted Higgs recoiling against a high-pT jet. Subsequently we propose an analysis based on this idea in the H→2l+ET channels. This measurement allows an alternative determination of the important top Yukawa besides the t anti tH channel.

  7. Boosted Higgs shapes

    International Nuclear Information System (INIS)

    The inclusive Higgs production rate through gluon fusion has been measured to be in agreement with the Standard Model (SM). We show that even if the inclusive Higgs production rate is very SM-like, a precise determination of the boosted Higgs transverse momentum shape offers the opportunity to see effects of natural new physics. These measurements are generically motivated by effective field theory arguments and specifically in extensions of the SM with a natural weak scale, like composite Higgs models and natural supersymmetry. We show in detail how a measurement at high transverse momentum of H→2l+pT via H→ττ and H→WW* could be performed and demonstrate that it offers a compelling alternative to the t anti tH channel. We discuss the sensitivity to new physics in the most challenging scenario of an exactly SM-like inclusive Higgs cross-section.

  8. Biased Range Trees

    CERN Document Server

    Dujmovic, Vida; Morin, Pat

    2008-01-01

    A data structure, called a biased range tree, is presented that preprocesses a set S of n points in R^2 and a query distribution D for 2-sided orthogonal range counting queries. The expected query time for this data structure, when queries are drawn according to D, matches, to within a constant factor, that of the optimal decision tree for S and D. The memory and preprocessing requirements of the data structure are O(n log n).
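For concreteness, the 2-sided orthogonal range counting queries the abstract refers to are dominance counts. The naive presorted baseline below is only an illustration of the query semantics; the paper's biased range tree preprocesses both S and the query distribution D to answer the same queries in expected time matching the optimal decision tree:

```python
from bisect import bisect_right

def build(points):
    """Presort the points by x (then y). A biased range tree would build a
    distribution-aware tree here instead; this baseline just sorts."""
    return sorted(points)

def count(index, qx, qy):
    """2-sided orthogonal range count: points p with p.x <= qx and p.y <= qy."""
    hi = bisect_right(index, (qx, float("inf")))    # prefix with x <= qx
    return sum(1 for _, y in index[:hi] if y <= qy)  # filter the prefix on y
```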

  9. Detection of Illegitimate Emails using Boosting Algorithm

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    … spam email detection. For our desired task, we have applied a boosting technique. With the use of boosting we can achieve high accuracy with traditional classification algorithms, but one has to choose a suitable weak learner as well as the number of boosting iterations. In this paper, we propose a Naive Bayes classifier as a suitable weak learner for the boosting algorithm. It achieves maximum performance with very few boosting iterations.
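A sketch of the proposal: a Gaussian Naive Bayes learner extended to honor example weights so it can serve as the weak learner inside AdaBoost. The implementation details below (Gaussian likelihoods, the particular weight update, the toy data) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

class WeightedGaussianNB:
    """Gaussian Naive Bayes supporting example weights, so it can act as an
    AdaBoost weak learner (illustrative sketch)."""
    def fit(self, X, y, w):
        self.classes = np.unique(y)
        self.stats = {}
        for c in self.classes:
            m = y == c
            wc = w[m] / w[m].sum()                          # normalized class weights
            mu = (wc[:, None] * X[m]).sum(axis=0)
            var = (wc[:, None] * (X[m] - mu) ** 2).sum(axis=0) + 1e-9
            self.stats[c] = (w[m].sum(), mu, var)           # weighted prior, mean, var
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            prior, mu, var = self.stats[c]
            ll = -0.5 * (((X - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=1)
            scores.append(np.log(prior) + ll)
        return self.classes[np.argmax(scores, axis=0)]

def boost_nb(X, y, n_rounds=5):
    """AdaBoost with the Naive Bayes weak learner; y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    models = []
    for _ in range(n_rounds):
        clf = WeightedGaussianNB().fit(X, y, w)
        pred = clf.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5 or err == 0:                          # perfect or useless learner
            models.append((1.0, clf))
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(np.where(pred == y, -alpha, alpha))     # up-weight mistakes
        w /= w.sum()
        models.append((alpha, clf))
    return models

def boost_predict(models, X):
    s = sum(a * m.predict(X) for a, m in models)
    return np.where(s >= 0, 1, -1)
```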

  10. The potential impact of improving appropriate treatment for fever on malaria and non-malarial febrile illness management in under-5s: a decision-tree modelling approach.

    Directory of Open Access Journals (Sweden)

    V Bhargavi Rao

    BACKGROUND: As international funding for malaria programmes plateaus, limited resources must be rationally managed for malaria and non-malarial febrile illnesses (NMFI). Given widespread unnecessary treatment of NMFI with first-line antimalarial Artemisinin Combination Therapies (ACTs), our aim was to estimate the effect of health-systems factors on rates of appropriate treatment for fever and on use of ACTs. METHODS: A decision-tree tool was developed to investigate the impact of improving aspects of the fever care-pathway, and also to evaluate the impact in Tanzania of the revised WHO malaria guidelines advocating diagnostic-led management. RESULTS: Model outputs using baseline parameters suggest 49% of malaria cases attending a clinic would receive ACTs (95% Uncertainty Interval: 40.6-59.2%), but that 44% (95% UI: 35-54.8%) of NMFI cases would also receive ACTs. Provision of 100% ACT stock predicted a 28.9% increase in malaria cases treated with ACTs, but also an increase in overtreatment of NMFI, with 70% of NMFI cases (95% UI: 56.4-79.2%) projected to receive ACTs, and thus an overall 13% reduction (95% UI: 5-21.6%) in correctly managed febrile cases. Modelling increased availability or use of diagnostics had little effect on malaria management outputs, but may significantly reduce NMFI overtreatment. The model predicts the early rollout of revised WHO guidelines in Tanzania may have led to a 35% decrease (95% UI: 31.2-39.8%) in NMFI overtreatment, but also a 19.5% reduction (95% UI: 11-27.2%) in malaria cases receiving ACTs, due to a potential fourfold decrease in cases that were untested or tested false-negative (42.5% vs. 8.9%) and so untreated. DISCUSSION: Modelling multi-pronged intervention strategies proved most effective to improve malaria treatment without increasing NMFI overtreatment. As malaria transmission declines, health system interventions must be guided by whether the management priority is an increase in malaria cases receiving ACTs (reducing the
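A decision-tree model of the kind described evaluates expected outcomes by weighting each branch by its probability. Below is a minimal sketch with a hypothetical branch structure; the 49% figure is taken from the abstract, while the outcome values are placeholders:

```python
def expected_value(node):
    """Evaluate a decision-analytic tree: a leaf is a numeric outcome value, a
    chance node is a list of (probability, subtree) branches (hypothetical form)."""
    if isinstance(node, (int, float)):
        return node
    return sum(p * expected_value(child) for p, child in node)

# hypothetical branch: 49% of malaria cases receive ACTs (value 1) vs. not (value 0)
print(expected_value([(0.49, 1.0), (0.51, 0.0)]))  # → 0.49
```

Nesting chance nodes (stock-outs, test availability, test accuracy) along the care-pathway yields the kind of model outputs reported above.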

  11. Research on Decision Trees for Food Safety Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    鄂旭; 任骏原; 毕嘉娜; 沈德海

    2014-01-01

    Food safety decision making is an important part of food safety research. Based on the variable precision rough set model, a method of building decision trees with rules that have a definite confidence is proposed for food safety analysis, improving on traditional decision tree induction. The new algorithm uses variable precision weighted mean roughness as the criterion for attribute selection, replacing approximate accuracy with variable precision approximate accuracy. Noisy data in the training sets are thereby accounted for: noisy and redundant data can be eliminated, and limited inconsistency is allowed among examples in the positive regions, so conflicting decision rules can be tolerated during tree construction. The decision tree is thus simplified, its generalization ability is improved, and it becomes more comprehensible. Experiments show that the algorithm is feasible and effective.

  12. An efficient-cutting packet classification algorithm based on a statistical decision tree

    Institute of Scientific and Technical Information of China (English)

    陈立南; 刘阳; 马严; 黄小红; 赵庆聪; 魏伟

    2014-01-01

    Packet classification algorithms based on decision trees are easy to implement and widely employed in high-speed packet classification. The primary objective in constructing a decision tree is to minimize storage and search time complexity. An improved decision-tree algorithm, HyperEC, is proposed based on statistics and evaluation of filter sets. HyperEC is a multi-dimensional packet classification algorithm that allows a tradeoff between storage and throughput while constructing the decision tree. Because it is not sensitive to IP address length, it is suitable for IPv6 packet classification as well as IPv4. The algorithm applies a natural and performance-guided decision-making process: the storage budget is preset, and the best throughput is then achieved. The results show that HyperEC outperforms the HiCuts and HyperCuts algorithms, improving storage and throughput performance, and scales to large filter sets.

  13. Physics with boosted top quarks

    CERN Document Server

    Kuutmann, Elin Bergeaas

    2014-01-01

    The production at the LHC of boosted top quarks (top quarks with a transverse momentum that greatly exceeds their rest mass) is a promising process to search for phenomena beyond the Standard Model. In this contribution several examples are discussed of new techniques to reconstruct and identify (tag) the collimated decay topology of the boosted hadronic decays of top quarks. Boosted top reconstruction techniques have been utilized in searches for new physical phenomena. An overview is given of searches by ATLAS, CDF and CMS for heavy new particles decaying into a top and an anti-top quark, vector-like quarks and supersymmetric partners to the top quark.

  14. Application of Decision Tree Methods to Sales Volume Forecasting for Group-Purchase Goods

    Institute of Scientific and Technical Information of China (English)

    费斐; 叶枫

    2013-01-01

    Online group purchase is a shopping mode in which consumers who do not know each other buy the same product on the same website within a limited time window in order to obtain the best price. Group-purchase websites, acting as the platform, face a large volume of products submitted for listing, and the review process requires substantial manpower and relies heavily on reviewer experience. This paper applies a decision tree algorithm to analyze the variables that influence the sales volume of group-purchase goods and generates a readable decision tree that supports the review decision and helps select high-quality products.

  15. Planting Trees

    OpenAIRE

    Relf, Diane

    2009-01-01

    The key aspects in planning a tree planting are determining the function of the tree, the site conditions, whether the tree is suited to the site conditions and space, and whether you are better served by a container-grown tree. After the tree is planted according to the prescribed steps, you must irrigate as needed and mulch the root zone area.

  16. Application of decision tree and logistic regression to the prediction of health literacy in hypertension patients

    Institute of Scientific and Technical Information of China (English)

    李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬

    2012-01-01

    Objective: To study and evaluate the feasibility and accuracy of decision tree methods and logistic regression for predicting the health literacy of hypertension patients. Methods: Two health literacy prediction models were built, one with logistic regression and one with a decision tree (Answer Tree software); the receiver operating characteristic (ROC) curve was used to compare the two models. Results: The sensitivity (82.5%) and Youden index (50.9%) of the logistic regression model were higher than those of the decision tree model (77.9% and 48.0%); the specificity of the decision tree model (70.1%) was higher than that of the logistic regression model (68.4%), and its misclassification rate (29.9%) was lower (vs. 31.6%). The areas under the ROC curves of the two models were comparable (0.813 vs. 0.847). Conclusion: The decision tree prediction model performs about as well as the logistic regression model; a health literacy screening strategy for hypertension patients can be derived from the decision tree, indicating that data mining methods are feasible for predicting health literacy in chronic disease management.
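The screening metrics this record compares (sensitivity, specificity, Youden index, misclassification rate) all derive from a 2x2 confusion table. A minimal sketch, with made-up counts chosen only for illustration:

```python
# Computing the diagnostic metrics reported in the record from confusion
# counts: tp/fn/tn/fp. The example counts are illustrative, not the study's.

def screening_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, Youden index, and error rate."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    youden = sensitivity + specificity - 1    # Youden's J statistic
    error_rate = (fp + fn) / (tp + fn + tn + fp)
    return sensitivity, specificity, youden, error_rate

# Hypothetical confusion table: 165 of 200 positives detected,
# 140 of 200 negatives correctly ruled out.
sens, spec, youden, err = screening_metrics(tp=165, fn=35, tn=140, fp=60)
```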

  17. Fuzzy Decision Tree Model for Driver Behavior Confronting Yellow Signal at Signalized Intersection

    Institute of Scientific and Technical Information of China (English)

    龙科军; 赵文秀; 肖向良

    2011-01-01

    A driver's decision to go or stop during the yellow interval is a decision under uncertainty. This paper collects driver behavior data at four similar signalized intersections and applies a fuzzy decision tree (FDT) to model driver behavior. Taking vehicle position, speed, and the signal countdown timer as influencing factors, membership functions are set for each, and the FDT model is constructed with the fuzzy ID3 (FID3) algorithm using fuzzy information entropy as the heuristic; decision rules are generated as well. Testing the model on a hold-out sample shows that the FDT predicts drivers' decisions with an overall accuracy of 84.8%.
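Two ingredients of fuzzy ID3 as described above can be sketched compactly: a membership function fuzzifies a continuous attribute, and fuzzy information entropy (class probabilities weighted by membership degrees) scores a candidate linguistic term. The membership breakpoints, speeds, and labels below are illustrative assumptions, not values from the paper.

```python
# Sketch of FID3 ingredients: triangular fuzzification plus fuzzy entropy.
import math

def triangular(x, a, b, c):
    """Triangular membership function peaking at b over support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_entropy(memberships, labels):
    """Entropy of class labels weighted by fuzzy membership degrees."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = sum(m for m, y in zip(memberships, labels) if y == cls) / total
        if p > 0:
            h -= p * math.log2(p)
    return h

# Fuzzify speeds into a hypothetical "fast" term and score go/stop labels.
speeds = [30, 45, 60, 70, 55]               # km/h, made-up observations
labels = ["stop", "stop", "go", "go", "go"]
fast = [triangular(v, 40, 70, 100) for v in speeds]
h = fuzzy_entropy(fast, labels)
```

FID3 would compute such entropies for every linguistic term of every attribute and branch on the attribute giving the largest drop in fuzzy entropy.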

  18. Research and Application of Bank Customer Relationship Management Based on the Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    李明辉

    2012-01-01

    Decision tree algorithms from data mining are of considerable value to the banking industry. Applied to banking, decision tree techniques can analyze a customer's background information and predict which category the customer belongs to, so that an appropriate business strategy can be adopted. This both improves the bank's level of service and helps develop customer resources and avoid customer churn, while conserving resources so that a minimal investment yields a larger return. In the lending business, judging whether a borrower poses a risk, whether a loan proposal is feasible, and how to classify customers according to the bank's actual needs are all problems that can be addressed with decision tree algorithms.

  19. Rosacea Might Boost Parkinson's Risk

    Science.gov (United States)

    Rosacea may be linked to an increased risk for Parkinson's disease, a large new study suggests. The research found an association, but did not prove cause and effect.

  20. Gradient boosting machines, a tutorial

    OpenAIRE

    Natekin, Alexey; Knoll, Alois

    2013-01-01

    Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, like being learned with respect to different loss functions. This article gives a tutorial introduction into the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling. The theoretical information is complemented with many descriptive examples and illustrations.
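The core loop that tutorials like this one describe can be sketched in a few lines for squared-error regression: each stage fits a weak learner (here a one-split "stump") to the current residuals, which are the negative gradient of the loss, and adds it with a small learning rate. Everything below is an illustrative toy, not the article's implementation.

```python
# Minimal gradient boosting for squared-error regression with stump learners.

def fit_stump(x, r):
    """Best single-threshold regression stump for residuals r (squared error)."""
    best = None
    for t in sorted(set(x))[:-1]:         # last value would leave right empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lmean) ** 2 for ri in left)
               + sum((ri - rmean) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def gradient_boost(x, y, n_stages=50, lr=0.1):
    """Stagewise fitting of stumps to residuals (the negative gradient)."""
    f0 = sum(y) / len(y)                  # initial constant model
    stumps, pred = [], [f0] * len(x)
    for _ in range(n_stages):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + lr * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]   # a step function plus jitter
model = gradient_boost(x, y)
```

The learning rate `lr` is the shrinkage parameter such tutorials emphasize: smaller values need more stages but generalize better.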

  2. Machine learning approximation techniques using dual trees

    OpenAIRE

    Ergashbaev, Denis

    2015-01-01

    This master thesis explores a dual-tree framework as applied to a particular class of machine learning problems that are collectively referred to as generalized n-body problems. It builds a new algorithm on top of it and improves the existing Boosted OGE classifier.

  3. Value tree analysis

    International Nuclear Information System (INIS)

    What are the targets and criteria on which national energy policy should be based? What priorities should be set, and how can different social interests be matched? To answer these questions, a new instrument of decision theory is presented which has been applied with good results to controversial political issues in the USA. The new technique is known under the name of value tree analysis. Members of important West German organisations (BDI, VDI, RWE, the Catholic and Protestant Church, Deutscher Naturschutzring, and ecological research institutions) were asked about the goals of their organisations. These goals were then ordered systematically and arranged in a hierarchical tree structure. The value trees of different groups can be combined into a catalogue of social criteria of acceptability and policy assessment. The authors describe the philosophy and methodology of value tree analysis and give an outline of its application in the development of a socially acceptable energy policy. (orig.)

  4. The relationships of modern pollen spectra to vegetation and climate along a steppe-forest-tundra transition in southern Siberia, explored by decision trees

    Czech Academy of Sciences Publication Activity Database

    Pelánková, Barbora; Kuneš, P.; Chytrý, M.; Jankovská, Vlasta; Ermakov, N.; Svitavská-Svobodová, Helena

    2008-01-01

    Roč. 18, č. 8 (2008), s. 1259-1271. ISSN 0959-6836 R&D Projects: GA AV ČR IAA6163303 Institutional research plan: CEZ:AV0Z60050516 Keywords : Classification and regression trees * pollen/ vegetation relationship * surface pollen samples Subject RIV: EF - Botanics Impact factor: 2.167, year: 2008

  5. Empirically Derived Dehydration Scoring and Decision Tree Models for Children With Diarrhea: Assessment and Internal Validation in a Prospective Cohort Study in Dhaka, Bangladesh

    OpenAIRE

    Levine, Adam C.; Glavis-Bloom, Justin; Modi, Payal; Nasrin, Sabiha; Rege, Soham; Chu, Chieh; Schmid, Christopher H.; Alam, Nur H

    2015-01-01

    The DHAKA Dehydration Score and the DHAKA Dehydration Tree are the first empirically derived and internally validated diagnostic models for assessing dehydration in children with acute diarrhea for use by general practice nurses in a resource-limited setting. Frontline providers can use these new tools to better classify and manage dehydration in children.

  6. Extensions and applications of ensemble-of-trees methods in machine learning

    Science.gov (United States)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  7. Boost.Asio C++ network programming

    CERN Document Server

    Torjo, John

    2013-01-01

    What you want is an easy level of abstraction, which is just what this book provides in conjunction with Boost.Asio. Switching to Boost.Asio is just a few extra #include directives away, with the help of this practical and engaging guide.This book is great for developers that need to do network programming, who don't want to delve into the complicated issues of a raw networking API. You should be familiar with core Boost concepts, such as smart pointers and shared_from_this, resource classes (noncopyable), functors and boost::bind, boost mutexes, and the boost date/time library. Readers should

  8. Odds-On Trees

    CERN Document Server

    Bose, Prosenjit; Douieb, Karim; Dujmovic, Vida; King, James; Morin, Pat

    2010-01-01

    Let P : R^d -> A be a query problem over R^d for which there exists a data structure S that can compute P(q) in O(log n) time for any query point q in R^d. Let D be a probability measure over R^d representing a distribution of queries. We describe a data structure called the odds-on tree, of size O(n^ε), that can be used as a filter that quickly computes P(q) for some query values q in R^d and relies on S for the remaining queries. With an odds-on tree, the expected query time for a point drawn according to D is O(H*+1), where H* is a lower bound on the expected cost of any linear decision tree that solves P. Odds-on trees have a number of applications, including distribution-sensitive data structures for point location in 2-d, point-in-polytope testing in d dimensions, ray shooting in simple polygons, ray shooting in polytopes, nearest-neighbour queries in R^d, point location in arrangements of hyperplanes in R^d, and many other geometric searching problems that can be solved in the linear-decision-tree mo...

  9. Boosted Horizon of a Boosted Space-Time Geometry

    CERN Document Server

    Battista, Emmanuele; Scudellaro, Paolo; Tramontano, Francesco

    2015-01-01

    We apply the ultrarelativistic boosting procedure to map the metric of Schwarzschild-de Sitter spacetime into a metric describing de Sitter spacetime plus a shock-wave singularity located on a null hypersurface, by exploiting the picture of the embedding of a hyperboloid in a five-dimensional Minkowski spacetime. After reverting to the usual four-dimensional formalism, we also solve the geodesic equation and evaluate the Riemann curvature tensor of the boosted Schwarzschild-de Sitter metric by means of numerical calculations, which make it possible to reach the ultrarelativistic regime gradually by letting the boost velocity approach the speed of light. Eventually, the analysis of the Kretschmann invariant (and of the geodesic equation) shows the global structure of space-time, as we demonstrate the presence of a "scalar curvature singularity" within a 3-sphere and find that it is also possible to define what we have called a "boosted horizon", a sort of elastic wall where all particles are surprisingly pushe...

  10. Top-down induction of clustering trees

    OpenAIRE

    Blockeel, Hendrik; De Raedt, Luc; Ramon, Jan

    2000-01-01

    An approach to clustering is presented that adapts the basic top-down induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first order clustering. The TIC system employs the first order logical decision tree representation of the inductive logic programming system Tilde. Various experiments with TIC are presented, in both ...

  11. A Cost-Sensitive Decision Tree Learning Model: An Application to Customer-Value Based Segmentation

    Institute of Scientific and Technical Information of China (English)

    邹鹏; 莫佳卉; 江亦华; 叶强

    2011-01-01

    The objective of this research is to extend the current decision tree learning model to handle data sets with unequal misclassification costs. Because of the cost differences between error types and the imbalanced distribution of customers of different value, data mining methods that optimize overall accuracy cannot reflect the effect of customer value on classification quality. The proposed model builds a customer-value-based misclassification cost matrix, uses cost minimization as the splitting criterion of the decision tree, and evaluates classifiers by the expected loss of classification. The issue of asymmetric misclassification costs is explored through an application to customer-value based segmentation using empirical data, including customer satisfaction survey attributes and credit card transaction history, collected from one of the largest credit card issuing banks in China. The results show that the proposed cost-sensitive decision tree is more effective for customer-value based segmentation than the original decision tree learning model: it controls cost sensitivity and the distribution of the different error types more precisely and lowers the overall misclassification cost.
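The cost-sensitive decision rule this record relies on can be sketched in a few lines: instead of predicting the most probable class, predict the class with the lowest expected misclassification cost under a cost matrix. The cost values and class semantics below are illustrative assumptions, not the paper's matrix.

```python
# Expected-cost classification: cost[i][j] is the cost of predicting
# class i when the true class is j.

def expected_costs(probs, cost):
    """Expected cost of each possible prediction given class posteriors."""
    return [sum(row[j] * probs[j] for j in range(len(probs))) for row in cost]

def min_cost_class(probs, cost):
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)

# Two classes: 0 = low-value customer, 1 = high-value customer.
# Misclassifying a high-value customer as low-value is assumed 5x as costly.
cost = [[0, 5],   # predict 0: free if true 0, costly if truly high-value
        [1, 0]]   # predict 1: small cost if true 0
probs = [0.8, 0.2]  # posterior from a decision tree leaf
label = min_cost_class(probs, cost)
```

Here the leaf favours class 0 by probability, but the expected cost of predicting 0 is 5 * 0.2 = 1.0 versus 1 * 0.8 = 0.8 for predicting 1, so the cost-sensitive rule flips the decision.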

  12. Can you boost your metabolism?

    Science.gov (United States)

    Muscle burns more calories than fat, so won't building more muscle boost your metabolism? Yes, but only by a small amount; when not in active use, muscles burn very few calories. What to do: lift weights for stronger bones and muscles.

  13. Boosting Applied to Word Sense Disambiguation

    OpenAIRE

    Escudero, Gerard; Marquez, Lluis; Rigau, German

    2000-01-01

    In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are s...
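The record applies the multi-label variant AdaBoost.MH; the essence of boosting it builds on is easier to see in the binary AdaBoost loop, sketched below with one-feature threshold stumps as weak learners. The data set is a made-up toy, not a WSD corpus.

```python
# Binary AdaBoost with threshold stumps (illustrative toy).
import math

def best_stump(xs, ys, w):
    """Weak learner: threshold t and polarity minimizing weighted error."""
    best = (float("inf"), None, None)
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol if xi <= t else -pol) != yi)
            if err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, rounds):
    w = [1.0 / len(xs)] * len(xs)         # uniform initial example weights
    ensemble = []
    for _ in range(rounds):
        err, t, pol = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Reweight: misclassified examples gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * (pol if xi <= t else -pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    def predict(x):
        return 1 if sum(a * (p if x <= t else -p)
                        for a, t, p in ensemble) >= 0 else -1
    return predict

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, 1, 1, 1, -1, -1]   # a "bump" no single stump can separate
clf = adaboost(xs, ys, rounds=3)
```

Three rounds suffice on this toy because a weighted combination of three stumps can carve out the middle interval that no single stump can.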

  14. Class Evolution Tree: A Graphical Tool to Support Decisions on the Number of Classes in Exploratory Categorical Latent Variable Modeling for Rehabilitation Research

    Science.gov (United States)

    Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa

    2011-01-01

    The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…

  15. Application of decision tree analysis to influencing factors of life pressure in college students

    Institute of Scientific and Technical Information of China (English)

    陈新林; 包生耿; 颜伟红; 王小广; 万建成; 吴丹桂

    2013-01-01

    Objective: To understand the distribution and influencing factors of life pressure among college students in Guangzhou and provide a scientific basis for mental health education. Methods: Students at five colleges were surveyed with the Youth Life Event Scale and a demographic questionnaire. A logistic regression model (forward variable selection, SPSS 13.0) was built to explore influencing factors of the total pressure score, and decision trees for the score were built with the C5.0 algorithm in Clementine and the CHAID algorithm in Answer Tree. Results: The influencing factors of students' life pressure included economic conditions, interpersonal relationships, the number of children in the family, and part-time work. The C5.0 tree branched on interpersonal relationships, economic conditions, and the number of children in the family; the CHAID tree branched on economic conditions, interpersonal relationships, the number of children in the family, and part-time work. Students with both poor economic conditions and poor interpersonal relationships had the highest proportion of life pressure (68.84%). Conclusions: Mental health education and guidance should be tailored to the characteristics of different subgroups, with particular attention to students with poor interpersonal relationships, poor economic conditions, or who are only children.

  17. MODELLING AND IMPLEMENTATION OF DECISION TREE-BASED CONSUMPTION BEHAVIOUR FACTORS

    Institute of Scientific and Technical Information of China (English)

    黎旭; 李国和; 吴卫江; 洪云峰; 刘智渊; 程远

    2015-01-01

    The analysis of consumption behaviour factors plays an important guiding role in the production and sales of products. To model and analyse consumption behaviour from consumers' transaction data, the data are first formalized into consumer transaction data sets and transaction statistics. The information gain ratio is then defined over the consumer transaction data sets to reflect the classification power of each consumption factor. On the basis of the C4.5 algorithm, binary splitting is extended to multiway splitting, continuous attributes (factors) are discretized, and a decision tree is constructed. Each branch of the decision tree forms a decision rule reflecting the dependencies among a consumer's consumption factors, and the statistical information attached to each rule expresses its uncertainty. A consumption behaviour modelling and analysis system was implemented as a Web application with Oracle as the database; the system achieves high accuracy in consumption behaviour model analysis and is efficient and user-friendly.
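The C4.5 split criterion the record builds on, the information gain ratio, is information gain divided by split information. A minimal sketch for a categorical attribute; the tiny data set (a hypothetical "income" factor against a purchase label) is purely illustrative.

```python
# Information gain ratio as used by C4.5 to score candidate splits.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    split_info = entropy(values)          # penalizes many-valued attributes
    return gain / split_info if split_info > 0 else 0.0

income = ["low", "low", "high", "high", "high", "low"]
bought = ["no", "no", "yes", "yes", "no", "no"]
gr = gain_ratio(income, bought)
```

Dividing by the split information is what keeps C4.5 from preferring attributes with many distinct values, which plain information gain (ID3) tends to do.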

  18. Deep Web query interface identification based on decision tree and link similarity

    Institute of Scientific and Technical Information of China (English)

    李雪玲; 施化吉; 兰均; 李星毅

    2011-01-01

    Existing Deep Web query interface identification methods produce many false positives and cannot effectively distinguish search engine interfaces. This paper proposes an identification method based on a decision tree and link similarity. The method selects important attributes by information gain ratio and builds a decision tree to pre-classify interface forms, identifying most interfaces with distinct features; a second, link-similarity-based step then re-examines the unidentified interfaces, accurately recognizing true query interfaces and excluding search engine interfaces. Experimental results show that the method effectively distinguishes search engine interfaces and improves classification precision and recall over traditional methods.

  19. EVFDT: An Enhanced Very Fast Decision Tree Algorithm for Detecting Distributed Denial of Service Attack in Cloud-Assisted Wireless Body Area Network

    OpenAIRE

    Rabia Latif; Haider Abbas; Seemab Latif; Ashraf Masood

    2015-01-01

    Due to the scattered nature of DDoS attacks and advancement of new technologies such as cloud-assisted WBAN, it becomes challenging to detect malicious activities by relying on conventional security mechanisms. The detection of such attacks demands an adaptive and incremental learning classifier capable of accurate decision making with less computation. Hence, the DDoS attack detection using existing machine learning techniques requires full data set to be stored in the memory and are not app...
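The incremental classifier family this record extends, Very Fast Decision Trees (VFDT), decides when enough streaming examples have been seen to commit to a split using the Hoeffding bound: with probability 1 - delta, the observed mean of a random variable with range R lies within epsilon of its true mean. A small sketch of that bound; the delta and sample sizes are illustrative.

```python
# Hoeffding bound used by VFDT-style streaming decision trees.
import math

def hoeffding_epsilon(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for n observations."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

# For an information-gain criterion over c classes, R = log2(c).
r = math.log2(2)                  # two classes, e.g. attack / normal traffic
eps_small = hoeffding_epsilon(r, delta=1e-7, n=200)
eps_large = hoeffding_epsilon(r, delta=1e-7, n=20000)
# A node splits once the gain difference between the two best attributes
# exceeds epsilon; seeing more examples shrinks epsilon.
```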

  20. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Bøcher, Peder Klith; Greve, Mette Balslev;

    2010-01-01

    Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field measurements in hydromorphic landscapes of the Danish area chosen, and to predict the spatial distribution of hydromorphic organic landscapes in unsampled area in Denmark. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding ...

  1. A Categorization Approach Based on an Adapted Decision Tree Algorithm for Web Database Query Results

    Institute of Scientific and Technical Information of China (English)

    孟祥福; 马宗民; 张霄雁; 王星

    2012-01-01

    To deal with the problem that too many results are returned from a Web database in response to a user query, this paper proposes an approach based on an adapted decision tree algorithm for automatically categorizing Web database query results. The query history of all users in the system is analyzed offline, and semantically similar queries are merged into the same cluster. A set of tuple clusters over the original data is then generated according to the query clusters, each tuple cluster corresponding to one type of user preference. When a query arrives, a labeled and leveled categorization tree is constructed over the query results with the adapted decision tree algorithm, based on the tuple clusters generated offline, enabling the user to easily select and locate the needed information by inspecting the labels. Experimental results demonstrate that the approach has lower navigation cost and better categorization effectiveness, and can effectively meet the personalized query needs of different types of users.

  2. Boosting Infrastructure Investments in Africa

    OpenAIRE

    Donald Kaberuka

    2011-01-01

    The absolute and relative lack of infrastructure in Africa suggests that the continent’s competitiveness could be boosted by scaling up investments in infrastructure. Such investments would facilitate domestic and international trade, enhance Africa’s integration into the global economy and promote better human development outcomes, especially, by bringing unconnected rural communities into the mainstream economy. While there are yawning gaps in all infrastructure subsectors, inadequate e...

  3. Holy Trees

    OpenAIRE

    Elosua, Miguel

    2013-01-01

    Puxi's streets are lined with plane trees, especially in the former French Concession (and particularly in the Luwan and Xuhui districts). There are a few different varieties of plane tree, but the one found in Shanghai is the hybrid Platanus hispanica. In China they are called French plane trees (faguo wutong - 法国梧桐), for they were first planted along the Avenue Joffre (now Huai Hai lu - 淮海路) in 1902 by the French. Their life span is long, over a thousand years, and they may grow as high as ...

  4. (In)direct detection of boosted dark matter

    International Nuclear Information System (INIS)

    We initiate the study of novel thermal dark matter (DM) scenarios where present-day annihilation of DM in the galactic center produces boosted stable particles in the dark sector. These stable particles are typically a subdominant DM component, but because they are produced with a large Lorentz boost in this process, they can be detected in large volume terrestrial experiments via neutral-current-like interactions with electrons or nuclei. This novel DM signal thus combines the production mechanism associated with indirect detection experiments (i.e. galactic DM annihilation) with the detection mechanism associated with direct detection experiments (i.e. DM scattering off terrestrial targets). Such processes are generically present in multi-component DM scenarios or those with non-minimal DM stabilization symmetries. As a proof of concept, we present a model of two-component thermal relic DM, where the dominant heavy DM species has no tree-level interactions with the standard model and thus largely evades direct and indirect DM bounds. Instead, its thermal relic abundance is set by annihilation into a subdominant lighter DM species, and the latter can be detected in the boosted channel via the same annihilation process occurring today. Especially for dark sector masses in the 10 MeV–10 GeV range, the most promising signals are electron scattering events pointing toward the galactic center. These can be detected in experiments designed for neutrino physics or proton decay, in particular Super-K and its upgrade Hyper-K, as well as the PINGU/MICA extensions of IceCube. This boosted DM phenomenon highlights the distinctive signatures possible from non-minimal dark sectors

  5. 基于决策树分类的云南省迪庆地区景观类型研究%Exploring Landscapes Based on Decision Tree Classification in the Diqin Region, Yunnan Province

    Institute of Scientific and Technical Information of China (English)

    李亚飞; 刘高焕; 黄翀

    2011-01-01

    Decision tree classification is a type of supervised classification method based on spatial data mining and knowledge discovery. In this paper, the authors examined the landscape pattern of the Diqin region by building the classification decision tree in Yunnan province and using Landsat TM imagery and digital elevation models (DEMs). Subsequently, a landscape distribution map was made. In order to look at the reliability and robustness of the decision tree classification method,the traditional supervised classification was used to derive a landscape distribution map over the region. A multitude of field sampling points were used to evaluate the accuracy of the two classification methods, covering the whole Diqing region and consisting of information regarding geographic coordinates, elevations, and the description of the major landscape types. Results indicate that the overall classification accuracies of the decision tree classification and the traditional supervised classification were 85.5% and 67.4% , respectively. The landscape distribution map derived by the decision tree classification method seems to be reliable in terms of the achievable accuracy. Several conclusions could be drawn by analyzing the derived landscape distribution map as follows. Landscape types in the Diqin region primarily included valley shrub,coniferous forest, sub alpine shrub meadow, alpine snow and ice, bare land, and water body,accounting for 5.5%, 36.16%, 3.4%, 3.7%, 25.4%, and 4.4% of the Diqin region area, respectively.Except bare land and water body, other landscape types varied essentially with elevation and aspect of maintains. The landscape of the largest area was found to be coniferous forest, which was consistent with the landform of alpine and canyon. Coniferous forest was the major landscape in the region, which was distributed over 3000 m above the sea level. In terms of different elevations,the coniferous forest could be conceptually divided into three

  6. Electron Tree

    DEFF Research Database (Denmark)

    Appelt, Ane L; Rønde, Heidi S

    2013-01-01

    The photo shows a close-up of a Lichtenberg figure – popularly called an “electron tree” – produced in a cylinder of polymethyl methacrylate (PMMA). Electron trees are created by irradiating a suitable insulating material, in this case PMMA, with an intense high energy electron beam. Upon discharge, during dielectric breakdown in the material, the electrons generate branching chains of fractures on leaving the PMMA, producing the tree pattern seen. To be able to create electron trees with a clinical linear accelerator, one needs to access the primary electron beam used for photon treatments. We appropriated a linac that was being decommissioned in our department and dismantled the head to circumvent the target and ion chambers. This is one of 24 electron trees produced before we had to stop the fun and allow the rest of the accelerator to be disassembled.

  7. Characterization of African Bush Mango trees with emphasis on the differences between sweet and bitter trees in the Dahomey Gap (West Africa)

    NARCIS (Netherlands)

    Vihotogbe, R.

    2012-01-01

     African bush mango trees (ABMTs) are economically the most important species within the family of Irvingiaceae. They are priority trees producing non-timber forest products (NTFPs) and widely distributed in the humid lowland forests of West and Central Africa. To boost their production and dev

  8. Applying the wavelet analysis and decision tree to identify low-saturation natural gas

    Institute of Scientific and Technical Information of China (English)

    贺旭; 李雄炎; 周金煜; 于红岩

    2011-01-01

    The particular reservoir conditions and low-amplitude structural traps generate abundant low-saturation natural gas in the Quaternary of the Sanhu area in the Qaidam basin. It is difficult to accurately delineate reservoirs because of the poor reservoir properties, thin reservoir thickness, and the limitations of surrounding rocks and logging instrument resolution. The effects of high shale content, high irreducible water saturation, high formation water salinity, and clay minerals cause the log curves to show much ambiguity at low-saturation natural gas, so that the identification of low-saturation natural gas is particularly difficult. To solve this problem, this work uses wavelet analysis to reconstruct log curves in order to improve the vertical resolution, makes a comparative analysis with the imaging logging data, and uses the improved log curves to accurately delineate reservoirs. At the same time, we employ the decision tree to set up a predictive model of low-saturation natural gas, based on the transparency of the learning process and the intelligibility of the results of the decision tree. This study amends the predictive model based on actual reservoir characteristics and achieves an accurate identification of low-saturation natural gas. Practical application shows that the wavelet analysis and decision tree can effectively solve the problems of reservoir delineation and identification of low-saturation natural gas in the research area.
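
The curve-reconstruction step rests on the fact that a wavelet transform splits a log curve into coarse and detail coefficients that can be recombined exactly. A minimal sketch with the single-level Haar transform follows; the Haar basis is an illustrative assumption, since the abstract does not name the wavelet used.

```python
# Single-level Haar wavelet analysis of a log curve: split into a coarse
# approximation and detail coefficients, then reconstruct exactly.
# The Haar basis is an illustrative choice; the paper's wavelet is not stated.

def haar_decompose(curve):
    """Pairwise averages (approximation) and half-differences (detail)."""
    approx = [(curve[2 * i] + curve[2 * i + 1]) / 2 for i in range(len(curve) // 2)]
    detail = [(curve[2 * i] - curve[2 * i + 1]) / 2 for i in range(len(curve) // 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert the decomposition: each sample pair is (a + d, a - d)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

log_curve = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
approx, detail = haar_decompose(log_curve)
assert haar_reconstruct(approx, detail) == log_curve  # lossless round trip
```

Resolution-enhancement schemes of the kind the abstract describes work by modifying the detail coefficients (e.g., re-estimating them against higher-resolution data such as image logs) before reconstructing.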

  9. Duality and Data Dependence in Boosting

    OpenAIRE

    Telgarsky, Matus

    2013-01-01

    Boosting algorithms produce accurate predictors for complex phenomena by welding together collections of simple predictors. In the classical method AdaBoost, as well as its immediate variants, the welding points are determined by convex optimization; unlike typical applications of convex optimization in machine learning, however, the AdaBoost scheme eschews the usual regularization and constraints used to control numerical and statistical properties. On the other hand, the data and simple pre...

  10. Positive Semidefinite Metric Learning with Boosting

    OpenAIRE

    Shen, Chunhua; Kim, Junae; Wang, Lei; Hengel, Anton van den

    2009-01-01

    The learning of appropriate distance metrics is a critical problem in image classification and retrieval. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. BoostMetric is instead based on a key observat...

  11. Adaptive Sampling for Large Scale Boosting

    OpenAIRE

    Dubout, Charles; Fleuret, Francois

    2014-01-01

    Classical Boosting algorithms, such as AdaBoost, build a strong classifier without concern for the computational cost. Some applications, in particular in computer vision, may involve millions of training examples and very large feature spaces. In such contexts, the training time of off-the-shelf Boosting algorithms may become prohibitive. Several methods exist to accelerate training, typically either by sampling the features or the examples used to train the weak learners. Even if some of th...

  12. Where boosted significances come from

    Science.gov (United States)

    Plehn, Tilman; Schichtel, Peter; Wiegand, Daniel

    2014-03-01

    In an era of increasingly advanced experimental analysis techniques it is crucial to understand which phase space regions contribute to a signal extraction from backgrounds. Based on the Neyman-Pearson lemma we compute the maximum significance for a signal extraction as an integral over phase space regions. We then study to what degree boosted Higgs strategies benefit ZH and tt¯H searches and which transverse momenta of the Higgs are most promising. We find that Higgs and top taggers are the appropriate tools, but would profit from a targeted optimization towards smaller transverse momenta. MadMax is available as an add-on to MadGraph 5.

  13. Recursive bias estimation and L2 boosting

    Energy Technology Data Exchange (ETDEWEB)

    Hengartner, Nicolas W [Los Alamos National Laboratory; Cornillon, Pierre - Andre [INRA, FRANCE; Matzner - Lober, Eric [RENNE, FRANCE

    2009-01-01

    This paper presents a general iterative bias correction procedure for regression smoothers. This bias reduction scheme is shown to correspond operationally to the L2 Boosting algorithm and provides a new statistical interpretation for L2 Boosting. We analyze the behavior of the Boosting algorithm applied to common smoothers S, which we show depends on the spectrum of I - S. We present examples of common smoothers for which Boosting generates a divergent sequence. The statistical interpretation suggests combining the algorithm with an appropriate stopping rule for the iterative procedure. Finally we illustrate the practical finite sample performance of the iterative smoother via a simulation study.
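
The iterative bias correction the abstract describes can be written as f_{k+1} = f_k + S(y - f_k) for a smoother S, which is exactly the L2 Boosting recursion. A toy sketch follows, using a moving-average smoother as an illustrative choice of S (not one taken from the paper):

```python
# L2 Boosting as recursive bias correction: repeatedly smooth the residuals
# and add the result back, f_{k+1} = f_k + S(y - f_k).
# The moving-average smoother S below is an illustrative assumption.

def smooth(y, w=3):
    """Simple centered moving average playing the role of the smoother S."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - w // 2), min(n, i + w // 2 + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def l2_boost(y, steps=1):
    """Initial smooth plus `steps` rounds of residual-based bias correction."""
    fit = smooth(y)
    for _ in range(steps):
        resid = [yi - fi for yi, fi in zip(y, fit)]
        fit = [fi + ci for fi, ci in zip(fit, smooth(resid))]
    return fit

def rss(y, fit):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))

y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
assert rss(y, l2_boost(y, steps=1)) < rss(y, smooth(y))  # one correction reduces bias
```

As the abstract notes, the recursion need not converge for every smoother: the spectrum of I - S decides whether repeated correction keeps shrinking the residuals or eventually diverges, hence the need for a stopping rule.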

  14. Redundant Data Mining Based on Residual Data Merging in Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王倩

    2014-01-01

    An optimized mining algorithm for redundant data based on residual data merging is proposed. A training set is used to build the decision tree model, and the C4.5 decision tree is used to model the main features of the redundant data. Under the principal-component feature decision tree, residual data merging is introduced: an accompanying tracking mode for the residual features of the data is set, and the data information that traditional methods filter out is spliced back in for accompanying tracking and positioning, realizing optimized mining of redundant data features. The method is applied to network traffic time-series data for network anomaly detection. Simulation experiments show that the new data mining algorithm can effectively extract redundant data features as useful detection features, greatly improving data mining efficiency and promoting the mining and application of hidden features in massive data. The designed network traffic monitoring software can improve the effectiveness of network management and monitoring.

  15. Big Fish and Prized Trees Gain Protection

    Institute of Scientific and Technical Information of China (English)

    Fred Pearce; 吴敏

    2004-01-01

    Decisions made at a key conservation meeting are good news for big and quirky fish and commercially prized trees. Several species will enjoy extra protection against trade following rulings made at the Convention on International Trade in Endangered Species (CITES).

  16. Stock Data Mining Based on C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王领; 胡扬

    2015-01-01

    Using data mining algorithms to analyze and forecast stocks still faces problems with technical indicators and the quantity of data. Based on an analysis of stock market data, this paper selects certain indicators as decision attributes and uses the C4.5 decision tree classification algorithm to classify and forecast the stock. The paper mainly introduces and optimizes the technical indicators of stocks and improves the efficiency of the C4.5 algorithm. The improved algorithm combined with the optimized technical indicators not only enhances the efficiency of data mining, but also yields better returns in stock forecasting.
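
C4.5 chooses splits by information gain ratio rather than raw information gain, which penalizes many-valued attributes. A minimal sketch of the criterion on a toy two-class data set follows; the indicator names ("macd", "volume") and labels are made-up illustrations, not the paper's actual stock indicators.

```python
# C4.5's split criterion: information gain ratio = information gain / split info.
# The "macd"/"volume" attributes and buy/sell labels are illustrative toy data.
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

def gain_ratio(rows, attr, target):
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([r[attr] for r in rows])  # penalizes many-valued splits
    return (base - conditional) / split_info if split_info else 0.0

rows = [
    {"macd": "up", "volume": "high", "label": "buy"},
    {"macd": "up", "volume": "low", "label": "buy"},
    {"macd": "down", "volume": "high", "label": "sell"},
    {"macd": "down", "volume": "low", "label": "sell"},
]
# "macd" perfectly predicts the label here; "volume" carries no information.
assert gain_ratio(rows, "macd", "label") == 1.0
assert gain_ratio(rows, "volume", "label") == 0.0
```

At each node C4.5 picks the attribute with the highest gain ratio, recursing until the leaves are (nearly) pure.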

  17. Bank Customer Churn Decision Tree Prediction Algorithm under Data Mining Technology

    Institute of Scientific and Technical Information of China (English)

    石杨; 岳嘉佳

    2014-01-01

    In bank customer churn prediction systems, customer data are often used to predict service information for unknown customers, in order to provide a basis for the bank's future business strategy. When forecasting customer behavior, it is often necessary to mine classification rules for certain classification attributes. This paper discusses the use of the decision tree, a common and effective method, for mining classification rules from customer data.

  18. Modeling of stage-discharge relationship for Gharraf River, southern Iraq using backpropagation artificial neural networks, M5 decision trees, and Takagi-Sugeno inference system technique: a comparative study

    Science.gov (United States)

    Al-Abadi, Alaa M.

    2014-12-01

    The potential of using three different data-driven techniques, namely multilayer perceptron with backpropagation artificial neural network (MLP), the M5 decision tree model, and the Takagi-Sugeno (TS) inference system, for mimicking the stage-discharge relationship of the Gharraf River system, southern Iraq, has been investigated and discussed in this study. The study used the available stage and discharge data for predicting discharge using different combinations of stage, antecedent stages, and antecedent discharge values. The models' results were compared using root mean squared error (RMSE) and coefficient of determination (R2) error statistics. The results of the comparison in the testing stage reveal that the M5 and Takagi-Sugeno techniques have certain advantages for setting up the stage-discharge relationship over the multilayer perceptron artificial neural network. Although the performance of the TS inference system was very close to that of the M5 model in terms of R2, the M5 method has the lowest RMSE (8.10 m3/s). The study implies that both M5 and TS inference systems are promising tools for identifying the stage-discharge relationship in the study area.
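
The two error statistics used for the comparison are straightforward to compute; a sketch on made-up observed/predicted discharge values (the study's hydrological data are not reproduced here):

```python
# RMSE and coefficient of determination (R^2), the error statistics used to
# compare the stage-discharge models. Discharge values below are made up.
from math import sqrt

def rmse(obs, pred):
    """Root mean squared error between observed and predicted series."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

observed = [120.0, 150.0, 180.0, 210.0]    # discharge, m3/s (illustrative)
predicted = [118.0, 153.0, 177.0, 214.0]
```

RMSE keeps the units of the discharge itself (here m3/s, matching the 8.10 m3/s quoted for M5), while R2 is dimensionless, which is why a model comparison typically reports both.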

  19. RBoost: Riemannian Distance Based Regularized Boosting

    Science.gov (United States)

    Liu, Meizhu; Vemuri, Baba C

    2011-03-30

    Boosting is a versatile machine learning technique that has numerous applications, including but not limited to image processing, computer vision, and data mining. It is based on the premise that the classification performance of a set of weak learners can be boosted by some weighted combination of them. A number of boosting methods have been proposed in the literature, such as AdaBoost, LPBoost, SoftBoost and their variations. However, the learning update strategies used in these methods usually lead to overfitting and instabilities in the classification accuracy. Improved boosting methods via regularization can overcome such difficulties. In this paper, we propose a Riemannian distance regularized LPBoost, dubbed RBoost. RBoost uses the Riemannian distance between two square-root densities (in closed form) - used to represent the distribution over the training data and the classification error respectively - to regularize the error distribution in an iterative update formula. Since this distance is in closed form, RBoost requires much less computational cost compared to other regularized Boosting algorithms. We present several experimental results depicting the performance of our algorithm in comparison to recently published methods, LPBoost and CAVIAR, on a variety of datasets including the publicly available OASIS database, a home-grown Epilepsy database and the well known UCI repository. Results show that the RBoost algorithm performs better than the competing methods in terms of accuracy and efficiency. PMID:21927643

  20. Boosting as a Product of Experts

    CERN Document Server

    Edakunni, Narayanan U; Kovacs, Tim

    2012-01-01

    In this paper, we derive a novel probabilistic model of boosting as a Product of Experts. We re-derive the boosting algorithm as a greedy incremental model selection procedure which ensures that the addition of new experts to the ensemble does not decrease the likelihood of the data. These learning rules lead to a generic boosting algorithm - POEBoost - which turns out to be similar to the AdaBoost algorithm under certain assumptions on the expert probabilities. The paper then extends the POEBoost algorithm to POEBoost.CS, which handles hypotheses that produce probabilistic predictions. This new algorithm is shown to have better generalization performance compared to other state of the art algorithms.
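
For reference, the classical AdaBoost update that POEBoost is compared against can be sketched with one-dimensional threshold stumps. This is a minimal textbook illustration on toy data, not the paper's algorithm; the variable names are ours.

```python
# Classical AdaBoost with one-dimensional threshold stumps ("x > t" weak
# learners). Toy data; illustrates only the weight and alpha updates.
from math import exp, log

def train_adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                     # uniform initial example weights
    model = []                            # list of (alpha, threshold, sign)
    for _ in range(rounds):
        best = None
        for t in sorted(set(xs)):         # candidate thresholds
            for s in (1, -1):             # stump orientation
                preds = [s if x > t else -s for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, s, preds)
        err, t, s, preds = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0)
        alpha = 0.5 * log((1.0 - err) / err)      # weight of this weak learner
        model.append((alpha, t, s))
        w = [wi * exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        z = sum(w)
        w = [wi / z for wi in w]          # renormalize the distribution
    return model

def predict(model, x):
    score = sum(a * (s if x > t else -s) for a, t, s in model)
    return 1 if score >= 0 else -1

xs, ys = [1.0, 2.0, 3.0, 4.0], [-1, -1, 1, 1]
model = train_adaboost(xs, ys)
assert [predict(model, x) for x in xs] == ys
```

The exponential reweighting step is the point of contact with the Product of Experts view: under certain assumptions on the expert probabilities, POEBoost's likelihood-driven update reduces to this multiplicative rule.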

  1. Implementation of Fuzzy Logic controller in Photovoltaic Power generation using Boost Converter and Boost Inverter

    Directory of Open Access Journals (Sweden)

    Abubakkar Siddik A

    2012-06-01

    With increasing power demand and a shortage of conventional energy sources, researchers have focused on renewable energy. The proposed solar power generation circuit consists of a solar array, a boost converter and a boost inverter. The low voltage of the photovoltaic array is boosted using a dc-dc boost converter to charge the battery, and the boost inverter converts this battery voltage to a high quality sinusoidal ac voltage. The output of the boost inverter feeds an autonomous load without any intermediate conversion stage or filter. For boost converter operation, the duty cycle is varied through a fuzzy logic controller and a PWM block to regulate the converter output voltage. The total harmonic distortion (THD) of the ac voltage obtained using this configuration is quite acceptable. The proposed power generation system has several desirable features such as low cost and compact size, as the number of switches used is limited to four, against the six switches used in classical two-stage inverters.
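
In continuous conduction mode an ideal boost converter obeys Vout = Vin / (1 - D), where D is the duty cycle; this is the relationship the fuzzy controller exploits when it varies D to regulate the output voltage. A sketch with illustrative numbers (not values from the paper):

```python
# Ideal boost-converter transfer ratio in continuous conduction mode:
# Vout = Vin / (1 - D). All numeric values are illustrative only.

def boost_vout(vin, duty):
    """Steady-state output voltage for a duty cycle D in [0, 1)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must lie in [0, 1)")
    return vin / (1.0 - duty)

def duty_for_target(vin, vout_target):
    """Invert the relation: D = 1 - Vin / Vout."""
    return 1.0 - vin / vout_target

# A 12 V panel boosted at 50% duty cycle ideally yields 24 V.
assert boost_vout(12.0, 0.5) == 24.0
```

A real controller, fuzzy or otherwise, perturbs D around `duty_for_target` to compensate for the losses and load changes that the ideal formula ignores.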

  2. Advanced Airfoils Boost Helicopter Performance

    Science.gov (United States)

    2007-01-01

    Carson Helicopters Inc. licensed the Langley RC4 series of airfoils in 1993 to develop a replacement main rotor blade for its Sikorsky S-61 helicopters. The company's fleet of S-61 helicopters has been rebuilt to include Langley's patented airfoil design; the helicopters are now able to carry heavier loads and fly faster and farther, and the main rotor blades have twice the previous service life. In aerial firefighting, the performance-boosting airfoils have helped the U.S. Department of Agriculture's Forest Service control the spread of wildfires. In 2003, Carson Helicopters signed a contract with Ducommun AeroStructures Inc. to manufacture the composite blades for Carson Helicopters to sell

  3. ATLAS boosted object tagging 2

    CERN Document Server

    Caudron, Julien; The ATLAS collaboration

    2015-01-01

    A detailed study into the optimal techniques for identifying boosted hadronically decaying W or Z bosons is presented. Various algorithms for reconstructing, grooming and tagging bosonic jets are compared for W bosons with a wide range of transverse momenta using 8 TeV data and 8 TeV and 13 TeV MC simulations. In addition, given that a hadronic jet has been identified as resulting from the hadronic decay of a W or Z, a technique is developed to discriminate between W and Z bosons. The modeling of the tagging variables used in this technique is studied using 8 TeV pp collision data and systematic uncertainties for the tagger efficiency and fake rates are evaluated.

  4. Fault trees

    International Nuclear Information System (INIS)

    Fault trees are a method of deductive analysis and a means of graphic representation of the reliability and security of systems. The principles of the method are set out and the main points illustrated by many examples of electrical systems, fluids, and mechanical systems as well as everyday occurrences. In addition, some advice is given on the use of the method

  5. Orthodontics Align Crooked Teeth and Boost Self-Esteem

    Science.gov (United States)


  6. Unimodular Trees versus Einstein Trees

    CERN Document Server

    Alvarez, Enrique; Martin, Carmelo P

    2016-01-01

    The maximally helicity violating (MHV) tree level scattering amplitudes involving three, four or five gravitons are worked out in Unimodular Gravity. They are found to coincide with the corresponding amplitudes in General Relativity. This is a remarkable result, insofar as both the propagators and the vertices are quite different in the two theories.

  7. Research on using the decision tree classification method to extract coal gangue information

    Institute of Scientific and Technical Information of China (English)

    冯稳; 张志; 乌云其其格; 孟丹

    2011-01-01

    Using remote sensing techniques to survey the distribution of coal gangue piles quickly and accurately has important guiding significance for the prevention of geological disasters and the protection of the ecological environment and of residents' lives and property. Based on TM multi-spectral imagery, the decision tree classification method was adopted to extract coal gangue information for the Pingxiang coal mining area in Jiangxi Province. Firstly, building on background knowledge of the study area, the spectral characteristics in the image of coal gangue and other typical surface objects in the area were statistically analyzed, and a classification knowledge base for the study area was established. Secondly, supported by the decision tree classification model, the Normalized Difference Vegetation Index, the Modified Normalized Difference Water Index and a spectrum threshold method were used to classify the image. Finally, the classified image was post-processed using geological knowledge and geometric features. The total classification accuracy reached 82.97%. The experiment demonstrates that this method is suitable for the automatic extraction of coal gangue information; combined with visual interpretation, it can improve the efficiency and accuracy of interpretation.

  8. Application of vector projection method based on decision-tree-based support vector machines in fault diagnosis for transformer

    Institute of Scientific and Technical Information of China (English)

    张翠玲; 王大志; 江雪晨; 宁一

    2013-01-01

    By applying the vector projection method to transformer fault diagnosis, the problem of how to structure an effective SVM hierarchy for decision-tree-based support vector machines (DTBSVM) is solved. According to the overlap between class sample sets, the Euclidean distance and a radial basis function are used to calculate the spatial distance and the separability measure between classes, and the classes are ordered by separability measure to design a more reasonable hierarchy for classification. The fault diagnosis model built by applying vector projection to decision-tree-based support vector machines combines one-to-rest and rest-to-rest classification, and solves the multi-classification problem well. For an N-class problem the vector projection method constructs only (N-1) SVM classifiers and has no unrecognized region, so the classification process is faster and the generalization ability is better. Test results show that the correct diagnosis rate increases compared with the traditional three-ratio method and neural network methods in fault diagnosis, so the method has good practical value.
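
The class-ordering step can be sketched as follows: compute the Euclidean distance between class means and map it through a radial basis function to get a separability score, then split off the most separable class first. The 2-D data and the gamma value below are toy assumptions, not values from the paper.

```python
# Separability measure between classes: Euclidean distance between class
# means passed through a radial basis function. Toy 2-D data; gamma is an
# illustrative assumption.
from math import exp, sqrt

def mean_vector(samples):
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def separability(class_a, class_b, gamma=0.5):
    """Higher score = classes farther apart = split them earlier in the tree."""
    d = euclidean(mean_vector(class_a), mean_vector(class_b))
    return 1.0 - exp(-gamma * d * d)      # RBF-based measure in [0, 1)

class_a = [[0.0, 0.0], [0.0, 2.0]]
class_b = [[4.0, 0.0], [4.0, 2.0]]
class_c = [[1.0, 0.0], [1.0, 2.0]]
# class_b lies farther from class_a than class_c does, so it separates first.
assert separability(class_a, class_b) > separability(class_a, class_c)
```

Sorting class pairs by this score yields the hierarchy: the top-level SVM peels off the most separable class, so N classes need only (N-1) classifiers, as the abstract states.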

  9. Parameter estimation using B-Trees

    DEFF Research Database (Denmark)

    Schmidt, Albrecht; Bøhlen, Michael H.

    2004-01-01

    This paper presents a method for accelerating algorithms for computing common statistical operations like parameter estimation or sampling on B-Tree indexed data; the work was carried out in the context of visualisation of large scientific data sets. The underlying idea is the following: the shape of balanced data structures like B-Trees encodes and reflects data semantics according to the balance criterion. For example, clusters in the index attribute are somewhat likely to be present not only on the data or leaf level of the tree but should propagate up into the interior levels. The paper also hints at opportunities and limitations of this approach for visualisation of large data sets. The advantages of the method are manifold. Not only does it enable advanced algorithms through a performance boost for basic operations like density estimation, but it also builds on functionality that is...

  10. Fault Diagnosis Method for Nuclear Power Plant Based on Decision Tree and Neighborhood Rough Sets

    Institute of Scientific and Technical Information of China (English)

    慕昱; 夏虹; 刘永阔

    2011-01-01

    Nuclear power plants (NPPs) are very complex systems that require vast numbers of parameters to be collected and monitored, which makes fault diagnosis difficult. A parameter reduction method based on neighborhood rough sets is proposed to address this problem. Granular computing is realized in a real-valued space, so numerical parameters can be processed directly without discretization. On this basis, a decision tree is trained on samples of four typical faults of a nuclear power plant - loss of coolant accident, feed water pipe rupture, steam generator tube rupture, and main steam pipe rupture - and the acquired knowledge is used for diagnosis. The diagnostic results are then compared with those of a support vector machine. The simulation results show that this method can rapidly and accurately diagnose the above-mentioned faults of the NPP.

  11. Decision-tree sensitivity analysis for cost-effectiveness of whole-body FDG PET in the management of patients with non-small-cell lung carcinoma in Japan

    International Nuclear Information System (INIS)

    Whole-body 2-fluoro-2-D-[18F]deoxyglucose (FDG) positron emission tomography (WB-PET) may be more cost-effective than chest PET because WB-PET does not require conventional imaging (CI) for extrathoracic staging. The cost-effectiveness of WB-PET for the management of Japanese patients with non-small-cell lung carcinoma (NSCLC) was assessed. A decision-tree sensitivity analysis was designed, based on the two competing strategies of WB-PET vs. CI. WB-PET was assumed to have a sensitivity and specificity for detecting metastases of 90% to 100%, and CI of 80% to 90%. The prevalences of M1 disease were 34% and 20%. One thousand patients suspected of having NSCLC were simulated in each strategy. We surveyed the relevant literature for the choice of variables. Expected cost saving (CS) and expected life expectancy (LE) for NSCLC patients were calculated. The WB-PET strategy yielded an expected CS of $951 US to $1,493 US per patient and an expected LE of minus 0.0246 years to minus 0.0136 years per patient for the 71.4% NSCLC and 34% M1 disease prevalence at our hospital. PET avoided unnecessary bronchoscopies and thoracotomies for incurable and benign disease. Overall, the CS for each patient was $833 US to $2,010 US at NSCLC prevalences ranging from 10% to 90%. The LE of the WB-PET strategy was similar to that of the CI strategy. The CS and LE varied minimally in the two situations of 34% and 20% M1 disease prevalence. The introduction of a WB-PET strategy in place of CI for managing NSCLC patients is potentially cost-effective in Japan. (author)

  12. Analysis of College Students Consumption Data Based on Decision Tree Data Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄剑

    2015-01-01

    This paper uses a decision tree data mining algorithm as its basic tool. Based on campus card consumption data of college students in recent years, it explores how data mining can analyze and study changes in students' consumption behavior, their consumption characteristics, and the deeper relationship with consumption prices. Through data mining of the consumption data, information on students' consumption behavior, habits and consumption volume in recent years is obtained, and the inherent relations and changing trends are identified. The results can better and more effectively guide the school's catering price adjustments and the introduction of new dishes, providing catering services within a price range students can afford.

  13. Boosting Wigner's nj-symbols

    CERN Document Server

    Speziale, Simone

    2016-01-01

    We study the SL(2,C) Clebsch-Gordan coefficients appearing in the lorentzian EPRL spin foam amplitudes for loop quantum gravity. We show how the amplitudes decompose into SU(2) nj-symbols at the vertices and integrals over boosts at the edges. The integrals define edge amplitudes that can be evaluated analytically, using and adapting results in the literature, leading to a pure state sum model formulation. This procedure introduces virtual representations which, in a manner reminiscent of virtual momenta in Feynman amplitudes, are off-shell of the simplicity constraints present in the theory, but with integrands that peak at the on-shell values. We point out some properties of the edge amplitudes which are helpful for numerical and analytical evaluations of spin foam amplitudes, and suggest among other things a simpler model useful for calculations of certain lowest order amplitudes. As an application, we estimate the large spin scaling behaviour of the simpler model, on a closed foam with all 4-valent edg...

  14. Boosted Fast Flux Loop Final Report

    International Nuclear Information System (INIS)

    The Boosted Fast Flux Loop (BFFL) project was initiated to determine basic feasibility of designing, constructing, and installing in a host irradiation facility, an experimental vehicle that can replicate with reasonable fidelity the fast-flux test environment needed for fuels and materials irradiation testing for advanced reactor concepts. Originally called the Gas Test Loop (GTL) project, the activity included (1) determination of requirements that must be met for the GTL to be responsive to potential users, (2) a survey of nuclear facilities that may successfully host the GTL, (3) conceptualizing designs for hardware that can support the needed environments for neutron flux intensity and energy spectrum, atmosphere, flow, etc. needed by the experimenters, and (4) examining other aspects of such a system, such as waste generation and disposal, environmental concerns, needs for additional infrastructure, and requirements for interfacing with the host facility. A revised project plan included requesting an interim decision, termed CD-1A, that had objectives of establishing the site for the project at the Advanced Test Reactor (ATR) at the Idaho National Laboratory (INL), deferring the CD 1 application, and authorizing a research program that would resolve the most pressing technical questions regarding GTL feasibility, including issues relating to the use of booster fuel in the ATR. Major research tasks were (1) hydraulic testing to establish flow conditions through the booster fuel, (2) mini-plate irradiation tests and post-irradiation examination to alleviate concerns over corrosion at the high heat fluxes planned, (3) development and demonstration of booster fuel fabrication techniques, and (4) a review of the impact of the GTL on the ATR safety basis. A revised cooling concept for the apparatus was conceptualized, which resulted in renaming the project to the BFFL. 
Before the subsequent CD-1 approval request could be made, a decision was made in April 2006 that

  15. Avoiding Anemia: Boost Your Red Blood Cells

    Science.gov (United States)

    If you’re ... and sluggish, you might have a condition called anemia. Anemia is a common blood disorder that many ...

  16. Anemia Boosts Stroke Death Risk, Study Finds

    Science.gov (United States)

    https://medlineplus.gov/news/fullstory_160476.html ... 2016 (HealthDay News) -- Older stroke victims suffering from anemia -- a lack of red blood cells -- may have ...

  17. Riemann curvature of a boosted spacetime geometry

    CERN Document Server

    Battista, Emmanuele; Scudellaro, Paolo; Tramontano, Francesco

    2014-01-01

    The ultrarelativistic boosting procedure had been applied in the literature to map the metric of Schwarzschild-de Sitter spacetime into a metric describing de Sitter spacetime plus a shock-wave singularity located on a null hypersurface. This paper evaluates the Riemann curvature tensor of the boosted Schwarzschild-de Sitter metric by means of numerical calculations, which make it possible to reach the ultrarelativistic regime gradually by letting the boost velocity approach the speed of light. Thus, for the first time in the literature, the singular limit of curvature through Dirac's delta distribution and its derivatives is numerically evaluated for this class of spacetimes. Eventually, the analysis of the Kretschmann invariant and the geodesic equation show that the spacetime possesses a scalar curvature singularity within a 3-sphere and it is possible to define what we here call a boosted horizon, a sort of elastic wall where all particles are surprisingly pushed away, as numerical analysis demonstrates. Thi...

  18. Analisa Performansi menggunakan Algoritma Decision Tree

    OpenAIRE

    Swendy, Maries

    2016-01-01

    Data mining has been implemented to obtain more useful information than conventional databases combined with human analysis by users of organization/company systems. This thesis proposes a tool for monitoring and tracking performance from the connectedness rule model of the results of surveys, audit data, and revenue data in organization/company systems. The more dominant factors that influence the growth of revenue as the variable of organization/compa...

  19. Multitask Efficiencies in the Decision Tree Model

    CERN Document Server

    Drucker, Andrew

    2008-01-01

    In Direct Sum problems [KRW], one tries to show that for a given computational model, the complexity of computing a collection $F = \\{f_i\\}$ of functions on independent inputs is approximately the sum of their individual complexities. In this paper, by contrast, we study the diversity of ways in which the joint computational complexity can behave when all the $f_i$ are evaluated on a \\textit{common} input. Fixing some model of computational cost, let $C_F(X): \\{0, 1\\}^l \\to \\mathbf{R}$ give the cost of computing the subcollection $\\{f_i(x): X_i = 1\\}$, on common input $x$. What constraints do the functions $C_F(X)$ obey, when $F$ is chosen freely? $C_F(X)$ will, for reasonable models, obey nonnegativity, monotonicity, and subadditivity. We show that, in the deterministic, adaptive query model, these are `essentially' the only constraints: for any function $C(X)$ obeying these properties and any $\\epsilon > 0$, there exists a family $F$ of boolean functions and a $T > 0$ such that for all $X \\in \\{0, 1\\}^l$, \\...

  20. On the generator of Lorentz boost

    Institute of Scientific and Technical Information of China (English)

    Wang Zhi-Yong; Xiong Cai-Dong

    2006-01-01

    Traditionally, the theory related to the spatial angular momentum has been studied thoroughly, while the investigation of the generator of Lorentz boost has been inadequate. This paper shows that the generator of Lorentz boost has a nontrivial physical significance: it endows a charged system with an electric moment, and it has important implications for the electrical manipulation of electron spin in spintronics. An alternative treatment and interpretation of the traditional Darwin term and spin-orbit coupling are given.

  1. Internationalization of Boost Juice to Malaysia

    OpenAIRE

    Jane L. Menzies; Stuart C. Orr

    2014-01-01

    This case describes the process that the Australian juice retail chain, Boost Juice, has used to internationalize to Malaysia. The main objective of this case is to demonstrate good practice in regard to internationalization. The case provides the background of the juice bar industry in Malaysia and determines that it is an attractive market for new start-up juice bars. An analysis of Boost Juice's capability determined that the company utilized the skills of its staff, product innovations, b...

  2. The Information Extraction of Freshwater Marsh Wetland Based on the Decision Tree Method: Taking Zhalong Wetland as An Example

    Institute of Scientific and Technical Information of China (English)

    乔艳雯; 臧淑英; 那晓东

    2013-01-01

    In order to obtain timely and accurate basic information about wetlands, which can be applied to their dynamic monitoring and protection, the authors chose the Zhalong wetland as the research area. With the extraction of regional remote sensing information as the goal, a decision tree model was constructed from compound identification indices including TM image data, DEM data, the normalized difference vegetation index, and texture information, and the land-cover types of the Zhalong wetland were classified. To check the feasibility of the decision-tree-based classification, it was compared with the traditional maximum likelihood supervised classification. The results showed that the index-based decision tree method improved classification accuracy by 14.6% and the overall Kappa coefficient by 0.1751, a marked improvement over supervised classification. Building a decision tree classifier from multi-source data is thus an effective approach for extracting information on inland freshwater marsh wetlands.
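    As an illustration of the index-threshold decision tree idea described in this record, the sketch below classifies pixels from hand-set NDVI, elevation, and texture cutoffs. All thresholds and class names are invented for illustration; they are not the values used in the Zhalong study.

```python
# Illustrative hand-built decision tree over spectral indices, in the spirit
# of index-based wetland classification. Thresholds are hypothetical.

def classify_pixel(ndvi, elevation_m, texture):
    """Classify one pixel into a coarse wetland land-cover type."""
    if ndvi < 0.1:
        return "water"              # low vegetation signal: open water
    if elevation_m > 200.0:
        return "upland"             # above the marsh plain (hypothetical cutoff)
    if ndvi >= 0.4:
        return "marsh_vegetation"   # dense reed/sedge cover
    if texture > 0.5:
        return "meadow"             # heterogeneous grassland texture
    return "bare_soil"

pixels = [
    {"ndvi": 0.05, "elevation_m": 140, "texture": 0.2},  # open water
    {"ndvi": 0.55, "elevation_m": 145, "texture": 0.3},  # reed marsh
    {"ndvi": 0.25, "elevation_m": 150, "texture": 0.7},  # meadow
]
labels = [classify_pixel(**p) for p in pixels]
print(labels)
```

    Real classifiers of this kind would learn the split variables and thresholds from training samples rather than fixing them by hand.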

  3. Nomogram to predict ipsilateral breast relapse based on pathology review from the EORTC 22881-10882 boost versus no boost trial

    International Nuclear Information System (INIS)

    Background and purpose: The EORTC 22881-10882 trial showed that for patients treated with breast conserving therapy (BCT), a 16 Gy boost dose significantly improved local control, but increased the risk of breast fibrosis. A model to estimate the risk of ipsilateral breast relapse (IBR) already exists, but now a model has been developed which takes boost treatment into account and is based on centrally reviewed pathology. Materials and methods: A Cox model was developed based on central pathology review data and clinical data of 1603 patients from the EORTC 22881-10882 trial with a median follow-up of 11.5 years. From a predefined set of variables, predictors with a maximal effect on 10-year IBR rate >4% were retained in the model. Bootstrap re-sampling was used to assess model calibration and discrimination. The results are presented in the form of a nomogram. Results: Apart from young age and no boost, presence of DCIS adjacent to the invasive tumor was associated with increased risk of IBR (HR 1.96, p = 0.001). Patients with high grade invasive tumors were younger than patients with low/intermediate grade (p < 0.0001). The nomogram includes histologic grade, DCIS, tumor diameter, age, tamoxifen, chemotherapy, and boost with a concordance probability estimate of 0.68. Conclusions: The nomogram for predicting IBR 10 years after BCT includes seven factors, with young age, presence of DCIS and boost treatment as the most dominant factors. The nomogram estimates IBR and confirms the importance of a boost dose. Combined with a model to predict fibrosis published previously, the nomogram presented here may assist in decision making for individual patients.
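    The concordance probability reported above (0.68) can be illustrated with a minimal Harrell's C-index computation. The sketch below assumes fully observed (uncensored) toy data; real survival analyses must also handle censoring, which is omitted here, and the scores are hypothetical, not nomogram outputs.

```python
# Minimal concordance-index sketch: the fraction of comparable patient pairs
# in which the higher-risk patient relapses earlier. No censoring handling.

def c_index(risk_scores, event_times):
    concordant, comparable = 0.0, 0
    n = len(risk_scores)
    for i in range(n):
        for j in range(i + 1, n):
            if event_times[i] == event_times[j]:
                continue  # tied times are not comparable in this simple version
            comparable += 1
            # the patient with the shorter time should carry the higher risk
            shorter, longer = (i, j) if event_times[i] < event_times[j] else (j, i)
            if risk_scores[shorter] > risk_scores[longer]:
                concordant += 1
            elif risk_scores[shorter] == risk_scores[longer]:
                concordant += 0.5  # ties in score count as half-concordant
    return concordant / comparable

scores = [2.1, 0.5, 1.3, 0.9]   # hypothetical risk scores
times = [1.0, 8.0, 3.0, 5.0]    # years to relapse
print(c_index(scores, times))   # 1.0: the scores rank every pair correctly
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why 0.68 represents modest but useful discrimination.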

  4. Data mining on commuting distance mode of urban residents based on the analysis of decision tree

    Institute of Scientific and Technical Information of China (English)

    王茂军; 宋国庆; 许洁

    2009-01-01

    With the development of suburbanization, urban residents now have more choices in jobs and housing locations, and scholars increasingly pay attention to studies of citizens' commuting modes. Based on questionnaire survey data, this paper first makes a descriptive analysis of people's commuting variables, distances, and directions, and then discusses the commuting distance patterns of Beijing residents using decision tree analysis and data mining. Conclusions are as follows. First, under the chosen pruning severity, commuting distance is closely related to travel mode, change of residence, occupation, the local employment rate, the youngest child's schooling status, housing floor area, monthly household income, and motor vehicle use; factors such as gender, educational level, marital status, and housing property are not involved in the model. Second, among the variables affecting commuting distance, travel mode is the most important, followed by housing floor area and the youngest child's schooling, then change of residence and occupation; monthly household income ranks fourth, and motor vehicle use and the local employment rate rank fifth. Third, owing to the complexity of housing ownership, the diversity of reasons for moving, passive suburbanization, and the household division of labor in production, child rearing, and domestic affairs, the relationships of housing floor area, migration history, family life cycle, and occupation with commuting distance run counter to existing domestic conclusions; some variables are decisive for short-distance commuting, while others are decisive for long-distance commuting.

  5. Boosted Jets at the LHC

    Science.gov (United States)

    Larkoski, Andrew

    2015-04-01

    Jets are collimated streams of high-energy particles ubiquitous at any particle collider experiment and serve as proxy for the production of elementary particles at short distances. As the Large Hadron Collider at CERN continues to extend its reach to ever higher energies and luminosities, an increasingly important aspect of any particle physics analysis is the study and identification of jets, electroweak bosons, and top quarks with large Lorentz boosts. In addition to providing a unique insight into potential new physics at the tera-electron volt energy scale, high energy jets are a sensitive probe of emergent phenomena within the Standard Model of particle physics and can teach us an enormous amount about quantum chromodynamics itself. Jet physics is also invaluable for lower-level experimental issues including triggering and background reduction. It is especially important for the removal of pile-up, which is radiation produced by secondary proton collisions that contaminates every hard proton collision event in the ATLAS and CMS experiments at the Large Hadron Collider. In this talk, I will review the myriad ways that jets and jet physics are being exploited at the Large Hadron Collider. This will include a historical discussion of jet algorithms and the requirements that these algorithms must satisfy to be well-defined theoretical objects. I will review how jets are used in searches for new physics and ways in which the substructure of jets is being utilized for discriminating backgrounds from both Standard Model and potential new physics signals. Finally, I will discuss how jets are broadening our knowledge of quantum chromodynamics and how particular measurements performed on jets manifest the universal dynamics of weakly-coupled conformal field theories.

  6. Philippine campaign boosts child immunizations.

    Science.gov (United States)

    Manuel-santana, R

    1993-03-01

    In 1989, USAID awarded the Philippines a 5-year, US $50 million Child Survival Program targeting improvement in immunization coverage of children, prenatal care coverage for pregnant women, and contraceptive prevalence. Upon successful completion of performance benchmarks at the end of each year, USAID released monies to fund child survival activities for the following year. This program accomplished a major program goal, which was decentralization of health planning: the Philippine Department of Health soon incorporated provincial health planning in its determination of allocation of resources. Social marketing activities contributed greatly to success in achieving the goal of boosting the immunization coverage rate for the 6 antigens listed under the Expanded Program for Immunization (51%-85% of infants, 1986-1991). In fact, rural health officers in Tarlac Province in Central Luzon went from household to household to talk to mothers about the benefits of immunizing a 1-year-old child, thereby contributing greatly to their achieving a 95% full immunization coverage rate by December 1991. Social marketing techniques included modern marketing strategies and multimedia channels. They first proved successful in metro Manila which, at the beginning of the campaign, had the lowest immunization rate of all 14 regions. Every Wednesday was designated immunization day and was when rural health centers vaccinated the children. Social marketing also successfully publicized oral rehydration therapy (ORT), breast feeding, and tuberculosis control. Another contributing factor to program success in child survival activities was private sector involvement. For example, the Philippine Pediatric Society helped to promote ORT as the preferred treatment for acute diarrhea. Further, the commercial sector distributed packets of oral rehydration salts and even advertised its own ORT product. 
At the end of 2

  7. Modular Tree Automata

    DEFF Research Database (Denmark)

    Bahr, Patrick

    2012-01-01

    Tree automata are traditionally used to study properties of tree languages and tree transformations. In this paper, we consider tree automata as the basis for modular and extensible recursion schemes. We show, using well-known techniques, how to derive from standard tree automata highly modular r...

  8. THE INTERACTIVE DECISION COMMITTEE FOR CHEMICAL TOXICITY ANALYSIS.

    Science.gov (United States)

    Kang, Chaeryon; Zhu, Hao; Wright, Fred A; Zou, Fei; Kosorok, Michael R

    2012-01-01

    We introduce the Interactive Decision Committee method for classification when high-dimensional feature variables are grouped into feature categories. The proposed method uses the interactive relationships among feature categories to build base classifiers which are combined using decision committees. A two-stage or a single-stage 5-fold cross-validation technique is utilized to decide the total number of base classifiers to be combined. The proposed procedure is useful for classifying biochemicals on the basis of toxicity activity, where the feature space consists of chemical descriptors and the responses are binary indicators of toxicity activity. Each descriptor belongs to at least one descriptor category. The support vector machine, the random forests, and the tree-based AdaBoost algorithms are utilized as classifier inducers. Forward selection is used to select the best combinations of the base classifiers given the number of base classifiers. Simulation studies demonstrate that the proposed method outperforms a single large, unaggregated classifier in the presence of interactive feature category information. We applied the proposed method to two toxicity data sets associated with chemical compounds. For these data sets, the proposed method improved classification performance for the majority of outcomes compared to a single large, unaggregated classifier. PMID:24415822
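    A minimal sketch of the boosting inducer mentioned above: AdaBoost.M1 with one-dimensional decision stumps, in pure Python on toy data. This is a generic illustration of the algorithm, not the paper's actual pipeline or feature-category machinery.

```python
# AdaBoost.M1 with threshold stumps on labels in {-1, +1}. Each round fits the
# best weighted stump, then re-weights so misclassified points count more.
import math

def stump_predict(x, thresh, sign):
    return sign if x > thresh else -sign

def train_adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, thresh, sign)
    for _ in range(rounds):
        best = None
        for thresh in sorted(xs):          # exhaustive threshold search
            for sign in (+1, -1):
                err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                          if stump_predict(xi, thresh, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, thresh, sign)
        err, thresh, sign = best
        err = max(err, 1e-10)              # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thresh, sign))
        w = [wi * math.exp(-alpha * ys[i] * stump_predict(xs[i], thresh, sign))
             for i, wi in enumerate(w)]
        z = sum(w)
        w = [wi / z for wi in w]           # renormalize the weights
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score > 0 else -1

xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = [-1, -1, -1, 1, 1, 1]                 # separable near 0.45
model = train_adaboost(xs, ys)
print([predict(model, x) for x in xs])
```

    Tree-based boosting as used in the paper replaces the stump with a full decision tree learner, but the weighting and voting scheme is the same.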

  9. Identifying Model for Anticipated Communication Service-discontinuing Customers Based on Decision Tree Technology

    Institute of Scientific and Technical Information of China (English)

    李智勇; 冷夔

    2011-01-01

    The loss of customers directly impacts the survival and development of telecom enterprises; it is therefore necessary to use data mining technology to identify anticipated service-discontinuing (churn-prone) customers by establishing a predictive model, and then to carry out effective retention measures. Taking CRISP-DM (Cross-Industry Standard Process for Data Mining) as the framework, the method of establishing the identification model is discussed in detail across six stages: business understanding, data understanding, data preparation, modeling, model evaluation, and deployment. A decision tree node model is used as the data mining tool and technique to build the identifying model. The model has played a positive role in mobile customer retention work and has achieved good practical results.

  10. Comparative Study of 4-Switch Buck-Boost Controller and Regular Buck-Boost

    Directory of Open Access Journals (Sweden)

    Taufik Taufik

    2011-01-01

    Full Text Available A very important characteristic that dc-dc converters require is the ability to efficiently regulate an output voltage over a wide range of input voltages. A recently developed solution to this requirement is the synchronous 4-Switch Buck-Boost controller developed by Linear Technology. Linear Technology's LTC3780 controller chip enables the adoption of a 4-switch switching topology as opposed to the traditional single-switch Buck-Boost topology. In this paper, the LTC3780's 4-Switch Buck-Boost topology is analyzed and its performance is compared against that of the regular single-switch Buck-Boost topology. Results from computer simulations demonstrate the benefits of the 4-switch approach over the conventional buck-boost method.
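    For context, the ideal continuous-conduction-mode conversion ratio that buck-boost converters target is |Vout| = Vin * D / (1 - D), where D is the switch duty cycle. The sketch below evaluates it for a few duty cycles; the 12 V input is an illustrative assumption, not a value from the paper, and real converters deviate from this lossless formula.

```python
# Ideal (lossless, CCM) buck-boost conversion ratio: |Vout| = Vin * D / (1 - D).
# D < 0.5 steps the voltage down, D > 0.5 steps it up.

def buck_boost_vout(vin, duty):
    return vin * duty / (1.0 - duty)

vin = 12.0  # assumed input voltage
for d in (0.25, 0.5, 0.75):
    print(d, round(buck_boost_vout(vin, d), 2))
```

    The single-switch topology inverts the output polarity; the 4-switch non-inverting topology reaches a comparable voltage magnitude without inversion.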

  11. Binarization With Boosting and Oversampling for Multiclass Classification.

    Science.gov (United States)

    Sen, Ayon; Islam, Md Monirul; Murase, Kazuyuki; Yao, Xin

    2016-05-01

    Using a set of binary classifiers to solve multiclass classification problems has been a popular approach over the years. The decision boundaries learnt by binary classifiers (also called base classifiers) are much simpler than those learnt by multiclass classifiers. This paper proposes a new classification framework, termed binarization with boosting and oversampling (BBO), for efficiently solving multiclass classification problems. The new framework is devised based on the one-versus-all (OVA) binarization technique. Unlike most previous work, BBO employs boosting for solving the hard-to-learn instances and oversampling for handling the class-imbalance problem arising due to OVA binarization. These two features make BBO different from other existing works. Our new framework has been tested extensively on several multiclass supervised and semi-supervised classification problems using five different base classifiers, including neural networks, C4.5, k-nearest neighbor, repeated incremental pruning to produce error reduction, support vector machine, random forest, and learning with local and global consistency. Experimental results show that BBO can exhibit better performance compared to its counterparts on supervised and semi-supervised classification problems. PMID:25955858
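    The two ingredients of BBO can be sketched together: OVA binarization plus naive oversampling of the positive class to offset the imbalance OVA creates, here with a trivial nearest-centroid rule standing in for the paper's base classifiers. Everything below is an illustrative toy, not the published implementation.

```python
# OVA binarization with oversampling. One binary "positive vs rest" model is
# trained per class; positives are replicated until the two sides balance.

def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_ova(xs, ys):
    models = {}
    for c in sorted(set(ys)):
        pos = [x for x, y in zip(xs, ys) if y == c]
        neg = [x for x, y in zip(xs, ys) if y != c]
        while len(pos) < len(neg):   # deterministic oversampling by replication
            pos = pos + pos
        pos = pos[:len(neg)]
        models[c] = (centroid(pos), centroid(neg))
    return models

def predict(models, x):
    # pick the class whose binary model wins by the largest margin
    def margin(c):
        p, n = models[c]
        return dist2(x, n) - dist2(x, p)
    return max(models, key=margin)

xs = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1], [9, 0]]
ys = ["a", "a", "b", "b", "c", "c", "c"]
m = train_ova(xs, ys)
print([predict(m, x) for x in xs])
```

    BBO additionally boosts each binary problem and selects combinations of base classifiers by forward selection, which this sketch omits.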

  12. Concomitant boost radiotherapy in oropharynx carcinomas

    International Nuclear Information System (INIS)

    Fifty-five patients with resectable and unresectable oropharynx carcinomas were treated with concomitant boost radiotherapy. Forty-two of the patients (76%) had stages III-IV disease. Although none of the patients had undergone major surgery to the primary tumor, 11 had neck dissections prior to radiotherapy, and 19 (35%) received chemotherapy. The planned total tumor dose was 69.9 Gy, delivered over 5.5 weeks. During the last 3.5 weeks, a boost to the initial gross disease was delivered in 13 fractions of 1.5 Gy each, as a second daily fraction in a progressively accelerated schedule; the prescribed dose outside the boost volume thus was 50.4 Gy. Median follow-up for surviving patients was 31.5 months (range: 16-65 months). All patients but one completed the planned radiotherapy schedule. According to the RTOG scoring system, 48 patients (88%) presented with grades 3-4 acute toxicity. The rate of grades 3-4 late complications was 12%. At three years the actuarial locoregional control rate was 69.5% and overall survival was 60%. We conclude that this concomitant boost schedule is feasible and does not seem to be associated with an excess risk of late complications. Acute toxicity was higher in association with chemotherapy, but remained manageable. Although the oncological results appear encouraging, evaluation of the efficacy of concomitant boost schedules compared with conventionally fractionated irradiation with or without concomitant chemotherapy requires prospective randomized trials. (orig.)
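    The fractionation numbers above are internally consistent, as a quick arithmetic check shows: the 13 concomitant boost fractions of 1.5 Gy add 19.5 Gy to the 50.4 Gy prescribed outside the boost volume, reproducing the planned 69.9 Gy total to the gross disease.

```python
# Arithmetic check of the concomitant boost schedule described above.
base_dose = 50.4       # Gy, prescribed outside the boost volume
boost = 13 * 1.5       # Gy, second daily fractions over the last 3.5 weeks
total = base_dose + boost
print(boost, round(total, 1))
```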

  13. Concomitant boost radiotherapy in oropharynx carcinomas

    Energy Technology Data Exchange (ETDEWEB)

    Bieri, S.; Allal, A.S.; Kurtz, J.M. [Ospedale San Giovanni, Bellinzona (Switzerland). Dept. of Radiation Oncology; Dulguerov, P.; Lehmann, W. [Geneva Univ. Hospital (Switzerland). Div. of Head and Neck Surgery

    1998-12-31

    Fifty-five patients with resectable and unresectable oropharynx carcinomas were treated with concomitant boost radiotherapy. Forty-two of the patients (76%) had stages III-IV disease. Although none of the patients had undergone major surgery to the primary tumor, 11 had neck dissections prior to radiotherapy, and 19 (35%) received chemotherapy. The planned total tumor dose was 69.9 Gy, delivered over 5.5 weeks. During the last 3.5 weeks, a boost to the initial gross disease was delivered in 13 fractions of 1.5 Gy each, as a second daily fraction in a progressively accelerated schedule; the prescribed dose outside the boost volume thus was 50.4 Gy. Median follow-up for surviving patients was 31.5 months (range: 16-65 months). All patients but one completed the planned radiotherapy schedule. According to the RTOG scoring system, 48 patients (88%) presented with grades 3-4 acute toxicity. The rate of grades 3-4 late complications was 12%. At three years the actuarial locoregional control rate was 69.5% and overall survival was 60%. We conclude that this concomitant boost schedule is feasible and does not seem to be associated with an excess risk of late complications. Acute toxicity was higher in association with chemotherapy, but remained manageable. Although the oncological results appear encouraging, evaluation of the efficacy of concomitant boost schedules compared with conventionally fractionated irradiation with or without concomitant chemotherapy requires prospective randomized trials. (orig.)

  14. Positive Semidefinite Metric Learning with Boosting

    CERN Document Server

    Shen, Chunhua; Wang, Lei; Hengel, Anton van den

    2009-01-01

    The learning of appropriate distance metrics is a critical problem in image classification and retrieval. In this work, we propose a boosting-based technique, termed \\BoostMetric, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. \\BoostMetric is instead based on a key observation that any positive semidefinite matrix can be decomposed into a linear positive combination of trace-one rank-one matrices. \\BoostMetric thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting method is easy to implement, does not require tuning, and can accommodate various types of constraints. Experiments on various datasets show that the proposed algorithm compares favorably to those state-of-the-art methods in terms of classi...
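    The key observation behind BoostMetric, that any positive semidefinite matrix is a positive combination of trace-one rank-one matrices, can be verified directly for a 2x2 example using the closed-form symmetric eigendecomposition. The matrix below is an arbitrary PSD example, not data from the paper.

```python
# Decompose a 2x2 PSD matrix M = sum_i lam_i * (v_i v_i^T), where each
# v_i v_i^T is rank-one with unit trace, matching BoostMetric's weak learners.
import math

def eig_sym_2x2(a, b, c):
    """Eigen-pairs of the symmetric matrix [[a, b], [b, c]]."""
    mean, r = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    lams = (mean + r, mean - r)
    vecs = []
    for lam in lams:
        # eigenvector direction; handle the already-diagonal case b == 0
        if abs(b) > 1e-12:
            v = (b, lam - a)
        else:
            v = (1.0, 0.0) if abs(lam - a) < 1e-12 else (0.0, 1.0)
        norm = math.hypot(*v)
        vecs.append((v[0] / norm, v[1] / norm))
    return lams, vecs

a, b, c = 2.0, 1.0, 2.0                   # PSD example: eigenvalues 3 and 1
lams, vecs = eig_sym_2x2(a, b, c)

recon = [[0.0, 0.0], [0.0, 0.0]]
for lam, (x, y) in zip(lams, vecs):
    Z = [[x * x, x * y], [y * x, y * y]]  # rank-one
    assert abs(Z[0][0] + Z[1][1] - 1.0) < 1e-12   # trace-one
    for i in range(2):
        for j in range(2):
            recon[i][j] += lam * Z[i][j]

print([[round(recon[i][j], 6) for j in range(2)] for i in range(2)])
```

    Boosting then learns the weights and rank-one directions incrementally instead of via a full eigendecomposition, which is what lets BoostMetric avoid semidefinite programming.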

  15. Bronchi, Bronchial Tree, & Lungs

    Science.gov (United States)

    ... Bronchi and Bronchial Tree: In the mediastinum, at the level of the ... trachea. As the branching continues through the bronchial tree, the amount of hyaline cartilage in the walls ...

  16. Frequent Pattern Mining using CATSIM Tree

    OpenAIRE

    Ketan Modi; Mr. B. L Pal

    2012-01-01

    Efficient algorithms to discover frequent patterns are essential in data mining research. Frequent pattern mining is emerging as a powerful tool for many business applications such as e-commerce, recommender systems, supply chain management, and group decision support systems, to name a few. Several effective data structures, such as two-dimensional arrays, graphs, trees, and tries, have been proposed to collect candidate and frequent itemsets. It seems that the tree structure is the most attractive to...

  17. Centrifugal compressor design for electrically assisted boost

    Science.gov (United States)

    Y Yang, M.; Martinez-Botas, R. F.; Zhuge, W. L.; Qureshi, U.; Richards, B.

    2013-12-01

    Electrically assisted boost is a prominent method for solving the issue of transient lag in a turbocharger, and it allows the compressor to remain at an optimized operating condition because it is decoupled from the turbine. Usually a centrifugal compressor for gasoline engine boosting is operated at a high rotational speed that is beyond the capability of electric motors on the market. In this paper a centrifugal compressor with a rotational speed of 120k RPM and a pressure ratio of 2.0 is specially developed for electrically assisted boost. A centrifugal compressor comprising the impeller, vaneless diffuser, and volute is designed by the meanline method followed by 3D detailed design. A CFD method is then employed to predict and analyse the performance of the designed compressor. The results show that the pressure ratio and efficiency at the design point are 2.07 and 78%, respectively.
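    A back-of-envelope meanline check shows that a pressure ratio near 2.0 at 120k RPM is consistent with a small impeller. The Euler-work relation with a slip factor is standard compressor theory, but the impeller diameter (52 mm), slip factor (0.85), and inlet conditions below are assumptions for illustration, not values from the paper.

```python
# Meanline estimate: tip speed -> Euler work (with slip) -> stagnation
# temperature rise -> pressure ratio via the isentropic relation.
import math

rpm, d_imp = 120_000, 0.052          # assumed 52 mm impeller tip diameter
eta, slip = 0.78, 0.85               # paper's efficiency; assumed slip factor
cp, gamma, T01 = 1005.0, 1.4, 298.0  # air properties, assumed inlet temp (K)

U = math.pi * d_imp * rpm / 60.0     # impeller tip speed, m/s
dT = slip * U ** 2 / cp              # actual stagnation temperature rise, K
PR = (1.0 + eta * dT / T01) ** (gamma / (gamma - 1.0))
print(round(U, 1), round(PR, 2))
```

    With these assumptions the tip speed comes out near 330 m/s and the pressure ratio near 2.1, in line with the design point reported above.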

  18. Improved Stereo Matching With Boosting Method

    Directory of Open Access Journals (Sweden)

    Shiny B

    2015-06-01

    Full Text Available This paper presents an approach based on classification for improving the accuracy of stereo matching methods; we propose this method for occlusion handling. The work employs classification of pixels to find erroneous disparity values. Owing to the wide applications of disparity maps in 3D television, medical imaging, etc., the accuracy of the disparity map has high significance. An initial disparity map is obtained from the input stereo image pair using local or global stereo matching methods. Various features for classification are computed from the input stereo image pair and the obtained disparity map, and the resulting feature vector is used to classify pixels with GentleBoost as the classification method. The erroneous disparity values found by classification are corrected through a completion or filling stage. A performance evaluation of stereo matching using AdaBoostM1, RUSBoost, neural networks, and GentleBoost is performed.

  19. Boost Breaking in the EFT of Inflation

    CERN Document Server

    Delacretaz, Luca V; Senatore, Leonardo

    2015-01-01

    If time-translations are spontaneously broken, so are boosts. This symmetry breaking pattern can be non-linearly realized by either just the Goldstone boson of time translations, or by four Goldstone bosons associated with time translations and boosts. In this paper we extend the Effective Field Theory of Multifield Inflation to consider the case in which the additional Goldstone bosons associated with boosts are light and coupled to the Goldstone boson of time translations. The symmetry breaking pattern forces a coupling to curvature so that the mass of the additional Goldstone bosons is predicted to be equal to $\\sqrt{2}H$ in the vast majority of the parameter space where they are light. This pattern therefore offers a natural way of generating self-interacting particles with Hubble mass during inflation. After constructing the general effective Lagrangian, we study how these particles mix and interact with the curvature fluctuations, generating potentially detectable non-Gaussian signals.

  20. Centrifugal compressor design for electrically assisted boost

    International Nuclear Information System (INIS)

    Electrically assisted boost is a prominent method for solving the issue of transient lag in a turbocharger, and it allows the compressor to remain at an optimized operating condition because it is decoupled from the turbine. Usually a centrifugal compressor for gasoline engine boosting is operated at a high rotational speed that is beyond the capability of electric motors on the market. In this paper a centrifugal compressor with a rotational speed of 120k RPM and a pressure ratio of 2.0 is specially developed for electrically assisted boost. A centrifugal compressor comprising the impeller, vaneless diffuser, and volute is designed by the meanline method followed by 3D detailed design. A CFD method is then employed to predict and analyse the performance of the designed compressor. The results show that the pressure ratio and efficiency at the design point are 2.07 and 78%, respectively.