WorldWideScience

Sample records for based decision-tree models

  1. Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Problem statement: In Thai speech synthesis using a Hidden Markov model (HMM)-based synthesis system, the tonal speech quality is degraded by tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit, since tone contributes to the intelligibility of the synthesized speech. The tone questions and other phonetic questions in the tree-based context clustering process therefore need to be established accordingly. Approach: This study describes the analysis of questions in the tree-based context clustering process of an HMM-based speech synthesis system for the Thai language. In the system, spectrum, pitch (F0), and state duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using a decision-tree-based context clustering technique. The contextual factors that affect spectrum, pitch, and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type, and tone type, are taken into account when constructing the questions of the decision tree. In total, thirteen sets of questions are analyzed and compared. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from those thirteen sets and by calculating the dominance score given to each question as the reciprocal of the distance from the root node to the question node. The phonetic type set has the highest count and dominance score, followed by the part-of-speech and tone type sets in second and third place. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. All in all, the analysis results bring about further development of Thai speech synthesis with efficient context clustering process in
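The counting and dominance-score analysis described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the nested-dict tree layout, the question-set names, and the convention that the root sits at distance 1 (so that the reciprocal is defined for the root) are all assumptions.

```python
from collections import defaultdict

def analyze_tree(tree, depth=1, counts=None, scores=None):
    """Accumulate, per question set, the node count and the dominance score.

    `tree` is a hypothetical nested dict:
        {"set": <question-set name>, "yes": subtree, "no": subtree}
    and a leaf is None. The root is assigned depth 1 (an assumption), so
    each question node contributes 1/depth to its set's dominance score.
    """
    if counts is None:
        counts, scores = defaultdict(int), defaultdict(float)
    if tree is None:
        return counts, scores
    counts[tree["set"]] += 1
    scores[tree["set"]] += 1.0 / depth
    analyze_tree(tree.get("yes"), depth + 1, counts, scores)
    analyze_tree(tree.get("no"), depth + 1, counts, scores)
    return counts, scores

# Toy tree with made-up question sets, for illustration only.
toy_tree = {
    "set": "phone_type",
    "yes": {
        "set": "tone_type",
        "yes": None,
        "no": {"set": "phone_type", "yes": None, "no": None},
    },
    "no": {"set": "part_of_speech", "yes": None, "no": None},
}
counts, scores = analyze_tree(toy_tree)
```

Ranking the question sets by `counts` or by `scores` then gives a priority ordering of the kind the abstract describes.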

  2. Automated soil resources mapping based on decision tree and Bayesian predictive modeling

    Institute of Scientific and Technical Information of China (English)

    周斌; 张新刚; 王人潮

    2004-01-01

    This article presents two approaches for automated building of knowledge bases of soil resources mapping. These methods used decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than using the conventional knowledge acquisition approach. The knowledge bases built by these two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China using TM bi-temporal imageries and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared to an existing soil map based on field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping the distribution model of soil classes over the study area.

  3. Automated soil resources mapping based on decision tree and Bayesian predictive modeling

    Institute of Scientific and Technical Information of China (English)

    周斌; 张新刚; 王人潮

    2004-01-01

    This article presents two approaches for automated building of knowledge bases of soil resources mapping. These methods used decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than using the conventional knowledge acquisition approach. The knowledge bases built by these two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China using TM bi-temporal imageries and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared to an existing soil map based on field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping the distribution model of soil classes over the study area.

  4. Assessment of Groundwater Potential Based on Multicriteria Decision Making Model and Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Huajie Duan

    2016-01-01

    Groundwater plays an important role in global climate change and in satisfying human needs. In the study, RS (remote sensing) and GIS (geographic information system) were utilized to generate five thematic layers (lithology, lineament density, topology, slope, and river density) considered as factors influencing the groundwater potential. Then, the multicriteria decision model (MCDM) was integrated with C5.0 and CART, respectively, to generate the decision tree with 80 surveyed tube wells divided into four classes on the basis of the yield. To test the precision of the decision tree algorithms, 10-fold cross validation and the kappa coefficient were adopted, and the average kappa coefficient for C5.0 and CART was 90.45% and 85.09%, respectively. After applying the decision tree to the whole study area, four classes of groundwater potential zones were demarcated. According to the classification result, the four grades of groundwater potential zones, “very good,” “good,” “moderate,” and “poor,” occupy 4.61%, 8.58%, 26.59%, and 60.23%, respectively, with the C5.0 algorithm, and 4.68%, 10.09%, 26.10%, and 59.13%, respectively, with the CART algorithm. Therefore, we can draw the conclusion that the C5.0 algorithm is more appropriate than CART for groundwater potential zone prediction.
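For readers unfamiliar with the kappa coefficient used here to compare C5.0 and CART, the following is a minimal sketch of Cohen's kappa computed from a confusion matrix. The matrix values are invented for illustration and are not data from the study; the abstract reports kappa as a percentage, so 90.45% corresponds to 0.9045 on the scale returned below.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of matching row and column marginals.
    p_chance = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical two-class confusion matrix, for illustration only.
example = [[20, 5],
           [10, 15]]
kappa = cohens_kappa(example)
```

In a 10-fold cross validation, one such value would be computed per fold and the ten values averaged.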

  5. Statistical Decision-Tree Models for Parsing

    CERN Document Server

    Magerman, D M

    1995-01-01

    Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing n-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall ...

  6. A decision-tree-based model for evaluating the thermal comfort of horses

    Directory of Open Access Journals (Sweden)

    Ana Paula de Assis Maia

    2013-12-01

    Thermal comfort is of great importance in preserving body temperature homeostasis during thermal stress conditions. Although the thermal comfort of horses has been widely studied, there is no report of its relationship with surface temperature (T_S). This study aimed to assess the potential of data mining techniques as a tool to associate surface temperature with the thermal comfort of horses. T_S was obtained using infrared thermography image processing. Physiological and environmental variables were used to define the predicted class, which classified thermal comfort as "comfort" or "discomfort". The armpit, croup, breast, and groin T_S of horses and the predicted classes were then subjected to a machine learning process. All variables in the dataset were considered relevant for the classification problem, and the decision-tree model yielded an accuracy rate of 74%. The feature selection methods used to reduce computational cost and simplify predictive learning decreased model accuracy to 70%; however, the model became simpler, with easily interpretable rules. For both these selection methods and for the classification using all attributes, armpit and breast T_S had a higher power rating for predicting thermal comfort. Data mining techniques show promise in the discovery of new variables associated with the thermal comfort of horses.

  7. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    Science.gov (United States)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses a decision tree as the base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two-step process. First, a theoretical study of the five selected split measures is carried out and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are verified by empirical analysis, in which a random forest is generated using each of the five selected split measures, chosen one at a time, i.e., random forest using information gain, random forest using gain ratio, etc. Second, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the random forest are generated using different split measures. The model is augmented by weighted voting based on the strength of each individual tree. The new approach has shown a notable increase in the accuracy of random forest.
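The difference between plain majority voting and the weighted voting mentioned in this abstract can be illustrated with a short sketch. The abstract does not say how tree strength is measured, so the per-tree weights below are hypothetical placeholders (they could, for instance, be validation accuracies).

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across trees.

    predictions: one predicted label per tree.
    weights: one non-negative weight per tree (hypothetical "strength" values).
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three trees: two weak trees vote "b", one strong tree votes "a".
tree_predictions = ["a", "b", "b"]
majority = weighted_vote(tree_predictions, [1.0, 1.0, 1.0])  # equal weights = majority vote
weighted = weighted_vote(tree_predictions, [0.9, 0.3, 0.4])  # strength-weighted vote
```

With equal weights the function reduces to majority voting, which makes the effect of the weights easy to compare on the same set of predictions.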

  8. English BNP identification based on corpus-trained decision tree

    Institute of Scientific and Technical Information of China (English)

    孟遥; 赵铁军; 李生; 张晓光

    2001-01-01

    Finding simple, non-recursive base noun phrases (Base NPs) is an important step for many natural language processing applications. This paper presents a new corpus-based approach using a decision tree for that purpose. In contrast to previous methods for Base NP identification, we adopt a decision tree trained from the Penn Treebank to identify Base NPs. A self-learning mechanism is further integrated into our model. Experimental results show good performance using our method. The method can also be applied to the processing of any other language.

  9. INDUCTION OF DECISION TREES BASED ON A FUZZY NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    Tang Bin; Hu Guangrui; Mao Xiaoquan

    2002-01-01

    Based on a fuzzy neural network, the letter presents an approach for the induction of decision trees. The approach makes use of the weights of fuzzy mappings in the fuzzy neural network which has been trained. It can realize the optimization of fuzzy decision trees by branch cutting, and improve the ratio of correctness and efficiency of the induction of decision trees.

  10. Mapping environmental susceptibility to Saint Louis encephalitis virus, based on a decision tree model of remotely sensed data

    Directory of Open Access Journals (Sweden)

    Camilo H. Rotela

    2011-11-01

    In response to the first human outbreak (January - May 2005) of Saint Louis encephalitis (SLE) virus in Córdoba province, Argentina, we developed an environmental SLE virus risk map for the capital, i.e. Córdoba city. The aim was to provide a map capable of detecting macro-environmental factors associated with the spatial distribution of SLE cases, based on remotely sensed data and a geographical information system. Vegetation, soil brightness, humidity status, distances to water-bodies and areas covered by vegetation were assessed based on pre-outbreak images provided by the Landsat 5TM satellite. A strong inverse relationship between the number of humans infected by SLEV and distance to high-vigor vegetation was noted. A statistical non-hierarchic decision tree model was constructed, based on environmental variables representing the areas surrounding patient residences. From this point of view, 18% of the city could be classified as being at high risk for SLEV infection, while 34% carried a low risk, or none at all. Taking the whole 2005 epidemic into account, 80% of the cases came from areas classified by the model as medium-high or high risk. Almost 46% of the cases were registered in high-risk areas, while there were no cases (0%) in areas affirmed as risk free.

  11. Decision tree modeling with relational views

    CERN Document Server

    Bentayeb, Fadila

    2002-01-01

    Data mining is a useful decision support technique that can be used to discover production rules in warehouses or corporate data. Data mining research has made much effort to apply various mining algorithms efficiently on large databases. However, a serious problem in their practical application is the long processing time of such algorithms. Nowadays, one of the key challenges is to integrate data mining methods within the framework of traditional database systems. Indeed, such implementations can take advantage of the efficiency provided by SQL engines. In this paper, we propose an integrating approach for decision trees within a classical database system. In other words, we try to discover knowledge from relational databases, in the form of production rules, via a procedure embedding SQL queries. The obtained decision tree is defined by successive, related relational views. Each view corresponds to a given population in the underlying decision tree. We selected the classical Induction Decision Tree (ID3) a...

  12. Decision Tree Model for Non-Fatal Road Accident Injury

    Directory of Open Access Journals (Sweden)

    Fatin Ellisya Sapri

    2017-02-01

    Non-fatal road accident injury has become a great concern, as it is associated with injury and sometimes leads to the disability of the victims. Hence, this study aims to develop a model that explains the factors that contribute to non-fatal road accident injury severity. Sample data on 350 non-fatal road accident cases from the year 2016 were obtained from Kota Bharu District Police Headquarters, Kelantan. The explanatory variables include road geometry, collision type, accident time, accident causes, vehicle type, age, airbag, and gender. The predictive data mining techniques of decision tree modeling and multinomial logistic regression were used to model non-fatal road accident injury severity. Based on accuracy rate, the decision tree with the CART algorithm was found to be more accurate than the logistic regression model. The factors that significantly contribute to non-fatal traffic crash injury severity are accident cause, road geometry, vehicle type, age, and collision type.

  13. CUDT: a CUDA based decision tree algorithm.

    Science.gov (United States)

    Lo, Win-Tsung; Chang, Yue-Shan; Sheu, Ruey-Kai; Chiu, Chun-Chieh; Yuan, Shyan-Ming

    2014-01-01

    Decision tree is one of the best-known classification methods in data mining. Much research has focused on improving the performance of decision trees. However, those algorithms are developed and run on traditional distributed systems. Without the help of new technology, the latency of processing the huge volumes of data generated by ubiquitous sensing nodes cannot be improved. To reduce data processing latency in large-scale data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.

  14. Extracting impervious surfaces from multi-source satellite imagery based on unified conceptual model by decision tree algorithm

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Extraction of impervious surfaces is one of the necessary processes in urban change detection. This paper derived a unified conceptual model (UCM) from the vegetation-impervious surface-soil (VIS) model to make the extraction more effective and accurate. UCM uses the decision tree algorithm with indices of spectrum and texture, etc. In this model, we found both dependent and independent indices for multi-source satellite imagery according to their similarity and dissimilarity. The purpose of the indices is to remove the other land-use and land-cover types (e.g., vegetation and soil) from the imagery, and delineate the impervious surfaces as the result. UCM has the same steps conducted by decision tree algorithm. The Landsat-5 TM image (30 m) and the Satellite Probatoire d’Observation de la Terre (SPOT-4) image (20 m) from Chaoyang District (Beijing) in 2007 were used in this paper. The results show that the overall accuracy in Landsat-5 TM image is 88%, while 86.75% in SPOT-4 image. It is an appropriate method to meet the demand of urban change detection.

  15. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Directory of Open Access Journals (Sweden)

    Barbara Kraszewska-Głomba

    2016-03-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30), the rule’s overall accuracy was 76.4% and 90% respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context.

  16. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-03-10

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30), the rule's overall accuracy was 76.4% and 90% respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context.
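The two-threshold decision tree reported in this abstract is small enough to write out directly. The sketch below only transcribes the published thresholds into a function; the function and parameter names are illustrative assumptions, and, as the authors note, the rule comes from a small sample and should be read with the clinical context in mind.

```python
def classify_fever_episode(crp_mg_per_l, pct_ng_per_ml):
    """Apply the C4.5-derived rule from the abstract.

    crp_mg_per_l: C-reactive protein level in mg/L.
    pct_ng_per_ml: procalcitonin level in ng/mL.
    """
    if crp_mg_per_l <= 19.1:
        return "viral infection"
    if pct_ng_per_ml > 0.65:
        return "bacterial infection"
    return "PFAPA"

# Made-up input values chosen to exercise each branch.
low_crp = classify_fever_episode(10.0, 2.0)
high_crp_high_pct = classify_fever_episode(80.0, 1.2)
high_crp_low_pct = classify_fever_episode(80.0, 0.3)
```

Note that the first split depends on CRP alone, so PCT is only consulted for cases with CRP above 19.1 mg/L.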

  17. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    Decision tree is one of the best-known classification methods in data mining. Much research has focused on improving the performance of decision trees. However, those algorithms are developed and run on traditional distributed systems. Without the help of new technology, the latency of processing the huge volumes of data generated by ubiquitous sensing nodes cannot be improved. To reduce data processing latency in large-scale data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.

  18. Minimum description length criterion based decision tree dynamic pruning method in speech recognition

    Institute of Scientific and Technical Information of China (English)

    XU Xianghua; HE Lin

    2006-01-01

    In phonetic decision tree based state tying, decision trees with varying leaf nodes denote models with different complexity. By studying the influence of model complexity on system performance and speaker adaptation, a decision tree dynamic pruning method based on Minimum Description Length (MDL) criterion is presented. In the method, a well-trained, large-sized phonetic decision tree is selected as an initial model set, and model complexity is computed by adding a penalty parameter which alters according to the amount of adaptation data. Largely attributed to the reasonable selection of initial models and the integration of stochastic and aptotic of MDL criterion, the proposed method gains high performance by combining with speaker adaptation.

  19. Generating Decision Trees Method Based on Improved ID3 Algorithm

    Institute of Scientific and Technical Information of China (English)

    Yang Ming; Guo Shuxu; Wang Jun

    2011-01-01

    The ID3 algorithm is a classical decision tree learning algorithm in data mining. The algorithm tends to choose the attribute with more values, which affects the efficiency of classification and prediction when building a decision tree. This article proposes a new approach based on an improved ID3 algorithm. The new algorithm introduces an importance factor λ when calculating the information entropy. It can strengthen the label of important attributes of a tree and reduce the label of non-important attributes. The algorithm overcomes the flaw of the traditional ID3 algorithm, which tends to choose the attributes with more values, and also improves the efficiency and flexibility of the process of generating decision trees.
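The bias this abstract refers to, ID3 favoring attributes with many values, can be reproduced with a few lines of standard entropy and information-gain code. The sketch below shows only the classical ID3 measure on a made-up four-row dataset; it does not implement the importance factor λ, whose exact formula the abstract does not give.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Classical ID3 information gain of splitting `labels` by attribute `values`."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data (hypothetical): a unique record ID versus a two-valued attribute.
labels = ["yes", "yes", "no", "no"]
record_id = [1, 2, 3, 4]       # one distinct value per row
windy = ["t", "f", "t", "f"]   # carries no information about the label here

gain_id = information_gain(record_id, labels)
gain_windy = information_gain(windy, labels)
```

The record ID obtains the maximum possible gain even though it cannot generalize to new rows, which is the flaw that corrections such as the paper's λ factor are meant to counter.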

  20. Soil Organic Matter Mapping by Decision Tree Modeling

    Institute of Scientific and Technical Information of China (English)

    ZHOU Bin; ZHANG Xing-Gang; WANG Fan; WANG Ren-Chao

    2005-01-01

    Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with some extensive, easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape factor-associated regular system. This system could be used to predict continuous SOM spatial distribution. By analyzing factors such as elevation, geological unit, soil type, land use, remotely sensed data, upslope contributing area, slope, aspect, planform curvature, and profile curvature, the decision tree could predict the distribution of soil organic matter levels. Among these factors, elevation, land use, aspect, soil type, the first principal component of bitemporal Landsat TM, and upslope contributing area were considered the most important variables for predicting SOM. Results of the prediction between SOM content and landscape types sorted by the decision tree showed a close relationship with an accuracy of 81.1%.

  1. Vehicle diagnostics based on decision trees

    OpenAIRE

    Gofman, Yevgeniy

    2012-01-01

    This article presents a diagnostic method for the T-150 model of the Lanos car, designed for the enterprise CJSC "ZAZ". The method is implemented to improve the quality of the diagnostic process and to increase the level of automation in the enterprise as a whole.

  2. Multitask Efficiencies in the Decision Tree Model

    CERN Document Server

    Drucker, Andrew

    2008-01-01

    In Direct Sum problems [KRW], one tries to show that for a given computational model, the complexity of computing a collection $F = \{f_i\}$ of functions on independent inputs is approximately the sum of their individual complexities. In this paper, by contrast, we study the diversity of ways in which the joint computational complexity can behave when all the $f_i$ are evaluated on a \textit{common} input. Fixing some model of computational cost, let $C_F(X): \{0, 1\}^l \to \mathbf{R}$ give the cost of computing the subcollection $\{f_i(x): X_i = 1\}$, on common input $x$. What constraints do the functions $C_F(X)$ obey, when $F$ is chosen freely? $C_F(X)$ will, for reasonable models, obey nonnegativity, monotonicity, and subadditivity. We show that, in the deterministic, adaptive query model, these are `essentially' the only constraints: for any function $C(X)$ obeying these properties and any $\epsilon > 0$, there exists a family $F$ of boolean functions and a $T > 0$ such that for all $X \in \{0, 1\}^l$, \...
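The three constraints named in this abstract can be checked by brute force for small $l$. The sketch below assumes the usual readings, which the abstract itself does not spell out: monotonicity means $X \le Y$ coordinatewise implies $C(X) \le C(Y)$, and subadditivity means $C(X \vee Y) \le C(X) + C(Y)$. The two example cost functions are hypothetical.

```python
from itertools import product

def satisfies_constraints(cost, l):
    """Check nonnegativity, monotonicity, and subadditivity of `cost` on {0,1}^l.

    `cost` maps a tuple of l bits (which functions to compute) to a number.
    """
    points = list(product((0, 1), repeat=l))
    for x in points:
        if cost(x) < 0:
            return False  # nonnegativity violated
        for y in points:
            # Monotonicity: computing a superset of functions never costs less.
            if all(a <= b for a, b in zip(x, y)) and cost(x) > cost(y):
                return False
            # Subadditivity: computing the union costs at most the sum.
            union = tuple(a | b for a, b in zip(x, y))
            if cost(union) > cost(x) + cost(y):
                return False
    return True

capped_count = lambda x: min(sum(x), 2)   # hypothetical cost: saturates at 2
squared_count = lambda x: sum(x) ** 2     # hypothetical cost: superadditive

ok = satisfies_constraints(capped_count, 3)
not_ok = satisfies_constraints(squared_count, 2)
```

The check is exponential in $l$ and is only meant to make the three properties concrete, not to be used at scale.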

  3. Modelling of Random Textured Tandem Silicon Solar Cells Characteristics: Decision Tree Approach

    Directory of Open Access Journals (Sweden)

    R.S. Kamath

    2016-11-01

    We report decision tree (DT) modeling of the characteristics of randomly textured tandem silicon solar cells. Photovoltaic modules of silicon-based solar cells are extremely popular due to their high efficiency and longer lifetime. The decision tree model is one of the most common data mining models and can be used for predictive analytics. The reported investigation depicts the optimum decision tree architecture achieved by tuning parameters such as Min split, Min bucket, Max depth, and Complexity. The DT model thus derived is easy to understand and entails the recursive partitioning approach implemented in the “rpart” package. Moreover, the performance of the model is evaluated with reference to the Mean Square Error (MSE) estimate of error rate. The modeling of the random textured silicon solar cells reveals a strong correlation of efficiency with “Fill factor” and “thickness of a-Si layer”.

  4. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

    This paper proposes a decision tree model for specifying the importance of 21 factors causing landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance and are usually represented by an easy-to-interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%), compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as the most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
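The 10-fold cross-validation used to test these models follows a simple pattern: shuffle the sample indices, cut them into ten folds, and hold out each fold once. The sketch below shows only that index bookkeeping with the standard library; the function name, the fixed seed, and the sample count are illustrative assumptions, and model training is left out.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Each sample lands in exactly one test fold across the ten splits.
splits = list(k_fold_indices(25, k=10))
```

A reported accuracy such as the 82.0% above would then typically be the mean of the ten per-fold accuracies.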

  5. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    Energy Technology Data Exchange (ETDEWEB)

    Bou Kheir, Rania, E-mail: rania.boukheir@agrsci.d [Lebanese University, Faculty of Letters and Human Sciences, Department of Geography, GIS Research Laboratory, P.O. Box 90-1065, Fanar (Lebanon); Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Greve, Mogens H. [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Abdallah, Chadi [National Council for Scientific Research, Remote Sensing Center, P.O. Box 11-8281, Beirut (Lebanon); Dalgaard, Tommy [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark)

    2010-02-15

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.

  6. Cost Effectiveness of Imiquimod 5% Cream Compared with Methyl Aminolevulinate-Based Photodynamic Therapy in the Treatment of Non-Hyperkeratotic, Non-Hypertrophic Actinic (Solar) Keratoses: A Decision Tree Model

    OpenAIRE

    Wilson, Edward C F

    2010-01-01

    Background: Actinic keratosis (AK) is caused by chronic exposure to UV radiation (sunlight). First-line treatments are cryosurgery, topical 5-fluorouracil (5-FU) and topical diclofenac. Where these are contraindicated or less appropriate, alternatives are imiquimod and photodynamic therapy (PDT). Objective: To compare the cost effectiveness of imiquimod and methyl aminolevulinate-based PDT (MAL-PDT) from the perspective of the UK NHS. Methods: A decision tree model was populated with data fro...

  7. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    Science.gov (United States)

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  8. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Bøcher, Peder Klith; Greve, Mette Balslev

    2010-01-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled areas in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field...

  9. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    Science.gov (United States)

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time.

  10. Computer Crime Forensics Based on Improved Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Ying Wang

    2014-04-01

    Full Text Available To find crime-related evidence and association rules among massive data, classic decision tree algorithms such as ID3 have been used for classification analysis in related prototype systems. How to make such algorithms more suitable for computer forensics in variable environments has become a hot issue. When selecting classification attributes, ID3 relies on the computation of information entropy, and the attributes with more values are then selected as classification nodes of the decision tree. Such a classification is unrealistic in many cases. The ID3 algorithm also involves many logarithm computations, so it is complicated to handle datasets with various classification attributes. Therefore, to meet the special demands of computer crime forensics, the ID3 algorithm is improved and a novel classification attribute selection method based on the Maclaurin-Priority Value First method is proposed. It adopts the change-of-base formula and infinitesimal substitution to simplify the logarithms in ID3. To compensate for the errors generated in this process, the simplified formulas are multiplied by an appropriate constant. The idea of Priority Value First is introduced to solve the problem of value deviation. The performance of the improved method is proved in theory. Finally, experiments verify that the scheme has advantages in computation time and classification accuracy compared with ID3 and two existing algorithms.
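    As a rough illustration of the entropy computation that the abstract says ID3 relies on, the sketch below computes Shannon entropy and information gain for a toy dataset. It shows only the standard ID3 criterion, not the Maclaurin-based simplification proposed in the paper; the records, attribute names (`protocol`, `flagged`) and class labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records; not taken from the paper.
rows = [
    {"protocol": "tcp", "flagged": "yes"},
    {"protocol": "tcp", "flagged": "no"},
    {"protocol": "udp", "flagged": "yes"},
    {"protocol": "udp", "flagged": "no"},
]
labels = ["attack", "normal", "attack", "normal"]

# ID3 picks the attribute with the largest information gain as the next node.
best = max(["protocol", "flagged"], key=lambda a: information_gain(rows, labels, a))
```

    Every call to `entropy` evaluates one logarithm per class, which is the cost the abstract describes as motivating a simplified formula.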

  11. Parallelism of spatial data mining based on autocorrelation decision tree

    Institute of Scientific and Technical Information of China (English)

    Zhang Shuyu; Zhu Zhongying

    2005-01-01

    The definition and theory of the autocorrelation decision tree (ADT) are introduced. In spatial data mining, parallel spatial queries are very expensive operations. A new parallel algorithm based on the autocorrelation decision tree is presented. The new method reduces CPU and I/O time and improves the query efficiency of spatial data. It also provides better control and optimization of dynamic load balancing. An experimental performance comparison shows that the improved algorithm can obtain an optimal speedup with the same number of processors and accesses nodes more completely. An implementation of intelligent information retrieval for spatial data mining is also presented.

  12. Decision-tree induction from self-mapping space based on web

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shu-yu; ZHU Zhong-ying

    2007-01-01

    An improved decision tree method for web information retrieval with self-mapping attributes is proposed. The self-mapping tree has a value of a self-mapping attribute in each internal node, together with information based on the dissimilarity between a pair of mapping sequences. The method selects the self-mapping that exists between data by exhaustive search based on relation and attribute information. Experimental results confirm that the improved method constructs a comprehensive and accurate decision tree. Moreover, an example shows that the self-mapping decision tree is promising for data mining and knowledge discovery.

  13. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open loop mode selection module in the AMR-WB+.

  14. PERFORMANCE EVALUATION OF C-FUZZY DECISION TREE BASED IDS WITH DIFFERENT DISTANCE MEASURES

    Directory of Open Access Journals (Sweden)

    Vinayak Mantoor

    2012-01-01

    Full Text Available With the ever-increasing growth of computer networks and the emergence of electronic commerce in recent years, computer security has become a priority. An intrusion detection system (IDS) is often used as another wall of protection in addition to intrusion prevention techniques. This paper introduces the concept and design of decision trees based on fuzzy clustering. Fuzzy clustering is the core functional part of the overall decision tree development, and the developed trees are referred to as C-fuzzy decision trees. The distance measure plays an important role in clustering data points, and choosing the right distance measure for a given dataset is a non-trivial problem. In this paper, we study the performance of a C-fuzzy decision tree based IDS with different distance measures. We analyzed the results of our study using KDD Cup 1999 data and compared the accuracy of the classifier with different distance measures.

  15. Optimized block-based connected components labeling with decision trees.

    Science.gov (United States)

    Grana, Costantino; Borghesani, Daniele; Cucchiara, Rita

    2010-06-01

    In this paper, we define a new paradigm for eight-connection labeling, which employs a general approach to improve neighborhood exploration and minimizes the number of memory accesses. First, we exploit and extend the decision table formalism introducing OR-decision tables, in which multiple alternative actions are managed. An automatic procedure to synthesize the optimal decision tree from the decision table is used, providing the most effective conditions evaluation order. Second, we propose a new scanning technique that moves on a 2 x 2 pixel grid over the image, which is optimized by the automatically generated decision tree. An extensive comparison with the state of art approaches is proposed, both on synthetic and real datasets. The synthetic dataset is composed of different sizes and densities random images, while the real datasets are an artistic image analysis dataset, a document analysis dataset for text detection and recognition, and finally a standard resolution dataset for picture segmentation tasks. The algorithm provides an impressive speedup over the state of the art algorithms.

  16. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study

    Science.gov (United States)

    Ramezankhani, Azra; Hadavandi, Esmaeil; Pournik, Omid; Shahrabi, Jamal; Azizi, Fereidoun; Hadaegh, Farzad

    2016-01-01

    Objective The current study was undertaken for use of the decision tree (DT) method for development of different prediction models for incidence of type 2 diabetes (T2D) and for exploring interactions between predictor variables in those models. Design Prospective cohort study. Setting Tehran Lipid and Glucose Study (TLGS). Methods A total of 6647 participants (43.4% men) aged >20 years, without T2D at baselines ((1999–2001) and (2002–2005)), were followed until 2012. 2 series of models (with and without 2-hour postchallenge plasma glucose (2h-PCPG)) were developed using 3 types of DT algorithms. The performances of the models were assessed using sensitivity, specificity, area under the ROC curve (AUC), geometric mean (G-Mean) and F-Measure. Primary outcome measure T2D was primary outcome which defined if fasting plasma glucose (FPG) was ≥7 mmol/L or if the 2h-PCPG was ≥11.1 mmol/L or if the participant was taking antidiabetic medication. Results During a median follow-up of 9.5 years, 729 new cases of T2D were identified. The Quick Unbiased Efficient Statistical Tree (QUEST) algorithm had the highest sensitivity and G-Mean among all the models for men and women. The models that included 2h-PCPG had sensitivity and G-Mean of (78% and 0.75%) and (78% and 0.78%) for men and women, respectively. Both models achieved good discrimination power with AUC above 0.78. FPG, 2h-PCPG, waist-to-height ratio (WHtR) and mean arterial blood pressure (MAP) were the most important factors to incidence of T2D in both genders. Among men, those with an FPG≤4.9 mmol/L and 2h-PCPG≤7.7 mmol/L had the lowest risk, and those with an FPG>5.3 mmol/L and 2h-PCPG>4.4 mmol/L had the highest risk for T2D incidence. In women, those with an FPG≤5.2 mmol/L and WHtR≤0.55 had the lowest risk, and those with an FPG>5.2 mmol/L and WHtR>0.56 had the highest risk for T2D incidence. Conclusions Our study emphasises the utility of DT for exploring interactions between
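    The performance measures named in this abstract can be computed from a confusion matrix. The sketch below uses the commonly used definitions (G-Mean as the geometric mean of sensitivity and specificity, F-Measure as the harmonic mean of precision and sensitivity); the abstract does not spell out its formulas, so these definitions are an assumption, and the counts are hypothetical rather than taken from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Common binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    g_mean = math.sqrt(sensitivity * specificity)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "f_measure": f_measure,
    }


# Hypothetical counts for an imbalanced outcome (100 cases, 900 non-cases).
metrics = classification_metrics(tp=80, fn=20, tn=720, fp=180)
```

    With imbalanced outcomes such as incident diabetes, G-Mean rewards models that do well on both classes, which plain accuracy does not.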

  17. Supervised learning with decision tree-based methods in computational and systems biology.

    Science.gov (United States)

    Geurts, Pierre; Irrthum, Alexandre; Wehenkel, Louis

    2009-12-01

    At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology.

  18. Diagnosis of Constant Faults in Read-Once Contact Networks over Finite Bases using Decision Trees

    KAUST Repository

    Busbait, Monther I.

    2014-05-01

    We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. Also, we obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant faults depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients for the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we create, to obtain the sharp coefficients.

  19. Preventing KPI Violations in Business Processes based on Decision Tree Learning and Proactive Runtime Adaptation

    Directory of Open Access Journals (Sweden)

    Dimka Karastoyanova

    2012-01-01

    Full Text Available The performance of business processes is measured and monitored in terms of Key Performance Indicators (KPIs). If the monitoring results show that the KPI targets are violated, the underlying reasons have to be identified and the process should be adapted accordingly to address the violations. In this paper we propose an integrated monitoring, prediction and adaptation approach for preventing KPI violations of business process instances. KPIs are monitored continuously while the process is executed. Additionally, based on KPI measurements of historical process instances, we use decision tree learning to construct classification models which are then used to predict the KPI value of an instance while it is still running. If a KPI violation is predicted, we identify adaptation requirements and adaptation strategies in order to prevent the violation.

  20. CLOUD DETECTION BASED ON DECISION TREE OVER TIBETAN PLATEAU WITH MODIS DATA

    Directory of Open Access Journals (Sweden)

    L. Xu

    2012-07-01

    Full Text Available Snow cover area is a critical parameter for the hydrologic cycle of the Earth, and it is a key factor in the effects of climate change. An unavoidable problem in mapping snow cover is the presence of clouds. Clouds are usually easy to find in satellite images because they are bright and white in the visible wavelengths, but this is not the case when there is snow or ice in the background, since snow and clouds have a similar spectral appearance. Many cloud detection methods are built on decision trees designed from empirical studies and simulations. In this paper, classification trees were used to build the decision tree. Then, using many repeated scenes of the same area, each cloud pixel can be replaced by its real surface type, such as snow, vegetation or water. Clouds can be distinguished in the short-wave infrared. The results show that most of the cloud coverage was removed. A validation was carried out for all subsequent steps, which led to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfactorily.

  1. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    Science.gov (United States)

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, which involves subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  2. A software outsourcing evaluation model based on a decision tree classification algorithm for uncertain data

    Institute of Scientific and Technical Information of China (English)

    赵娟; 王明春; 李小亮

    2011-01-01

    The classic decision tree algorithm is unable to cope with the uncertain data that pervade both the tree construction and classification phases. To overcome this limitation, a distribution-based classification algorithm is proposed, which extends the decision tree technique to an uncertain environment. The algorithm is then applied to the uncertain data that are common in software outsourcing evaluation in order to evaluate software outsourcing companies objectively. Experiments show that the proposed decision tree classification algorithm for uncertain data enables quantitative study of software outsourcing evaluation.

  3. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    Science.gov (United States)

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming process. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R²(training) and R²(test), respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned to several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R²(training) and R²(test), respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R²(training) and R²(test) were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615.
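    For readers unfamiliar with the statistics quoted in this abstract, the sketch below computes a coefficient of determination and a mean absolute error from observed and predicted values. It assumes the usual definition R² = 1 − SS_res/SS_tot; the paper may define its test-set statistic differently, and the numbers here are hypothetical, not the phenol toxicity data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toxicity values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

r2 = r_squared(observed, predicted)
mae = mean_absolute_error(observed, predicted)
```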

  4. Design of TV Fault Repair Model Based on Decision Tree Algorithm

    Institute of Scientific and Technical Information of China (English)

    武彤; 程辉

    2014-01-01

    Before a television set comes to market, it must pass a series of quality-control checkpoints on the production line. Once a flaw is found at a checkpoint, the set goes to the repair line to be rechecked and repaired. There, repair workers usually determine the fault cause and locate the faulty component from their own working experience, which places very strict demands on the workers and does not improve repair efficiency. This paper studies a TV production line fault repair model based on the decision tree algorithm, which can accurately and quickly find the relationships among product type, fault symptom and fault cause. It thus saves the time spent looking for the fault cause and type and considerably raises repair productivity.

  5. Dynamic Security Assessment of Western Danish Power System Based on Ensemble Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Bak, Claus Leth; Chen, Zhe

    2014-01-01

    With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...... with outlier identification show high accuracy in the presence of variance and uncertainties due to wind power generation and other dispersed generation units. The performance of this approach is demonstrated on the operational model of western Danish power system with the scale of around 200 lines and 400...

  6. Preprocessing of Tandem Mass Spectrometric Data Based on Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    Jing-Fen Zhang; Si-Min He; Jin-Jin Cai; Xing-Jun Cao; Rui-Xiang Sun; Yan Fu; Rong Zeng; Wen Gao

    2005-01-01

    In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify the noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the following identification process. The experimental results show that this preprocessing method increases the search speed and the reliability of peptide identification.

  7. Research on Scholarship Evaluation System based on Decision Tree Algorithm

    Institute of Scientific and Technical Information of China (English)

    尹骁; 王明宇

    2015-01-01

    Under the modern education system of China, the annual scholarship evaluation is vital for many college students. This paper adopts the C4.5 decision tree classification algorithm, an improvement on the ID3 algorithm, and constructs a data set for the scholarship evaluation system by analyzing the related attributes in scholarship evaluation information. It also identifies factors that play a significant role in the development of college students through analysis of moral education, intellectual education, and culture & PE.
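    One well-known way in which C4.5 improves on ID3 is that it ranks attributes by gain ratio (information gain divided by split information) rather than by raw information gain, which penalises attributes with many distinct values. The sketch below illustrates that criterion only; the attributes (`student_id`, `grade_band`) and labels are hypothetical and are not the paper's scholarship data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Information gain of the split divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(attribute_values)  # entropy of the partition itself
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical records: an ID-like attribute and a meaningful one.
student_id = [1, 2, 3, 4]
grade_band = ["high", "high", "low", "low"]
labels = ["award", "award", "none", "none"]

ratio_id = gain_ratio(student_id, labels)
ratio_grade = gain_ratio(grade_band, labels)
```

    Both attributes separate the classes perfectly, so plain information gain cannot tell them apart, while the gain ratio favours `grade_band` over the ID-like attribute.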

  8. A Decision Tree-Structured Algorithm of Speaker Adaptation Based on Gaussian Similarity Analysis

    Institute of Scientific and Technical Information of China (English)

    WU Ji; WANG Zuoying

    2001-01-01

    The Gaussian Similarity Analysis (GSA) algorithm can be used to estimate the similarity between two Gaussian distributed variables with full covariance matrices. Based on this algorithm, we propose a method for speaker adaptation of covariance. It differs from traditional algorithms, which mainly focus on the adaptation of the mean vector of the state observation probability density. A binary decision tree is constructed offline with the similarity measure, and the adaptation procedure is data-driven. The experiments show that a significant further improvement over mean vector adaptation can be obtained.

  9. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Science.gov (United States)

    Pavlopoulos, Sotiris A; Stasis, Antonis CH; Loukis, Euripides N

    2004-01-01

    Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next a total of 100 features were determined for every heart sound signal and relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and Pruned decision tree classifiers were studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. 
    Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that describe S1, S2 and the systolic

  10. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Directory of Open Access Journals (Sweden)

    Loukis Euripides N

    2004-06-01

    Full Text Available Abstract Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next a total of 100 features were determined for every heart sound signal and relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and Pruned decision tree classifiers were studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that

  11. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    Directory of Open Access Journals (Sweden)

    Wan-Yu Chang

    2015-09-01

    Full Text Available In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.

  12. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative selection that filters applicants on their high school records, without a written test. Currently, this kind of selection does not have a standard model or criteria. Selection is done only by comparing candidates' application files, so subjective assessment is very likely because there are no standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes standard criteria such as area of origin, school status, average grade and so on. These criteria are determined using rules derived from classifying the academic achievement (GPA) of students from previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average grade above 75, and have at least one achievement during their high school studies.

  13. Reweighting with Boosted Decision Trees

    CERN Document Server

    Rogozhnikov, A

    2016-01-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers. In most cases, these are classification models used to select the "signal" events from data. Monte Carlo simulated events typically take part in training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting - assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of reweighting step in analyses is also discussed.

  14. Reweighting with Boosted Decision Trees

    Science.gov (United States)

    Rogozhnikov, Alex

    2016-10-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers [1]. In most cases, these are classification models used to select the “signal” events from data. Monte Carlo simulated events typically take part in training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting — assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of reweighting step in analyses is also discussed.
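
    As a point of reference for the reweighting idea described in this abstract, the sketch below shows the simpler one-dimensional histogram-ratio reweighting, not the boosted-decision-tree method the paper proposes. The function name `histogram_weights`, the bin edges and the toy samples are assumptions made for illustration only.

```python
from bisect import bisect_right

def histogram_weights(simulated, observed, edges):
    """Per-event weights that make the binned `simulated` distribution match `observed`.

    Plain histogram-ratio reweighting in one variable (illustrative only).
    """
    n_bins = len(edges) - 1

    def bin_index(x):
        # Values outside the edge range are clamped into the first or last bin.
        return min(max(bisect_right(edges, x) - 1, 0), n_bins - 1)

    sim_counts = [0] * n_bins
    obs_counts = [0] * n_bins
    for x in simulated:
        sim_counts[bin_index(x)] += 1
    for x in observed:
        obs_counts[bin_index(x)] += 1

    # Weight of a bin = observed fraction / simulated fraction.
    ratios = []
    for s, o in zip(sim_counts, obs_counts):
        if s == 0:
            ratios.append(0.0)  # no simulated events to reweight in this bin
        else:
            ratios.append((o / len(observed)) / (s / len(simulated)))
    return [ratios[bin_index(x)] for x in simulated]

# Toy data (hypothetical): simulation is flat, observation favours the upper bin.
simulated = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
observed = [0.1, 0.6, 0.7, 0.8]
weights = histogram_weights(simulated, observed, edges=[0.0, 0.5, 1.0])
```

    Histogram ratios become unreliable when many variables are binned at once, which is the kind of limitation a tree-based reweighter is meant to address.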

  15. A Decision Tree Based Pedometer and its Implementation on the Android Platform

    Directory of Open Access Journals (Sweden)

    Juanying Lin

    2015-02-01

    Full Text Available This paper describes a decision tree (DT) based pedometer algorithm and its implementation on Android. The DT-based pedometer can classify 3 gait patterns: walking on level ground (WLG), up stairs (WUS) and down stairs (WDS). It can discard irrelevant motion and count the user’s steps accurately. The overall classification accuracy is 89.4%. Accelerometer, gyroscope and magnetic field sensors are used in the device. When the user puts his/her smartphone into a pocket, the pedometer can automatically count steps of different gait patterns. Two methods are tested to map the acceleration from the mobile phone’s reference frame to the direction of gravity. Two significant features are employed to classify different gait patterns.
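
    The abstract does not say which two methods were used to map device-frame acceleration onto the direction of gravity. The sketch below shows one common approach as an assumption: estimate gravity with an exponential low-pass filter and project each sample onto that estimate. The function names, the filter constant `alpha` and the sample readings are hypothetical.

```python
import math

def low_pass(samples, alpha=0.9):
    """Exponential low-pass filter over (x, y, z) readings; the result approximates gravity."""
    estimate = list(samples[0])
    for sample in samples[1:]:
        estimate = [alpha * e + (1 - alpha) * s for e, s in zip(estimate, sample)]
    return estimate

def vertical_acceleration(accel, gravity):
    """Component of a device-frame acceleration sample along the gravity direction."""
    norm = math.sqrt(sum(g * g for g in gravity))
    if norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    return sum(a * g for a, g in zip(accel, gravity)) / norm

# Hypothetical readings in m/s^2: phone lying flat, then a sideways jolt.
readings = [(0.0, 0.0, 9.8), (0.0, 0.0, 9.8), (3.0, 4.0, 9.8)]
gravity = low_pass(readings[:2])
vertical = vertical_acceleration(readings[2], gravity)
```

    Because the jolt is perpendicular to the gravity estimate, the projected value stays close to 9.8 regardless of how the phone is oriented in the pocket.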

  16. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    DEFF Research Database (Denmark)

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe;

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, the phasing out of central power plants and cross-border power exchange on the dynamic security of the Danish power system. A contingency-based decision tree (DT) approach is used to assess the dynamic security of the present and future Danish power system. Results from offline time-domain simulation for a large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of the present and future power system. The approach is implemented in the DIgSILENT PowerFactory environment and applied to the western Danish power system, which is passing through a phase of major transformation. The results have shown that phasing out of central power plants coupled with large scale wind energy integration and more dependence on international ties can have...

  17. A New Architecture for Making Moral Agents Based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Meisam Azad-Manjiri

    2014-04-01

    Full Text Available Given the influence of robots in various fields of life, the question of whether they can be trusted is important, especially when a robot deals with people directly. One possible way to gain this trust is to add a moral dimension to robots. Therefore, we present a new architecture for building moral agents that learn from demonstrations. The agent is based on Beauchamp and Childress’s principles of biomedical ethics (a type of deontological theory) and uses a decision tree algorithm to abstract relationships between ethical principles and the morality of actions. We apply this architecture to build an agent that provides guidance to health care workers faced with ethical dilemmas. Our results show that the agent is able to learn ethics well.

  18. A hybrid model using decision tree and neural network for credit scoring problem

    Directory of Open Access Journals (Sweden)

    Amir Arzy Soltan

    2012-08-01

    Full Text Available Credit scoring is now an important issue for financial and monetary organizations, with a substantial impact on reducing the risks of customer attraction. Identifying high-risk customers can reduce final costs. Accurate customer classification with low type 1 and type 2 errors has been investigated in many studies. The primary objective of this paper is to develop a new method that chooses the best neural network architecture from among an MLP with a single hidden layer, an MLP with multiple hidden layers, an RBFN and decision trees, and combines them in an ensemble with voting methods. The proposed method is run on an Australian credit data set and on data from a private bank in Iran, the Export Development Bank of Iran, and the results are used to find solutions with low customer attraction risk.

  19. Decision tree approach for soil liquefaction assessment.

    Science.gov (United States)

    Gandomi, Amir H; Fridline, Mark M; Roke, David A

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view.

  20. Modelling and Implementation of Decision Tree-Based Consumption Behaviour Factors

    Institute of Scientific and Technical Information of China (English)

    黎旭; 李国和; 吴卫江; 洪云峰; 刘智渊; 程远

    2015-01-01

    Analysis of consumption behaviour factors plays an important guiding role in the production and sale of products. To model and analyse consumption behaviour from consumers' consumption data, the data are first given a formal representation, yielding consumer transaction data sets and an expression of transaction statistics. The information gain ratio is then defined on the consumer transaction data sets to reflect the classification ability of each consumption factor. Building on the C4.5 algorithm, binary splitting is extended to multi-way splitting, continuous attributes (factors) are discretised, and the decision tree is constructed. Each branch of the decision tree forms a decision rule that reflects the dependencies among a consumer's consumption factors. The statistical information of each rule expresses the uncertainty of that decision rule. Using a Web architecture with Oracle as the database, a system for modelling and analysing consumption behaviour is implemented; it offers high accuracy in consumption behaviour model analysis and is also efficient and user-friendly.
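
    The information gain ratio mentioned in this abstract is the standard C4.5 split criterion. A minimal sketch for a discrete attribute follows; the paper's multi-way discretisation of continuous attributes is not reproduced here, and the attribute values and labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Expected label entropy after the split, weighted by partition size.
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalises attributes with many distinct values
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical consumption factor and purchase outcome.
income = ["low", "low", "high", "high"]
bought = ["no", "no", "yes", "yes"]
informative = gain_ratio(income, bought)
uninformative = gain_ratio(["a", "b", "a", "b"], bought)
```

    An attribute that separates the classes perfectly scores 1.0 here, while one that is independent of the class scores 0.0.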

  1. FPGA-Based Network Traffic Security: Design and Implementation Using C5.0 Decision Tree Classifier

    Institute of Scientific and Technical Information of China (English)

    Tarek Salah Sobh; Mohamed Ibrahiem Amer

    2013-01-01

    In this work, a hardware intrusion detection system (IDS) model and its implementation are introduced to perform online real-time traffic monitoring and analysis. The introduced system combines advantages of several kinds of IDS: it is hardware based in terms of implementation, network based in terms of system type, and anomaly based in terms of detection approach. In addition, in terms of detection behavior it can detect most network attacks, such as denial of service (DoS), leakage, etc., and in terms of intruder type it can detect both internal and external intruders. Gathering these features in one IDS gives the work many strengths and advantages. The system is implemented using a field programmable gate array (FPGA), which gives the system further advantages. A C5.0 decision tree classifier is used as the inference engine of the system and gives a high detection ratio of 99.93%.

  2. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Science.gov (United States)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily falls in the trade-off of coverage of errors and the increase of perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.

  3. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe

    2013-01-01

    Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time-domain simulation and the process of data mining, which is then implemented online as guidelines for preventive control schemes. An algorithm named Classification and Regression Trees (CART) is used to train the DT, and the key to this approach lies in the accuracy of the DT. This paper proposes contingency oriented DT and adopts a methodology of importance sampling to maximize the information contained in the database so as to increase the accuracy of DT. Further, this paper also studies the effectiveness of DT by implementing its corresponding preventive control schemes. These approaches are tested on the detailed model...

  4. Risk stratification for prognosis in intracerebral hemorrhage: A decision tree model and logistic regression

    Directory of Open Access Journals (Sweden)

    Gang WU

    2016-01-01

    Full Text Available Objective: To analyze the risk factors for prognosis in intracerebral hemorrhage using a decision tree (classification and regression tree, CART) model and a logistic regression model. Methods: CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results: Logistic regression analyses showed that hematoma volume (OR-value 0.953), initial Glasgow Coma Scale (GCS) score (OR-value 1.210), pulmonary infection (OR-value 0.295), and basal ganglia hemorrhage (OR-value 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688). Conclusions: CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13

  5. Induction of hybrid decision tree based on post-discretization strategy

    Institute of Scientific and Technical Information of China (English)

    WANG Limin; YUAN Senmiao

    2004-01-01

    By redefining the test selection measure, we propose in this paper a new algorithm, Flexible NBTree, which induces a hybrid of decision tree and Naive Bayes. Flexible NBTree mitigates the negative effect of information loss on test selection by applying a post-discretization strategy: at each internal node in the tree, we first select the test which is the most useful for improving classification accuracy, then apply discretization of continuous tests. The final decision tree nodes contain univariate splits as in regular decision trees, but the leaves contain Naive Bayesian classifiers. To evaluate the performance of Flexible NBTree, we compare it with NBTree and C4.5, both applying pre-discretization of continuous attributes. Experimental results on a variety of natural domains indicate that the classification accuracy of Flexible NBTree is substantially improved.

  6. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, mobile users should first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we divide the context into public context and private context classes. We then analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm: we can classify mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also obtain some rules about mobile users. Compared with the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity.

  7. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization

    Directory of Open Access Journals (Sweden)

    David Sánchez-Rodríguez

    2015-06-01

    Full Text Available Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfactory solutions that take both accuracy and system complexity into account have not been found. For lightweight mobile devices these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can drain the battery completely within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points with device orientation information from the digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system substantially improves on the computational complexity of the widely used traditional fingerprinting methods, and that it is also more accurate than they are.

  8. A Data Mining Algorithm Based on Distributed Decision-Tree in Grid Computing Environments

    Institute of Scientific and Technical Information of China (English)

    Zhongda Lin; Yanfeng Hong; Kun Deng

    2006-01-01

    Research on distributed data mining that makes use of the grid has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification data mining on the grid.

  9. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Rohulla Kosari Langari

    2014-02-01

    Full Text Available The changes brought to the world by information technology and Internet development have created competitive knowledge in the field of electronic commerce and increased the competitive potential among organizations. Under these conditions, the growing rate of commercial transactions, with guaranteed speed and quality, depends on providing a dynamic electronic banking system that uses modern technology to facilitate electronic business processes. Internet banking is counted as a potential opportunity and as one of the fundamental pillars and determinants of e-banking, which in cyberspace faces various obstacles and threats. One of these challenges is the lack of complete certainty in guaranteeing the security of financial transactions, together with the existence of suspicious and unusual behavior, such as mail fraud, aimed at financial abuse. Various systems based on machine intelligence methods and data mining techniques have now been designed to detect fraud in users’ behavior and are applied in industries such as insurance, medicine and banking. The main aim of this article is to recognize unusual user behavior in e-banking systems. To that end, user behavior is detected and the emerging patterns are categorized in order to prepare the conditions for predicting unauthorized penetration and detecting suspicious behavior. Detecting user behavior in Internet systems involves uncertainty, and transaction records can be useful for understanding this behavior; among machine learning methods, the decision tree technique is considered a common tool for classification and prediction. This research therefore first determines the effective banking variables and the weight of each in producing Internet behavior, and then combines the various behavior patterns to extract a model of inductive rules that makes it possible to recognize different behaviors. Finally, four algorithms, Chaid, ex_Chaid, C4.5 and C5.0, are compared and evaluated for classification and detection of exist

  10. Assessment of the risk factors of coronary heart events based on data mining with decision trees.

    Science.gov (United States)

    Karaolis, Minas A; Moutiris, Joseph A; Hadjipanayi, Demetra; Pattichis, Constantinos S

    2010-05-01

    Coronary heart disease (CHD) is one of the major causes of disability in adults as well as one of the main causes of death in the developed countries. Although significant progress has been made in the diagnosis and treatment of CHD, further investigation is still needed. The objective of this study was to develop a data-mining system for the assessment of heart event-related risk factors targeting in the reduction of CHD events. The risk factors investigated were: 1) before the event: a) nonmodifiable-age, sex, and family history for premature CHD, b) modifiable-smoking before the event, history of hypertension, and history of diabetes; and 2) after the event: modifiable-smoking after the event, systolic blood pressure, diastolic blood pressure, total cholesterol, high-density lipoprotein, low-density lipoprotein, triglycerides, and glucose. The events investigated were: myocardial infarction (MI), percutaneous coronary intervention (PCI), and coronary artery bypass graft surgery (CABG). A total of 528 cases were collected from the Paphos district in Cyprus, most of them with more than one event. Data-mining analysis was carried out using the C4.5 decision tree algorithm for the aforementioned three events using five different splitting criteria. The most important risk factors, as extracted from the classification rules analysis were: 1) for MI, age, smoking, and history of hypertension; 2) for PCI, family history, history of hypertension, and history of diabetes; and 3) for CABG, age, history of hypertension, and smoking. Most of these risk factors were also extracted by other investigators. The highest percentages of correct classifications achieved were 66%, 75%, and 75% for the MI, PCI, and CABG models, respectively. It is anticipated that data mining could help in the identification of high and low risk subgroups of subjects, a decisive factor for the selection of therapy, i.e., medical or surgical. However, further investigation with larger datasets is

  11. Acid deposition: decision framework. Volume 1. Description of conceptual framework and decision-tree models. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Balson, W.E.; Boyd, D.W.; North, D.W.

    1982-08-01

    Acid precipitation and dry deposition of acid materials have emerged as an important environmental issue affecting the electric utility industry. This report presents a framework for the analysis of decisions on acid deposition. The decision framework is intended as a means of summarizing scientific information and uncertainties on the relation between emissions from electric utilities and other sources, acid deposition, and impacts on ecological systems. The methodology for implementing the framework is that of decision analysis, which provides a quantitative means of analyzing decisions under uncertainty. The decisions of interest include reductions in sulfur oxide and other emissions thought to be precursors of acid deposition, mitigation of acid deposition impacts through means such as liming of waterways and soils, and choice of strategies for research. The report first gives an overview of the decision framework and explains the decision analysis methods with a simplified caricature example. The state of scientific information and the modeling assumptions for the framework are then discussed for the three main modules of the framework: emissions and control technologies; long-range transport and chemical conversion in the atmosphere; and ecological impacts. The report then presents two versions of a decision tree model that implements the decision framework. The basic decision tree addresses decisions on emissions control and mitigation in the immediate future and a decade hence, and it includes uncertainties in the long-range transport and ecological impacts. The research emphasis decision tree addresses the effect of research funding on obtaining new information as the basis for future decisions. Illustrative data and calculations using the decision tree models are presented.

  12. Decision tree pruning algorithm based on Hadoop

    Institute of Scientific and Technical Information of China (English)

    张晶星; 李石君

    2016-01-01

    Current decision tree pruning algorithms seldom consider the influence of the noise level in the training set on the model, and traditional memory-resident algorithms have difficulty processing massive data. To address this, an imprecise probability error pruning algorithm named IEP was proposed based on Hadoop and applied in the C4.5 algorithm. When pruning, the IEP algorithm assumes that the training set used to build the decision tree is noisy, and uses the number of classification errors based on imprecise probability as the pruning criterion, reducing the influence of noisy data on the model. C4.5-IEP, implemented on Hadoop through MapReduce programming based on file splitting, is better able to handle massive data and has good scalability.

  13. Analysis of the Lending Data with a Decision Tree Model Based on Clustering

    Institute of Scientific and Technical Information of China (English)

    翟剑锋

    2012-01-01

    After selecting, cleaning and transforming the lending data provided by a university library, the paper studies decision tree classification supported by clustering and its application to library lending data. Clustering analysis is used to obtain the training samples for the decision tree, with the aim of obtaining a high-quality decision tree and further improving the accuracy of book recommendation. Taking the lending data of a university library as an example, the paper applies these research results to analyze that library's lending data. The results of the analysis are provided to library managers as a reference for collection policy, book recommendation and library management.

  14. Teratozoospermia Classification Based on the Shape of Sperm Head Using OTSU Threshold and Decision Tree

    Directory of Open Access Journals (Sweden)

    Masdiyasa I Gede Susrama

    2016-01-01

    Full Text Available Teratozoospermia is one of the findings of expert analysis of male infertility, obtained by examining spermatozoa microscopically in the laboratory to determine their morphology, including whether the head of the spermatozoon has a normal or abnormal form. The laboratory test results take the form of a complete image of the spermatozoa. In this study, the shapes of spermatozoa heads were taken from a WHO standards book. The pictures were fairly clear but still contained noise, so to differentiate between normal and abnormal spermatozoa heads several processes need to be performed: pre-processing (image adjustment), threshold segmentation using the Otsu threshold method, and classification using a decision tree. Training and test data are presented in stages, from 5 to 20 data. Tests using Otsu segmentation and a decision tree produced a different error at each level of training data: 70%, 75%, and 80% for training data of size 5×2, 10×2, and 20×2, respectively, with an average error of 75%. Thus, Otsu threshold segmentation combined with a decision tree can classify the form of the head of spermatozoa as abnormal or normal.
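
    The segmentation step named in this abstract, Otsu's method, picks the grey level that maximizes the between-class variance of the image histogram. The sketch below is a generic pure-Python version, not the authors' implementation; it assumes the image is given as a flat list of integer grey levels in the range 0 to 255, and the pixel values are invented.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t maximizing between-class variance (pixels <= t vs. > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Hypothetical image: three dark pixels and three bright ones.
pixels = [10, 12, 11, 200, 210, 205]
threshold = otsu_threshold(pixels)
mask = [p > threshold for p in pixels]
```

    Ties are resolved in favour of the lowest grey level, so for a cleanly bimodal image the threshold sits at the top of the dark cluster.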

  15. P2P Network Traffic Classification Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李晟锴

    2011-01-01

    New P2P applications use payload encryption and port camouflage to evade detection. To address this problem, a P2P traffic identification method based on decision trees is proposed. The method applies decision trees to the field of network traffic identification to meet the requirements of Internet traffic identification. The decision tree method builds a classification model using the information entropy of the training data set and classifies unknown network flow samples by a simple lookup in the decision tree. Experimental results show that, compared with the Naive Bayes and Bayes Network algorithms, the C4.5 decision tree algorithm involves relatively simple processing and little computation, achieves high data processing efficiency and classification accuracy, and is more suitable for P2P traffic identification.

  16. An On-Line Decision Tree-Based Predictive System Architecture

    Directory of Open Access Journals (Sweden)

    馬芳資 (Fang-Tz Ma); 林我聰 (Woo-Tsong Lin)

    2003-10-01

    This paper presents an on-line decision tree-based predictive system architecture. The architecture contains nine components: a database of examples, a decision tree learning system, a knowledge base, a historical knowledge base, an interface for maintaining the decision trees, an interface for uploading training and testing examples, a PMML (Predictive Model Markup Language) translator, an on-line predictive system, and a system for merging optional decision trees. There are three channels for importing knowledge into the architecture: developers can upload examples to the learning system to induce a decision tree, directly input decision tree information through the user interface, or import decision trees in PMML format. To integrate the knowledge of the decision trees, we added the merging system to this architecture; it can combine multiple decision trees into a single decision tree. In future research, we will implement this architecture as a real system on a web-based platform in order to perform empirical analyses. To improve the performance of the merged decision trees, we will also develop pruning strategies in the merging system.

  17. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain

    2014-01-01

    This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost...

  18. Predicting future trends in stock market by decision tree rough-set based hybrid system with HHMM

    Directory of Open Access Journals (Sweden)

    Shweta Tiwari

    2012-06-01

    Full Text Available Around the world, trading in the stock market has gained huge popularity as a means through which one can obtain vast profits. Attempting to profitably and precisely predict the financial market has long engrossed the interests and attention of bankers, economists and scientists alike. Stock market prediction is the act of trying to determine the future value of a company’s stock or other financial instrument traded on a financial exchange. Accurate stock market predictions are important for many reasons. Chief among them are the need for investors to hedge against potential market risks and the opportunities for arbitrageurs and speculators to make profits by trading indexes. The stock market is a place where shares are issued and traded. These shares are traded either through stock exchanges or over the counter, in physical or electronic form. Data mining, as a process of discovering useful patterns and correlations, has its own role in financial modeling. Data mining is a discipline in computational intelligence that deals with knowledge discovery, data analysis, and fully and semi-autonomous decision making. Prediction of the stock market by data mining techniques has been receiving a lot of attention recently. This paper presents a hybrid system based on decision tree and rough set, combined with a Hierarchical Hidden Markov Model, for predicting the trends in the Bombay Stock Exchange (BSE SENSEX). In this paper we present future trends on the basis of price earnings and dividend. The data on accounting earnings, when averaged over many years, help to predict the present value of future dividends.

  19. Totally optimal decision trees for Boolean functions

    KAUST Repository

    Chikalov, Igor

    2016-07-28

    We study decision trees which are totally optimal relative to different sets of complexity parameters for Boolean functions. A totally optimal tree is an optimal tree relative to each parameter from the set simultaneously. We consider the parameters characterizing both time (in the worst- and average-case) and space complexity of decision trees, i.e., depth, total path length (average depth), and number of nodes. We have created tools based on extensions of dynamic programming to study totally optimal trees. These tools are applicable to both exact and approximate decision trees, and allow us to make multi-stage optimization of decision trees relative to different parameters and to count the number of optimal trees. Based on the experimental results we have formulated the following hypotheses (and subsequently proved): for almost all Boolean functions there exist totally optimal decision trees (i) relative to the depth and number of nodes, and (ii) relative to the depth and average depth.
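
    The notion of a totally optimal tree can be illustrated on very small Boolean functions. The sketch below is not the authors' tool: it is a brute-force dynamic program, written under the assumption that a function is given as a truth table (a tuple of length 2^k, hypothetical representation) and that the node count includes leaves. It computes the Pareto front of (depth, number of nodes) over all decision trees for the function; a tree that is totally optimal relative to depth and number of nodes exists exactly when that front is a single point.

```python
from functools import lru_cache


def restrict(table, j, bit):
    """Fix variable j (bit j of the table index) to `bit` and return the sub-table."""
    return tuple(v for idx, v in enumerate(table) if (idx >> j) & 1 == bit)


@lru_cache(maxsize=None)
def pareto(table):
    """Non-dominated (depth, node_count) pairs over all decision trees for `table`."""
    if len(set(table)) == 1:
        return ((0, 1),)  # constant function: a single leaf
    k = len(table).bit_length() - 1  # number of remaining variables
    candidates = set()
    for j in range(k):
        for d0, n0 in pareto(restrict(table, j, 0)):
            for d1, n1 in pareto(restrict(table, j, 1)):
                candidates.add((1 + max(d0, d1), 1 + n0 + n1))
    front = [
        p for p in candidates
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in candidates)
    ]
    return tuple(sorted(front))


def has_totally_optimal_tree(table):
    """True if one tree minimises depth and node count at the same time."""
    return len(pareto(table)) == 1


xor2 = (0, 1, 1, 0)  # x0 XOR x1
and2 = (0, 0, 0, 1)  # x0 AND x1
print(pareto(xor2), has_totally_optimal_tree(xor2))
print(pareto(and2), has_totally_optimal_tree(and2))
```

    The search is exponential in the number of variables, so it is only meant for functions of a handful of variables.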

  20. Geometric Decision Tree

    CERN Document Server

    Manwani, Naresh

    2010-01-01

    In this paper we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy to assess the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both the classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node, are solutions of an interesting optimization problem and hence argue that this is a principled method of learning a decision tree.
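
    The split rule described above can be sketched for the simplest case. The example assumes the two clustering hyperplanes are already known and given as (w, b) pairs meaning w·x + b = 0; how the paper finds those hyperplanes and how it chooses between the two bisectors is not shown here. The function names are hypothetical.

```python
import math


def angle_bisectors(w1, b1, w2, b2):
    """Return the two angle bisectors of hyperplanes w1.x + b1 = 0 and w2.x + b2 = 0."""
    n1 = math.sqrt(sum(c * c for c in w1))
    n2 = math.sqrt(sum(c * c for c in w2))
    u1, c1 = [c / n1 for c in w1], b1 / n1  # unit-normal form of hyperplane 1
    u2, c2 = [c / n2 for c in w2], b2 / n2  # unit-normal form of hyperplane 2
    plus = ([p + q for p, q in zip(u1, u2)], c1 + c2)
    minus = ([p - q for p, q in zip(u1, u2)], c1 - c2)
    return plus, minus


def distance(w, b, x):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(c * c for c in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hyperplanes x = 0 and y = 0 in the plane; the bisectors are x + y = 0 and x - y = 0.
plus, minus = angle_bisectors([1.0, 0.0], 0.0, [0.0, 1.0], 0.0)
point = [2.0, 2.0]  # lies on x - y = 0
print(plus, minus)
print(distance([1.0, 0.0], 0.0, point), distance([0.0, 1.0], 0.0, point))
```

    Every point on a bisector is equidistant from the two original hyperplanes, which is what the last line checks for one point.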

  1. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    Science.gov (United States)

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays.

  2. Derived operating rules for a reservoir operation system: Comparison of decision trees, neural decision trees and fuzzy decision trees

    Science.gov (United States)

    Wei, Chih-Chiang; Hsu, Nien-Sheng

    2008-02-01

    This article compares the decision-tree algorithm (C5.0), neural decision-tree algorithm (NDT) and fuzzy decision-tree algorithm (FIDs) for addressing reservoir operations regarding water supply during normal periods. The conventional decision-tree algorithm, such as ID3 and C5.0, executes rapidly and can easily be translated into if-then-else rules. However, the C5.0 algorithm cannot discover dependencies among attributes and cannot treat the non-axis-parallel class boundaries of data. The basic concepts of the two algorithms presented are: (1) NDT algorithm combines the neural network technologies and conventional decision-tree algorithm capabilities, and (2) FIDs algorithm extends to apply fuzzy sets for all attributes with membership function grades and generates a fuzzy decision tree. In order to obtain higher classification rates in FIDs, the flexible trapezoid fuzzy sets are employed to define membership functions. Furthermore, an intelligent genetic algorithm is utilized to optimize the large number of variables in fuzzy decision-tree design. The applicability of the presented algorithms is demonstrated through a case study of the Shihmen Reservoir system. A network flow optimization model for analyzing long-term supply demand is employed to generate the input-output patterns. Findings show superior performance of the FIDs model in contrast with C5.0, NDT and current reservoir operating rules.
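
    A trapezoid fuzzy set of the kind mentioned above can be written in a few lines. This is a generic sketch, not the FIDs implementation; the parameter names a, b, c, d (feet and shoulders of the trapezoid) and the storage values in the example are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership grade of x in a trapezoid fuzzy set with a <= b <= c <= d.

    The grade is 1 on [b, c], 0 outside (a, d), and linear on the two slopes.
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Hypothetical fuzzy set "medium storage" on an arbitrary 0-30 scale.
for level in (5, 15, 25, 35):
    print(level, trapezoid(level, 0, 10, 20, 30))
```

    In a fuzzy decision tree an attribute value can belong to several such sets at once, each with its own grade, instead of falling on one side of a crisp threshold.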

  3. Towards closed-loop deep brain stimulation: decision tree-based essential tremor patient's state classifier and tremor reappearance predictor.

    Science.gov (United States)

    Shukla, Pitamber; Basu, Ishita; Tuninetti, Daniela

    2014-01-01

    Deep Brain Stimulation (DBS) is a surgical procedure to treat some progressive neurological movement disorders, such as Essential Tremor (ET), in an advanced stage. Current FDA-approved DBS systems operate open-loop, i.e., their parameters are unchanged over time. This work develops a Decision Tree (DT) based algorithm that, by using non-invasively measured surface EMG and accelerometer signals as inputs during DBS-OFF periods, classifies the ET patient's state and then predicts when tremor is about to reappear, at which point DBS is turned ON again for a fixed amount of time. The proposed algorithm achieves an overall accuracy of 93.3% and sensitivity of 97.4%, along with 2.9% false alarm rate. Also, the ratio between predicted tremor delay and the actual detected tremor delay is about 0.93, indicating that tremor prediction is very close to the instant where tremor actually reappeared.
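
    The three reported figures are standard confusion-matrix ratios. The sketch below uses the usual textbook definitions, which are assumed rather than confirmed to be the ones used in the paper, and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def classifier_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and false alarm rate from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,        # share of all segments classified correctly
        "sensitivity": tp / (tp + fn),        # share of true tremor events detected
        "false_alarm_rate": fp / (fp + tn),   # share of non-tremor segments flagged
    }


# Hypothetical counts, not data from the study.
print(classifier_metrics(tp=38, fn=2, fp=3, tn=57))
```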

  4. Decision tree methods: applications for classification and prediction

    Institute of Scientific and Technical Information of China (English)

    Yan-yan SONG; Ying LU

    2015-01-01

    Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
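
    How a single node splits a population into two branches can be sketched with the Gini impurity, the criterion used by CART. This is a minimal illustration on one numeric covariate with made-up data; it is not the SPSS or SAS procedure, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best binary split x <= threshold."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up covariate values and class labels.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))
```

    A full tree repeats this search recursively on each branch; the validation dataset is then used to decide how many of those splits to keep.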

  5. Algorithms for Decision Tree Construction

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms are based on the separation heuristic [13, 31] that at each step tries dividing the set of objects as evenly as possible. Later Garey and Graham [28] showed that such an algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest in [35] proved NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and uniform probability distribution. Cox et al. in [22] showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.
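
    The separation heuristic can be sketched as a greedy procedure. The code below is an illustrative reading of "dividing the set of objects as evenly as possible" for a 2-valued information system; the object names, the data layout (a dict from name to a tuple of 0/1 attribute values) and the tie-breaking rule are assumptions, and all rows are assumed to be distinct.

```python
def build_tree(objects):
    """Greedy tree for identifying objects: split as evenly as possible at each step.

    Returns an object name (leaf) or a tuple (attribute, subtree_for_0, subtree_for_1).
    """
    if len(objects) == 1:
        return next(iter(objects))
    n_attrs = len(next(iter(objects.values())))

    def ones(a):
        return sum(row[a] for row in objects.values())

    # Only attributes that actually separate the current objects are usable.
    usable = [a for a in range(n_attrs) if 0 < ones(a) < len(objects)]
    attr = min(usable, key=lambda a: abs(len(objects) - 2 * ones(a)))
    zero = {k: v for k, v in objects.items() if v[attr] == 0}
    one = {k: v for k, v in objects.items() if v[attr] == 1}
    return (attr, build_tree(zero), build_tree(one))


def leaf_depths(tree, depth=0):
    """Depths of all leaves; their mean is the average depth under a uniform distribution."""
    if not isinstance(tree, tuple):
        return [depth]
    _, zero, one = tree
    return leaf_depths(zero, depth + 1) + leaf_depths(one, depth + 1)


# Hypothetical 2-valued information system with three attributes.
table = {"a": (0, 0, 1), "b": (0, 1, 1), "c": (1, 0, 1), "d": (1, 1, 0)}
tree = build_tree(table)
depths = leaf_depths(tree)
print(tree, sum(depths) / len(depths))
```

    On this small table the greedy choice happens to be optimal; the results cited above show that this is not guaranteed in general.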

  6. Application of portfolio theory in decision tree analysis.

    Science.gov (United States)

    Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J

    1991-07-01

    A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation.
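
    The portfolio idea can be made concrete with the usual mean-variance formulas: if a fraction w_i of the herd is managed under decision i, the expected return is the weighted sum of the individual expected returns and the variance is the double sum of w_i w_j cov_ij. The sketch below assumes those standard formulas apply; the returns, variances and zero covariance are hypothetical numbers, not values from the paper.

```python
def portfolio_stats(weights, returns, cov):
    """Expected return and variance of a mix of decisions.

    weights: fraction of the herd assigned to each decision (should sum to 1)
    returns: expected return of each decision
    cov:     covariance matrix of the decision returns
    """
    mean = sum(w * r for w, r in zip(weights, returns))
    var = sum(
        weights[i] * weights[j] * cov[i][j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return mean, var


# Two hypothetical decision options with made-up return statistics.
returns = [10.0, 20.0]
cov = [[4.0, 0.0], [0.0, 16.0]]
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, portfolio_stats([w, 1.0 - w], returns, cov))
```

    Scanning the weights this way shows which mixes give the highest return for a given variance, which is the sense of "efficient" used in the abstract.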

  7. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    Science.gov (United States)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

    Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), it is a relatively more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36·15H2O)), EDS data alone does not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM based crystal morphology data, XRD spectra and re-calculated cationic distribution, obtained by EDS have been used for the

  8. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India.

    Science.gov (United States)

    Senthil Kumar, A R; Goyal, Manish Kumar; Ojha, C S P; Singh, R D; Swamee, P K

    2013-01-01

    The prediction of streamflow is required in many activities associated with the planning and operation of the components of a water resources system. Soft computing techniques have proven to be an efficient alternative to traditional methods for modelling qualitative and quantitative water resource variables such as streamflow. The focus of this paper is to present the development of models using multiple linear regression (MLR), artificial neural network (ANN), fuzzy logic and decision tree algorithms such as M5 and REPTree for predicting the streamflow at Kasol, located upstream of the Bhakra reservoir in the Sutlej basin in northern India. The input vector to the various models was derived considering statistical properties of the time series such as the auto-correlation function, partial auto-correlation function and cross-correlation function. It was found that the REPTree model performed well compared to the other techniques investigated in this study (MLR, ANN, fuzzy logic, and M5P), and the results of the REPTree model indicate that the entire range of streamflow values was simulated fairly well. The performance of the naïve persistence model was compared with the other models, and the requirement for developing the naïve persistence model was also analysed by the persistence index.
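
    The persistence index compares a model with the naïve forecast "the next value equals the last observed value". The sketch below uses the definition commonly found in the hydrology literature, one minus the ratio of the model's squared error to the naïve forecast's squared error; the exact form used in the paper is not given in the abstract, and the flow values are made up.

```python
def persistence_index(observed, simulated, lag=1):
    """1 - SSE(model) / SSE(naive persistence forecast with the given lag).

    A value of 1 is a perfect model; 0 means no better than repeating the last
    observation; negative values mean worse than naive persistence.
    """
    sse_model = sum(
        (o - s) ** 2 for o, s in zip(observed[lag:], simulated[lag:])
    )
    sse_naive = sum(
        (observed[t] - observed[t - lag]) ** 2 for t in range(lag, len(observed))
    )
    return 1.0 - sse_model / sse_naive


# Made-up daily flows and model output.
observed = [10.0, 12.0, 15.0, 11.0]
simulated = [10.0, 11.0, 14.0, 12.0]
print(persistence_index(observed, simulated))
```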

  9. Combined Prediction Model for Supply Risk in Nuclear Power Equipment Manufacturing Industry Based on Support Vector Machine and Decision Tree

    Institute of Scientific and Technical Information of China (English)

    石春生; 孟大鹏

    2011-01-01

    An index system for predicting supply risk is established on the basis of risk identification in the nuclear power equipment manufacturing industry. A supply risk prediction model is then built by combining a support vector machine with a decision tree, using questionnaire surveys and in-depth interviews covering 3 key domestic nuclear power equipment manufacturing enterprises and their 60 suppliers. The empirical study shows that the combined model predicts supply risk more accurately than single-method models, demonstrating the feasibility and reliability of the prediction system. It gives supply risk management in this industry a method for evaluating suppliers and measuring the degree of supply risk.

  10. Efficient Prediction of Surface Roughness Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Manikant Kumar

    2016-12-01

    Full Text Available Surface roughness is a parameter that determines the quality of a machined product. Nowadays the general manufacturing problem can be described as the attainment of a predefined product quality with given equipment, cost and time constraints. In recent years, extensive research has therefore been carried out on achieving a predefined surface quality of machined products to eliminate the waste of over-machining. Response surface methodology was used initially for prediction of the surface roughness of machined parts. After the introduction of artificial intelligence (AI) techniques, many AI-based predictive models were developed by researchers, because AI techniques are compatible with computer systems and various microcontrollers. Researchers have used fuzzy logic, artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS) and genetic algorithms to develop predictive models for the surface roughness of different materials. Many researchers have developed ANN-based predictive models because ANN outperforms other data mining techniques in certain respects, such as robustness and high learning accuracy. In this research work a new predictive model based on a decision tree is proposed. ANN and ANFIS are known as black-box models: only their outcomes are comprehensible, while their internal operations are not. A decision tree is known as a white-box model because its tree-like structure provides a clear view of what is happening inside the model. Decision trees have been used in the prediction of cancer, which suggests that they are an efficient method for prediction. At the end of this research work, the results obtained by the ANN-based model and the decision tree model will be compared, and a prediction methodology for roughness is introduced using decision tree along with ANN

  11. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations.

    Science.gov (United States)

    Soner Yorgun, M; Rood, Richard B

    2016-12-01

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.

  12. Method for Walking Gait Identification in a Lower Extremity Exoskeleton based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Qing Guo

    2015-04-01

    Full Text Available A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors’ information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person’s motion.

  13. A Com-Gis Based Decision Tree Model in Agricultural Application

    Science.gov (United States)

    Cheng, Wei; Wang, Ke; Zhang, Xiuying

    The problem of agricultural soil pollution by heavy metals has been receiving increasing attention in the last few decades. However, the Geostatistics module in ArcGIS cannot efficiently simulate the spatial distribution of heavy metals with satisfactory accuracy when the spatial autocorrelation of the study area has been severely destroyed by human activities. In this study, the classification and regression tree (CART) has been integrated into ArcGIS using ArcObjects and Visual Basic for Applications (VBA) to predict the spatial distribution of soil heavy metal contents in a severely polluted area. This is a great improvement compared with the ordinary Kriging method in ArcGIS. The integrated approach allows for relatively easy, fast, and cost-effective estimation of spatially distributed soil heavy metal pollution.

  14. Genetic Program Based Data Mining of Fuzzy Decision Trees and Methods of Improving Convergence and Reducing Bloat

    Science.gov (United States)

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm...that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing...Finally, additional methods that have been used to validate the data mining algorithm are referenced.

  15. Building Customers' Credit Scoring Models with Combination of Feature Selection and Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Zahra Davoodabadi

    Full Text Available Today's financial transactions through banks and financial institutions have increased. Therefore, credit scoring is a critical task to forecast the customers’ credit. We have created 9 different models for the credit scoring by combining three metho ...

  16. A Multi-industry Default Prediction Model using Logistic Regression and Decision Tree

    Directory of Open Access Journals (Sweden)

    Suresh Ramakrishnan

    2015-04-01

    Full Text Available The accurate prediction of corporate bankruptcy for the firms in different industries is of a great concern to investors and creditors, as the reduction of creditors’ risk and a considerable amount of saving for an industry economy can be possible. Financial statements vary between industries. Therefore, economic intuition suggests that industry effects should be an important component in bankruptcy prediction. This study attempts to detail the characteristics of each industry using sector indicators. The results show significant relationship between probability of default and sector indicators. The results of this study may improve the default prediction models performance and reduce the costs of risk management.

  17. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
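
    The smoothing step can be sketched as a running average over recent per-segment decisions. This is only one plausible reading of "averaging the decision on each segment with past segment decisions": the window length, the 0.5 threshold and the 1 = speech, 0 = music encoding are assumptions, not parameters taken from the paper.

```python
def smooth_decisions(raw, window=3, threshold=0.5):
    """Smooth per-segment decisions (1 = speech, 0 = music) with a causal running mean.

    Each output label is based on the current raw decision and up to
    `window - 1` previous ones, so only past segments are needed.
    """
    labels = []
    for i in range(len(raw)):
        recent = raw[max(0, i - window + 1): i + 1]
        average = sum(recent) / len(recent)
        labels.append(1 if average >= threshold else 0)
    return labels


# Raw decisions with one isolated error (index 3) and a real change at index 6.
raw = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(smooth_decisions(raw))
```

    The isolated flip is removed, at the cost of reporting the genuine speech-to-music change one segment late; a longer window trades more stability for more delay.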

  18. New Explorations for Decision Trees

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Traditionally, the decision tree method is defined and used for finding the optimal solution of a Bayesian decision problem. It is difficult to use the decision tree method to find a sub-optimal solution, let alone to rank alternatives. This paper discusses how to use the decision tree method for selecting and ranking alternatives. A practical case study is given to illustrate the applicability.

  19. Maximal standard dose of parenteral iron for hemodialysis patients: an MRI-based decision tree learning analysis.

    Directory of Open Access Journals (Sweden)

    Guy Rostoker

    Full Text Available Iron overload used to be considered rare among hemodialysis patients after the advent of erythropoiesis-stimulating agents, but recent MRI studies have challenged this view. The aim of this study, based on decision-tree learning and on MRI determination of hepatic iron content, was to identify a noxious pattern of parenteral iron administration in hemodialysis patients. We performed a prospective cross-sectional study from 31 January 2005 to 31 August 2013 in the dialysis centre of a French community-based private hospital. A cohort of 199 fit hemodialysis patients free of overt inflammation and malnutrition were treated for anemia with parenteral iron-sucrose and an erythropoiesis-stimulating agent (darbepoetin), in keeping with current clinical guidelines. Patients had blinded measurements of hepatic iron stores by means of T1 and T2* contrast MRI, without gadolinium, together with Chi-squared Automatic Interaction Detection (CHAID) analysis. The CHAID algorithm first split the patients according to their monthly infused iron dose, with a single cutoff of 250 mg/month. In the node comprising the 88 hemodialysis patients who received more than 250 mg/month of IV iron, 78 patients had iron overload on MRI (88.6%, 95% CI: 80% to 93%). The odds ratio for hepatic iron overload on MRI was 3.9 (95% CI: 1.81 to 8.4) with >250 mg/month of IV iron as compared to <250 mg/month. Age, gender (female sex) and the hepcidin level also influenced liver iron content on MRI. The standard maximal amount of iron infused per month should be lowered to 250 mg in order to lessen the risk of dialysis iron overload and to allow safer use of parenteral iron products.
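
    The reported odds ratio and its interval can be reproduced with the standard 2x2-table formulas. The high-dose counts (78 of 88 with overload) come from the abstract, and 199 − 88 leaves 111 patients in the low-dose group. The split of those 111 into 74 with and 37 without overload is an assumption, chosen only because it is consistent with the reported odds ratio; the abstract does not state it. The interval is a Wald interval on the log odds ratio, which may or may not be the method the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# >250 mg/month: 78 overload, 10 not (from the abstract).
# <250 mg/month: 74 overload, 37 not (ASSUMED split of the remaining 111 patients).
or_, low, high = odds_ratio_ci(78, 10, 74, 37)
print(round(or_, 1), round(low, 2), round(high, 1))
```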

  20. Integrating individual trip planning in energy efficiency – Building decision tree models for Danish fisheries

    DEFF Research Database (Denmark)

    Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard

    2013-01-01

    the adaptations of individual fishermen to resource availability dynamics, increasing fuel prices, changes in regulations, and the consequences of socioeconomic external pressures on harvested stocks. A new methodology is described here to obtain quantitative information on the fishermen’s micro-scale decisions...... integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy...... hypothetical conditions influencing their trip decisions, covering the duration of fishing time, choice of fishing ground(s), when to stop fishing and return to port, and the choice of the port for landing. Fleet-based energy and economy efficiency are linked to the decision (choice) dynamics. Larger fuel...

  1. Tailored approach in inguinal hernia repair – Decision tree based on the guidelines

    Directory of Open Access Journals (Sweden)

    Ferdinand Köckerling

    2014-06-01

    Full Text Available The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society and the European Association of Endoscopic Surgery. 82% of experienced hernia surgeons use the tailored approach, that is, the differentiated use of the several inguinal hernia repair techniques depending on the findings of the patient, in an attempt to minimize the risks. The following differential therapeutic situations must be distinguished in inguinal hernia repair: unilateral in men, unilateral in women, bilateral, scrotal, after previous pelvic and lower abdominal surgery, no general anaesthesia possible, recurrence and emergency surgery. Evidence-based guidelines and consensus conferences of experts give recommendations for the best approach in the individual situation of a patient. This review tries to summarize the recommendations of the various guidelines and to transfer them into a practical decision tree for the daily work of surgeons performing inguinal hernia repair.

  2. Modelling alcohol consumption during adolescence using zero inflated negative binomial and decision trees

    Directory of Open Access Journals (Sweden)

    Alfonso Palmer

    2010-07-01

    Full Text Available Alcohol is currently the most consumed substance among the Spanish adolescent population. Some of the variables that bear an influence on this consumption include ease of access, use of alcohol by friends and some personality factors. The aim of this study was to analyze and quantify the predictive value of these variables specifically on alcohol consumption in the adolescent population. The useful sample was made up of 6,145 adolescents (49.8% boys and 50.2% girls) with a mean age of 15.4 years (SE = 1.2). The data were analyzed using the statistical model for a count variable and Data Mining techniques. The results show the influence of ease of access, alcohol consumption by the group of friends, and certain personality factors on alcohol intake, allowing us to quantify the intensity of this influence according to age and gender. Knowing these factors is the starting point in elaborating specific preventive actions against alcohol consumption.
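
    The count model named in the title, the zero-inflated negative binomial, mixes a point mass at zero (for example, adolescents who never drink) with an ordinary negative binomial count. The sketch below writes out its probability mass function in the mean and dispersion parametrisation; the parameter values are hypothetical and no regression on covariates is attempted.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial probability of count k with mean mu and dispersion r."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


# Hypothetical parameters: 30% structural zeros, mean 3 drinks among the rest.
mu, r, pi = 3.0, 2.0, 0.3
probs = [zinb_pmf(k, mu, r, pi) for k in range(500)]
print(round(probs[0], 4), round(nb_pmf(0, mu, r), 4))
print(round(sum(probs), 6), round(sum(k * p for k, p in enumerate(probs)), 6))
```

    The first printed line shows how zero inflation raises the probability of a zero count above what the plain negative binomial gives; the second checks that the probabilities sum to one and that the mean is (1 − pi)·mu.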

  3. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    Directory of Open Access Journals (Sweden)

    Wided Khiari

    2013-09-01

    Full Text Available This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosure scores to examine corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 was carried out and a disclosure index was developed to determine the level of disclosure of the companies. Disclosure quality is assessed through both the quantity and the nature (type) of the information disclosed. Applying the decision tree method, the resulting tree diagrams provide ways to know the characteristics of a particular firm regardless of its level of disclosure. The results show that the corporate governance characteristics needed to achieve good disclosure quality are not the same for all firms. These structures do not necessarily follow all best-practice recommendations, but converge towards the best combination. Indeed, in practice, there are companies that have good disclosure quality but are not well governed. However, we hope that by improving their governance system their level of disclosure may be better. These findings show, in a general way, a convergence towards the standards of corporate governance with a few exceptions related to the specificity of Tunisian listed firms, and show the need for the adoption of a code for each context. These findings shed light on corporate governance features that enhance incentives for good disclosure. They make it possible to identify, for each firm and at any date, the corporate governance determinants of disclosure quality. More specifically, all else being equal, the resulting tree provides a decision rule that lets a company determine its level of disclosure from certain characteristics of the governance strategy it has adopted.

  4. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. Khader

    2012-12-01

    Full Text Available Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system.
At current

  5. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. I. Khader

    2013-05-01

    Full Text Available Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water

  6. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs
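
    The expected-cost and VOI calculation described in this record can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every probability and cost below is an invented placeholder, the alternative names are hypothetical, and the sign convention (lowest uninformed expected cost minus the expected cost with monitoring) is an assumption about how the "difference" is taken.

```python
# Hypothetical illustration of an expected-cost / value-of-information
# calculation. All probabilities and costs are invented placeholders,
# not values from the study.

def expected_cost(outcomes):
    """Weighted sum of outcome costs; `outcomes` is a list of (probability, cost)."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * c for p, c in outcomes)

# Uninformed alternatives: (i) ignore the risk, (ii) switch to bottled water.
uninformed_alternatives = {
    "ignore_risk": [(0.7, 0.0), (0.3, 1000.0)],
    "bottled_water": [(1.0, 400.0)],
}
# Alternative (iii): act on the monitoring network's reports.
monitoring_outcomes = [(0.7, 50.0), (0.3, 450.0)]

uninformed_costs = {
    name: expected_cost(outcomes)
    for name, outcomes in uninformed_alternatives.items()
}
best_uninformed = min(uninformed_costs, key=uninformed_costs.get)
monitoring_cost = expected_cost(monitoring_outcomes)

# Assumed sign convention: a positive VOI means monitoring lowers expected cost.
voi = uninformed_costs[best_uninformed] - monitoring_cost
print(best_uninformed, uninformed_costs[best_uninformed], monitoring_cost, voi)
```

    In a fuller model each alternative's outcome list would come from the branches of the decision tree (true concentration, reported concentration, compliance, illness) rather than from two hand-written pairs.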

  7. A similarity study between the query mass and retrieved masses using decision tree content-based image retrieval (DTCBIR) CADx system for characterization of ultrasound breast mass images

    Science.gov (United States)

    Cho, Hyun-Chong; Hadjiiski, Lubomir; Chan, Heang-Ping; Sahiner, Berkman; Helvie, Mark; Paramagul, Chintana; Nees, Alexis V.

    2012-03-01

    We are developing a Decision Tree Content-Based Image Retrieval (DTCBIR) CADx scheme to assist radiologists in characterization of breast masses on ultrasound (US) images. Three DTCBIR configurations, including decision tree with boosting (DTb), decision tree with full leaf features (DTL), and decision tree with selected leaf features (DTLs) were compared. For DTb, the features of a query mass were combined first into a merged feature score and then masses with similar scores were retrieved. For DTL and DTLs, similar masses were retrieved based on the Euclidean distance between the feature vector of the query and those of the selected references. For each DTCBIR configuration, we investigated the use of the full feature set and the subset of features selected by the stepwise linear discriminant analysis (LDA) and simplex optimization method, resulting in six retrieval methods. Among the six methods, we selected five, DTb-lda, DTL-lda, DTb-full, DTL-full and DTLs-full, for the observer study. For a query mass, the three most similar masses were retrieved with each method and were presented to the radiologists in random order. Three MQSA radiologists rated the similarity between the query mass and the computer-retrieved masses using a nine-point similarity scale (1=very dissimilar, 9=very similar). For DTb-lda, DTL-lda, DTb-full, DTL-full and DTLs-full, the average Az values were 0.90+/-0.03, 0.85+/-0.04, 0.87+/-0.04, 0.79+/-0.05 and 0.71+/-0.06, respectively, and the average similarity ratings were 5.00, 5.41, 4.96, 5.33 and 5.13, respectively. Although the DTb measures had the best classification performance among the DTCBIRs studied, and DTLs had the worst performance, DTLs-full obtained higher similarity ratings than the DTb measures.

  8. Establishment of the Associated Model between Turbid Phlegm Syndrome and Clinical Indicators in the Patients of Diabetes Type 2 Based on Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    赵灵燕; 毕力夫; 张亚军; 陈建新; 赵慧辉; 戴军有; 王伟

    2014-01-01

    Objective: To establish an association model between turbid phlegm syndrome and routine clinical indicators in patients with type 2 diabetes, using the decision tree data-mining method. Methods: A multi-center clinical epidemiological survey was conducted. A total of 249 eligible cases of type 2 diabetes were collected from five Grade III-A hospitals in China. Basic information, information from the four diagnostic methods of traditional Chinese medicine (TCM) and routine clinical indicators were analyzed comprehensively. Building on t tests, nonparametric tests and Pearson correlation analysis, the decision tree data-mining method was then used to establish the association model between turbid phlegm syndrome and routine clinical indicators. Results: Of the 249 cases, 106 (42.57%) were differentiated as turbid phlegm syndrome. Six core indicators (urea nitrogen, white blood cells, mean red blood cell volume, hypersensitive C-reactive protein, erythrocytes and thyroxine) were used to build the decision tree model of turbid phlegm syndrome. Ten-fold cross-validation gave a sensitivity of 75.47%, a specificity of 76.22% and an overall accuracy of 75.90%. Conclusion: The decision tree model allows a clear and intuitive determination of turbid phlegm syndrome in patients with type 2 diabetes and shows certain advantages in research on the objectification of syndromes.

  9. CART Decision Tree Classifier Based on Multi-feature of MODIS Data

    Institute of Scientific and Technical Information of China (English)

    张会; 闫金凤

    2013-01-01

    Taking Shandong Province as the study area, we used the 8-day composite MODIS reflectance products for September 2009, MODIS09Q1 (bands B1~B2, 250 m resolution) and MODIS09A1 (bands B3~B7, 500 m resolution), together with the feature variables NDVI, EVI, NDWI, NDMI and NDSI and the auxiliary DEM information. Classification schemes were defined by selecting combinations of these image features, and a CART decision tree was built for each band combination to classify the MODIS images. The optimum band combination of the CART decision tree was composed of bands B1~B7, DEM, NDVI and NDMI. The feature variables DEM, NDVI and EVI made the largest contribution to the classification results. Comparing the CART decision tree classification results with the corresponding maximum likelihood classification results shows that the CART decision tree classification based on multiple image features can significantly improve the classification accuracy.

  10. Decision trees with minimum average depth for sorting eight elements

    KAUST Repository

    AbouEisha, Hassan

    2015-11-19

    We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We show also that each decision tree for sorting 8 elements, which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365), has also minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to make sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.
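
    As a small worked check of the figure quoted in this record, the following Python sketch compares 620160/8! with the standard lower bound for any binary tree that has 8! leaves. The helper name is hypothetical, and the bound uses the well-known fact that a binary tree with n leaves has total leaf depth at least n*k + 2*(n - 2^k), where k = floor(log2 n). It does not reproduce the paper's dynamic-programming method.

```python
from fractions import Fraction
from math import factorial

def min_external_path_length(leaves):
    """Smallest possible sum of leaf depths over binary trees with `leaves` leaves."""
    k = leaves.bit_length() - 1              # floor(log2(leaves))
    return leaves * k + 2 * (leaves - 2 ** k)

n_orderings = factorial(8)                   # one leaf per ordering of 8 elements
reported_total = 620160                      # numerator stated in the abstract
reported_average = Fraction(reported_total, n_orderings)

bound_total = min_external_path_length(n_orderings)
bound_average = Fraction(bound_total, n_orderings)

print("reported average depth:", float(reported_average))
print("leaf-count lower bound:", float(bound_average))
print("gap in total leaf depth:", reported_total - bound_total)
```

    Under these assumptions the reported optimum sits slightly above the leaf-count bound, which is consistent with the fact that not every perfectly balanced tree shape can be realized by pairwise comparisons.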

  11. A new decision tree learning algorithm

    Institute of Scientific and Technical Information of China (English)

    FANG Yong; QI Fei-hu

    2005-01-01

    In order to improve the generalization ability of binary decision trees, a new learning algorithm, the MMDT algorithm, is presented. Based on statistical learning theory, the generalization performance of binary decision trees is analyzed and an assessment rule is proposed. Under the direction of the assessment rule, the MMDT algorithm is implemented. The algorithm maps training examples from the original space to a high-dimensional feature space and constructs a decision tree in it. In the feature space, a new decision node splitting criterion, the max-min rule, is used, and the margin of each decision node is maximized using a support vector machine to improve the generalization performance. Experimental results show that the new learning algorithm is much superior to others such as C4.5 and OC1.

  12. Application of decision trees in credit scoring

    Directory of Open Access Journals (Sweden)

    Ljiljanka Kvesić

    2013-12-01

    Full Text Available Banks are particularly exposed to credit risk due to the nature of their operations. Inadequate assessment of the borrower directly causes losses. The financial crisis the global economy is still going through has clearly shown what kind of problems can arise from an inadequate credit policy. Thus, the primary task of bank managers is to minimise credit risk. Credit scoring models were developed to support managers in assessing the creditworthiness of borrowers. This paper presents the decision tree based on exhaustive CHAID algorithm as one such model. Since the application of credit scoring models has not been adequately explored in the Croatian banking theory and practice, this paper aims not only to determine the characteristics that are crucial for predicting default, but also to highlight the importance of a quantitative approach in assessing the creditworthiness of borrowers.

  13. Support vector machine-based decision tree for snow cover extraction in mountain areas using high spatial resolution remote sensing image

    Science.gov (United States)

    Zhu, Liujun; Xiao, Pengfeng; Feng, Xuezhi; Zhang, Xueliang; Wang, Zuo; Jiang, Luyuan

    2014-01-01

    Snow cover extraction in mountain areas is a complex task, especially from high spatial resolution remote sensing (HSRRS) data. The influence of mountain shadows in HSRRS is severe, and normalized difference snow index-based snow cover extraction methods are inapplicable. A decision tree building method for snow cover extraction (DTSE) integrated with an efficient feature selection algorithm is proposed. The severe influence of terrain shadows is eliminated by extracting snow in sunlight and snow in shadow separately in different nodes. In the feature selection algorithm, deviation of fuzzy grade matrix is proposed as a class-specific criterion which improves the efficiency and robustness of the selected feature set, thus making the snow cover extraction accurate. Two experiments are carried out based on ZY-3 images of two regions (regions A and B) located in the Tianshan Mountains, China. The experiment on region A achieves an adequate accuracy, demonstrating the robustness of the DTSE building method. The experiment on region B shows that a general DTSE model achieves an unsatisfactory accuracy for snow in shadow and that DTSE rebuilding evidently improves the performance, thus providing an accurate and fast way to extract snow cover in mountain areas.

  14. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    Science.gov (United States)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.

  15. Boosted Decision Trees for Lithiasis Type Identification

    Directory of Open Access Journals (Sweden)

    Boutalbi Rafika

    2015-06-01

    Full Text Available Several urologic studies have shown that it is important to determine the lithiasis type in order to limit the risk of recurrence and the deterioration of renal function. The difficulty urologists face in classifying urolithiasis is due to the large number of parameters (components, age, gender, background ...) that take part in the classification and hence in determining the probable etiology. There are 6 types of urinary lithiasis, which are distinguished according to their composition (chemical components in given proportions), their etiology and the patient profile. This work presents models based on boosted decision trees, which were compared according to their error rates and runtime. The principal objectives of this work are to facilitate the classification of urinary lithiasis, to reduce the classification runtime, and to serve an epidemiologic interest. The experimental results showed that the method is effective and encouraging for lithiasis type identification.

  16. Remote Sensing Image Classification Based on Decision Tree in the Karst Rocky Desertification Areas: A Case Study of Kaizuo Township

    Institute of Scientific and Technical Information of China (English)

    Shuyong MA; Xinglei ZHU; Yulun AN

    2014-01-01

    Karst rocky desertification is a form of land degradation caused by the interaction of natural and human factors. In the past, supervised and unsupervised classification were often used to classify remote sensing images of rocky desertification areas. However, these methods rely only on pixel brightness characteristics, so the classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a new technology for remote sensing image classification. In this study, we select Kaizuo Township, a rocky desertification area, as a case study. Using ASTER image data, DEM and lithology data, we extract the normalized difference vegetation index, ratio vegetation index, terrain slope and other data to establish classification rules and build decision trees. The classified images are obtained with the support of the ENVI software. By calculating the classification accuracy and kappa coefficient, we find that better classification results can be obtained and that desertification information can be extracted automatically. If more remote sensing image bands are used, a higher-resolution DEM is employed and data errors during processing are reduced, classification accuracy can be improved further.
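
    The kind of rule-based decision tree this record describes can be sketched as follows. The index formulas are the standard ones, NDVI = (NIR - Red) / (NIR + Red) and RVI = NIR / Red. The thresholds, class names and function names are hypothetical placeholders chosen only for illustration and are not the rules derived in the study.

```python
# Minimal sketch of a hand-written decision tree over per-pixel features.
# Thresholds and class labels are hypothetical, not taken from the study.

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def ratio_vegetation_index(nir, red):
    """Ratio vegetation index; infinite when the red reflectance is zero."""
    return nir / red if red else float("inf")

def classify_pixel(nir, red, slope_deg):
    """Walk a small fixed tree: split on NDVI first, then on terrain slope."""
    v = ndvi(nir, red)
    if v >= 0.5:
        return "vegetated"
    if v >= 0.2:
        if slope_deg < 25:
            return "light desertification"
        return "moderate desertification"
    return "severe desertification"

# Three example pixels: (NIR reflectance, red reflectance, slope in degrees).
for pixel in [(0.50, 0.10, 10), (0.35, 0.20, 30), (0.20, 0.20, 5)]:
    print(pixel, classify_pixel(*pixel))
```

    In practice the splits would be learned from training samples or tuned by an analyst, and further layers such as lithology would add more branches.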

  17. Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem

    Directory of Open Access Journals (Sweden)

    Bryant Stephen H

    2008-09-01

    Full Text Available Abstract Background: Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Due to the amount of information produced by HTS assays, it is a very challenging task to mine the HTS data for potential interest in drug development research. Computational approaches for the analysis of HTS results face great challenges due to the large quantity of information and significant amounts of erroneous data produced. Results: In this study, Decision Tree (DT) based models were developed to discriminate compound bioactivities by using their chemical structure fingerprints provided in the PubChem system http://pubchem.ncbi.nlm.nih.gov. The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database, including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold Cross Validation (CV) sensitivity, specificity and Matthews Correlation Coefficient (MCC) for the models are 57.2~80.5%, 97.3~99.0% and 0.4~0.5, respectively. A further evaluation was also performed for DT models built for two independent bioassays, where inhibitors for the same HIV RNase target were screened using different compound libraries; this experiment yielded enrichment factors of 4.4 and 9.7. Conclusion: Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hits selection.

  18. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
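
    The "integral image" transform mentioned in this record can be illustrated with a short integer-only Python sketch. It shows the general technique (a summed-area table with constant-time box sums), not the specific features or hardware implementation of the reported method. The function names and the sample image are illustrative.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column, integer arithmetic only."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 via four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
total = box_sum(ii, 0, 0, 3, 4)    # whole image
centre = box_sum(ii, 1, 1, 3, 3)   # the 2x2 block 6, 7, 10, 11
print(total, centre)
```

    Because every box sum costs four lookups regardless of box size, differences of box sums can serve as cheap filter responses for a decision tree to test.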

  19. Research on Decision Tree for Food Safety Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    鄂旭; 任骏原; 毕嘉娜; 沈德海

    2014-01-01

    Food safety decision-making is an important topic in food safety research. To analyze food safety conditions, a method based on the variable precision rough set model is proposed for building decision trees whose rules carry a definite confidence. It improves on the traditional weighted decision tree induction algorithm. The new algorithm constructs the decision tree using the variable precision weighted mean roughness as the attribute selection criterion and uses the variable precision approximate accuracy in place of the approximate accuracy. Noisy data in the training set are taken into account, and limited inconsistency is allowed among the examples of the positive regions, so that partially conflicting decision rules can be accommodated while the tree is built. As a result, the decision tree is simplified, its generalization ability is improved and its rules are more comprehensible. Experiments show that the algorithm is feasible and effective.

  20. Facial Expression Recognition Based on LBP and SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李扬; 郭海礁

    2014-01-01

    To improve the recognition rate of facial expression recognition, an algorithm combining LBP and an SVM decision tree is proposed. First, the facial expression image is converted into an LBP feature spectrum using the LBP algorithm. The LBP feature spectrum is then converted into an LBP histogram feature sequence. Finally, the SVM decision tree algorithm classifies and recognizes the facial expressions. Recognition experiments on the JAFFE facial expression database demonstrate the effectiveness of the algorithm.

  1. Research on IDS Security Data Fusion Technology Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    黄正兴; 苏旸

    2013-01-01

    To handle the massive volume of alert data produced by the collaboration of multiple intrusion detection systems (IDS), this paper proposes an IDS alert data fusion technique based on decision trees. It introduces decision trees and the ID3 construction algorithm, and uses a decision tree to improve the attribute-matching fusion technique in IDS alert data fusion. This increases the fusion efficiency and lowers the missed-alarm rate of the fused alert data. Experiments confirm the validity of the method.
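
    ID3, the tree-construction algorithm named in this record, chooses at each node the attribute with the highest information gain. The Python sketch below shows that selection step on a toy table. The alert attributes (`protocol`, `port_class`), the target column `fuse` and all rows are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy of the target minus its weighted entropy after splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical alert records; `fuse` says whether the alert was merged with another.
alerts = [
    {"protocol": "tcp", "port_class": "web", "fuse": "yes"},
    {"protocol": "tcp", "port_class": "web", "fuse": "yes"},
    {"protocol": "udp", "port_class": "web", "fuse": "no"},
    {"protocol": "udp", "port_class": "dns", "fuse": "no"},
]

gains = {a: information_gain(alerts, a, "fuse") for a in ("protocol", "port_class")}
best_attribute = max(gains, key=gains.get)   # attribute ID3 would split on first
print(gains, best_attribute)
```

    A full ID3 implementation would recurse on each subset with the remaining attributes until the labels are pure or no attributes are left.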

  2. A Customer Churn Alarm Model Based on the C5.0 Decision Tree: Taking the Postal Short Message as an Example

    Institute of Scientific and Technical Information of China (English)

    张宇; 张之明

    2015-01-01

    Customer churn is a prominent problem in enterprise management. Preventing churn and working to maintain and retain customers has become an important issue in the operation and development of enterprises. The C5.0 decision tree algorithm is used to build a customer churn prediction model, and the model is validated empirically on more than 4 million real business records from China Post's short message service. The results show that the model achieves a high hit rate and coverage rate and provides good early warning. It can help an enterprise identify customers at risk of churning in time and minimize customer churn.

  3. Comparison of greedy algorithms for α-decision tree construction

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    A comparison among different heuristics used by greedy algorithms that construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from UCI Machine Learning Repository [2]. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.

  4. Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System

    Directory of Open Access Journals (Sweden)

    Jashan Koshal

    2012-08-01

    Full Text Available The popularity of the internet is the main reason attacks are introduced into systems, and information security has now become a vital subject. Hence, there is an immediate need to recognize and detect attacks. Intrusion detection is defined as a method of diagnosing attacks and signs of malicious activity in a computer network by evaluating the system continuously. Software that performs this task is called an Intrusion Detection System (IDS). Systems developed with individual algorithms such as classification, neural networks or clustering give a good detection rate and a low false alarm rate. Recent studies show, however, that cascading multiple algorithms yields much better performance than a system developed with a single algorithm. In intrusion detection systems that use a single algorithm, the accuracy and detection rate were not up to the mark, and a rise in the false alarm rate was also encountered. Cascading of algorithms is performed to solve this problem. This paper presents two hybrid algorithms for developing the intrusion detection system. A C4.5 decision tree and a Support Vector Machine (SVM) are combined to maximize the accuracy, which is the advantage of C4.5, and to diminish the false alarm rate, which is the advantage of SVM. Results show an increase in accuracy and detection rate and a lower false alarm rate.

  5. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

    We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree. Unlike existing works on fish classification, which propose descriptors but neither analyze their individual impact on the whole classification task nor combine feature selection, image segmentation and geometrical parameters, we propose a general set of features extracted using robust feature selection, image segmentation and geometrical parameters, together with their corresponding weights, to be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the classifier's structure itself, we consider it as a black box and focus our research on determining which input information yields robust fish discrimination. The main contribution of this paper is enhancement recognize and classify fishes...

  6. Remote Sensing Image Classification Based on CART Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    齐乐; 岳彩荣

    2011-01-01

    Taking Shangri-La County, Yunnan Province as the study area, this paper constructs a decision tree classification method for remote sensing images based on CART (Classification and Regression Tree). Principal component extraction, vegetation information extraction and texture information extraction were applied to the imagery and combined with training samples of the main land-cover types in the test area; classification was carried out using Landsat 5 TM image data and DEM data, with the ENVI software as the platform. Comparison with maximum likelihood classification shows that the CART-based decision tree achieves higher classification accuracy and a better classification effect.

  7. Energy Procurement Strategy via Bilateral Contracts Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    胡乐宜; 杨立兵; 宋依群; 刘福斌; 洪元瑞

    2012-01-01

    In regional electricity markets, when an electric power company purchases electricity through bilateral transactions among provinces, it is necessary to select one proper trading scheme from the option(s) the seller provides. This selection should be based on comprehensive consideration of all the interdependent decision factors, such as economic benefit, reliability, energy conservation policy and satisfaction of other entities in the electricity markets. In this paper, the idea of selecting a trading scheme based on a decision tree is proposed for the first time, and a logical decision process is formulated. Numerical examples are then used to analyze the application of the method, and a sensitivity analysis is performed as well. It is shown that both single-level and multi-level decision problems can be solved by decision trees, in which the earlier factors are more important than the later ones. The price difference among the trading schemes also influences the decision results.

  8. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  9. Traffic Accident Analysis Using Decision Trees and Neural Networks

    OpenAIRE

    Chong, Miao M.; Abraham, Ajith; Paprzycki, Marcin

    2004-01-01

    The costs of fatalities and injuries due to traffic accidents have a great impact on society. This paper presents our research to model the severity of injury resulting from traffic accidents using artificial neural networks and decision trees. We have applied them to an actual data set obtained from the National Automotive Sampling System (NASS) General Estimates System (GES). Experiment results reveal that in all the cases the decision tree outperforms the neural network. Our research analys...

  10. Ensemble of randomized soft decision trees for robust classification

    Indian Academy of Sciences (India)

    G KISHOR KUMAR; P VISWANATH; A ANANDA RAO

    2016-03-01

    For classification, decision trees have become very popular because of their simplicity, interpretability and good performance. To induce a decision tree classifier for data having continuous-valued attributes, the most common approach is to split the continuous attribute range into a hard (crisp) partition having two or more blocks, using one or several crisp (sharp) cut points. But this can make the resulting decision tree very sensitive to noise. An existing solution to this problem is to split the continuous attribute into a fuzzy partition (soft partition) using soft or fuzzy cut points, which is based on fuzzy set theory, and to use fuzzy decisions at nodes of the tree. These are called soft decision trees in the literature and are shown to perform better than conventional decision trees, especially in the presence of noise. The current paper first proposes to use an ensemble of soft decision trees for robust classification, where parameters such as the attribute and the fuzzy cut point are chosen randomly from a probability distribution of fuzzy information gain for various attributes and for their various cut points. Further, the paper proposes to use probability-based information gain to achieve better results. The effectiveness of the proposed method is shown by experimental studies carried out using three standard data sets. It is found that an ensemble of randomized soft decision trees outperforms the related existing soft decision tree. Robustness against the presence of noise is shown by injecting various levels of noise into the training set, and a comparison with other related methods favors the proposed method.
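
    The contrast between a crisp cut point and a soft (fuzzy) cut point described in this abstract can be illustrated with a minimal sketch. The logistic membership function and the names `cut`, `width`, `left_probs` and `right_probs` are assumptions made for illustration; the abstract does not state which membership function the authors use.

```python
import math


def crisp_membership(x, cut):
    """Crisp split: x belongs fully to the right branch iff x > cut."""
    return 1.0 if x > cut else 0.0


def soft_membership(x, cut, width):
    """Soft split: degree of membership in the right branch, in (0, 1).

    A logistic curve is one common choice of fuzzy membership function;
    it is an assumption here, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-(x - cut) / width))


def soft_predict(x, cut, width, left_probs, right_probs):
    """Blend the class distributions of two leaves by the membership degree."""
    m = soft_membership(x, cut, width)
    return [(1.0 - m) * l + m * r for l, r in zip(left_probs, right_probs)]
```

    With a crisp split, a small perturbation of a value near the cut point flips the decision entirely; with the soft split, the blended class distribution changes only gradually, which is the intuition behind the noise robustness claimed above.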

  11. Research and implementation of dynamic graph data based on decision trees technology

    Institute of Scientific and Technical Information of China (English)

    雷炜; 叶东毅

    2011-01-01

    Traditional dynamic data analysis approaches such as time series analysis turn out to have shortcomings in the analysis of dynamic graphs. In this paper, a method for dynamic graph data analysis based on the decision tree technique was researched and implemented by using the SLIQ algorithm to analyze real electrocardiogram data. The experimental results show that the obtained model has an accuracy of about 73%.

  12. Detection and Extraction of Videos using Decision Trees

    Directory of Open Access Journals (Sweden)

    Sk.Abdul Nabi

    2011-12-01

    Full Text Available This paper addresses a new multimedia data mining framework for the extraction of events in videos by using decision tree logic. The aim of our DEVDT (Detection and Extraction of Videos using Decision Trees) system is to improve the indexing and retrieval of multimedia information. The extracted events can be used to index the videos. In this system we have considered the C4.5 decision tree algorithm [3], which is used for managing both continuous and discrete attributes. In this process, we first adopt an advanced video event detection method to produce event boundaries and some important visual features. This rich multi-modal feature set is filtered by a pre-processing step to clean the noise as well as to reduce the irrelevant data. This improves both Precision and Recall. The cleaned data is then mined and classified by using a decision tree model. The learning and classification steps of this decision tree are simple and fast. The decision tree has good accuracy. Subsequently, by using our system we will reach maximum Precision and Recall, i.e. we will extract pure video events effectively and proficiently.

  13. Relationships for Cost and Uncertainty of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on a dynamic programming approach and require the consideration of subtables of the initial decision table, so the approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and a cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions, such as depth and average depth of a decision tree and number of nodes (and number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions. The uncertainty of an approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for such tools, we also present experimental results for various datasets acquired from UCI ML Repository [4]. © Springer-Verlag Berlin Heidelberg 2013.
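
    The two uncertainty definitions given in this abstract are concrete enough to sketch directly. The following is an illustrative sketch only: representing a decision table by the list of its rows' decisions, and the function names, are assumptions and not the chapter's implementation.

```python
from collections import Counter


def table_uncertainty(decisions):
    """Number of unordered pairs of rows whose decisions differ.

    `decisions` holds one decision label per row of the (sub)table.
    Computed as all pairs minus the pairs that share a decision.
    """
    n = len(decisions)
    total_pairs = n * (n - 1) // 2
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(decisions).values())
    return total_pairs - same_pairs


def tree_uncertainty(leaf_subtables):
    """Uncertainty of an approximate tree: the maximum uncertainty over
    the subtables that correspond to its terminal nodes."""
    return max(table_uncertainty(d) for d in leaf_subtables)
```

    For example, a subtable with decisions `["a", "a", "b"]` has three unordered pairs of rows, one of which shares a decision, so its uncertainty is 2; a subtable whose rows all carry the same decision has uncertainty 0.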

  14. Meta-learning in decision tree induction

    CERN Document Server

    Grąbczewski, Krzysztof

    2014-01-01

    The book focuses on different variants of decision tree induction but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. The book discusses different variants of decision tree induction and represents a useful source of information to readers wishing to review some of the techniques used in decision tree learning, as well as different ensemble methods that involve decision trees. It is shown that the knowledge of different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms with the aim of identifying the top-most performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...

  15. Gobi information extraction based on decision tree classification method

    Institute of Scientific and Technical Information of China (English)

    冯益明; 智长贵; 姚爱冬

    2013-01-01

    Gobi is one of the main landscape types of the earth's surface in the arid region of northwestern China, with a total area of 458 000-757 000 km², accounting for 4.8%-7.9% of China's total land area. The gobi holds abundant natural resources such as minerals, wind energy and solar power. Meanwhile, many modern cities and towns and some important traffic routes have been constructed in the gobi region, which plays an important role in the development of the western economy. Therefore, it is important to launch gobi research under current social and economic conditions, and accurately revealing the distribution and area of gobi is the basis and premise of such research. At present, fieldwork is difficult because of the harsh natural conditions and sparse population in the gobi region, which leads to a scarcity of research documents on the situation, distribution, type classification, transformation and utilization of gobi. The study region of this paper is a typical gobi distribution region located in Ejina County in Inner Mongolia, China; its climatic characteristics include little rain, high evaporation, abundant sunshine, large temperature differences and frequent windy, sandy weather. The basic data sources were Landsat TM5 and TM7 remote sensing imagery from the plant growth seasons of 2005-2010, a DEM with 30 m spatial resolution, an administrative map, a present land use map, field investigation data and related documents. First, the non-gobi distribution regions were extracted in GIS software by analyzing the DEM. Then, based on an analysis of the spectral characteristics of different typical ground objects, a knowledge-based decision tree information extraction model was constructed to classify the remote sensing imagery, and eroded gobi and cumulated gobi were separated relatively accurately. The overall accuracy of the extracted gobi information reached 91.57%. There were few materials in China on using

  16. Short-Time Fourier Transform and Decision Tree-Based Pattern Recognition for Gas Identification Using Temperature Modulated Microhotplate Gas Sensors

    Directory of Open Access Journals (Sweden)

    Aixiang He

    2016-01-01

    Full Text Available Because the sensor response is dependent on its operating temperature, modulated temperature operation is usually applied in gas sensors for the identification of different gases. In this paper, the modulated operating temperature of microhotplate gas sensors combined with a feature extraction method based on Short-Time Fourier Transform (STFT) is introduced. Because the gas concentration in the ambient air usually has high fluctuation, STFT is applied to extract transient features from the time-frequency domain, and the relationship between the STFT spectrum and sensor response is further explored. Because of the low thermal time constant, sufficient discriminatory information about different gases is preserved in the envelope of the response curve. Feature information tends to be contained in the lower frequencies, but not at higher frequencies. Therefore, features are extracted from the STFT amplitude values at the frequencies ranging from 0 Hz to the fundamental frequency to accomplish the identification task. These lower frequency features are extracted and further processed by decision tree-based pattern recognition. The proposed method shows high classification capability in the analysis of different concentrations of carbon monoxide, methane, and ethanol.

  17. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  18. Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks

    Energy Technology Data Exchange (ETDEWEB)

    Tso, Geoffrey K.F.; Yau, Kelvin K.W. [City University of Hong Kong, Kowloon, Hong Kong (China). Department of Management Sciences

    2007-09-15

    This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model for future prediction. (author)
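
    The selection criterion named in this abstract, the square root of the average squared error, can be sketched as follows. The model names and the numbers in the usage note are made up for illustration and are not results from the study.

```python
import math


def rmse(actual, predicted):
    """Square root of the average squared error between two sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_model(actual, predictions_by_model):
    """Return the name of the model whose predictions have the lowest RMSE.

    `predictions_by_model` maps a (hypothetical) model name to its
    predictions on the same held-out observations as `actual`.
    """
    return min(predictions_by_model,
               key=lambda name: rmse(actual, predictions_by_model[name]))
```

    For instance, `select_model([1.0, 2.0], {"tree": [1.0, 2.5], "regression": [2.0, 3.0]})` picks `"tree"`, since its error is smaller on both observations.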

  19. Influence diagrams and decision trees for severe accident management

    Energy Technology Data Exchange (ETDEWEB)

    Goetz, W.W.J.

    1996-09-01

    A review of relevant methodologies based on Influence Diagrams (IDs), Decision Trees (DTs), and Containment Event Trees (CETs) was conducted to assess the practicality of these methods for the selection of effective strategies for Severe Accident Management (SAM). The review included an evaluation of some software packages for these methods. The emphasis was on possible pitfalls of using IDs and on practical aspects, the latter by performance of a case study that was based on an existing Level 2 Probabilistic Safety Assessment (PSA). The study showed that the use of a combined ID/DT model has advantages over CET models, in particular when conservatisms in the Level 2 PSA have been identified and replaced by fair assessments of the uncertainties involved. It is recommended to use ID/DT models complementary to CET models. (orig.).

  20. Quantum Computation and Decision Trees

    CERN Document Server

    Farhi, E; Farhi, Edward; Gutmann, Sam

    1998-01-01

    Many interesting computational problems can be reformulated in terms of decision trees. A natural classical algorithm is to then run a random walk on the tree, starting at the root, to see if the tree contains a node n levels from the root. We devise a quantum mechanical algorithm that evolves a state, initially localized at the root, through the tree. We prove that if the classical strategy succeeds in reaching level n in time polynomial in n, then so does the quantum algorithm. Moreover, we find examples of trees for which the classical algorithm requires time exponential in n, but for which the quantum algorithm succeeds in polynomial time. The examples we have so far, however, could also be solved in polynomial time by different classical algorithms.

  1. Automatic sleep staging using state machine-controlled decision trees.

    Science.gov (United States)

    Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2015-01-01

    Automatic sleep staging from a reduced number of channels is desirable to save time, reduce costs and make sleep monitoring more accessible by providing home-based polysomnography. This paper introduces a novel algorithm for automatic scoring of sleep stages using a combination of small decision trees driven by a state machine. The algorithm uses two channels of EEG for feature extraction and has a state machine that selects a suitable decision tree for classification based on the prevailing sleep stage. Its performance has been evaluated using the complete dataset of 61 recordings from PhysioNet Sleep EDF Expanded database achieving an overall accuracy of 82% and 79% on training and test sets respectively. The algorithm has been developed with a very small number of decision tree nodes that are active at any given time making it suitable for use in resource-constrained wearable systems.

  2. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper is devoted to the consideration of the software system Dagger created at KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees), and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this system's work on decision tables from UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  3. Self-adaptive Transfer for Decision Trees Based on Similarity Metric

    Institute of Scientific and Technical Information of China (English)

    王雪松; 潘杰; 程玉虎; 曹戈

    2013-01-01

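    How to resolve the negative transfer problem in transfer learning, and how to choose appropriately when and how to transfer, are key issues affecting the wide application of transfer learning. To address this problem, a self-adaptive transfer method for decision trees based on a similarity metric (STDT) is proposed. First, depending on whether the source task data sets are accessible, the similarity between decision trees is determined adaptively using either component prediction probabilities or path prediction probabilities, and the resulting affinity coefficient serves as the quantitative measure of the similarity between related tasks. Next, a multi-source decision condition determines whether multi-source ensemble transfer is used, and the normalized similarities are assigned to the source decision trees to be transferred as transfer weights. Finally, the source decision trees are transferred as an ensemble to assist decision making in the target task. Simulation results on the UCI Machine Learning Repository show that, compared with the multi-source transfer weighted sum rule (WSR) algorithm and MS-TrAdaBoost, STDT achieves faster transfer while maintaining decision accuracy.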

  4. Comparison of Greedy Algorithms for Decision Tree Optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.

  5. Representing Boolean Functions by Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram or branching program is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than a table of values or even a formula [44]. Representing a function in the form of a decision tree allows applying graph algorithms for various transformations [10]. Decision trees and branching programs are used for effective hardware [15] and software [5] implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of a decision tree characterizes the expected computing time, and the number of nodes in a branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal in both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.
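
    As a small illustration of the depth and average depth measures mentioned in this abstract, the sketch below represents a Boolean function by a decision tree and counts the variable queries made per input. The tuple encoding of the tree is hypothetical, and averaging over uniformly distributed inputs is an assumption; the chapter may use a different probability distribution.

```python
from itertools import product

# Hypothetical encoding: a leaf is the value 0 or 1; an internal node is
# a tuple (variable_index, subtree_if_0, subtree_if_1).


def evaluate(tree, x):
    """Return (function value, number of variables queried) for input x."""
    queries = 0
    while isinstance(tree, tuple):
        var, low, high = tree
        tree = high if x[var] else low
        queries += 1
    return tree, queries


def depth(tree):
    """Worst-case number of queries (length of the longest root-leaf path)."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def average_depth(tree, n_vars):
    """Mean number of queries over all 2**n_vars inputs, assumed uniform."""
    inputs = list(product((0, 1), repeat=n_vars))
    return sum(evaluate(tree, x)[1] for x in inputs) / len(inputs)


# Decision tree for x0 AND x1: query x0 first, and x1 only when x0 is 1.
AND_TREE = (0, 0, (1, 0, 1))
```

    For `AND_TREE`, the depth is 2 while the average depth over the four inputs is 1.5, because the two inputs with `x0 = 0` are decided after a single query.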

  6. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    Directory of Open Access Journals (Sweden)

    Bruno Carneiro da Rocha

    2010-10-01

    Full Text Available This article aims to evaluate the use of techniques of decision trees, in conjunction with the management model CRISP-DM, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of techniques and the CRISP-DM management model of data mining in large operational databases logged from internet bank transactions.

  7. The Application of R-C4.5 Decision Tree Model in Higher Vocational Employment

    Institute of Scientific and Technical Information of China (English)

    张继美; 桂红兵

    2011-01-01

    This paper describes decision tree classification technology and the R-C4.5 decision tree model. Taking the personal, educational and employment information of recent graduates of a higher vocational college as the research object, the experimental data are preprocessed and then mined using the R-C4.5 decision tree classification technique. The mining identifies factors that influence the employment quality of higher vocational graduates, providing a decision-making basis for measures and reforms by government and schools to improve employment quality.

  8. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Science.gov (United States)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    Nowadays, technologies used in traumatology are a combination of mechanical, electronic, calculating and programming tools. The relevance of developing mobile applications for prompt processing of data received from medical devices (in particular, wearable devices), and for formulating management decisions, is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  9. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Energy Technology Data Exchange (ETDEWEB)

    Kupriyanov, M. S., E-mail: mikhail.kupriyanov@gmail.com; Shukeilo, E. Y., E-mail: eyshukeylo@gmail.com; Shichkina, J. A., E-mail: strange.y@mail.ru [Saint Petersburg Electrotechnical University “LETI” (Russian Federation)

    2015-11-17

    Nowadays, technologies used in traumatology are a combination of mechanical, electronic, calculating and programming tools. The relevance of developing mobile applications for prompt processing of data received from medical devices (in particular, wearable devices), and for formulating management decisions, is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  10. Decision trees for predicting the academic success of students

    Directory of Open Access Journals (Sweden)

    Josip Mesarić

    2016-12-01

    Full Text Available The aim of this paper is to create a model that successfully classifies students into one of two categories, depending on their success at the end of their first academic year, and finding meaningful variables affecting their success. This model is based on information regarding student success in high school and their courses after completing their first year of study, as well as the rank of preferences assigned to the observed faculty, and attempts to classify students into one of the two categories in line with their academic success. Creating a model required collecting data on all undergraduate students enrolled into their second year at the Faculty of Economics, University of Osijek, as well as data on completion of the state exam. These two datasets were combined and used for the model. Several classification algorithms for constructing decision trees were compared and the statistical significance (t-test of the results was analyzed. Finally, the algorithm that produced the highest accuracy was chosen as the most successful algorithm for modeling the academic success of students. The highest classification rate of 79% was produced using the REPTree decision tree algorithm, but the tree was not as successful in classifying both classes. Therefore, the average rate of classification was calculated for two models that gave the highest total rate of classification, where a higher percentage is achieved using the model relying on the algorithm J48. The most significant variables were total points in the state exam, points from high school and points in the Croatian language exam.

  11. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    CERN Document Server

    Farid, Dewan Md; Rahman, Mohammad Zahidur; 10.5121/ijnsa.2010.2202

    2010-01-01

    In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented, which performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, there remain various issues that need to be examined in current intrusion detection systems (IDS). We tested the performance of our proposed algorithm with existing learn...

  12. PREDIKSI CALON MAHASISWA BARU MENGUNAKAN METODE KLASIFIKASI DECISION TREE (Prediction of Prospective New Students Using the Decision Tree Classification Method)

    Directory of Open Access Journals (Sweden)

    Mambang

    2015-02-01

    Full Text Available Before a health education institution begins a new academic year, its first step is to select new students from among graduates of general and vocational secondary education. This study predicts new student admissions using multiple data attributes. The model is a decision tree classification method, which builds a tree consisting of a root node, internal nodes and terminal nodes. The root node and internal nodes are variables (features), while the terminal nodes hold the class labels. Based on the experiments and evaluations carried out, it can be concluded that the C4.5 algorithm with the Uncertainty criterion achieved 80.39% accuracy, 94.44% precision and 75.00% recall, while the C4.5 algorithm with the Information Gain criterion achieved an accuracy ratio of 88.24%, 98.28% precision and 83.82% recall.

  13. Comparative Analysis of Serial Decision Tree Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Matthew Nwokejizie Anyanwu

    2009-09-01

    Full Text Available Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used to group similar data objects together. It can be defined as a supervised learning technique, as it assigns class labels to data objects based on the relationship between the data items and a pre-defined class label. Classification algorithms have a wide range of applications, such as churn prediction, fraud detection, artificial intelligence, and credit card rating. Many classification algorithms are available in the literature, but decision trees are the most commonly used because they are easier to implement and easier to understand than other classification algorithms. A decision tree classification algorithm can be implemented in a serial or parallel fashion depending on the volume of data, the memory space available on the computer resource and the scalability of the algorithm. In this paper we review the serial implementations of decision tree algorithms and identify those that are commonly used. We also use experimental analysis based on sample data records (Statlog data sets) to evaluate the performance of the commonly used serial decision tree algorithms.

  14. Optimization of Stacking Method Based on Cluster Analysis and Decision Tree

    Institute of Scientific and Technical Information of China (English)

    高昊江; 张宜生; 肖田元

    2011-01-01

    In the Just in Time (JIT) production model, storing large steel coils is a multi-objective optimization problem, and traditional methods that rely on manual experience can no longer meet production needs. A method is therefore proposed for calculating the stock level of goods from the production plan and a safety stock coefficient. A cluster analysis algorithm is designed for allocating shelves, and a decision tree method is constructed to solve the multi-objective optimization problem. Experimental results show that the method raises outbound efficiency and the utilization rate of storage space, and meets requirements for safe production, high quality and efficiency, and reduced waste.

  15. Solar and Wind Site Screening Decision Trees

    Science.gov (United States)

    EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.

  16. Multi-pruning of decision trees for knowledge representation and classification

    KAUST Repository

    Azad, Mohammad

    2016-06-09

    We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increment in the number of misclassifications. Based on this approach we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.
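
    The notion of a Pareto optimal point used in this abstract can be illustrated with a short sketch. It assumes each candidate tree is summarized by a hypothetical (number of nodes, number of misclassifications) pair; the values below are invented for illustration, and the code is a plain filter over given candidates, not the dynamic programming approach the abstract describes.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Both coordinates are minimized: q dominates p when q is no worse
    than p in both criteria and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


# Hypothetical (nodes, misclassifications) pairs for candidate trees.
candidates = [(3, 12), (5, 7), (5, 9), (9, 4), (15, 4), (21, 1)]
front = pareto_front(candidates)
```

    Picking a tree then amounts to choosing one point on this front, for example the smallest tree whose misclassification count is still acceptable.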

  17. Decision tree approach to evaluating inactive uranium processing sites for liner requirements

    Energy Technology Data Exchange (ETDEWEB)

    Relyea, J.F.

    1983-03-01

    Recently, concern has been expressed about potential toxic effects of both radon emission and release of toxic elements in leachate from inactive uranium mill tailings piles. Remedial action may be required to meet disposal standards set by the states and the US Environmental Protection Agency (EPA). In some cases, a possible disposal option is the exhumation and reburial (either on site or at a new location) of tailings and reliance on engineered barriers to satisfy the objectives established for remedial actions. Liners under disposal pits are the major engineered barrier for preventing contaminant release to ground and surface water. The purpose of this report is to provide a logical sequence of action, in the form of a decision tree, which could be followed to show whether a selected tailings disposal design meets the objectives for subsurface contaminant release without a liner. This information can be used to determine the need for and type of liner for sites exhibiting a potential groundwater problem. The decision tree is based on the capability of hydrologic and mass transport models to predict the movement of water and contaminants with time. The types of modeling capabilities and data needed for those models are described, and the steps required to predict water and contaminant movement are discussed. A demonstration of the decision tree procedure is given to aid the reader in evaluating the need for and the adequacy of a liner.

  18. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score

    Directory of Open Access Journals (Sweden)

    Andrej Kastrin

    2008-12-01

    Full Text Available Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from the domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of the knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of decision trees on a prediction model of sensation seeking. Prediction of the Zuckerman's Sensation Seeking Scale (SSS-V) score was based on the bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of the Eysenck Personality Questionnaire (EPQ) and the Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision trees methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.

  19. Soil surface roughness measuring method based on neural network and decision tree

    Institute of Scientific and Technical Information of China (English)

    李俐; 王荻; 潘彩霞; 王鹏新

    2015-01-01

    Soil surface roughness is one of the important indices commonly used to describe soil hydrological characteristics and Lambert characteristic. In microwave quantitative remote sensing application, it affects the microwave scattering values and therefore impacts the accuracy of soil moisture retrieved using microwave sensing data. Therefore, measuring soil surface roughness has become one of the research hotspots in the field of microwave remote sensing. Two kinds of techniques are used to calculate soil surface roughness, including contact method, such as the pin meter and profile meter, and non-contact method, such as ultrasonic measurement, laser scanning, three-dimensional photography, infrared measurement and radar measurement method. All these methods need some special device. The development of image processing technology and the popularization of digital camera provide a simple measuring method which only needs a reference whiteboard and a camera. However, the detailed scale information commonly used on the reference whiteboard increases the requirements for data acquisition and data processing. The purpose of this study is to provide a method to obtain the soil surface image with a simplified reference whiteboard and then to measure soil surface roughness in the presence of field environmental noise. Therefore, a simple image acquisition method is introduced and then an image processing method combining the neural network and the decision tree is proposed. The neural network is built to detect image edge points. To reduce the environmental noise effect, the input characteristic parameters of the neural network are selected carefully, which include not only gradient information, but also image direction and neighborhood consistency information. The cutting of the background section on the original image based on image edge detection result improves the computing speed effectively. A decision tree model is introduced to divide image segments into 4 classes

  20. Classification of Parkinsonian Syndromes from FDG-PET Brain Data Using Decision Trees with SSM/PCA Features

    Directory of Open Access Journals (Sweden)

    D. Mudali

    2015-01-01

    Full Text Available Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson’s disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy for humans to understand. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the training data set, enhancing the decision tree method by bagging, and adding features based on (f)MRI data.

  1. Application of Clustering-based Decision Tree in the Screening of Maize Germplasm

    Institute of Scientific and Technical Information of China (English)

    王斌

    2011-01-01

    [Purpose] This paper aims to construct an improved fuzzy decision tree based on clustering and to study its application in the screening of maize germplasm. [Method] A new clustering-based decision tree algorithm is adopted, which remedies the inability of traditional decision tree algorithms to handle samples without class labels. The improved algorithm is then applied to the screening of maize varieties: using indices such as leaf area, plant height, dry weight and potassium (K) utilization, maize seeds with strong tolerance to low potassium are filtered out. [Result] The algorithm shows strong applicability and good performance in the screening of maize germplasm. [Conclusion] Future work should further verify and compare the performance of the improved clustering-based fuzzy decision tree with that of the traditional fuzzy decision tree, and apply it to more practical problems.

  2. Application of decision-tree technique to assess herd specific risk factors for coliform mastitis in sows

    Directory of Open Access Journals (Sweden)

    Imke Gerjets

    2011-06-01

    Full Text Available The aim of the study was to investigate factors associated with coliform mastitis in sows, determined at herd level, by applying the decision-tree technique. Coliform mastitis represents an economically important disease in sows after farrowing that also affects the health, welfare and performance of the piglets. The decision-tree technique, a data mining method, may be an effective tool for making large datasets accessible and different sow herd information comparable. It is based on the C4.5 algorithm, which generates trees in a top-down recursive strategy. The technique can be used to detect weak points in farm management. Two datasets from two farms in Germany, consisting of sow-related parameters, were analysed and compared by decision-tree algorithms. Data were collected over the period of April 2007 to August 2010 from 987 sows (499 CM-positive sows and 488 CM-negative sows) and 596 sows (322 CM-positive sows and 274 CM-negative sows), respectively. Depending on the dataset, different graphical trees were built showing relevant factors at the herd level which may lead to coliform mastitis. To our understanding, this is the first time decision-tree modeling was used to assess risk factors for coliform mastitis. Herd specific risk factors for the disease were illustrated, which could prove beneficial in disease and herd management.
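
    The top-down recursive strategy mentioned in this abstract can be sketched as follows. This is a simplified, ID3-style induction that splits on plain information gain over categorical attributes; it is not the full C4.5 algorithm, which additionally uses the gain ratio, handles continuous attributes and prunes the tree. The attribute names and records are invented toy data, not values from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Top-down recursive induction on categorical attributes.

    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    # Stop when the node is pure or no attributes remain: emit the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Information gain = parent entropy minus weighted child entropy.
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in keep], [labels[i] for i in keep], remaining
        )
    return (best, branches)


# Invented toy records with hypothetical attribute names.
rows = [
    {"housing": "A", "season": "warm"},
    {"housing": "A", "season": "cold"},
    {"housing": "B", "season": "warm"},
    {"housing": "B", "season": "cold"},
]
labels = ["positive", "positive", "negative", "negative"]
tree = build_tree(rows, labels, ["housing", "season"])
```

    In this toy case the root splits on the only attribute that separates the classes, and each branch ends in a pure leaf.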

  3. Virus Detection Algorithm Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    朱俚治

    2015-01-01

    The intelligence of viruses has become increasingly prominent. Viruses built with current intelligent techniques can evade detection by some antivirus software, so some of them are difficult for traditional detection algorithms to find. To detect viruses that use these new techniques effectively, virus detection algorithms need new intelligence of their own. The MMTD algorithm and the decision tree algorithm are two intelligent algorithms, and applying them to virus detection helps make detection algorithms more intelligent. Based on the characteristics that viruses exhibit during the detection process, this paper combines the MMTD algorithm with the decision tree algorithm and proposes a new virus detection algorithm.

  4. Using Decision Trees for Coreference Resolution

    CERN Document Server

    McCarthy, Joseph F.; Lehnert, Wendy G.

    1995-01-01

    This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.

  5. Diagnosis of Hepatitis using Decision tree algorithm

    Directory of Open Access Journals (Sweden)

    V.Shankar sowmien

    2016-06-01

    Full Text Available This research paper proposes a prediction system for liver disease using machine learning. Researchers have provided various data to identify the causes of hepatitis. Here, the decision tree method is used to determine the structural information of tissues. The algorithm used to construct the decision tree is C4.5, which concentrates on 19 attributes for the diagnosis of the disease: age, sex, steroids, antivirals, spleen, fatigue, malaise, anorexia, liver big, liver firm, spiders, bilirubin, varices, ascites, ALK phosphate, SGOT, albumin, protime, and histology. These features helped in determining the abnormalities of the patient, resulting in 85.81% accuracy.

  6. Rule Extraction in Transient Stability Study Using Linear Decision Trees

    Institute of Scientific and Technical Information of China (English)

    SUN Hongbin; WANG Kang; ZHANG Boming; ZHAO Feng

    2011-01-01

    Traditional operation rules depend on human experience; they are relatively fixed and struggle to meet the new demands of the modern power grid. In order to formulate suitable and quickly refreshed operation rules, this paper proposes a linear decision tree method based on support samples for rule extraction. The operation rules extracted by this method are refined and intelligent, which helps the dispatching center meet the requirements of smart grid construction.

  7. Neural Networks with Decision Trees for Diagnosis Issues

    Directory of Open Access Journals (Sweden)

    Yahia Kourd

    2013-05-01

    Full Text Available This paper presents a new fault detection and isolation (FDI) technique applied to an industrial system. The technique is based on Neural Network models of fault-free and faulty behaviours (NNFMs). NNFMs are used for residual generation, while a decision tree architecture is used for residual evaluation. The decision tree is built from data collected from the NNFMs' outputs and is used to isolate detectable faults depending on computed thresholds. Each part of the tree corresponds to a specific residual. With the decision tree, it becomes possible to take the appropriate decision regarding the actual process behaviour by evaluating a small number of residuals. In comparison to the usual systematic evaluation of all residuals, the proposed technique requires less computational effort and can be used for online diagnosis. An application example is presented to illustrate and confirm the effectiveness and accuracy of the proposed approach.

  8. Applying of Decision Tree Analysis to Risk Factors Associated with Pressure Ulcers in Long-Term Care Facilities

    Science.gov (United States)

    Moon, Mikyung

    2017-01-01

    Objectives: The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. Methods: The data were extracted from the 2014 National Inpatient Sample (NIS) data of the Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. Results: The decision tree displayed 15 subgroups with 8 variables, showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was a length of stay of less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, “injuries to the hip and thigh” was the top predictor, ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. Conclusions: These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data. PMID:28261530
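
    The three figures reported in the Results (accuracy, sensitivity, specificity) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the study's confusion matrix, which the abstract does not give.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted correctly / missed
    tn/fp: negative cases predicted correctly / falsely flagged
    """
    sensitivity = tp / (tp + fn)   # share of true positives detected
    specificity = tn / (tn + fp)   # share of true negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Invented counts, for illustration only.
accuracy, sensitivity, specificity = binary_metrics(tp=82, fn=18, tn=79, fp=21)
```

    With balanced classes, as in this toy example, accuracy is simply the mean of sensitivity and specificity.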

  9. Using boosted decision trees for star-galaxy separation

    Science.gov (United States)

    Etayo-Sotos, P.; Sevilla-Noarbe, I.

    2013-05-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDT) to separate stars and galaxies from their catalog characteristics. This application is based on the BDT implementation in the Toolkit for Multivariate Analysis (TMVA) for ROOT, a physics analysis package widely used in high energy physics. The main goal is to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information from other bands. We explain the basics of decision trees and the training sets used for the cases that we analyze. The improvements are shown using the Sloan Digital Sky Survey Data Release 7. With this method we have reached an efficiency of 99% with a contamination level of less than 0.45%.

  10. Using decision trees to measure activities in people with stroke.

    Science.gov (United States)

    Zhang, Ting; Fulk, George D; Tang, Wenlong; Sazonov, Edward S

    2013-01-01

    Improving community mobility is a common goal for persons with stroke. Measuring daily physical activity is helpful to determine the effectiveness of rehabilitation interventions. In our previous studies, a novel wearable shoe-based sensor system (SmartShoe) was shown to be capable of accurately classifying three major postures and activities (sitting, standing, and walking) from individuals with stroke by using an Artificial Neural Network (ANN). In this study, we utilized decision tree algorithms to develop individual and group activity classification models for stroke patients. The data were acquired from 12 participants with stroke. For 3-class classification, the average accuracy was 99.1% with individual models and 91.5% with group models. Further, we extended the activities into 8 classes: sitting, standing, walking, cycling, stairs-up, stairs-down, wheel-chair-push, and wheel-chair-propel. The classification accuracy was 97.9% for individual models and 80.2% for the group model, demonstrating the feasibility of multi-class activity recognition by SmartShoe in stroke patients.

  11. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    CERN Document Server

    Elyassami, Sanaa

    2011-01-01

    Web Effort Estimation is a process of predicting the effort and cost, in terms of money, schedule and staff, of any software project. Many estimation models have been proposed over the last three decades, and estimation is believed to be essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation. It is designed by integrating the principles of the ID3 decision tree with fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing software projects, which can greatly improve the accuracy of the estimates obtained. MMRE and Pred are used as measures of prediction accuracy for this study. A series of experiments is reported using two different software project datasets, namely Tukutuku and COCOMO'81. The results are compared with those produced by the crisp version of the ID3 decision tree.
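
    The two accuracy measures named in this abstract can be written out briefly. The sketch uses the usual definitions: MMRE is the mean magnitude of relative error, and Pred(l) is the share of projects whose relative error is at most l. The level 0.25 is a common choice and is an assumption here, since the abstract does not state the level used; the effort values are invented.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all projects (lower is better)."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=0.25):
    """Fraction of projects with MRE <= level (higher is better)."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


# Invented effort values (e.g. person-hours), for illustration only.
actual_effort = [100.0, 200.0, 400.0, 50.0]
estimated_effort = [110.0, 150.0, 600.0, 50.0]

mmre_value = mmre(actual_effort, estimated_effort)
pred_value = pred(actual_effort, estimated_effort)
```

    Here the per-project errors are 0.10, 0.25, 0.50 and 0.00, so three of the four estimates fall within the 0.25 level.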

  12. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    Directory of Open Access Journals (Sweden)

    Ali Idri

    2011-07-01

    Full Text Available Web Effort Estimation is a process of predicting the effort and cost, in terms of money, schedule and staff, of any software project. Many estimation models have been proposed over the last three decades, and estimation is believed to be essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation. It is designed by integrating the principles of the ID3 decision tree with fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing software projects, which can greatly improve the accuracy of the estimates obtained. MMRE and Pred are used as measures of prediction accuracy for this study. A series of experiments is reported using two different software project datasets, namely Tukutuku and COCOMO'81. The results are compared with those produced by the crisp version of the ID3 decision tree.

  13. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment.

    Science.gov (United States)

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-12-28

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.
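
    The basic Pedestrian Dead Reckoning update that the abstract builds on can be sketched in a few lines: each detected step advances the position by the step length along the current heading. The step lengths and headings below are invented, and the sketch covers only this plain update, not the map-aided fuzzy decision tree proposed in the study.

```python
import math


def pdr_update(east, north, step_length, heading_rad):
    """Advance one detected step; heading is measured clockwise from north."""
    east += step_length * math.sin(heading_rad)
    north += step_length * math.cos(heading_rad)
    return east, north


def track(start, steps):
    """Accumulate a path from (step_length_m, heading_deg) pairs."""
    east, north = start
    path = [(east, north)]
    for length, heading_deg in steps:
        east, north = pdr_update(east, north, length, math.radians(heading_deg))
        path.append((east, north))
    return path


# Two steps north, then one step east, each 0.7 m long (invented values).
path = track((0.0, 0.0), [(0.7, 0.0), (0.7, 0.0), (0.7, 90.0)])
final_east, final_north = path[-1]
```

    Because every position is computed from the previous one, any error in step length or heading carries over into all later positions, which is the accumulation problem the abstract refers to.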

  14. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    Directory of Open Access Journals (Sweden)

    Kai-Wei Chiang

    2015-12-01

    Full Text Available Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  15. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    OpenAIRE

    J. Höhle

    2014-01-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas whose borders are digitized on top of a ...

  16. Fingerprint Gender Classification using Univariate Decision Tree (J48)

    Directory of Open Access Journals (Sweden)

    S. F. Abdullah

    2016-09-01

    Full Text Available Data mining is the process of analyzing data from different categories; it extracts new knowledge from the data and turns it into useful information. Decision tree learning is a method commonly used in data mining. A decision tree is a decision model that takes the form of a tree-like graph with nodes, branches and leaves. Each internal node denotes a test on an attribute and each branch represents an outcome of the test. The leaf node, which is the last node, holds a class label. A decision tree classifies an instance and helps in making predictions from the data used. This study focuses on the J48 algorithm for classifying gender using fingerprint features. Four types of fingerprint features are used in this study: Ridge Count (RC), Ridge Density (RD), Ridge Thickness to Valley Thickness Ratio (RTVTR) and White Lines Count (WLC). Different cases were run with the J48 algorithm, and a comparison of the knowledge gained from each test is shown. All experiments were run using Weka, and the results achieved a classification rate of 96.28%.
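
    The structure described in this abstract (internal nodes test an attribute, branches are test outcomes, leaves hold a class label) maps directly onto a small data structure. The sketch below is a hand-built univariate tree, not the output of J48 or Weka; the feature names RD and RTVTR come from the abstract, while the thresholds and the class labels are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: Optional[str] = None      # set only on leaf nodes
    feature: Optional[str] = None    # attribute tested at an internal node
    threshold: float = 0.0           # univariate test: value <= threshold
    low: Optional["Node"] = None     # branch taken when the test is true
    high: Optional["Node"] = None    # branch taken when the test is false


def classify(node, sample):
    """Follow one branch per internal node until a leaf is reached."""
    while node.label is None:
        node = node.low if sample[node.feature] <= node.threshold else node.high
    return node.label


# Hypothetical tree: thresholds and labels are placeholders, not study results.
tree = Node(
    feature="RD", threshold=13.0,
    low=Node(label="class_a"),
    high=Node(
        feature="RTVTR", threshold=1.0,
        low=Node(label="class_b"),
        high=Node(label="class_a"),
    ),
)

prediction = classify(tree, {"RD": 15.0, "RTVTR": 0.9})
```

    Each path from the root to a leaf reads as one if-then rule, which is why such trees are easy to inspect.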

  17. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Dewan Md. Farid

    2010-04-01

    Full Text Available In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and it eliminates redundant attributes as well as contradictory examples from the training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining, such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, various issues in current intrusion detection systems (IDS) still need to be examined. We tested the performance of our proposed algorithm against existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources

  18. Information Filtering Algorithm of Text Content-based Sensitive Words Decision Tree

    Institute of Scientific and Technical Information of China (English)

    邓一贵; 伍玉英

    2014-01-01

    With the rapid development of the Internet, information resources of every kind are growing exponentially, bringing many negative effects with them, so a secure and healthy network environment needs to be built. To this end, this paper proposes a Sensitive Word Decision Tree Information Filtering Algorithm (SWDT-IFA) for the text content of Web pages. The algorithm does not depend on a dictionary or on word segmentation. It builds a sensitive word decision tree, searches the tree with the Web page text as a data stream, records the frequency, region information and sensitivity level of each sensitive word, calculates the overall sensitivity of the text and filters out sensitive text. Experimental results show that the SWDT-IFA algorithm has relatively high precision and recall, and that its execution time can meet the real-time requirements of the current network environment.
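
    One way to picture a word tree that is searched character by character, without prior word segmentation, is the sketch below. It is an assumption-laden illustration, not the SWDT-IFA algorithm itself: the tree is a plain character trie, region information is omitted, and the scoring rule (sum of level times frequency) is a hypothetical stand-in for the paper's sensitivity calculation. The word list and levels are invented.

```python
END = ""  # marker key; can never collide with a single text character


def build_word_tree(words):
    """Build a character tree; the END key stores the word's level."""
    root = {}
    for word, level in words.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = level
    return root


def scan(text, root):
    """Match words starting at every position and count each occurrence."""
    counts = {}
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no listed word continues with this character
            if END in node:
                word = text[start:end + 1]
                counts[word] = counts.get(word, 0) + 1
    return counts


def sensitivity(text, words):
    """Hypothetical score: sum of (level x frequency) over matched words."""
    counts = scan(text, build_word_tree(words))
    return sum(words[w] * n for w, n in counts.items())


# Invented word list with invented sensitivity levels.
word_levels = {"spam": 1, "scam": 3}
counts = scan("spam spam scam ham", build_word_tree(word_levels))
score = sensitivity("spam spam scam ham", word_levels)
```

    A filter could then compare the score against a chosen threshold; the threshold itself would have to be tuned and is not specified in the abstract.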

  19. Deep Web query interface identification based on decision tree and link-similar

    Institute of Scientific and Technical Information of China (English)

    李雪玲; 施化吉; 兰均; 李星毅

    2011-01-01

    Existing methods for identifying Deep Web query interfaces produce many false positives and cannot effectively distinguish search engine interfaces. To address this, the paper proposes a Deep Web query interface identification method based on decision trees and link similarity. The method uses the information gain ratio to select important attributes and builds a decision tree to pre-classify interface forms, identifying the interfaces with fairly distinct features. It then applies a link-similarity based method to the interfaces not yet identified, in a second pass that accurately identifies genuine query interfaces and excludes search engine interfaces. The results show that the method can effectively distinguish search engine interfaces and improves classification accuracy and recall.
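
    The information gain ratio used here for attribute selection can be computed as in the following sketch, which uses the standard definition (information gain divided by the split information of the attribute). The attribute values and labels are invented toy data and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Information gain of the split divided by its split information."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [lab for v, lab in zip(attribute_values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0


# Invented toy data: two candidate attributes for four forms.
labels = ["other", "other", "query", "query"]
two_valued = ["yes", "yes", "no", "no"]     # separates the classes cleanly
unique_ids = ["f1", "f2", "f3", "f4"]       # also "separates" them, trivially

ratio_two_valued = gain_ratio(two_valued, labels)
ratio_unique_ids = gain_ratio(unique_ids, labels)
```

    Both attributes have the same information gain, but the gain ratio penalizes the many-valued one, which is the reason for preferring it over plain gain when selecting attributes.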

  20. COMPARING THE PERFORMANCE OF SEMANTIC IMAGE RETRIEVAL USING SPARQL QUERY, DECISION TREE ALGORITHM AND LIRE

    Directory of Open Access Journals (Sweden)

    Magesh

    2013-01-01

    Full Text Available An ontology-based framework is developed for representing the image domain. The textual features of images are extracted and annotated as part of the ontology. The ontology is represented in Web Ontology Language (OWL) format, which is based on the Resource Description Framework (RDF) and Resource Description Framework Schema (RDFS). Internally, the RDF statements form an RDF graph, which represents the image data in a semantic manner. Various tools and languages are used to retrieve semantically relevant textual data from the ontology model. The SPARQL query language is one of the more popular methods for retrieving the textual data stored in the ontology. Text or keyword based search is not adequate for retrieving images. End users are not able to convey the visual features of an image in SPARQL query form. Moreover, the SPARQL query provides more accurate results by traversing the RDF graph. The relevant images cannot be retrieved by one-to-one mapping, so relevancy has to be provided by some kind of onto mapping. Here relevancy is achieved by applying a decision tree algorithm. This study proposes methods to retrieve images from the ontology and compares image retrieval performance using the SPARQL query language, a decision tree algorithm and Lire, an open source image search engine. The SPARQL query language is used to retrieve semantically relevant images using keyword based annotation, and the decision tree algorithm is used to retrieve relevant images using the visual features of an image. Lastly, the image retrieval efficiency is compared and a graph is plotted to indicate the efficiency of the system.

  1. Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

    Directory of Open Access Journals (Sweden)

    O.BENCHAREF

    2011-09-01

    Full Text Available Tifinagh characters cannot be recognized reliably with conventional methods based on invariance, because some characters differ from each other only by size or rotation. New methods are therefore needed to remedy this shortcoming. In this paper we propose a direct method based on the calculation of what are called Geodesic Descriptors, which have shown significant reliability under change of scale, noise and geometric distortions. For classification, we have opted for a method based on the hybridization of decision trees and neural networks.

  2. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    Institute of Scientific and Technical Information of China (English)

    Fang Ye; Zhi-Hua Chen; Jie Chen; Fang Liu; Yong Zhang; Qin-Ying Fan; Lin Wang

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.

  3. FINANCIAL PERFORMANCE INDICATORS OF TUNISIAN COMPANIES: DECISION TREE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ferdaws Ezzi

    2016-01-01

    Full Text Available This article attempts to identify the indicators that are most likely to explain the financial performance of Tunisian companies. In this respect, the emphasis is put on diversification, innovation, and intrapersonal and interpersonal skills. Indeed, they are the appropriate strategies that can designate emotional intelligence, the level of indebtedness, and the firm's age and size as the proper variables that support the target variable. The "decision tree", as a new data analysis method, is used for the analysis. The results involve the construction of a crucial model which is used to achieve a sound financial performance.

  4. Constructing an optimal decision tree for FAST corner point detection

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    In this paper, we consider a problem that originates in computer vision: determining an optimal testing strategy for the corner point detection problem that is part of the FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare the performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.
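
    The formulation in this record (a decision tree of minimum average depth for a decision table with discrete attributes) can be illustrated with a small dynamic program over subtables. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the function name `min_average_depth`, the table encoding, and the toy AND table are all hypothetical, and the exhaustive recursion is only practical for very small tables.

```python
from functools import lru_cache

def min_average_depth(rows, decisions):
    """Minimum average depth of a decision tree for a small decision table.

    rows: tuple of attribute-value tuples (discrete attributes).
    decisions: tuple with one decision label per row.
    Hypothetical helper for illustration only.
    """
    n_attrs = len(rows[0])

    @lru_cache(maxsize=None)
    def total_path_length(subset):
        # A subtable whose rows all share one decision needs no further test.
        if len({decisions[i] for i in subset}) == 1:
            return 0
        best = None
        for a in range(n_attrs):
            parts = {}
            for i in subset:
                parts.setdefault(rows[i][a], []).append(i)
            if len(parts) < 2:
                continue  # attribute is constant on this subtable
            # Testing attribute a adds one level to every row in the subtable.
            cost = len(subset) + sum(
                total_path_length(frozenset(p)) for p in parts.values()
            )
            if best is None or cost < best:
                best = cost
        if best is None:
            raise ValueError("identical rows with different decisions")
        return best

    return total_path_length(frozenset(range(len(rows)))) / len(rows)

# Hypothetical toy table: two binary attributes, decision = logical AND.
rows = ((0, 0), (0, 1), (1, 0), (1, 1))
decisions = (0, 0, 0, 1)
average_depth = min_average_depth(rows, decisions)
```

    Memoizing on the set of row indices is what makes this a dynamic program: each subtable is solved once, however many test sequences lead to it.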

  5. Algorithms for optimal dyadic decision trees

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  6. Optimizing Decision Tree Attack on CAS Scheme

    Directory of Open Access Journals (Sweden)

    PERKOVIC, T.

    2016-05-01

    Full Text Available In this paper we show a successful side-channel timing attack on a well-known high-complexity cognitive authentication (CAS) scheme. We exploit a weakness of the CAS scheme that comes from the asymmetry of the virtual interface and graphical layout, which results in nonuniform human behavior during the login procedure and leads to detectable variations in the user's response times. We optimized a well-known probabilistic decision tree attack on the CAS scheme by introducing this timing information into the attack. We show that the developed classifier could be used to significantly reduce the number of login sessions required to break the CAS scheme.

  7. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  8. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  9. A decision tree for soft tissue grafting.

    Science.gov (United States)

    Leong, Daylene Jack-Min; Wang, Hom-Lay

    2011-06-01

    Periodontal plastic surgery is commonly performed for esthetic and physiologic reasons, such as alleviating root sensitivity, root caries, and cervical abrasion and facilitating plaque control at the affected site. Currently, there is a lack of information regarding the most appropriate treatment method for the various clinical situations encountered. The aims of this paper are to review and discuss the various clinical situations that require soft tissue grafting and to attempt to provide recommendations for the most predictable technique. Using MEDLINE and The Cochrane Library, a review of all available literature was performed. Papers published in peer-reviewed journals written in English were chosen and reviewed to validate the decision-making process when planning for soft tissue grafting. A decision tree was subsequently developed to guide clinicians to choose the most appropriate soft tissue grafting procedure by taking into consideration the following clinical parameters: etiology, purpose of the procedure, adjacent interproximal bone level, and overlying tissue thickness. The decision tree proposed serves as a guide for clinicians to select the most appropriate and predictable soft tissue grafting procedure to minimize unnecessary mistakes while providing the ultimate desired treatment outcome.

  10. Fuzzy Decision Tree Model for Driver Behavior Confronting Yellow Signal at Signalized Intersection

    Institute of Scientific and Technical Information of China (English)

    龙科军; 赵文秀; 肖向良

    2011-01-01

    Drivers' decisions to go or stop during the yellow interval are decisions made under uncertainty. This paper collects driver behavior data during the yellow interval at four similar intersections and applies a fuzzy decision tree (FDT) to model driver behavior at signalized intersections. Vehicle location, velocity and the countdown timer are taken as the three influencing factors, each with its own membership function. The FDT model is constructed using the FID3 algorithm, with fuzzy information entropy as the heuristic, and decision rules are generated. A test sample is used to check the model, and the results indicate that the FDT model predicts drivers' decisions with an overall accuracy of 84.8%.

  11. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.

  12. Modeling of stage-discharge relationship for Gharraf River, southern Iraq using backpropagation artificial neural networks, M5 decision trees, and Takagi-Sugeno inference system technique: a comparative study

    Science.gov (United States)

    Al-Abadi, Alaa M.

    2016-11-01

    The potential of three data-driven techniques, namely a multilayer perceptron with backpropagation artificial neural network (MLP), the M5 decision tree model, and the Takagi-Sugeno (TS) inference system, for mimicking the stage-discharge relationship of the Gharraf River system, southern Iraq, is investigated and discussed in this study. The study used the available stage and discharge data to predict discharge from different combinations of stage, antecedent stages, and antecedent discharge values. The models' results were compared using root mean squared error (RMSE) and coefficient of determination (R²) error statistics. The comparison in the testing stage reveals that the M5 and Takagi-Sugeno techniques have certain advantages over the multilayer perceptron artificial neural network for setting up stage-discharge relationships. Although the performance of the TS inference system was very close to that of the M5 model in terms of R², the M5 method has the lowest RMSE (8.10 m³/s). The study implies that both M5 and TS inference systems are promising tools for identifying the stage-discharge relationship in the study area.
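
    The two error statistics named in this record can be computed as in the short sketch below. It assumes the common definition of the coefficient of determination, 1 - SS_res / SS_tot (some studies report the squared correlation coefficient instead), and the discharge values are hypothetical, not data from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical discharge values in m^3/s, for illustration only.
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 110.0, 150.0, 150.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

    RMSE keeps the units of the predicted quantity, which is why the record can quote it in m³/s, while R² is dimensionless.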

  13. Fuzzy Decision Tree Based Inference Techniques for Network Forensic Analysis

    Institute of Scientific and Technical Information of China (English)

    刘在强; 林东岱; 冯登国

    2007-01-01

    Network forensics is a necessary extension of the existing network security infrastructure and is increasingly a focus of research. Many challenges remain, however: the sheer volume of data generated by networks, the comprehensibility of evidence extracted from collected data, and the effectiveness of evidence analysis methods. To address these problems, the authors exploit the strong learning capability of fuzzy decision tree technology and the comprehensibility of its results to develop a fuzzy decision tree based network forensic analysis system, which assists investigators in analyzing computer crime in network environments and in extracting digital evidence automatically. Experimental results for the method are presented, together with a comparison against existing methods. They show that the system can identify most kinds of network events (91.16% correct classification rate on average), provide comprehensible information to forensic investigators, and support fast, efficient evidence analysis.

  14. Disturbance Risk Measure of Wind Farm Based on Decision Trees under Contingency

    Institute of Scientific and Technical Information of China (English)

    卓毅鑫; 徐铝洋; 张伟; 林湘宁; 李正天

    2015-01-01

    As wind power develops and wind farms grow in scale, the differences in spatial distribution between wind turbines also increase. In addition, wind turbine trip-off and damage accidents have occurred frequently because of severe wind conditions, with adverse impacts on the stable and safe operation of the power grid. It is therefore necessary to study online risk assessment methods for power systems with wind energy. Taking the spatial distribution differences of wind turbines into account, this paper proposes an online disturbance risk measure for wind farms based on decision trees, which performs data mining on online information and makes fast judgements on voltage violation and wind turbine trip-off under a contingency set. Based on the judgements of the decision trees, disturbance risk measure indices are proposed; these are visualized and provide supporting information for wind farm and power system operators. A wind farm case study verifies the effectiveness of the proposed method.

  15. On algorithm for building of optimal α-decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes an algorithm that constructs approximate decision trees (α-decision trees) that are optimal relative to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in [4] to the construction of approximate decision trees. An adjustable approximation rate allows the complexity of the algorithm to be controlled. The algorithm is applied to build optimal α-decision trees for two data sets from the UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.

  16. Automatic design of decision-tree induction algorithms

    CERN Document Server

    Barros, Rodrigo C; Freitas, Alex A

    2015-01-01

    Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue on the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o

  17. Tax Inspection Research Based on C5.0 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    陈仕鸿; 刘晓庆

    2011-01-01

    The principle of the C5.0 decision tree is briefly analyzed and applied to tax inspection. Using a C5.0 decision tree model, the financial statements and tax declarations of 80 commercial enterprises are analyzed, and the results are compared with binary logistic regression. The results show that the model can assist inspection case selection and improve its efficiency and effectiveness.

  18. Using Decision Trees to Characterize Verbal Communication During Change and Stuck Episodes in the Therapeutic Process

    Directory of Open Access Journals (Sweden)

    Víctor Hugo Masías

    2015-04-01

    Full Text Available Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBtree, and REPtree are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1,760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  19. Antibiogram-Derived Radial Decision Trees: An Innovative Approach to Susceptibility Data Display

    Directory of Open Access Journals (Sweden)

    Rocco J. Perla

    2005-01-01

    Full Text Available Hospital antibiograms (ABGMs) are often presented in the form of large 2-factor (single organism vs. single antimicrobial) tables. Presenting susceptibility data in this fashion, although of value, does have limitations relative to drug resistant subpopulations. As the crisis of antimicrobial drug-resistance continues to escalate globally, clinicians need (1) to have access to susceptibility data that, for isolates resistant to first-line drugs, indicates susceptibility to second line drugs and (2) to understand the probabilities of encountering such organisms in a particular institution. This article describes a strategy used to transform data in a hospital ABGM into a probability-based radial decision tree (RDT) that can be used as a guide to empiric antimicrobial therapy. Presenting ABGM data in the form of a radial decision tree versus a table makes it easier to visually organize complex data and to demonstrate different levels of therapeutic decision-making. The RDT model discussed here may also serve as a more effective tool to understand the prevalence of different resistant subpopulations in a given institution compared to the traditional ABGM.

  20. The Decision Tree-based Path for Selecting Virtual Consulting Team Members

    Institute of Scientific and Technical Information of China (English)

    尚珊; 胡贵玲; 崔洁

    2012-01-01

    This paper describes the important role of the virtual consulting team in the development of the virtual consulting enterprise. Through a comparative analysis of virtual consulting enterprises against physical consulting enterprises and virtual enterprises, it discusses the problems that virtual consulting enterprises currently face and argues that virtual team cooperation is an important way to solve them. The paper gives the selection process for virtual consulting team cooperation and, for the first time, proposes a specific practice of using a decision tree to select team members.

  1. Decision Tree Classification Based on Naive Bayesian and ID3 Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄宇达; 王迤冉

    2012-01-01

    This paper proposes an improved decision tree classification algorithm based on the naive Bayes algorithm and the ID3 algorithm. It introduces an objective attribute importance parameter, gives a conditional independence assumption weaker than that of naive Bayes, and uses weighted independent information entropy as the criterion for selecting the splitting attribute. Theoretical analysis and experimental results show that the improved algorithm overcomes, to some extent, ID3's bias toward multi-valued attributes, and achieves high execution efficiency and classification accuracy.
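
    The multi-value bias of ID3 mentioned in this record can be seen in a few lines. The sketch below computes plain ID3 information gain only; it does not implement the weighted independent information entropy proposed in the paper, whose definition is not given in the abstract. The toy table and its attribute names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """ID3 information gain of splitting `rows` (a list of dicts) on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical toy table: "id" is unique per row and carries no real
# predictive meaning, while "windy" is a genuine binary attribute.
rows = [
    {"id": 1, "windy": "no", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "no", "play": "no"},
    {"id": 4, "windy": "yes", "play": "no"},
]
gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
```

    Because every value of `id` isolates a single row, each branch is pure and the gain equals the full class entropy, so plain ID3 would prefer it over `windy`. Corrections such as C4.5's gain ratio, or the weighting this record describes, aim to counter that preference.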

  2. Minimizing size of decision trees for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-09-29

    We use decision trees as a model to discover knowledge from multi-label decision tables, in which each row has a set of decisions attached and the goal is to find one arbitrary decision from that set. The size of the decision tree can range from small to very large. We study different greedy algorithms, as well as dynamic programming algorithms, for minimizing the size of decision trees. Compared with the optimal results from the dynamic programming algorithm, some greedy algorithms produce results close to optimal for the number of nodes (at most 18.92% difference), the number of nonterminal nodes (at most 20.76% difference), and the number of terminal nodes (at most 18.71% difference).

  3. A tool for study of optimal decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes a tool that, for relatively small decision tables, performs consecutive optimization of decision trees relative to various complexity measures, such as number of nodes, average depth, and depth, and finds the parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.

  4. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  5. Greedy algorithm with weights for decision tree construction

    KAUST Repository

    Moshkov, Mikhail

    2010-12-01

    An approximate algorithm for minimizing the weighted depth of decision trees is considered. A bound on the accuracy of this algorithm is obtained which cannot be improved in the general case. Under some natural assumptions on the class NP, the algorithm is close (in terms of accuracy) to the best polynomial approximate algorithms for minimizing the weighted depth of decision trees.

  6. Estimating Suspended Sediment by Artificial Neural Network (ANN), Decision Trees (DT) and Sediment Rating Curve (SRC) Models (Case study: Lorestan Province, Iran)

    Directory of Open Access Journals (Sweden)

    Fatemeh Barzegari

    2015-12-01

    Full Text Available The aim of this study was to estimate suspended sediment with the ANN model, DT with the CART algorithm, and different types of SRC at ten stations in Lorestan Province, Iran. The results showed that the ANN with the Levenberg-Marquardt back propagation algorithm is more accurate than the other two models, especially at high discharges. Comparison of different time intervals showed that running the models with monthly data resulted in smaller errors and better estimates. Moreover, using the Minimum Variance Unbiased Estimator (MVUE) bias correction factor improved the SRC results, especially at monthly time steps, at almost all stations. Hence, if SRC models are preferred because of advantages such as simplicity, it is better to use the MSRC (modified sediment rating curve) at a monthly period.
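
    A sediment rating curve is commonly written as a power law, Qs = a * Q^b, fitted by linear regression on logarithms. The sketch below shows that basic fit only, as an assumption about the general form of SRC models; it does not implement the MVUE bias correction used in the study, and the data are synthetic.

```python
import math

def fit_rating_curve(discharge, sediment):
    """Least-squares fit of log(Qs) = log(a) + b * log(Q); returns (a, b)."""
    xs = [math.log(q) for q in discharge]
    ys = [math.log(s) for s in sediment]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic, noise-free data generated from Qs = 0.5 * Q**1.5 (not study data).
discharge = [1.0, 4.0, 9.0, 16.0]
sediment = [0.5 * q ** 1.5 for q in discharge]
a, b = fit_rating_curve(discharge, sediment)
```

    With real, noisy data, transforming the fitted line back from log space tends to underestimate sediment load, which is the bias that correction factors such as the one named in the record are meant to address.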

  7. An Efficient Method of Vibration Diagnostics For Rotating Machinery Using a Decision Tree

    Directory of Open Access Journals (Sweden)

    Bo Suk Yang

    2000-01-01

    Full Text Available This paper describes an efficient method for automating vibration diagnosis of rotating machinery using a decision tree, which is applicable to vibration diagnosis expert systems. The decision tree is a widely known formalism for expressing classification knowledge and has been used successfully in many diverse areas such as character recognition, medical diagnosis, and expert systems. In order to build a decision tree for vibration diagnosis, we have to define classes and attributes. A set of cases based on past experience is also needed. This training set is induced using a result-cause matrix newly developed in the present work instead of the conventionally implemented cause-result matrix. The method was applied to diagnostics for various cases taken from published work. It is found that the present method predicts the causes of abnormal vibration for the test cases with high reliability.

  8. Computational study of developing high-quality decision trees

    Science.gov (United States)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in data mining problems to find valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns when dealing with large and complex data sets in practice. In this paper, we propose an innovative machine learning approach (which we call GAIT), combining genetic algorithm, statistical sampling, and decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.

  9. MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees

    Directory of Open Access Journals (Sweden)

    Vasile PURDILĂ

    2014-03-01

    Full Text Available Learning decision trees from very large amounts of data is not practical on single node computers because of the huge amount of calculation the process requires. Apache Hadoop is a large scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks against very large datasets. This work presents a parallel decision tree learning algorithm expressed in the MapReduce programming model that runs on the Apache Hadoop platform and scales very well with dataset size.
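
    One way decision tree learning fits the MapReduce model is by distributing the counting of (attribute, value, class) combinations from which split criteria are computed. The sketch below illustrates that general idea in plain Python; it is not the MR-Tree algorithm from this record and does not use Hadoop, and the function names and sample partitions are hypothetical.

```python
from collections import defaultdict

def map_record(record, target):
    """Map step: emit ((attribute, value, class), 1) for each non-target attribute."""
    label = record[target]
    for attr, value in record.items():
        if attr != target:
            yield (attr, value, label), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def count_statistics(partitions, target):
    """Run the map step over every partition, then reduce all emitted pairs."""
    emitted = (
        pair
        for part in partitions
        for record in part
        for pair in map_record(record, target)
    )
    return reduce_counts(emitted)

# Two hypothetical partitions standing in for data blocks on different nodes.
partitions = [
    [{"outlook": "sunny", "windy": "no", "play": "yes"},
     {"outlook": "rain", "windy": "yes", "play": "no"}],
    [{"outlook": "sunny", "windy": "yes", "play": "yes"}],
]
stats = count_statistics(partitions, "play")
```

    The counts in `stats` are sufficient to evaluate entropy-based split criteria for every attribute without a second pass over the raw records, which is what makes the counting step worth distributing.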

  10. The Research of Reliability of Trash E-mail Identifier Based on Decision Tree of Continuous Attributes

    Institute of Scientific and Technical Information of China (English)

    王星; 谢邦昌

    2005-01-01

    Avoiding spam mail is one of the most critical problems in Internet technology. Finding the most important attribute, or combination of attributes, that separates normal email from spam is the bottleneck in spam discrimination. In recent years, decision trees have become popular because they are expressive and can output rules, and they have become a core technique for predicting spam. However, many well-known decision tree algorithms such as C4.5 and CART are not very robust, so their output is unstable, which disturbs the construction of the classifier. In this paper, we study the robustness of the CART algorithm and point out the robustness problem that arises when a decision tree classifier is used to separate spam from normal email with interval attributes. We then use the bagging algorithm to obtain a more robust model and, at the same time, improve on the performance of the initial models.
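
    As a rough illustration of why bagging stabilizes a tree classifier, the sketch below trains one-split trees (stumps) on bootstrap resamples and takes a majority vote. It is a simplification under stated assumptions, not the paper's CART setup: the single continuous feature (imagined here as the fraction of capitalised characters), the data, and all function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical (feature value, label) pairs for one continuous attribute.
DATA = [(0.02, "ham"), (0.05, "ham"), (0.08, "ham"), (0.10, "ham"),
        (0.40, "spam"), (0.55, "spam"), (0.60, "spam"), (0.75, "spam")]

def fit_stump(samples):
    """Fit a one-split tree: choose the threshold with the fewest training errors."""
    best = None
    for threshold in sorted({v for v, _ in samples}):
        left = Counter(lbl for v, lbl in samples if v <= threshold)
        right = Counter(lbl for v, lbl in samples if v > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = sum(1 for v, lbl in samples
                     if (left_label if v <= threshold else right_label) != lbl)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda v: left_label if v <= threshold else right_label

def bagging(samples, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(samples) for _ in samples])
              for _ in range(n_models)]

    def predict(v):
        votes = Counter(model(v) for model in models)
        return votes.most_common(1)[0][0]

    return predict

predict = bagging(DATA)
print(predict(0.01), predict(0.90))
```

    Each resample moves the learned threshold slightly; the vote averages out that variation, which is the instability the abstract attributes to single trees.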

  11. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objec...

  12. Invasion Rule Generation Based on Fuzzy Decision Tree

    Institute of Scientific and Technical Information of China (English)

    郭洪荣

    2013-01-01

      The MC Agent class of the computer immune system model GECISM can make effective use of the fuzzy decision tree algorithm Fuzzy-Id3: treating the system calls of application programs as a data set, it constructs a decision tree and thereby generates intrusion detection rules for the computer immune system. Analysis of the comparative test results shows that the rules generated by Fuzzy-Id3 classify unknown data with a low false alarm rate and a low missed-detection rate.

  13. MALDI-TOF MS Combined With Magnetic Beads for Detecting Serum Protein Biomarkers and Establishment of Boosting Decision Tree Model for Diagnosis of Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Chibo Liu, Chunqin Pan, Jianmin Shen, Haibao Wang, Liang Yong

    2011-01-01

    Full Text Available The aim of the present study is to examine the serum protein fingerprint of patients with colorectal cancer (CRC) and to screen for protein molecules that are closely related to colorectal cancer during the onset and progression of the disease, using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Serum samples from 144 patients with CRC and 120 healthy volunteers were used. Weak cation exchange (WCX) magnetic beads and a PBSII-C protein chip reader (Ciphergen Biosystems Inc.) were used. The protein fingerprint expression of all the serum samples and the resulting profiles of the cancer and normal groups were analyzed with the Biomarker Wizard system. Several proteomic peaks were detected, and four potential biomarkers with different expression profiles were identified, with relative molecular weights of 2870.7 Da, 3084 Da, 9180.5 Da, and 13748.8 Da, respectively. Among the four proteins, the two with m/z 2870.7 and 3084 were down-regulated, and the other two with m/z 9180.5 and 13748.8 were up-regulated in serum samples from CRC patients. The diagnostic model could distinguish CRC from healthy controls with a sensitivity of 92.85% and a specificity of 91.25%. Blind test data indicated a sensitivity of 86.95% and a specificity of 85%. The results suggest that MALDI technology could be used to screen critical proteins with differential expression in the serum of CRC patients. These differentially regulated serum proteins are considered potential biomarkers for CRC and are of potential value for further investigation.
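
    For readers unfamiliar with the reported measures, sensitivity and specificity follow directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen for easy arithmetic; the abstract does not give the underlying counts, so these numbers do not reproduce its percentages.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that are detected
    specificity = tn / (tn + fp)   # share of healthy controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, not taken from the study.
sens, spec, acc = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```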

  14. Prediction Of Study Track Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Deepali Joshi

    2014-05-01

    Full Text Available One of the most important factors for success in academic life is assigning students to the right track when they reach the end of the basic education stage. The education system is graded from 1st to 10th standard; after finishing the 10th grade, students are distributed into different academic tracks or fields, such as Science, Commerce and Arts, depending on the marks they have scored. To succeed in academic life, a student should select the correct academic field, yet many students fail to select the appropriate one: at one moment they prefer a certain type of career and at the next they consider another option. To improve the quality of education, data mining techniques can be used instead of the traditional process. The proposed system predicts the stream using the decision tree method, offers better accuracy than the traditional system, and its accuracy improves with each additional input.

  15. Probabilistic lung nodule classification with belief decision trees.

    Science.gov (United States)

    Zinovev, Dmitriy; Feigenbaum, Jonathan; Furst, Jacob; Raicu, Daniela

    2011-01-01

    In reading Computed Tomography (CT) scans with potentially malignant lung nodules, radiologists make use of high level information (semantic characteristics) in their analysis. Computer-Aided Diagnostic Characterization (CADc) systems can assist radiologists by offering a "second opinion"--predicting these semantic characteristics for lung nodules. In this work, we propose a way of predicting the distribution of radiologists' opinions using a multiple-label classification algorithm based on belief decision trees using the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset, which includes semantic annotations by up to four human radiologists for each one of the 914 nodules. Furthermore, we evaluate our multiple-label results using a novel distance-threshold curve technique--and, measuring the area under this curve, obtain 69% performance on the validation subset. We conclude that multiple-label classification algorithms are an appropriate method of representing the diagnoses of multiple radiologists on lung CT scans when ground truth is unavailable.

  16. Analysis of College Students Consumption Data Based on Decision Tree Data Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄剑

    2015-01-01

    This paper uses a decision tree data mining algorithm as its basic tool. Based on campus card consumption data of college students in recent years, it explores the use of data mining to analyze changes in students' on-campus consumption behavior, their consumption characteristics, and the relationship of both with prices. Mining the consumption data yields information on students' consumption behavior, habits and volume in recent years, and reveals the underlying associations and trends. The results can better and more effectively guide the school's catering price adjustments and the introduction of new dishes, and help provide better catering service within a price range that students can afford.

  17. Relationships among various parameters for decision tree optimization

    KAUST Repository

    Hussain, Shahid

    2014-01-14

    In this chapter, we study in detail the relationships between various pairs of cost functions, and between an uncertainty measure and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as experimental results on decision tables acquired from the UCI ML Repository. The algorithms presented in this chapter have already been implemented and are now part of Dagger, a software system for the construction and optimization of decision trees and decision rules. The main results presented in this chapter deal with two types of algorithms for computing relationships. First, we discuss the case where we construct approximate decision trees and are interested in relationships between a cost function, such as the depth or number of nodes of a decision tree, and an uncertainty measure, such as the misclassification error (accuracy) of a decision tree. Second, we discuss relationships between two different cost functions, for example, the number of misclassifications of a decision tree versus its number of nodes. The results of experiments presented in the chapter provide further insight. © 2014 Springer International Publishing Switzerland.

  18. Classification of Liss IV Imagery Using Decision Tree Methods

    Science.gov (United States)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is a main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; they describe the greenness, density and health of vegetation. Texture is also an important characteristic that is used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties combined with different vegetation indices for single-date LISS IV sensor data with a high spatial resolution of 5.8 meters. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. The two methods were compared. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
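
    Three of the listed indices have simple standard band-ratio definitions, sketched below: NDVI = (NIR - Red) / (NIR + Red), GNDVI = (NIR - Green) / (NIR + Green) and DVI = NIR - Red. The one-node rule and its 0.3 threshold are hypothetical placeholders for a learned decision tree, and the reflectance values are invented for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def gndvi(nir, green):
    """Green NDVI: same form as NDVI, with the green band replacing red."""
    total = nir + green
    return (nir - green) / total if total else 0.0

def dvi(nir, red):
    """Difference Vegetation Index."""
    return nir - red

def classify_pixel(green, red, nir, ndvi_threshold=0.3):
    """Hypothetical one-node rule; the threshold is illustrative, not from the paper."""
    return "crop" if ndvi(nir, red) > ndvi_threshold else "non-crop"

# Invented reflectances for a vegetated pixel and a bare-soil pixel.
print(classify_pixel(green=0.10, red=0.10, nir=0.50))
print(classify_pixel(green=0.20, red=0.30, nir=0.30))
```

    In a trained tree, such index values (and, in the second approach, texture statistics) would serve as the attributes tested at the internal nodes.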

  19. CLASSIFICATION OF LISS IV IMAGERY USING DECISION TREE METHODS

    Directory of Open Access Journals (Sweden)

    A. K. Verma

    2016-06-01

    Full Text Available Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is a main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; they describe the greenness, density and health of vegetation. Texture is also an important characteristic that is used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties combined with different vegetation indices for single-date LISS IV sensor data with a high spatial resolution of 5.8 meters. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. The two methods were compared. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.

  20. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size.
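
    The band ranking by information gain mentioned above can be sketched as follows. This computes plain information gain for one threshold on one band; it is not the full J48 procedure (C4.5 also applies gain ratio and pruning), and the reflectance values and labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one band at `threshold`."""
    left = [lbl for v, lbl in zip(values, labels) if v <= threshold]
    right = [lbl for v, lbl in zip(values, labels) if v > threshold]
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in (left, right) if part)
    return entropy(labels) - weighted

# Hypothetical near-infrared reflectances: water is dark, land is bright.
nir = [0.02, 0.03, 0.30, 0.35]
cls = ["water", "water", "land", "land"]
print(information_gain(nir, cls, threshold=0.10))
print(information_gain(nir, cls, threshold=0.02))
```

    A threshold that separates the classes cleanly recovers the full one bit of label entropy, while a poorly placed threshold recovers less; ranking bands by their best achievable gain gives an ordering of the kind the abstract reports.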

  1. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    Directory of Open Access Journals (Sweden)

    Christensen Helen

    2009-11-01

    Full Text Available Abstract Background Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.

  2. Automatic design of decision-tree algorithms with evolutionary algorithms.

    Science.gov (United States)

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  3. Application of Decision Tree Algorithm in Stamping Process

    Institute of Scientific and Technical Information of China (English)

    WANG Ying-chun; LI Da-yong; YIN Ji-long; PENG Ying-hong

    2005-01-01

    Various process parameters exert different effects in the stamping process. To study the relationships among the process parameters of the box stamping process, including the blank holder force, friction coefficient, depth of drawbead, and offset and length of drawbead, the decision tree algorithm C4.5 was used to generate a decision tree from the result data of a box stamping simulation. Methods for designing and improving the decision tree are presented. Potentially valuable rules were generated by traversing the decision tree, and these rules can guide practical design. The rules show that the correct combination of blank holder force and drawbead setting is the dominant factor in controlling cracking and wrinkling in the box stamping process. To validate the rules, the box stamping process was also carried out experimentally. The experimental results show good agreement with the generated rules.

  4. Decision-Tree Formulation With Order-1 Lateral Execution

    Science.gov (United States)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that executing them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
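
    The general idea of trading a tree walk for a single lookup can be sketched in a few lines. This is only a naive illustration, assuming a small balanced Boolean tree whose outcomes are precomputed for every assignment of the decision variables; it is not the symbolic formulation described above, and the tree, leaf names and function names are hypothetical.

```python
from itertools import product

# Hypothetical tree: (variable_index, branch_if_false, branch_if_true); leaves are strings.
TREE = (0,
        (1, "idle", "cool"),
        (1, "heat", "alarm"))

def walk(tree, bits):
    """Ordinary top-down execution: one test per level of the tree."""
    while not isinstance(tree, str):
        var, if_false, if_true = tree
        tree = if_true if bits[var] else if_false
    return tree

def compile_table(tree, n_vars):
    """Precompute the conclusion for every assignment of the Boolean variables."""
    return {bits: walk(tree, bits)
            for bits in product((False, True), repeat=n_vars)}

TABLE = compile_table(TREE, 2)
# One dictionary lookup replaces the tree walk.
print(TABLE[(True, False)])
```

    The naive table has 2^n entries for n decision variables, so this form is only practical for small n; the symbolic optimization step listed above presumably exists to keep the compiled object compact.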

  5. Research and Application of Bank Customer Relationship Management based on the Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    李明辉

    2012-01-01

      The decision tree algorithm in data mining is of great value to the banking industry. Applied in banking, decision tree techniques can analyze specific customer background information to predict the category a customer belongs to, so that an appropriate business strategy can be adopted. This improves the level of banking services, develops customer resources and avoids customer loss, while conserving resources and obtaining larger returns from minimal investment. In bank lending, judging whether a borrower is risky, deciding whether a loan proposal is feasible, and classifying customers according to the bank's actual needs are all problems that can be solved with the decision tree algorithm.

  6. Bounds on Average Time Complexity of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory [1], coding theory [77], and the design and analysis of algorithms (e.g., sorting) [38]. For any diagnostic problem, the minimum average depth of a decision tree is bounded from below by the entropy of the probability distribution (with a multiplier 1/log_2 k for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building an optimal prefix code [1] and a blood test study under the assumption that exactly one patient is ill [23]). For such problems, the minimum average depth of a decision tree exceeds the lower bound by at most one. The minimum average depth reaches its maximum on the problems in which each attribute is "indispensable" [44] (e.g., a diagnostic problem with n attributes and k^n pairwise different rows in the decision table, and the problem of implementing the modulo 2 summation function). These problems have a minimum average depth of decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
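
    The two bounds stated above can be written out as follows. The symbols are assumptions introduced here for illustration: P = (p_1, ..., p_m) is the probability distribution over outcomes, H(P) its entropy, and h_avg the minimum average depth of a decision tree for the problem.

```latex
% Entropy of the outcome distribution (assumed notation).
H(P) = -\sum_{i=1}^{m} p_i \log_2 p_i

% Lower bound for any diagnostic problem over a k-valued information system.
\frac{H(P)}{\log_2 k} \;\le\; h_{\mathrm{avg}}

% For problems with a complete set of attributes, the bound is tight up to one.
\frac{H(P)}{\log_2 k} \;\le\; h_{\mathrm{avg}} \;\le\; \frac{H(P)}{\log_2 k} + 1

% Worked instance: k = 2 and four equally likely outcomes.
H(P) = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2,
\qquad h_{\mathrm{avg}} \ge \frac{2}{\log_2 2} = 2
```

    In the worked instance, two binary tests suffice to separate four outcomes, so a tree of average depth 2 meets the lower bound exactly.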

  7. Application of Decision Tree based on ENVI to the Classification of Land Utilization

    Institute of Scientific and Technical Information of China (English)

    秦臻; 汪云甲; 王行风; 阚俊峰; 李晓霞

    2011-01-01

    Taking the Shenfu-Dongsheng mining area in the north of Shenmu County, Yulin City, Shaanxi Province as the study area, and with the support of ENVI software, a Landsat ETM image was used to analyze the spectral characteristics of the image and its NDVI, NDBI and NDWI values. The image was transformed with the tasseled cap transformation to determine comprehensive thresholds for the different land classes and to build a decision tree model. The classification results were then obtained, and the role of decision tree classification in remote sensing data classification was evaluated.

  8. Application of alternating decision trees in selecting sparse linear solvers

    KAUST Repository

    Bhowmick, Sanjukta

    2010-01-01

    The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in course of the same PDE-based simulation, thereby making solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stages of simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can resolve the problem of over-fitting, which occurs when limited amount of data is available. © 2010 Springer Science+Business Media LLC.

  9. Novel decision tree algorithms for the treatment planning of compromised teeth.

    Science.gov (United States)

    Ovaydi-Mandel, Amy; Petrov, Sofia D; Drew, Howard J

    2013-01-01

    In clinical practice, dentists are faced with the dilemma of whether to treat, maintain, or extract a tooth. Of primary importance are the patient's desires and the restorability and periodontal condition of the tooth/teeth in question. Too often, clinicians extract teeth when endodontic therapy, crown-lengthening surgery, forced orthodontic eruption, or regenerative therapy can be used with predictable results. In addition, many clinicians do not consider the use of questionable teeth as provisional or transitional abutments. The aim of this article is to present a novel decision tree approach that will address the clinical deductive reasoning, based on the scientific literature and exemplified by selective case presentations, that may help clinicians make the right decision. Innovative decision tree algorithms will be proposed that consider endodontic, restorative, and periodontal assessments to improve and possibly eliminate erroneous decision making. Decision-based algorithms are dynamic and must be continually updated in accordance with new evidence-based studies.

  10. Research on Internet of Things Security Based on Support Vector Machines with Balanced Binary Decision Tree

    Institute of Scientific and Technical Information of China (English)

    张晓惠; 林柏钢

    2015-01-01

    The Internet of Things (IoT) is another information industry revolution after the computer, the Internet and mobile communications. At present, the IoT has been officially listed as one of the national strategic emerging industries, and its applications cover almost all sectors. Security problems in the IoT, such as network intrusion, are increasingly prominent. In the context of big data, this paper proposes an intrusion detection model suitable for the IoT environment. The model divides intrusion detection into three parts: data preprocessing, feature extraction and data classification. Data preprocessing mainly handles data normalization and redundant data. The main goal of feature extraction is to reduce the dimension and thus the time needed for data classification. For data classification, a multi-class support vector machine (SVM) algorithm with a balanced binary decision tree, named BDT-SVM, is introduced to train on and detect network intrusion data. Experiments show that the BDT-SVM multi-class algorithm improves the accuracy of the intrusion detection system, and that feature extraction reduces detection time while preserving accuracy.

  11. Predicting metabolic syndrome using decision tree and support vector machine methods

    Science.gov (United States)

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-01-01

    BACKGROUND Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict metabolic syndrome. METHODS This study aims to employ decision tree and support vector machine (SVM) methods to predict the 7-year incidence of metabolic syndrome. This is an applied study using data from 2107 participants of the Isfahan Cohort Study. Subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the decision tree and SVM methods were selected to predict it. Sensitivity, specificity and accuracy were used for validation. RESULTS The SVM and decision tree methods were examined according to sensitivity, specificity and accuracy, which were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. CONCLUSION The results show that the SVM method achieves higher sensitivity, specificity and accuracy than the decision tree. The results of decision tree method show that the TG is the most important feature in

  12. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

    Full Text Available Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category. Identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vivid but usually neglected aspect of software quality assurance in any project. If applied at all stages of software development, it can reduce the time, overheads and resources needed to engineer a high quality product. To reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when a test case shows that the software is not executing properly. The proposed system classifies various defects using a decision tree based defect classification technique, which is used to group the defects after identification. The classification can be done by employing algorithms such as ID3 or C4.5. After the classification, the defect patterns will be measured by employing a pattern mining technique. Finally, quality will be assured by using various quality metrics such as defect density. The proposed system will be implemented in JAVA.

  13. Data mining on commuting distance mode of urban residents based on the analysis of decision tree

    Institute of Scientific and Technical Information of China (English)

    王茂军; 宋国庆; 许洁

    2009-01-01

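    Based on questionnaire survey data, this paper introduces decision tree analysis to examine the commuting distance patterns of urban residents in Beijing. The study finds: First, under the specified pruning purity, the commuting distance of Beijing urban residents is closely related to travel mode, change of residence, occupation, employment rate in the place of residence, schooling status of the youngest child, housing area, monthly household income, and motor vehicle use. Second, among the variables affecting commuting distance, travel mode is the most important, followed by housing area and schooling of the youngest child, then change of residence and occupation; monthly household income is at the fourth level, and motor vehicle use and local employment rate are at the fifth level. Third, owing to the complexity of housing property rights, the diversity of reasons for relocation, passive suburbanization, and factors such as childbirth and childcare benefits and the division of household tasks, the relationships of housing area, relocation history, family life cycle and occupation with commuting distance contradict existing domestic conclusions; some variables have a decisive influence on short-distance commuting, and others on long-distance commuting.%With the development of suburbanization, urban residents now have more choices in jobs and housing locations. Nowadays, scholars increasingly pay attention to the studies on citizens' commuting mode. The analysis of commuting space characteristics belongs to the study of geography. Based on questionnaire survey, this paper first makes a descriptive analysis of people's commuting variables, distances, and directions. Then it discusses the commuters of Beijing by decision tree analysis and data mining. Conclusions are obtained as follows: First, under the fixed pruning severity, people's commuting distance is related to their traveling vehicles, resident locations, jobs, youngest child's education conditions, living space, family incomes, usage of cars, and employment rate on local areas. Factors such as gender, educational level, marital status, housing property are not involved in the mode. Second, our study of the relations between the eight variables and commuting distance is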

  14. USING PRECEDENTS FOR REDUCTION OF DECISION TREE BY GRAPH SEARCH

    Directory of Open Access Journals (Sweden)

    I. A. Bessmertny

    2015-01-01

    Full Text Available The paper considers the problem of organizing mutual payments between business entities by means of clearing, which is solved by searching for paths in a graph. To reduce the decision tree complexity, a method of precedents is proposed that consists in saving intermediate solutions while traversing the decision tree. An algorithm and an example are presented, demonstrating solution complexity close to linear. Tests carried out in a civil aviation settlement system demonstrate an approximately 30 percent reduction in real money transfers. The proposed algorithm is also planned to be implemented in other clearing organizations of the Russian Federation.

  15. Decision tree approach for classification of remotely sensed satellite data using open source support

    Indian Academy of Sciences (India)

    Richa Sharma; Aniruddha Ghosh; P K Joshi

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, an open source data mining software package. The classified image is compared with the images classified using the classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. The classification result based on the DTC method provided better visual depiction than the results produced by ISODATA clustering or by the MLC algorithm. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood and 57.5% (kappa = 0.49) using the ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be the preferred classification approach.
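    Overall accuracy and the kappa statistic quoted in this record are both derived from a confusion matrix. The sketch below assumes the standard Cohen's kappa definition; the 2x2 matrix is a hypothetical toy example, not data from the study.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Agreement between reference and predicted classes, corrected for chance."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected if predictions were independent of the reference labels.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix: rows = reference class, columns = predicted class.
toy_matrix = [
    [45, 5],
    [10, 40],
]

print(overall_accuracy(toy_matrix))  # observed agreement
print(cohen_kappa(toy_matrix))       # chance-corrected agreement
```

    Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is why the record reports both figures for each classifier.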

  16. Research on reservoir operation rules of inter-basin water transfer based on decision tree method

    Institute of Scientific and Technical Information of China (English)

    习树峰; 彭勇; 梁国华; 王本德; 谢志高; 李学森

    2012-01-01

    The inter-basin water transfer operation belongs to the conventional water transfer planning operation mode, and real-time information is not considered in the operation. To solve this problem, the decision tree method from data mining is used to combine current reservoir forecast information, the underlying surface water storage condition, the perennial reservoir running situation and other data with the reservoir managers' actual operation experience, so that real-time operation rules for inter-basin water transfer can be derived. The research process has three steps. First, initial reservoir water level, actual rainfall, GFS forecast rainfall, soil moisture, diversion water quantity, etc. are selected to compose the reservoir operation data set. Second, the inter-basin water transfer operation decision tree is extracted using data mining technology. Finally, the operation decision tree is tested and the real-time operation rules for inter-basin water transfer are obtained. The results of an actual example show that using decision tree based inter-basin water transfer operation rules in reservoir operation can increase water resource efficiency and the comprehensive benefits of the reservoir. This result has some reference value for further study and application of real-time inter-basin water transfer operation.

  17. Data mining with decision trees for diagnosis of breast tumor in medical ultrasonic images.

    Science.gov (United States)

    Kuo, W J; Chang, R F; Chen, D R; Lee, C C

    2001-03-01

    To increase the ability of ultrasonographic (US) technology for the differential diagnosis of solid breast tumors, we describe a novel computer-aided diagnosis (CADx) system using data mining with a decision tree for classification of breast tumors, to increase the levels of diagnostic confidence and to provide an immediate second opinion for physicians. Using the texture information extracted from the region of interest (ROI) image, a decision tree model generated from the training data in a top-down, general-to-specific direction with 24 co-variance texture features is used to classify the tumors as benign or malignant. In the experiments, accuracy rates for an experienced physician and the proposed CADx are 86.67% (78/90) and 95.50% (86/90), respectively.

  18. Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees.

    Science.gov (United States)

    Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica

    2012-05-30

    The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablet types (hydrophilic/lipid) using formulation composition, compression force used for tableting, and tablet porosity and tensile strength as input data. Potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively, whereas the selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by the direct compression method and tested for in vitro dissolution profiles. Optimization of the static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or a genetic algorithms optimizer. Decision trees were constructed following discretization of data. Calculated difference (f(1)) and similarity (f(2)) factors for predicted and experimentally obtained dissolution profiles of test matrix tablet formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablet dissolution profiles. Elman neural networks were compared to the most frequently used static network, the multi-layered perceptron, and the superiority of Elman networks has been demonstrated. The developed methods allow a simple, yet very precise way of predicting drug release for both hydrophilic and lipid matrix tablets having controlled drug release.
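    The difference (f1) and similarity (f2) factors mentioned in this record are commonly defined as in the sketch below. It is assumed here, not confirmed by the record, that the study uses these standard definitions; the dissolution profiles are hypothetical percentages at matching time points.

```python
import math


def difference_factor(reference, test):
    """f1: summed absolute difference as a percentage of the reference profile."""
    abs_diff = sum(abs(r - t) for r, t in zip(reference, test))
    return 100.0 * abs_diff / sum(reference)


def similarity_factor(reference, test):
    """f2: logarithmic transform of the mean squared difference between profiles."""
    n = len(reference)
    mean_sq = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq))


# Hypothetical percent-dissolved values at four sampling times.
measured = [20.0, 40.0, 60.0, 80.0]
predicted = [22.0, 38.0, 63.0, 79.0]

print(difference_factor(measured, predicted))
print(similarity_factor(measured, predicted))
```

    Identical profiles give f1 = 0 and f2 = 100; by common convention, f2 values of 50 or more are read as similar profiles.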

  19. Imitation learning of car driving skills with decision trees and random forests

    Directory of Open Access Journals (Sweden)

    Cichosz Paweł

    2014-09-01

    Full Text Available Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human-readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can be hopefully applied to solve real-world control tasks, as well as to develop video game bots

  20. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

    The decision tree is a widely used technique for discovering patterns in a consistent data set. But if the data set is inconsistent, that is, if there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge in the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we compared them based on the optimization and classification results. It was found that the greedy algorithms Mult_ws_entSort and Mult_ws_entML are good for both optimization and classification.
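    The handling of inconsistency described in this record can be illustrated on a tiny table. The sketch below shows one plausible reading of two of the three approaches: rows with equal conditional attributes are merged, and the group is labeled either with the set of all its decisions (many-valued) or with its most frequent decision (most common). The table, the function names and the tie-breaking rule are hypothetical.

```python
from collections import Counter, defaultdict


def group_decisions(rows):
    """Collect the decisions of all rows that share the same conditional attributes."""
    groups = defaultdict(list)
    for attributes, decision in rows:
        groups[attributes].append(decision)
    return groups


def many_valued_table(rows):
    """Label each distinct attribute tuple with the set of decisions seen for it."""
    return {attrs: set(ds) for attrs, ds in group_decisions(rows).items()}


def most_common_table(rows):
    """Label each distinct attribute tuple with its most frequent decision.

    Ties are broken by taking the smallest decision value (an assumption
    made here only so that the result is deterministic).
    """
    result = {}
    for attrs, ds in group_decisions(rows).items():
        counts = Counter(ds)
        best = max(counts.values())
        result[attrs] = min(d for d, c in counts.items() if c == best)
    return result


# Hypothetical inconsistent decision table: (conditional attributes, decision).
table = [
    ((0, 1), 1),
    ((0, 1), 2),
    ((0, 1), 2),
    ((1, 0), 3),
    ((1, 1), 1),
    ((1, 1), 3),
]

print(many_valued_table(table))
print(most_common_table(table))
```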

  1. Relationships between depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm the paper also presents the results of experiments with three datasets from UCI Machine Learning Repository [3]. © 2011 Springer-Verlag.

  2. Practical secure decision tree learning in a teletreatment application

    NARCIS (Netherlands)

    Hoogh, de Sebastiaan; Schoenmakers, Berry; Chen, Ping; Akker, op den Harm

    2014-01-01

    In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our approa

  3. Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees

    Directory of Open Access Journals (Sweden)

    Chuan Ding

    2016-10-01

    Full Text Available Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort has been made to investigate short-term subway ridership prediction considering bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as the cases, the short-term subway ridership and alighting passengers from its adjacent bus stops are obtained based on transit smart card data. To optimize the model performance with different combinations of regularization parameters, a series of GBDT models are built with various learning rates and tree complexities by fitting a maximum of trees. The optimal model performance confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle the multicollinearity effect with high accuracy. In contrast to other machine learning methods, or “black-box” procedures, the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.

  4. Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

    Science.gov (United States)

    Otterstatter, Matthew R.

    2005-01-01

    The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.

  5. A Fuzzy Decision Tree to Estimate Development Effort for Web Applications

    Directory of Open Access Journals (Sweden)

    Ali Idri

    2011-09-01

    Full Text Available Web effort estimation is the process of predicting the effort and cost, in terms of money, schedule and staff, of a software project. Many estimation models have been proposed over the last three decades, and estimation is believed to be essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation. It is designed by integrating the principles of the ID3 decision tree with fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing software projects, which can greatly improve the accuracy of the estimates obtained. MMRE and Pred are used as measures of prediction accuracy in this study. A series of experiments is reported using the Tukutuku software projects dataset. The results are compared with those produced by three crisp versions of decision trees: ID3, C4.5 and CART.
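    MMRE and Pred, the two accuracy measures named in this record, are usually computed from the magnitude of relative error (MRE) of each project. The sketch below assumes those usual definitions and a Pred level of 0.25, which the record does not state; the effort values are hypothetical.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE is at most `level`."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


# Hypothetical actual and estimated efforts (person-hours).
actual_effort = [100.0, 200.0, 400.0, 50.0]
estimated_effort = [110.0, 150.0, 400.0, 80.0]

print(mmre(actual_effort, estimated_effort))
print(pred(actual_effort, estimated_effort))
```

    Lower MMRE and higher Pred both indicate better estimates, so the two measures are normally read together.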

  6. Extraction of information on construction land based on multi-feature decision tree classification

    Institute of Scientific and Technical Information of China (English)

    饶萍; 王建力; 王勇

    2014-01-01

    The spatial distribution of construction land is closely related to regional economic and social development. Therefore, timely monitoring and delivery of data on the dynamics of construction land are important for policy and decision making processes. Classifying land-use/land-cover and analyzing changes are among the most common applications of remote sensing. One of the most basic and difficult classification tasks is to distinguish construction land from other land surfaces. Landsat imagery is one of the most widely used sources of data in remote sensing of construction land. Several techniques of construction land extraction using Landsat data are described in the literature, but their applications are constrained by low accuracy in various situations, and they usually use a single index or multiple indexes. The purpose of this study was to devise a method to improve the accuracy of construction land extraction in the presence of various kinds of environmental noise. Thus we introduce a multi-feature decision tree (DT) classification model for improving classification accuracy in areas that include bare land, shadow and some streams, which other classification methods often fail to classify correctly. The model integrates four spectral indexes, a pattern recognition technique and a spatial algorithm. The four spectral indexes are the normalized difference three bands index (NDTBI), the normalized difference building index (NDBI), the modified normalized difference water index (MNDWI) and the normalized difference vegetation index (NDVI). The pattern recognition technique is the support vector machine (SVM), and the spatial algorithm creates a buffer zone. The test site was deliberately selected so that it consists of complex surface features, such as bare land, hill shade, and some small streams that are liable to be mixed up with construction land on the Landsat imagery. For that reason, Landsat-8
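    Three of the four spectral indexes named in this record have widely used normalized-difference forms, sketched below with a toy rule-based tree. The thresholds, class names and rule order are hypothetical and are not the model from the study; NDTBI is omitted because its formula is not given in the record.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir, red):
    """Vegetation index: high for green vegetation."""
    return normalized_difference(nir, red)


def ndbi(swir, nir):
    """Building index: tends to be positive for built-up surfaces."""
    return normalized_difference(swir, nir)


def mndwi(green, swir):
    """Modified water index: tends to be positive for open water."""
    return normalized_difference(green, swir)


def classify_pixel(green, red, nir, swir):
    """Toy decision tree over the indexes (illustrative thresholds only)."""
    if mndwi(green, swir) > 0.0:
        return "water"
    if ndvi(nir, red) > 0.3:
        return "vegetation"
    if ndbi(swir, nir) > 0.0:
        return "construction"
    return "other"


# Hypothetical surface reflectance values for a single pixel.
print(classify_pixel(green=0.15, red=0.18, nir=0.22, swir=0.30))
```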

  7. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    Science.gov (United States)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky or snow under cloud, to give users flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.

  8. Using decision tree to predict serum ferritin level in women with anemia

    Directory of Open Access Journals (Sweden)

    Parisa Safaee

    2016-04-01

    Full Text Available Background: Data mining is the process of exploring and analysing large amounts of data in order to find meaningful rules and trends. In healthcare, data mining offers numerous opportunities to study unknown patterns in a data set. Physicians can use these patterns for the diagnosis, prognosis and treatment of patients. The main objective of this study was to predict the level of serum ferritin in women with anemia and to specify the basic predictive factors of iron deficiency anemia using data mining techniques. Methods: In this research, 690 patients and 22 variables were studied in a population of women with anemia. The data include 11 laboratory and 11 clinical variables of patients who were referred to the laboratories of Imam Hossein and Shohada-E- Haft Tir hospitals from April 2013 to April 2014. The decision tree technique was used to build the model. Results: The accuracy of the decision tree with all the variables is 75%. Different combinations of variables were examined in order to determine the best predictive model. In the optimum decision tree model obtained, RBC, MCH, MCHC, gastrointestinal cancer and gastrointestinal ulcer were identified as the most important predictive factors. The results indicate that if the values of the MCV, MCHC and MCH variables are normal and the value of the RBC variable is below the normal limit, the patient is diagnosed as 90% likely to have iron deficiency anemia. Conclusion: Given the simplicity and low cost of the complete blood count examination, the decision tree model was considered for diagnosing iron deficiency anemia in patients. The impact of new factors such as gastrointestinal hemorrhoids, gastrointestinal surgeries, different gastrointestinal diseases and gastrointestinal ulcers is also considered in this paper, whereas previous studies have been limited to assessing laboratory variables. The rules of the

  9. Decision Tree Classifiers for Star/Galaxy Separation

    Science.gov (United States)

    Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Frago Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R.

    2011-06-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 = 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (~2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21.

  10. Supervised hashing using graph cuts and boosted decision trees.

    Science.gov (United States)

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data.
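    The retrieval step that motivates this record, comparing compact binary codes in Hamming space, can be sketched as follows. The codes here are hypothetical integers standing in for learned hash bits; the learning of the codes and of the boosted-tree hash functions is not shown.

```python
def hamming_distance(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_codes(query_code, database, k=2):
    """Return the k database entries closest to the query in Hamming distance.

    `database` is a list of (item_id, code) pairs; ties keep database order
    because Python's sort is stable.
    """
    ranked = sorted(database, key=lambda item: hamming_distance(query_code, item[1]))
    return ranked[:k]


# Hypothetical 4-bit codes for three stored images.
stored = [("img_a", 0b1011), ("img_b", 0b0000), ("img_c", 0b1111)]

print(hamming_distance(0b1011, 0b0010))
print(nearest_codes(0b1010, stored))
```

    Because the distance reduces to an XOR followed by a bit count, comparing a query against many stored codes is cheap, which is the benefit of binary embeddings that the record refers to.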

  11. Distributed Decision-Tree Induction in Peer-to-Peer Systems

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a scalable and robust distributed algorithm for decision-tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in such...

  12. Emergent Linguistic Rules from Inducing Decision Trees Disambiguating Discourse Clue Words

    CERN Document Server

    Siegel, E V; Siegel, Eric V.; Keown, Kathleen R. Mc

    1994-01-01

    We apply decision tree induction to the problem of discourse clue word sense disambiguation with a genetic algorithm. The automatic partitioning of the training set which is intrinsic to decision tree induction gives rise to linguistically viable rules.

  13. Research on the accuracy of TM images land-use classification based on QUEST decision tree: A case study of Lijiang in Yunnan

    Institute of Scientific and Technical Information of China (English)

    吴健生; 潘况; 彭建; 黄秀兰

    2012-01-01

    The accuracy of research on land use/cover change (LUCC) is determined directly by the accuracy of land use classification derived from aerial and satellite images. In analyzing the factors that affect the accuracy of current remote sensing image classification, some methods were introduced to study new trends in classification modes. Some previous studies showed that the speed and accuracy of QUEST (Quick, Unbiased, and Efficient Statistical Tree) decision tree classification were superior to those of other decision tree classifications. On the basis of this approach, the research classified the Landsat TM-5 images of Lijiang, Yunnan province. This paper compared the result with that of maximum likelihood image classification. The overall accuracy was 90.086%, which was higher than the overall accuracy (85.965%) of CART (Classification And Regression Tree). Meanwhile, the Kappa coefficient was 0.849, which was higher than the Kappa coefficient (0.760) of CART. Therefore, it is concluded that in areas of complex terrain, such as mountainous regions, choosing QUEST decision tree classification for TM images would improve the accuracy of land use classification. This type of classification decision tree can precisely obtain new classification rules from integrated satellite images, land use thematic maps, DEM maps and other field investigation materials. The method can also help users to find new classification rules in multidimensional information and to build decision tree classifier models. Furthermore, methods that draw on large volumes of high-resolution and hyperspectral image data, integrated multi-sensor platforms, multi-temporal remote sensing images, pattern recognition and data mining of spectral and texture features, and auxiliary geographic data will become a trend.

  14. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  15. Classification of posture and activities by using decision trees.

    Science.gov (United States)

    Zhang, Ting; Tang, Wenlong; Sazonov, Edward S

    2012-01-01

    Obesity prevention and treatment, as well as healthy lifestyle recommendations, require the estimation of everyday physical activity. Monitoring posture allocations and activities with sensor systems is an effective method to achieve this goal. However, at present, most available devices rely on multiple sensors distributed on the body, which might be too obtrusive for everyday use. In this study, data was collected from a wearable shoe sensor system (SmartShoe) and a decision tree algorithm was applied for classification with high computational accuracy. The dataset was collected from 9 individual subjects performing 6 different activities: sitting, standing, walking, cycling, and stairs ascent/descent. Statistical features were calculated and classification with a decision tree classifier was performed, after which an advanced boosting algorithm was applied. The computational accuracy is as high as 98.85% without boosting, and 98.90% after boosting. Additionally, the simple tree structure provides a direct approach to simplifying the feature set.

  16. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    Directory of Open Access Journals (Sweden)

    L. O. Oral

    2013-05-01

    Full Text Available Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from household survey data collected within Izmir Transportation Master Plan. From this perspective transport mode choice problem is solved on a case in district of Buca-Izmir, Turkey with CRISP-DM knowledge process model.

  17. Using boosted decision trees for tau identification in the ATLAS experiment

    CERN Document Server

    Godfrey, Jennifer

    The ATLAS detector will begin taking data from p-p collisions in 2009. This experiment will allow for many different physics measurements and searches. The production of tau leptons at the LHC is a key signature of the decay of both the standard model Higgs (via H → ττ) and SUSY particles. Taus have a short lifetime (cτ = 87 μm) and decay hadronically 65% of the time. Many QCD interactions produce similar hadronic showers and have cross-sections about 1 billion times larger than tau production. Multivariate techniques are therefore often used to distinguish taus from this background. Boosted Decision Trees (BDTs) are a machine-learning technique for developing cut-based discriminants which can significantly aid in extracting small signal samples from overwhelming backgrounds. In this study, BDTs are used for tau identification for the ATLAS experiment. They are a fast, flexible alternative to existing discriminants with comparable or better performance.

  18. Extensions of dynamic programming as a new tool for decision tree optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of tree) and decision trees (which allow arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.

  19. Decision Tree Classifier for Classification of Plant and Animal Micro RNA's

    Science.gov (United States)

    Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    Gene expression is regulated by miRNAs, or micro RNAs, which can be 21-23 nucleotides in length. They are non-coding RNAs which control gene expression either by translation repression or mRNA degradation. Plants and animals both contain miRNAs, which have been classified by wet lab techniques. These techniques are highly expensive, labour intensive and time consuming. Hence, faster and more economical computational approaches are needed. In view of the above, a machine learning model has been developed for classification of plant and animal miRNAs using a decision tree classifier. The model has been tested on available data and gives results with 91% accuracy.

  20. Decision Tree and Texture Analysis for Mapping Debris-Covered Glaciers in the Kangchenjunga Area, Eastern Himalaya

    Directory of Open Access Journals (Sweden)

    Adina Racoviteanu

    2012-10-01

    Full Text Available In this study we use visible, short-wave infrared and thermal Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data validated with high-resolution Quickbird (QB) and Worldview2 (WV2) for mapping debris cover in the eastern Himalaya using two independent approaches: (a) a decision tree algorithm, and (b) texture analysis. The decision tree algorithm was based on multi-spectral and topographic variables, such as band ratios, surface reflectance, kinetic temperature from ASTER bands 10 and 12, slope angle, and elevation. The decision tree algorithm resulted in 64 km2 classified as debris-covered ice, which represents 11% of the glacierized area. Overall, for ten glacier tongues in the Kangchenjunga area, there was an area difference of 16.2 km2 (25%) between the ASTER and the QB areas, with mapping errors mainly due to clouds and shadows. Texture analysis techniques included co-occurrence measures, geostatistics and filtering in spatial/frequency domain. Debris cover had the highest variance of all terrain classes, highest entropy and lowest homogeneity compared to the other classes, for example a mean variance of 15.27 compared to 0 for clouds and 0.06 for clean ice. Results of the texture image for debris-covered areas were comparable with those from the decision tree algorithm, with 8% area difference between the two techniques.

  1. Manifold Learning Co-Location Decision Tree for Remotely Sensed Imagery Classification

    Directory of Open Access Journals (Sweden)

    Guoqing Zhou

    2016-10-01

    Full Text Available Because traditional decision tree (DT) induction methods cannot efficiently take advantage of geospatial knowledge in the classification of remotely sensed imagery, several researchers have presented a co-location decision tree (CL-DT) method that combines the co-location technique with the traditional DT method. However, the CL-DT method only considers the Euclidean distance of neighborhood events, which cannot truly reflect the co-location relationship between instances for which there is a nonlinear distribution in a high-dimensional space. For this reason, this paper develops the theory and method for a maximum variance unfolding (MVU)-based CL-DT method (known as MVU-based CL-DT), which includes unfolding input data, unfolded distance calculations, MVU-based co-location rule generation, and MVU-based CL-DT generation. The proposed method has been validated by classifying remotely sensed imagery and is compared with four other types of methods, i.e., CL-DT, classification and regression tree (CART), random forests (RFs), and stacked auto-encoders (SAE), whose classification results are taken as “true values.” The experimental results demonstrate that: (1) the relative classification accuracies of the proposed method in three test areas are higher than CL-DT and CART, and are at the same level compared to RFs; and (2) the total number of nodes, the number of leaf nodes, and the number of levels are significantly decreased by the proposed method. The time taken for the data processing, decision tree generation, drawing of the tree, and generation of the rules are also shortened by the proposed method compared to CL-DT, CART, and RFs.

  2. Extraction of Features Based on Decision Tree Classification and Analysis of Pollution in the Datun Mining Area

    Institute of Scientific and Technical Information of China (English)

    于海若; 燕琴; 董春; 战丽丽

    2016-01-01

    Building on a thorough review of domestic and international remote sensing image classification methods, this article applies decision tree classification to Landsat 8 images of the Datun mining area. Samples were selected to extract and analyze the spectral characteristic curves of typical land-cover classes in the study area, and a land-use classification decision tree model was established from the spectral curve characteristics and the normalized difference vegetation index (NDVI). Through repeated trials and corrections, the optimal decision tree thresholds for classifying land cover in the Datun mining area were selected, and the study area was classified and its classification accuracy evaluated. Finally, the water pollution status of the study area is briefly analyzed on the basis of the classification results.
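
    The threshold-based tree described above can be sketched as follows. This is only an illustration of the general idea, not the authors' model: the thresholds (0.4 and 0.0), the class names, and the function names are assumptions, and a real tree would be tuned on training samples as the abstract describes. The NDVI formula itself is standard; for Landsat 8 the red and near-infrared inputs correspond to bands 4 and 5.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def classify_pixel(nir, red, veg_threshold=0.4, water_threshold=0.0):
    """Two-node decision tree on NDVI; both thresholds are hypothetical."""
    value = ndvi(nir, red)
    if value > veg_threshold:
        return "vegetation"
    if value < water_threshold:
        return "water"
    return "bare or built-up land"


# Hypothetical surface reflectance pairs (NIR, red) for three pixels.
pixels = [(0.50, 0.10), (0.02, 0.05), (0.30, 0.25)]
classes = [classify_pixel(nir, red) for nir, red in pixels]
```

    In practice the thresholds would be adjusted iteratively against reference samples, which is the "repeated trials and corrections" step in the abstract.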

  3. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia.

    Science.gov (United States)

    Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B

    2014-12-01

    To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publicly available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (or AUC) of 0.612, 0.583, and 0.650, respectively, but achieve a lift of at least 1.5 for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency and interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions that are appropriate to the risk levels observed, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions.
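
    The two metrics quoted in this abstract, AUC and lift, can be illustrated with a minimal sketch. The scores and labels below are made-up values, not data from the study, and the function names are assumptions; AUC is computed with the pairwise (Mann-Whitney) formulation, and lift as the event rate in the top-scored fraction divided by the overall event rate.

```python
def auc(scores, labels):
    """Area under the ROC curve: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def lift(scores, labels, top_fraction):
    """Event rate among the highest-scored fraction divided by the overall event rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(round(top_fraction * len(ranked))))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical predicted risks and observed readmissions (1 = readmitted).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

example_auc = auc(scores, labels)
example_lift = lift(scores, labels, top_fraction=0.2)
```

    A lift above 1 for the top-scored group means that group is readmitted more often than the population as a whole, which is how a model with a modest AUC can still be useful for targeting interventions.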

  4. An analysis and study of decision tree induction operating under adaptive mode to enhance accuracy and uptime in a dataset introduced to spontaneous variation in data attributes

    Directory of Open Access Journals (Sweden)

    Uttam Chauhan

    2011-01-01

    Full Text Available Many methods exist for classifying an unknown dataset. Decision tree induction is one of the well-known methods for classification. The decision tree method operates under two different modes: nonadaptive and adaptive. The nonadaptive mode is applied when the dataset is completely mature and available, or when the dataset is static and there will be no changes in its attributes. However, when the dataset is likely to change in its values and attributes, leading to fluctuation (i.e., monthly, quarterly or annually), the decision tree method operating under adaptive mode needs to be applied, because the conventional nonadaptive method fails: it must be applied again from scratch on the augmented dataset. This makes things expensive in terms of time and space. Sometimes attributes are added to the dataset while the number of records also increases. This paper mainly studies the behavioral aspects of the classification model, particularly when the number of attributes in the dataset increases due to spontaneous changes in the value(s)/attribute(s). Our investigative studies have shown that the accuracy of the decision tree model can be maintained when the number of attributes, including the class, increases in the dataset, which increases the number of records as well. In addition, accuracy can also be maintained when the number of values of the class attribute increases. The adaptive mode decision tree method reads data instance by instance and incorporates each instance into the model through absorption, updating the model according to the attribute values specific to that instance. Because the time required to update a decision tree can be less than that required to induce it from scratch, this eliminates the problem of repeatedly inducing the decision tree from scratch while saving memory and time.

  5. Land use classification in arid region based on multi-seasonal linear spectral mixture analysis and decision tree method

    Institute of Scientific and Technical Information of China (English)

    姜宛贝; 孙强强; 曲葳; 刘晓娜; 于文婧; 孙丹峰

    2016-01-01

    endmembers within each pixel. Finally, these endmember abundance estimates were used for land cover/use classification in the Minqin study area by using the decision tree method. According to the natural environment and land-use characteristics of the study area, and given the resolution of the remote sensing data and its applicability for ecosystem service assessment, we developed a two-level classification system. Exposed surface, crop land, forest/shrub land, grassland, impervious surface and water area were defined as first-level classes. The exposed surface was subdivided into moving sand, Gobi/hill/bare-land, salinized moving sand, and saline-alkaline land. Crop land was subdivided into spring crop, summer crop, and perennial crop based on seasonal growth characteristics. Similarly, forest/shrub land was subdivided into spring forest/shrub, summer forest/shrub, and evergreen forest/shrub. The decision tree was designed based on the seasonality pattern of the feature endmember abundance of each target class. The first step in the classification procedure was to overlay the training data on the three-seasonal abundance composite images to identify the seasonality pattern of each class. The second step was to measure the feature endmember abundance distribution of each class within the training samples. Aided by the histogram distribution, the segmenting boundary of each node was established by an interactive process. The results showed that sand, salt, green vegetation, dark materials, and water were the five endmember classes used for multi-seasonal linear spectral mixture analysis, but their representative seasons were different. So, we extracted endmember reflectance for sand, salt, and green vegetation from early winter, spring, and summer, respectively. The spectral reflectance of the dark material and water endmembers was derived from spring, as for salt. The mean RMSE (root-mean-square error) values were all lower than 0.01, which meant a good fit of the linear spectral mixture model

  6. Using Boosted Decision Trees to look for displaced Jets in the ATLAS Calorimeter

    CERN Document Server

    CERN. Geneva

    2017-01-01

    A boosted decision tree is used to identify unique jets in a recently released conference note describing a search for long-lived particles decaying to hadrons in the ATLAS calorimeter. Neutral long-lived particles decaying to hadrons are “typical” signatures in many models, including Hidden Valley models, Higgs Portal models, Baryogenesis, Stealth SUSY, etc. Long-lived neutral particles that decay in the calorimeter leave behind an object that looks like a regular Standard Model jet, with subtle differences. For example, the later in the calorimeter the particle decays, the less energy will be deposited in the early layers of the calorimeter. Because the jet does not originate at the interaction point, it will likely be narrower as reconstructed by the standard anti-kT jet reconstruction algorithm used by ATLAS. To separate the jets due to neutral long-lived decays from the Standard Model jets, we used a boosted decision tree with thirteen variables as inputs. We used the information from the boosted decision...

  7. Totally Optimal Decision Trees for Monotone Boolean Functions with at Most Five Variables

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.

  8. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As fraudulent financial statements of enterprises become increasingly serious, establishing a valid model for forecasting an enterprise's fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%.

  9. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls), whose mean age was 15.59 years. Logistic regression and decision trees were used to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraud or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraud or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraud or theft. The results of this study are relevant for putting effective prevention programmes into practice.

  10. Diagnostic Features of Common Oral Ulcerative Lesions: An Updated Decision Tree

    Science.gov (United States)

    Safi, Yaser

    2016-01-01

    Diagnosis of oral ulcerative lesions might be quite challenging. This narrative review article aims to introduce an updated decision tree for diagnosing oral ulcerative lesions on the basis of their diagnostic features. Various general search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of MeSH keywords such as “oral ulcer,” “stomatitis,” and “mouth diseases.” Thereafter, English-language articles published from 1983 to 2015 in both medical and dental journals, including reviews, meta-analyses, original papers, and case reports, were appraised. Upon compilation of the relevant data, oral ulcerative lesions were categorized into three major groups (acute, chronic, and recurrent ulcers) and into five subgroups (solitary acute, multiple acute, solitary chronic, multiple chronic, and solitary/multiple recurrent), based on the number and duration of lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by stepwise progression. PMID:27781066

  11. Electronic Nose Odor Classification with Advanced Decision Tree Structures

    Directory of Open Access Journals (Sweden)

    S. Guney

    2013-09-01

    Full Text Available An electronic nose (e-nose) is an electronic device which can measure chemical compounds in air and consequently classify different odors. In this paper, an e-nose device consisting of 8 different gas sensors was designed and constructed. Using this device, 104 different experiments involving 11 different odor classes (moth, angelica root, rose, mint, polis, lemon, rotten egg, egg, garlic, grass, and acetone) were performed. The main contribution of this paper is the finding that, using chemical domain knowledge, it is possible to train an accurate odor classification system. The domain knowledge about chemical compounds is represented by a decision tree whose nodes are composed of classifiers such as Support Vector Machines and k-Nearest Neighbor. The overall accuracy achieved with the proposed algorithm and the constructed e-nose device was 97.18%. Training and testing data sets used in this paper are published online.

  12. The Study of the Northwest Arid Zone Land-Cover Classification Based on C5.0 Decision Tree Algorithm at Wuwei City, Gansu Province

    Institute of Scientific and Technical Information of China (English)

    齐红超; 祁元; 徐瑱

    2009-01-01

    In the broad northwest arid regions, the same object frequently shows different spectral characteristics because of the special characteristics of land cover change, such as complex causes of formation, sensitivity to environmental change, rapid and large changes, and obvious differences in landscape. Conventional classification methods, including visual interpretation, supervised classification, unsupervised classification, and manually built decision tree classification, each have disadvantages in efficiency or accuracy. In this paper, the C5.0 decision tree machine learning algorithm was used to mine classification rules automatically from the sample data and to classify the entire study area, drawing on spectral features, NDVI, TC, texture and other information. A machine-learned decision tree can mine more classification rules. The C5.0 algorithm makes no assumptions about the distribution of the sample data, handles both continuous and discrete data, and produces rules that are easy to interpret. Its other advantages include fast training and higher accuracy than many other classifiers. Thus, it can meet the needs of large-area land use/cover change mapping in the northwest arid zone.

  13. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  14. A greedy algorithm for construction of decision trees for tables with many-valued decisions - A comparative study

    KAUST Repository

    Azad, Mohammad

    2013-11-25

    In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for proposed approach and approach based on generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.

  15. Decision tree analysis of factors influencing rainfall-related building damage

    Directory of Open Access Journals (Sweden)

    M. H. Spekkers

    2014-04-01

    Full Text Available Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998–2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22–26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11–18% of

  16. Decision tree analysis of factors influencing rainfall-related building damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). 
Still, a

  17. An Improved Decision Tree for Predicting a Major Product in Competing Reactions

    Science.gov (United States)

    Graham, Kate J.

    2014-01-01

    When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…

  18. An Analysis on Performance of Decision Tree Algorithms using Student’s Qualitative Data

    Directory of Open Access Journals (Sweden)

    T.Miranda Lakshmi

    2013-06-01

    Full Text Available The decision tree is the most widely applied supervised classification technique. The learning and classification steps of decision tree induction are simple and fast, and it can be applied to any domain. In this research, student qualitative data were taken from educational data mining, and the performance of the decision tree algorithms ID3, C4.5 and CART is compared. The comparison shows that the Gini index of CART influences the information gain ratio of ID3 and C4.5. The classification accuracy of CART is higher than that of ID3 and C4.5; however, the difference in classification accuracy between the decision tree algorithms is not considerable. The experimental results indicate that students' performance is also influenced by qualitative factors.
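
    The split criteria named in this abstract can be compared on a toy example. The sketch below assumes the textbook definitions: information gain (used by ID3), gain ratio (used by C4.5), and Gini impurity decrease (used by CART). The attribute values and labels are hypothetical, and CART's usual restriction to binary splits is ignored for simplicity.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (bits) of the value distribution in a list."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gini(items):
    """Gini impurity of the value distribution in a list."""
    n = len(items)
    return 1.0 - sum((c / n) ** 2 for c in Counter(items).values())


def split_scores(values, labels):
    """Return (information gain, gain ratio, Gini decrease) for a categorical split."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    weighted_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    weighted_gini = sum(len(g) / n * gini(g) for g in groups.values())
    info_gain = entropy(labels) - weighted_entropy
    split_info = entropy(values)  # grows with the number of branches
    gain_ratio = info_gain / split_info if split_info else 0.0
    gini_decrease = gini(labels) - weighted_gini
    return info_gain, gain_ratio, gini_decrease


# Hypothetical qualitative attribute and outcome for four students.
attendance = ["high", "high", "low", "low"]
result = ["pass", "pass", "fail", "pass"]

info_gain, gain_ratio, gini_decrease = split_scores(attendance, result)
```

    The three criteria usually rank candidate splits similarly, which is consistent with the small accuracy differences the abstract reports between the algorithms.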

  19. Expert System for Diagnosing Pregnancy Diseases Using the Dempster-Shafer Method and Decision Tree

    Directory of Open Access Journals (Sweden)

    joko popo minardi

    2016-01-01

    Full Text Available Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probabilistic theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete; with the Dempster-Shafer method and expert system rules, incomplete sets of symptoms can still be combined to reach an appropriate diagnosis, while the decision tree is used as a decision support tool for tracing disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method, which produces a trust value for a disease diagnosis. Based on the results of diagnostic testing of the Dempster-Shafer method and expert system, the resulting accuracy is 76%.   Keywords: Expert system; Diseases of pregnancy; Dempster Shafer
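
    Combining separate pieces of evidence in Dempster-Shafer theory is done with Dempster's rule of combination. The following is a minimal sketch of that rule only, not the authors' expert system: the hypothesis names (disease_A and so on) and the mass values are placeholders, and each mass function is represented as a dictionary keyed by frozensets of hypotheses.

```python
def combine(m1, m2):
    """Dempster's rule: fuse two mass functions whose keys are frozensets of hypotheses."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical frame of discernment and symptom evidence.
frame = frozenset({"disease_A", "disease_B", "disease_C"})
symptom_1 = {frozenset({"disease_A", "disease_B"}): 0.6, frame: 0.4}
symptom_2 = {frozenset({"disease_B"}): 0.7, frame: 0.3}

belief = combine(symptom_1, symptom_2)
```

    Mass left on the whole frame represents remaining ignorance, which is how the method accommodates incomplete symptom information.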

  20. Independent component analysis and decision trees for ECG holter recording de-noising.

    Directory of Open Access Journals (Sweden)

    Jakub Kuzilek

    Full Text Available We have developed a method for ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation with a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method in comparison to standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, baseline wander, EMG), but significantly better results for uncommon noise (electrode cable movement artefact).
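
    The evaluation criterion mentioned above can be written down directly. This is a minimal sketch of RMSE between a reference signal and a filtered signal; the sample values are made up for illustration and are not ECG data.

```python
from math import sqrt


def rmse(reference, estimate):
    """Root mean square error between two equally long sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return sqrt(squared / len(reference))


# Hypothetical samples: a clean reference and a de-noised estimate.
clean = [0.0, 0.0, 0.0, 0.0]
filtered = [1.0, -1.0, 1.0, -1.0]
print(rmse(clean, filtered))
```

    A lower RMSE means the filtered signal is closer to the original, so two de-noising methods can be compared on the same contaminated recording.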

  1. Discovering Patterns in Brain Signals Using Decision Trees

    Directory of Open Access Journals (Sweden)

    Narusci S. Bastos

    2016-01-01

    Full Text Available Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. We therefore propose to use a data mining technique to help in this task. As a case study, we analyzed the brain behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If an object is given to sighted people and they are asked to identify it, the sense of vision will probably be the most determinant one. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate the difference in how the brains of blind people and people with vision react to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour.

  2. Discovering Patterns in Brain Signals Using Decision Trees

    Science.gov (United States)

    2016-01-01

    Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. We therefore propose to use a data mining technique to help in this task. As a case study, we analyzed the brain behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If an object is given to sighted people and they are asked to identify it, the sense of vision will probably be the most determinant one. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate the difference in how the brains of blind people and people with vision react to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour. PMID:27688746

  3. Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems

    CERN Document Server

    Gupta, Anupam; Nagarajan, Viswanath; Ravi, R

    2010-01-01

    We consider the problem of constructing optimal decision trees: given a collection of tests which can disambiguate between a set of $m$ possible diseases, each test having a cost, and the a-priori likelihood of the patient having any particular disease, what is a good adaptive strategy to perform these tests to minimize the expected cost to identify the disease? We settle the approximability of this problem by giving a tight $O(\log m)$-approximation algorithm. We also consider a more substantial generalization, the Adaptive TSP problem. Given an underlying metric space, a random subset $S$ of cities is drawn from a known distribution, but $S$ is initially unknown to us--we get information about whether any city is in $S$ only when we visit the city in question. What is a good adaptive way of visiting all the cities in the random subset $S$ while minimizing the expected distance traveled? For this problem, we give the first poly-logarithmic approximation, and show that this algorithm is best possible unless w...

  4. Decision Tree Classifiers for Star/Galaxy Separation

    CERN Document Server

    Vasconcellos, E C; Gal, R R; LaBarbera, F L; Capelato, H V; Velho, H F Campos; Trevisan, M; Ruiz, R S R

    2010-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: $14\le r\le21$ (85.2%) and $r\ge19$ (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier is comparable or better in completeness over the full magnitude range $15\le r\le21$, with m...

  5. Extensions of Dynamic Programming: Decision Trees, Combinatorial Optimization, and Data Mining

    KAUST Repository

    Hussain, Shahid

    2016-07-10

    This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.

  6. Construction of α-decision trees for tables with many-valued decisions

    KAUST Repository

    Moshkov, Mikhail

    2011-01-01

    The paper is devoted to the study of greedy algorithm for construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider bound on the number of algorithm steps, and bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.

  7. Minimization of decision tree depth for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-10-01

    In this paper, we consider multi-label decision tables that have a set of decisions attached to each row. Our goal is to find one decision from the set of decisions for each row by using a decision tree as our tool. With the aim of minimizing the depth of the decision tree, we devised various greedy algorithms as well as a dynamic programming algorithm. Comparing against the optimal result obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results close to the optimum for the minimization of decision tree depth.
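
    A greedy construction of this kind can be sketched as follows. This is only an illustration, not the authors' algorithms: the uncertainty measure (rows not covered by the most frequent decision) and the rule of picking the attribute with the smallest worst-case uncertainty are assumptions, and the four-row table is hypothetical.

```python
from collections import Counter


def common_decision(rows):
    """Return a decision shared by every row's decision set, or None."""
    common = set.intersection(*(set(decisions) for _, decisions in rows))
    return min(common) if common else None


def uncertainty(rows):
    """Rows left uncovered by the single most frequent decision (assumed measure)."""
    counts = Counter(d for _, decisions in rows for d in decisions)
    return len(rows) - max(counts.values())


def build(rows, attrs):
    """Greedy tree: a leaf is a decision, an inner node is (attribute, {value: subtree})."""
    decision = common_decision(rows)
    if decision is not None:
        return decision
    best = None
    for a in attrs:
        parts = {}
        for row in rows:
            parts.setdefault(row[0][a], []).append(row)
        if len(parts) < 2:
            continue  # attribute is constant here and cannot split the table
        score = max(uncertainty(p) for p in parts.values())
        if best is None or score < best[0]:
            best = (score, a, parts)
    if best is None:
        raise ValueError("identical rows without a common decision")
    _, a, parts = best
    return (a, {value: build(part, attrs) for value, part in parts.items()})


def depth(tree):
    """Number of attribute tests on the longest root-to-leaf path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1].values())


# Hypothetical multi-label table: (attribute values, set of admissible decisions).
table = [
    ((0, 0), {1}),
    ((0, 1), {1, 2}),
    ((1, 0), {2}),
    ((1, 1), {2, 3}),
]
tree = build(table, range(2))
print(tree, depth(tree))
```

    Because any decision from a row's set is acceptable, a subtable becomes a leaf as soon as its rows share one decision, which is what lets multi-label trees be shallower than single-label ones.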

  8. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...

  9. Decision Tree Method for Burned Area Identification Based on the Spectral Index of GF-1 WFV Image

    Institute of Scientific and Technical Information of China (English)

    祖笑锋; 覃先林; 尹凌宇; 陈小中; 钟祥清

    2015-01-01

    After a forest fire, the damage must be assessed rapidly and accurately. This paper builds a CART decision tree model for identifying burned forest areas from 16 m wide-field-of-view imagery of the GF-1 satellite (GF-1 WFV), using the reflectance of each band together with five computed spectral indices: the normalized difference vegetation index (NDVI), the burned area index (BAI), the shaded vegetation index (SVI), the normalized difference water index (NDWI) and the global environment monitoring index (GEMI). The method was validated in a selected study area and compared with maximum likelihood supervised classification and unsupervised classification (ISODATA). The results showed that, compared with the maximum likelihood method, the CART-based decision tree method improved the overall accuracy of burned area identification by 4.38%, the Kappa coefficient by 0.1024, the mapping accuracy by 14.96% and the user accuracy by 8.50%. Unsupervised classification (ISODATA) of the GF-1 imagery identified the burned area poorly: the overall accuracy and Kappa coefficient were low, and the mapping accuracy and user accuracy did not reach 1%.
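
    Two of the spectral indices used as tree inputs have simple closed forms. The sketch below computes NDVI and NDWI per pixel using their common definitions (NDWI in the green/near-infrared form); whether the paper used exactly these variants is an assumption, and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b), 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized difference water index from green and NIR reflectance."""
    return normalized_difference(green, nir)


# Hypothetical surface reflectances for a single vegetated pixel.
pixel = {"green": 0.1, "red": 0.1, "nir": 0.5}
features = {
    "ndvi": ndvi(pixel["nir"], pixel["red"]),
    "ndwi": ndwi(pixel["green"], pixel["nir"]),
}
print({name: round(value, 3) for name, value in features.items()})
```

    Index values computed this way would sit alongside the raw band reflectances as the feature vector that a CART classifier splits on.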

  10. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    Science.gov (United States)

    2008-04-01

    REAL-TIME SPEECH/MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and cannot work properly in the existence of multimedia signals. This paper proposes a real-time speech/music

  11. Greedy heuristics for minimization of number of terminal nodes in decision trees

    KAUST Repository

    Hussain, Shahid

    2014-10-01

    This paper describes, in detail, several greedy heuristics for construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related with the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from UCI ML Repository and randomly generated data) as well as compare with the optimal results obtained using dynamic programming method.

  12. EVALUATION OF DECISION TREE CLASSIFICATION ACCURACY TO MAP LAND COVER IN CAPIXABA, ACRE

    Directory of Open Access Journals (Sweden)

    Symone Maria de Melo Figueiredo

    2006-03-01

    Full Text Available This study evaluated the accuracy of mapping land cover in Capixaba, state of Acre, Brazil, using decision trees. Eleven attributes were used to build the decision trees: TM Landsat data from bands 1, 2, 3, 4, 5, and 7; fraction images derived from linear spectral unmixing; and the normalized difference vegetation index (NDVI). The Kappa values were greater than 0.83, producing excellent classification results and demonstrating that the technique is promising for mapping land cover in the study area.
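
    The Kappa statistic reported above is computed from a confusion matrix of reference versus mapped classes. A minimal sketch of Cohen's kappa follows; the 2x2 matrix is hypothetical and does not reproduce the study's data.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows are reference classes, columns are mapped classes.
confusion = [
    [45, 5],
    [10, 40],
]
print(round(overall_accuracy(confusion), 2), round(cohen_kappa(confusion), 2))
```

    Kappa is lower than overall accuracy because it discounts the agreement expected from the class proportions alone.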

  13. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions

    KAUST Repository

    Azad, Mohammad

    2014-09-13

    The paper is devoted to the analysis of greedy algorithms for the minimization of the average depth of decision trees for decision tables in which each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. Comparing against the optimal result obtained from a dynamic programming algorithm, we find that some greedy algorithms produce results close to the optimum for the minimization of the average depth of decision trees.

  14. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Full Text Available It is important to recognize students' entrepreneurial intentions during their studies in order to provide those students with an educational background that supports such intentions and leads them to successful entrepreneurship afterwards. The paper aims to develop a model that classifies students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university on a sample of first-year students. Input variables described students’ demographics, importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Because of the high dimension of the input space, a feature selection method was used in the pre-processing stage. For comparison, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure was conducted to test the generalization ability of the models. The models were compared according to their classification accuracy as well as input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which universities could take into consideration when designing their academic programs.

  15. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    Directory of Open Access Journals (Sweden)

    Trefz Florian M

    2012-12-01

    Full Text Available Abstract Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24-hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for the use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed

  16. Cellular Automata Simulation of Honghe Wetland Evolution Based on the C5.0 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    于振华; 万鲁河

    2014-01-01

    Using TM remote sensing images of the Honghe Nature Reserve from three periods (1992, 2001 and 2010) as the data source, the C5.0 decision tree algorithm is used to mine the evolution rules of the Honghe wetland from the existing data and its influencing factors. The resulting transition rules are applied in a cellular automaton model for dynamic simulation and prediction of the wetland's evolution, and the important role of cellular automaton models in simulating and predicting wetland landscapes is analyzed and discussed. The results show that, if the existing spatial variables and conditions remain unchanged, the wetland area of the Honghe Nature Reserve will decrease and its drought will intensify. Dynamic simulation and prediction of the wetland landscape can reflect its dynamic changes well.

  17. A DATA MINING APPROACH TO PREDICT PROSPECTIVE BUSINESS SECTORS FOR LENDING IN RETAIL BANKING USING DECISION TREE

    Directory of Open Access Journals (Sweden)

    Md. Rafiqul Islam

    2015-03-01

    Full Text Available An objective of every financial organization is to retain existing customers and attract new prospective customers for the long term. In manual banking, the economic behaviour of the customer and the nature of the organization are controlled by a prescribed form called Know Your Customer (KYC). Depositor customers in some sectors (jewellery/gold, arms, money exchangers, etc.) are high risk; those in some sectors (transport operators, auto dealers, religious) are medium risk; and those in the remaining sectors (retail, corporate, service, farmers, etc.) are low risk. Presently, credit risk for a counterparty can be broadly categorized under quantitative and qualitative factors. Although many systems for customer retention and customer attrition exist in banks, these rigorous methods lack a clear and defined approach to disbursing loans in the business sector. In this paper, we use records of business customers of a retail commercial bank in Tangail city, Bangladesh, including rural and urban areas, to analyse the major transactional determinants of customers and to build a model for predicting prospective business sectors in retail banking. To this end, a data mining approach is adopted in which a pruned decision tree classification technique is used to develop the model, whose performance is then tested against Weka results. KEYWORDS Data Mining, Decision Tree, Tree Pruning, Prospective Business Sector, Customer,

  18. Pruning a decision tree for selecting computer-related assistive devices for people with disabilities.

    Science.gov (United States)

    Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh

    2012-07-01

    Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to prove it can determine a complete set of assistive devices with a smaller number of evaluative questions. The means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help the disabled users and assistive technology practitioners to find appropriate computer-related assistive devices that meet with clients' individual needs in an efficient manner.

  19. Bar Mechanism Design of Irregular Gear Planetary System Based on the Parallel Algorithm of Granular Computing and Decision Tree

    Institute of Scientific and Technical Information of China (English)

    魏小燕

    2016-01-01

    As a large agricultural country with 21% of the world's population, China needs to use chemical fertilizer rationally and improve the efficiency of fertilizer use in order to develop advanced modern agriculture. Compared with solid fertilizer, liquid fertilizer is more easily absorbed by crops, and its use is more direct, more efficient and more economical. Internationally, Russia, the United States, Australia and other countries have taken the lead in using liquid fertilizer. To save fertilizer, improve crop uptake of fertilizer, reduce costs and reduce soil pollution, this paper designs a hole-pricking mechanism with an irregular-gear planetary system, based on a parallel algorithm combining granular computing and decision trees. In operation, the device wastes little liquid fertilizer and achieves high absorption efficiency.

  20. Decision trees and decision committee applied to star/galaxy separation problem

    Science.gov (United States)

    Vasconcellos, Eduardo Charles

    Vasconcellos et al. [1] study the efficiency of 13 different decision tree algorithms applied to photometric data in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7) to perform star/galaxy separation. Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. In that work we extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness function (galaxy true positive rate) in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the FT classification to the SDSS parametric, 2DPHOT and Ball et al. (2006) classifications. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination ( 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 train six FT classifiers with randomly selected objects from the same 884,126 SDSS-DR7 objects with spectroscopic data that we used before. Both the decision committee and our previous single FT classifier will be applied to the new objects from SDSS data releases eight, nine and ten. Finally, we will compare the performance of both methods on this new data set. [1] Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Fraga Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R. Decision Tree Classifiers for Star/Galaxy Separation. The Astronomical Journal, Volume 141, Issue 6, 2011.
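
    The two quality measures named in this abstract can be sketched as below. Completeness follows the abstract's definition (galaxy true positive rate); the contamination definition used here, the fraction of objects classified as galaxies that are actually stars, is an assumption, and the eight labels are hypothetical.

```python
def completeness(truth, predicted, target="galaxy"):
    """Fraction of true target objects that the classifier recovers."""
    recovered = sum(1 for t, p in zip(truth, predicted) if t == target and p == target)
    actual = sum(1 for t in truth if t == target)
    return recovered / actual


def contamination(truth, predicted, target="galaxy"):
    """Fraction of objects labelled as target that belong to another class (assumed definition)."""
    wrong = sum(1 for t, p in zip(truth, predicted) if p == target and t != target)
    labelled = sum(1 for p in predicted if p == target)
    return wrong / labelled


# Hypothetical spectroscopic truth and classifier output for eight objects.
truth = ["galaxy"] * 4 + ["star"] * 4
predicted = ["galaxy", "galaxy", "galaxy", "star", "galaxy", "star", "star", "star"]
print(completeness(truth, predicted), contamination(truth, predicted))
```

    Reporting both numbers matters because a classifier can reach high completeness trivially by calling everything a galaxy, at the cost of high contamination.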

  1. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection

    Science.gov (United States)

    Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    Background WHO’s new classification in 2009 (dengue with or without warning signs, and severe dengue) has necessitated large numbers of hospital admissions of dengue patients, which in turn has imposed a huge economic and physical burden on many hospitals around the globe, particularly in South East Asia and Malaysia, where the disease has surged rapidly in recent years. The lack of a simple tool to differentiate mild from life-threatening infection has led to unnecessary hospitalization of dengue patients. Methods We conducted a single-centre, retrospective study involving serologically confirmed dengue fever patients admitted to a single ward in Hospital Kuala Lumpur, Malaysia. Data were collected for 4 months, from February to May 2014. Socio-demography, co-morbidity, days of illness before admission, symptoms, warning signs, vital signs and laboratory results were all recorded. Descriptive statistics were tabulated, and simple and multiple logistic regression analyses were done to determine significant risk factors associated with severe dengue. Results 657 patients with confirmed dengue were analysed, of whom 59 (9.0%) had severe dengue. Overall, the commonest warning signs were vomiting (36.1%) and abdominal pain (32.1%). Previous co-morbidity, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were found to be significant risk factors for severe dengue using simple logistic regression. However, the only significant risk factors for severe dengue in multiple logistic regression were vomiting, pleural effusion, and low systolic blood pressure. Using those 3 risk factors, we plotted an algorithm for predicting severe dengue. When compared to the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, specificity of 0.54, positive predictive value of 0.16 and negative predictive value of 0.96. Conclusion The decision tree algorithm proposed
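
    The four diagnostic measures quoted in the results all derive from one 2x2 table of predicted versus reference classification. The sketch below shows the arithmetic; the counts are hypothetical and chosen only to illustrate a low-prevalence setting, not to reproduce the study's table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # severe cases correctly flagged
        "specificity": tn / (tn + fp),  # non-severe cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly severe
        "npv": tn / (tn + fn),          # cleared cases that are truly non-severe
    }


# Hypothetical counts: 10 severe and 90 non-severe patients.
metrics = diagnostic_metrics(tp=8, fp=20, fn=2, tn=70)
for name, value in metrics.items():
    print(name, round(value, 2))
```

    When the condition is uncommon, a high negative predictive value can coexist with a low positive predictive value, which is the pattern the abstract reports.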

  2. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees. In this particular case, the relationships between the two cost functions are closely related to the space-time trade-off. In addition to an algorithm for computing the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier, or Pareto set of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.
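
    The notion of a Pareto frontier for two cost functions can be made concrete with a short sketch. It simply filters a given list of (total path length, number of terminal nodes) pairs, both to be minimized; it is not the paper's dynamic programming algorithm, and the cost pairs are hypothetical.

```python
def pareto_front(points):
    """Non-dominated points when both coordinates are minimized."""
    front = []
    best_second = float("inf")
    # After sorting by the first cost, a point is non-dominated exactly when
    # its second cost is strictly below every second cost seen so far.
    for point in sorted(set(points)):
        if point[1] < best_second:
            front.append(point)
            best_second = point[1]
    return front


# Hypothetical (total path length, terminal nodes) pairs for candidate trees.
candidates = [(10, 6), (12, 5), (11, 7), (15, 4), (15, 5), (10, 8)]
print(pareto_front(candidates))  # [(10, 6), (12, 5), (15, 4)]

# A singleton front means one tree is optimal for both costs at once.
print(pareto_front([(10, 4), (12, 5)]))  # [(10, 4)]
```

    The singleton case corresponds to what the abstract calls total optimality: no trade-off exists between the two cost functions for that table.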

  3. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  4. Liver disorder diagnosis using linear, nonlinear and decision tree classification algorithms

    Directory of Open Access Journals (Sweden)

    Aman Singh

    2016-10-01

    Full Text Available In India and across the globe, liver disease is a serious area of concern in medicine. It is therefore essential to use classification algorithms for assessing the disease in order to improve the efficiency of medical diagnosis, which eventually leads to appropriate and timely treatment. The study accordingly implemented various classification algorithms, including linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), quadratic discriminant analysis (QDA), diagonal quadratic discriminant analysis (DQDA), naive Bayes (NB), feed-forward neural network (FFNN) and classification and regression tree (CART), in an attempt to enhance the diagnostic accuracy of liver disorder and to reduce the inefficiencies caused by false diagnosis. The results demonstrated that CART emerged as the best model, achieving higher diagnostic accuracy than LDA, DLDA, QDA, DQDA, NB and FFNN. FFNN came second and performed better than the rest of the classifiers. After evaluation, it can be said that the precision of a classification algorithm depends on the type and features of a dataset. For the given dataset, the decision tree classifier CART outperforms all other linear and nonlinear classifiers. It also showed the capability of assisting clinicians in determining the existence of liver disorder, in attaining better diagnosis and in avoiding delay in treatment.

  5. Establishing the diagnostic model of SCC in cervical cancer by using Logistic regression combined with CHAID analysis of decision tree

    Institute of Scientific and Technical Information of China (English)

    王静; 郑群; 余素飞; 冯贻君; 沈波

    2015-01-01

    Objective: To use logistic regression to screen serum tumor markers related to cervical cancer, and to further establish an auxiliary diagnostic model of squamous cell carcinoma antigen (SCC) in cervical cancer using chi-squared automatic interaction detector (CHAID) decision tree analysis. Methods: A total of 581 newly diagnosed cervical cancer patients, 342 patients with benign cervical diseases and 341 healthy controls whose tumor markers were tested at Taizhou Hospital of Zhejiang during 2010-2013 were studied retrospectively. Levels of carbohydrate antigen 199 (CA199), carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA), SCC and alpha fetoprotein (AFP) were reviewed. Logistic regression was first used to select the statistically significant tumor markers, and the CHAID decision tree method was then used to determine their value in the auxiliary diagnosis of cervical cancer. Finally, 284 patients with uterus-related diseases whose SCC exceeded the diagnostic value obtained in this study were collected from January to December 2014, and the proportion of cervical cancer patients among them was calculated to validate the CHAID result. Results: Logistic regression showed that, of the five tumor markers possibly related to cervical cancer, only SCC was statistically significant (Wald χ² = 22.120, P = 0.000), with an OR (95% CI) of 1.900 (1.454-2.483). The proportion of cervical cancer patients rose as the SCC value increased; when SCC > 2.20 μg/L, the positive predictive value reached 94.7%. Among the 284 people with SCC above 2.20 μg/L and suspected uterus-related disease, 95.1% (270 cases) were finally confirmed to have cervical cancer. Conclusion: SCC has good auxiliary diagnostic value for cervical cancer.
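
    The reported statistics for SCC can be cross-checked with standard logistic regression arithmetic. The sketch below assumes a one-degree-of-freedom Wald test, so that the standard error equals |ln OR| / sqrt(Wald χ²), and a normal-approximation 95% interval with z = 1.96; the function name is illustrative.

```python
from math import exp, log, sqrt


def wald_interval(odds_ratio, wald_chi2, z=1.96):
    """Confidence interval for an odds ratio from its Wald chi-square statistic."""
    beta = log(odds_ratio)             # logistic regression coefficient
    se = abs(beta) / sqrt(wald_chi2)   # Wald chi2 = (beta / se) ** 2
    return exp(beta - z * se), exp(beta + z * se)


# Values reported in the abstract: OR = 1.900, Wald chi-square = 22.120.
low, high = wald_interval(1.900, 22.120)
print(round(low, 3), round(high, 3))

# Validation cohort: 270 confirmed cervical cancers among 284 patients.
validation_ppv = 270 / 284
print(round(validation_ppv * 100, 1))
```

    Under these assumptions the interval agrees with the reported 1.454 to 2.483, and the validation proportion matches the reported 95.1%.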

  6. Nonparametric decision tree: The impact of ISO 9000 on certified and non certified companies

    Directory of Open Access Journals (Sweden)

    Joaquín Texeira Quirós

    2013-09-01

    Full Text Available Purpose: This empirical study analyzes a questionnaire answered by a sample of ISO 9000 certified companies and a control sample of companies which have not been certified, using a multivariate predictive model. With this approach, we assess which quality practices are associated with the likelihood of the firm being certified. Design/methodology/approach: We implemented nonparametric decision trees in order to see which variables most influence whether or not a company is certified, i.e., the motivations that lead companies to pursue certification. Findings: The results show that only four questionnaire items are sufficient to predict whether a firm is certified or not. Companies in which the respondent shows greater concern for customer relations, employee motivation and strategic planning have a higher likelihood of being certified. Research implications: The reader should note that this study is based on data from a single country and that these results capture many idiosyncrasies of its economic and corporate environment. It would be of interest to understand whether this type of analysis reveals some regularities across different countries. Practical implications: Companies should look for a set of practices congruent with total quality management and ISO 9000 certification. Originality/value: This study contributes to the literature on the internal motivation of companies to achieve certification under the ISO 9000 standard by performing a comparative analysis of questionnaires answered by a sample of certified companies and a control sample of companies which have not been certified. In particular, we assess how the managers' perception of the intensity with which quality practices are deployed in their firms is associated with the likelihood of the firm being certified.

  7. P2P Domain Classification using Decision Tree

    CERN Document Server

    Ismail, Anis

    2011-01-01

    In a Peer-to-Peer (P2P) context, a challenging problem is how to find the appropriate peer to handle a given query without consuming excessive bandwidth. Different methods have proposed query routing strategies that take the P2P network at hand into account. This paper considers an unstructured P2P system based on an organization of peers around Super-Peers that are connected to a Super-Super-Peer according to their semantic domains. By analyzing the query log file, a predictive model is constructed that avoids flooding queries in the P2P network by predicting the appropriate Super-Peer, and hence the peer, to answer the query. A challenging problem in a schema-based P2P system is how to locate peers that are relevant to a given query. In this paper, an architecture based on (Super-)Peers is proposed, focusing on query routing. The approach groups together (Super-)Peers that have similar interests for efficient query routing. In such groups, called Super-Super-Peers (SSP), Su...

  8. The Construction of Jasmine Tea Flavor Index and Decision Tree Model in Identifying Scenting Quality

    Institute of Scientific and Technical Information of China (English)

    唐夏妮; 夏益民; 雷永宏; 王校常; 林杰

    2016-01-01

    The 'jasmine tea flavor' (JTF) index of well-scented, poor-scented and not-scented jasmine teas was determined by analysis of the volatile components of 32 samples, combined with the relevant literature. Results showed that the JTF index was highly correlated with the total volatiles. Principal component analysis (PCA) was performed on 29 characteristic volatile compounds in the 32 jasmine tea samples and yielded distinct clusters for the different scenting qualities, indicating that the JTF index and the absence of these 29 volatile compounds are very helpful for identifying fake jasmine teas. In addition, a quality determination model for jasmine tea was constructed as a decision tree (discrimination accuracy 93.8%) with two decision nodes: a JTF index of 0.915, and fewer than 4 missing components among the 29 volatile compounds. This confirmed that the criteria are feasible for the quality identification of jasmine tea. Moreover, the decision tree model provides a promising, fast method for quality control and technology development of jasmine tea, especially for detecting adulteration (such as synthetic fragrance oil).

  9. Relationships between average depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2014-02-14

    This paper presents a new tool for the study of relationships between the total path length or the average depth and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [9] and datasets representing Boolean functions with 10 variables.
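
    To make the quantities named above concrete, here is a minimal sketch that measures total path length, average depth and number of misclassifications for a given tree on a small table. It is not the tool described in the paper. The tuple encoding of trees, the function names, and the convention that depth is averaged over the rows of the table are all assumptions made for illustration.

```python
# Hypothetical tree encoding (an assumption for this sketch):
#   leaf:          ("leaf", decision)
#   internal node: ("node", attribute_index, {attribute_value: subtree})

def route(tree, row):
    """Follow a row from the root to a leaf; return (decision, path_length)."""
    depth = 0
    while tree[0] == "node":
        _, attr, children = tree
        tree = children[row[attr]]
        depth += 1
    return tree[1], depth


def tree_stats(tree, rows, labels):
    """Return (total path length, average depth, misclassifications)."""
    total_path_length = 0
    misclassified = 0
    for row, label in zip(rows, labels):
        decision, depth = route(tree, row)
        total_path_length += depth
        misclassified += decision != label
    return total_path_length, total_path_length / len(rows), misclassified


# Toy table: the Boolean AND of two variables.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

exact_tree = ("node", 0, {0: ("leaf", 0),
                          1: ("node", 1, {0: ("leaf", 0), 1: ("leaf", 1)})})
stub_tree = ("leaf", 0)  # no queries at all, always answers 0

print(tree_stats(exact_tree, rows, labels))  # (6, 1.5, 0)
print(tree_stats(stub_tree, rows, labels))   # (0, 0.0, 1)
```

    The two trees illustrate the trade-off under study: the exact tree makes no errors at an average depth of 1.5, while the single-leaf tree has depth 0 but misclassifies one row.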

  10. Relationships Between Average Depth and Number of Nodes for Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-07-24

    This paper presents a new tool for the study of relationships between total path length or average depth and number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [1]. © Springer-Verlag Berlin Heidelberg 2014.

  11. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    Science.gov (United States)

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  12. Visualization of Decision Tree State for the Classification of Parkinson's Disease

    NARCIS (Netherlands)

    Valentijn, E

    2016-01-01

    Decision trees have been shown to be effective at classifying subjects with Parkinson’s disease when provided with features (subject scores) derived from FDG-PET data. Such subject scores have strong discriminative power but are not intuitive to understand. We therefore augment each decision node wi

  13. Agent-based modeling of sustainable behaviors

    CERN Document Server

    Sánchez-Maroño, Noelia; Fontenla-Romero, Oscar; Polhill, J; Craig, Tony; Bajo, Javier; Corchado, Juan

    2017-01-01

    Using the O.D.D. (Overview, Design concepts, Detail) protocol, this title explores the role of agent-based modeling in predicting the feasibility of various approaches to sustainability. The chapters incorporated in this volume consist of real case studies to illustrate the utility of agent-based modeling and complexity theory in discovering a path to more efficient and sustainable lifestyles. The topics covered within include: households' attitudes toward recycling, designing decision trees for representing sustainable behaviors, negotiation-based parking allocation, auction-based traffic signal control, and others. This selection of papers will be of interest to social scientists who wish to learn more about agent-based modeling as well as experts in the field of agent-based modeling.

  14. Soft context clustering for F0 modeling in HMM-based speech synthesis

    Science.gov (United States)

    Khorram, Soheil; Sameti, Hossein; King, Simon

    2015-12-01

    This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure
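
    The idea that internal nodes select both children with certain membership degrees can be illustrated with a small sketch. This is not the authors' system: the sigmoid gate, the scalar context feature `x`, the tuple encoding of the tree and the function names are all assumptions chosen for illustration. The sketch only shows how each context ends up with a membership degree in several leaves, and that these degrees sum to one.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical tree encoding (an assumption for this sketch):
#   leaf:          ("leaf", name)
#   internal node: ("node", weight, bias, left_subtree, right_subtree)
def leaf_memberships(tree, x, degree=1.0, out=None):
    """Propagate a membership degree down BOTH branches of every node."""
    if out is None:
        out = {}
    if tree[0] == "leaf":
        out[tree[1]] = out.get(tree[1], 0.0) + degree
        return out
    _, weight, bias, left, right = tree
    go_left = sigmoid(weight * x + bias)  # soft decision in (0, 1)
    leaf_memberships(left, x, degree * go_left, out)
    leaf_memberships(right, x, degree * (1.0 - go_left), out)
    return out


soft_tree = ("node", 2.0, 0.0,
             ("leaf", "A"),
             ("node", -1.0, 0.5, ("leaf", "B"), ("leaf", "C")))

memberships = leaf_memberships(soft_tree, x=0.0)
for name, degree in memberships.items():
    print(name, round(degree, 3))
```

    A hard decision tree is the limiting case in which every gate outputs exactly 0 or 1, so that a single leaf receives the whole membership.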

  15. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    Science.gov (United States)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for a refined estimation of solar energy plant potential on roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information about buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid increasing attention to the production of green electricity. The return on investment depends on the statutory price per watt, the initial costs of the solar energy plant, its lifetime, and the actual production of the installation. The latter depends on the radiation received and on the size of the solar energy plant. In this context, the exposition and slope of the roof area are as important as building parts such as chimneys or dormers that might shadow parts of the roof. Once the controlling factors are known, a decision tree can be created to support a beneficial deployment of a solar energy plant, provided that sufficient data is available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas: they carry no semantic information, and even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual three-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany, many cities, e.g. Berlin, are on the way to providing CityGML datasets. Here we present a decision tree that incorporates geometric as well as semantic requirements for a refined estimation of the solar energy potential on roof areas. Based on CityGML's attribute lists, we consider geometries of roofs and roof installations as well as global radiation, which can be derived e.g. from the European Solar

  16. Nitrogen removal influence factors in A/O process and decision trees for nitrification/denitrification system

    Institute of Scientific and Technical Information of China (English)

    MA Yong; PENG Yong-zhen; WANG Shu-ying; WANG Xiao-lian

    2004-01-01

    In order to improve nitrogen removal effectively in the anoxic/oxic (A/O) process for treating domestic wastewater, the influencing factors DO (dissolved oxygen), nitrate recirculation, sludge recycle, SRT (solids residence time), influent COD/TN and HRT (hydraulic retention time) were studied. Results indicated that it was possible to increase nitrogen removal by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervision of nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.

  17. Condition monitoring on grinding wheel wear using wavelet analysis and decision tree C4.5 algorithm

    Directory of Open Access Journals (Sweden)

    S. Devendiran

    2013-10-01

    Full Text Available A new online grinding wheel wear monitoring approach for detecting a worn-out wheel is proposed. It is based on acoustic emission (AE) signals processed by the discrete wavelet transform, with statistical features such as root mean square and standard deviation extracted for each wavelet decomposition level and classified using the C4.5 decision tree, a tree-based knowledge representation and data mining technique. The methodology was validated with AE signal data obtained from an aluminium oxide 99 A(38A grinding wheel, a type used in three quarters of grinding operations, under different grinding conditions. The results of this scheme with respect to classification accuracy are discussed.

  18. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    Directory of Open Access Journals (Sweden)

    Hedayetul Islam Shovon

    2012-08-01

    Full Text Available Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) decisively. Student evaluation factors such as class quizzes, mid-term and final exams, assignments and lab work are studied. It is recommended that all this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers to reduce the drop-out ratio to a significant level and improve the performance of students. In this paper, we present a hybrid procedure based on the decision tree data mining method and data clustering that enables academicians to predict students' GPA, so that instructors can take the necessary steps to improve student academic performance.

  19. EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE

    Directory of Open Access Journals (Sweden)

    S. Anupama Kumar

    2011-07-01

    Full Text Available Educational data mining is used to study the data available in the educational field and bring out the hidden knowledge from it. Classification methods such as decision trees, rule mining and Bayesian networks can be applied to educational data to predict students' behavior, performance in examinations and so on. This prediction helps tutors to identify weak students and help them score better marks. The C4.5 decision tree algorithm is applied to students' internal assessment data to predict their performance in the final exam. The decision tree predicted the number of students who are likely to pass or fail. The result was given to the tutor, and steps were taken to improve the performance of the students who were predicted to fail. After the results of the final examination were declared, the marks obtained by the students were fed into the system and the results were analyzed. The comparative analysis indicates that the prediction helped the weaker students to improve and led to better results. To assess the accuracy of the algorithm, it was compared with the ID3 algorithm and found to be more efficient in terms of accurately predicting student outcomes and the time taken to derive the tree.

  20. Transient Stability Assessment using Decision Trees and Fuzzy Logic Techniques

    Directory of Open Access Journals (Sweden)

    A. Y. Abdelaziz

    2013-09-01

    Full Text Available Many techniques are used for transient stability assessment (TSA) of synchronous generators, encompassing traditional time-domain numerical integration, Lyapunov-based methods, probabilistic approaches and artificial intelligence (AI) techniques such as pattern recognition and artificial neural networks. This paper examines two further artificial intelligence techniques for tackling the transient stability problem. The first technique is based on the Inductive Inference Reasoning (IIR) approach, which belongs to a particular family of machine learning from examples. The second presents a simple fuzzy logic classifier system for TSA. Both steady-state and transient attributes are used for transient stability estimation so as to reflect machine dynamics and network changes due to faults. The two techniques are tested on a standard test power system. The performance evaluation demonstrated satisfactory results in early detection of machine instability. The advantage of the two techniques is that they are straightforward and simple to implement on-line.

  1. Intrusion Preventing System using Intrusion Detection System Decision Tree Data Mining

    Directory of Open Access Journals (Sweden)

    Syurahbil

    2009-01-01

    Full Text Available Problem statement: Distinguishing intrusive from normal network traffic is very difficult and time-consuming. An analyst must review large volumes of data to find the sequence of an intrusion on a network connection. A method is therefore needed that can detect network intrusions and reflect current network traffic. Approach: In this study, a novel method for finding intrusion characteristics for an intrusion detection system (IDS) using decision tree machine learning, a data mining technique, was proposed. Rules were generated by classification with the ID3 decision tree algorithm. Results: These rules can determine intrusion characteristics, which are then implemented in the firewall policy rules as prevention. Conclusion: The combination of an IDS and a firewall is called an intrusion prevention system (IPS); besides detecting the existence of an intrusion, it can also act by denying the intrusion as prevention.
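
    The abstract names ID3 but gives no implementation detail. As a rough sketch of how ID3 chooses the attribute to split on, the following computes information gain on a toy table. The connection records, the attribute names `protocol` and `flag`, and the labels are invented for illustration and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical connection records (illustrative only).
records = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
]
classes = ["intrusion", "normal", "normal", "intrusion"]

chosen = best_attribute(records, classes, ["protocol", "flag"])
print(chosen)  # flag
```

    In this toy table the `flag` attribute separates the two classes perfectly, so a rule such as "flag = S0 implies intrusion" falls out of the first split; a rule of that shape is the kind of thing that could then be turned into a firewall policy entry.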

  2. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  3. Three-dimensional object recognition using similar triangles and decision trees

    Science.gov (United States)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.

  4. A Noise Addition Scheme in Decision Tree for Privacy Preserving Data Mining

    CERN Document Server

    Kadampur, Mohammad Ali

    2010-01-01

    Data mining deals with automatic extraction of previously unknown patterns from large amounts of data. Organizations all over the world handle large amounts of data and are dependent on mining gigantic data sets for expansion of their enterprises. These data sets typically contain sensitive individual information, which consequently get exposed to the other parties. Though we cannot deny the benefits of knowledge discovery that comes through data mining, we should also ensure that data privacy is maintained in the event of data mining. Privacy preserving data mining is a specialized activity in which the data privacy is ensured during data mining. Data privacy is as important as the extracted knowledge and efforts that guarantee data privacy during data mining are encouraged. In this paper we propose a strategy that protects the data privacy during decision tree analysis of data mining process. We propose to add specific noise to the numeric attributes after exploring the decision tree of the original data. T...

  5. 'Misclassification error' greedy heuristic to construct decision trees for inconsistent decision tables

    KAUST Repository

    Azad, Mohammad

    2014-01-01

    This paper presents a greedy algorithm for constructing decision trees under three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. The algorithm uses a greedy heuristic, 'misclassification error', which runs faster than the 'number of boundary subtables' heuristic from the literature and, for some cost functions, gives better results. It can therefore be used on larger data sets and does not require a huge amount of memory. Experimental results for the depth, average depth and number of nodes of decision trees constructed by this algorithm are compared within the framework of each of the three approaches.
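
    The abstract does not define the heuristic, so the following is only a sketch under stated assumptions: each row of the table carries a set of admissible decisions, the misclassification error of a (sub)table is taken to be the number of rows not covered by its most common decision, and the greedy step picks the attribute whose split minimizes the summed error of the resulting subtables. The function names and the toy table are hypothetical.

```python
from collections import Counter

# A row is (attribute_values, set_of_decisions); rows with more than one
# decision are what makes the table "inconsistent" in this sketch.


def misclassification_error(rows):
    """Rows NOT covered by the most common decision (assumed definition)."""
    if not rows:
        return 0
    counts = Counter(d for _, decisions in rows for d in decisions)
    return len(rows) - max(counts.values())


def split(rows, attr):
    """Partition the rows by the value of one attribute."""
    parts = {}
    for row in rows:
        parts.setdefault(row[0][attr], []).append(row)
    return parts


def greedy_attribute(rows, attrs):
    """Pick the attribute whose subtables have the smallest total error."""
    return min(
        attrs,
        key=lambda a: sum(misclassification_error(p)
                          for p in split(rows, a).values()),
    )


table = [
    ((0, 0), {1}),
    ((0, 1), {1, 2}),
    ((1, 0), {2}),
    ((1, 1), {2, 3}),
]

print(misclassification_error(table))   # 1
print(greedy_attribute(table, [0, 1]))  # 0
```

    Because the error of a subtable only needs a count of decisions, each candidate split is cheap to score, which is consistent with the abstract's claim about speed and memory.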

  6. Assessment of Landslide Susceptibility by Decision Trees in the Metropolitan Area of Istanbul, Turkey

    Directory of Open Access Journals (Sweden)

    H. A. Nefeslioglu

    2010-01-01

    Full Text Available The main purpose of the present study is to investigate the possible application of decision trees in landslide susceptibility assessment. The study area, which has a surface area of 174.8 km2, is located on the northern coast of the Sea of Marmara and in the western part of the Istanbul metropolitan area. When applying data mining and extracting the decision tree, geological formations, altitude, slope, plan curvature, profile curvature, heat load and stream power index are taken into consideration as landslide conditioning factors. Using the predicted values, the landslide susceptibility map of the study area is produced. The AUC value of the produced landslide susceptibility map is 89.6%. According to the results of the AUC evaluation, the produced map performs well enough.
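
    The AUC reported above can be read as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell. A minimal sketch of that pairwise definition follows. The scores and labels are made-up values, not data from the study, and the abstract does not describe how the study itself computed AUC.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Illustrative susceptibility scores; 1 = landslide cell, 0 = stable cell.
example_scores = [0.9, 0.4, 0.7, 0.6, 0.2]
example_labels = [1, 1, 0, 1, 0]

example_auc = auc(example_scores, example_labels)
print(round(example_auc, 3))  # 0.667
```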

  7. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    Directory of Open Access Journals (Sweden)

    Yang Youjin

    2016-01-01

    Full Text Available Zika virus is spread by mosquitoes and is associated with a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. The Apriori algorithm and decision trees were therefore used to compare the polyprotein sequences of Zika virus with those of other flaviviruses: yellow fever virus, West Nile virus, dengue virus and tick-borne encephalitis virus. In this way, dissimilarities and similarities among them were found.

  8. Development of a decision tree to classify the most accurate tissue-specific tissue to plasma partition coefficient algorithm for a given compound.

    Science.gov (United States)

    Yun, Yejin Esther; Cotton, Cecilia A; Edginton, Andrea N

    2014-02-01

    Physiologically based pharmacokinetic (PBPK) modeling is a tool used in drug discovery and human health risk assessment. PBPK models are mathematical representations of the anatomy, physiology and biochemistry of an organism and are used to predict a drug's pharmacokinetics in various situations. Tissue to plasma partition coefficients (Kp), key PBPK model parameters, define the steady-state concentration differential between tissue and plasma and are used to predict the volume of distribution. The experimental determination of these parameters once limited the development of PBPK models; however, in silico prediction methods were introduced to overcome this issue. The developed algorithms vary in input parameters and prediction accuracy, and none are considered standard, warranting further research. In this study, a novel decision-tree-based Kp prediction method was developed using six previously published algorithms. The aim of the developed classifier was to identify the most accurate tissue-specific Kp prediction algorithm for a new drug. A dataset consisting of 122 drugs was used to train the classifier and identify the most accurate Kp prediction algorithm for a certain physicochemical space. Three versions of tissue-specific classifiers were developed and were dependent on the necessary inputs. The use of the classifier resulted in a better prediction accuracy than that of any single Kp prediction algorithm for all tissues, the current mode of use in PBPK model building. Because built-in estimation equations for those input parameters are not necessarily available, this Kp prediction tool will provide Kp prediction when only limited input parameters are available. The presented innovative method will improve tissue distribution prediction accuracy, thus enhancing the confidence in PBPK modeling outputs.

  9. A gradient-descent-based approach for transparent linguistic interface generation in fuzzy models.

    Science.gov (United States)

    Chen, Long; Chen, C L Philip; Pedrycz, Witold

    2010-10-01

    Linguistic interface is a group of linguistic terms or fuzzy descriptions that describe variables in a system utilizing corresponding membership functions. Its transparency completely or partly decides the interpretability of fuzzy models. This paper proposes a GRadiEnt-descEnt-based Transparent lInguistic iNterface Generation (GREETING) approach to overcome the disadvantage of traditional linguistic interface generation methods where the consideration of the interpretability aspects of linguistic interface is limited. In GREETING, the widely used interpretability criteria of linguistic interface are considered and optimized. The numeric experiments on the data sets from University of California, Irvine (UCI) machine learning databases demonstrate the feasibility and superiority of the proposed GREETING method. The GREETING method is also applied to fuzzy decision tree generation. It is shown that GREETING generates better transparent fuzzy decision trees in terms of better classification rates and comparable tree sizes.

  10. Decision Trees in the Analysis of the Intensity of Damage to Portal Frame Buildings in Mining Areas

    Science.gov (United States)

    Firek, Karol; Rusek, Janusz; Wodyński, Aleksander

    2015-09-01

    The article presents a preliminary analysis of a database on the technical condition of 94 portal frame buildings located in the mining area of the Legnica-Głogów Copper District (LGOM), using the methodology of decision trees. The analysis was divided into two stages. The first included creating a decision tree by the standard CART (Classification and Regression Tree) method and determining the importance of individual damage indices for the technical wear of buildings. The second verified the created decision tree and the importance of these indices by simulating individual tree models using the random forest method. The results confirmed the usefulness of decision trees in the early stage of data analysis. This methodology makes it possible to build an initial model that describes the interaction between variables and to infer the importance of individual input variables. The study also made it possible to determine the shares of the specified categories of damage to building elements in the buildings' degree of technical wear.

  11. Performance Evaluation of Discriminant Analysis and Decision Tree, for Weed Classification of Potato Fields

    Directory of Open Access Journals (Sweden)

    Farshad Vesali

    2012-09-01

    Full Text Available In the present study we tried to recognize weeds in potato fields so that herbicides can be used effectively. Potato is cultivated widely all over the world and is a major food crop consumed by over one billion people, but it is threatened by weed invasion because of the row cropping system used in potato cultivation. Machine vision is used in this research for effective application of herbicides in the field. About 300 color images were acquired from 3 potato farms of Qorveh city and 2 farms of Urmia University, Iran. Images were acquired under different illumination conditions from morning to evening on sunny and cloudy days. Because plants overlap and shade one another under farm conditions, it is hard to use morphological parameters. In the method used for classifying weeds and potato plants, the primary color components of each plant were extracted and the relation between them was estimated in order to determine a discriminant function and classify the plants using discriminant analysis. In addition, the decision tree method was used so that its results could be compared with those of discriminant analysis. Three different classifications were applied. In the first, potato plants were discriminated from all weeds taken together (two groups); the rate of correct classification was 76.67% for discriminant analysis and 83.82% for the decision tree. In the second, potato plants were discriminated from separate groups of each weed (6 groups); the rate of correct classification was 87%. In the third, potato plants were classified against each weed species one by one; because the weeds differed, the classification results differed between these pairings. The decision tree gave better results than discriminant analysis in all conditions.

  12. Decision Tree Complexity of Graph Properties with Dimension at Most 5

    Institute of Scientific and Technical Information of China (English)

    高随祥; 林国辉

    2000-01-01

    A graph property is a set of graphs such that if the set contains some graph G then it also contains each isomorphic copy of G (with the same vertex set). A graph property P on n vertices is said to be elusive if every decision tree algorithm recognizing P must examine all n(n - 1)/2 pairs of vertices in the worst case. Karp conjectured that every nontrivial monotone graph property is elusive. In this paper, this conjecture is proved for some cases. In particular, it is shown that if the abstract simplicial complex of a nontrivial monotone graph property P has dimension not exceeding 5, then P is elusive.
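
    For very small n, the definition of elusiveness can be checked by brute force. The sketch below is an illustration only and is unrelated to the proof technique of the paper: it computes the worst-case number of vertex-pair queries that an optimal decision tree needs, by a minimax search over partially revealed graphs, and uses connectivity as an example of a nontrivial monotone property. The function names are hypothetical, and the search is exponential, so it is practical only for n up to about 4.

```python
from functools import lru_cache
from itertools import combinations, product


def is_connected(n, edge_list):
    """True if the graph on vertices 0..n-1 with these edges is connected."""
    adjacent = {v: set() for v in range(n)}
    for u, v in edge_list:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def decision_tree_complexity(n, has_property):
    """Worst-case number of pair queries of an optimal decision tree."""
    pairs = list(combinations(range(n), 2))

    def is_decided(state):
        # The answer is decided if every completion of the unknown pairs
        # gives the same truth value for the property.
        unknown = [i for i, v in enumerate(state) if v is None]
        outcomes = set()
        for bits in product((0, 1), repeat=len(unknown)):
            full = list(state)
            for i, b in zip(unknown, bits):
                full[i] = b
            present = [pairs[i] for i, v in enumerate(full) if v]
            outcomes.add(has_property(n, present))
            if len(outcomes) == 2:
                return False
        return True

    @lru_cache(maxsize=None)
    def depth(state):
        if is_decided(state):
            return 0
        options = []
        for i, v in enumerate(state):
            if v is None:
                # The algorithm picks the pair; the adversary picks the answer.
                children = [state[:i] + (b,) + state[i + 1:] for b in (0, 1)]
                options.append(1 + max(depth(c) for c in children))
        return min(options)

    return depth((None,) * len(pairs))


for n in (3, 4):
    print(n, decision_tree_complexity(n, is_connected), n * (n - 1) // 2)
```

    A property is elusive on n vertices exactly when the computed value equals n(n - 1)/2, which is the case for connectivity on 3 and 4 vertices.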

  13. Use of decision trees for evaluating severe accident management strategies in nuclear power plants

    Energy Technology Data Exchange (ETDEWEB)

    Jae, Moosung [Hanyang Univ., Seoul (Korea, Republic of). Dept. of Nuclear Engineering; Lee, Yongjin; Jerng, Dong Wook [Chung-Ang Univ., Seoul (Korea, Republic of). School of Energy Systems Engineering

    2016-07-15

    Accident management strategies are defined as innovative actions taken by plant operators to prevent core damage or to maintain sound containment integrity. Such actions minimize the chance of offsite radioactive substance leaks that lead to and intensify core damage under power plant accident conditions. Accident management extends the concept of Defense in Depth against core meltdown accidents. In pressurized water reactors, emergency operating procedures are performed to extend the core cooling time. The effectiveness of Severe Accident Management Guidance (SAMG) has become an important issue. Severe accident management strategies are evaluated with a methodology utilizing the decision tree technique.

  14. The use of decision tree induction and artificial neural networks for recognizing the geochemical distribution patterns of LREE in the Choghart deposit, Central Iran

    Science.gov (United States)

    Zaremotlagh, S.; Hezarkhani, A.

    2017-04-01

    Some evidence of rare earth element (REE) concentrations is found in iron oxide-apatite (IOA) deposits located in the Central Iranian microcontinent. There are many unsolved problems about the origin and metallogenesis of IOA deposits in this district. Although felsic magmatism and mineralization are considered to have been simultaneous in the district, interaction of multi-stage hydrothermal-magmatic processes within the Early Cambrian volcano-sedimentary sequence probably caused some epigenetic mineralization. Secondary geological processes (e.g., multi-stage mineralization, alteration, and weathering) have affected the variations of major elements and the possible redistribution of REE in IOA deposits. Hence, the geochemical behavior and distribution patterns of REE are expected to be complicated in different zones of these deposits. The aim of this paper is to recognize LREE distribution patterns based on whole-rock chemical compositions and to discover their geochemical rules automatically. For this purpose, pattern recognition techniques including decision trees and neural networks were applied to a high-dimensional geochemical dataset from the Choghart IOA deposit. Because some data features were irrelevant or redundant in recognizing the distribution patterns of each LREE, a greedy attribute subset selection technique was employed to select the best subset of predictors used in the classification tasks. The decision trees (CART algorithm) were pruned optimally so that they categorize independent test data more accurately than unpruned ones. The most effective classification rules were extracted from the pruned tree to describe the meaningful relationships between the predictors and different concentrations of LREE. A feed-forward artificial neural network was also applied to reliably predict the influence of various rock compositions on the spatial distribution patterns of LREE, with better performance than the decision tree induction. The findings of this study could be

  15. Ontology-Based Classification System Development Methodology

    OpenAIRE

    2015-01-01

    The aim of the article is to analyse and develop an ontology-based classification system methodology that uses decision tree learning with statement propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with taxonomy and propositionalized attributes have been observed. Thus, domain ontology can be extracted from the data sets and can be used for data classification with the help of a decision tree. The use of ontology methods in decision ...

  16. Effective use of FibroTest to generate decision trees in hepatitis C

    Institute of Scientific and Technical Information of China (English)

    Dana Lau-Corona; Luís Alberto Pineda; Héctor Hugo Aviés; Gabriela Gutiérrez-Reyes; Blanca Eugenia Farfan-Labonne; Rafael Núnez-Nateras; Alan Bonder; Rosalinda Martínez-García; Clara Corona-Lau; Marco Antonio Olivera-Martíanez; Maria Concepción Gutiérrez-Ruiz; Guillermo Robles-Díaz; David Kershenobich

    2009-01-01

    AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest's cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4) and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees and was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.
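
    The 10-fold cross validation mentioned in the methods can be sketched as follows. This is a generic illustration, not the authors' code: the function name, the shuffling and the seed are assumptions. Each sample lands in exactly one test fold, and a tree would be trained on the other nine folds and scored on the held-out one, which is consistent with the ten classification trees mentioned in the results.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 261 is the number of patients reported in the abstract.
splits = list(k_fold_indices(261, 10))
print([len(test) for _, test in splits])
```

    With 261 samples and 10 folds, one fold holds 27 samples and the other nine hold 26 each.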

  17. Comparison of Attribute Reduction Methods for Coronary Heart Disease Data by Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    ZHENG Gang; HUANG Yalou; WANG Pengtao; SHU Guangfu

    2005-01-01

    Attribute reduction is necessary in decision-making systems, and selecting the right attribute reduction method is even more important. This paper studies the reduction effects of principal components analysis (PCA) and system reconstruction analysis (SRA) on coronary heart disease data. The data set contains 1723 records, with 71 attributes in each record. PCA and SRA are used to reduce the number of attributes (to fewer than 71) in the data set. Then the decision tree algorithms C4.5, classification and regression tree (CART), and chi-square automatic interaction detector (CHAID) are adopted to analyze the raw data and the attribute-reduced data. The parameters of the decision tree algorithms, including internal node number, maximum tree depth, number of leaves, and correction rate, are analyzed. The results indicate that PCA and SRA can both accomplish attribute reduction, and that decision making on the reduced data is quicker than on the raw data; the reduction effect of PCA is better than that of SRA, while the attribute assertion of SRA is better than that of PCA. The PCA and SRA methods exhibit good performance in selecting and reducing attributes.

  18. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining that determines the attitude of people towards a particular product, topic or politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions can be in different languages (English, Urdu, Arabic, etc.). Tackling each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions as labeled examples. A testing data set is supplied to the three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.
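
    The four evaluation measures named above follow directly from the confusion counts of a binary classifier. The sketch below shows the standard definitions; the function name and the six labels are hypothetical toy values, not results from the paper.

```python
def binary_metrics(actual, predicted, positive="pos"):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical labels for six opinions.
actual = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "neg", "pos"]
print(binary_metrics(actual, predicted))
```

    Comparing classifiers on all four measures, as the study does, guards against a model that scores well on accuracy alone while missing most examples of one class.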

  19. Using decision trees to predict benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea.

    Science.gov (United States)

    Pesch, Roland; Pehlke, Hendrik; Jerosch, Kerstin; Schröder, Winfried; Schlüter, Michael

    2008-01-01

    This article describes a concept for predicting and mapping the occurrence of benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea. The approach consists of two work steps: (1) geostatistical analysis of abiotic measurement data and (2) calculation of benthic provinces by means of Classification and Regression Trees (CART) and GIS techniques. Raster maps were calculated with geostatistical methods from bottom water measurements of salinity, temperature, silicate and nutrients as well as from point data on grain size ranges (0-20, 20-63, 63-2,000 µm). First the autocorrelation structure was examined and modelled with the help of variogram analysis. The resulting variogram models were then used to calculate raster maps by applying ordinary kriging procedures. After these raster maps had been intersected with point data on eight benthic communities, a decision tree was derived to predict the occurrence of these communities within the study area. Since such a CART tree corresponds to a hierarchically ordered set of decision rules, it was applied to the geostatistically estimated raster data to predict benthic habitats within and near the EEZ.
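
    The variogram analysis in the first work step starts from an empirical semivariogram, to which a variogram model is then fitted. The sketch below uses the classical estimator, half the mean squared difference of all point pairs whose separation falls into a lag bin. It is an illustration under assumptions: the function name, the binning scheme and the four sample values are hypothetical and not taken from the study.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return [(mean_distance, gamma)] for every non-empty lag bin, where
    gamma = sum((z_i - z_j)^2) / (2 * number_of_pairs) within the bin."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (p, zp), (q, zq) in combinations(list(zip(points, values)), 2):
        h = math.dist(p, q)
        k = int(h // lag_width)  # bin k covers [k * lag_width, (k + 1) * lag_width)
        if k < n_lags:
            sq_sums[k] += (zp - zq) ** 2
            dist_sums[k] += h
            counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]))
            for k in range(n_lags) if counts[k]]


# Hypothetical measurements at four stations along a transect.
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
salinity = [1.0, 2.0, 4.0, 7.0]
for distance, gamma in empirical_semivariogram(stations, salinity, 1.0, 4):
    print(distance, gamma)
```

    A semivariance that grows with distance, as in this toy transect, indicates spatial autocorrelation, which is what justifies interpolating the point data by kriging.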

  20. Rejecting Non-MIP-Like Tracks using Boosted Decision Trees with the T2K Pi-Zero Subdetector

    Science.gov (United States)

    Hogan, Matthew; Schwehr, Jacklyn; Cherdack, Daniel; Wilson, Robert; T2K Collaboration

    2016-03-01

    Tokai-to-Kamioka (T2K) is a long-baseline neutrino experiment with a narrow band energy spectrum peaked at 600 MeV. The Pi-Zero detector (PØD) is a plastic scintillator-based detector located in the off-axis near detector complex 280 meters from the beam origin. It is designed to constrain neutral-current induced π0 production background at the far detector using the water target which is interleaved between scintillator layers. A PØD-based measurement of charged-current (CC) single charged pion (1π+) production on water is being developed which will have expanded phase space coverage as compared to the previous analysis. The signal channel for this analysis, which for T2K is dominated by Δ production, is defined as events that produce a single muon, single charged pion, and any number of nucleons in the final state. The analysis will employ machine learning algorithms to enhance CC1π+ selection by studying topological observables that characterize signal well. Important observables for this analysis are those that discriminate a minimum ionizing particle (MIP) like a muon or pion from a proton at the T2K energies. This work describes the development of a discriminator using Boosted Decision Trees to reject non-MIP-like PØD tracks.

  1. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    Science.gov (United States)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soils are experiencing, especially in semi-arid Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50,000 soil map covering the area under the direct control of the Republic of Cyprus (5,760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data coming from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors the usual variables were used: temperature and aridity index for climate; total loss on ignition, vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound location related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique in which many trees, instead of a single one, are developed and compared to increase the stability and reliability of the prediction. The model is trained and verified on areas where a published 1:25,000 soil map obtained from field work is available and is then applied for predictive mapping to the other areas. Preliminary results obtained in a small area in the plain around the city of Lefkosia, where eight different soil classes are present, show that the method performs very well. The Random Forest approach leads to reproduce soil
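
    The idea of growing many trees and combining them can be sketched in a few lines. The example below is a deliberate simplification and not the R package used in the study: it bags one-split trees (stumps) on a single numeric predictor and lets them vote, whereas Random Forests also draws a random subset of predictors at each split. The function names, the predictor values and the two class labels are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit the single-threshold classifier with the fewest training errors."""
    majority = Counter(ys).most_common(1)[0][0]
    # Start from the no-split rule that always predicts the majority class.
    best = (sum(y != majority for y in ys), None, majority, majority)
    levels = sorted(set(xs))
    for lo, hi in zip(levels, levels[1:]):
        thr = (lo + hi) / 2
        left = Counter(y for x, y in zip(xs, ys) if x <= thr).most_common(1)[0][0]
        right = Counter(y for x, y in zip(xs, ys) if x > thr).most_common(1)[0][0]
        errors = sum((left if x <= thr else right) != y for x, y in zip(xs, ys))
        if errors < best[0]:
            best = (errors, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if thr is None or x <= thr else right


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        stumps.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]


# Hypothetical training cells: one predictor value and a soil class each.
predictor = [1, 2, 3, 4, 5, 6, 7, 8]
soil_class = ["A", "A", "A", "A", "B", "B", "B", "B"]
predict = fit_bagged_stumps(predictor, soil_class)
print(predict(0), predict(9))
```

    Because every tree sees a different bootstrap sample, individual thresholds vary, and the vote averages out that variation. This is the source of the stability the abstract refers to.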

  2. Research and Application of Combination Between Decision Tree and Data Warehouse

    Institute of Scientific and Technical Information of China (English)

    沈学利; 钟华

    2011-01-01

    Decision trees and data warehouses have much in common in the field of data mining, and this paper combines the two. To provide full decision support, the combination covers both representation and operations based on Online Analytical Processing (OLAP). It overcomes problems caused by the growth of a decision tree, namely high storage occupation, low query speed and a high error rate in the classification decision information provided. Applied to the client information database of a travel agency, the combination proved feasible and advantageous.

  3. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
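
    The three ways of collapsing a group of equal rows can be illustrated on a small table. The sketch below is hypothetical throughout (function name, toy table and the numbering of the codes are assumptions) and covers only the preprocessing step, not the greedy tree construction studied in the paper.

```python
from collections import Counter, defaultdict


def resolve_inconsistency(rows):
    """Collapse rows with equal conditional attributes in three ways.

    Each row is (condition_1, ..., condition_m, decision). Returns three
    dicts keyed by the tuple of conditional attribute values."""
    groups = defaultdict(list)
    for *conditions, decision in rows:
        groups[tuple(conditions)].append(decision)

    # Generalized decision: give every distinct set of decisions its own code.
    decision_sets = sorted({frozenset(d) for d in groups.values()}, key=sorted)
    codes = {s: i + 1 for i, s in enumerate(decision_sets)}

    many_valued, most_common, generalized = {}, {}, {}
    for conditions, decisions in groups.items():
        many_valued[conditions] = set(decisions)
        most_common[conditions] = Counter(decisions).most_common(1)[0][0]
        generalized[conditions] = codes[frozenset(decisions)]
    return many_valued, most_common, generalized


# Hypothetical table: two conditional attributes, then the decision.
table = [
    (0, 0, 1), (0, 0, 1), (0, 0, 2),
    (0, 1, 3),
    (1, 0, 1), (1, 0, 2), (1, 0, 2),
    (1, 1, 3),
]
many_valued, most_common, generalized = resolve_inconsistency(table)
print(many_valued[(0, 0)], most_common[(0, 0)], generalized[(0, 0)])
```

    In this toy table the groups (0, 0) and (1, 0) carry the same set of decisions, so they share one generalized code while their most common decisions differ.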

  4. Decision tree method applied to computerized prediction of ternary intermetallic compounds

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    The decision tree method and atomic parameters were used to find the regularities of the formation of ternary intermetallic compounds in alloy systems. The criteria of formation can be expressed by a group of inequalities with two kinds of atomic parameters, Zl (number of valence electrons in the atom of a constituent element) and Ri/Rj (ratio of the atomic radii of constituent elements i and j), as independent variables. The data of 2238 known ternary alloy systems were used to extract the empirical rules governing the formation of ternary intermetallic compounds, and the data on ternary compound formation in a further 1334 alloy systems were used as samples to test the reliability of the empirical criteria found. The rate of correct prediction was found to be nearly 95%. An expert system for ternary intermetallic compound formation was built, and some prediction results of the expert system were confirmed.

  5. Multi-output decision trees for lesion segmentation in multiple sclerosis

    Science.gov (United States)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects, obtaining a true positive rate of 0.41 and a positive predictive value of 0.36.

  6. Comparison of CIV, SIV and AIV using Decision Tree and SVM

    Directory of Open Access Journals (Sweden)

    Park Hyorin

    2016-01-01

    Full Text Available H3N2, the canine influenza virus, has numerous types of animal hosts on which it can live and reproduce, mostly pigs and birds. However, concern is rising that there is a high possibility that humans could become an additional victim of the canine flu. Consequently, our project group expects that information about the DNA of H3N2 is valuable, since it could contribute to the development of vaccines and medicines. In the experiments, the properties of CIV (Canine Influenza Virus) were analysed in comparison with SIV (Swine Influenza Virus) and AIV (Avian Influenza Virus) using the decision tree and SVM (Support Vector Machine). The results showed that CIV, SIV and AIV are alike but also differ in some aspects.

  7. Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?

    Directory of Open Access Journals (Sweden)

    Robert Haworth

    2010-01-01

    Full Text Available The popularity and usage of digital games have increased in recent years, bringing further attention to their design. Some digital games require significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.

  8. Simulation of human behavior elements in a virtual world using decision trees

    Directory of Open Access Journals (Sweden)

    Sandra Mercado Pérez

    2013-05-01

    Full Text Available Human behavior refers to the way an individual responds to certain events or occurrences. How an individual will act cannot naturally be predicted, so computer simulation is used. This paper presents the development of the simulation of five possible human reactions within a virtual world, as well as the steps needed to create a decision tree that supports the selection of any of these reactions. For its creation, three types of attributes are proposed: the personality, the environment and the level of reaction. The virtual world Second Life was selected because of its internal programming language LSL (Linden Scripting Language), which allows predefined animation sequences to be executed or custom ones to be created.

  9. Using Decision Trees in Data Mining for Predicting Factors Influencing of Heart Disease

    Directory of Open Access Journals (Sweden)

    Moloud Abdar

    2015-12-01

    Full Text Available Statistics from the World Health Organization (WHO) show that heart disease is one of the leading causes of mortality all over the world. Because of the importance of heart disease, many studies have been conducted on this disease in recent years using data mining. The main objective of this study is to find a better decision tree algorithm and then use the algorithm to extract rules for predicting heart disease. Cleveland data, comprising 303 records, are used for this study. These data include 13 features, and we have categorized them into five classes. In this paper, the C5.0 algorithm, with an accuracy of 85.33%, performs better than the rest of the algorithms used in this study. Considering the rules created by this algorithm, the attributes Trestbps, Restecg, Thalach, Slope, Oldpeak, and CP were extracted as the most influential factors in predicting heart disease.

  10. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variables of emission and 15 variables of transmission information from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and radiological residents. Diagnostic rules were demonstrated in tree topology, and diagnostic performances were compared using the Area under Curve (AUC) of the Receiver Operating Characteristic Curve (ROC). Results A classification decision tree with the lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the value of the Target/Normal region of 99mTc-MIBI uptake in the delayed stage and in the early stage, age, cough and specula sign were the five most important contributors. The sensitivity and specificity were 93.33% and 78.57%, respectively, a little higher than those of the expert. The sensitivity and specificity of Grade one residents were 76.67% and 28.57%, respectively. The AUC of CART and of the expert was 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204), but greater than that of residents (P<0.001). Conclusion Our data mining technique using a classification decision tree has a much higher accuracy than residents. It suggests that the application of this algorithm will significantly improve the diagnostic performance of residents.

  11. Ontology-Based Classification System Development Methodology

    Directory of Open Access Journals (Sweden)

    Grabusts Peter

    2015-12-01

    Full Text Available The aim of the article is to analyse and develop an ontology-based classification system methodology that uses decision tree learning with statement propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with taxonomy and propositionalized attributes have been observed. Thus, domain ontology can be extracted from the data sets and can be used for data classification with the help of a decision tree. The use of ontology methods in decision tree-based classification systems has been researched. Using such methodologies, the classification accuracy in some cases can be improved.

  12. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree

    DEFF Research Database (Denmark)

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva;

    2004-01-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study...

  13. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    Science.gov (United States)

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  14. Inductive Decision Tree Analysis of the Validity Rank of Construction Parameters of Innovative Gear Pump after Tooth Root Undercutting

    Directory of Open Access Journals (Sweden)

    Deptuła A.

    2017-02-01

    Full Text Available The article presents an innovative use of an inductive algorithm for generating a decision tree to analyse the validity rank of the construction and maintenance parameters of a gear pump with undercut teeth. It presents an alternative to existing discrete optimization methods for generating sets of decisions and determining the hierarchy of decision variables.

  15. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

    Full Text Available Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the whole of MG and another that included classification errors. The resulting map was 62.9% accurate.

  16. Energy spectra unfolding of fast neutron sources using the group method of data handling and decision tree algorithms

    Science.gov (United States)

    Hosseini, Seyed Abolfazl; Afrakoti, Iman Esmaili Paeen

    2017-04-01

    Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using the developed computational codes based on the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator has been simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The developed computational codes based on the GMDH and DT algorithms use some data for training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the simulated neutron pulse height distributions by MCNPX-ESUT for each energy spectrum are used as the output and input data. Since there is no need to solve the inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The 241Am-9Be and 252Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for the used fast neutron sources have an excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than those obtained from the DT. The results obtained in the present study have good accuracy in comparison with the previously published paper based on the logsig and tansig transfer functions.

  17. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Monica [Brown Univ., Providence, RI (United States)

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to different theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak-produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb$^{-1}$ of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism, σ(p$\bar{p}$ → tb + X, tqb + X) = 4.30$^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak-produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of W' using the same samples from the electroweak-produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the

  18. Comparisons between physics-based, engineering, and statistical learning models for outdoor sound propagation.

    Science.gov (United States)

    Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T

    2016-05-01

    Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.

  19. The Use of Decision Tree Flowchart in Stomatology Education

    Institute of Scientific and Technical Information of China (English)

    周敏; 刘宏伟; 何园

    2013-01-01

    Objective: To investigate the feasibility of applying the decision tree flowchart model to clinical teaching in stomatology. Methods: From August 2010 to December 2012, the decision tree flowchart method was used in clinical theory teaching for 20 residents rotating through the periodontics department. A clinical problem was first selected as the target. The students were then asked to list the classifications and the different possible clinical conditions of the problem, and the indications and contraindications of each treatment method. Finally, a decision tree flowchart was built from the completed lists. Results: This teaching mode gave full play to the initiative and enthusiasm of the students, helped them classify and summarize their knowledge, and developed their divergent and logical thinking. It was well received by most students. Conclusion: Teaching with decision tree flowcharts makes clinical teaching in dentistry more active and effective.

  20. DECISION TREE CONSTRUCTION AND COST-EFFECTIVENESS ANALYSIS OF TREATMENT OF ULCERATIVE COLITIS WITH PENTASA® MESALAZINE 2 G SACHET

    Directory of Open Access Journals (Sweden)

    Alvaro Mitsunori NISHIKAWA

    2013-12-01

    Full Text Available Context: Unspecified ulcerative rectocolitis is a chronic disease that affects between 0.5 and 24.5 per 10⁵ inhabitants worldwide. National and international clinical guidelines recommend the use of aminosalicylates (including mesalazine) as first-line therapy for induction of remission of unspecified ulcerative rectocolitis, and recommend the maintenance of these agents after remission is achieved. However, the multiple daily doses required for the maintenance of disease remission compromise compliance with treatment, which is very low (between 45% and 65%). Use of mesalazine in granules (2 g sachet once daily, Pentasa® sachets 2 g) can enhance treatment adherence, leading to an improvement in patients' outcomes. Objective: To evaluate the evidence on the use of mesalazine for the maintenance of remission in patients with unspecified ulcerative rectocolitis and its effectiveness when taken once versus more than once a day. From an economic standpoint, to analyze the impact of the adoption of this dosage in Brazil's public health system, considering patients' adherence to treatment. Methods: A decision tree was developed based on the Clinical Protocol and Therapeutic Guidelines for Ulcerative Colitis, published by the Ministry of Health in Ordinance (Portaria) SAS/MS n° 861 of November 4th, 2002, and on the algorithms published by the Associação Brasileira de Colite Ulcerativa e Doença de Crohn, aiming to obtain the cost-effectiveness of mesalazine once daily in granules compared with mesalazine twice daily in tablets. Results: The use of mesalazine increases the chances of remission induction and maintenance when compared to placebo, and higher doses are associated with a greater chance of success without increasing the risk of adverse events. Conclusion: The use of a single daily dose in the maintenance of remission is effective and related to higher patient compliance when compared to multiple daily dose regimens, with lower costs.

  1. Effect of training characteristics on object classification: an application using Boosted Decision Trees

    CERN Document Server

    Sevilla-Noarbe, Ignacio

    2015-01-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs, using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects tha...

  2. A Fuzzy Optimization Technique for the Prediction of Coronary Heart Disease Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Persi Pamela. I

    2013-06-01

    Full Text Available Data mining along with soft computing techniques helps to unravel hidden relationships and diagnose diseases efficiently even with uncertainties and inaccuracies. Coronary Heart Disease (CHD) is a killer disease leading to heart attack and sudden death. Since the diagnosis involves vague symptoms and tedious procedures, it is usually time-consuming and false diagnoses may occur. A fuzzy system, one of the soft computing methodologies, is proposed in this paper along with a data mining technique for efficient diagnosis of coronary heart disease. Though the database has 76 attributes, only 14 attributes are found to be efficient for CHD diagnosis as per all the published experiments and doctors’ opinion. So only the essential attributes are taken from the heart disease database. From these attributes, crisp rules are obtained by employing the CART decision tree algorithm, and these rules are then applied to the fuzzy system. A Particle Swarm Optimization (PSO) technique is applied for the optimization of the fuzzy membership functions, where the parameters of the membership functions are altered to new positions. The result interpreted from the fuzzy system predicts the prevalence of coronary heart disease, and the system’s accuracy was found to be good.

  3. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    Science.gov (United States)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs, using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.

  4. Validation of probability equation and decision tree in predicting subsequent dengue hemorrhagic fever in adult dengue inpatients in Singapore.

    Science.gov (United States)

    Thein, Tun L; Leo, Yee-Sin; Lee, Vernon J; Sun, Yan; Lye, David C

    2011-11-01

    We developed a probability equation and a decision tree from 1,973 predominantly dengue serotype 1 hospitalized adult dengue patients in 2004 to predict progression to dengue hemorrhagic fever (DHF), applied in our clinic since March 2007. The parameters predicting DHF were clinical bleeding, high serum urea, low serum protein, and low lymphocyte proportion. This study validated these in a predominantly dengue serotype 2 cohort in 2007. The 1,017 adult dengue patients admitted to Tan Tock Seng Hospital, Singapore had a median age of 35 years. Of 933 patients without DHF on admission, 131 progressed to DHF. The probability equation predicted DHF with a sensitivity (Sn) of 94%, specificity (Sp) 17%, positive predictive value (PPV) 16%, and negative predictive value (NPV) 94%. The decision tree predicted DHF with a Sn of 99%, Sp 12%, PPV 16%, and NPV 99%. Both tools performed well despite a switch in predominant dengue serotypes.
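The reported predictive values can be cross-checked against the reported sensitivity and specificity with Bayes' rule. The following is a minimal sketch, assuming the DHF prevalence is 131/933 (the patients without DHF on admission who later progressed) and that Sn and Sp refer to that same group; `predictive_values` is an illustrative helper, not code from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed prevalence: 131 of 933 patients progressed to DHF.
PREVALENCE = 131 / 933

# Sensitivity and specificity as reported in the abstract.
TOOLS = {
    "probability equation": (0.94, 0.17),
    "decision tree": (0.99, 0.12),
}

for name, (sn, sp) in TOOLS.items():
    ppv, npv = predictive_values(sn, sp, PREVALENCE)
    print(f"{name}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under these assumptions the computed values land within one percentage point of the reported PPV and NPV. Changing `PREVALENCE` shows how strongly both predictive values depend on how common the disease is in the population being screened.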

  5. Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    CERN Document Server

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-01-01

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness an...

  6. Cost-effectiveness of exercise ²⁰¹Tl myocardial SPECT in patients with chest pain assessed by decision-tree analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kosuda, Shigeru; Momiyama, Yukihiko; Ohsuzu, Fumitaka; Kusano, Shoichi [National Defense Medical Coll., Tokorozawa, Saitama (Japan); Ichihara, Kiyoshi

    1999-09-01

    To evaluate the potential cost-effectiveness of exercise ²⁰¹Tl myocardial SPECT in outpatients with angina-like chest pain, we developed a decision-tree model comprising three groups of 1,000 patients each, i.e., a coronary arteriography (CAG) group, a follow-up group, and a SPECT group, and calculated total cost and cardiac events, including cardiac deaths. Variables used for the decision-tree analysis were obtained from references and the data available at our hospital. The sensitivity and specificity of ²⁰¹Tl SPECT for diagnosing angina pectoris, and its prevalence, were assumed to be 95%, 85%, and 33%, respectively. The mean costs were 84.9 x 10⁴ yen/patient in the CAG group, 30.2 x 10⁴ yen/patient in the follow-up group, and 71.0 x 10⁴ yen/patient in the SPECT group. The numbers of cardiac events and cardiac deaths were 56 and 15, respectively, in the CAG group, 264 and 81 in the follow-up group, and 65 and 17 in the SPECT group. SPECT increases cardiac events and cardiac deaths by 0.9% and 0.2%, but it reduces the number of CAG studies by 50.3% and saves 13.8 x 10⁴ yen/patient, as compared to the CAG group. In conclusion, the exercise ²⁰¹Tl myocardial SPECT strategy for patients with chest pain has the potential to reduce health care costs in Japan. (author)
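The comparison between strategies is simple arithmetic on the reported group outcomes. The sketch below only restates the figures given in the abstract; the dictionary layout and the `compare` helper are illustrative assumptions, and the underlying decision-tree model itself is not reproduced.

```python
# Reported outcomes per strategy. Cost is in units of 10^4 yen per patient;
# events and deaths are counts within each 1,000-patient group.
STRATEGIES = {
    "CAG": {"cost": 84.9, "events": 56, "deaths": 15},
    "follow-up": {"cost": 30.2, "events": 264, "deaths": 81},
    "SPECT": {"cost": 71.0, "events": 65, "deaths": 17},
}
GROUP_SIZE = 1000


def compare(strategy, reference="CAG"):
    """Return cost saved and extra events/deaths (in % of patients) vs. reference."""
    s, r = STRATEGIES[strategy], STRATEGIES[reference]
    return {
        "cost_saved": round(r["cost"] - s["cost"], 1),
        "extra_events_pct": 100.0 * (s["events"] - r["events"]) / GROUP_SIZE,
        "extra_deaths_pct": 100.0 * (s["deaths"] - r["deaths"]) / GROUP_SIZE,
    }


print(compare("SPECT"))
```

The event and death differences reproduce the reported 0.9% and 0.2%. The difference of the rounded mean costs is 13.9 rather than the reported 13.8, which presumably reflects rounding of the per-group costs in the abstract.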

  7. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Zhiyi [China Inst. of Atomic Energy (CIAE), Beijing (China)

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element Vtb and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb$^{-1}$ of Tevatron Run II data in p$\bar{p}$ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed in discriminating the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as 3.4$^{+2.0}_{-1.8}$ pb. The result of the single top quark production in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the electron, muon and tau combined analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations in electron and muon alone. The measured cross section in the three combined final states is σ(p$\bar{p}$ → tb + X, tqb + X) = 3.84$^{+0.89}_{-0.83}$ pb. A lower limit on |Vtb| is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  8. Application of decision tree and logistic regression on the health literacy prediction of hypertension patients

    Institute of Scientific and Technical Information of China (English)

    李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬

    2012-01-01

    Objective: To study and evaluate the feasibility and accuracy of decision tree methods and logistic regression for predicting the health literacy of hypertension patients. Methods: Two health literacy prediction models were built, one with logistic regression and one with decision tree methods (Answer Tree software). Receiver operating characteristic (ROC) curves were used to compare the two prediction models. Results: The sensitivity (82.5%) and Youden index (50.9%) of the logistic regression model were higher than those of the decision tree model (77.9%, 48.0%). The specificity of the decision tree model (70.1%) was higher than that of the logistic regression model (68.4%), and its error rate (29.9%) was lower than that of the logistic regression model (31.6%). The areas under the ROC curve of the two models were comparable (0.813 for the decision tree vs 0.847 for logistic regression). Conclusion: The decision tree prediction model performed similarly to the logistic regression prediction model. A health literacy screening strategy for hypertension patients can be derived from the decision tree model, implying that data mining methods are feasible for predicting health literacy in chronic disease patients and in the chronic disease management of community health services.
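The summary statistics in this abstract are related by simple formulas. The sketch below assumes the standard definition of the Youden index (sensitivity + specificity - 1) and treats the reported error rate as the false positive rate (1 - specificity), which is an inference from the reported numbers rather than something the abstract states; the function names are illustrative.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def false_positive_rate(specificity):
    """Share of true negatives that are misclassified as positive."""
    return 1.0 - specificity


# Sensitivity and specificity as reported in the abstract.
MODELS = {
    "logistic regression": (0.825, 0.684),
    "decision tree": (0.779, 0.701),
}

for name, (sn, sp) in MODELS.items():
    j = youden_index(sn, sp)
    fpr = false_positive_rate(sp)
    print(f"{name}: Youden index = {j:.1%}, 1 - specificity = {fpr:.1%}")
```

With the reported sensitivities and specificities, both the Youden indices and the error rates in the abstract are reproduced to one decimal place.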

  9. Agent Based Model of Livestock Movements

    Science.gov (United States)

    Miron, D. J.; Emelyanova, I. V.; Donald, G. E.; Garner, G. M.

    The modelling of livestock movements within Australia is of national importance for the purposes of the management and control of exotic disease spread, infrastructure development and the economic forecasting of livestock markets. In this paper an agent based model for the forecasting of livestock movements is presented. This models livestock movements from farm to farm through a saleyard. The decision of farmers to sell or buy cattle is often complex and involves many factors such as climate forecast, commodity prices, the type of farm enterprise, the number of animals available and associated off-shore effects. In this model the farm agent's intelligence is implemented using a fuzzy decision tree that utilises two of these factors. These two factors are the livestock price fetched at the last sale and the number of stock on the farm. On each iteration of the model farms choose either to buy, sell or abstain from the market thus creating an artificial supply and demand. The buyers and sellers then congregate at the saleyard where livestock are auctioned using a second price sealed bid. The price time series output by the model exhibits properties similar to those found in real livestock markets.
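The saleyard auction rule mentioned above can be sketched as follows. This is a minimal illustration of a second-price sealed-bid auction, in which the highest bidder wins but pays the second-highest bid; the function name, the farm labels, and the tie-breaking rule (the earliest submitted bid wins) are assumptions, not details from the paper.

```python
def second_price_auction(bids):
    """Run a second-price sealed-bid auction.

    bids maps a bidder name to a sealed bid. Returns (winner, price), where
    the winner is the highest bidder and the price is the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # sorted() is stable, so ties keep submission (insertion) order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical sealed bids from three buying farms for one lot of cattle.
sealed_bids = {"farm_a": 120.0, "farm_b": 150.0, "farm_c": 135.0}
print(second_price_auction(sealed_bids))
```

Because the price paid does not depend on the winner's own bid, bidding one's true valuation is a dominant strategy in this format, which keeps the bidding logic of each agent simple.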

  10. Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness.

    Directory of Open Access Journals (Sweden)

    Lukas Tanner

    Full Text Available BACKGROUND: Dengue is re-emerging throughout the tropical world, causing frequent recurrent epidemics. The initial clinical manifestation of dengue often is confused with other febrile states confounding both clinical management and disease surveillance. Evidence-based triage strategies that identify individuals likely to be in the early stages of dengue illness can direct patient stratification for clinical investigations, management, and virological surveillance. Here we report the identification of algorithms that differentiate dengue from other febrile illnesses in the primary care setting and predict severe disease in adults. METHODS AND FINDINGS: A total of 1,200 patients presenting in the first 72 hours of acute febrile illness were recruited and followed up for up to a 4-week period prospectively; 1,012 of these were recruited from Singapore and 188 from Vietnam. Of these, 364 were dengue RT-PCR positive; 173 had dengue fever, 171 had dengue hemorrhagic fever, and 20 had dengue shock syndrome as final diagnosis. Using a C4.5 decision tree classifier for analysis of all clinical, haematological, and virological data, we obtained a diagnostic algorithm that differentiates dengue from non-dengue febrile illness with an accuracy of 84.7%. The algorithm can be used differently in different disease prevalence to yield clinically useful positive and negative predictive values. Furthermore, an algorithm using platelet count, crossover threshold value of a real-time RT-PCR for dengue viral RNA, and presence of pre-existing anti-dengue IgG antibodies in sequential order identified cases with sensitivity and specificity of 78.2% and 80.2%, respectively, that eventually developed thrombocytopenia of 50,000 platelets/mm³ or less, a level previously shown to be associated with haemorrhage and shock in adults with dengue fever.
CONCLUSION: This study shows a proof-of-concept that decision algorithms using simple clinical and haematological parameters

  11. Forecasting Reading Anxiety for Promoting English-Language Reading Performance Based on Reading Annotation Behavior

    Science.gov (United States)

    Chen, Chih-Ming; Wang, Jung-Ying; Chen, Yong-Ting; Wu, Jhih-Hao

    2016-01-01

    To reduce effectively the reading anxiety of learners while reading English articles, a C4.5 decision tree, a widely used data mining technique, was used to develop a personalized reading anxiety prediction model (PRAPM) based on individual learners' reading annotation behavior in a collaborative digital reading annotation system (CDRAS). In…

  12. Determination of fetal state from cardiotocogram using LS-SVM with particle swarm optimization and binary decision tree.

    Science.gov (United States)

    Yılmaz, Ersen; Kılıkçıer, Cağlar

    2013-01-01

    We use a least squares support vector machine (LS-SVM) utilizing a binary decision tree for classification of cardiotocograms to determine the fetal state. The parameters of the LS-SVM are optimized by particle swarm optimization. The robustness of the method is examined by running 10-fold cross-validation. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, receiver operating characteristic analysis and cobweb representation are presented in order to analyze and visualize the performance of the method. Experimental results demonstrate that the proposed method achieves a remarkable classification accuracy rate of 91.62%.

  13. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background: Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and to focus on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of small sample size. Methods: All 274 patients (137 with type 2 diabetes mellitus and diabetic peripheral neuropathy, and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables such as sex, age, glycosylated hemoglobin, etc. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 times 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied by the best subset method with the Akaike Information Criterion (AIC) to make the best use of information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results: The AUC of DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. As the larger the AUC of a specific prediction model is, the higher diagnostic ability presents, MLP performed optimally, and then

  14. Decision tree classification of orchard information extraction from TM imagery in Jiaodong Peninsula of China

    Institute of Scientific and Technical Information of China (English)

    于新洋; 张安定; 侯西勇

    2012-01-01

    Decision tree classification is a classification model that uses a set of classification rules to partition the study image step by step. It has been widely used for information extraction from remote sensing images because it is intuitive and efficient. Jiaodong Peninsula is one of the most famous fruit-producing areas in China; it is therefore important to monitor the distribution of orchards. In this paper, decision tree classification was used to extract the orchard area in Jiaodong Peninsula. Specifically, a Landsat 5 TM image (path 120, row 034, October 24, 2005) was used, and the five most representative cities (Penglai, Longkou, Laizhou, Qixia, Zhaoyuan) were selected as the study area. The decision tree method, which can seamlessly incorporate multiple kinds of auxiliary information, combined NDVI, terrain and landform data, and the tasseled cap transformation, and exploited the spectral differences between orchards and background land cover at the time of year when phenological change is greatest. SPOT imagery and field survey data served as test samples for accuracy validation. The decision tree classification performed satisfactorily; the classification results were acceptable and could be used as inputs for related research.

  15. Real-time Container Transport Planning with Decision Trees based on Offline Obtained Optimal Solutions

    NARCIS (Netherlands)

    B. van Riessen (Bart); R.R. Negenborn (Rudy); R. Dekker (Rommert)

    2016-01-01

    Hinterland networks for container transportation require planning methods in order to increase efficiency and reliability of the inland road, rail and waterway connections. In this paper we aim to derive real-time decision rules for suitable allocations of containers to inland services b

  16. Tailored Approach in Inguinal Hernia Repair – Decision Tree Based on the Guidelines

    OpenAIRE

    2014-01-01

    The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch, and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society, and the European Association of Endoscopic Surgery. Eighty-two percent of experienced hernia surgeons use the “tailored approach,” the differentiated use of the several inguinal hernia repair techniques depending on the findings of ...

  17. Predictors and patterns of problematic Internet game use using a decision tree model.

    Science.gov (United States)

    Rho, Mi Jung; Jeong, Jo-Eun; Chun, Ji-Won; Cho, Hyun; Jung, Dong Jin; Choi, In Young; Kim, Dai-Jin

    2016-09-01

    Background and aims: Problematic Internet game use is an important social issue that increases social expenditures for both individuals and nations. This study identified predictors and patterns of problematic Internet game use. Methods: Data were collected from online surveys between November 26 and December 26, 2014. We identified 3,881 Internet game users from a total of 5,003 respondents. A total of 511 participants were assigned to the problematic Internet game user group according to the Diagnostic and Statistical Manual of Mental Disorders Internet gaming disorder criteria. From the remaining 3,370 participants, we used propensity score matching to develop a normal comparison group of 511 participants. In all, 1,022 participants were analyzed using the chi-square automatic interaction detector (CHAID) algorithm. Results: According to the CHAID algorithm, six important predictors were found: gaming costs (50%), average weekday gaming time (23%), offline Internet gaming community meeting attendance (13%), average weekend and holiday gaming time (7%), marital status (4%), and self-perceptions of addiction to Internet game use (3%). In addition, three patterns out of six classification rules were explored: cost-consuming, socializing, and solitary gamers. Conclusion: This study provides direction for future work on the screening of problematic Internet game use in adults.

  18. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    Science.gov (United States)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  19. Network Traffic Classification Using SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    邱婧; 夏靖波; 柏骏

    2012-01-01

    To address the unrecognized regions and long training times that arise when the Support Vector Machine (SVM) method is applied to network traffic classification, an SVM decision tree was used, exploiting its advantages in multi-class classification. The method was tested on authoritative flow data sets. The experimental results show that the SVM decision tree has a shorter training time and better classification performance than the ordinary "one-versus-one" and "one-versus-rest" SVM methods in network traffic classification, reaching a classification accuracy of 98.8%.

  20. Two improvements on CART decision tree and its application

    Institute of Scientific and Technical Information of China (English)

    张亮; 宁芊

    2015-01-01

    The Fayyad boundary point determination principle was used to improve the way a CART decision tree chooses the segmentation threshold of continuous-valued attributes. Under this principle, when selecting the threshold for a continuous-valued attribute, it is not necessary to check every split point; only the boundary points between adjacent sorted samples that belong to different classes need to be checked. In addition, a key decision factor (a measure of criticality) was used to improve classification accuracy when the main classes of the sample set are imbalanced, a situation in which the minority classes otherwise carry too little weight in the classification vote. A CART classifier was constructed based on these two improvements. The experimental results show that the Fayyad boundary point determination principle is appropriate for the CART algorithm, that the efficiency of building the decision tree is improved by about 45 percent, and that when the main classes of the sample set are imbalanced, the classification accuracy of the improved algorithm is slightly higher than that of the original one.
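    The boundary-point idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation; the function names and the toy data are assumptions made for illustration. Candidate thresholds are kept only between adjacent sorted values whose samples do not all share one class.

```python
def all_thresholds(values):
    """Baseline: one candidate threshold between every pair of adjacent distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def boundary_thresholds(values, labels):
    """Candidate thresholds at class boundaries only (hypothetical helper).

    A midpoint between two adjacent distinct values is kept unless every
    sample at both values belongs to the same single class.
    """
    groups = []  # list of (value, set of class labels seen at that value)
    for v, y in sorted(zip(values, labels)):
        if groups and groups[-1][0] == v:
            groups[-1][1].add(y)
        else:
            groups.append((v, {y}))

    thresholds = []
    for (v1, l1), (v2, l2) in zip(groups, groups[1:]):
        same_pure_class = len(l1) == 1 and l1 == l2
        if not same_pure_class:
            thresholds.append((v1 + v2) / 2)
    return thresholds


# Toy data (invented for illustration): six samples of one continuous attribute.
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "a"]
print(all_thresholds(values))               # five candidates
print(boundary_thresholds(values, labels))  # only the class boundaries
```

    In this toy case the split search evaluates two candidates instead of five; how much is saved on real data depends on how often the class changes along the sorted attribute.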

  1. Bringing Science and Pragmatism together - a Tiered Approach for Modelling Toxicological Impacts in LCA

    DEFF Research Database (Denmark)

    Guinée, J; De Koning, A; Pennington, David W.;

    2004-01-01

    , there is insufficient knowledge and/or resources to have high data availability as well as high data quality and high model quality at the same time. Results. The OMNIITOX project is developing two inter-related models in order to be able to provide LCA impact assessment characterisation factors for toxic releases...... for as broad a range of chemicals as possible: 1) A base model representing a state-of-the-art multimedia model and 2) a simple model derived from the base model using statistical tools. Discussion. A preliminary decision tree for using the OMNIITOX information system (IS) is presented. The decision tree aims...... categories. The OMNIITOX project is developing a tiered model approach for this. It is foreseen that a first version of the base model will be ready in late summer of 2004, whereas a first version of the simple base model is expected a few months later....

  2. The management of an endodontically abscessed tooth: patient health state utility, decision-tree and economic analysis

    Directory of Open Access Journals (Sweden)

    Shepperd Sasha

    2007-12-01

    Full Text Available Abstract Background A frequent encounter in clinical practice is the middle-aged adult patient complaining of a toothache caused by the spread of a carious infection into the tooth's endodontic complex. Decisions about the range of treatment options (conventional crown with a post and core technique (CC), a single tooth implant (STI), a conventional dental bridge (CDB), and a partial removable denture (RPD)) have to balance the prognosis, utility and cost. Little is known about the utility patients attach to the different treatment options for an endodontically abscessed mandibular molar and maxillary incisor. We measured patients' dental-health-state utilities and ranking preferences of the treatment options for these dental problems. Methods Forty school teachers ranked their preferences for conventional crown with a post and core technique, a single tooth implant, a conventional dental bridge, and a partial removable denture using a standard gamble and willingness to pay. Data previously reported on treatment prognosis and direct "out-of-pocket" costs were used in a decision-tree and economic analysis. Results The Standard Gamble utilities for the restoration of a mandibular 1st molar with either the conventional crown (CC), single-tooth-implant (STI), conventional dental bridge (CDB) or removable-partial-denture (RPD) were 74.47 [± 6.91], 78.60 [± 5.19], 76.22 [± 5.78], 64.80 [± 8.1] respectively (p The standard gamble utilities for the restoration of a maxillary central incisor with a CC, STI, CDB and RPD were 88.50 [± 6.12], 90.68 [± 3.41], 89.78 [± 3.81] and 91.10 [± 3.57] respectively (p > 0.05). Their respective willingness-to-pay ($CDN) were: 1,782.05 [± 361.42], 1,871.79 [± 349.44], 1,605.13 [± 348.10] and 1,351.28 [± 368.62]. A statistical difference was found between the utility of treating a maxillary central incisor and mandibular 1st-molar (p The expected-utility-value for a 5-year prosthetic survival was highest for the CDB and the

  3. Assessment of the potential enhancement of rural food security in Mexico using decision tree land use classification on medium resolution satellite imagery

    Science.gov (United States)

    Bermeo, A.; Couturier, S.

    2017-01-01

    Because of its renewed importance in international agendas, food security in sub-tropical countries has been the object of studies at different scales, although the spatial components of food security are still largely undocumented. Among other aspects, food security can be assessed using a food self-sufficiency index. We propose a spatial representation of this assessment in the densely populated rural area of the Huasteca Poblana, Mexico, where there is a known tendency towards the loss of self-sufficiency in basic grains. The main agricultural systems in this area are the traditional milpa system (a multicrop practice with maize as the main basic crop), coffee plantations and grazing land for bovine livestock. We estimate a potential additional milpa-based maize production by smallholders by identifying the presence of extensive coffee and pasture systems in the production data of the agricultural census. The surface areas of extensive coffee plantations and pasture land were estimated using the detailed coffee agricultural census data and a decision tree combining unsupervised and supervised spectral classification techniques applied to medium-scale (Landsat) satellite imagery. We find that 30% of the territory would benefit from an increase of more than 50% in food security, and 13% could theoretically become maize self-sufficient through the conversion of extensive systems to the traditional multicrop milpa system.

  4. Improved γ/hadron separation for the detection of faint γ-ray sources using boosted decision trees

    Science.gov (United States)

    Krause, Maria; Pueschel, Elisa; Maier, Gernot

    2017-03-01

    Imaging atmospheric Cherenkov telescopes record an enormous number of cosmic-ray background events. Suppressing these background events while retaining γ-rays is key to achieving good sensitivity to faint γ-ray sources. The differentiation between signal and background events can be accomplished using machine learning algorithms, which are already used in various fields of physics. Multivariate analyses combine several variables into a single variable that indicates the degree to which an event is γ-ray-like or cosmic-ray-like. In this paper we will focus on the use of "boosted decision trees" for γ/hadron separation. We apply the method to data from the Very Energetic Radiation Imaging Telescope Array System (VERITAS), and demonstrate an improved sensitivity compared to the VERITAS standard analysis.

  5. Comparison between SARS CoV and MERS CoV Using Apriori Algorithm, Decision Tree, SVM

    Directory of Open Access Journals (Sweden)

    Jang Seongpil

    2016-01-01

    Full Text Available MERS (Middle East Respiratory Syndrome) is currently a worldwide disease. The number of infected people is 1038 (as of 08/03/2015) in Saudi Arabia and 186 (as of 08/03/2015) in South Korea. MERS has spread across the world, including Europe, East Asia and the Middle East, and the fatality rate is 38.8%. MERS is also known as a cousin of SARS (Severe Acute Respiratory Syndrome) because both diseases show similar symptoms such as high fever and difficulty in breathing. This is why we compared MERS with SARS. We used spike glycoprotein data from NCBI. To analyze the protein, the apriori algorithm, a decision tree and SVM were used; in particular, SVM was run with the normal, polynomial, and sigmoid kernels. The results show that MERS and SARS are alike but also differ in some respects.

  6. Research on using the decision tree classification method to extract coal gangue information

    Institute of Scientific and Technical Information of China (English)

    冯稳; 张志; 乌云其其格; 孟丹

    2011-01-01

    Using remote sensing to survey the distribution of coal gangue piles quickly and accurately is an important guide for preventing geological disasters and for protecting the ecological environment and residents' lives and property. Based on TM multispectral imagery, a knowledge-based decision tree classification method was used to extract coal gangue information for the Pingxiang coal mining area in Jiangxi Province. First, drawing on background knowledge of the study area, the spectral characteristics of coal gangue and other typical surface objects in the imagery were statistically analyzed, and a classification knowledge base for the study area was established. Second, supported by the decision tree classification model, the Normalized Difference Vegetation Index, the Modified Normalized Difference Water Index and a spectral threshold method were each used to classify the image. Finally, the classified image was post-processed using geoscience knowledge and geometric features. The overall classification accuracy reached 82.97%. The experiment demonstrates that the method is suitable for the automatic extraction of coal gangue information and, combined with visual interpretation, can improve the efficiency and accuracy of interpretation.
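    As a rough illustration of how index-based rules of this kind can be chained into a decision tree, the sketch below classifies a single pixel from four reflectance values. The index formulas are the standard NDVI and MNDWI definitions; the rule order, the threshold values and the class names are assumptions chosen for illustration and are not taken from the study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def mndwi(green, mir):
    """Modified Normalized Difference Water Index: (Green - MIR) / (Green + MIR)."""
    total = green + mir
    return (green - mir) / total if total else 0.0


def classify_pixel(green, red, nir, mir,
                   ndvi_min=0.3, mndwi_min=0.0, dark_max=0.12):
    """Rule-based decision tree for one pixel.

    All three thresholds are hypothetical placeholders; in practice they
    would come from the spectral statistics of the study area.
    """
    if ndvi(nir, red) > ndvi_min:
        return "vegetation"
    if mndwi(green, mir) > mndwi_min:
        return "water"
    if max(green, red, nir, mir) < dark_max:
        # Low reflectance in every band: treated here as a gangue candidate.
        return "coal gangue candidate"
    return "other"


# Invented reflectance values (0-1 scale) for four illustrative pixels.
print(classify_pixel(0.05, 0.04, 0.40, 0.20))
print(classify_pixel(0.08, 0.06, 0.03, 0.01))
print(classify_pixel(0.06, 0.07, 0.08, 0.09))
print(classify_pixel(0.20, 0.25, 0.30, 0.35))
```

    Masking vegetation and water first leaves a smaller set of pixels for the spectral threshold test, which is one plausible reason for ordering the rules this way.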

  7. Regression-based air temperature spatial prediction models: an example from Poland

    Directory of Open Access Journals (Sweden)

    Mariusz Szymanowski

    2013-10-01

    Full Text Available A Geographically Weighted Regression-Kriging (GWRK) algorithm, based on the local Geographically Weighted Regression (GWR), is applied for spatial prediction of air temperature in Poland. Hengl's decision tree for selecting a suitable prediction model is extended for varying spatial relationships between the air temperature and environmental predictors, with an assumption of existing environmental dependence of the analyzed temperature variables. The procedure includes the potential choice of a local GWR instead of the global Multiple Linear Regression (MLR) method for modeling the deterministic part of spatial variation, which is usual in the standard regression (residual) kriging model (MLRK). The analysis encompassed: testing for environmental correlation, selecting an appropriate regression model, testing for spatial autocorrelation of the residual component, and validating the prediction accuracy. The proposed approach was performed for 69 air temperature cases, with time aggregation ranging from daily to annual average air temperatures. The results show that, irrespective of the level of data aggregation, the spatial distribution of temperature is better fitted by local models, which is the reason for choosing a GWR instead of the MLR for all variables analyzed. Additionally, in most cases (78%) there is spatial autocorrelation in the residuals of the deterministic part, which suggests that the GWR model should be extended by ordinary kriging of residuals to the GWRK form. The decision tree used in this paper can be considered universal, as it encompasses either spatially varying relationships of modeled and explanatory variables or a random process that can be modeled by a stochastic extension of the regression model (residual kriging). Moreover, for all cases analyzed, the selection of a method based on the local regression model (GWRK or GWR) does not depend on the data aggregation level, showing the potential versatility of the technique.

  8. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Greve, Mogens Humlekrog; Bøcher, Peder Klith

    2010-01-01

    field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv), the remote sensing (RS) indices only, (v...... the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature......) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The best constructed classification tree models (in the number of three) with the lowest misclassification error (ME...

  9. Effective Network Intrusion Detection using Classifiers Decision Trees and Decision rules

    Directory of Open Access Journals (Sweden)

    G.MeeraGandhi

    2010-11-01

    Full Text Available In the era of the information society, computer networks and their related applications are the emerging technologies. Network Intrusion Detection aims at distinguishing the behavior of the network. As network attacks have increased in huge numbers over the past few years, the Intrusion Detection System (IDS) is increasingly becoming a critical component to secure the network. Owing to large volumes of security audit data in a network, in addition to the intricate and dynamic properties of intrusion behaviors, optimizing the performance of IDS becomes an important open problem which receives more and more attention from the research community. In this work, the field of machine learning attempts to characterize how such changes can occur by designing, implementing, running, and analyzing algorithms that can be run on computers. The discipline draws on ideas with the goal of understanding the computational character of learning. Learning always occurs in the context of some performance task, and a learning method should always be coupled with a performance element that uses the knowledge acquired during learning. In this research, machine learning is investigated as a technique for making the selection, using training data and their outcomes. In this paper, we evaluate the performance of a set of classifier algorithms of rules (JRIP, Decision Table, PART, and OneR) and trees (J48, RandomForest, REPTree, NBTree). Based on the evaluation results, the best algorithm for each attack category is chosen and two classifier algorithm selection models are proposed. The empirical simulation results show noticeable performance improvements in the comparison. The classification models were trained using the data collected from Knowledge Discovery Databases (KDD) for Intrusion Detection. The trained models were then used for predicting the risk of attacks in a web server environment, or by any network administrator or security expert. The

  10. Fuzzy Decision Trees with Possibility Distributions as Output

    Institute of Scientific and Technical Information of China (English)

    袁修久; 张文修

    2003-01-01

    More than one possible classification is assumed for a given instance. A possibility distribution is assigned at each terminal node of a fuzzy decision tree. The possibility distribution of a given instance with known attribute values is determined using simple fuzzy reasoning. The inconsistency that arises in determining a single class for a given instance is thereby reduced.

  11. A Decision Tree Analysis to Support Potential Climate Change Adaptations of Striped Catfish (Pangasianodon hypophthalmus Sauvage) Farming in the Mekong Delta, Vietnam

    NARCIS (Netherlands)

    Nguyen, L.A.; Verreth, J.A.J.; Leemans, H.B.J.; Bosma, R.H.; Silva, De S.

    2016-01-01

    This study uses the decision tree framework to analyse possible climate change impact adaptation options for pangasius (Pangasianodon hypophthalmus Sauvage) farming in the Mekong Delta. Here we present the risks for impacts and the farmers' autonomous and planned public adaptation by using primary an

  12. An Improved ID3 Decision Tree Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    潘大胜; 屈迟文

    2016-01-01

    The problems of the classic ID3 decision tree mining algorithm are analyzed, its entropy calculation process is improved, and an improved ID3 decision tree mining algorithm is built. The entropy calculation used in constructing the decision tree is redesigned in order to obtain globally optimal mining results. Mining experiments are carried out on six data sets from the UCI repository. The experimental results show that the improved mining algorithm is clearly better than the ID3 decision tree mining algorithm in both the compactness and the accuracy of the constructed decision tree.
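    For context, the entropy step that the abstract sets out to improve is, in the classic ID3 algorithm, the computation of Shannon entropy and information gain for each candidate attribute. The sketch below shows only that standard baseline; it does not reproduce the paper's modified calculation, and the toy records are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Classic ID3 information gain of splitting on a categorical attribute."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy records (invented): attribute "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["yes", "yes", "no", "no"]

best = max(["a", "b"], key=lambda attr: information_gain(rows, labels, attr))
print(best)
```

    ID3 selects the attribute with the largest gain at each node and recurses on the resulting subsets; this greedy, node-by-node choice is the usual reason the classic algorithm is not guaranteed to reach a globally optimal tree.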

  13. Data Optimization with Multilayer Perceptron Neural Network and Using New Pattern in Decision Tree Comparatively

    Directory of Open Access Journals (Sweden)

    Murat Kayri

    2010-01-01

    Full Text Available Problem statement: The aim of the present study is to exemplify the use of Artificial Neural Networks (ANN) for parameter prediction. Missing values or unrealistic responses to some scale items are a problem for unbiased findings. Learning the real pattern with ANN provides robust and unbiased parameter estimation. Approach: To this end, data were collected from 906 students using the "Scale of student views about the expected situations and the current expectations from their families during learning process" for the study entitled "Student views about the expected situations and the current expectations from their families during learning process". In the study, the initial data set gathered using the measurement tool and the new data set produced by the Multilayer Perceptron algorithm, which was considered to have the highest predictive level among ANN methods for this research, were each analyzed by CHAID analysis, and the results of the two analyses were compared. Results: The findings showed that, in the CHAID analysis of the initial data set, the variable "education level of mother" had a considerable effect on the total score dependent variable, while "education level of father" was the influential variable on the attitude level in the data set predicted by ANN, unlike the previous model. Conclusion/Recommendations: The findings of the research show that Artificial Neural Networks could be used for parameter estimation in cause-effect based studies. It is also thought that the research will contribute to more extensive use of advanced statistical methods.

  14. Study on Acoustic Modeling in a Mandarin Continuous Speech Recognition

    Institute of Scientific and Technical Information of China (English)

    PENG Di; LIU Gang; GUO Jun

    2007-01-01

    The design of acoustic models is of vital importance for building a reliable connection between the acoustic waveform and linguistic messages in terms of individual speech units. According to the characteristics of Chinese phonemes, the set of base acoustic phoneme units is decided and refined, and a decision-tree-based state tying approach is explored. Since one of the advantages of the top-down tying method is its flexibility in maintaining a balance between model accuracy and complexity, relevant adjustments are made, such as to the stopping criterion of decision tree node splitting, during which optimal thresholds are found. Better results are achieved in improving acoustic modeling accuracy as well as in reducing the scale of the model to a trainable extent.

  15. DoD Information Assurance Certification and Accreditation Process (DIACAP) Survey and Decision Tree

    Science.gov (United States)

    2011-07-01

    CVC Compliance and Validation Certification DAA designated accrediting authority DATO denial of authorization to operate DIACAP DoD Information...standard based on implementation of the best practices listed in paragraph 2.3. c. Direct the DSG to rename the Data Protection Committee to the...Information Grid (GIG)- based environment. Figure A-1. DoD IA program management. 1.1.1 DIACAP Background. a. Interim DIACAP signed 6 July 2006

  16. Assessing and monitoring the risk of desertification in Dobrogea, Romania, using Landsat data and decision tree classifier.

    Science.gov (United States)

    Vorovencii, Iosif

    2015-04-01

    The risk of the desertification of a part of Romania is increasingly evident, constituting a serious problem for the environment and the society. This article attempts to assess and monitor the risk of desertification in Dobrogea using Landsat Thematic Mapper (TM) satellite images acquired in 1987, 1994, 2000, 2007 and 2011. In order to assess the risk of desertification, we used as indicators the Modified Soil Adjustment Vegetation Index 1 (MSAVI1), the Moving Standard Deviation Index (MSDI) and the albedo, indices relating to the vegetation conditions, the landscape pattern and micrometeorology. The decision tree classifier (DTC) was also used on the basis of pre-established rules, and maps displaying six grades of desertification risk were obtained: non, very low, low, medium, high and severe. Land surface temperature (LST) was also used for the analysis. The results indicate that, according to pre-established rules for the period of 1987-2011, there are two grades of desertification risk that have an ascending trend in Dobrogea, namely very low and medium desertification. An investigation into the causes of the desertification risk revealed that high temperature is the main factor, accompanied by the destruction of forest shelterbelts and of the irrigation system and, to a smaller extent, by the fragmentation of agricultural land and the deforestation in the study area.

  17. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    Science.gov (United States)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al, 2006). At this field scale, previous classifications of agricultural land in Tanzania using MODIS coarse resolution data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used in the study, with representative training areas collected for agriculture and no agriculture using appropriate indices to separate these classes (Hansen et al, 2013). Validation was done using random samples and high resolution satellite images to compare agriculture and no-agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security and market price, and to inform agricultural policy.

  18. Bayesian Decision Tree for the Classification of the Mode of Motion in Single-Molecule Trajectories

    CERN Document Server

    Türkcan, Silvan

    2015-01-01

    Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and buil...
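    The information criteria named in the abstract above have standard closed forms, which the following sketch evaluates for two candidate motion models. The formulas are the usual definitions of AIC, AICc and BIC; the log-likelihood values, parameter counts, model names and trajectory length are invented placeholders, not results from the study.

```python
from math import log


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction 2k(k+1)/(n-k-1)."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * log(n) - 2 * log_likelihood


def preferred_model(candidates, n):
    """Return the model name with the lowest BIC.

    `candidates` maps a model name to (maximized log-likelihood, number of
    free parameters); `n` is the number of observations in the trajectory.
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n))


# Hypothetical fits of one trajectory of n = 50 points (values invented).
candidates = {
    "free_brownian": (-120.0, 1),
    "confined_2nd_order": (-100.0, 3),
}
print(preferred_model(candidates, n=50))
```

    All three criteria trade goodness of fit against the number of parameters; BIC penalizes extra parameters more heavily than AIC once n exceeds about seven observations (ln n > 2), so the two can disagree on short trajectories.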

  19. Application of decision trees to the analysis of soil radon data for earthquake prediction.

    Science.gov (United States)

    Zmazek, B; Todorovski, L; Dzeroski, S; Vaupotic, J; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.

  20. Application of decision trees to the analysis of soil radon data for earthquake prediction

    Energy Technology Data Exchange (ETDEWEB)

    Zmazek, B. E-mail: boris.zmazek@ijs.si; Todorovski, L.; Dzeroski, S.; Vaupotic, J.; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.

  1. Triage Decision Trees and Triage Protocols: Changing Strategies for Medical Rescue in Civilian Mass Casualty Situations.

    Science.gov (United States)

    1984-02-06

    ...state of fruition by General Basil Pruitt of Brooke Army Medical Center. A simple calculation based on the depth...A.S., Berstein R.S. and Johnson D.S., Ocular effects following the volcanic eruptions of Mount St. Helens, Arch Ophthalmol 101 Mar 83 p. 376 79

  2. Assessment of Poor College Student in Guizhou Province via C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李明江; 卢玉; 刘彦

    2013-01-01

    A C4.5 decision tree based approach for assessing poor college students in Guizhou province is proposed in this paper. First, an index system is established from three aspects: the consumer behavior of students, the economic condition of their families, and their loan and work-study status. Second, the 15 resulting indexes are taken as the attributes of the data to be classified by the C4.5 decision tree, and the continuous attributes are discretized according to the information gain ratio of the attributes. The tree is pruned using the prediction error rate, yielding the four most important attributes for characterizing poor students. Finally, real data are used to validate the efficiency of the proposed method. The experimental results show that it is simple in principle, intuitive to interpret, and fast and accurate to compute. Compared with its counterparts, it neither relies on the statistical distribution of the data nor requires model parameters to be chosen, so it is an efficient technique for the assessment of poor college students.
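    The discretization step mentioned above can be sketched as a search for the binary cut point on a continuous attribute that maximizes the information gain ratio. This is a simplified illustration rather than Quinlan's full C4.5 procedure or the authors' code; the attribute (monthly spending), the values and the labels are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Information gain ratio of the binary split value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    gain = entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Midpoint between adjacent distinct values with the highest gain ratio."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t),
               default=None)


# Invented example: monthly spending and an assessed status for six students.
spending = [300, 350, 400, 900, 1000, 1200]
status = ["poor", "poor", "poor", "not poor", "not poor", "not poor"]
print(best_threshold(spending, status))
```

    Dividing the gain by the split information is what distinguishes the gain ratio from plain information gain: it penalizes splits that carve off very small groups.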

  3. The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

    Science.gov (United States)

    Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

    2014-09-01

    This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 eggs samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation.

  4. Decision tree learning for detecting turning points in business process orientation: a case of Croatian companies

    Directory of Open Access Journals (Sweden)

    Ljubica Milanović Glavan

    2015-03-01

    Full Text Available Companies worldwide are embracing Business Process Orientation (BPO in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods in detecting key turning points in BPO. Based on survey results obtained in 2013, the selected data mining technique of classification and regression trees (C&RT was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.

  5. Assessment of the potential allergenicity of ice structuring protein type III HPLC 12 using the FAO/WHO 2001 decision tree for novel foods

    DEFF Research Database (Denmark)

    Bindslev-Jensen, C; Sten, E; Earl, L K

    2003-01-01

    no sequence similarity to known allergens nor was it stable to proteolytic degradation using standardised methods. Using sera from 20 patients with a well-documented clinical history of fish allergy, positive in skin prick tests to ocean pout, eel pout and eel were used, positive IgE-binding in vitro...... as to individuals potentially susceptible to producing IgE responses to proteins. Furthermore, the practicability of the new decision tree was confirmed....

  6. Nosocomial infections in brazilian pediatric patients: using a decision tree to identify high mortality groups

    Directory of Open Access Journals (Sweden)

    Julia M.M. Lopes

    2009-04-01

    Full Text Available Nosocomial infections (NI) are frequent events with potentially lethal outcomes. We identified predictive factors for mortality related to NI and developed an algorithm for predicting that risk in order to improve hospital epidemiology and healthcare quality programs. We have conducted prospective cohort NI surveillance of all acute-care patients according to the National Nosocomial Infections Surveillance System guidelines since 1992, applying the Centers for Disease Control and Prevention 1988 definitions adapted to a Brazilian pediatric hospital. Thirty-eight deaths considered to be related to NI were analyzed as the outcome variable for 754 patients with NI, whose survival time was taken into consideration. The predictive factors for mortality related to NI (p < 0.05 in the Cox regression model) were invasive procedures and use of two or more antibiotics. The mean survival time was significantly shorter (p < 0.05 with the Kaplan-Meier method) for patients who underwent invasive procedures and for those who received two or more antibiotics. Applying a tree-structured survival analysis (TSSA), two groups with high mortality rates were identified: one group had a time from admission to the first NI of less than 11 days, received two or more antibiotics and underwent invasive procedures; the other group had the first NI between 12 and 22 days after admission and was subjected to invasive procedures. The possible modifiable factors to prevent mortality involve invasive devices and antibiotics. The TSSA approach is helpful to identify combinations of predictors and to guide protective actions to be taken in continuous-quality-improvement programs.

  7. Decision Tree Phytoremediation

    Science.gov (United States)

    1999-12-01

    trichloromethane), and hydrophobic organic compounds 2 - Rhizodegradation, phytostimulation, rhizosphere bioremediation, or plant-assisted...nitrobenzene, picric acid, nitrotoluene), atrazine, halogenated compounds (tetrachloromethane, trichloromethane, hexachloroethane, carbon tetrachloride, TCE...Chlorinated solvents (tetrachloromethane and trichloromethane), organic VOCs, BTEX, MTBE *In practice, only a few of these compounds have been proven to

  8. Causal Decision Trees

    OpenAIRE

    2015-01-01

    Uncovering causal relationships in data is a major objective of data analytics. Causal relationships are normally discovered with designed experiments, e.g. randomised controlled trials, which, however are expensive or infeasible to be conducted in many cases. Causal relationships can also be found using some well designed observational studies, but they require domain experts' knowledge and the process is normally time consuming. Hence there is a need for scalable and automated methods for c...

  9. Assessment of the potential allergenicity of ice structuring protein type III HPLC 12 using the FAO/WHO 2001 decision tree for novel foods.

    Science.gov (United States)

    Bindslev-Jensen, C; Sten, E; Earl, L K; Crevel, R W R; Bindslev-Jensen, U; Hansen, T K; Stahl Skov, P; Poulsen, L K

    2003-01-01

    The introduction of novel proteins into foods carries a risk of eliciting allergic reactions in individuals sensitive to the introduced protein. Therefore, decision trees for evaluation of the risk have been developed, the latest being proposed by WHO/FAO early in 2001. Proteins developed using modern biotechnology and derived from fish are being considered for use in food and other applications, and since allergy to fish is well established, a potential risk from such proteins to susceptible human beings exists. The overall aim of the study was to investigate the potential allergenicity of an Ice Structuring Protein (ISP) originating from an arctic fish (the ocean pout, Macrozoarces americanus) using the newly developed decision tree proposed by FAO/WHO. The methods used were those proposed by FAO/WHO including amino acid sequence analysis for sequence similarity to known allergens, methods for assessing degradability under standardised conditions, assays for detection of specific IgE against the protein (Maxisorb RAST) and histamine release from human basophils. In the present paper we describe the serum screening phase of the study and discuss the overall application of the decision tree to the assessment of the potential allergenicity of ISP Type III. In an accompanying paper [Food Chem. Toxicol. 40 (2002) 965], we detail the specific methodology used for the sequence analysis and assessment of resistance to pepsin-catalysed proteolysis of this protein. The ISP showed no sequence similarity to known allergens nor was it stable to proteolytic degradation using standardised methods. Sera from 20 patients with a well-documented clinical history of fish allergy and positive skin prick tests to ocean pout, eel pout and eel were used; positive IgE-binding in vitro to extracts of the same fish was confirmed. The sera also elicited histamine release in vitro in the presence of the same extracts. The ISP was negative in all cases in the same experiments.
Using the

  10. Comparison of tree types of models for the prediction of final academic achievement

    Directory of Open Access Journals (Sweden)

    Silvana Gasar

    2002-12-01

    Full Text Available To efficiently prevent inappropriate secondary school choices, and thereby academic failure, school counselors need a tool for predicting an individual pupil's final academic achievement. Using data mining techniques on a pupil database together with expert modeling, we developed several models for the prediction of final academic achievement in an individual high school educational program. For data mining, we used statistical analyses, clustering and two machine learning methods: developing classification decision trees and hierarchical decision models. Using the expert system shell DEX, an expert system based on a hierarchical multi-attribute decision model was developed manually. All the models were validated and evaluated from the viewpoint of their applicability. The predictive accuracy of DEX models and decision trees was equal and very satisfying, as it reached the predictive accuracy of an experienced counselor. With respect to the efficiency of and difficulties in developing models, and the relatively rapid changes in our education system, we propose that decision trees be used in further development of predictive models.

  11. Gene function classification using Bayesian models with hierarchy-based priors

    Directory of Open Access Journals (Sweden)

    Neal Radford M

    2006-10-01

    Full Text Available Background: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. Results: The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Conclusion: Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

  12. Application of breast MRI for prediction of lymph node metastases - systematic approach using 17 individual descriptors and a dedicated decision tree

    Energy Technology Data Exchange (ETDEWEB)

    Dietzel, Matthias; Baltzer, Pascal A.T.; Groeschel, Tobias; Kaiser, Werner A. (Inst. of Diagnostic and Interventional Radiology, Friedrich-Schiller-Univ. Jena (Germany)), e-mail: matthias.dietzel@med.uni-jena.de; Vag, Tibor (Dept. of Radiology, Klinikum rechts der Isar der Technischen Universitaet, Munich (Germany)); Gajda, Mieczyslaw (Inst. of Pathology, Friedrich-Schiller-Univ., Jena (Germany)); Camara, Oumar (Clinic of Gynecology, Friedrich-Schiller-Univ., Jena (Germany))

    2010-10-15

    Background: The presence of lymph node metastases (LNMs) is one of the most important prognostic factors in breast cancer. Purpose: To correlate a detailed catalog of 17 descriptors in breast MRI (bMRI) with the presence of LNMs and to identify useful combinations of such descriptors for the prediction of LNMs using a dedicated decision tree. Material and Methods: A standardized protocol and study design was applied in this IRB-approved study (T1-weighted FLASH; 0.1 mmol/kg body weight Gd-DTPA; T2-weighted TSE; histological verification after bMRI). Two experienced radiologists performed prospective evaluation of the previously acquired examination in consensus. In every lesion 17 previously published descriptors were assessed. Subgroups of primary breast cancers with (N+: 97) and without LNM were created (N-: 253). The prevalence and diagnostic accuracy of each descriptor were correlated with the presence of LNM (chi-square test; diagnostic odds ratio/DOR). To identify useful combinations of descriptors for the prediction of LNM a chi-squared automatic interaction detection (CHAID) decision tree was applied. Results: Seven of 17 descriptors were significantly associated with LNMs. The most accurate were 'Skin thickening' (P < 0.001; DOR = 5.9) and 'Internal enhancement' (P < 0.001; DOR = 13.7). The CHAID decision tree identified useful combinations of descriptors: 'Skin thickening' plus 'Destruction of nipple line' raised the probability of N+ by 40% (P < 0.05). In case of absence of 'Skin thickening', 'Edema', and 'Irregular margins', the likelihood of N+ was 0% (P < 0.05). Conclusion: Our data demonstrate the close association of selected breast MRI descriptors with nodal status. If present, such descriptors can be used - as stand alone or in combination - to accurately predict LNM and to stratify the patient's prognosis.
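
    The diagnostic odds ratio (DOR) reported for each descriptor can be computed from a 2x2 table of descriptor presence against nodal status. The sketch below uses the standard definition DOR = (TP x TN) / (FP x FN); the counts are hypothetical and chosen only to match the group sizes in the abstract (97 N+ and 253 N-), not taken from the study.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Standard DOR: odds of a positive finding in diseased cases
    divided by the odds of a positive finding in non-diseased cases."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Hypothetical counts (NOT from the study); only the row totals
# 97 (N+) and 253 (N-) follow the abstract.
tp, fn = 30, 67    # descriptor present / absent among N+ cancers
fp, tn = 20, 233   # descriptor present / absent among N- cancers

dor = diagnostic_odds_ratio(tp, fp, fn, tn)
print(f"DOR = {dor:.2f}")
```

    A DOR of 1 means the descriptor carries no diagnostic information; larger values indicate stronger association with nodal involvement.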

  13. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-06-01

    Full Text Available Since the introduction of in vitro fertilization (IVF) in clinical practice of infertility treatment, indicators of high quality embryos have been investigated. Cumulus cells (CC) have a specific gene expression profile according to the developmental potential of the oocyte they are surrounding, and therefore, specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in prediction value of embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR)] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC) for the two prediction models was calculated. Among tested genes, AMHR2 and LIF showed significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. Two different prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.
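
    The AUC used above to compare the two models can be read as the probability that a randomly chosen high quality embryo receives a higher model score than a randomly chosen low quality one. A minimal sketch of that pairwise (Mann-Whitney) formulation follows; the scores are hypothetical, and the abstract does not say how the authors computed their AUC values.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores (NOT study data).
high_quality = [0.8, 0.6, 0.7, 0.4]
low_quality = [0.3, 0.5, 0.2, 0.6]

print(f"AUC = {auc(high_quality, low_quality):.3f}")
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.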

  14. Research on Decision Tree and Its Application on Students' Mental Health Data Treatment

    Institute of Scientific and Technical Information of China (English)

    晏杰

    2015-01-01

    Decision tree classification is an important method in data mining. This paper discusses the basic ideas and common algorithms of decision trees and applies decision tree mining to university students' mental health data in order to analyze the factors that affect students' mental health. Using the C5.0 algorithm in Clementine 12.0, a decision tree mining model was constructed and a data stream was set up. Through repeated testing and analysis, compulsion was found to be the main symptom affecting the mental health of students. Viewing the model with compulsion as the classification target shows that anxiety and social relationships also have a large influence. The target attributes were then set to anxiety_degree and social relationship_degree and the output variables to the remaining nine factors, and the data stream was executed to mine the main factors that cause compulsion, providing a reference for those working in the mental health field.

  15. Millon's Personality Model and ischemic cardiovascular acute episodes: Profiles of risk in a decision tree

    Directory of Open Access Journals (Sweden)

    María M. Richard's

    2008-01-01

    Full Text Available The identification of risk subgroups allows clinical psychologists to develop specific interventions for those subgroups. The main purpose of this work was to find statistical associations between personality characteristics (traits and disorders) and the occurrence of acute ischemic cardiovascular episodes according to Theodore Millon's personality model. The analyses in this study were based on a sample of 313 women and men between 31 and 80 years of age, divided into two groups: a clinical group of 143 participants hospitalized because of acute ischemic cardiovascular episodes and a control group of 170 people with no history of cardiovascular disease. The results showed four personality risk profiles associated with the occurrence of acute ischemic episodes, which enables clinical psychologists to design specific interventions for those subgroups.

  16. Prediction of Frost Occurrences Using Statistical Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Hyojin Lee

    2016-01-01

    Full Text Available We developed spring frost prediction models for Korea using logistic regression and decision tree techniques. Hit Rate (HR), Probability of Detection (POD), and False Alarm Rate (FAR) from both models were calculated and compared. Threshold values for the logistic regression models were selected to maximize HR and POD and minimize FAR for each station, and the split for the decision tree models was stopped when the change in entropy was relatively small. Average HR values were 0.92 and 0.91 for logistic regression and decision tree techniques, respectively, average POD values were 0.78 and 0.80 for logistic regression and decision tree techniques, respectively, and average FAR values were 0.22 and 0.28 for logistic regression and decision tree techniques, respectively. The average numbers of selected explanatory variables were 5.7 and 2.3 for logistic regression and decision tree techniques, respectively. Fewer explanatory variables can be more appropriate for operational activities to provide a timely warning for the prevention of frost damage to agricultural crops. We concluded that the decision tree model can be more useful for the timely warning system. It is recommended that the models should be improved to reflect local topological features.
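
    The three scores compared above are usually derived from a 2x2 contingency table of forecasts against observations. The abstract does not give its formulas, so the sketch below assumes common forecast-verification definitions: HR as the fraction of all forecasts that were correct, POD as hits over observed events, and FAR as false alarms over forecast events. The counts are hypothetical.

```python
def verification_scores(hits, misses, false_alarms, correct_negatives):
    """Return (HR, POD, FAR) under the assumed definitions:
    HR  = (hits + correct negatives) / all cases
    POD = hits / (hits + misses)
    FAR = false alarms / (hits + false alarms)
    """
    total = hits + misses + false_alarms + correct_negatives
    hr = (hits + correct_negatives) / total
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return hr, pod, far


# Hypothetical station counts (NOT from the study).
hr, pod, far = verification_scores(
    hits=40, misses=10, false_alarms=10, correct_negatives=140
)
print(f"HR={hr:.2f} POD={pod:.2f} FAR={far:.2f}")
```

    Because frost nights are rare, HR can stay high even when POD is modest, which is one reason to report all three scores together.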

  17. Retrieval of the sea ice area from MODIS data by CART decision tree method

    Institute of Scientific and Technical Information of China (English)

    张娜; 张庆河

    2014-01-01

    The CART decision tree method is used to retrieve sea ice area from MODIS satellite remote sensing data. The method classifies automatically across multiple bands of visible, near-infrared and thermal infrared light, and effectively eliminates the sea ice misjudgments that traditional threshold methods make in marine environments such as those with high suspended sediment. The retrieved results are verified against data from the Small Satellite Constellation for Environment and Disaster Monitoring and Forecasting (HJ-1A/1B), which has a higher spatial resolution; the comparison indicates that the data retrieved with the CART decision tree have relatively high accuracy. The growth and melting of sea ice in Liaodong Bay in winter since 2003 are retrieved with this method, providing precise and reliable base data for further analysis of the relationship between meteorological factors and sea ice evolution.

  18. Online Rule Generation Software Process Model

    Directory of Open Access Journals (Sweden)

    Sudeep Marwaha

    2013-07-01

    Full Text Available For production systems such as expert systems, rule generation software can facilitate faster deployment. The software process model for rule generation using a decision tree classifier refers to the steps required to develop a web-based software model for decision rule generation. Royce's final waterfall model is used in this paper to explain the software development process. The paper presents the specific output of each step of the modified waterfall model for decision rule generation.

  19. CCAST: a model-based gating strategy to isolate homogeneous subpopulations in a heterogeneous population of single cells.

    Directory of Open Access Journals (Sweden)

    Benedict Anchang

    2014-07-01

    Full Text Available A model-based gating strategy is developed for sorting cells and analyzing populations of single cells. The strategy, named CCAST, for Clustering, Classification and Sorting Tree, identifies a gating strategy for isolating homogeneous subpopulations from a heterogeneous population of single cells using a data-derived decision tree representation that can be applied to cell sorting. Because CCAST does not rely on expert knowledge, it removes human bias and variability when determining the gating strategy. It combines any clustering algorithm with silhouette measures to identify underlying homogeneous subpopulations, then applies recursive partitioning techniques to generate a decision tree that defines the gating strategy. CCAST produces an optimal strategy for cell sorting by automating the selection of gating markers, the corresponding gating thresholds and gating sequence; all of these parameters are typically manually defined. Even though CCAST is optimized for cell sorting, it can be applied for the identification and analysis of homogeneous subpopulations among heterogeneous single cell data. We apply CCAST on single cell data from both breast cancer cell lines and normal human bone marrow. On the SUM159 breast cancer cell line data, CCAST indicates at least five distinct cell states based on two surface markers (CD24 and EPCAM) and provides a gating sorting strategy that produces more homogeneous subpopulations than previously reported. When applied to normal bone marrow data, CCAST reveals an efficient strategy for gating T-cells without prior knowledge of the major T-cell subtypes and the markers that best define them. On the normal bone marrow data, CCAST also reveals two major mature B-cell subtypes, namely CD123+ and CD123- cells, which were not revealed by manual gating but show distinct intracellular signaling responses. More generally, the CCAST framework could be used on other biological and non-biological high dimensional data.

  20. Development of decision tree software and protein profiling using surface enhanced laser desorption/ionization-time of flight-mass spectrometry (SELDI-TOF-MS) in papillary thyroid cancer

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Joon Kee; An, Young Sil; Park, Bok Nam; Yoon, Seok Nam [Ajou University School of Medicine, Suwon (Korea, Republic of)]; Lee, Jun [Konkuk University, Seoul (Korea, Republic of)]

    2007-08-15

    The aim of this study was to develop bioinformatics software and to test it on serum samples of papillary thyroid cancer using mass spectrometry (SELDI-TOF-MS). The 'Protein analysis' software, which performs decision tree analysis, was developed by customizing C4.5. Sixty-one serum samples from 27 papillary thyroid cancer patients, 17 autoimmune thyroiditis patients and 17 controls were applied to 2 types of protein chips, CM10 (weak cation exchange) and IMAC3 (metal binding - Cu). Mass spectrometry was performed to reveal the protein expression profiles. Decision trees were generated using the 'Protein analysis' software, which automatically detected biomarker candidates. Validation analysis was performed for the CM10 chip by random sampling. Decision tree software that can perform training and validation from profiling data was developed. For CM10 and IMAC3 chips, 23 of 113 and 8 of 41 protein peaks were significantly different among the 3 groups (p < 0.05), respectively. The decision tree correctly classified the 3 groups with an error rate of 3.3% for CM10 and 2.0% for IMAC3, and 4 and 7 biomarker candidates were detected, respectively. In 2-group comparisons, all cancer samples were correctly discriminated from non-cancer samples (error rate = 0%) for CM10 by a single node and for IMAC3 by multiple nodes. Validation results from 5 test sets revealed that SELDI-TOF-MS and the decision tree correctly differentiated cancers from non-cancers (54/55, 98%), while predictability was moderate in 3-group classification (36/55, 65%). Our in-house software was able to successfully build decision trees and detect biomarker candidates; therefore, it could be useful for biomarker discovery and clinical follow-up of papillary thyroid cancer.

  1. Identification of Hadronically-Decaying W Bosons and Top Quarks Using High-Level Features as Input to Boosted Decision Trees and Deep Neural Networks in ATLAS at $\sqrt{s}$ = 13 TeV

    CERN Document Server

    The ATLAS collaboration

    2017-01-01

    The application of boosted decision trees and deep neural networks to the identification of hadronically-decaying W bosons and top quarks using high-level jet observables as inputs is investigated using Monte Carlo simulations. In the case of both boosted decision trees and deep neural networks, the use of machine learning techniques is found to improve the background rejection with respect to simple reference single jet substructure and mass taggers. Linear correlations between the resulting classifiers and the substructure variables are also presented.

  2. Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-10-01

    We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r ≲ 20. [...] resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r~18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r~20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
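
    Selecting a sample "to a given completeness and efficiency" amounts to thresholding the class probability and measuring two ratios on objects with known labels. The sketch below assumes the usual meanings: completeness is the fraction of true members of a class that are selected, and efficiency is the fraction of selected objects that truly belong to the class. The function name, probabilities and labels are hypothetical.

```python
def completeness_efficiency(probs, labels, target, threshold):
    """Select objects whose probability for `target` is >= threshold,
    then score the selection against the known labels."""
    selected = [lab for p, lab in zip(probs, labels) if p >= threshold]
    n_true = sum(1 for lab in labels if lab == target)
    n_selected_true = sum(1 for lab in selected if lab == target)
    completeness = n_selected_true / n_true if n_true else 0.0
    efficiency = n_selected_true / len(selected) if selected else 0.0
    return completeness, efficiency


# Hypothetical galaxy probabilities and spectroscopic labels.
p_galaxy = [0.95, 0.80, 0.60, 0.40, 0.10]
truth = ["galaxy", "galaxy", "star", "galaxy", "star"]

for cut in (0.5, 0.7):
    c, e = completeness_efficiency(p_galaxy, truth, "galaxy", cut)
    print(f"cut={cut}: completeness={c:.2f}, efficiency={e:.2f}")
```

    Raising the probability cut generally trades completeness for efficiency, so the cut is chosen according to which matters more for the science case.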

  3. The Reliability of Classification of Terminal Nodes in GUIDE Decision Tree to Predict the Nonalcoholic Fatty Liver Disease

    Directory of Open Access Journals (Sweden)

    Mehdi Birjandi

    2016-01-01

    Full Text Available Tree structured modeling is a data mining technique used to recursively partition a dataset into relatively homogeneous subgroups in order to make more accurate predictions on generated classes. One of the classification tree induction algorithms, GUIDE, is a nonparametric method with suitable accuracy and low selection bias, which is used for predicting binary classes based on many predictors. In this tree, evaluating the accuracy of predicted classes (terminal nodes) is of special clinical importance. For this purpose, we used the GUIDE classification tree under equal and unequal misclassification costs in order to predict nonalcoholic fatty liver disease (NAFLD), considering 30 predictors. Then, to evaluate the accuracy of the predicted classes using the bootstrap method, we considered first the classification reliability, in which individuals are assigned to a unique class, and next the prediction probability reliability as support for it.

  4. Realization and Application of Customer Attrition Early Warning Model in Security Company

    Directory of Open Access Journals (Sweden)

    Shen Yizhen

    2012-09-01

    Full Text Available In this paper, we propose a customer attrition early warning model based on data warehouse and data mining technologies, which has been implemented and applied in our security company. The modeling variables are selected by combining a decision tree with gradual regression in logistic regression. The customer attrition early warning model is then constructed based on logistic regression. The results show that the model can substantially raise the customer attrition capture rate, advance the building of the company's customer marketing management and customer service management organization, and reduce marketing costs. It can thereby support growth in the company's profits and competitiveness.

  5. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    Directory of Open Access Journals (Sweden)

    S. Galelli

    2013-02-01

    Full Text Available Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.

  6. A Dynamic Web Page Prediction Model Based on Access Patterns to Offer Better User Latency

    CERN Document Server

    Mukhopadhyay, Debajyoti; Saha, Dwaipayan; Kim, Young-Chon

    2011-01-01

    The growth of the World Wide Web has emphasized the need for improvement in user latency. One of the techniques that are used for improving user latency is Caching and another is Web Prefetching. Approaches that bank solely on caching offer limited performance improvement because it is difficult for caching to handle the large number of increasingly diverse files. Studies have been conducted on prefetching models based on decision trees, Markov chains, and path analysis. However, the increased uses of dynamic pages, frequent changes in site structure and user access patterns have limited the efficacy of these static techniques. In this paper, we have proposed a methodology to cluster related pages into different categories based on the access patterns. Additionally we use page ranking to build up our prediction model at the initial stages when users haven't already started sending requests. This way we have tried to overcome the problems of maintaining huge databases which is needed in case of log based techn...

  7. Mathematical models for economic evaluation: dynamic models based on differential equations

    Directory of Open Access Journals (Sweden)

    Roberto Pradas Velasco

    2009-10-01

    Full Text Available The joint use of decision trees and epidemiological models based on differential equations is an appropriate method for the economic evaluation of preventive interventions applied to infectious diseases. These models can combine the dynamic pattern of the disease with health resource consumption. To illustrate this type of model, we fitted a dynamic system of differential equations to the epidemic behavior of influenza in Spain, with a view to projecting the epidemiological impact of influenza vaccination. The results of the dynamic model are implemented in a diagram with the structure of a decision tree so that health resource consumption and its economic implications can be calculated.

  8. Network Traffic Anomalies Identification Based on Classification Methods

    Directory of Open Access Journals (Sweden)

    Donatas Račys

    2015-07-01

    Full Text Available The problem of detecting network traffic anomalies in computer networks is analyzed. An overview of anomaly detection methods is given, and the advantages and disadvantages of the different methods are analyzed. A model for traffic anomaly detection was developed based on IBM SPSS Modeler and is used to analyze SNMP data from a router. Traffic anomalies were investigated using three classification methods and different sets of learning data. The investigation determined that the C5.1 decision tree method has the highest accuracy and performance and can be successfully used for identification of network traffic anomalies.

  9. An Assessment of the Effectiveness of Tree-Based Models for Multi-Variate Flood Damage Assessment in Australia

    Directory of Open Access Journals (Sweden)

    Roozbeh Hasanzadeh Nafari

    2016-07-01

    Full Text Available Flooding is a frequent natural hazard that has significant financial consequences for Australia. In Australia, physical losses caused by floods are commonly estimated by stage-damage functions. These methods usually consider only the depth of the water and the type of buildings at risk. However, flood damage is a complicated process, and it depends on a variety of factors which are rarely taken into account. This study explores the interaction, importance, and influence of water depth, flow velocity, water contamination, precautionary measures, emergency measures, flood experience, floor area, building value, building quality, and socioeconomic status. The study uses tree-based models (regression trees and bagging decision trees) and a dataset collected from the 2012 to 2013 flood events in Queensland, which includes information on structural damages, impact parameters, and resistance variables. The tree-based approaches show water depth, floor area, precautionary measures, building value, and building quality to be important damage-influencing parameters. Furthermore, the performance of the tree-based models is validated and contrasted with the outcomes of a multi-parameter loss function (FLFArs) from Australia. The tree-based models are shown to be more accurate than the stage-damage function. Consequently, considering more parameters and taking advantage of tree-based models is recommended. The outcome is important for improving established Australian flood loss models and assisting decision-makers and insurance companies dealing with flood risk assessment.

  10. Decision Tree Technology Application in the Clients Division of Hospital (决策树技术在医院住院客户划分中的应用)

    Institute of Scientific and Technical Information of China (English)

    罗强

    2011-01-01

    Using historical inpatient business data from a provincial maternal and child health (MCH) hospital as a sample, this paper builds a segmentation model of the hospital's inpatient clients with the decision tree modeling method of data mining and obtains classification rules; on this basis, inpatient clients are divided into different groups. Through client segmentation and analysis of group characteristics, the hospital can clearly identify its key clients and offer key client groups personalized services customized to their needs, which will greatly enhance the loyalty and satisfaction of these clients and thereby ensure the long-term stability of the hospital's main sources of profit and income.

  11. Decision-tree sensitivity analysis for cost-effectiveness of whole-body FDG PET in the management of patients with non-small-cell lung carcinoma in Japan

    Energy Technology Data Exchange (ETDEWEB)

    Kosuda, Shigeru; Kobayashi, Hideo; Kusano, Shoichi [National Defense Medical Coll., Tokorozawa, Saitama (Japan); Ichihara, Kiyoshi [Kawasaki Medical School, Kurashiki, Okayama (Japan); Watanabe, Masazumi [Keio Univ., Tokyo (Japan). School of Medicine

    2002-06-01

    Whole-body 2-fluoro-2-D-[18F]deoxyglucose (FDG) positron emission tomography (WB-PET) may be more cost-effective than chest PET because WB-PET does not require conventional imaging (CI) for extrathoracic staging. The cost-effectiveness of WB-PET for the management of Japanese patients with non-small-cell lung carcinoma (NSCLC) was assessed. A decision-tree sensitivity analysis was designed, based on the two competing strategies of WB-PET vs. CI. WB-PET was assumed to have a sensitivity and specificity for detecting metastases of 90% to 100%, and CI of 80% to 90%. The prevalences of M1 disease were 34% and 20%. One thousand patients suspected of having NSCLC were simulated in each strategy. We surveyed the relevant literature for the choice of variables. Expected cost saving (CS) and expected life expectancy (LE) for NSCLC patients were calculated. The WB-PET strategy yielded an expected CS of $951 US to $1,493 US per patient and an expected LE of minus 0.0246 years to minus 0.0136 years per patient for the 71.4% NSCLC and 34% M1 disease prevalence at our hospital. PET avoided unnecessary bronchoscopies and thoracotomies for incurable and benign disease. Overall, the CS for each patient was $833 US to $2,010 US at NSCLC prevalences ranging from 10% to 90%. The LE of the WB-PET strategy was similar to that of the CI strategy. The CS and LE varied minimally between the two situations of 34% and 20% M1 disease prevalence. The introduction of a WB-PET strategy in place of CI for managing NSCLC patients is potentially cost-effective in Japan. (author)
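
    The rollback step behind a decision-tree sensitivity analysis of this kind can be sketched as follows. This is a minimal illustration, not the authors' model: the tree has only two chance nodes (disease status, then test result), and every cost in HYPOTHETICAL_COSTS is an invented placeholder. Only the 34% prevalence and the 90% to 100% sensitivity range are taken from the abstract.

```python
# Hypothetical per-patient costs for each branch outcome (NOT from the study).
HYPOTHETICAL_COSTS = {"tp": 100.0, "fn": 500.0, "tn": 50.0, "fp": 300.0}


def expected_cost(prevalence, sensitivity, specificity, costs):
    """Roll back a two-level chance tree: disease status, then test result."""
    p_tp = prevalence * sensitivity                   # diseased, test positive
    p_fn = prevalence * (1.0 - sensitivity)           # diseased, test negative
    p_tn = (1.0 - prevalence) * specificity           # healthy, test negative
    p_fp = (1.0 - prevalence) * (1.0 - specificity)   # healthy, test positive
    return (p_tp * costs["tp"] + p_fn * costs["fn"]
            + p_tn * costs["tn"] + p_fp * costs["fp"])


def one_way_sensitivity(prevalence, sensitivities, specificity, costs):
    """Vary one input (test sensitivity) while holding the others fixed."""
    return [(s, expected_cost(prevalence, s, specificity, costs))
            for s in sensitivities]


if __name__ == "__main__":
    for s, cost in one_way_sensitivity(0.34, [0.90, 0.95, 1.00], 0.90,
                                       HYPOTHETICAL_COSTS):
        print(f"sensitivity={s:.2f} expected cost={cost:.2f}")
```

    Comparing two strategies then amounts to computing this expectation for each strategy's tree and taking the difference across the ranges of the uncertain inputs.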

  12. Decision Tree Diagnosis Analysis of Liver B Ultrasonic Feature (基于灰度共生矩阵的肝癌B超纹理特征决策树诊断分析)

    Institute of Scientific and Technical Information of China (English)

    张慧; 迟庆云; 刘彩霞

    2015-01-01

    Objective: To study the application of liver B ultrasonic image texture features in malignant liver lesions by analyzing liver B ultrasonic texture features with a data mining method based on the gray level co-occurrence matrix (GLCM) and decision tree classification. Method: 120 cases of liver B ultrasound images of normal liver, benign lesions and malignant tumors were randomly selected for analysis. After enhancement and denoising, the texture feature parameters were extracted by constructing GLCMs that reflect the information of the co-occurrence matrix at each angle. Diagnostic analysis was then performed in combination with a decision tree algorithm (all of the patients were examined with preoperative 2D ultrasound and confirmed by pathological examination). Results: Using this method, the classification accuracy for typical pathological liver images can reach 83.33%. For malignant lesions, the recall rate was 83.3%, the precision rate was 73.9%, the harmonic mean F_mean was 90.9% and the receiver operating characteristic (ROC) value was 85.3%. These results show that this method has a higher diagnostic rate. Conclusion: The texture feature calculation method in this paper is a rapid and effective way to analyze liver B ultrasonic texture features, with higher classification accuracy than other methods. This method may be an effective aid for clinical diagnosis and can provide a quantitative basis for the diagnosis of liver disease. It also provides typical data for image recognition, data mining and image indexing.
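
    A gray level co-occurrence matrix of the kind named in this abstract can be sketched in a few lines. The example below is an illustrative assumption rather than the authors' pipeline: the 4 x 4 image, the single horizontal offset, and the choice of contrast and energy as texture features are all hypothetical, and the matrix is not symmetrized.

```python
# Hypothetical 4x4 image quantized to 4 gray levels (0..3).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]


def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix for the pixel offset (dx columns, dy rows)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Sum of (i - j)^2 * p[i][j]; large when neighboring gray levels differ."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared probabilities; large for uniform, repetitive textures."""
    return sum(v * v for row in p for v in row)


if __name__ == "__main__":
    p = glcm(IMAGE, dx=1, dy=0, levels=4)  # horizontal neighbor (0 degrees)
    print(contrast(p), energy(p))
```

    Features computed at several offsets (for example 0, 45, 90 and 135 degrees) could then be fed to a decision tree classifier as input attributes.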

  13. EVFDT: An Enhanced Very Fast Decision Tree Algorithm for Detecting Distributed Denial of Service Attack in Cloud-Assisted Wireless Body Area Network

    Directory of Open Access Journals (Sweden)

    Rabia Latif

    2015-01-01

    Full Text Available Due to the scattered nature of DDoS attacks and the advancement of new technologies such as cloud-assisted WBAN, it becomes challenging to detect malicious activities by relying on conventional security mechanisms. The detection of such attacks demands an adaptive and incremental learning classifier capable of accurate decision making with less computation. However, DDoS attack detection using existing machine learning techniques requires the full data set to be stored in memory and is not appropriate for real-time network traffic. To overcome these shortcomings, the Very Fast Decision Tree (VFDT) algorithm, which can handle high-speed streaming data efficiently, has been proposed in the past. In data generated by WBAN sensors, noise is an obvious factor that severely affects accuracy and increases false alarms. In this paper, an enhanced VFDT (EVFDT) is proposed to efficiently detect the occurrence of DDoS attacks in cloud-assisted WBAN. EVFDT uses an adaptive tie-breaking threshold for node splitting. To resolve tree size expansion under extreme noise, a lightweight iterative pruning technique is proposed. To analyze the performance of EVFDT, four metrics are evaluated: classification accuracy, tree size, time, and memory. Simulation results show that EVFDT attains significantly high detection accuracy with fewer false alarms.
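
    The node-splitting test used by the standard VFDT family can be sketched as below. It relies on the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), together with a tie-breaking threshold tau. The abstract does not say how EVFDT adapts tau, so this sketch assumes a fixed tau; the gains and parameter values in the example are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound for n observations of a variable with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best, gain_second, value_range, delta, n, tau):
    """Split when the best attribute clearly wins, or when the bound is so
    tight that the two candidates are effectively tied (epsilon < tau)."""
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second) > eps or eps < tau


if __name__ == "__main__":
    # Hypothetical gains observed at a leaf after n = 1000 stream examples.
    print(should_split(0.30, 0.10, value_range=1.0, delta=1e-7, n=1000, tau=0.05))
```

    Because the bound shrinks as more examples reach a leaf, a leaf with two near-equal candidates eventually splits through the tie-breaking branch instead of waiting indefinitely.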

  14. Research of H5N6 Treatment by Comparing with H6N1 and H10N8 by Using Decision Tree and Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Kim Sunghyun

    2016-01-01

    Full Text Available Since 2003, 608 people in 15 countries have been infected with human-infectious AI viruses and 359 of them have died. In China in particular, the H6N1 and H10N8 viruses were widespread, and many people were infected and died. Recently, the H5N6 virus emerged in China and the number of patients has been increasing gradually. Therefore, this research compared the amino acid sequences of the Matrix Protein, Hemagglutinin, Neuraminidase and Nucleoprotein of H5N6, H6N1 and H10N8, using the Decision Tree and Apriori algorithms, to determine their similarity and devise a treatment. As a result, the Matrix Protein and Nucleoprotein sequences of H5N6 were found to be similar to those of H6N1 and H10N8. Therefore, this research concluded that treatment targeting those proteins of H6N1 and H10N8 will also be effective against H5N6.

  15. Fault detection and diagnosis for gas turbines based on a kernelized information entropy model.

    Science.gov (United States)

    Wang, Weiying; Xu, Zhiqiang; Tang, Rui; Li, Shuying; Wu, Wei

    2014-01-01

    Gas turbines are considered among the most important devices in power engineering and have been widely used in power generation, airplanes, and naval ships, as well as on oil drilling platforms. However, in most cases they are monitored without personnel on duty. It is highly desirable to develop techniques and systems to remotely monitor their condition and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall state of the gas paths of a gas turbine. In addition, we extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a given recognition task. Finally, we introduce an information entropy based decision tree algorithm to extract rules from fault samples. Experiments on real-world data show the effectiveness of the proposed algorithms.
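
    One plausible reading of "entropy as a uniformity measure" is sketched below: the exhaust temperature readings are normalized into a distribution and their Shannon entropy is divided by its maximum, log n. This is an assumption for illustration only; the abstract does not give the authors' generalized or kernelized formulation, and the readings used here are hypothetical.

```python
import math


def normalized_entropy(values):
    """Shannon entropy of the normalized readings, scaled to [0, 1].

    Returns 1.0 when all readings are equal (perfectly uniform) and smaller
    values as the readings become more uneven.
    """
    total = sum(values)
    probs = [v / total for v in values]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(values))


if __name__ == "__main__":
    healthy = [500.0] * 8                   # hypothetical thermocouple readings
    faulty = [500.0] * 7 + [900.0]          # one hot spot in the gas path
    print(normalized_entropy(healthy), normalized_entropy(faulty))
```

    A drop in this score over time would flag increasingly uneven exhaust temperatures, which is the kind of signal the abstract associates with the state of the gas path.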

  16. Fault Detection and Diagnosis for Gas Turbines Based on a Kernelized Information Entropy Model

    Directory of Open Access Journals (Sweden)

    Weiying Wang

    2014-01-01

    Full Text Available Gas turbines are considered among the most important devices in power engineering and have been widely used in power generation, airplanes, and naval ships, as well as on oil drilling platforms. However, in most cases they are monitored without personnel on duty. It is highly desirable to develop techniques and systems to remotely monitor their condition and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall state of the gas paths of a gas turbine. In addition, we extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a given recognition task. Finally, we introduce an information entropy based decision tree algorithm to extract rules from fault samples. Experiments on real-world data show the effectiveness of the proposed algorithms.

  17. A best-first soft/hard decision tree searching MIMO decoder for a 4 × 4 64-QAM system

    KAUST Repository

    Shen, Chungan

    2012-08-01

    This paper presents the algorithm and VLSI architecture of a configurable tree-searching approach that combines the features of classical depth-first and breadth-first methods. Based on this approach, techniques to reduce complexity while providing both hard- and soft-output decoding are presented. Furthermore, a single programmable parameter allows the user to trade off throughput against BER performance. The proposed multiple-input-multiple-output decoder supports a 4 × 4 64-QAM system and was synthesized with 65-nm CMOS technology at a 333 MHz clock frequency. For the hard-output scheme the design can achieve an average throughput of 257.8 Mbps at 24 dB signal-to-noise ratio (SNR) with an area equivalent to 54.2 Kgates and a power consumption of 7.26 mW. For the soft-output scheme it achieves an average throughput of 83.3 Mbps across the SNR range of interest with an area equivalent to 64 Kgates and a power consumption of 11.5 mW. © 2011 IEEE.

  18. Predicting aquatic toxicities of chemical pesticides in multiple test species using nonlinear QSTR modeling approaches.

    Science.gov (United States)

    Basant, Nikita; Gupta, Shikha; Singh, Kunwar P

    2015-11-01

    In this study, we established nonlinear quantitative-structure toxicity relationship (QSTR) models for predicting the toxicities of chemical pesticides in multiple aquatic test species following the OECD (Organization for Economic Cooperation and Development) guidelines. The decision tree forest (DTF) and decision tree boost (DTB) based QSTR models were constructed using a pesticides toxicity dataset in Selenastrum capricornutum and a set of six descriptors. Six other toxicity data sets were used for external validation of the constructed QSTRs. Global QSTR models were also constructed using the combined dataset of all seven species. The diversity in chemical structures and nonlinearity in the data were evaluated. Model validation was performed by deriving several statistical coefficients for the test data, and the prediction and generalization abilities of the QSTRs were evaluated. Both QSTR models identified WPSA1 (weighted charged partial positive surface area) as the most influential descriptor. The DTF and DTB QSTRs performed better than the single decision tree (SDT) and support vector machines (SVM) models used as benchmarks here and yielded R² of 0.886 and 0.964, respectively, between the measured and predicted toxicity values in the complete dataset (S. capricornutum). The QSTR models applied to toxicity data for the six other aquatic species yielded R² of >0.92 (DTF) and >0.97 (DTB), respectively. The prediction accuracies of the global models were comparable with those of the S. capricornutum models. The results suggest that the developed QSTR models can reliably predict the aquatic toxicity of chemicals and can be used for regulatory purposes.

  19. Analyze the soil attributes and sugarcane yield culture with the use of geostatistics and decision trees (Análise dos atributos do solo e da produtividade da cultura de cana-de-açúcar com o uso da geoestatística e árvore de decisão)

    Directory of Open Access Journals (Sweden)

    Zigomar Menezes de Souza

    2010-04-01

    , applying the cell criterion, by using a yield monitor that allowed the elaboration of a digital map representing the production surface of the studied area. To determine the soil attributes, soil samples were collected at the beginning of the 2006/2007 harvest using a regular grid of 50 x 50 m, at depths of 0.0-0.2 m and 0.2-0.4 m. Soil attributes and sugarcane yield data were analyzed using geostatistical techniques and were classified into three yield levels for the elaboration of the decision tree. The decision tree was induced in the software SAS Enterprise Miner, using an algorithm based on entropy reduction. Altitude and potassium presented the highest correlations with sugarcane yield. The induction of decision trees showed that altitude is the variable with the greatest potential to interpret the sugarcane yield maps, thus assisting in precision agriculture and proving a suitable tool for studying the definition of management zones in areas cropped with sugarcane.
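
    The entropy-reduction criterion mentioned in this abstract can be illustrated with a small sketch. It scores a candidate threshold on one attribute by how much it lowers the class entropy of the three yield levels. The altitude values and labels below are hypothetical, and the sketch makes no claim about how SAS Enterprise Miner implements the criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_reduction(values, labels, threshold):
    """Parent entropy minus the size-weighted entropy of the two children."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the threshold does not separate anything
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


if __name__ == "__main__":
    # Hypothetical altitudes (m) and yield classes for six grid points.
    altitude = [500, 510, 520, 530, 540, 550]
    yield_class = ["low", "low", "medium", "medium", "high", "high"]
    for t in (515, 525, 535):
        print(t, round(entropy_reduction(altitude, yield_class, t), 4))
```

    A tree inducer would evaluate such a score for every attribute and threshold and place the split with the largest reduction at the node.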

  20. Model-based geostatistics

    CERN Document Server

    Diggle, Peter J

    2007-01-01

    Model-based geostatistics refers to the application of general statistical principles of modeling and inference to geostatistical problems. This volume provides a treatment of model-based geostatistics and emphasizes statistical methods and applications. It also features analyses of datasets from a range of scientific contexts.

  1. A decision support model for improving a multi-family housing complex based on CO2 emission from electricity consumption.

    Science.gov (United States)

    Hong, Taehoon; Koo, Choongwan; Kim, Hyunjoong

    2012-12-15

    The number of deteriorated multi-family housing complexes in South Korea continues to rise, and consequently their electricity consumption is also increasing. This needs to be addressed as part of the nation's efforts to reduce energy consumption. The objective of this research was to develop a decision support model for determining the need to improve multi-family housing complexes. In this research, 1664 cases located in Seoul were selected for model development. The research team collected the characteristics and electricity consumption data of these projects for 2009-2010. The following were carried out in this research: (i) using the Decision Tree, multi-family housing complexes were clustered based on their electricity consumption; (ii) using Case-Based Reasoning, similar cases were retrieved from the same cluster; and (iii) using a combination of Multiple Regression Analysis, Artificial Neural Network, and Genetic Algorithm, the prediction performance of the developed model was improved. The results of this research can be used as follows: (i) as basic research data for continuously managing the energy consumption data of multi-family housing complexes; (ii) as advanced research data for predicting energy consumption based on project characteristics; (iii) as practical research data for selecting the multi-family housing complex with the greatest potential for energy savings; and (iv) as consistent and objective criteria for incentives and penalties.

  2. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Directory of Open Access Journals (Sweden)

    Tochukwu Chiagunye

    2015-08-01

    Full Text Available The design applied passive fault tolerance to a microcontroller-based data acquisition system to achieve the stated considerations: redundant sensors and microcontrollers with associated circuitry were designed and implemented to enable measurement of pollutant concentration information from chimney vents in two industries. Microsoft Visual Basic was used to develop a data mining tool which implemented an underlying artificial neural network (ANN) model for forecasting pollutant concentrations for future time periods. The feed-forward back-propagation method was used to train the ANN model with a training data set, while a decision tree algorithm was used to select an optimal output result for the model from its two output neurons.

  3. Diagnosis of three types of constant faults in read-once contact networks over finite bases

    KAUST Repository

    Busbait, Monther I.

    2016-03-24

    We study the depth of decision trees for diagnosis of three types of constant faults in read-once contact networks over finite bases containing only indecomposable networks. For each basis and each type of faults, we obtain a linear upper bound on the minimum depth of decision trees depending on the number of edges in networks. For bases containing networks with at most 10 edges, we find sharp coefficients for linear bounds.

  4. Diagnosis of constant faults in read-once contact networks over finite bases

    KAUST Repository

    Busbait, Monther I.

    2015-03-01

    We study the depth of decision trees for diagnosis of constant 0 and 1 faults in read-once contact networks over finite bases containing only indecomposable networks. For each basis, we obtain a linear upper bound on the minimum depth of decision trees depending on the number of edges in the networks. For bases containing networks with at most 10 edges we find coefficients for linear bounds which are close to sharp. © 2014 Elsevier B.V. All rights reserved.

  5. On Cascading small decision trees

    OpenAIRE

    Minguillón, Julià

    2003-01-01

    Available from TDX. Title obtained from the digitized cover. This thesis deals with the use of small decision trees for classification and data mining. The intuitive idea behind this thesis is that a sequence of small decision trees can perform better than a single large decision tree, reducing both the training cost and the exploitation cost. Our first goal was to develop a system capable of recognizing different types of elements present in a...

  6. Tempest in a Teapot – The Role of the Decision Tree in Enhancing Juror Comprehension and Whether It Interferes with the Jury’s Right to Deliberate Freely?

    Directory of Open Access Journals (Sweden)

    Marie Comiskey

    2016-06-01

    Full Text Available This article explores the potential of the decision tree (also referred to as a flow-chart, “Route to Verdict” or question-trail) to improve the legal comprehension of jurors in criminal trials. It examines why the decision tree has not yet been adopted as a mainstream jury aid in the United States and suggests that the hesitancy is rooted in longstanding distrust of any attempt to encroach on the freedom of the jury and the concern that a list of questions to guide jury deliberations may unduly influence and compel a verdict that the jury would not otherwise render. The findings of research from England, Canada, Australia and the United States on the effectiveness of decision trees in enhancing juror comprehension are discussed. The reliance on decision trees in medicine to facilitate patient comprehension of treatment options and to assist physicians in navigating complex treatment protocols is also considered as instructive for the legal system. The paper suggests that decision trees interfere neither with a defendant’s constitutional right to a jury trial nor with a jury’s right to deliberate freely, and that greater use of this tool should be considered given the promising indications from empirical research that decision trees can enhance jurors’ recall and comprehension of legal concepts. Any concerns about the potential misuse of decision trees are overstated and can be remedied through clear instructions to the jury.

  7. Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling.

    Directory of Open Access Journals (Sweden)

    Christos T Nakas

    Full Text Available Electronic Health Record (EHR) data can be a key resource for decision-making support in clinical practice in the "big data" era. The complete database of hospital admissions to Inselspital Bern, the largest Swiss University Hospital, from early 2012 to late 2015 was used in this study, involving over 100,000 admissions. Age, sex, and initial laboratory test results were the features/variables of interest for each admission, the outcome being inpatient mortality. Computational decision support systems were utilized for the calculation of the risk of inpatient mortality. We assessed the recently proposed Acute Laboratory Risk of Mortality Score (ALaRMS) model, and further built generalized linear models, generalized estimating equations, artificial neural networks, and decision tree systems for the predictive modeling of the risk of inpatient mortality. The Area Under the ROC Curve (AUC) for ALaRMS marginally corresponded to the anticipated accuracy (AUC = 0.858). Penalized logistic regression methodology provided a better result (AUC = 0.872). Decision tree and neural network-based methodology provided even higher predictive performance (up to AUC = 0.912 and 0.906, respectively). Additionally, decision tree-based methods can efficiently handle EHR data that have a significant amount of missing records (in up to >50% of the studied features), eliminating the need for imputation in order to have complete data. In conclusion, we show that statistical learning methodology can provide superior predictive performance in comparison to existing methods and can also be production ready. Statistical modeling procedures provided unbiased, well-calibrated models that can be efficient decision support tools for predicting inpatient mortality and assigning preventive measures.
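
    The AUC figures quoted above measure how often a model ranks a patient who died above one who survived. A minimal sketch of that pairwise definition follows; the risk scores and outcomes are hypothetical and unrelated to the study data, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: 1 for the positive class (e.g. inpatient death), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes for six admissions.
    risk = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40]
    died = [1, 1, 0, 1, 0, 0]
    print(auc(risk, died))
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading differences such as 0.858 versus 0.912.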

  8. Feature-based decision rules for control charts pattern recognition: A comparison between CART and QUEST algorithm

    Directory of Open Access Journals (Sweden)

    Shankar Chakraborty

    2012-01-01

    Full Text Available Control chart pattern (CCP) recognition can act as a problem identification tool in any manufacturing organization. Feature-based rules in the form of decision trees have become quite popular in recent years for CCP recognition. This is because practitioners can clearly understand how a particular pattern has been identified by the use of relevant shape features. Moreover, since the extracted features represent the main characteristics of the original data in a condensed form, they can also facilitate efficient pattern recognition. The reported feature-based decision trees can recognize eight types of CCPs using extracted values of seven shape features. In this paper, a different set of the seven most useful features is presented that can recognize nine main CCPs, including the mixture pattern. Based on these features, decision trees are developed using the CART (classification and regression tree) and QUEST (quick unbiased efficient statistical tree) algorithms. The relative performance of the CART- and QUEST-based decision trees is extensively studied using simulated pattern data. The results show that the CART-based decision trees give better recognition performance but lower consistency, whereas the QUEST-based decision trees give better consistency but lower recognition performance.

  9. Perspective: Materials Informatics and Big Data: Realization of the Fourth Paradigm of Science in Materials Science

    Science.gov (United States)

    2016-08-17

    techniques can be used. Some techniques are capable of doing both classification and regression. There also exist several ensemble learning techniques...
    Decision stump [21]; Both; A weak tree-based machine learning model consisting of a single-level decision tree
    J48 (C4.5) decision tree [22]; Classification; A...decision tree model that identifies the splitting attribute based on information gain/gini impurity
    Alternating decision tree [23]; Classification; Tree
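
    A decision stump of the kind listed in this excerpt can be sketched as a single threshold test on one numeric attribute. The version below chooses the threshold by minimum weighted Gini impurity, which is one common choice and an assumption here; the feature values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(values, labels):
    """Return (threshold, left_label, right_label) minimizing weighted Gini."""
    best = None
    xs = sorted(set(values))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0  # candidate threshold midway between neighbors
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1:]


def predict(stump, x):
    """Classify x with the single-level tree."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


if __name__ == "__main__":
    stump = fit_stump([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
    print(stump, predict(stump, 3.7))
```

    Stumps like this are weak on their own and are typically used as base learners inside ensemble methods such as boosting.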

  10. Improvement of Tone Intelligibility for Average-Voice-Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2012-01-01

    Full Text Available Problem statement: Tone intelligibility in speech synthesis is an important attribute that should be taken into account. The tone correctness of the synthetic speech is degraded considerably in average-voice-based HMM-based Thai speech synthesis. The tying mechanism in decision-tree-based context clustering, applied without an appropriate criterion, causes unexpected tone neutralization. Incorporation of phrase intonation into the context clustering process in the training stage was proposed earlier. However, the resulting tone correctness is not yet satisfactory. Approach: This study proposes a number of tonal features, including tone-geometrical features and phrase intonation features, to be exploited in the context clustering process of the HMM training stage. Results: In the experiments, subjective evaluations of both the average voice and the adapted voice in terms of tone intelligibility are conducted. The effects of the extracted features on the decision trees are also evaluated. Taking the gender of the training speech into account, two core experiments were conducted. The first experiment shows that the proposed tonal features improve tone intelligibility more for the female speech model than for the male speech model, while the second experiment shows that the proposed tonal features improve tone intelligibility more for the gender-dependent model than for the gender-independent model. Conclusion: All of the experimental results confirm that the tone correctness of the speech synthesized by average-voice-based HMM-based Thai speech synthesis is significantly improved when using most of the extracted features.

  11. Web Based VRML Modelling

    NARCIS (Netherlands)

    Kiss, S.

    2001-01-01

    Presents a method to connect VRML (Virtual Reality Modeling Language) and Java components in a Web page using EAI (External Authoring Interface), which makes it possible to interactively generate and edit VRML meshes. The meshes used are based on regular grids, to provide an interaction and modeling

  12. Designing efficient nitrous oxide sampling strategies in agroecosystems using simulation models

    Science.gov (United States)

    Saha, Debasish; Kemanian, Armen R.; Rau, Benjamin M.; Adler, Paul R.; Montes, Felipe

    2017-04-01

    Annual cumulative soil nitrous oxide (N2O) emissions calculated from discrete chamber-based flux measurements have unknown uncertainty. We used outputs from simulations obtained with an agroecosystem model to design sampling strategies that yield accurate cumulative N2O flux estimates with a known uncertainty level. Daily soil N2O fluxes were simulated for Ames, IA (corn-soybean rotation), College Station, TX (corn-vetch rotation), Fort Collins, CO (irrigated corn), and Pullman, WA (winter wheat), representing diverse agro-ecoregions of the United States. Fertilization source, rate, and timing were site-specific. These simulated fluxes served as surrogates for daily measurements in the analysis. We "sampled" the fluxes using a fixed interval (1-32 days) or a rule-based (decision tree-based) sampling method. Two types of decision trees were built: a high-input tree (HI) that included soil inorganic nitrogen (SIN) as a predictor variable, and a low-input tree (LI) that excluded SIN. Other predictor variables were identified with Random Forest. The decision trees were inverted to be used as rules for sampling a representative number of members from each terminal node. The uncertainty of the annual N2O flux estimation increased along with the fixed interval length. A 4- and 8-day fixed sampling interval was required at College Station and Ames, respectively, to yield ±20% accuracy in the flux estimate; a 12-day interval rendered the same accuracy at Fort Collins and Pullman. Both the HI and the LI rule-based methods provided the same accuracy as that of the fixed interval method with up to a 60% reduction in sampling events, particularly at locations with greater temporal flux variability. For instance, at Ames, the HI rule-based and the fixed interval methods required 16 and 91 sampling events, respectively, to achieve the same absolute bias of 0.2 kg N ha⁻¹ yr⁻¹ in estimating cumulative N2O flux. These results suggest that using simulation models along with decision trees can reduce
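
The fixed-interval sampling idea in the abstract above can be illustrated with a minimal sketch. It assumes a synthetic daily flux series and trapezoidal integration between sampling days; the function names, the synthetic pulse, and the integration rule are illustrative assumptions, not the authors' procedure or data.

```python
import math

def cumulative_flux(days, fluxes):
    """Trapezoidal integration of a flux series observed on the given days."""
    total = 0.0
    for i in range(len(days) - 1):
        total += 0.5 * (fluxes[i] + fluxes[i + 1]) * (days[i + 1] - days[i])
    return total

def fixed_interval_estimate(daily_flux, interval):
    """Sample every `interval` days (always keeping the last day) and integrate."""
    n = len(daily_flux)
    days = list(range(0, n, interval))
    if days[-1] != n - 1:
        days.append(n - 1)
    return cumulative_flux(days, [daily_flux[d] for d in days])

if __name__ == "__main__":
    # Hypothetical series: low background flux plus one short emission pulse.
    daily = [0.01 + 0.5 * math.exp(-((d - 120) / 5.0) ** 2) for d in range(365)]
    reference = fixed_interval_estimate(daily, 1)
    for k in (4, 8, 16, 32):
        estimate = fixed_interval_estimate(daily, k)
        print(k, "day interval, relative bias (%):",
              round(100 * (estimate - reference) / reference, 1))
```

Longer intervals are more likely to miss or misweight a short pulse, which is the motivation the abstract gives for rule-based sampling.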

  13. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS

    Science.gov (United States)

    Tien Bui, Dieu; Pradhan, Biswajeet; Nampak, Haleh; Bui, Quang-Thanh; Tran, Quynh-An; Nguyen, Quoc-Phi

    2016-09-01

    This paper proposes a new artificial intelligence approach based on a neural fuzzy inference system and metaheuristic optimization for flood susceptibility modeling, namely MONF. In the new approach, the neural fuzzy inference system was used to create an initial flood susceptibility model, and the model was then optimized using two metaheuristic algorithms, Evolutionary Genetic and Particle Swarm Optimization. A high-frequency tropical cyclone area of the Tuong Duong district in Central Vietnam was used as a case study. First, a GIS database for the study area was constructed. The database, which includes 76 historical flood inundated areas and ten flood influencing factors, was used to develop and validate the proposed model. Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Receiver Operating Characteristic (ROC) curve, and area under the ROC curve (AUC) were used to assess the model performance and its prediction capability. Experimental results showed that the proposed model has high performance on both the training (RMSE = 0.306, MAE = 0.094, AUC = 0.962) and validation dataset (RMSE = 0.362, MAE = 0.130, AUC = 0.911). The usability of the proposed model was evaluated by comparing its results with those obtained from state-of-the-art benchmark soft computing techniques such as J48 Decision Tree, Random Forest, Multi-layer Perceptron Neural Network, Support Vector Machine, and Adaptive Neuro Fuzzy Inference System. The results show that the proposed MONF model outperforms the above benchmark models; we conclude that the MONF model is a new alternative tool that should be used in flood susceptibility mapping. The result in this study is useful for planners and decision makers for sustainable management of flood-prone areas.
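
The three performance measures named in the abstract above (RMSE, MAE, AUC) can be computed as in the following minimal sketch. It assumes binary labels (1 = flooded, 0 = not flooded) and model scores in [0, 1]; the function names and the sample values are hypothetical, and the AUC is computed by direct pairwise comparison rather than by any particular library routine.

```python
import math

def rmse(y_true, y_score):
    """Root mean square error between labels and predicted scores."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true))

def mae(y_true, y_score):
    """Mean absolute error between labels and predicted scores."""
    return sum(abs(t - s) for t, s in zip(y_true, y_score)) / len(y_true)

def auc(y_true, y_score):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive case gets the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    labels = [1, 0, 1, 0]            # hypothetical observations
    scores = [0.9, 0.2, 0.6, 0.7]    # hypothetical model outputs
    print(rmse(labels, scores), mae(labels, scores), auc(labels, scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small validation set but would normally be replaced by a rank-based computation for large ones.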

  14. Structural Equation Model Trees

    Science.gov (United States)

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  15. Spectral classification of planted area with sugarcane through the decision tree

    Directory of Open Access Journals (Sweden)

    Rafael C. Delgado

    2012-04-01

    Full Text Available This study was carried out to test the "decision tree" classifier on data from orbital sensors, to identify areas planted with sugarcane at different planting dates at Boa Fé Farm, located in the Triângulo Mineiro, more specifically in the municipality of Conquista, Minas Gerais, Brazil. Remote sensing (RS) techniques were coupled with a Geographic Information System (GIS) module, allowing a temporal analysis of land use and occupation, especially in order to identify and monitor agricultural areas. Based on the calculation of mean bias (VM), this study showed that in sugarcane areas where irrigation is frequent and significant rainfall occurs before the passage of the Landsat-5 satellite, the values were slightly underestimated, with the value of this indicator equal to -0.13 ha. It was also verified that the highest NDVI values led to a slight overestimation of the results, with mean bias values ranging from 0.04 to 0.23 ha. According to the results, the decision tree classifier showed great potential for mapping areas cultivated with sugarcane.

  16. Improved diagnosis in children with partial epilepsy using a multivariable prediction model based on EEG network characteristics.

    Directory of Open Access Journals (Sweden)

    Eric van Diessen

    Full Text Available BACKGROUND: Electroencephalogram (EEG) acquisition is routinely performed to support an epileptic origin of paroxysmal events in patients referred with a possible diagnosis of epilepsy. However, in children with partial epilepsies the interictal EEGs are often normal. We aimed to develop a multivariable diagnostic prediction model based on electroencephalogram functional network characteristics. METHODOLOGY/PRINCIPAL FINDINGS: Routinely performed interictal EEG recordings at first presentation of 35 children diagnosed with partial epilepsies, and of 35 children in whom the diagnosis of epilepsy was excluded (control group), were used to develop the prediction model. Children with partial epilepsy were individually matched on age and gender with children from the control group. Periods of resting-state EEG, free of abnormal slowing or epileptiform activity, were selected to construct functional networks of correlated activity. We calculated multiple network characteristics previously used in functional network epilepsy studies and used these measures to build a robust, decision tree based, prediction model. Based on epileptiform EEG activity only, EEG results supported the diagnosis of epilepsy with a sensitivity and specificity of 0.77 and 0.91, respectively. In contrast, the prediction model had a sensitivity of 0.96 [95% confidence interval: 0.78-1.00] and specificity of 0.95 [95% confidence interval: 0.76-1.00] in correctly differentiating patients from controls. The overall discriminative power, quantified as the area under the receiver operating characteristic curve, was 0.89, defined as an excellent model performance. The need for a multivariable network analysis to improve diagnostic accuracy was emphasized by the lack of discriminatory power using single network characteristics or the EEG's power spectral density. CONCLUSIONS/SIGNIFICANCE: Diagnostic accuracy in children with partial epilepsy is substantially improved with a model combining functional

  17. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers

    Directory of Open Access Journals (Sweden)

    Zhibin Xiao

    2017-02-01

    Full Text Available Recognition of transportation modes can be used in different applications including human behavior research, transport management and traffic control. Previous work on transportation mode recognition has often relied on using multiple sensors or matching Geographic Information System (GIS) information, which is not possible in many cases. In this paper, an approach based on ensemble learning is proposed to infer hybrid transportation modes using only Global Positioning System (GPS) data. First, in order to distinguish between different transportation modes, we used a statistical method to generate global features and extract several local features from sub-trajectories after trajectory segmentation, before these features were combined in the classification stage. Second, to obtain a better performance, we used tree-based ensemble models (Random Forest, Gradient Boosting Decision Tree, and XGBoost) instead of traditional methods (K-Nearest Neighbor, Decision Tree, and Support Vector Machines) to classify the different transportation modes. The experimental results have shown the efficacy of our proposed approach. Among the models, XGBoost produced the best performance, with a classification accuracy of 90.77% obtained on the GEOLIFE dataset, and we used a tree-based ensemble method to ensure accurate feature selection and to reduce the model complexity.
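
The feature-generation step described in the abstract above can be sketched as follows. This is a minimal illustration that assumes a trajectory is a time-sorted list of `(t_seconds, lat, lon)` tuples and that the global features are simple speed statistics; the feature set, the function names, and the sample points are hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speeds(points):
    """Speeds in m/s between consecutive fixes; zero-duration steps are skipped."""
    out = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:
            out.append(haversine_m(la0, lo0, la1, lo1) / dt)
    return out

def global_features(points):
    """Hypothetical global features of one trajectory: mean, max and std of speed."""
    v = speeds(points)
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return {"mean_speed": mean, "max_speed": max(v), "std_speed": math.sqrt(var)}

if __name__ == "__main__":
    track = [(0, 0.0, 0.0), (10, 0.0, 0.001), (20, 0.0, 0.002)]  # made-up fixes
    print(global_features(track))
```

A feature dictionary like this, computed per trajectory or per sub-trajectory, is the kind of input a tree-based ensemble classifier would then be trained on.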

  18. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Full Text Available Educational Data Mining (EDM) is gaining great importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. Such data are growing steadily and contain hidden knowledge that could be very useful to the users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter reveals itself as the tool to achieve such discovery. Data mining must cope with very complex and varied situations to reach quality solutions. Therefore, data mining is a research field where many advances are being made to accommodate and solve emerging problems. For this purpose, many techniques are usually considered. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Concretely, we have used top-down induction of decision trees algorithms to extract the patterns because these models, decision trees, are easily understandable. In addition, the validation processes conducted have assured high-quality models.

  19. Assessment for the Model Predicting of the Cognitive and Language Ability in the Mild Dementia by the Method of Data-Mining Technique

    Directory of Open Access Journals (Sweden)

    Haewon Byeon

    2016-06-01

    Full Text Available Assessments of cognitive and verbal functions are widely used as screening tests to detect early dementia. This study developed an early dementia prediction model for Korean elderly based on the random forest algorithm and compared its results and precision with those of a logistic regression model and a decision tree model. Subjects of the study were 418 elderly (135 males and 283 females) over the age of 60 in local communities. The outcome was defined as having dementia, and explanatory variables included digit span forward, digit span backward, confrontational naming, Rey Complex Figure Test (RCFT) copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, RCFT recognition false positive, Seoul Verbal Learning Test (SVLT) immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, Korean Color Word Stroop Test (K-CWST) color reading correct, and K-CWST color reading error. The Random Forests algorithm was used to develop the prediction model, and the result was compared with a logistic regression model and a decision tree based on chi-square automatic interaction detector (CHAID). As a result of the study, the tests with a high level of predictive power in the detection of early dementia were verbal memory, visuospatial memory, naming, visuospatial functions, and executive functions. In addition, the random forests model was more accurate than logistic regression and CHAID. In order to effectively detect early dementia, screening test programs composed of tests with high predictive power need to be developed.

  20. Text Classifier Based on an Improved SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    赵天昀

    2010-01-01

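    Combining SVMs with a binary decision tree to form an SVM decision tree handles multi-class text classification well. Building on this, a between-class separability measure based on support vector data description (SVDD) is introduced to improve the SVM decision tree classifier. Experiments show that the method effectively improves the classification accuracy and speed of the SVM decision tree multi-class classifier.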

  1. A Packet-classification Algorithm Based on Hash and AQT Decision Tree

    Institute of Scientific and Technical Information of China (English)

    赵国锋; 陈群丽

    2010-01-01

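    Multi-dimensional packet classification is an important component of technologies such as network security, network measurement, quality of service, and flow routing, yet designing a packet classification algorithm that performs well in both time and space is very difficult. Based on a study of existing classical IP packet classification algorithms, and exploiting the fact that the protocol type field takes only a limited set of values, a new IP packet classification algorithm based on a hash function and an AQT decision tree is proposed. Simulation results show that, compared with traditional packet classification algorithms, the algorithm has lower time and space complexity.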

  2. Applied Research on Data Mining Based on CART Decision Tree Algorithm

    Institute of Scientific and Technical Information of China (English)

    陈辉林; 夏道勋

    2011-01-01

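    The classification and regression tree (CART) algorithm is an important algorithm in data mining. Following CART theory, the decision tree is solved using categorical variables and an optimized splitting function is introduced. A binary tree is then built by partitioning the domain of the categorical variables, and prediction rules are extracted and filtered, providing a scientific and reliable basis for decision-making by functional departments. Finally, an application example is given using teaching and management data from Guizhou Normal University.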
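
A binary CART split on one categorical variable, as referred to in the abstract above, can be sketched as follows. The sketch assumes the standard Gini impurity criterion and a simple "one value versus the rest" partition; the paper's optimized splitting function is not specified in the abstract, so this is an illustration of plain CART only, and all names and sample data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Try each 'attribute == v' versus 'attribute != v' split and return
    (v, weighted Gini impurity) for the split with the lowest impurity."""
    n = len(labels)
    best = None
    for v in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x == v]
        right = [y for x, y in zip(values, labels) if x != v]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (v, score)
    return best

if __name__ == "__main__":
    grade_band = ["a", "a", "b", "c"]   # hypothetical categorical attribute
    outcome = [1, 1, 0, 0]              # hypothetical class labels
    print(best_binary_split(grade_band, outcome))
```

Full CART would also consider splits that group several categories on one side and would apply the search recursively to each child node.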

  3. Analysis of Polytechnic Students' Entrepreneurial Intent Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    陈玉珍

    2010-01-01

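    The concept of decision trees and the method of generating a decision tree with the CART algorithm are introduced. A decision tree model built with CART for analyzing the factors behind higher vocational students' entrepreneurial intent is presented, and some key rules and patterns of these students' entrepreneurial intent are analyzed and summarized.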

  4. Landcover Classification Based on Decision Tree with Abundance

    Institute of Scientific and Technical Information of China (English)

    张滔; 张友静; 谢丽军

    2010-01-01

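    Abundances obtained from mixed-pixel decomposition are added to the feature set and combined with spectral information and DEM data to generate decision classification rules. A land cover classification experiment was carried out with Landsat TM imagery for Maduo County in the source region of the Yellow River. A land cover map of the county was obtained through feature extraction, decision classification and post-processing. Accuracy was assessed using a 1:100,000 land cover map and field survey data. The results show that, compared with maximum likelihood classification and ordinary decision tree classification (without abundance information), the decision tree with abundance improved classification accuracy by 17.3% and 9.5%, respectively.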

  5. Research on Keyphrases Extraction Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    严春风

    2009-01-01

    Keyword extraction can be regarded as a basic and core technology for all automatic text processing. Many documents have no keywords, while manual indexing is laborious, time-consuming and highly subjective. Automatic keyword indexing is therefore a technology worth researching.

  6. An Evaluation of Different Training Sample Allocation Schemes for Discrete and Continuous Land Cover Classification Using Decision Tree-Based Algorithms

    Directory of Open Access Journals (Sweden)

    René Roland Colditz

    2015-07-01

    Full Text Available Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for discrete and continuous land cover mapping using classification and regression tree algorithms. In addition to measures of discrete and continuous map accuracy, the correct estimation of the area is another important criterion. A subset of the 30 m national land cover dataset of 2006 (NLCD2006) of the United States was used as reference set to classify NADIR BRDF-adjusted surface reflectance time series of MODIS at 900 m spatial resolution. Results show that sampling of heterogeneous pixels and sample allocation according to the expected area of each class is best for classification trees. Regression trees for continuous land cover mapping should be trained with random allocation, and predictions should be normalized with a linear scaling function to correctly estimate the total area. Among the tested algorithms, random forest classification yields lower errors than boosted trees of C5.0, and Cubist shows higher accuracies than random forest regression.

  7. The relation of student behavior, peer status, race, and gender to decisions about school discipline using CHAID decision trees and regression modeling.

    Science.gov (United States)

    Horner, Stacy B; Fireman, Gary D; Wang, Eugene W

    2010-04-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about discipline. Exploratory results using classification tree analyses indicated students nominated as average or highly overtly aggressive were more likely to be disciplined than others. Among these students, race was the most significant predictor, with African American students more likely to be disciplined than Caucasians, Hispanics, or Others. Among the students nominated as low in overt aggression, a lack of prosocial behavior was the most significant predictor. Confirmatory analysis using hierarchical logistic regression supported the exploratory results. Similarities with other biased referral patterns, proactive classroom management strategies, and culturally sensitive recommendations are discussed.

  8. Statistical Model for Prediction of Diabetic Foot Disease in Type 2 Diabetic Patients

    Directory of Open Access Journals (Sweden)

    Raúl López Fernández

    2016-02-01

    Full Text Available Background: the need to predict and study diabetic foot problems is a critical issue and represents a major medical challenge. The reduction of its incidence can lead to positive results for improving the quality of life of patients and the impact on the socio-economic sphere, due to the high prevalence of diabetes in the working population. Objective: to design a statistical model for prediction of diabetic foot disease in type 2 diabetic patients. Methods: a descriptive study was conducted in patients attending the Diabetes Clinic in Cienfuegos from 2010 to 2013. Significant risk factors for diabetic foot disease were analyzed as variables. To design the model, binary logistic regression analysis and Chi-squared automatic interaction detection decision tree were used. Results: two models that behaved similarly based on the comparison criteria considered (percentage of correct classification, sensitivity and specificity) were developed. Validation was established through the receiver operating characteristic curve. The model using Chi-squared automatic interaction detection showed the best predictive results. Conclusions: Chi-squared automatic interaction detection decision trees have an adequate predictive capacity, which can be used in the Diabetes Clinic of Cienfuegos municipality.
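
The three comparison criteria named in the abstract above (percentage of correct classification, sensitivity and specificity) follow directly from a confusion matrix. The sketch below assumes binary labels with 1 for the positive class; the function name and the sample vectors are hypothetical and unrelated to the study's data.

```python
def classification_summary(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of actual positives detected
        "specificity": tn / (tn + fp),   # share of actual negatives detected
    }

if __name__ == "__main__":
    observed = [1, 1, 1, 0, 0, 0, 0, 0]    # made-up outcomes
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]   # made-up model predictions
    print(classification_summary(observed, predicted))
```

Two models can have similar accuracy while trading sensitivity against specificity differently, which is why the three figures are usually reported together.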

  9. Model Based Definition

    Science.gov (United States)

    Rowe, Sidney E.

    2010-01-01

    In September 2007, the Engineering Directorate at the Marshall Space Flight Center (MSFC) created the Design System Focus Team (DSFT). MSFC was responsible for the in-house design and development of the Ares 1 Upper Stage and the Engineering Directorate was preparing to deploy a new electronic Configuration Management and Data Management System with the Design Data Management System (DDMS) based upon a Commercial Off The Shelf (COTS) Product Data Management (PDM) System. The DSFT was to establish standardized CAD practices and a new data life cycle for design data. Of special interest here, the design teams were to implement Model Based Definition (MBD) in support of the Upper Stage manufacturing contract. It is noted that this MBD does use partially dimensioned drawings for auxiliary information to the model. The design data lifecycle implemented several new release states to be used prior to formal release that allowed the models to move through a flow of progressive maturity. The DSFT identified some 17 Lessons Learned as outcomes of the standards development, pathfinder deployments and initial application to the Upper Stage design completion. Some of the high value examples are reviewed.

  10. Machine Learning Approaches for Modeling Spammer Behavior

    CERN Document Server

    Islam, Md Saiful; Islam, Md Rafiqul

    2010-01-01

    Spam is commonly known as unsolicited or unwanted email messages on the Internet that pose a potential threat to Internet security. Users spend a valuable amount of time deleting spam emails. More importantly, ever-increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful at modeling spammer behavior, as spammers constantly change their tricks to circumvent these filters. The evasive tactics that spammers use are patterns, and these patterns can be modeled to combat spam. This paper investigates the possibilities of modeling spammer behavioral patterns with well-known classification algorithms such as the Naïve Bayesian classifier (Naïve Bayes), Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, which is a considerable improvement in performance compared to similar spammer behavior modeling research.

  11. Application of ID3 decision tree mining in teaching assistant system

    Institute of Scientific and Technical Information of China (English)

    樊妍妍

    2016-01-01

    To personalize a teaching assistant system, the ID3 decision tree algorithm is applied to students' online learning information for a given chapter of a given course. The classification rules that influence learning outcomes are identified, students' learning situations are analyzed, and personalized tips are given, so that students can be taught in accordance with their aptitude.
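
ID3 chooses the attribute to split on by information gain. The following minimal sketch shows that selection step on a tiny, made-up table of student records; the attribute names (`time_online`, `exercises_done`, `passed`) and the rows are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attrs, target):
    """Attribute with the highest information gain, as picked by ID3 at a node."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))

if __name__ == "__main__":
    records = [  # hypothetical online-learning records
        {"time_online": "high", "exercises_done": "yes", "passed": "yes"},
        {"time_online": "high", "exercises_done": "no", "passed": "no"},
        {"time_online": "low", "exercises_done": "yes", "passed": "yes"},
        {"time_online": "low", "exercises_done": "no", "passed": "no"},
    ]
    print(best_attribute(records, ["time_online", "exercises_done"], "passed"))
```

A complete ID3 implementation repeats this choice recursively on each subset until the labels are pure or no attributes remain; each root-to-leaf path then reads as one classification rule.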

  12. A Comparative Study on Noise Resistance of Two Heuristic Algorithms in Decision Tree Generation

    Institute of Scientific and Technical Information of China (English)

    周宁; 谢博鋆; 王涛

    2011-01-01

    The ability of a decision tree to resist noise is a key factor in the design of heuristic algorithms. The noise resistance of two heuristic algorithms, ID3 and DoI, is compared. Experimental comparison shows that decision trees built with the DoI algorithm have a certain advantage in resisting noise over those built with the ID3 algorithm.

  13. Testing and Treating Women after Unsuccessful Conservative Treatments for Overactive Bladder or Mixed Urinary Incontinence: A Model-Based Economic Evaluation Based on the BUS Study

    Science.gov (United States)

    Barton, Pelham; Middleton, Lee J.; Deeks, Jonathan J.; Daniels, Jane P.; Latthe, Pallavi; Coomarasamy, Arri; Rachaneni, Suneetha; McCooty, Shanteela; Verghese, Tina S.; Roberts, Tracy E.

    2016-01-01

    Objective To compare the cost-effectiveness of bladder ultrasonography, clinical history, and urodynamic testing in guiding treatment decisions in a secondary care setting for women failing first line conservative treatment for overactive bladder or urgency-predominant mixed urinary incontinence. Design Model-based economic evaluation from a UK National Health Service (NHS) perspective using data from the Bladder Ultrasound Study (BUS) and secondary sources. Methods Cost-effectiveness analysis using a decision tree and a 5-year time horizon based on the outcomes of cost per woman successfully treated and cost per Quality-Adjusted Life-Year (QALY). Deterministic and probabilistic sensitivity analyses, and a value of information analysis are also undertaken. Results Bladder ultrasonography is a more costly and less effective test-treat strategy than clinical history and urodynamics. Treatment on the basis of clinical history alone has an incremental cost-effectiveness ratio (ICER) of £491,100 per woman successfully treated and an ICER of £60,200 per QALY compared with the treatment of all women on the basis of urodynamics. Restricting the use of urodynamics to women with a clinical history of mixed urinary incontinence only is the optimal test-treat strategy on cost-effectiveness grounds with ICERs of £19,500 per woman successfully treated and £12,700 per QALY compared with the treatment of all women based upon urodynamics. Conclusions remained robust to sensitivity analyses, but subject to large uncertainties. Conclusions Treatment based upon urodynamics can be seen as a cost-effective strategy, and particularly when targeted at women with clinical history of mixed urinary incontinence only. Further research is needed to resolve current decision uncertainty. PMID:27513926
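
The incremental cost-effectiveness ratio (ICER) used in the abstract above is the difference in cost divided by the difference in effect between a strategy and its comparator. The sketch below shows that arithmetic together with a simple dominance check; the function names and all figures are hypothetical and are not the study's inputs or results.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost per unit of extra effect of a strategy versus a comparator."""
    d_cost = cost_new - cost_ref
    d_effect = effect_new - effect_ref
    if d_effect == 0:
        raise ValueError("effects are equal, so the ICER is undefined")
    return d_cost / d_effect

def is_dominated(cost_new, effect_new, cost_ref, effect_ref):
    """True when the strategy costs at least as much and is no more effective
    than the comparator, without being identical to it."""
    same = cost_new == cost_ref and effect_new == effect_ref
    return cost_new >= cost_ref and effect_new <= effect_ref and not same

if __name__ == "__main__":
    # Made-up per-patient costs (currency units) and effects (QALYs).
    print(icer(1500.0, 2.5, 1000.0, 2.0))
    print(is_dominated(1200.0, 1.9, 1000.0, 2.0))
```

A dominated strategy, such as one that is both more costly and less effective, is normally excluded before ICERs are compared against a willingness-to-pay threshold.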

  14. Probabilistic, meso-scale flood loss modelling

    Science.gov (United States)

    Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno

    2016-04-01

    Flood risk analyses are an important basis for decisions on flood risk management and adaptation. However, such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention in recent years, they are still not standard practice for flood risk assessments, and even less so for flood loss modelling. The state of the art in flood loss modelling is still the use of simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood loss models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we demonstrate and evaluate the upscaling of the approach to the meso-scale, namely on the basis of land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany (Botto et al. submitted). The application of bagging decision tree based loss models provides a probability distribution of estimated loss per municipality. Validation is undertaken on the one hand via a comparison with eight deterministic loss models including stage-damage functions as well as multi-variate models. On the other hand the results are compared with official loss data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of loss estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation approach is that it inherently provides quantitative information about the uncertainty of the prediction. References: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64. Botto A, Kreibich H, Merz B, Schröter K (submitted) Probabilistic, multi-variable flood loss modelling on the meso-scale with BT-FLEMO. Risk Analysis.

  15. Reputation Detection of Credit Card Based on SVM

    Institute of Scientific and Technical Information of China (English)

    周宓

    2012-01-01

    Methods are given for building a credit card reputation detection model based on a support vector machine and one based on a decision tree. Building on these two single classifiers, the preference characteristics of the support vector machine and decision tree methods in credit card reputation detection are summarized, and a method for building a combined classifier model based on these preference characteristics is proposed.

  16. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (added parameter: Length), based on health care services for a public sector hospital in Iran, with the idea that there is a contrast between patient and customer loyalty, to estimate customer lifetime value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as the classification technique to segment the patients and find target, potential and loyal customers in order to implement a strengthened CRM. Two approaches are used for classification: first, the result of clustering is considered as the Decision attribute in the classification process and second, the result of segmentation based on the CLV value of patients (estimated by RFML) is considered as the Decision attribute. Finally, the results of the CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177

  18. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    Energy Technology Data Exchange (ETDEWEB)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    2014-03-15

    Ensemble learning based decision treeboost (DTB) and decision tree forest (DTF) models are introduced to establish a quantitative structure–toxicity relationship (QSTR) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented with stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the prediction ability and robustness of the models, investigated in both external and 10-fold cross-validation processes. On the complete data, the optimal DTB and DTF models yielded accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external toxicity data of T. pyriformis. The regression models (DTB and DTF) constructed using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064, on the complete T. pyriformis data. Applied to the external toxicity data sets, the T. pyriformis regression models (DTB and DTF) yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the
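
    The Tanimoto similarity index used above to assess structural diversity can be sketched as follows, assuming each chemical is represented by the set of "on" bit positions in a binary fingerprint. The fingerprints shown are made up, and the abstract does not say which fingerprint type the authors used.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits.

    Defined as |A intersect B| / |A union B|; 1.0 means identical, 0.0 means disjoint.
    """
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical fingerprints (bit positions that are set).
chem_1 = {1, 2, 3}
chem_2 = {2, 3, 4}
similarity = tanimoto(chem_1, chem_2)  # 2 shared bits out of 4 distinct bits
```

    A data set whose pairwise similarities are mostly low would be considered structurally diverse in this sense.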

  19. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases...

  20. Support vector machine model for diagnosing pneumoconiosis based on wavelet texture features of digital chest radiographs.

    Science.gov (United States)

    Zhu, Biyun; Chen, Hui; Chen, Budong; Xu, Yan; Zhang, Kuan

    2014-02-01

    This study aims to explore the classification ability of decision trees (DTs) and support vector machines (SVMs) to discriminate between the digital chest radiographs (DRs) of pneumoconiosis patients and control subjects. Twenty-eight wavelet-based energy texture features were calculated at the lung fields on DRs of 85 healthy controls and 40 patients with stage I and stage II pneumoconiosis. DTs with algorithm C5.0 and SVMs with four different kernels were trained by samples with two combinations of the texture features to classify a DR as of a healthy subject or of a patient with pneumoconiosis. All of the models were developed with fivefold cross-validation, and the final performances of each model were compared by the area under receiver operating characteristic (ROC) curve. For both SVM (with a radial basis function kernel) and DT (with algorithm C5.0), areas under ROC curves (AUCs) were 0.94 ± 0.02 and 0.86 ± 0.04 (P = 0.02) when using the full feature set and 0.95 ± 0.02 and 0.88 ± 0.04 (P = 0.05) when using the selected feature set, respectively. When built on the selected texture features, the SVM with a polynomial kernel showed a higher diagnostic performance with an AUC value of 0.97 ± 0.02 than SVMs with a linear kernel, a radial basis function kernel and a sigmoid kernel with AUC values of 0.96 ± 0.02 (P = 0.37), 0.95 ± 0.02 (P = 0.24), and 0.90 ± 0.03 (P = 0.01), respectively. The SVM model with a polynomial kernel built on the selected feature set showed the highest diagnostic performance among all tested models when using either all the wavelet texture features or the selected ones. The model has a good potential in diagnosing pneumoconiosis based on digital chest radiographs.
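
    The area under the ROC curve used to compare the models can be computed directly from classifier scores. The sketch below uses the pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half); the scores and labels are invented and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a positive case (patient), 0 for a negative case (control).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)  # 3 of 4 pairs ranked correctly
```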

  1. Model Construct Based Enterprise Model Architecture and Its Modeling Approach

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    To support enterprise integration, this paper studies a model construct based enterprise model architecture and its modeling approach. First, the structural makeup and internal relationships of the enterprise model architecture are discussed. Then, the concept of a reusable model construct (MC), which belongs to the control view and can help to derive other views, is proposed. The modeling approach based on model constructs consists of three steps: reference model architecture synthesis, enterprise model customization, and system design and implementation. A case study set in one-of-a-kind-product machinery manufacturing enterprises illustrates the MC based modeling approach. It shows that the proposed model construct based enterprise model architecture and modeling approach are practical and efficient.

  2. HMM-based Trust Model

    DEFF Research Database (Denmark)

    ElSalamouny, Ehab; Nielsen, Mogens; Sassone, Vladimiro

    2010-01-01

    with their dynamic behaviour. Using Hidden Markov Models (HMMs) for both modelling and approximating the behaviours of principals, we introduce the HMM-based trust model as a new approach to evaluating trust in systems exhibiting dynamic behaviour. This model avoids the fixed behaviour assumption which is considered...... the major limitation of existing Beta trust model. We show the consistency of the HMM-based trust model and contrast it against the well known Beta trust model with the decay principle in terms of the estimation precision....
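
    For orientation, the Beta trust model with decay that the abstract uses as its baseline can be sketched as below. The abstract gives no formulas, so this follows the commonly used formulation in which trust is the mean of a Beta(s + 1, f + 1) distribution over success and failure counts, and decay multiplies the old counts by a factor before each new observation. The function name and the sample history are assumptions for illustration; the HMM-based model itself is not reproduced here.

```python
def beta_trust(outcomes, decay=1.0):
    """Estimate trust from a sequence of interaction outcomes.

    outcomes: iterable of booleans, oldest first (True = satisfactory interaction).
    decay: factor in [0, 1] applied to old evidence before each new observation;
           1.0 means no forgetting.
    Returns the mean of Beta(s + 1, f + 1), i.e. (s + 1) / (s + f + 2).
    """
    successes = 0.0
    failures = 0.0
    for ok in outcomes:
        successes *= decay
        failures *= decay
        if ok:
            successes += 1.0
        else:
            failures += 1.0
    return (successes + 1.0) / (successes + failures + 2.0)


# Hypothetical history: two good interactions followed by a recent bad one.
history = [True, True, False]
trust_no_decay = beta_trust(history)            # all evidence weighted equally
trust_with_decay = beta_trust(history, 0.5)     # the recent failure weighs more
```

    With decay, the recent failure dominates and the estimate drops, which is the mechanism the Beta model relies on to follow behaviour that changes over time.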

  3. Model-based Software Engineering

    DEFF Research Database (Denmark)

    2010-01-01

    The vision of model-based software engineering is to make models the main focus of software development and to automatically generate software from these models. Part of that idea works already today. But, there are still difficulties when it comes to behaviour. Actually, there is no lack in models...

  4. Model-Based Reasoning

    Science.gov (United States)

    Ifenthaler, Dirk; Seel, Norbert M.

    2013-01-01

    In this paper, there will be a particular focus on mental models and their application to inductive reasoning within the realm of instruction. A basic assumption of this study is the observation that the construction of mental models and related reasoning is a slowly developing capability of cognitive systems that emerges effectively with proper…

  5. Decision tree algorithm for automatically extracting mangrove forest information from Landsat 8 OLI imagery

    Institute of Scientific and Technical Information of China (English)

    张雪红

    2016-01-01

    The normalized difference moisture index (NDMI) is widely used to assess and retrieve vegetation liquid water content. In this study, a decision tree method was used to automatically extract mangrove forest information from Landsat 8 OLI imagery acquired at the Shankou Mangrove National Ecosystem Nature Reserve in Guangxi, using NDMI and a modified normalized difference pond index (MNDPI), adapted to mangrove characteristics, as classification features. The results show that, because of the unique near-shore coastal wetland habitat of mangrove forests, their spectra contain both vegetation and wetland information. MNDPI and NDMI capture the spectral contrast between the shortwave infrared region and, respectively, the visible and near-infrared regions. The two spectral indices can therefore be used to extract wetland vegetation and to discriminate mangrove forests effectively from other land cover types. The decision tree built on the MNDPI and NDMI classification features extracted mangrove forest information effectively from Landsat 8 OLI remotely sensed data. The commission and omission errors for mangrove forests were 5.34% and 1.69%, respectively.
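
    As a minimal sketch of the NDMI feature, the standard definition is (NIR - SWIR1) / (NIR + SWIR1), which for Landsat 8 OLI corresponds to bands 5 and 6. The reflectance values and the threshold in the single decision node below are hypothetical; the abstract reports neither the thresholds nor the MNDPI formula, so MNDPI is not reproduced here.

```python
def ndmi(nir, swir1):
    """Normalized difference moisture index from NIR and SWIR1 reflectance.

    For Landsat 8 OLI, nir is band 5 and swir1 is band 6.
    """
    denominator = nir + swir1
    if denominator == 0:
        return 0.0  # avoid division by zero for empty or masked pixels
    return (nir - swir1) / denominator


def passes_moisture_node(ndmi_value, threshold=0.2):
    """One decision-tree node; the 0.2 threshold is a placeholder, not from the paper."""
    return ndmi_value > threshold


# Hypothetical surface reflectances for one pixel.
pixel_ndmi = ndmi(nir=0.4, swir1=0.2)
is_candidate = passes_moisture_node(pixel_ndmi)
```

    In a full tree, pixels passing this node would be tested against further nodes, such as one on MNDPI, before being labelled as mangrove.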

  6. Principles of models based engineering

    Energy Technology Data Exchange (ETDEWEB)

    Dolin, R.M.; Hefele, J.

    1996-11-01

    This report describes a Models Based Engineering (MBE) philosophy and implementation strategy that has been developed at Los Alamos National Laboratory`s Center for Advanced Engineering Technology. A major theme in this discussion is that models based engineering is an information management technology enabling the development of information driven engineering. Unlike other information management technologies, models based engineering encompasses the breadth of engineering information, from design intent through product definition to consumer application.

  7. Element-Based Computational Model

    Directory of Open Access Journals (Sweden)

    Conrad Mueller

    2012-02-01

    Full Text Available A variation on the data-flow model is proposed for developing parallel architectures. While the model is data driven, it differs significantly from the data-flow model. The proposed model has an evaluation cycle of processing elements (encapsulated data) that is similar to the instruction cycle of the von Neumann model. The elements contain the information required to process them. The model is inherently parallel. An emulation of the model has been implemented. The objective of this paper is to motivate support for taking the research further. Using matrix multiplication as a case study, the element/data-flow based model is compared with the instruction-based model. This is done using complexity analysis followed by empirical testing to verify this analysis. The positive results are given as motivation for the research to be taken to the next stage, that is, implementing the model using FPGAs.

  8. An Efficient Machine Learning Based Classification Scheme for Detecting Distributed Command & Control Traffic of P2P Botnets

    Directory of Open Access Journals (Sweden)

    Pijush Barthakur

    2013-11-01

    Full Text Available The biggest internet security threat is the rise of botnets with modular and flexible structures. The combined power of thousands of remotely controlled computers increases the speed and severity of attacks. In this paper, we provide a comparative analysis of machine-learning based classification of botnet command & control (C&C) traffic for proactive detection of Peer-to-Peer (P2P) botnets. We combine selected botnet C&C traffic flow features with carefully selected botnet behavioral characteristic features for better classification using machine learning algorithms. Our simulation results show that our method is very effective, with very good test accuracy and very little training time. We compare the performance of Decision Tree (C4.5), Bayesian Network and Linear Support Vector Machines using metrics such as accuracy, sensitivity, positive predictive value (PPV) and F-Measure. We also provide a comparative analysis of our predictive models using AUC (area under the ROC curve). Finally, we propose a rule induction algorithm derived from Quinlan's original C4.5 algorithm. Our proposed algorithm produces better accuracy than the original decision tree classifier.
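
    The four performance metrics named above follow directly from a binary confusion matrix. The sketch below uses their standard definitions; the counts are invented for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, PPV and F-measure from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives, false negatives.
    Ratios with an empty denominator are reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = ratio(tp, tp + fn)   # also called recall or true positive rate
    ppv = ratio(tp, tp + fp)           # also called precision
    f_measure = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f_measure": f_measure,
    }


# Hypothetical counts: 8 botnet flows caught, 1 missed, 2 false alarms, 9 clean flows.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
```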

  9. Application of the decision tree ID3 algorithm in the classification of customer information

    Institute of Scientific and Technical Information of China (English)

    吴建源

    2014-01-01

    In modern enterprises, customer retention is an important research direction in customer management. This paper uses the decision tree ID3 algorithm to analyze the characteristics of customer attributes, classify customer information, identify the characteristics of each customer type, and improve customer relationships in a targeted way, so as to avoid customer loss and increase market share.
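
    A compact sketch of ID3 on categorical customer attributes is given below. At each node it splits on the attribute with the highest information gain and stops when a node is pure or no attributes remain. The attribute names (contract, region, churn) and the four records are hypothetical and are not the paper's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    """Return a nested dict {attribute: {value: subtree}} or a class label at a leaf."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # pure node or majority vote
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], remaining, target)
            for value in sorted({r[best] for r in rows})
        }
    }


# Hypothetical customer records.
customers = [
    {"contract": "monthly", "region": "north", "churn": "yes"},
    {"contract": "monthly", "region": "south", "churn": "yes"},
    {"contract": "yearly", "region": "north", "churn": "no"},
    {"contract": "yearly", "region": "south", "churn": "no"},
]
tree = id3(customers, ["contract", "region"], "churn")
```

    On this toy data the contract attribute separates the classes completely, so the tree consists of a single split.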

  10. Calculation of Information Entropy in Data Mining with Decision Tree

    Institute of Scientific and Technical Information of China (English)

    张维东; 张凯; 董青; 孙维华

    2001-01-01

    We introduce an algorithm for building a decision tree by comparing information values (entropy), and describe how to handle special cases: high-branching attributes, numeric attributes, missing values, and pruning. Finally, we show source code for some modules of our implementation of this algorithm and briefly introduce data mining research at home and abroad.
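
    The abstract does not say how high-branching attributes are handled. One widely used remedy, assumed here purely for illustration, is the gain ratio from C4.5, which divides the information gain by the entropy of the split itself so that attributes with many distinct values are penalized. The records and attribute names below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of the split divided by the split's own entropy."""
    gain = entropy([r[target] for r in rows])
    for value, count in Counter(r[attribute] for r in rows).items():
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= count / len(rows) * entropy(subset)
    split_info = entropy([r[attribute] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical records: "id" is unique per row (high-branching), "contract" is not.
records = [
    {"id": 1, "contract": "monthly", "churn": "yes"},
    {"id": 2, "contract": "monthly", "churn": "yes"},
    {"id": 3, "contract": "yearly", "churn": "no"},
    {"id": 4, "contract": "yearly", "churn": "no"},
]
ratio_id = gain_ratio(records, "id", "churn")
ratio_contract = gain_ratio(records, "contract", "churn")
```

    Both attributes have the same raw information gain on this data, but the gain ratio prefers contract, because splitting on a unique identifier generalizes to nothing.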

  11. A computer-based microarray experiment design-system for gene-regulation pathway discovery.

    Science.gov (United States)

    Yoo, Changwon; Cooper, Gregory F

    2003-01-01

    This paper reports the methods and evaluation of a computer-based system that recommends microarray experimental design for biologists - causal discovery in Gene Expression data using Expected Value of Experimentation (GEEVE). The GEEVE system uses causal Bayesian networks and generates a decision tree for recommendations. To evaluate the GEEVE system, we first built an expression simulation model based on a gene regulation model assessed by an expert biologist. Using the simulation model, we conducted a controlled study that involved 10 biologists, some of whom used GEEVE and some of whom did not. The results show that biologists who used GEEVE reached correct causal assessments about gene regulation more often than did those biologists who did not use GEEVE.

  12. Graph Model Based Indoor Tracking

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Lu, Hua; Yang, Bin

    2009-01-01

    The tracking of the locations of moving objects in large indoor spaces is important, as it enables a range of applications related to, e.g., security and indoor navigation and guidance. This paper presents a graph model based approach to indoor tracking that offers a uniform data management...... infrastructure for different symbolic positioning technologies, e.g., Bluetooth and RFID. More specifically, the paper proposes a model of indoor space that comprises a base graph and mappings that represent the topology of indoor space at different levels. The resulting model can be used for one or several...... indoor positioning technologies. Focusing on RFID-based positioning, an RFID specific reader deployment graph model is built from the base graph model. This model is then used in several algorithms for constructing and refining trajectories from raw RFID readings. Empirical studies with implementations...

  13. Model-based consensus

    NARCIS (Netherlands)

    Boumans, Marcel

    2014-01-01

    The aim of the rational-consensus method is to produce “rational consensus”, that is, “mathematical aggregation”, by weighing the performance of each expert on the basis of his or her knowledge and ability to judge relevant uncertainties. The measurement of the performance of the experts is based on

  15. Quaternion-Based Signal Analysis for Motor Imagery Classification from Electroencephalographic Signals

    Directory of Open Access Journals (Sweden)

    Patricia Batres-Mendoza

    2016-03-01

    Full Text Available Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This technique can extract features that accurately represent brain activity related to motor imagery in various mental states. BCI-recorded signals were collected in experimental tests in which users were shown visual graphical cues related to left and right movements. These signals were then classified using decision tree (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states.

  16. Development of Interpretable Predictive Models for BPH and Prostate Cancer

    Science.gov (United States)

    Bermejo, Pablo; Vivo, Alicia; Tárraga, Pedro J; Rodríguez-Montes, JA

    2015-01-01

    BACKGROUND Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. In the last decade, however, there has been increasing use of predictive models that combine several predictors in a non-linear manner and are better able to predict prostate cancer (PC), but these fail to help the clinician distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. METHODS An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. RESULTS Statistical dependence with PC and BPH was found for prostate volume (P-value < 0.001), PSA (P-value < 0.001), international prostate symptom score (IPSS; P-value < 0.001), digital rectal examination (DRE; P-value < 0.001), age (P-value < 0.002), antecedents (P-value < 0.006), and meat consumption (P-value < 0.08). The two predictive models that were constructed selected a subset of these, namely, volume, PSA, DRE, and IPSS, obtaining an area under the ROC curve (AUC) between 72% and 80% for both PC and BPH prediction. CONCLUSION PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced. PMID:25780348
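
    The leave-one-out methodology mentioned in METHODS can be sketched generically: each sample is held out once, a model is fitted on the rest, and the held-out sample is predicted. To keep the sketch self-contained, a one-nearest-neighbour classifier on a single feature stands in for the study's decision tree and logistic regression models; the values and labels are invented and have no clinical meaning.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Fraction of samples predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_1nn(xs, ys):
    """Stand-in model: a 1-nearest-neighbour classifier just stores the training data."""
    return list(zip(xs, ys))


def predict_1nn(model, x):
    """Return the label of the training point closest to x."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Hypothetical single-feature data with two classes.
values = [3.1, 3.4, 3.8, 9.0, 9.5, 10.2]
classes = ["A", "A", "A", "B", "B", "B"]
loo_accuracy = leave_one_out_accuracy(values, classes, fit_1nn, predict_1nn)
```

    Leave-one-out uses every sample for testing exactly once, which suits small cohorts such as the 150 patients described here.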

  17. Autoencoder-based identification of predictors of Indian monsoon

    Science.gov (United States)

    Saha, Moumita; Mitra, Pabitra; Nanjundiah, Ravi S.

    2016-10-01

    Prediction of the Indian summer monsoon uses a number of climatic variables that are historically known to provide high skill. However, relationships between predictors and predictand can be complex and can also change with time. The present work uses a machine learning technique to identify new predictors for forecasting the Indian monsoon. A neural network-based non-linear dimensionality reduction technique, the sparse autoencoder, is used for this purpose. It extracts a number of new predictors that have higher prediction skill than the existing ones. Two non-linear ensemble prediction models, a regression tree and a bagged decision tree, are designed with the identified monsoon predictors and are shown to be superior in terms of prediction accuracy. The proposed model shows a mean absolute error of 4.5% in predicting the Indian summer monsoon rainfall. Lastly, the geographical distribution of the new monsoon predictors and their characteristics are discussed.

  18. Grid-based Support for Different Text Mining Tasks

    Directory of Open Access Journals (Sweden)

    Martin Sarnovský

    2009-12-01

    Full Text Available This paper provides an overview of our research activities aimed at efficient use of Grid infrastructure to solve various text mining tasks. Grid-enabling of these tasks was mainly driven by the increasing volume of processed data. Using the Grid services approach therefore makes it possible to perform various text mining scenarios and also opens ways to design distributed modifications of existing methods. In particular, some parts of the mining process can benefit significantly from the decomposition paradigm. In this study we present our approach to data-driven decomposition of a decision tree building algorithm and of a clustering algorithm based on self-organizing maps, and its application in a conceptual model building task using the FCA-based algorithm. The work presented in this paper should be considered a 'proof of concept' for the design and implementation of decomposition methods, as we performed the experiments mostly on standard textual databases.

  19. Event-Based Conceptual Modeling

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    The paper demonstrates that a wide variety of event-based modeling approaches are based on special cases of the same general event concept, and that the general event concept can be used to unify the otherwise unrelated fields of information modeling and process modeling. A set of event......-based modeling approaches are analyzed and the results are used to formulate a general event concept that can be used for unifying the seemingly unrelated event concepts. Events are characterized as short-duration processes that have participants, consequences, and properties, and that may be modeled in terms...... of information structures. The general event concept can be used to guide systems analysis and design and to improve modeling approaches....

  20. Empirically Based, Agent-based models

    Directory of Open Access Journals (Sweden)

    Elinor Ostrom

    2006-12-01

    Full Text Available There is an increasing drive to combine agent-based models with empirical methods. An overview is provided of the various empirical methods that are used for different kinds of questions. Four categories of empirical approaches are identified in which agent-based models have been empirically tested: case studies, stylized facts, role-playing games, and laboratory experiments. We discuss how these different types of empirical studies can be combined. The various ways empirical techniques are used illustrate the main challenges of contemporary social sciences: (1) how to develop models that are generalizable and still applicable in specific cases, and (2) how to scale up the processes of interactions of a few agents to interactions among many agents.