WorldWideScience

Sample records for based decision-tree models

  1. Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Problem statement: In Thai speech synthesis using a Hidden Markov model (HMM) based synthesis system, the tonal speech quality is degraded due to tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit, since tone determines the intelligibility of the synthesized speech. The tone questions and other phonetic questions in the tree-based context clustering process therefore need to be established accordingly. Approach: This study describes the analysis of questions in the tree-based context clustering process of an HMM-based speech synthesis system for the Thai language. In the system, spectrum, pitch (F0) and state duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using a decision-tree based context clustering technique. The contextual factors which affect spectrum, pitch and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type and tone type, are taken into account for constructing the questions of the decision tree. All in all, thirteen sets of questions are analyzed in comparison. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from those thirteen sets and by calculating the dominance score given to each question as the reciprocal of the distance from the root node to the question node. The highest number and dominance score belong to the phonetic type set, while the second and third highest belong to the part of speech and tone type sets. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. All in all, the analysis results bring about further development of Thai speech synthesis with efficient context clustering process in
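
    The counting and dominance-score analysis described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the tree encoding, the function name `dominance_scores`, and the example question sets are hypothetical, and the root is assumed to sit at distance 1 so that its reciprocal is defined (the abstract does not say how the root itself is scored).

```python
from collections import defaultdict, deque


def dominance_scores(tree, root, question_set_of):
    """Count questions and sum reciprocal-depth scores per question set.

    tree            -- maps a node id to the list of its child node ids
    question_set_of -- maps each internal (question) node id to the name
                       of the question set its question comes from
    """
    counts = defaultdict(int)
    scores = defaultdict(float)
    # Assumption: the root is at distance 1, its children at distance 2, etc.
    queue = deque([(root, 1)])
    while queue:
        node, distance = queue.popleft()
        if node in question_set_of:  # leaf nodes carry no question
            qset = question_set_of[node]
            counts[qset] += 1
            scores[qset] += 1.0 / distance
        for child in tree.get(node, []):
            queue.append((child, distance + 1))
    return dict(counts), dict(scores)


# Hypothetical three-question tree.
tree = {
    "n0": ["n1", "n2"],
    "n1": ["leaf_a", "leaf_b"],
    "n2": ["leaf_c", "leaf_d"],
}
question_set_of = {"n0": "phone type", "n1": "tone type", "n2": "phone type"}

counts, scores = dominance_scores(tree, "n0", question_set_of)
print(counts)  # {'phone type': 2, 'tone type': 1}
print(scores)  # {'phone type': 1.5, 'tone type': 0.5}
```

    Ranking the question sets by either dictionary gives the kind of priority ordering the abstract reports.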

  2. Applying decision tree models to SMEs: A statistics-based model for customer relationship management

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-07-01

    Customer Relationship Management (CRM) has been an important part of enterprise decision-making and management. In this regard, Decision Tree (DT) models are the most common tools for investigating CRM and providing appropriate support for the implementation of CRM systems. Yet, this method does not yield any estimate of the degree of separation of different subgroups involved in analysis. In this research, we compute three decision-making models in SMEs, analyzing different decision tree methods (C&RT, C4.5 and ID3). The methods are then used to calculate the Mean Errors (ME) and Variance of Errors (VoE) estimates for the models in order to investigate the predictive power of these methods. These decision tree methods were used to analyze datasets from small- and medium-sized enterprises (SMEs). The paper proposes powerful technical support for better directing market trends and mining in CRM. According to the findings, C&RT shows a better degree of separation. As a result, we recommend using decision tree methods together with ME and VoE to determine CRM factors.

  3. Automated soil resources mapping based on decision tree and Bayesian predictive modeling

    Institute of Scientific and Technical Information of China (English)

    周斌; 张新刚; 王人潮

    2004-01-01

    This article presents two approaches for automated building of knowledge bases of soil resources mapping. These methods used decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than using the conventional knowledge acquisition approach. The knowledge bases built by these two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China using TM bi-temporal imageries and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared to existing soil map based on field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping distribution model of soil classes over the study area.

  4. Automated soil resources mapping based on decision tree and Bayesian predictive modeling

    Institute of Scientific and Technical Information of China (English)

    周斌; 张新刚; 王人潮

    2004-01-01

    This article presents two approaches for automated building of knowledge bases of soil resources mapping. These methods used decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than using the conventional knowledge acquisition approach. The knowledge bases built by these two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China using TM bi-temporal imageries and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared to existing soil map based on field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping distribution model of soil classes over the study area.

  5. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches.

    Science.gov (United States)

    Oksel, Ceyda; Winkler, David A; Ma, Cai Y; Wilkins, Terry; Wang, Xue Z

    2016-09-01

    The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, interpretation of the models they generate is often very difficult. New computational modelling tools or new ways of using existing tools are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree) to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent to or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generation of accurate nanoSAR models with important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models. PMID:26956430

  6. Statistical Decision-Tree Models for Parsing

    CERN Document Server

    Magerman, D M

    1995-01-01

    Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing n-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall ...

  7. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    Science.gov (United States)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper relates to attribute split measures and is a two-step process. First, a theoretical study of five selected split measures is done and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are verified by empirical analysis, for which a random forest is generated using each of the five selected split measures, one at a time (i.e., random forest using information gain, random forest using gain ratio, etc.). Next, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the Random Forest are generated using different split measures. The model is augmented by weighted voting based on the strength of each individual tree. The new approach has shown a notable increase in the accuracy of random forest.
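
    The difference between plain majority voting and voting weighted by tree strength can be sketched as below. This is an illustration under stated assumptions, not the authors' method: the abstract does not define how tree strength is measured, so the weights here are hypothetical stand-ins (for example, each tree's accuracy on held-out data), and the function names are invented for the example.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions -- one predicted label per tree
    weights     -- one non-negative weight per tree (assumed to reflect
                   that tree's strength; the measure itself is an assumption)
    Ties are broken in favour of the label that was seen first.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def majority_vote(predictions):
    """Plain majority voting is weighted voting with equal weights."""
    return weighted_vote(predictions, [1.0] * len(predictions))


# Hypothetical forest of three trees: one strong tree disagrees with two weak ones.
predictions = ["spam", "ham", "ham"]
weights = [0.9, 0.3, 0.4]

print(majority_vote(predictions))           # ham
print(weighted_vote(predictions, weights))  # spam
```

    With equal weights the two weak trees win; once strength is taken into account the single strong tree decides the outcome.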

  8. English BNP identification based on corpus-trained decision tree

    Institute of Scientific and Technical Information of China (English)

    孟遥; 赵铁军; 李生; 张晓光

    2001-01-01

    Finding simple, non-recursive base noun phrases (Base NPs) is an important step for many natural language processing applications. This paper presents a new corpus-based approach using a decision tree for that purpose. In contrast to previous methods for Base NP identification, we adopt a decision tree trained on the Penn Treebank to identify Base NPs, and a self-learning mechanism is further integrated into our model. Experimental results show good performance using our method. The method can also be applied to the processing of any other language.

  9. INDUCTION OF DECISION TREES BASED ON A FUZZY NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    Tang Bin; Hu Guangrui; Mao Xiaoquan

    2002-01-01

    Based on a fuzzy neural network, the letter presents an approach for the induction of decision trees. The approach makes use of the weights of fuzzy mappings in the fuzzy neural network which has been trained. It can realize the optimization of fuzzy decision trees by branch cutting, and improve the ratio of correctness and efficiency of the induction of decision trees.

  10. Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model

    Directory of Open Access Journals (Sweden)

    Takada Masahiro

    2012-06-01

    Background: The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method, the alternating decision tree (ADTree). Methods: Clinical datasets for primary breast cancer patients who underwent sentinel lymph node biopsy or AxLN dissection without prior treatment were collected from three institutes (institute A, n = 148; institute B, n = 143; institute C, n = 174) and were used for variable selection, model training and external validation, respectively. The models were evaluated using area under the receiver operating characteristic (ROC) curve analysis to discriminate node-positive patients from node-negative patients. Results: The ADTree model selected 15 of 24 clinicopathological variables in the variable selection dataset. The resulting area under the ROC curve values were 0.770 [95% confidence interval (CI), 0.689–0.850] for the model training dataset and 0.772 (95% CI: 0.689–0.856) for the validation dataset, demonstrating high accuracy and generalization ability of the model. The bootstrap value of the validation dataset was 0.768 (95% CI: 0.763–0.774). Conclusions: Our prediction model showed high accuracy for predicting nodal metastasis in patients with breast cancer using commonly recorded clinical variables. Therefore, our model might help oncologists in the decision-making process for primary breast cancer patients before starting treatment.

  11. Extracting impervious surfaces from multi-source satellite imagery based on unified conceptual model by decision tree algorithm

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Extraction of impervious surfaces is one of the necessary processes in urban change detection. This paper derived a unified conceptual model (UCM) from the vegetation-impervious surface-soil (VIS) model to make the extraction more effective and accurate. UCM uses the decision tree algorithm with indices of spectrum and texture, etc. In this model, we found both dependent and independent indices for multi-source satellite imagery according to their similarity and dissimilarity. The purpose of the indices is to remove the other land-use and land-cover types (e.g., vegetation and soil) from the imagery, and delineate the impervious surfaces as the result. UCM has the same steps conducted by decision tree algorithm. The Landsat-5 TM image (30 m) and the Satellite Probatoire d’Observation de la Terre (SPOT-4) image (20 m) from Chaoyang District (Beijing) in 2007 were used in this paper. The results show that the overall accuracy in Landsat-5 TM image is 88%, while 86.75% in SPOT-4 image. It is an appropriate method to meet the demand of urban change detection.

  12. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP ≤ 19.1 mg/L; otherwise, for cases with CRP > 19.1 mg/L: bacterial infection if PCT > 0.65 ng/mL, PFAPA if PCT ≤ 0.65 ng/mL. The model was tested using 10-fold cross-validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
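
    The two-threshold rule reported in this abstract is small enough to write out directly. The sketch below only transcribes the published thresholds for illustration; the function and parameter names are hypothetical, and it is not a validated clinical tool.

```python
def classify_fever_episode(crp_mg_per_l, pct_ng_per_ml):
    """Apply the decision tree quoted in the abstract.

    crp_mg_per_l  -- C-reactive protein level in mg/L
    pct_ng_per_ml -- procalcitonin level in ng/mL
    """
    if crp_mg_per_l <= 19.1:
        return "viral infection"
    # From here on CRP > 19.1 mg/L, so PCT decides between the two remaining classes.
    if pct_ng_per_ml > 0.65:
        return "bacterial infection"
    return "PFAPA"


print(classify_fever_episode(10.0, 2.0))  # viral infection
print(classify_fever_episode(80.0, 2.0))  # bacterial infection
print(classify_fever_episode(80.0, 0.2))  # PFAPA
```

    Note that PCT is never consulted when CRP is at or below 19.1 mg/L, which is why the first example is classified as viral despite its high PCT value.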

  13. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Directory of Open Access Journals (Sweden)

    Barbara Kraszewska-Głomba

    2016-03-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP ≤ 19.1 mg/L; otherwise, for cases with CRP > 19.1 mg/L: bacterial infection if PCT > 0.65 ng/mL, PFAPA if PCT ≤ 0.65 ng/mL. The model was tested using 10-fold cross-validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context.

  14. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP ≤ 19.1 mg/L; otherwise, for cases with CRP > 19.1 mg/L: bacterial infection if PCT > 0.65 ng/mL, PFAPA if PCT ≤ 0.65 ng/mL. The model was tested using 10-fold cross-validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  15. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    Decision tree is one of the best-known classification methods in data mining. Many studies have focused on improving the performance of decision trees. However, those algorithms are developed and run on traditional distributed systems. Without the help of new technology, latency cannot be improved when processing the huge data generated by ubiquitous sensing nodes. In order to improve data processing latency in huge data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.

  16. Minimum description length criterion based decision tree dynamic pruning method in speech recognition

    Institute of Scientific and Technical Information of China (English)

    XU Xianghua; HE lin

    2006-01-01

    In phonetic decision tree based state tying, decision trees with varying leaf nodes denote models with different complexity. By studying the influence of model complexity on system performance and speaker adaptation, a decision tree dynamic pruning method based on Minimum Description Length (MDL) criterion is presented. In the method, a well-trained, large-sized phonetic decision tree is selected as an initial model set, and model complexity is computed by adding a penalty parameter which alters according to the amount of adaptation data. Largely attributed to the reasonable selection of initial models and the integration of stochastic and aptotic of MDL criterion, the proposed method gains high performance by combining with speaker adaptation.

  17. Generating Decision Trees Method Based on Improved ID3 Algorithm

    Institute of Scientific and Technical Information of China (English)

    Yang Ming; Guo Shuxu; Wang Jun

    2011-01-01

    The ID3 algorithm is a classical decision tree learning algorithm in data mining. The algorithm tends to choose the attribute with more values, which affects the efficiency of classification and prediction when building a decision tree. This article proposes a new approach based on an improved ID3 algorithm. The new algorithm introduces an importance factor λ when calculating the information entropy. It can strengthen the label of important attributes of a tree and reduce the label of non-important attributes. The algorithm overcomes the flaw of the traditional ID3 algorithm, which tends to choose the attributes with more values, and also improves the efficiency and flexibility of the process of generating decision trees.
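
    The bias toward many-valued attributes mentioned in this abstract follows from how standard ID3 scores a split. The sketch below shows only the classical entropy and information gain; it does not implement the paper's importance factor λ, whose exact form is not given in the abstract. The toy rows and attribute names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Classical ID3 gain: entropy before the split minus the
    size-weighted entropy of the groups produced by the split."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical data: "id" is unique per row, "windy" has two values.
rows = [
    {"id": 1, "windy": "yes", "play": "no"},
    {"id": 2, "windy": "yes", "play": "yes"},
    {"id": 3, "windy": "no", "play": "yes"},
    {"id": 4, "windy": "no", "play": "no"},
]

print(information_gain(rows, "id", "play"))     # 1.0
print(information_gain(rows, "windy", "play"))  # 0.0
```

    The unique `id` column splits the data into pure single-row groups and so receives the maximum gain, even though it cannot generalize to new rows. This is the flaw that attribute-weighting schemes such as the one proposed here aim to correct.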

  18. Ethnographic Decision Tree Modeling: A Research Method for Counseling Psychology.

    Science.gov (United States)

    Beck, Kirk A.

    2005-01-01

    This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…

  19. Soil Organic Matter Mapping by Decision Tree Modeling

    Institute of Scientific and Technical Information of China (English)

    ZHOU Bin; ZHANG Xing-Gang; WANG Fan; WANG Ren-Chao

    2005-01-01

    Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with some extensive easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape factor-associated regular system. This system could be used to predict continuous SOM spatial distribution. By analyzing factors such as elevation, geological unit, soil type, land use, remotely sensed data, upslope contributing area, slope, aspect, planform curvature, and profile curvature, the decision tree could predict distribution of soil organic matter levels. Among these factors, elevation, land use, aspect, soil type, the first principal component of bitemporal Landsat TM, and upslope contributing area were considered the most important variables for predicting SOM. Results of the prediction between SOM content and landscape types sorted by the decision tree showed a close relationship with an accuracy of 81.1%.

  20. A decision-tree model to detect post-calving diseases based on rumination, activity, milk yield, BW and voluntary visits to the milking robot.

    Science.gov (United States)

    Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I

    2016-09-01

    Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value. PMID:27221983
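
    Accuracy, sensitivity and specificity as reported in this abstract are all derived from the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical, chosen only so that the ratios reproduce the reported percentages, since the study's actual counts are not given in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp -- sick cows flagged as sick        fn -- sick cows missed
    tn -- healthy cows flagged as healthy  fp -- healthy cows flagged as sick
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # share of sick cows detected
        "specificity": tn / (tn + fp),   # share of healthy cows cleared
    }


# Hypothetical counts: 100 sick and 100 healthy cows.
metrics = binary_metrics(tp=69, fn=31, tn=87, fp=13)
print(metrics)
```

    With equal numbers of sick and healthy animals, accuracy is simply the mean of sensitivity and specificity, which is consistent with the reported 78%, 69% and 87%.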

  1. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    OpenAIRE

    R. Bou Kheir; P. K. Bøcher; M. B. Greve; M. H. Greve

    2010-01-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow directio...

  2. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

    This paper proposes a decision tree model for specifying the importance of 21 factors causing landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance and are usually represented by an easy-to-interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%) compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as the most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
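
    The 10-fold cross-validation used to test these models partitions the samples into ten folds and holds each fold out once. The sketch below shows only that partitioning step, under assumptions: the function name is hypothetical, the folds are contiguous and unshuffled (real studies usually shuffle or stratify first), and nothing here reflects the software the authors actually used.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k (train, test) index pairs.

    Fold sizes differ by at most one; each sample appears in exactly
    one test fold. No shuffling is done in this sketch.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Small hypothetical data set of 25 samples.
folds = k_fold_indices(25, k=10)
for train, test in folds[:2]:
    print(len(train), test)
```

    A model would be fitted on each `train` list and scored on the matching `test` list, and the ten scores averaged to give a figure comparable to the accuracies quoted above.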

  3. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-06-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled areas in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in organic/mineral field measurements. The overall accuracy of the predictive organic/inorganic landscapes' map produced (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%. The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to facilitate the implementation of pedological/hydrological plans for conservation

  4. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-01-01

    Full Text Available Accurate information about soil organic carbon (SOC, presented in a spatially form, is prerequisite for many land resources management applications (including climate change mitigation. This paper aims to investigate the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes at unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area and one secondary (steady-state topographic wetness index topographic parameters were generated from Digital Elevation Models (DEMs acquired using airborne LIDAR (Light Detection and Ranging systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type to statistically explain SOC field measurements in hydromorphic landscapes of the chosen Danish area. A large number of tree-based classification models (186 were developed using (1 all of the parameters, (2 the primary DEM-derived topographic (morphological/hydrological parameters only, (3 selected pairs of parameters and (4 excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in field SOC measurements. The overall accuracy of the produced predictive SOC map (at 1:50 000 cartographic scale using the best tree was estimated to be ca. 75%. 
The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to help with the implementation of pedological/hydrological plans for conservation and sustainable
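
The best tree in this record relies on the steady-state topographic wetness index. As a hedged illustration, the index is commonly defined as ln(a / tan β), where a is the specific catchment area and β the local slope. The sketch below assumes that definition; the function name, the `min_slope` floor and the two example cells are hypothetical and are not taken from the paper.

```python
import math

def topographic_wetness_index(specific_catchment_area, slope_radians,
                              min_slope=1e-3):
    """TWI = ln(a / tan(beta)), a widely used steady-state wetness index.

    specific_catchment_area: upslope area per unit contour width (m).
    slope_radians: local slope angle beta, in radians.
    min_slope: assumed floor that avoids division by zero on flat cells.
    """
    beta = max(slope_radians, min_slope)
    return math.log(specific_catchment_area / math.tan(beta))

# Hypothetical cells: a gentle valley bottom and a steep hillslope.
valley = topographic_wetness_index(500.0, math.radians(1.0))
hillslope = topographic_wetness_index(20.0, math.radians(25.0))
```

Cells with a large contributing area and a gentle slope get a higher index, which is why the index is a plausible predictor of wet, organic-rich landscapes.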

  5. Cost effectiveness of community-based therapeutic care for children with severe acute malnutrition in Zambia: decision tree model

    OpenAIRE

    Bachmann Max O

    2009-01-01

    Abstract Background Children aged under five years with severe acute malnutrition (SAM) in Africa and Asia have high mortality rates without effective treatment. Primary care-based treatment of SAM can have good outcomes but its cost effectiveness is largely unknown. Method This study estimated the cost effectiveness of community-based therapeutic care (CTC) for children with severe acute malnutrition in government primary health care centres in Lusaka, Zambia, compared to no care. A decision...

  6. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    Energy Technology Data Exchange (ETDEWEB)

    Bou Kheir, Rania, E-mail: rania.boukheir@agrsci.d [Lebanese University, Faculty of Letters and Human Sciences, Department of Geography, GIS Research Laboratory, P.O. Box 90-1065, Fanar (Lebanon); Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Greve, Mogens H. [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Abdallah, Chadi [National Council for Scientific Research, Remote Sensing Center, P.O. Box 11-8281, Beirut (Lebanon); Dalgaard, Tommy [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark)

    2010-02-15

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.
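
A regression tree such as the one described here grows by repeatedly choosing the split that removes the most variance from the response. The sketch below shows that single step for one numeric predictor. It is a minimal illustration under stated assumptions, not the authors' model: the function names are hypothetical and the six data points are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, explained_fraction) for the best binary split on xs."""
    total = sse(ys)
    if total == 0:
        return (None, 0.0)
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        explained = 1 - (sse(left) + sse(right)) / total
        if explained > best[1]:
            best = (threshold, explained)
    return best

# Made-up predictor and response values, for illustration only.
predictor = [5.0, 5.5, 6.0, 7.5, 8.0, 8.5]
response = [10.0, 12.0, 11.0, 40.0, 42.0, 41.0]
threshold, explained = best_split(predictor, response)
```

The explained fraction returned here is the one-split analogue of the "variance explained" figure that the abstract reports for the full tree.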

  7. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    International Nuclear Information System (INIS)

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.

  8. Fault diagnosis of induction motor based on decision trees and adaptive neuro-fuzzy inference

    OpenAIRE

    Tran, Tung; Yang, Bo-Suk; Oh, Myung-Suck; Tan, Andy Chit Chiow

    2009-01-01

    This paper presents a fault diagnosis method based on adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART) which is one of the decision tree methods is used as a feature selection procedure to select pertinent features from data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of ANFIS classifier. The hybrid of back-propagation and le...

  9. Cost Effectiveness of Imiquimod 5% Cream Compared with Methyl Aminolevulinate-Based Photodynamic Therapy in the Treatment of Non-Hyperkeratotic, Non-Hypertrophic Actinic (Solar) Keratoses: A Decision Tree Model

    OpenAIRE

    Wilson, Edward C F

    2010-01-01

    Background: Actinic keratosis (AK) is caused by chronic exposure to UV radiation (sunlight). First-line treatments are cryosurgery, topical 5-fluorouracil (5-FU) and topical diclofenac. Where these are contraindicated or less appropriate, alternatives are imiquimod and photodynamic therapy (PDT). Objective: To compare the cost effectiveness of imiquimod and methyl aminolevulinate-based PDT (MAL-PDT) from the perspective of the UK NHS. Methods: A decision tree model was populated with data fro...

  10. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    Science.gov (United States)

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  11. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    Science.gov (United States)

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly with the continuous growth in the number of users and rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time. PMID:24794073

  12. Computer Crime Forensics Based on Improved Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Ying Wang

    2014-04-01

    Full Text Available To find crime-related evidence and association rules among massive data, classic decision tree algorithms such as ID3 for classification analysis have appeared in related prototype systems. How to make them more suitable for computer forensics in variable environments has therefore become a hot issue. When selecting classification attributes, ID3 relies on the computation of information entropy, and the attributes with more values are then selected as classification nodes of the decision tree. Such a classification is unrealistic in many cases. The ID3 algorithm also involves many logarithm computations, so it is complicated to handle datasets that have various classification attributes. Therefore, to meet the special demands of computer crime forensics, the ID3 algorithm is improved and a novel classification attribute selection method based on the Maclaurin-Priority Value First method is proposed. It adopts the change-of-base formula and infinitesimal substitution to simplify the logarithms in ID3. To compensate for the errors generated in this process, the simplified formulas are multiplied by an appropriate constant. The idea of Priority Value First is introduced to solve the problem of value deviation. The performance of the improved method is proved in theory. Finally, experiments verify that the scheme has an advantage in computation time and classification accuracy compared to ID3 and two existing algorithms
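
For context, the baseline that this record sets out to improve is the standard ID3 attribute selection by information gain. The sketch below shows that baseline only. The Maclaurin-Priority Value First simplification from the paper is not reproduced, and the toy log records and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy log records, for illustration only.
rows = [
    {"port": "ssh", "hour": "night", "suspicious": "yes"},
    {"port": "ssh", "hour": "day", "suspicious": "yes"},
    {"port": "http", "hour": "night", "suspicious": "no"},
    {"port": "http", "hour": "day", "suspicious": "no"},
]
# ID3 places the attribute with the highest gain at the current node.
best = max(["port", "hour"],
           key=lambda a: information_gain(rows, a, "suspicious"))
```

Every call to `entropy` evaluates one logarithm per class, which is the cost the abstract says the improved method tries to reduce.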

  13. Decision-tree induction from self-mapping space based on web

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shu-yu; ZHU Zhong-ying

    2007-01-01

    An improved decision tree method for web information retrieval with self-mapping attributes is proposed. The self-mapping tree has a value of self-mapping attribute in its internal node, and information based on dissimilarity between a pair of mapping sequences. This method selects self-mapping which exists between data by exhaustive search based on relation and attribute information. Experimental results confirm that the improved method constructs comprehensive and accurate decision trees. Moreover, an example shows that the self-mapping decision tree is promising for data mining and knowledge discovery.

  14. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open loop mode selection module in the AMR-WB+.

  15. PERFORMANCE EVALUATION OF C-FUZZY DECISION TREE BASED IDS WITH DIFFERENT DISTANCE MEASURES

    Directory of Open Access Journals (Sweden)

    Vinayak Mantoor

    2012-01-01

    Full Text Available With the ever-increasing growth of computer networks and emergence of electronic commerce in recent years, computer security has become a priority. An intrusion detection system (IDS) is often used as another wall of protection in addition to intrusion prevention techniques. This paper introduces a concept and design of decision trees based on fuzzy clustering. Fuzzy clustering is the core functional part of the overall decision tree development, and the developed tree will be referred to as a C-fuzzy decision tree. The distance measure plays an important role in clustering data points. Choosing the right distance measure for a given dataset is a non-trivial problem. In this paper, we study the performance of C-fuzzy decision tree based IDS with different distance measures. We analyzed the results of our study using KDD Cup 1999 data and compared the accuracy of the classifier with different distance measures.
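
To make the role of the distance measure concrete, the sketch below computes standard fuzzy c-means memberships of one point with respect to fixed cluster centres under three interchangeable distances. The abstract does not list which measures were compared, so the three shown here, the fuzzifier `m=2.0` and the example coordinates are assumptions.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def fuzzy_memberships(point, centres, distance, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster centre."""
    dists = [distance(point, c) for c in centres]
    for i, d in enumerate(dists):
        if d == 0.0:
            # The point coincides with a centre: full membership there.
            return [1.0 if j == i else 0.0 for j in range(len(dists))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical point and centres, for illustration only.
centres = [(0.0, 0.0), (3.0, 4.0)]
point = (1.0, 1.0)
by_measure = {f.__name__: fuzzy_memberships(point, centres, f)
              for f in (euclidean, manhattan, chebyshev)}
```

The memberships always sum to one, but their values shift with the distance function, which is how the choice of measure can change the resulting clusters and the tree built on them.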

  16. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    Science.gov (United States)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the different data mining methods, such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Base Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of the Decision Tree (DT) were the best. According to the subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  17. Diagnosis of Constant Faults in Read-Once Contact Networks over Finite Bases using Decision Trees

    KAUST Repository

    Busbait, Monther I.

    2014-05-01

    We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. Also, we obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant faults depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients for the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we create, to obtain the sharp coefficients.

  18. Preventing KPI Violations in Business Processes based on Decision Tree Learning and Proactive Runtime Adaptation

    Directory of Open Access Journals (Sweden)

    Dimka Karastoyanova

    2012-01-01

    Full Text Available The performance of business processes is measured and monitored in terms of Key Performance Indicators (KPIs). If the monitoring results show that the KPI targets are violated, the underlying reasons have to be identified and the process should be adapted accordingly to address the violations. In this paper we propose an integrated monitoring, prediction and adaptation approach for preventing KPI violations of business process instances. KPIs are monitored continuously while the process is executed. Additionally, based on KPI measurements of historical process instances we use decision tree learning to construct classification models which are then used to predict the KPI value of an instance while it is still running. If a KPI violation is predicted, we identify adaptation requirements and adaptation strategies in order to prevent the violation.

  19. CLOUD DETECTION BASED ON DECISION TREE OVER TIBETAN PLATEAU WITH MODIS DATA

    Directory of Open Access Journals (Sweden)

    L. Xu

    2012-07-01

    Full Text Available Snow cover area is a critical parameter for the hydrologic cycle of the Earth, and it will be a key factor in the effects of climate change. An unavoidable problem in mapping snow cover is the presence of clouds. Clouds can usually be found easily in satellite images because they are bright and white in the visible wavelengths, but this is not the case when there is snow or ice in the background, since snow and clouds have a similar spectral appearance. Many cloud detection methods are built on decision trees designed from empirical studies and simulations. In this paper, classification trees were used to build the decision tree. Then, using a large number of repeated scenes of the same area, each cloud pixel can be replaced by its real surface type, such as snow, vegetation or water. The effect of cloud can be distinguished in the short-wave infrared. The results show that most of the cloud coverage was removed. A validation was carried out for all subsequent steps, which led to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfactorily.

  20. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    Science.gov (United States)

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, which involves subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  1. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    Science.gov (United States)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
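
The accuracy figures in this record are Dice similarity coefficients, DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the voxel sets of the true and the segmented contour. A minimal sketch of that metric follows. Representing masks as sets of voxel indices is an assumption made for brevity, and the two tiny masks are made up.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Made-up 2D masks, for illustration only.
true_contour = {(0, 0), (0, 1), (1, 0), (1, 1)}
segmented = {(0, 1), (1, 1), (1, 2), (2, 1)}
score = dice_coefficient(true_contour, segmented)
```

A value of 1 means the two contours coincide and 0 means they do not overlap at all, so the reported DSC of 0.881 indicates a high but imperfect overlap.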

  2. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Bøcher, Peder Klith; Greve, Mette Balslev;

    2010-01-01

    distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index......) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field...... measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding...

  3. COMBINING DECISION TREES AND K-NN FOR CASE-BASED PLANNING

    Directory of Open Access Journals (Sweden)

    Sofia Benbelkacem

    2014-11-01

    Full Text Available In everyday life, we are often faced with similar problems which we resolve with our experience. Case-based reasoning is a paradigm of problem solving based on past experience. Thus, case-based reasoning is considered a valuable technique for implementing various tasks that involve solving planning problems. Planning is considered a decision support process designed to provide the resources and services required to achieve specific objectives, allowing the selection of a better solution among several alternatives. We propose to exploit a combination of decision trees and k-NN to choose the most appropriate solutions. In a previous work [1], we proposed a new planning approach guided by case-based reasoning and decision trees, called DTR, for case retrieval. In this paper, we use a classifier combination for similarity calculation in order to select the best solution to the target case. The combination of decision trees and k-NN thus improves the relevance of results and helps find the most relevant cases.

  4. Design of TV Fault Repair Model Based on Decision Tree Algorithm

    Institute of Scientific and Technical Information of China (English)

    武彤; 程辉

    2014-01-01

    Before a television set comes to market, it must pass a series of quality control checkpoints on the production line. Once a flaw is found at a checkpoint, the set goes to the repair line to be checked again and repaired. There, repair workers usually determine the fault cause and locate the faulty component from their own working experience. This places very strict requirements on the workers and does not improve repair efficiency. This paper studies a fault repair model for TV production lines based on the decision tree algorithm, which can accurately and quickly find the relationship among product type, fault symptom and fault cause. It thus saves the time spent looking for the fault cause and type, considerably raising repair productivity.

  5. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain;

    2014-01-01

    This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power......-effective algorithm is adopted in this proposed approach to optimize the trajectory of preventive control. The paper also proposes an importance sampling algorithm on database preparation for efficient DT training for power systems with high penetration of wind power and distributed generation. The performance...... of the approach is demonstrated on a 400-bus, 200-line operational model of western Danish power system....

  6. Dynamic Security Assessment of Western Danish Power System Based on Ensemble Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Bak, Claus Leth; Chen, Zhe;

    2014-01-01

    With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...... with outlier identification show high accuracy in the presence of variance and uncertainties due to wind power generation and other dispersed generation units. The performance of this approach is demonstrated on the operational model of western Danish power system with the scale of around 200 lines and 400...

  7. Hyper-Graph Based Documents Categorization on Knowledge from Decision Trees

    Directory of Open Access Journals (Sweden)

    Merjulah Roby

    2012-03-01

    Full Text Available This paper devises a novel representation that compactly captures hyper-graph partitioning and clustering of documents based on their weightages. The approach integrates data mining and decision making to improve effectiveness, and NeC4.5 decision trees are also presented. The algorithm creates clusters and sub-clusters according to the user query, forming sub-clusters in the database. Some of the data in the database may be efficient, so the data are clustered depending upon their ability.

  8. Preprocessing of Tandem Mass Spectrometric Data Based on Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    Jing-Fen Zhang; Si-Min He; Jin-Jin Cai; Xing-Jun Cao; Rui-Xiang Sun; Yan Fu; Rong Zeng; Wen Gao

    2005-01-01

    In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify the noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the following identification process. The experimental results show that this preprocessing method increases the search speed and the reliability of peptide identification.

  9. A Decision Tree-Structured Algorithm of Speaker Adaptation Based on Gaussian Similarity Analysis

    Institute of Scientific and Technical Information of China (English)

    WU Ji; WANG Zuoying

    2001-01-01

    The Gaussian Similarity Analysis (GSA) algorithm can be used to estimate the similarity between two Gaussian distributed variables with full covariance matrices. Based on this algorithm, we propose a method for speaker adaptation of the covariance. It differs from traditional algorithms, which mainly focus on adapting the mean vector of the state observation probability density. A binary decision tree is constructed offline with the similarity measure, and the adaptation procedure is data-driven. The experiments show that a significant further improvement can be obtained over mean vector adaptation.

  10. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    Directory of Open Access Journals (Sweden)

    Wan-Yu Chang

    2015-09-01

    Full Text Available In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.
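
For readers unfamiliar with the task, the sketch below shows the classic two-pass, 8-connected labelling with union-find that block-based methods aim to speed up. It is a plain baseline and not the block-based scan mask or the binary decision trees proposed in the paper. The function name and the test image are hypothetical.

```python
def label_components(image):
    """Two-pass 8-connected labelling of a binary image (rows of 0/1)."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: assign provisional labels and record equivalences.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            neighbours = []
            # Only the four already-visited neighbours are inspected.
            for dy, dx in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and labels[ny][nx]:
                    neighbours.append(labels[ny][nx])
            if not neighbours:
                parent.append(len(parent))
                labels[y][x] = len(parent) - 1
            else:
                roots = [find(n) for n in neighbours]
                smallest = min(roots)
                labels[y][x] = smallest
                for r in roots:
                    parent[r] = smallest

    # Second pass: replace provisional labels by consecutive final labels.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)

# Hypothetical test image with three 8-connected components.
image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
labelled, count = label_components(image)
```

Each foreground pixel triggers up to four neighbour reads in the first pass. That per-pixel memory access is the cost the abstract says the block-based scan mask is designed to reduce.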

  11. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Science.gov (United States)

    Pavlopoulos, Sotiris A; Stasis, Antonis CH; Loukis, Euripides N

    2004-01-01

    Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next a total of 100 features were determined for every heart sound signal and relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and pruned decision tree classifiers was studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. 
Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that describe S1, S2 and the systolic

  12. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Directory of Open Access Journals (Sweden)

    Loukis Euripides N

    2004-06-01

    Full Text Available Abstract Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junior clinicians are not adequately trained in this field. Therefore efficient decision support systems would be very useful for supporting clinicians to make better heart sound diagnosis. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals including 41 heart sound signals with "clear" AS systolic murmur and 43 with "clear" MR systolic murmur. Signals were initially preprocessed to detect 1st and 2nd heart sounds. Next a total of 100 features were determined for every heart sound signal and relevance to the differentiation between AS and MR was estimated. The performance of fully expanded decision tree classifiers and pruned decision tree classifiers was studied based on various training and test datasets. Similarly, pruned decision tree classifiers were used to examine their differentiation capabilities. In order to build a generalized decision support system for heart sound diagnosis, we have divided the problem into sub problems, dealing with either one morphological characteristic of the heart-sound waveform or with difficult to distinguish cases. Results Relevance analysis on the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that

  13. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative selection that filters applicants on the basis of their high school records, without a written test. Currently, this kind of selection does not yet have a standard model or criteria. Selection is done only by comparing candidates' application files, so the assessment is prone to subjectivity because there are no standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes standard criteria such as area of origin, school status, average grade and so on. These criteria are determined using rules derived from classifying the academic achievement (GPA) of students who entered the university through the same route in previous years. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average grade above 75, and have at least one achievement during their high school studies.

  14. Reweighting with Boosted Decision Trees

    CERN Document Server

    Rogozhnikov, A

    2016-01-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers. In most cases, these are classification models used to select the "signal" events from data. Monte Carlo simulated events typically take part in training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting - assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of reweighting step in analyses is also discussed.
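
    The idea of reweighting, assigning a weight to each simulated event so that the simulated distribution matches the observed one, can be illustrated with the conventional one-dimensional histogram approach. The sketch below is not the boosted-decision-tree method of the record above; the function name, the bin edges and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

def histogram_weights(simulated, observed, edges):
    """Weight each simulated event by (observed fraction / simulated fraction) of its bin."""
    def bin_of(x):
        # Index of the bin that x falls into, given sorted bin edges.
        return bisect_right(edges, x)

    sim_counts = Counter(bin_of(x) for x in simulated)
    obs_counts = Counter(bin_of(x) for x in observed)
    n_sim, n_obs = len(simulated), len(observed)
    return [
        (obs_counts[bin_of(x)] / n_obs) / (sim_counts[bin_of(x)] / n_sim)
        for x in simulated
    ]

# Hypothetical toy samples: simulation over-populates the low bin.
simulated = [0.1, 0.2, 0.3, 0.8]
observed = [0.1, 0.7, 0.8, 0.9]
weights = histogram_weights(simulated, observed, edges=[0.5])
print(weights)
```

    Bin-based weights of this kind become unreliable when many variables must be matched at once, because most bins are then sparsely populated.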

  15. A Decision Tree Based Pedometer and its Implementation on the Android Platform

    Directory of Open Access Journals (Sweden)

    Juanying Lin

    2015-02-01

    Full Text Available This paper describes a decision tree (DT) based pedometer algorithm and its implementation on Android. The DT-based pedometer can classify 3 gait patterns: walking on level ground (WLG), up stairs (WUS) and down stairs (WDS). It can discard irrelevant motion and count the user’s steps accurately. The overall classification accuracy is 89.4%. Accelerometer, gyroscope and magnetic field sensors are used in the device. When the user puts his/her smartphone into a pocket, the pedometer can automatically count the steps of different gait patterns. Two methods are tested to map the acceleration from the mobile phone’s reference frame to the direction of gravity. Two significant features are employed to classify the different gait patterns.

  16. A Genetic Algorithm Optimized Decision Tree-SVM based Stock Market Trend Prediction System

    Directory of Open Access Journals (Sweden)

    Binoy B. Nair

    2010-12-01

    Full Text Available Prediction of stock market trends has been an area of great interest both to researchers attempting to uncover the information hidden in stock market data and to those who wish to profit by trading stocks. The extremely nonlinear nature of stock market data makes it very difficult to design a system that can predict the future direction of the stock market with sufficient accuracy. This work presents a data mining based stock market trend prediction system, which produces highly accurate stock market forecasts. The proposed system is a genetic algorithm optimized decision tree-support vector machine (SVM) hybrid, which can predict one-day-ahead trends in stock markets. The uniqueness of the proposed system lies in the use of the hybrid system, which can adapt itself to changing market conditions, and in the fact that, while most attempts at stock market trend prediction have approached it as a regression problem, the present study converts the trend prediction task into a classification problem, thus improving the prediction accuracy significantly. The performance of the proposed hybrid system is validated on historical time series data from the Bombay Stock Exchange Sensitive Index (BSE-Sensex). The system performance is then compared to that of an artificial neural network (ANN) based system and a naïve Bayes based system. It is found that the trend prediction accuracy is highest for the hybrid system: the genetic algorithm optimized decision tree-SVM hybrid system outperforms both the artificial neural network and the naïve Bayes based trend prediction systems.
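
    Recasting trend prediction as a classification problem means replacing the numeric target (the next price) with a class label (the direction of the next move). A minimal sketch of that labelling step follows; the function name and the closing prices are hypothetical, and the record's actual features, labels and hybrid classifier are not reproduced here.

```python
from typing import List

def trend_labels(closes: List[float]) -> List[int]:
    """Label each day 1 if the next day's close is higher, otherwise 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]

# Hypothetical closing prices for five consecutive trading days.
closes = [100.0, 101.5, 101.0, 101.0, 103.2]
labels = trend_labels(closes)
print(labels)  # [1, 0, 0, 1]
```

    The last day has no label because its next-day close is unknown; any classifier can then be trained on features paired with these labels.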

  17. A New Architecture for Making Moral Agents Based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Meisam Azad-Manjiri

    2014-04-01

    Full Text Available Given the growing influence of robots in various fields of life, the question of whether they can be trusted is important, especially when a robot deals with people directly. One possible way to gain this trust is to add a moral dimension to robots. We therefore present a new architecture for building moral agents that learn from demonstrations. The agent is based on Beauchamp and Childress’s principles of biomedical ethics (a type of deontological theory) and uses a decision tree algorithm to abstract relationships between ethical principles and the morality of actions. We apply this architecture to build an agent that provides guidance to health care workers faced with ethical dilemmas. Our results show that the agent is able to learn ethics well.

  18. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    DEFF Research Database (Denmark)

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe;

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, phasing out of central power plants and cross border power exchange on dynamic security of Danish Power System. Contingency based decision tree (DT) approach is used to assess the dynamic security of present and future Danish Power System. Results from offline time domain simulation for large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of present and future power system. The mentioned approach is implemented in DIgSILENT PowerFactory environment and applied to western Danish Power System which is passing through a phase of major transformation. The results have shown that phasing out of central power plants coupled with large scale wind energy integration and more dependence on international ties can have...

  19. A hybrid model using decision tree and neural network for credit scoring problem

    Directory of Open Access Journals (Sweden)

    Amir Arzy Soltan

    2012-08-01

    Full Text Available Nowadays credit scoring is an important issue for financial and monetary organizations that has substantial impact on reduction of customer attraction risks. Identification of high risk customer can reduce finished cost. An accurate classification of customer and low type 1 and type 2 errors have been investigated in many studies. The primary objective of this paper is to develop a new method, which chooses the best neural network architecture based on one column hidden layer MLP, multiple columns hidden layers MLP, RBFN and decision trees and ensembling them with voting methods. The proposed method of this paper is run on an Australian credit data and a private bank in Iran called Export Development Bank of Iran and the results are used for making solution in low customer attraction risks.

  20. Integrating individual trip planning in energy efficiency – Building decision tree models for Danish fisheries

    DEFF Research Database (Denmark)

    Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard;

    2013-01-01

    integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy efficiency for the value of catch per unit of fuel consumed is analysed by merging the questionnaire, logbook and VMS (vessel monitoring system) information. Logic decision trees and conditional behaviour probabilities are established from the responses of fishermen regarding a range of sequential hypothetical conditions influencing their trip decisions, covering the duration of fishing time, choice of fishing ground(s), when to stop fishing and return to port, and the choice of the port for landing. Fleet-based energy and economy efficiency are linked to the decision (choice) dynamics. Larger fuel...

  1. MODELLING AND IMPLEMENTATION OF DECISION TREE-BASED CONSUMPTION BEHAVIOUR FACTORS

    Institute of Scientific and Technical Information of China (English)

    黎旭; 李国和; 吴卫江; 洪云峰; 刘智渊; 程远

    2015-01-01

    The analysis of consumption behaviour factors plays an important guiding role in the production and sales of products. In order to use consumers' consumption data to model and analyse consumption behaviour, the consumption data are first given a formalised representation, forming consumer transaction data sets and an expression of transaction statistics. Then the information gain ratio is defined on the consumer transaction data sets to reflect the classification ability of the consumption factors. On the basis of the C4.5 algorithm, binary splitting is extended to multi-way splitting, continuous attributes (namely factors) are discretised, and the decision tree is constructed. Each branch of the decision tree forms a decision rule that reflects the dependency relationships between a consumer's consumption factors. The statistical information of each rule expresses the uncertainty of the decision rule. Using a Web architecture with Oracle as the database, a consumption behaviour modelling and analysis system is implemented, which not only achieves high accuracy in consumption behaviour model analysis but is also efficient and user-friendly.
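
    The information gain ratio used by C4.5 to rank candidate attributes can be sketched as below for a categorical attribute. This is the standard textbook definition (information gain divided by split information), not the record's modified multi-way discretisation; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting `labels` by `values`, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: spending level separates buyers perfectly, region does not.
bought = ["no", "no", "yes", "yes"]
spending = ["low", "low", "high", "high"]
region = ["north", "south", "north", "south"]
print(gain_ratio(spending, bought), gain_ratio(region, bought))
```

    Dividing by the split information penalises attributes with many distinct values, which plain information gain tends to favour.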

  3. FPGA-Based Network Traffic Security: Design and Implementation Using C5.0 Decision Tree Classifier

    Institute of Scientific and Technical Information of China (English)

    Tarek Salah Sobh; Mohamed Ibrahiem Amer

    2013-01-01

    In this work, a hardware intrusion detection system (IDS) model and its implementation are introduced to perform online real-time traffic monitoring and analysis. The introduced system combines the advantages of several kinds of IDS: it is hardware based in terms of implementation, network based in terms of system type, and anomaly based in terms of detection approach. In addition, in terms of detection behavior it can detect most network attacks, such as denial of service (DoS), leakage, etc., and in terms of intruder type it can detect both internal and external intruders. Gathering these features in one IDS gives the work many strengths and advantages. The system is implemented using a field programmable gate array (FPGA), which gives it further advantages. A C5.0 decision tree classifier is used as the inference engine of the system and gives a high detection ratio of 99.93%.

  4. A decision tree-based on-line preventive control strategy for power system transient instability prevention

    Science.gov (United States)

    Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po

    2014-02-01

    Maintaining transient stability is a basic requirement for secure power system operations. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention of power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of the DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in its control mechanism, compatible with on-line computation and convenient for dealing with multiple contingencies. The effectiveness and efficiency of the method have been verified on the New England 10-machine 39-bus test system.

  5. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time-domain simulation and the process of data mining, which is then implemented online as guidelines for preventive control schemes. An algorithm named Classification and Regression Trees (CART) is used to train the DT and key to this approach lies on the accuracy of DT. This paper proposes contingency oriented DT...... of western Danish power system which is characterized by its large scale wind energy penetration and high proportion of distributed generation (DG). DIgSILENT PowerFactory is adopted for the power system simulation and Salford Predictive Modeler (SPM) is used for data mining.

  6. Induction of hybrid decision tree based on post-discretization strategy

    Institute of Scientific and Technical Information of China (English)

    WANG Limin; YUAN Senmiao

    2004-01-01

    By redefining the test selection measure, we propose in this paper a new algorithm, Flexible NBTree, which induces a hybrid of decision tree and Naive Bayes. Flexible NBTree mitigates the negative effect of information loss on test selection by applying a post-discretization strategy: at each internal node in the tree, we first select the test that is most useful for improving classification accuracy, then apply discretization of continuous tests. The final decision tree nodes contain univariate splits, as in regular decision trees, but the leaves contain Naive Bayesian classifiers. To evaluate the performance of Flexible NBTree, we compare it with NBTree and C4.5, both of which apply pre-discretization of continuous attributes. Experimental results on a variety of natural domains indicate that the classification accuracy of Flexible NBTree is substantially improved.

  7. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, mobile users should first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we classify the context into public context and private context classes. We then analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm; we can classify mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also derive some rules about mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389

  8. A Decision Tree Based Classifier to Analyze Human Ovarian Cancer cDNA Microarray Datasets.

    Science.gov (United States)

    Tsai, Meng-Hsiun; Wang, Hsin-Chieh; Lee, Guan-Wei; Lin, Yi-Chen; Chiu, Sheng-Hsiung

    2016-01-01

    Ovarian cancer is the deadliest gynaecological disease because of its high mortality rate and the absence of symptoms in the early stage of the cancer. Patients have often reached the terminal stage by the time they are diagnosed with ovarian cancer, and a good opportunity for treatment is thus missed. The current common method for detecting ovarian cancer is a blood test that analyzes the serum tumor marker CA-125. However, the specificity and sensitivity of CA-125 are insufficient for early detection. It has therefore become urgent to look for an efficient method that precisely detects the tumor markers for ovarian cancer. This study aims to find the target genes of ovarian cancer using different algorithms from information science. Feature selection and a decision tree were applied to analyze 9600 ovarian cancer-related genes. After screening for target genes, the candidate genes are analyzed with Ingenuity Pathway Analysis (IPA) software to create a genetic pathway model and to understand the interactive relationships in the different pathological stages of ovarian cancer. Finally, this research found 9 oncogenes associated with ovarian cancer, some of which had not been discovered in previous studies. This system will assist medical staff in diagnosis and treatment at the early stage of cancer and improve patient survival. PMID:26531754

  9. Identifying Model for Anticipated Communication Service-discontinuing Customers Based on Decision Tree Technology

    Institute of Scientific and Technical Information of China (English)

    李智勇; 冷夔

    2011-01-01

    The loss of customers directly affects the survival and development of telecom operators. It is therefore necessary to use data mining technology to build a prediction model that identifies customers who are inclined to discontinue their communication service (anticipated service-discontinuing customers), and to take effective measures to retain them. Taking CRISP-DM (Cross-Industry Standard Process for Data Mining) as a tool, the method of building the identification model for anticipated service-discontinuing customers is discussed in detail across six phases: business understanding, data understanding, data preparation, modelling, model evaluation and deployment of results. A decision tree node model is used as the data mining tool and technique to build the identification model. The model has played a positive role in the retention of mobile customers and has achieved good practical results.

  10. A Data Mining Algorithm Based on Distributed Decision-Tree in Grid Computing Environments

    Institute of Scientific and Technical Information of China (English)

    Zhongda Lin; Yanfeng Hong; Kun Deng

    2006-01-01

    Research on distributed data mining using grids has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification data mining on the grid.

  11. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Rohulla Kosari Langari

    2014-02-01

    Full Text Available The transformation of the world through information technology and the development of the Internet has created competitive knowledge in the field of electronic commerce and increased the competitive potential of organizations. Under these conditions, the growing volume of commercial transactions, which must be fast and of guaranteed quality, calls for a dynamic electronic banking system that uses modern technology to facilitate electronic business processes. Internet banking is counted as a potential opportunity and one of the fundamental pillars of e-banking, but in cyberspace it faces various obstacles and threats. One of these challenges is the lack of guaranteed security for financial transactions, together with suspicious and unusual behavior such as mail fraud for financial abuse. Various systems based on machine intelligence methods and data mining techniques have been designed to detect fraud in users’ behavior and have been applied in industries such as insurance, medicine and banking. The main aim of this article is to recognize unusual user behavior in e-banking systems, that is, to detect user behavior and categorize the emerging patterns so as to prepare the conditions for predicting unauthorized penetration and detecting suspicious behavior. Detecting user behavior in Internet systems involves uncertainty, and transaction records can be useful for understanding these movements; among machine learning methods, the decision tree technique is considered a common tool for classification and prediction. Therefore, this research first determines the effective banking variables and the weight of each in producing Internet behavior, and then combines the various behavior patterns to extract a model of inductive rules that makes it possible to recognize different behaviors. Finally, four algorithms, Chaid, ex_Chaid, C4.5 and C5.0, are compared and evaluated for classification and detection of exist

  12. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    Science.gov (United States)

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust, as it is able to find results that are discriminatory from a statistical perspective with logical interactions, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have already been identified in large genome-wide association studies in the literature as being related to type II diabetes, lending additional confidence to the results. PMID:26577156

  13. Analysis of the Lending Data with Decision Tree Model Based on Clustering

    Institute of Scientific and Technical Information of China (English)

    翟剑锋

    2012-01-01

    Through the selection, cleaning and transformation of lending data provided by a university library, this paper studies decision tree classification supported by clustering and its application to library lending data. Clustering is used to obtain the training samples for the decision tree, with the aim of obtaining a high-quality decision tree and further improving the accuracy of book recommendation. Taking the lending data of a university library as an example, the above research results are applied to analyze that library's lending data. The results of the analysis are provided to library managers as a reference for collection policy-making, book recommendation and library management.

  14. Teratozoospermia Classification Based on the Shape of Sperm Head Using OTSU Threshold and Decision Tree

    Directory of Open Access Journals (Sweden)

    Masdiyasa I Gede Susrama

    2016-01-01

    Full Text Available Teratozoospermia is one of the findings of expert analysis of male infertility, obtained by conducting microscopic lab tests to determine the morphology of spermatozoa, including whether the head of the spermatozoa has a normal or abnormal shape. The laboratory test results take the form of a complete image of the spermatozoa. In this study, the shapes of the heads of spermatozoa were taken from a WHO standards book. The pictures were fairly clear but still contained noise, so to differentiate between normal and abnormal heads of spermatozoa several processes need to be performed: a pre-processing or image adjustment step, a threshold segmentation step using the Otsu threshold method, and a classification step using a decision tree. Training and test data are presented in stages, from 5 to 20 data. The tests using Otsu segmentation and a decision tree produced different errors at each level of training data: 70%, 75%, and 80% for training data of size 5×2, 10×2, and 20×2, respectively, with an average error of 75%. Thus, this study shows that Otsu threshold segmentation and a decision tree can classify the shape of the head of spermatozoa as normal or abnormal.
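
    The Otsu threshold step can be sketched as follows: the method picks the grey level that maximises the between-class variance of the two pixel groups it separates. This is a generic pure-Python sketch of Otsu's method, not the record's implementation; the function name and the pixel values are hypothetical, and pixels are assumed to be integers in the range 0 to levels - 1.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance (class 0: pixel <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of grey levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # no pixels in the lower class yet
        if w1 == 0:
            break             # no pixels left in the upper class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Hypothetical flattened image: three dark and three bright pixels.
pixels = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]   # True marks the brighter class
print(t, mask)
```

    Features computed from the resulting binary mask, such as the shape of the segmented region, could then be passed to a decision tree classifier.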

  15. Skin autofluorescence based decision tree in detection of impaired glucose tolerance and diabetes.

    Directory of Open Access Journals (Sweden)

    Andries J Smit

    Full Text Available AIM: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE), which are considered to be a carrier of glycometabolic memory. We compared SAF and a SAF-based decision tree (SAF-DM) with fasting plasma glucose (FPG) and HbA1c, and additionally with the Finnish Diabetes Risk Score (FINDRISC) questionnaire±FPG, for detection of oral glucose tolerance test (OGTT)- or HbA1c-defined IGT and diabetes in intermediate risk persons. METHODS: Participants had ≥1 metabolic syndrome criteria. They underwent an OGTT, HbA1c, SAF and FINDRISC, in addition to SAF-DM, which includes SAF, age, BMI, and conditional questions on DM family history, antihypertensives, and renal or cardiovascular disease events (CVE). RESULTS: 218 persons, age 56 yr, 128M/90F, 97 with previous CVE, participated. With OGTT 28 had DM, 46 IGT, 41 impaired fasting glucose, 103 normal glucose tolerance. SAF alone revealed 23 false positives (FP), 34 false negatives (FN) (sensitivity (S) 68%; specificity (SP) 86%). With SAF-DM, FP were reduced to 18, FN to 16 (5 with DM) (S 82%; SP 89%). HbA1c scored 48 FP, 18 FN (S 80%; SP 75%). Using HbA1c-defined DM-IGT/suspicion ≥6%/42 mmol/mol, SAF-DM scored 33 FP, 24 FN (4 DM) (S 76%; SP 72%), FPG 29 FP, 41 FN (S 71%; SP 80%). FINDRISC≥10 points as detection of HbA1c-based diabetes/suspicion scored 79 FP, 23 FN (S 69%; SP 45%). CONCLUSION: SAF-DM is superior to FPG and non-inferior to HbA1c to detect diabetes/IGT in intermediate-risk persons. SAF-DM's value for diabetes/IGT screening is further supported by its established performance in predicting diabetic complications.
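
    Sensitivity and specificity, as reported above, follow from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the study, whose true-positive and true-negative counts are not given in the abstract.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for a screening test applied to 200 persons.
s, sp = sensitivity_specificity(tp=80, fp=10, tn=90, fn=20)
print(f"S {s:.0%}; SP {sp:.0%}")  # S 80%; SP 90%
```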

  16. An Applied Research of Decision Tree Algorithm in Track and Field Equipment Training

    OpenAIRE

    Liu Shaoqing; Wang Kebin

    2015-01-01

    This paper studies the application of the ID3 decision tree algorithm to track and field equipment training. For the selection of the elements used by the decision tree, the equipment is divided, according to its properties, into track training equipment, field events training equipment and auxiliary training equipment. A decision tree that takes track training equipment as the root node has been obtained under the conditions of lowering c...

  17. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    Science.gov (United States)

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), the ratio of median particle diameter to hydraulic radius (d/R) and the volumetric sediment concentration (C_V) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R² = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers. PMID:27386995
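
    Two of the error measures quoted above, RMSE and MARE, can be computed as sketched below. The formulas are the commonly used definitions (root mean square error and mean absolute relative error); the abstract does not spell out the exact definitions used in the paper, so this is an assumption, and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mare(observed, predicted):
    """Mean absolute relative error; observed values must be non-zero."""
    n = len(observed)
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / n

# Hypothetical observed and predicted Froude numbers.
observed = [2.0, 4.0]
predicted = [3.0, 3.0]
print(rmse(observed, predicted), mare(observed, predicted))  # 1.0 0.375
```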

  18. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions

    Science.gov (United States)

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops to protect against NO₃⁻ contamination in groundwater, dominant pollution activities require reliable classification. In this study, we (1) classified potential NO₃⁻ pollution activities via an unsupervised learning algorithm based on δ¹⁵N- and δ¹⁸O-NO₃⁻ and physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO₃⁻ contamination via a decision tree model. When a combination of δ¹⁵N-, δ¹⁸O-NO₃⁻ and physico-chemical properties of groundwater was used as an input for the k-means clustering algorithm, it allowed for a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well irrigated agriculture (28 sampling locations), sewage irrigated agriculture (16 sampling locations), a combination of sewage irrigated agriculture, farm and industry (5 sampling locations) and a combination of well irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on SO₄²⁻ and Cl⁻ variables. The NO₃⁻ and the δ¹⁵N- and δ¹⁸O-NO₃⁻ variables demonstrated limitations in developing a decision tree model, as multiple N sources and fractionation processes both made it difficult to discriminate NO₃⁻ concentrations and isotopic values. Although only SO₄²⁻ and Cl⁻ were selected as important discriminating variables, concentration data alone could not identify the specific NO₃⁻ sources responsible for groundwater contamination. This is a result of comprehensive analysis. To further reduce NO₃⁻ contamination, an integrated approach should be set up by combining N and O isotopes of NO₃⁻ with land uses and physico-chemical properties, especially in areas with complex agricultural activities.

  19. Predicting future trends in stock market by decision tree rough-set based hybrid system with HHMM

    Directory of Open Access Journals (Sweden)

    Shweta Tiwari

    2012-06-01

    Full Text Available Around the world, trading in the stock market has gained huge popularity as a means through which one can obtain vast profits. Attempting to profitably and precisely predict the financial market has long engrossed the interest and attention of bankers, economists and scientists alike. Stock market prediction is the act of trying to determine the future value of a company’s stock or other financial instrument traded on a financial exchange. Accurate stock market predictions are important for many reasons. Chief among them is the need for investors to hedge against potential market risks, and the opportunities for arbitrageurs and speculators to make profits by trading indexes. The stock market is a place where shares are issued and traded. These shares are traded either through stock exchanges or over the counter, in physical or electronic form. Data mining, as a process of discovering useful patterns and correlations, has its own role in financial modeling. Data mining is a discipline in computational intelligence that deals with knowledge discovery, data analysis, and fully and semi-autonomous decision making. Prediction of the stock market by data mining techniques has been receiving a lot of attention recently. This paper presents a hybrid system based on decision tree and rough set for predicting trends in the Bombay Stock Exchange (BSE SENSEX) in combination with a Hierarchical Hidden Markov Model. In this paper we present future trends on the basis of price earnings and dividends. The data on accounting earnings, when averaged over many years, help to predict the present value of future dividends.

  20. Geometric Decision Tree

    CERN Document Server

    Manwani, Naresh

    2010-01-01

    In this paper we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy to assess the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both the classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node, are solutions of an interesting optimization problem and hence argue that this is a principled method of learning a decision tree.
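
    The split rule described here uses the angle bisectors of two clustering hyperplanes. As a geometric illustration only, the sketch below computes both bisectors of two given hyperplanes w1·x + b1 = 0 and w2·x + b2 = 0 by normalizing each normal vector to unit length and then adding or subtracting the two equations. How the clustering hyperplanes themselves are found, and how one bisector is chosen at a node, is not stated in the abstract and is not covered; all function names are hypothetical.

```python
import math

def normalize(w, b):
    """Scale (w, b) so that the normal vector w has unit length."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w], b / norm

def angle_bisectors(w1, b1, w2, b2):
    """Return the two angle bisectors of hyperplanes w1.x + b1 = 0 and w2.x + b2 = 0."""
    u1, c1 = normalize(w1, b1)
    u2, c2 = normalize(w2, b2)
    plus = ([p + q for p, q in zip(u1, u2)], c1 + c2)
    minus = ([p - q for p, q in zip(u1, u2)], c1 - c2)
    return plus, minus

def side(w, b, x):
    """Signed value of w.x + b; its sign says which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# The lines x = 0 and y = 0 in the plane have bisectors x + y = 0 and x - y = 0.
plus, minus = angle_bisectors([1.0, 0.0], 0.0, [0.0, 1.0], 0.0)
print(plus, minus)
```

    Every point on either bisector is equidistant from the two original hyperplanes, so a split along a bisector depends on both hyperplanes rather than on an axis-parallel threshold.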

  2. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    Science.gov (United States)

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays. PMID:26796566

  3. Childhood Cancer-a Hospital based study using Decision Tree Techniques

    Directory of Open Access Journals (Sweden)

    K. Kalaivani

    2011-01-01

    Full Text Available Problem statement: Cancer is generally regarded as a disease of adults, but there is a higher proportion of childhood cancer (ALL, Acute Lymphoblastic Leukemia) in India. The incidence of childhood cancer has increased over the last 25 years, but the increase is much larger in females. The aim was to increase our understanding of the determinants of south Indian parental reactions and needs. This facilitates the development of care and follow-up routines for families, paying attention both to individual risk and resilience factors and to ways in which limitations related to treatment centre and organizational characteristics could be compensated. Approach: Decision trees may be used for classification, clustering, affinity grouping, prediction or estimation, and description. One of the useful medical applications in India is the management of leukemia, as it accounts for about 33% of childhood malignancies. Results: Female survivors showed greater functional disability in comparison to male survivors, demonstrated by poorer overall health status. Family stress results from a perceived imbalance between the demands on the family and the resources available to meet such demands. Conclusion: The pattern and severity of health and functional outcomes differed significantly between survivors in diagnostic subgroups. Family impact was aggravated by patients’ lasting sequelae and by parent-perceived shortcomings of long-term follow-up. Female survivors were at greater risk for health-related late effects.

  4. An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

    Science.gov (United States)

    Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

    2014-03-01

    Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, a new feature extraction method based on boundary analysis in phase space is proposed for diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset. PMID:24296116

  5. Decision tree algorithm based on classification matrix

    Institute of Scientific and Technical Information of China (English)

    陶道强; 马良荔; 彭超

    2012-01-01

    To improve the classification speed and accuracy of the decision tree algorithm, a new scheme based on a classification matrix is proposed. First, the basic theory of the ID3 algorithm is introduced and a classification matrix is defined. Then the bias of ID3 toward attributes with many values (its variety bias) is pointed out and proved using the classification matrix. On this basis, a weighting factor is introduced to suppress the variety bias of the ID3 algorithm, with a corresponding proof based on the classification matrix. According to the characteristics of the gain based on the classification matrix, a new decision tree scheme is proposed, aiming to optimize computing speed. Finally, the scheme is compared with the ID3 algorithm through experiments. Experimental results show that the optimized scheme performs clearly better than the original one.
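
    The bias of ID3 mentioned in this abstract can be seen with a few lines of Python. The sketch below computes the standard ID3 information gain and shows that an attribute with one distinct value per record obtains the maximum gain even though it is useless for prediction. The toy data and names are hypothetical; the classification matrix and the weighting factor proposed in the paper are not given in the abstract and are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """ID3 information gain obtained by splitting `labels` on attribute `values`."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: four records, two classes.
labels = ["yes", "yes", "no", "no"]
record_id = [1, 2, 3, 4]                  # unique per record, useless for prediction
weather = ["sun", "sun", "sun", "rain"]   # an ordinary two-valued attribute

print(information_gain(record_id, labels))  # 1.0 bit: every branch is pure
print(information_gain(weather, labels))    # a smaller gain
```

    Because splitting on many distinct values tends to produce small, pure branches, plain information gain favors such attributes, which is the effect a correction factor is meant to suppress.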

  6. Decision tree methods: applications for classification and prediction

    Institute of Scientific and Technical Information of China (English)

    Yan-yan SONG; Ying LU

    2015-01-01

    Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. The training dataset is used to build a decision tree model and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.

  7. Algorithms for Decision Tree Construction

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms were based on the separation heuristic [13, 31], which at each step tries to divide the set of objects as evenly as possible. Later, Garey and Graham [28] showed that such an algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest [35] proved NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and uniform probability distribution. Cox et al. [22] showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.

  8. Design of a new hybrid artificial neural network method based on decision trees for calculating the Froude number in rigid rectangular channels

    Directory of Open Access Journals (Sweden)

    Ebtehaj Isa

    2016-09-01

    Full Text Available A vital topic regarding the optimum and economical design of rigid boundary open channels such as sewers and drainage systems is determining the movement of sediment particles. In this study, the incipient motion of sediment is estimated using three datasets from literature, including a wide range of hydraulic parameters. Because existing equations do not consider the effect of sediment bed thickness on incipient motion estimation, this parameter is applied in this study along with the multilayer perceptron (MLP) and a hybrid method based on decision trees (DT) (MLP-DT) to estimate incipient motion. According to a comparison with the observed experimental outcome, the proposed method performs well (MARE = 0.048, RMSE = 0.134, SI = 0.06, BIAS = -0.036). The performance of MLP and MLP-DT is compared with that of existing regression-based equations, and significantly higher performance over existing models is observed. Finally, an explicit expression for practical engineering is also provided.

  9. Application of portfolio theory in decision tree analysis.

    Science.gov (United States)

    Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J

    1991-07-01

    A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation.
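
    The idea of trading expected return against variance across decision options can be sketched as follows. The code assumes, hypothetically, that a fraction of the herd receives one decision and the remainder another, and that returns are known for a small set of scenarios; the numbers and function names are illustrative and are not taken from the paper.

```python
def mean_and_variance(outcomes, probabilities):
    """Expected value and variance of a discrete outcome distribution."""
    mean = sum(p * x for x, p in zip(outcomes, probabilities))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probabilities))
    return mean, variance

def mix(returns_a, returns_b, share_a):
    """Per-scenario return when a fraction `share_a` of the herd gets decision A."""
    return [share_a * a + (1 - share_a) * b for a, b in zip(returns_a, returns_b)]

# Hypothetical returns per cow for two decisions under three equally likely scenarios.
probabilities = [1 / 3, 1 / 3, 1 / 3]
decision_a = [10.0, 20.0, 30.0]
decision_b = [30.0, 20.0, 10.0]

for share_a in (0.0, 0.5, 1.0):
    print(share_a, mean_and_variance(mix(decision_a, decision_b, share_a), probabilities))
```

    In this toy case all three mixes have the same expected return, but the even mix has no variance, so it dominates the two pure decisions in the mean-variance sense.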

  10. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    Science.gov (United States)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

    Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), it is a more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36·15H2O)), EDS data alone do not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM based crystal morphology data, XRD spectra and re-calculated cationic distribution, obtained by EDS have been used for the

  11. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India.

    Science.gov (United States)

    Senthil Kumar, A R; Goyal, Manish Kumar; Ojha, C S P; Singh, R D; Swamee, P K

    2013-01-01

    The prediction of streamflow is required in many activities associated with the planning and operation of the components of a water resources system. Soft computing techniques have proven to be an efficient alternative to traditional methods for modelling qualitative and quantitative water resource variables such as streamflow. The focus of this paper is to present the development of models using multiple linear regression (MLR), artificial neural network (ANN), fuzzy logic and decision tree algorithms such as M5 and REPTree for predicting the streamflow at Kasol, located upstream of the Bhakra reservoir in the Sutlej basin in northern India. The input vector to the various models using different algorithms was derived considering statistical properties such as the auto-correlation function, partial auto-correlation and cross-correlation function of the time series. It was found that the REPTree model performed well compared to the other techniques investigated in this study (MLR, ANN, fuzzy logic, and M5P), and the results of the REPTree model indicate that the entire range of streamflow values was simulated fairly well. The performance of the naïve persistence model was compared with the other models, and the need for developing the naïve persistence model was also analysed using the persistence index.

  12. Application of Random Forest Survival Models to Increase Generalizability of Decision Trees: A Case Study in Acute Myocardial Infarction

    Directory of Open Access Journals (Sweden)

    Iman Yosefian

    2015-01-01

    Full Text Available Background. Tree models provide an easily interpretable prognostic tool but unstable results. Two approaches to enhance the generalizability of the results are pruning and random survival forest (RSF). The aim of this study is to assess the generalizability of saturated tree (ST), pruned tree (PT), and RSF. Methods. Data of 607 patients were randomly divided into training and test sets applying 10-fold cross-validation. Using the training sets, all three models were applied. Using the log-rank test, ST was constructed by searching for optimal cutoffs. PT was selected by plotting error rate versus minimum sample size in terminal nodes. In construction of RSF, 1000 bootstrap samples were drawn from the training set. C-index and integrated Brier score (IBS) statistics were used to compare models. Results. ST provides the most overoptimized statistics. The mean difference between C-index in training and test sets was 0.237. The corresponding figures in PT and RSF were 0.054 and 0.007. In terms of IBS, the difference was 0.136 in ST, 0.021 in PT, and 0.0003 in RSF. Conclusion. Pruning of the tree and assessment of its performance on a test set partially improve the generalizability of decision trees. RSF provides results that are highly generalizable.
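
    The C-index used to compare the models measures how often a model ranks two patients in the correct order of risk. A minimal sketch of Harrell's concordance index for right-censored data follows; the function name and the toy data are hypothetical, and the study's own software and tie handling are not stated in the abstract.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # higher risk failed earlier: correct order
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # tied scores count as half
    return concordant / comparable

# Hypothetical data: follow-up times, event indicators (0 = censored), model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risk))
```

    Computing this index separately on training and test sets and taking the difference gives the kind of optimism figure reported in the Results.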

  13. Method for Walking Gait Identification in a Lower Extremity Exoskeleton based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Qing Guo

    2015-04-01

    Full Text Available A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors’ information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person’s motion.

  14. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    OpenAIRE

    Wan-Yu Chang; Chung-Cheng Chiu; Jia-Horng Yang

    2015-01-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and in...

  15. Skin Autofluorescence Based Decision Tree in Detection of Impaired Glucose Tolerance and Diabetes

    NARCIS (Netherlands)

    Smit, Andries J.; Smit, Jitske M.; Botterblom, Gijs J.; Mulder, Douwe J.

    2013-01-01

    Aim: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE) which are considered to be a carrier of glycometabolic memory. We compare

  16. Building Customers' Credit Scoring Models with Combination of Feature Selection and Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Zahra Davoodabadi

    Full Text Available Today's financial transactions through banks and financial institutions have increased. Therefore, credit scoring is a critical task to forecast customers’ credit. We have created 9 different models for credit scoring by combining three metho ...

  17. A Decision Tree Based Word Sense Disambiguation System in Manipuri Language

    OpenAIRE

    Richard Laishram Singh; Krishnendu Ghosh; Kishorjit Nongmeikapam; Sivaji Bandyopadhyay

    2014-01-01

    This paper presents a preliminary attempt at building a word sense disambiguation system for the Manipuri language. The paper discusses related attempts made in the Manipuri language followed by the proposed plan. A database consisting of 650 sentences in the Manipuri language was collected in the course of the study. Conventional positional and context-based features are suggested to capture the sense of words that have ambiguous and multiple senses. The proposed work is expected ...

  18. A Decision Tree Based Word Sense Disambiguation System in Manipuri Language

    Directory of Open Access Journals (Sweden)

    Richard Laishram Singh

    2014-07-01

    Full Text Available This paper presents a preliminary attempt at building a word sense disambiguation system for the Manipuri language. The paper discusses related attempts made in the Manipuri language followed by the proposed plan. A database consisting of 650 sentences in the Manipuri language was collected in the course of the study. Conventional positional and context-based features are suggested to capture the sense of words that have ambiguous and multiple senses. The proposed work is expected to predict the senses of polysemous words with high accuracy with the help of suitable knowledge acquisition techniques. The system produces an accuracy of 71.75%.

  19. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
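
    The smoothing step, averaging each segment's decision with past decisions, can be illustrated as follows. The window length, the 0.5 threshold and the score encoding are assumptions made for this sketch; the abstract does not give the actual parameters or the exact averaging rule.

```python
def smooth_decisions(raw_scores, window=3, threshold=0.5):
    """Label each segment from the mean of its score and up to window-1 past scores."""
    labels = []
    for i in range(len(raw_scores)):
        recent = raw_scores[max(0, i - window + 1): i + 1]
        average = sum(recent) / len(recent)
        labels.append("speech" if average >= threshold else "music")
    return labels

# Hypothetical per-segment scores (1.0 = speech, 0.0 = music) with one isolated glitch.
raw = [1.0, 1.0, 0.0, 1.0, 1.0]
print(smooth_decisions(raw))
```

    An isolated contrary segment is absorbed by its neighbours, while a sustained change still flips the label after a short delay, which matches the stated goal of avoiding rapid alternations while adjusting quickly to real transitions.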

  20. A Multi-industry Default Prediction Model using Logistic Regression and Decision Tree

    Directory of Open Access Journals (Sweden)

    Suresh Ramakrishnan

    2015-04-01

    Full Text Available The accurate prediction of corporate bankruptcy for firms in different industries is of great concern to investors and creditors, as it can reduce creditors’ risk and yield considerable savings for an industry economy. Financial statements vary between industries. Therefore, economic intuition suggests that industry effects should be an important component in bankruptcy prediction. This study attempts to detail the characteristics of each industry using sector indicators. The results show a significant relationship between probability of default and sector indicators. The results of this study may improve the performance of default prediction models and reduce the costs of risk management.

  1. Modelling alcohol consumption during adolescence using zero inflated negative binomial and decision trees

    Directory of Open Access Journals (Sweden)

    Alfonso Palmer

    2010-07-01

    Full Text Available Alcohol is currently the most consumed substance among the Spanish adolescent population. Some of the variables that bear an influence on this consumption include ease of access, use of alcohol by friends and some personality factors. The aim of this study was to analyze and quantify the predictive value of these variables specifically on alcohol consumption in the adolescent population. The useful sample was made up of 6,145 adolescents (49.8% boys and 50.2% girls) with a mean age of 15.4 years (SE = 1.2). The data were analyzed using the statistical model for a count variable and Data Mining techniques. The results show the influence of ease of access, alcohol consumption by the group of friends, and certain personality factors on alcohol intake, allowing us to quantify the intensity of this influence according to age and gender. Knowing these factors is the starting point in elaborating specific preventive actions against alcohol consumption.
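
    The count model named in the title, the zero-inflated negative binomial, mixes a point mass at zero with an ordinary negative binomial distribution. The sketch below writes out its probability mass function in the mean/dispersion parameterization; the parameter values are hypothetical and are not estimates from the study.

```python
import math

def nb_pmf(y, mu, r):
    """Negative binomial probability of count y, with mean mu and dispersion r."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, r, pi_zero):
    """Zero-inflated NB: with probability pi_zero the count is a structural zero."""
    base = (1 - pi_zero) * nb_pmf(y, mu, r)
    return pi_zero + base if y == 0 else base

# Hypothetical parameters: 30% structural zeros, mean count 4 among the rest.
probs = [zinb_pmf(y, mu=4.0, r=2.0, pi_zero=0.3) for y in range(200)]
print(probs[0], sum(probs))
```

    The extra mass at zero lets the model separate respondents who never drink from those who drink but happened to report zero, which a plain negative binomial cannot do.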

  2. Maximal standard dose of parenteral iron for hemodialysis patients: an MRI-based decision tree learning analysis.

    Directory of Open Access Journals (Sweden)

    Guy Rostoker

    Full Text Available Iron overload used to be considered rare among hemodialysis patients after the advent of erythropoiesis-stimulating agents, but recent MRI studies have challenged this view. The aim of this study, based on decision-tree learning and on MRI determination of hepatic iron content, was to identify a noxious pattern of parenteral iron administration in hemodialysis patients. We performed a prospective cross-sectional study from 31 January 2005 to 31 August 2013 in the dialysis centre of a French community-based private hospital. A cohort of 199 fit hemodialysis patients free of overt inflammation and malnutrition were treated for anemia with parenteral iron-sucrose and an erythropoiesis-stimulating agent (darbepoetin), in keeping with current clinical guidelines. Patients had blinded measurements of hepatic iron stores by means of T1 and T2* contrast MRI, without gadolinium, together with Chi-squared Automatic Interaction Detection (CHAID) analysis. The CHAID algorithm first split the patients according to their monthly infused iron dose, with a single cutoff of 250 mg/month. In the node comprising the 88 hemodialysis patients who received more than 250 mg/month of IV iron, 78 patients had iron overload on MRI (88.6%, 95% CI: 80% to 93%). The odds ratio for hepatic iron overload on MRI was 3.9 (95% CI: 1.81 to 8.4) with >250 mg/month of IV iron as compared to <250 mg/month. Age, gender (female sex) and the hepcidin level also influenced liver iron content on MRI. The standard maximal amount of iron infused per month should be lowered to 250 mg in order to lessen the risk of dialysis iron overload and to allow safer use of parenteral iron products.

  3. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    Directory of Open Access Journals (Sweden)

    Wided Khiari

    2013-09-01

    This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosure scores to examine the corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 was carried out and a disclosure index was developed to determine the level of disclosure of the companies. Disclosure quality is assessed through the quantity and also through the nature (type) of the information disclosed. Applying the decision tree method, the resulting tree diagrams provide ways to identify the characteristics of a particular firm whatever its level of disclosure. The results show that the corporate governance characteristics needed to achieve good disclosure quality are not the same for all firms. These structures do not necessarily follow all best-practice recommendations, but converge towards the best combination. Indeed, in practice, there are companies that have good disclosure quality but are not well governed. However, we hope that by improving their governance system their level of disclosure may improve. These findings show, in a general way, a convergence towards the standards of corporate governance, with a few exceptions related to the specificity of Tunisian listed firms, and show the need to adopt a code for each context. The findings shed light on corporate governance features that enhance incentives for good disclosure. They make it possible to identify, for each firm and at any date, the corporate governance determinants of disclosure quality. More specifically, and all else being equal, the resulting tree provides a decision rule that lets a company know its level of disclosure based on certain characteristics of the governance strategy it has adopted.

  4. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs
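
    The expected-cost and VOI calculation described in this abstract can be sketched as follows. All probabilities and costs are hypothetical placeholders, not values from the paper, and the sign convention (lowest uninformed expected cost minus expected cost with monitoring, so that a positive value favours monitoring) is an assumption.

```python
def expected_cost(outcomes):
    """Weighted sum of outcome costs; `outcomes` is a list of (probability, cost)."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * c for p, c in outcomes)

# Hypothetical numbers for illustration only (not taken from the paper).
uninformed_alternatives = {
    "ignore_risk": [(0.7, 0.0), (0.3, 1000.0)],   # usually fine, sometimes illness costs
    "bottled_water": [(1.0, 400.0)],              # certain purchase cost
}
monitoring = [(0.7, 50.0), (0.3, 450.0)]          # network cost plus targeted response

uninformed = {name: expected_cost(o) for name, o in uninformed_alternatives.items()}
best_uninformed = min(uninformed.values())
voi = best_uninformed - expected_cost(monitoring)
print(uninformed, voi)
```

    With these placeholder numbers the monitoring network is worth having because its expected cost is below that of the cheapest uninformed alternative.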

  5. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. Khader

    2012-12-01

    Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system.
At current

  6. Establishment of the Associated Model between Turbid Phlegm Syndrome and Clinical Indicators in the Patients of Diabetes Type 2 Based on Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    赵灵燕; 毕力夫; 张亚军; 陈建新; 赵慧辉; 戴军有; 王伟

    2014-01-01

    Objective: To establish the association model between turbid phlegm syndrome and routine clinical indicators in patients with type 2 diabetes, using the decision tree data-mining method. Methods: A multi-center clinical epidemiological survey was adopted. A total of 249 eligible cases of type 2 diabetes were collected from five Grade III, Class A hospitals across China. Basic information, information from the four diagnostic methods of TCM, and routine clinical indicators were analyzed comprehensively. On the basis of the t test, nonparametric tests and Pearson correlation analysis, the decision tree data-mining method was further adopted to establish the association model between turbid phlegm syndrome and routine clinical indicators. Results: Of the 249 cases, 106 (42.57%) were differentiated as turbid phlegm syndrome. Six core indicators, namely urea nitrogen, white blood cells, mean corpuscular volume, high-sensitivity C-reactive protein, red blood cells and thyroxine, were used to establish the decision tree model of turbid phlegm syndrome. With 10-fold cross-validation, the model had a sensitivity of 75.47%, a specificity of 76.22% and an overall accuracy of 75.90%. Conclusion: The decision tree model can identify turbid phlegm syndrome in patients with type 2 diabetes clearly and intuitively, and shows certain advantages in research on the objectification of syndromes.
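
    The three performance figures in this abstract (sensitivity 75.47%, specificity 76.22%, overall accuracy 75.90%) are mutually consistent with a single confusion matrix over the 249 cases. The counts below are back-calculated from the reported percentages and the 106/143 class split; they are an inference for illustration, not figures stated by the authors, and they assume the cross-validation results were pooled over all cases.

```python
# Counts inferred from the reported percentages (not given in the abstract):
# 106 cases with turbid phlegm syndrome, 249 - 106 = 143 without.
tp, fn = 80, 26    # syndrome cases classified correctly / incorrectly
tn, fp = 109, 34   # non-syndrome cases classified correctly / incorrectly

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
print(f"{sensitivity:.2%} {specificity:.2%} {accuracy:.2%}")
```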

  7. A similarity study between the query mass and retrieved masses using decision tree content-based image retrieval (DTCBIR) CADx system for characterization of ultrasound breast mass images

    Science.gov (United States)

    Cho, Hyun-Chong; Hadjiiski, Lubomir; Chan, Heang-Ping; Sahiner, Berkman; Helvie, Mark; Paramagul, Chintana; Nees, Alexis V.

    2012-03-01

    We are developing a Decision Tree Content-Based Image Retrieval (DTCBIR) CADx scheme to assist radiologists in characterization of breast masses on ultrasound (US) images. Three DTCBIR configurations, including decision tree with boosting (DTb), decision tree with full leaf features (DTL), and decision tree with selected leaf features (DTLs) were compared. For DTb, the features of a query mass were combined first into a merged feature score and then masses with similar scores were retrieved. For DTL and DTLs, similar masses were retrieved based on the Euclidean distance between the feature vector of the query and those of the selected references. For each DTCBIR configuration, we investigated the use of the full feature set and the subset of features selected by the stepwise linear discriminant analysis (LDA) and simplex optimization method, resulting in six retrieval methods. Among the six methods, we selected five, DTb-lda, DTL-lda, DTb-full, DTL-full and DTLs-full, for the observer study. For a query mass, the three most similar masses were retrieved with each method and were presented to the radiologists in random order. Three MQSA radiologists rated the similarity between the query mass and the computer-retrieved masses using a nine-point similarity scale (1=very dissimilar, 9=very similar). For DTb-lda, DTL-lda, DTb-full, DTL-full and DTLs-full, the average Az values were 0.90±0.03, 0.85±0.04, 0.87±0.04, 0.79±0.05 and 0.71±0.06, respectively, and the average similarity ratings were 5.00, 5.41, 4.96, 5.33 and 5.13, respectively. Although the DTb measures had the best classification performance among the DTCBIRs studied, and DTLs had the worst performance, DTLs-full obtained higher similarity ratings than the DTb measures.
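
    The Euclidean-distance retrieval step described for DTL and DTLs can be sketched as a nearest-neighbour lookup. The feature vectors, identifiers and function name below are hypothetical; the abstract does not describe the actual features or how the reference set is selected.

```python
from math import dist

def retrieve_similar(query, references, k=3):
    """Return the k reference ids closest to `query` in Euclidean distance."""
    ranked = sorted(references, key=lambda ref_id: dist(query, references[ref_id]))
    return ranked[:k]

# Hypothetical 2-D feature vectors; a real system would use many more features.
references = {
    "mass_a": (0.10, 0.90),
    "mass_b": (0.40, 0.40),
    "mass_c": (0.45, 0.55),
    "mass_d": (0.90, 0.10),
    "mass_e": (0.50, 0.50),
}
top3 = retrieve_similar((0.5, 0.5), references)
print(top3)
```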

  8. Decision Tree Algorithm by Gene Expression Programming Based on Differential Evolution

    Institute of Scientific and Technical Information of China (English)

    王卫红; 阮薇; 李曲

    2011-01-01

    Decision trees evolved by Gene Expression Programming (GEP) with uniformly distributed constants are classifiers with fairly high accuracy, but their performance on multi-attribute data classification is not satisfactory. This paper presents a Differential Evolution (DE)-based GEP decision tree algorithm. The new algorithm uses differential evolution to improve the additional thresholds, so that the uniform constant array remains uniformly distributed while retaining diversity. Experiments on benchmark datasets show that it performs better on multi-attribute classification problems than the basic GEP decision tree.

  9. A new decision tree learning algorithm

    Institute of Scientific and Technical Information of China (English)

    FANG Yong; QI Fei-hu

    2005-01-01

    In order to improve the generalization ability of binary decision trees, a new learning algorithm, the MMDT algorithm, is presented. Based on statistical learning theory, the generalization performance of binary decision trees is analyzed and an assessment rule is proposed. Under the direction of the assessment rule, the MMDT algorithm is implemented. The algorithm maps training examples from the original space to a high-dimensional feature space and constructs a decision tree in it. In the feature space, a new decision node splitting criterion, the max-min rule, is used, and the margin of each decision node is maximized using a support vector machine to improve the generalization performance. Experimental results show that the new learning algorithm is much superior to others such as C4.5 and OC1.

  10. Decision trees with minimum average depth for sorting eight elements

    KAUST Repository

    AbouEisha, Hassan

    2015-11-19

    We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We show also that each decision tree for sorting 8 elements, which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365), has also minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to make sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.
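
    The stated value 620160/8! can be put next to the information-theoretic lower bounds with a few lines of exact arithmetic. This is only a sanity check on the numbers in the abstract, not the dynamic-programming method the authors use; the variable names are illustrative.

```python
from fractions import Fraction
from math import ceil, factorial, log2

orderings = factorial(8)                        # 40320 possible orderings of 8 elements
min_avg_depth = Fraction(620160, orderings)     # value stated in the abstract
entropy_bound = log2(orderings)                 # lower bound on the average depth
min_depth_bound = ceil(entropy_bound)           # lower bound on the worst-case depth

print(min_avg_depth, float(min_avg_depth), entropy_bound, min_depth_bound)
```

    The exact minimum average depth reduces to 323/21 (about 15.38 comparisons), slightly above the bound log2(8!) of about 15.30.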

  11. Remote Sensing Image Classification Based on Decision Tree in the Karst Rocky Desertification Areas: A Case Study of Kaizuo Township

    Institute of Scientific and Technical Information of China (English)

    Shuyong MA; Xinglei ZHU; Yulun AN

    2014-01-01

    Karst rocky desertification is a phenomenon of land degradation resulting from the interaction of natural and human factors. In the past, supervised and unsupervised classification were often used to classify remote sensing images of rocky desertification areas, but they use only pixel brightness characteristics, so the classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a new technology for remote sensing image classification. In this study, we select the rocky desertification area of Kaizuo Township as a case study and use ASTER image data, DEM and lithology data, extracting the normalized difference vegetation index, ratio vegetation index, terrain slope and other data to establish classification rules and build decision trees. The classified images are produced with the ENVI software. By calculating the classification accuracy and kappa coefficient, we find that better classification results can be obtained and that desertification information can be extracted automatically. If more remote sensing image bands are used, a higher-resolution DEM is employed and data errors are reduced during processing, classification accuracy can be improved further.
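
    A rule-based decision tree of the kind described here can be sketched per pixel. The two vegetation indices use their standard definitions; the thresholds, class names and tree shape are hypothetical, since the abstract does not give the actual rules.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def rvi(nir, red):
    """Ratio vegetation index."""
    return nir / red

def classify_pixel(nir, red, slope_deg):
    """Hand-built decision tree; thresholds and class names are hypothetical."""
    v = ndvi(nir, red)
    if v >= 0.5:
        return "no desertification"
    if v >= 0.3:
        return "light" if slope_deg < 15 else "moderate"
    return "moderate" if slope_deg < 25 else "severe"

label = classify_pixel(nir=0.40, red=0.25, slope_deg=20)
print(label)
```

    In practice the rules would be derived from training samples and could also branch on RVI and lithology, which this sketch leaves out.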

  12. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
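
    The "integral image" transform mentioned in this abstract is what makes integer-only box filters cheap: after one pass over the image, the sum over any rectangle takes four lookups. The sketch below shows the general technique only; it is not the flight implementation, and the function names are illustrative.

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[r][c] = sum of img[0:r][0:c]."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 via four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
total = box_sum(ii, 0, 0, 3, 3)
lower_right = box_sum(ii, 1, 1, 3, 3)
print(total, lower_right)
```

    Differences of such box sums give simple integer texture features that a decision tree can threshold.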

  13. Facial Expression Recognition Based on LBP and SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李扬; 郭海礁

    2014-01-01

    To improve the recognition rate of facial expression recognition, a facial expression recognition algorithm combining LBP and an SVM decision tree is proposed. First, the facial expression image is converted into an LBP feature spectrum using the LBP algorithm; the LBP feature spectrum is then converted into an LBP histogram feature sequence; finally, the SVM decision tree algorithm classifies and recognizes the facial expressions. The effectiveness of the algorithm is demonstrated in recognition experiments on the JAFFE facial expression database.
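
    The first two steps of this pipeline, computing LBP codes and turning them into a histogram, can be sketched with the basic 3x3 operator. The neighbour ordering and the use of a single global 256-bin histogram are assumptions; the abstract does not say which LBP variant or block layout the authors use.

```python
def lbp_code(patch):
    """8-bit LBP code of the centre pixel of a 3x3 patch.

    Neighbours are read clockwise from the top-left corner; this ordering is
    an assumption, since conventions differ between implementations.
    """
    centre = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist

patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]
code = lbp_code(patch)
hist = lbp_histogram(patch)
print(code, sum(hist))
```

    The histogram would then serve as the feature vector passed to the classifier.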

  14. A Customer Churn Alarm Model Based on the C5.0 Decision Tree: Taking the Postal Short Message as an Example

    Institute of Scientific and Technical Information of China (English)

    张宇; 张之明

    2015-01-01

    Customer churn is a prominent problem in enterprise management. Preventing customer churn and striving to maintain and retain customers has become an important issue in the operation and development of enterprises. The C5.0 decision tree algorithm is used to build a customer churn prediction model, and the model's effectiveness is studied empirically using more than 4 million actual business records from the short message service of China Post. The results show that the model provides a high hit rate and coverage rate and has a good early warning function. It can help the enterprise find potentially churning customers in time and minimize customer churn.

  15. Research on Decision Tree for Food Safety Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    鄂旭; 任骏原; 毕嘉娜; 沈德海

    2014-01-01

    Food safety decision-making is an important part of food safety research. To analyze food safety conditions, a new method of building decision trees with rules that carry a definite confidence is proposed, based on the variable precision rough set model. It improves on the traditional weighted decision tree induction algorithm. The new algorithm constructs the decision tree with variable precision weighted mean roughness as the attribute selection criterion, and uses variable precision approximate accuracy instead of approximate accuracy. Noisy data in the training set are taken into account, and limited inconsistency is allowed among the examples of the positive regions, so that partially conflicting decision rules can be accommodated while the tree is built. As a result, the decision tree is simplified, its generalization ability is improved and its rules are more comprehensible. Experiments show that the algorithm is feasible and effective.
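
    The idea of tolerating limited inconsistency can be illustrated with one common formulation of variable precision rough sets: an equivalence class joins the lower approximation when at least a fraction beta of its rows belong to the target concept, and joins the upper approximation when more than 1 - beta do. The table, attribute names and beta value below are hypothetical, and the paper's weighted mean roughness criterion is not reproduced here.

```python
from collections import defaultdict

def vprs_approximations(rows, condition, target, beta=0.75):
    """Return (lower, upper) beta-approximations of the target set as row-index sets.

    rows: list of dicts; condition: attribute names defining equivalence classes;
    target: predicate marking rows that belong to the concept; 0.5 < beta <= 1.
    """
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in condition)].append(i)
    lower, upper = set(), set()
    for members in classes.values():
        share = sum(target(rows[i]) for i in members) / len(members)
        if share >= beta:
            lower.update(members)
        if share > 1 - beta:
            upper.update(members)
    return lower, upper

# Hypothetical food-inspection table: additive level vs. an "unsafe" decision.
rows = (
    [{"additive": "high", "unsafe": u} for u in (1, 1, 1, 1, 0)]
    + [{"additive": "low", "unsafe": u} for u in (1, 1, 0, 0)]
    + [{"additive": "none", "unsafe": 0}] * 2
)
lower, upper = vprs_approximations(rows, ["additive"], lambda r: r["unsafe"] == 1)
approx_accuracy = len(lower) / len(upper)
print(sorted(lower), sorted(upper), approx_accuracy)
```

    With beta = 1 the "high" class would drop out of the lower approximation because of its single conflicting row, which is the noise sensitivity the variable precision model is meant to relax.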

  16. Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System

    Directory of Open Access Journals (Sweden)

    Jashan Koshal

    2012-08-01

    The popularity of the internet is the main reason attacks are introduced into systems, and information security has become a vital subject. Hence, there is an immediate need to recognize and detect attacks. Intrusion detection is defined as a method of diagnosing attacks and signs of malicious activity in a computer network by evaluating the system continuously. Software that performs this task is called an Intrusion Detection System (IDS). Systems developed with individual algorithms such as classification, neural networks and clustering give a good detection rate and a low false alarm rate. Recent studies show that cascading multiple algorithms yields much better performance than a system developed with a single algorithm: in intrusion detection systems that use a single algorithm, the accuracy and detection rate were not up to the mark, and a rise in the false alarm rate was also encountered. Cascading of algorithms is performed to solve this problem. This paper presents two hybrid algorithms for developing the intrusion detection system. The C4.5 decision tree and the Support Vector Machine (SVM) are combined to maximize accuracy, which is the advantage of C4.5, and to diminish the false alarm rate, which is the advantage of SVM. Results show an increase in accuracy and detection rate and a lower false alarm rate.

  17. Comparison of greedy algorithms for α-decision tree construction

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    A comparison among different heuristics used by greedy algorithms that construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from the UCI Machine Learning Repository [2]. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.
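
    Several of the cost functions listed in this abstract are simple structural measures of a tree. The sketch below computes depth and the node counts for a tree stored as nested dictionaries; the representation is a hypothetical choice, and average depth is omitted because it also depends on how the rows of the decision table are distributed over the leaves.

```python
def depth(tree):
    """Length of the longest root-to-leaf path; a leaf is any non-dict value."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(depth(child) for child in tree["children"].values())

def count_nodes(tree):
    """Return (nonterminal, terminal) node counts."""
    if not isinstance(tree, dict):
        return 0, 1
    inner, leaves = 1, 0
    for child in tree["children"].values():
        child_inner, child_leaves = count_nodes(child)
        inner += child_inner
        leaves += child_leaves
    return inner, leaves

# Hypothetical tree: test attribute f1, then f2 on one branch.
tree = {"attr": "f1", "children": {
    0: "A",
    1: {"attr": "f2", "children": {0: "B", 1: "A"}},
}}
nonterminal, terminal = count_nodes(tree)
print(depth(tree), nonterminal, terminal, nonterminal + terminal)
```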

  18. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

    We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques using an Artificial Neural Network and a Decision Tree. Unlike existing works on fish classification, which propose descriptors but neither analyze their individual impact on the whole classification task nor combine feature selection, image segmentation and geometrical parameters, we propose a general set of features extracted using robust feature selection, image segmentation and geometrical parameters, together with their corresponding weights, to be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the classifier's structure itself, we consider it a black box and focus our research on determining which input information yields robust fish discrimination. The main contribution of this paper is to enhance the recognition and classification of fishes...

  19. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  20. A Cost-Sensitive Decision Tree Learning Model: An Application to Customer-Value Based Segmentation

    Institute of Scientific and Technical Information of China (English)

    邹鹏; 莫佳卉; 江亦华; 叶强

    2011-01-01

    Because misclassification costs differ and customers of different value are unevenly distributed, data mining methods based on overall accuracy cannot reflect the effect of customer value on classification results. The objective of this research is to extend the current decision tree learning model to handle data sets with unequal misclassification costs. Cost-sensitive learning is used to extend the existing decision tree model, and the method is applied to customer-value based segmentation: a misclassification cost matrix based on customer value is built, minimization of classification cost is taken as the criterion for tree branching, and an expected loss function is used as the criterion for evaluating classification results. The proposed model is validated on empirical data from one of the largest credit card issuing banks in China, including attributes from a customer satisfaction survey and credit card transaction history. The results show that, compared with the original decision tree learning model, the cost-sensitive decision tree classifies better on the customer value segmentation problem, controls cost sensitivity and the distribution of the different types of classification error more precisely, and lowers the overall misclassification cost, so that customer value is identified effectively.
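
    The effect of a misclassification cost matrix is easiest to see at a single leaf, where the label is chosen to minimize total cost rather than to follow the majority class. The sketch below shows only that labelling step, not the paper's cost-based splitting criterion, and the class names and cost values are hypothetical.

```python
def min_cost_label(class_counts, cost):
    """Pick the leaf label with the lowest total misclassification cost.

    class_counts: {true_class: number of training examples in the node}
    cost[true][predicted]: cost of predicting `predicted` when the truth is `true`.
    """
    def total_cost(predicted):
        return sum(n * cost[true][predicted] for true, n in class_counts.items())

    best = min(sorted(class_counts), key=total_cost)
    return best, total_cost(best)

# Hypothetical costs: missing a high-value customer is ten times worse than
# mistaking a low-value customer for a high-value one.
cost = {
    "high_value": {"high_value": 0, "low_value": 10},
    "low_value": {"high_value": 1, "low_value": 0},
}
node = {"high_value": 20, "low_value": 80}
label, node_cost = min_cost_label(node, cost)
majority = max(node, key=node.get)
print(label, node_cost, majority)
```

    Here the majority class is low_value, but labelling the leaf high_value costs 80 instead of 200, which is the kind of shift a cost-sensitive tree is designed to produce.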

  1. Ensemble of randomized soft decision trees for robust classification

    Indian Academy of Sciences (India)

    G KISHOR KUMAR; P VISWANATH; A ANANDA RAO

    2016-03-01

    For classification, decision trees have become very popular because of their simplicity, interpretability and good performance. To induce a decision tree classifier for data having continuous-valued attributes, the most common approach is to split the continuous attribute range into a hard (crisp) partition having two or more blocks, using one or several crisp (sharp) cut points. But this can make the resulting decision tree very sensitive to noise. An existing solution to this problem is to split the continuous attribute into a fuzzy partition (soft partition) using soft or fuzzy cut points, based on fuzzy set theory, and to use fuzzy decisions at the nodes of the tree. These are called soft decision trees in the literature and are shown to perform better than conventional decision trees, especially in the presence of noise. The current paper first proposes to use an ensemble of soft decision trees for robust classification, where parameters such as the attribute and the fuzzy cut point are chosen randomly from a probability distribution of fuzzy information gain for the various attributes and their various cut points. Further, the paper proposes to use probability-based information gain to achieve better results. The effectiveness of the proposed method is shown by experimental studies carried out using three standard data sets. It is found that an ensemble of randomized soft decision trees outperforms the related existing soft decision tree. Robustness against noise is shown by injecting various levels of noise into the training set, and a comparison with other related methods favors the proposed method.
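
    The difference between a crisp and a soft cut point can be sketched with a membership function that sends an example partly down both branches. A logistic curve is assumed here purely for illustration; the abstract does not specify the membership function, and the function names and leaf distributions are hypothetical.

```python
from math import exp

def soft_membership(x, cut, width):
    """Degree in [0, 1] to which x goes to the right branch of a soft split.

    A logistic curve is assumed; the paper's membership function may differ.
    """
    return 1.0 / (1.0 + exp(-(x - cut) / width))

def soft_predict(x, cut, width, left_dist, right_dist):
    """Blend the class distributions of the two leaves by membership degree."""
    m = soft_membership(x, cut, width)
    return {c: (1 - m) * left_dist[c] + m * right_dist[c] for c in left_dist}

left = {"a": 1.0, "b": 0.0}
right = {"a": 0.0, "b": 1.0}
at_cut = soft_predict(5.0, cut=5.0, width=1.0, left_dist=left, right_dist=right)
shift = soft_membership(5.1, 5.0, 1.0) - soft_membership(4.9, 5.0, 1.0)
print(at_cut, shift)
```

    A crisp split at 5.0 would flip its decision completely between 4.9 and 5.1, whereas the soft membership changes by only a few percent, which is why soft splits are less sensitive to noise near the cut point.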

  2. Research and implementation of dynamic graph data based on decision trees technology

    Institute of Scientific and Technical Information of China (English)

    雷炜; 叶东毅

    2011-01-01

    Traditional dynamic data analysis approaches, such as time series analysis, are cumbersome for the analysis of dynamic graphs. This paper studies a method and process for dynamic graph data analysis based on the decision tree technique, and implements it by applying the SLIQ algorithm to collected electrocardiogram data. The resulting model has an accuracy of about 73%.

  3. Detection and Extraction of Videos using Decision Trees

    Directory of Open Access Journals (Sweden)

    Sk.Abdul Nabi

    2011-12-01

    This paper presents a new multimedia data mining framework for extracting events from videos using decision tree logic. The aim of our DEVDT (Detection and Extraction of Videos using Decision Trees) system is to improve the indexing and retrieval of multimedia information. The extracted events can be used to index the videos. The system uses the C4.5 decision tree algorithm [3], which handles both continuous and discrete attributes. First, an advanced video event detection method produces event boundaries and some important visual features. This rich multi-modal feature set is filtered by a pre-processing step that cleans the noise and removes irrelevant data, which improves both precision and recall. The cleaned data are then mined and classified using a decision tree model. The learning and classification steps of this decision tree are simple and fast, and the decision tree has good accuracy. Subsequently, with our system we expect to reach maximum precision and recall, i.e., to extract pure video events effectively and efficiently.

  4. Relationships for Cost and Uncertainty of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on a dynamic programming approach and require the consideration of subtables of the initial decision table, so this approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and a cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions such as depth and average depth of a decision tree and number of nodes (and number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions. The uncertainty of an approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for such tools we also present experimental results for various datasets acquired from the UCI ML Repository [4]. © Springer-Verlag Berlin Heidelberg 2013.
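
    The uncertainty measure defined in this abstract, the number of unordered pairs of rows with different decisions, can be computed without enumerating pairs by subtracting the same-decision pairs from all pairs. The function name and sample decisions below are illustrative.

```python
from collections import Counter

def uncertainty(decisions):
    """Number of unordered pairs of rows that carry different decisions."""
    n = len(decisions)
    all_pairs = n * (n - 1) // 2
    same_pairs = sum(k * (k - 1) // 2 for k in Counter(decisions).values())
    return all_pairs - same_pairs

# Decision column of a small hypothetical decision table.
table_decisions = ["yes", "yes", "no", "no", "no", "maybe"]
u = uncertainty(table_decisions)
print(u)
```

    A subtable whose rows all share one decision has uncertainty 0, which is the stopping condition for an exact decision tree.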

  5. Meta-learning in decision tree induction

    CERN Document Server

    Grąbczewski, Krzysztof

    2014-01-01

    The book focuses on different variants of decision tree induction but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. The book discusses different variants of decision tree induction and is a useful source of information for readers wishing to review some of the techniques used in decision tree learning, as well as different ensemble methods that involve decision trees. It is shown that the knowledge of different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms with the aim of identifying the top performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...

  6. Gobi information extraction based on decision tree classification method

    Institute of Scientific and Technical Information of China (English)

    冯益明; 智长贵; 姚爱冬

    2013-01-01

    Gobi is one of the main landscape types of the earth's surface in the arid region of northwestern China, with a total area of 458 000-757 000 km2, accounting for 4.8%-7.9% of China's total land area. The gobi holds abundant natural resources such as minerals, wind energy and solar power. Many modern cities and towns and some important traffic routes have also been constructed in the gobi region, which plays an important role in the development of the western economy. It is therefore important to launch gobi research under current social and economic conditions, and accurately revealing the distribution and area of gobi is the basis and premise of such research. At present, fieldwork is difficult because of the harsh natural conditions and sparse population of the gobi region, which has led to a scarcity of research documents on the situation, distribution, type classification, transformation and utilization of gobi. The study region of this paper is a typical gobi distribution region located in Ejina County in Inner Mongolia, China; its climatic characteristics include little rain, high evaporation, abundant sunshine, large temperature differences and frequent wind-blown sand. The basic data sources were Landsat TM5 and TM7 remote sensing imagery from the plant growing seasons of 2005-2010, a DEM with 30 m spatial resolution, an administrative map, a present land use map, field investigation data and related documents. First, the non-gobi distribution regions were extracted in GIS software by analyzing the DEM. Then, based on an analysis of the spectral characteristics of different typical ground objects, a knowledge-based decision tree information extraction model was constructed to classify the remote sensing imagery, and eroded gobi and cumulated gobi were separated relatively accurately. The overall accuracy of the extracted gobi information reached 91.57%. 
There were few materials in China on using

  7. Optimized algorithm of decision tree based on weighting factor

    Institute of Scientific and Technical Information of China (English)

    董跃华; 刘力

    2015-01-01

    Through an analysis of the multivalue bias in the ID3 algorithm and the subjectivity of traditional improved ID3 algorithms, an improved decision tree algorithm based on a weighting factor is put forward. The new algorithm introduces a weighting factor that reflects the mutual dependence between attributes, and improves ID3 by re-weighting the split weight of the attribute that has the most values. A worked example and experiments on UCI data sets show that the optimized ID3 algorithm can overcome multivalue bias when the attributes in a data set have different numbers of values. The algorithm not only improves the average classification accuracy but also reduces the average number of leaf nodes in the constructed decision tree.
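
The multivalue bias that this record refers to can be reproduced with the standard ID3 information gain. The sketch below shows only that baseline criterion, not the weighting factor proposed in the paper (which the abstract does not define); the toy attributes `row_id` and `useful` are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 information gain of splitting `labels` by attribute `values`."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = ["pos", "pos", "neg", "neg"]
row_id = [1, 2, 3, 4]          # many-valued attribute with no predictive meaning
useful = ["a", "a", "b", "a"]  # two-valued attribute, partially informative

# The many-valued attribute obtains the maximal gain even though it only
# memorizes the rows: this is the multivalue bias of plain ID3.
print(information_gain(row_id, labels), information_gain(useful, labels))
```

Any correction, such as the weighting factor described above or the gain ratio used by C4.5, has to lower the score of attributes like `row_id` relative to `useful`.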

  8. Research on Recognition and Determination on Effective Technology Innovation Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    吴红; 李玉平; 常飞; 耿霞

    2012-01-01

    This paper first discusses the necessity of a feasibility demonstration for technological innovation ideas, the factors influencing feasibility, and the basic process of the demonstration. Then, after a brief analysis of the standards and methods for recognizing and determining effective technology innovation, it uses the decision tree method from a cost-benefit perspective to construct a recognition and determination model. The model analyzes information using the backward induction of the decision tree, calculates the difference between the benefits and costs of various actions, and recognizes effective technology innovation according to how well this difference matches the enterprise's expected profit.
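
Backward induction on a cost-benefit decision tree can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the model from the paper: the dictionary layout, the node kinds, the numbers, and the rule of comparing the rolled-back value against a single expected-profit threshold are all hypothetical.

```python
def rollback(node):
    """Backward induction: value of a node is computed from its children."""
    kind = node["kind"]
    if kind == "outcome":
        # Net value of a terminal outcome: benefit minus cost.
        return node["benefit"] - node["cost"]
    values = [rollback(child) for child in node["children"]]
    if kind == "chance":
        # Expected net value over uncertain outcomes.
        return sum(p * v for p, v in zip(node["probs"], values))
    # Decision node: choose the action with the highest net value.
    return max(values)


# Hypothetical example: develop the innovation (uncertain result) or skip it.
tree = {
    "kind": "decision",
    "children": [
        {
            "kind": "chance",
            "probs": [0.5, 0.5],
            "children": [
                {"kind": "outcome", "benefit": 100.0, "cost": 40.0},
                {"kind": "outcome", "benefit": 10.0, "cost": 40.0},
            ],
        },
        {"kind": "outcome", "benefit": 0.0, "cost": 0.0},
    ],
}

expected_profit = 10.0  # assumed threshold set by the enterprise
net_value = rollback(tree)
print(net_value, net_value >= expected_profit)
```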

  9. Short-Time Fourier Transform and Decision Tree-Based Pattern Recognition for Gas Identification Using Temperature Modulated Microhotplate Gas Sensors

    Directory of Open Access Journals (Sweden)

    Aixiang He

    2016-01-01

    Full Text Available Because the sensor response is dependent on its operating temperature, modulated temperature operation is usually applied in gas sensors for the identification of different gases. In this paper, the modulated operating temperature of microhotplate gas sensors combined with a feature extraction method based on Short-Time Fourier Transform (STFT) is introduced. Because the gas concentration in the ambient air usually has high fluctuation, STFT is applied to extract transient features from the time-frequency domain, and the relationship between the STFT spectrum and sensor response is further explored. Because of the low thermal time constant, sufficient discriminatory information about different gases is preserved in the envelope of the response curve. Feature information tends to be contained in the lower frequencies, but not at higher frequencies. Therefore, features are extracted from the STFT amplitude values at the frequencies ranging from 0 Hz to the fundamental frequency to accomplish the identification task. These lower frequency features are extracted and further processed by decision tree-based pattern recognition. The proposed method shows high classification capability in the analysis of different concentrations of carbon monoxide, methane, and ethanol.
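
Extracting STFT amplitudes from 0 Hz up to the fundamental frequency can be illustrated as follows. This is a rough sketch only: the Hann window, the frame length, the hop size, and the synthetic test signal are assumptions made here, and the abstract does not state which window or frame parameters the authors used.

```python
import cmath
import math


def stft_low_freq_features(signal, frame_len, hop, max_bin):
    """Amplitudes of DFT bins 0..max_bin for each Hann-windowed frame."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
            for n, s in enumerate(frame)
        ]
        amplitudes = []
        for k in range(max_bin + 1):
            coeff = sum(
                v * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, v in enumerate(windowed)
            )
            amplitudes.append(abs(coeff))
        features.append(amplitudes)
    return features


# Synthetic response whose fundamental falls on bin 2 of a 16-sample frame.
signal = [math.sin(2 * math.pi * 2 * n / 16) for n in range(32)]
features = stft_low_freq_features(signal, frame_len=16, hop=16, max_bin=2)
print(len(features), [round(a, 3) for a in features[0]])
```

Each inner list is one feature vector; in the setting described above such vectors would then be passed to a decision tree classifier.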

  10. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  11. Automatic sleep staging using state machine-controlled decision trees.

    Science.gov (United States)

    Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2015-01-01

    Automatic sleep staging from a reduced number of channels is desirable to save time, reduce costs and make sleep monitoring more accessible by providing home-based polysomnography. This paper introduces a novel algorithm for automatic scoring of sleep stages using a combination of small decision trees driven by a state machine. The algorithm uses two channels of EEG for feature extraction and has a state machine that selects a suitable decision tree for classification based on the prevailing sleep stage. Its performance has been evaluated using the complete dataset of 61 recordings from PhysioNet Sleep EDF Expanded database achieving an overall accuracy of 82% and 79% on training and test sets respectively. The algorithm has been developed with a very small number of decision tree nodes that are active at any given time making it suitable for use in resource-constrained wearable systems.

  12. Self-adaptive Transfer for Decision Trees Based on Similarity Metric

    Institute of Scientific and Technical Information of China (English)

    王雪松; 潘杰; 程玉虎; 曹戈

    2013-01-01

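    How to avoid negative transfer and how to choose the right time and method for transfer are key issues that limit the wide application of transfer learning. To address them, a self-adaptive transfer method for decision trees based on a similarity metric (STDT) is proposed. First, depending on whether the source task data sets can be accessed, the similarity between decision trees is judged adaptively using either component prediction probabilities or path prediction probabilities, and the resulting affinity coefficient serves as a quantitative measure of the similarity between related tasks. Then, a multi-source judgment condition determines whether multi-source ensemble transfer is used, and the normalized similarities are assigned to the source decision trees to be transferred as their transfer weights. Finally, the source decision trees are transferred as an ensemble to assist decision making in the target task. Simulation results on the UCI machine learning repository show that, compared with the multi-source weighted sum rule (WSR) and MS-TrAdaBoost, STDT achieves faster transfer while maintaining decision accuracy.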
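
The step in which normalized similarities become transfer weights for an ensemble of source decision trees can be sketched as a weighted vote. This is only an illustration of that one step: the similarity values are given as inputs rather than computed from prediction probabilities, and the three source "trees" are stand-in functions invented for the example.

```python
from collections import defaultdict


def normalize(similarities):
    """Scale similarity scores so that they sum to 1 (transfer weights)."""
    total = sum(similarities)
    return [s / total for s in similarities]


def weighted_vote(trees, weights, x):
    """Ensemble decision: each source tree votes with its transfer weight."""
    scores = defaultdict(float)
    for tree, weight in zip(trees, weights):
        scores[tree(x)] += weight
    return max(scores, key=scores.get)


# Hypothetical source trees, each reduced to a single decision rule.
source_trees = [
    lambda x: "A" if x[0] > 0 else "B",
    lambda x: "A" if x[1] > 0 else "B",
    lambda x: "B",
]
similarities = [3.0, 1.0, 1.0]  # assumed similarity of each source to the target
weights = normalize(similarities)

print(weights, weighted_vote(source_trees, weights, (1, -1)))
```

The most similar source tree dominates the vote, which is the intended effect of weighting by similarity.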

  13. Representing Boolean Functions by Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram or branching program is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than values table or even the formula [44]. Representing a function in the form of decision tree allows applying graph algorithms for various transformations [10]. Decision trees and branching programs are used for effective hardware [15] and software [5] implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of decision tree characterizes the expected computing time, and the number of nodes in branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal on both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.
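
The two complexity measures named in this abstract, average depth and number of nodes, can be computed for a small example. The sketch assumes a tree encoded as nested tuples `(variable, low_subtree, high_subtree)` with integer leaves, and a uniform distribution over inputs for the average depth; both the encoding and the uniform assumption are choices made here for illustration.

```python
from itertools import product


def evaluate(tree, x):
    """Return (function value, number of variables queried) for input x."""
    depth = 0
    while not isinstance(tree, int):
        var, low, high = tree
        tree = high if x[var] else low
        depth += 1
    return tree, depth


def average_depth(tree, n_vars):
    """Mean number of queries over all 2**n_vars inputs (uniform inputs)."""
    inputs = list(product([0, 1], repeat=n_vars))
    return sum(evaluate(tree, x)[1] for x in inputs) / len(inputs)


def count_nodes(tree):
    """Total number of nodes, terminal and nonterminal."""
    if isinstance(tree, int):
        return 1
    _, low, high = tree
    return 1 + count_nodes(low) + count_nodes(high)


# Decision tree for f(x0, x1) = x0 AND x1.
and_tree = (0, 0, (1, 0, 1))
print(average_depth(and_tree, 2), count_nodes(and_tree))
```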

  14. Comparison of Greedy Algorithms for Decision Tree Optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.

  15. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper is devoted to the consideration of the software system Dagger created at KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees), and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  16. Redundant Data Mining Based on Residual Data Merging in Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王倩

    2014-01-01

    An optimized redundant-data mining algorithm based on residual data merging is proposed. A decision tree model is built from the training set, and the C4.5 decision tree model is used to model the main features of the redundant data. Under the principal-component feature decision tree, residual data merging is introduced: an accompanying tracking mode for residual data features is set, and the data that traditional methods filter out are spliced, tracked and located, realizing optimized mining of redundant data features. The method was applied to network traffic time series data for network anomaly monitoring. Simulation results show that the new algorithm can effectively extract redundant data features as useful detection features and that data mining efficiency is greatly improved. This promotes the mining and application of hidden features in massive data, and the designed network traffic monitoring software can improve the effectiveness of network management and monitoring.

  17. The Information Extraction of Freshwater Marsh Wetland Based on the Decision Tree Method: Taking Zhalong Wetland as An Example

    Institute of Scientific and Technical Information of China (English)

    乔艳雯; 臧淑英; 那晓东

    2013-01-01

    To obtain basic wetland information in a timely and accurate manner for the dynamic monitoring and protection of wetlands, the authors chose Zhalong wetland as the research area. With the extraction of regional wetland remote sensing information as the goal, a decision tree model was constructed from compound identification indices, including TM image data, DEM data, the normalized difference vegetation index and texture information, and used to classify the land cover types of the study area. To check the feasibility of the decision tree classification, the results were compared with those of traditional maximum likelihood supervised classification. The results showed that the index-based decision tree method increased classification accuracy by 14.6% and the overall Kappa coefficient by 0.1751 relative to maximum likelihood supervised classification, a noticeable improvement. Decision tree classification using multi-source data is an effective approach for extracting information on inland freshwater marsh wetlands.
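
The two figures compared in this abstract, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The sketch below uses the standard Cohen's kappa formula; the 2x2 matrix is made-up data for illustration and is unrelated to the Zhalong results.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i that were
    assigned to class j.
    """
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for two land cover classes.
confusion = [
    [40, 10],
    [5, 45],
]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
print(round(accuracy, 4), round(kappa, 4))
```

Kappa discounts the agreement that would occur by chance, which is why it is reported alongside overall accuracy when two classifiers are compared.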

  18. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    Directory of Open Access Journals (Sweden)

    Bruno Carneiro da Rocha

    2010-10-01

    Full Text Available This article aims to evaluate the use of decision tree techniques, in conjunction with the CRISP-DM management model, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of techniques and the CRISP-DM management model of data mining in large operational databases logged from internet bank transactions.

  19. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Science.gov (United States)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    Technologies used in traumatology today combine mechanical, electronic, computing and software tools. The development of mobile applications for the rapid processing of data received from medical devices (in particular, wearable devices) and for the formulation of management decisions is becoming increasingly relevant. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  20. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Energy Technology Data Exchange (ETDEWEB)

    Kupriyanov, M. S., E-mail: mikhail.kupriyanov@gmail.com; Shukeilo, E. Y., E-mail: eyshukeylo@gmail.com; Shichkina, J. A., E-mail: strange.y@mail.ru [Saint Petersburg Electrotechnical University “LETI” (Russian Federation)

    2015-11-17

    Technologies used in traumatology today combine mechanical, electronic, computing and software tools. The development of mobile applications for the rapid processing of data received from medical devices (in particular, wearable devices) and for the formulation of management decisions is becoming increasingly relevant. This article considers the use of a mathematical method of building decision trees to assess a patient’s health condition using data from a wearable device.

  1. Research on Urban Water Body Extraction Using Knowledge-based Decision Tree

    Institute of Scientific and Technical Information of China (English)

    陈静波; 刘顺喜; 汪承义; 尤淑撑; 王忠武

    2013-01-01

    In view of the spectral confusion between urban water bodies and dark objects such as building shadows, asphalt roads and dense vegetation, a knowledge-based decision tree combining spectral and spatial features is constructed to extract urban water bodies. First, dark objects are extracted using a threshold on reflectance in the shortwave infrared (SWIR) band. Second, dense vegetation and asphalt roads are eliminated according to their reflectance in the near-infrared (NIR) and red bands, respectively. Third, spatial density features are used to eliminate building shadows. Finally, an area threshold is used for supplementary recognition of water bodies. Compared with existing methods, the proposed approach identifies the types of dark objects that matter in urban water body extraction and analyzes their features, and it uses spatial density features described by density-based spatial clustering of applications with noise (DBSCAN) to discriminate urban water bodies from building shadows. In an experiment on SPOT 5 multispectral imagery of urban Beijing, the detection rate was 86.18% and the false alarm rate was 13.82%, indicating that the proposed method is effective for extracting urban water bodies from medium-resolution multispectral imagery.

  2. The Application of R-C4.5 Decision Tree Model in Higher Vocational Employment

    Institute of Scientific and Technical Information of China (English)

    张继美; 桂红兵

    2011-01-01

    This paper describes decision tree classification techniques and the R-C4.5 decision tree model. Taking the personal, educational and employment information of recent graduates of a higher vocational college as the research object, the experimental data are preprocessed and then mined with R-C4.5 decision tree classification to identify the factors that influence the employment quality of higher vocational graduates. The results provide a decision-making basis for the measures and reforms that government and schools adopt to improve employment quality.

  3. The potential impact of improving appropriate treatment for fever on malaria and non-malarial febrile illness management in under-5s: a decision-tree modelling approach.

    Directory of Open Access Journals (Sweden)

    V Bhargavi Rao

    Full Text Available BACKGROUND: As international funding for malaria programmes plateaus, limited resources must be rationally managed for malaria and non-malarial febrile illnesses (NMFI). Given widespread unnecessary treatment of NMFI with first-line antimalarial Artemisinin Combination Therapies (ACTs), our aim was to estimate the effect of health-systems factors on rates of appropriate treatment for fever and on use of ACTs. METHODS: A decision-tree tool was developed to investigate the impact of improving aspects of the fever care-pathway and also to evaluate the impact in Tanzania of the revised WHO malaria guidelines advocating diagnostic-led management. RESULTS: Model outputs using baseline parameters suggest that 49% of malaria cases attending a clinic would receive ACTs (95% Uncertainty Interval: 40.6-59.2%) but that 44% (95% UI: 35-54.8%) of NMFI cases would also receive ACTs. Provision of 100% ACT stock predicted a 28.9% increase in malaria cases treated with ACT, but also an increase in overtreatment of NMFI, with 70% of NMFI cases (95% UI: 56.4-79.2%) projected to receive ACTs, and thus an overall 13% reduction (95% UI: 5-21.6%) in correct management of febrile cases. Modelling increased availability or use of diagnostics had little effect on malaria management outputs, but may significantly reduce NMFI overtreatment. The model predicts that the early rollout of revised WHO guidelines in Tanzania may have led to a 35% decrease (95% UI: 31.2-39.8%) in NMFI overtreatment, but also a 19.5% reduction (95% UI: 11-27.2%) in malaria cases receiving ACTs, due to a potential fourfold decrease in cases that were untested or tested false-negative (42.5% vs. 8.9%) and so untreated. DISCUSSION: Modelling multi-pronged intervention strategies proved most effective to improve malaria treatment without increasing NMFI overtreatment. 
As malaria transmission declines, health system interventions must be guided by whether the management priority is an increase in malaria cases receiving ACTs (reducing the

  4. Exploring Landscapes Based on Decision Tree Classification in the Diqin Region, Yunnan Province

    Institute of Scientific and Technical Information of China (English)

    李亚飞; 刘高焕; 黄翀

    2011-01-01

    Decision tree classification is a type of supervised classification method based on spatial data mining and knowledge discovery. In this paper, the authors examined the landscape pattern of the Diqing region of Yunnan Province by building a classification decision tree from Landsat TM imagery and digital elevation models (DEMs). Subsequently, a landscape distribution map was made. To assess the reliability and robustness of the decision tree classification method, traditional supervised classification was also used to derive a landscape distribution map over the region. A large number of field sampling points were used to evaluate the accuracy of the two classification methods; the points covered the whole Diqing region and recorded geographic coordinates, elevations, and descriptions of the major landscape types. Results indicate that the overall classification accuracies of the decision tree classification and the traditional supervised classification were 85.5% and 67.4%, respectively. The landscape distribution map derived by the decision tree classification method appears to be reliable in terms of the achievable accuracy. Several conclusions could be drawn by analyzing the derived landscape distribution map. Landscape types in the Diqing region primarily included valley shrub, coniferous forest, subalpine shrub meadow, alpine snow and ice, bare land, and water body, accounting for 5.5%, 36.16%, 3.4%, 3.7%, 25.4%, and 4.4% of the area of the region, respectively. Except for bare land and water body, the landscape types varied essentially with elevation and aspect of the mountains. The landscape with the largest area was coniferous forest, which is consistent with the alpine and canyon landform. Coniferous forest was the major landscape in the region and was distributed above 3000 m above sea level. In terms of different elevations, the coniferous forest could be conceptually divided into three

  5. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    CERN Document Server

    Farid, Dewan Md; Rahman, Mohammad Zahidur; 10.5121/ijnsa.2010.2202

    2010-01-01

    In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and it eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in recent decades. However, various issues in current intrusion detection systems (IDS) remain to be examined. We tested the performance of our proposed algorithm with existing learn...

  6. Prediction of Prospective New Students Using the Decision Tree Classification Method

    Directory of Open Access Journals (Sweden)

    Mambang

    2015-02-01

    Full Text Available Before a health education institution begins the new school year, the first step is the selection of new admissions from graduates of general and vocational secondary education. This study predicts new students using multiple data attributes. The model is a decision tree classification method, which builds a tree consisting of a root node, internal nodes and terminal nodes. The root node and internal nodes are variables/features, while the terminal nodes are class labels. Based on the experiments and evaluations, it can be concluded that the C4.5 algorithm with the Uncertainty criterion obtained an accuracy of 80.39%, precision of 94.44% and recall of 75.00%, while the C4.5 algorithm with the Information Gain Ratio criterion obtained an accuracy of 88.24%, precision of 98.28% and recall of 83.82%.

  7. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    OpenAIRE

    Tran Hoai Linh; Pham Van Nam; Vuong Hoang Nam

    2014-01-01

    The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, the three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to...

  8. Comparative Analysis of Serial Decision Tree Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Matthew Nwokejizie Anyanwu

    2009-09-01

    Full Text Available Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used to group similar data objects together. It can be described as supervised learning, as it assigns class labels to data objects based on the relationship between the data items and a predefined class label. Classification algorithms have a wide range of applications such as churn prediction, fraud detection, artificial intelligence, and credit card rating. Many classification algorithms are available in the literature, but decision trees are the most commonly used because they are easy to implement and easier to understand than other classification algorithms. A decision tree classification algorithm can be implemented in a serial or parallel fashion depending on the volume of data, the memory space available on the computer and the scalability of the algorithm. In this paper we review the serial implementations of decision tree algorithms and identify those that are commonly used. We also use experimental analysis based on sample data records (Statlog data sets) to evaluate the performance of the commonly used serial decision tree algorithms.

  9. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an ‘intelligent’ point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using ... of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes ‘building’ (99%, 95% CI: 95%-100%) and ‘road and parking lot’ (90%, 95% CI: 83%-95%). Some...

  10. Application of vector projection method based on decision-tree-based support vector machines in fault diagnosis for transformer

    Institute of Scientific and Technical Information of China (English)

    张翠玲; 王大志; 江雪晨; 宁一

    2013-01-01

    By applying the vector projection method to transformer fault diagnosis, the problem of how to construct an effective SVM hierarchy for decision-tree-based support vector machines (DTBSVM) is solved. According to the overlap between the sample sets of different classes, Euclidean distance and the radial basis function are used to calculate the spatial distance and the separability measure between classes, and the classes are ordered by separability measure to design a more reasonable hierarchy for classification. The resulting fault diagnosis model combines one-versus-many and many-versus-many classification and handles multi-class problems well. For an N-class problem the method constructs only (N-1) SVM classifiers and has no unrecognizable region, so classification is faster and generalization is better. Test results show that the correct diagnosis rate is higher than that of the traditional three-ratio method and the neural network method, so the method has good practical value.

  11. Multi-pruning of decision trees for knowledge representation and classification

    KAUST Repository

    Azad, Mohammad

    2016-06-09

    We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increment in the number of misclassifications. Based on the created approach we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.
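
The notion of a Pareto optimal point for the two criteria named here can be made concrete with a small filter. This sketch does not reproduce the dynamic programming approach of the paper; it only assumes that a list of candidate `(number of nodes, number of misclassifications)` pairs is already available, and the sample pairs are invented.

```python
def pareto_front(points):
    """Keep the (nodes, errors) pairs that no other pair dominates.

    A pair dominates another if it is no worse in both criteria and
    strictly better in at least one; both criteria are minimized.
    """
    front = []
    best_errors = None
    # After sorting by nodes (then errors), a pair is Pareto optimal exactly
    # when it has strictly fewer errors than every pair seen before it.
    for nodes, errors in sorted(set(points)):
        if best_errors is None or errors < best_errors:
            front.append((nodes, errors))
            best_errors = errors
    return front


# Hypothetical candidate trees: (number of nodes, number of misclassifications).
candidates = [(1, 10), (3, 4), (5, 4), (7, 1), (3, 6), (9, 1)]
print(pareto_front(candidates))
```

Choosing a "suitable" point then means picking a pair on this front, for example the smallest tree whose misclassification count is acceptable.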

  12. Soil surface roughness measuring method based on neural network and decision tree

    Institute of Scientific and Technical Information of China (English)

    李俐; 王荻; 潘彩霞; 王鹏新

    2015-01-01

    Soil surface roughness is one of the important indices commonly used to describe soil hydrological characteristics and Lambert characteristic. In microwave quantitative remote sensing application, it affects the microwave scattering values and therefore impacts the accuracy of soil moisture retrieved using microwave sensing data. Therefore, measuring soil surface roughness has become one of the research hotspots in the field of microwave remote sensing. Two kinds of techniques are used to calculate soil surface roughness, including contact method, such as the pin meter and profile meter, and non-contact method, such as ultrasonic measurement, laser scanning, three-dimensional photography, infrared measurement and radar measurement method. All these methods need some special device. The development of image processing technology and the popularization of digital camera provide a simple measuring method which only needs a reference whiteboard and a camera. However, the detailed scale information commonly used on the reference whiteboard increases the requirements for data acquisition and data processing. The purpose of this study is to provide a method to obtain the soil surface image with a simplified reference whiteboard and then to measure soil surface roughness in the presence of field environmental noise. Therefore, a simple image acquisition method is introduced and then an image processing method combining the neural network and the decision tree is proposed. The neural network is built to detect image edge points. To reduce the environmental noise effect, the input characteristic parameters of the neural network are selected carefully, which include not only gradient information, but also image direction and neighborhood consistency information. The cutting of the background section on the original image based on image edge detection result improves the computing speed effectively. A decision tree model is introduced to divide image segments into 4 classes

  13. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score

    Directory of Open Access Journals (Sweden)

    Andrej Kastrin

    2008-12-01

    Full Text Available Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of the decision trees on the prediction model of sensation seeking. Prediction of the Zuckerman's Sensation Seeking Scale (SSS-V) score was based on the bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of Eysenck Personality Questionnaire (EPQ) and Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision trees methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, the decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.

  14. Efficient-cutting packet classification algorithm based on the statistical decision tree

    Institute of Scientific and Technical Information of China (English)

    陈立南; 刘阳; 马严; 黄小红; 赵庆聪; 魏伟

    2014-01-01

    Packet classification algorithms based on decision trees are easy to implement and widely employed in high-speed packet classification. The primary objective in constructing a decision tree is to minimize storage and search time complexity. An improved decision-tree algorithm, HyperEC, is proposed based on statistics and evaluation metrics of the filter sets. HyperEC is a multi-dimensional packet classification algorithm that allows a tradeoff between storage and throughput while the decision tree is constructed, avoiding excessive tree height and storage blow-up. Because it is not sensitive to IP address length, it is suitable for IPv6 as well as IPv4 packet classification. The algorithm applies a natural, performance-guided decision-making process: the storage budget is preset and the best throughput is then sought. The results show that HyperEC performs about the same as HyperCuts when the number of rules is small, but as the number of rules grows it outperforms the HiCuts and HyperCuts algorithms in tree height, storage and lookup performance, and it scales to large filter sets.

  15. Virus Detection Algorithm Based on Decision Tree

    Institute of Scientific and Technical Information of China (English)

    朱俚治

    2015-01-01

    Viruses are becoming increasingly intelligent. Viruses built with modern intelligent techniques can evade detection by some antivirus software, so they are difficult for traditional detection algorithms to find. To detect viruses that use such new techniques effectively, the detection algorithm itself needs new intelligence. The MMTD algorithm and the decision tree algorithm are two intelligent algorithms, and applying them to virus detection helps make detection more intelligent. Therefore, based on the characteristics that viruses exhibit during detection, this article combines the MMTD algorithm with the decision tree algorithm and proposes a new virus detection algorithm.

  16. Using Decision Trees for Coreference Resolution

    CERN Document Server

    McCarthy, Joseph F.; Lehnert, Wendy G.

    1995-01-01

    This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.

  17. Diagnosis of Hepatitis using Decision tree algorithm

    Directory of Open Access Journals (Sweden)

    V.Shankar sowmien

    2016-06-01

    Full Text Available This research paper proposes a prediction system for liver disease using machine learning. Researchers provided various data to identify the causes of hepatitis. Here, the decision tree method is used to determine the structural information of tissues. The algorithm used to construct the decision tree is C4.5, which concentrates on 19 attributes such as age, sex, steroids, antivirals, spleen, fatigue, malaise, anorexia, liver big, liver firm, spiders, bilirubin, varices, ascites, ALK phosphate, SGOT, albumin, protime, and histology for the diagnosis of the disease. These features helped in determining the abnormalities of the patient, which resulted in 85.81% accuracy.
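
    C4.5, the algorithm named in the abstract above, ranks candidate attributes by gain ratio (information gain divided by split information). The following is a minimal sketch of that criterion for categorical attributes only; the attribute values and outcomes are toy data invented for illustration, not values from the study, and the function names are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to the class labels."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): one attribute separates the classes, one does not.
outcome = ["die", "die", "live", "live"]
fatigue = ["yes", "yes", "no", "no"]
sex = ["m", "f", "m", "f"]
print(gain_ratio(fatigue, outcome), gain_ratio(sex, outcome))
```

    A full C4.5 implementation also handles continuous attributes, missing values and pruning, none of which is shown here.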

  18. Rule Extraction in Transient Stability Study Using Linear Decision Trees

    Institute of Scientific and Technical Information of China (English)

    SUN Hongbin; WANG Kang; ZHANG Boming; ZHAO Feng

    2011-01-01

    Traditional operation rules depend on human experience, which are relatively fixed and difficult to fulfill the new demand of the modern power grid. In order to formulate suitable and quickly refreshed operation rules, a method of linear decision tree based on support samples is proposed for rule extraction in this paper. The operation rules extracted by this method have advantages of refinement and intelligence, which helps the dispatching center meet the requirement of smart grid construction.

  19. Neural Networks with Decision Trees for Diagnosis Issues

    Directory of Open Access Journals (Sweden)

    Yahia Kourd

    2013-05-01

    Full Text Available This paper presents a new idea for a fault detection and isolation (FDI) technique applied to industrial systems. The technique is based on Neural Networks fault-free and Faulty behaviours Models (NNFMs). NNFMs are used for residual generation, while a decision tree architecture is used for residual evaluation. The decision tree is built from data collected from the NNFMs' outputs and is used to isolate detectable faults depending on a computed threshold. Each part of the tree corresponds to a specific residual. With the decision tree, it becomes possible to take the appropriate decision regarding the actual process behaviour by evaluating a small number of residuals. In comparison to the usual systematic evaluation of all residuals, the proposed technique requires less computational effort and can be used for online diagnosis. An application example is presented to illustrate and confirm the effectiveness and accuracy of the proposed approach.

  20. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    Directory of Open Access Journals (Sweden)

    Kai-Wei Chiang

    2015-12-01

    Full Text Available Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  1. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment.

    Science.gov (United States)

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-12-28

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and since most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, imprecise predictability of pedestrian motion, and inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable the initializing of any position. Finally, map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability for each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.
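
    The remark that PDR errors tend to accumulate follows from the way dead reckoning chains position updates: each new position is the previous one plus a step of estimated length along an estimated heading. A minimal sketch of that generic update is given below. It is not the map-aided FDT method of the paper; the function names, the heading convention (clockwise from north) and the step values are assumptions for illustration.

```python
import math


def pdr_step(position, step_length_m, heading_rad):
    """Advance a 2-D position by one step.

    Heading is measured clockwise from north (the +y axis), so a heading
    of pi/2 moves the pedestrian east (the +x axis).
    """
    x, y = position
    return (x + step_length_m * math.sin(heading_rad),
            y + step_length_m * math.cos(heading_rad))


def pdr_track(start, steps):
    """Chain step updates; any error in one step carries into all later positions."""
    track = [start]
    for length, heading in steps:
        track.append(pdr_step(track[-1], length, heading))
    return track


# Hypothetical walk: one 0.7 m step north, then one 0.7 m step east.
track = pdr_track((0.0, 0.0), [(0.7, 0.0), (0.7, math.pi / 2)])
print(track[-1])
```

    Because every position depends on all earlier step estimates, a small heading bias grows into a large position error over a long walk, which is the drift that external corrections such as map information or beacons are meant to bound.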

  2. ASSESSING GAMEPLAY EMOTIONS FROM PHYSIOLOGICAL SIGNALS: A FUZZY DECISION TREES BASED MODEL

    OpenAIRE

    Orero, Joseph Onderi; Levillain, Florent; Damez-Fontaine, Marc; Rifqi, Maria; Bouchon-Meunier, Bernadette

    2010-01-01

    As video games become a widespread form of entertainment, there is a need to develop new evaluative methodologies for acknowledging the various aspects of the player's subjective experience, and especially the emotional aspect. Video game developers could benefit from being aware of how the player reacts emotionally to specific game parameters. In this study, we addressed the possibility to record physiological measures on players involved in an action game, with the m...

  3. Fingerprint Gender Classification using Univariate Decision Tree (J48)

    Directory of Open Access Journals (Sweden)

    S. F. Abdullah

    2016-09-01

    Full Text Available Data mining is the process of analyzing data from different categories. The data provide information, and data mining extracts new knowledge from it to create new, useful information. Decision tree learning is a method commonly used in data mining. A decision tree is a decision model shaped like a tree-like graph with nodes, branches and leaves. Each internal node denotes a test on an attribute and each branch represents an outcome of the test. A leaf node, the last node on a path, holds a class label. The decision tree classifies an instance and helps in making a prediction from the data used. This study focuses on the J48 algorithm for classifying gender using fingerprint features. Four types of fingerprint features are used in this study: Ridge Count (RC), Ridge Density (RD), Ridge Thickness to Valley Thickness Ratio (RTVTR) and White Lines Count (WLC). Different cases are run with the J48 algorithm, and the knowledge gained from each test is compared. All experiments were run in Weka, and the classification rate achieved was 96.28%.

  4. The Research of Fault Diagnosis Knowledge Representation of Track Circuit Based on Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    刘扬

    2014-01-01

    To address the unclear fault logic mechanism of the ZPW-2000A jointless track circuit, this paper adopts a decision-tree-based knowledge representation method for a track circuit expert system. The method first builds a fault decision table from the feature vector samples that most strongly influence track circuit faults, then discretizes the attribute values with the minimum information entropy algorithm. Exploiting the fast learning and classification of the decision tree algorithm, it trains on the discretized data samples, generates a fault decision tree, and extracts knowledge rules from it; these are stored as production rules in the knowledge base of the expert system. An example analysis of the ZPW-2000A jointless track circuit verifies the validity and practicability of the method for knowledge representation and acquisition in a track circuit expert system.

  5. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Dewan Md. Farid

    2010-04-01

    Full Text Available In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented, which performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, various issues in current intrusion detection systems (IDS) remain to be examined. We compared the performance of our proposed algorithm with existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources.

  6. A New Fuzzy-Rough Decision Tree Algorithm Based on Conceptual Hierarchy

    Institute of Scientific and Technical Information of China (English)

    吴晓明

    2014-01-01

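    A new rough-set-based decision tree algorithm with concept fuzzification is proposed, combining the fuzzy-rough decision tree with a conceptual hierarchy. The algorithm uses attribute induction and concept fuzzification to remove attributes that cannot reflect generalized information and, together with the fuzzy-rough decision tree algorithm, extracts knowledge and rules of potential value for decision making. It can be used to solve fuzzy-semantic problems.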

  7. STUDY ON DECISION TREE COMPETENT DATA CLASSIFICATION

    OpenAIRE

    Vanitha, A.; S.Niraimathi

    2013-01-01

    Data mining is a process where intelligent methods are applied in order to extract data patterns. This is used in cases of discovering patterns and trends among large datasets. Data classification involves categorization of data into different categories according to protocols. There are many classification algorithms available, and among them the decision tree is the most commonly used method. Classification of data objects based on a predefined knowledge of objects is a data mining. This paper discussed...

  8. Deep Web query interface identification based on decision tree and link-similar

    Institute of Scientific and Technical Information of China (English)

    李雪玲; 施化吉; 兰均; 李星毅

    2011-01-01

    To address the shortcomings of existing Deep Web query interface identification methods, which produce many false positives and cannot effectively distinguish search engine interfaces, this paper proposes an identification method based on a decision tree and link similarity. The method uses the information gain ratio to select important attributes and builds a decision tree to pre-classify interface forms, identifying the interfaces with fairly distinct features. It then applies a link-similarity-based method to the interfaces not yet identified, accurately recognizing genuine query interfaces and excluding search engine interfaces. The experimental results show that the method effectively distinguishes search engine interfaces and improves classification precision and recall.

  9. Detection of Embedded Malware Based on C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    张福勇; 齐德昱; 胡镜林

    2011-01-01

    Embedded malware has become a novel computer security threat due to its high concealment and poor detectability. However, the existing statistical analysis methods are ineffective because they do not fully consider the small number of malicious bytes and the high information gain of embedded malware. To solve this problem, a new detection method for embedded malware is proposed based on the C4.5 decision tree: the 500 3-grams with the highest information gain are extracted from the training samples and used as attributes to build a decision tree that detects unknown embedded malware. Experimental results show that the proposed method is superior to the existing methods in detection rate and classification accuracy, achieving a detection rate of 99.80% for Word documents infected with embedded malware.
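
    The feature-selection step described above (keeping the byte 3-grams with the highest information gain) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: each 3-gram is treated as a binary present/absent attribute, labels are 0 for benign and 1 for malicious, and the function names and toy byte strings are invented.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 3) -> set:
    """Set of distinct byte n-grams occurring in one sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def binary_entropy(pos: int, neg: int) -> float:
    """Entropy (in bits) of a two-class count; 0.0 for an empty or pure set."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def top_ngrams_by_information_gain(samples, labels, k=500, n=3):
    """Rank n-grams by the information gain of their presence/absence."""
    total = len(samples)
    total_pos = sum(labels)
    base = binary_entropy(total_pos, total - total_pos)
    present = Counter()      # number of samples containing the n-gram
    present_pos = Counter()  # number of malicious samples containing it
    for data, y in zip(samples, labels):
        for gram in byte_ngrams(data, n):
            present[gram] += 1
            present_pos[gram] += y
    scores = {}
    for gram, count in present.items():
        pos_in = present_pos[gram]
        neg_in = count - pos_in
        pos_out = total_pos - pos_in
        neg_out = (total - count) - pos_out
        conditional = (count / total) * binary_entropy(pos_in, neg_in) \
            + ((total - count) / total) * binary_entropy(pos_out, neg_out)
        scores[gram] = base - conditional
    # Highest gain first; ties broken by byte order for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Toy samples (hypothetical): b"XYZ" appears only in the "malicious" ones.
samples = [b"AAAXYZ", b"BBBXYZ", b"AAAQQQ", b"BBBQQQ"]
labels = [1, 1, 0, 0]
print(top_ngrams_by_information_gain(samples, labels, k=2))
```

    The selected n-grams would then serve as the attributes of a decision tree learner such as C4.5; that training step is not shown.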

  10. Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

    Directory of Open Access Journals (Sweden)

    O.BENCHAREF

    2011-09-01

    Full Text Available The recognition of Tifinagh characters cannot be perfectly carried out using the conventional methods based on invariance, because some characters differ from each other only by size or rotation; hence the need for new methods to remedy this shortcoming. In this paper we propose a direct method based on the calculation of what are called Geodesic Descriptors, which have shown significant reliability with respect to changes of scale, the presence of noise and geometric distortions. For classification, we have opted for a method based on the hybridization of decision trees and neural networks.

  11. FINANCIAL PERFORMANCE INDICATORS OF TUNISIAN COMPANIES: DECISION TREE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ferdaws Ezzi

    2016-01-01

    Full Text Available The article at hand is an attempt to identify the various indicators that are more likely to explain the financial performance of Tunisian companies. In this respect, the emphasis is put on diversification, innovation, intrapersonal and interpersonal skills. Indeed, they are the appropriate strategies that can designate emotional intelligence, the level of indebtedness, the firm age and size as the proper variables that support the target variable. The "decision tree", as a new data analysis method, is utilized to analyze our work. The results involve the construction of a crucial model which is used to achieve a sound financial performance.

  12. Algorithms for optimal dyadic decision trees

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  13. Optimizing Decision Tree Attack on CAS Scheme

    Directory of Open Access Journals (Sweden)

    PERKOVIC, T.

    2016-05-01

    Full Text Available In this paper we show a successful side-channel timing attack on a well-known high-complexity cognitive authentication (CAS) scheme. We exploit the weakness of the CAS scheme that comes from the asymmetry of the virtual interface and graphical layout, which results in nonuniform human behavior during the login procedure, leading to detectable variations in the user's response times. We optimized a well-known probabilistic decision tree attack on the CAS scheme by introducing this timing information into the attack. We show that the developed classifier could be used to significantly reduce the number of login sessions required to break the CAS scheme.

  14. Constructing an optimal decision tree for FAST corner point detection

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    In this paper, we consider a problem that is originated in computer vision: determining an optimal testing strategy for the corner point detection problem that is a part of FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.
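
    The optimization criterion mentioned above, the average depth of a decision tree over the rows of a decision table, can be made concrete with a short sketch. The tree encoding (nested tuples), the function name and the toy table are assumptions for illustration; the sketch only evaluates a given tree and does not implement the dynamic programming or greedy construction algorithms compared in the paper.

```python
def average_depth(tree, rows):
    """Average number of attribute tests needed to classify the given rows.

    A tree is either ("leaf", label) or ("test", attribute_index, children),
    where children maps each attribute value to a subtree.
    """
    def depth(node, row):
        tests = 0
        while node[0] == "test":
            _, attribute, children = node
            node = children[row[attribute]]
            tests += 1
        return tests

    return sum(depth(tree, row) for row in rows) / len(rows)


# Toy decision table with two binary attributes (hypothetical).
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
tree = ("test", 0, {
    0: ("leaf", "not corner"),
    1: ("test", 1, {0: ("leaf", "not corner"), 1: ("leaf", "corner")}),
})
print(average_depth(tree, rows))
```

    Here two rows are decided after one test and two after two tests, so the average depth is 1.5; a tree with a lower average needs fewer tests per decision on average.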

  15. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    Institute of Scientific and Technical Information of China (English)

    Fang Ye; Zhi-Hua Chen; Jie Chen; Fang Liu; Yong Zhang; Qin-Ying Fan; Lin Wang

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
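
    CHAID chooses splits by testing the association between each predictor and the outcome with a chi-squared test. A minimal sketch of the underlying Pearson statistic for a contingency table is given below. It omits the category merging and adjusted p-values that a full CHAID implementation uses, and the counts are invented for illustration rather than taken from the study.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 tables: rows = predictor level, columns = (anemic, not anemic).
no_association = [[10, 10], [10, 10]]
full_association = [[20, 0], [0, 20]]
print(chi_squared_statistic(no_association), chi_squared_statistic(full_association))
```

    For tables of the same shape, a larger statistic indicates a stronger association, so the predictor with the largest value would be the preferred split in this simplified setting.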

  16. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction

    Directory of Open Access Journals (Sweden)

    Jörg Huwyler

    2012-08-01

    Full Text Available Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo surface permeability product (logPS) values in rats as a quantitative parameter for BBB permeability. The open source Chemical Development Kit (CDK) was used to calculate physico-chemical properties and descriptors. Predictive computational models were implemented by machine learning paradigms (decision tree induction) on both descriptor sets. Models with a corrected classification rate (CCR) of 90% were established. Mechanistic insight into BBB transport was provided by an Ant Colony Optimization (ACO)-based binary classifier analysis to identify the most predictive chemical substructures. Decision trees revealed descriptors of lipophilicity (aLogP) and charge (polar surface area), which were also previously described in models of passive diffusion. However, measures of molecular geometry and connectivity were found to be related to an active drug transport component.

  17. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  18. Fuzzy Decision Tree Model for Driver Behavior Confronting Yellow Signal at Signalized Intersection

    Institute of Scientific and Technical Information of China (English)

    龙科军; 赵文秀; 肖向良

    2011-01-01

    A driver's decision to go or stop during the yellow interval is a case of decision making under uncertainty. This paper collects driver behavior data at four similar intersections and applies a fuzzy decision tree (FDT) to model driver behavior at signalized intersections. Vehicle location, velocity and the countdown timer are taken as the three influencing factors, each with its own membership function. The FDT model is constructed with the FID3 algorithm, using fuzzy information entropy as the heuristic, and decision rules are generated. Testing the model on a test sample shows that it predicts drivers' decisions with an overall accuracy of 84.8%.
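
    Two ingredients of a fuzzy decision tree named in this record, membership functions and fuzzy information entropy, can be sketched briefly. The triangular shape, its breakpoints, the function names and the sample observations are all assumptions for illustration; the record does not state which membership functions the authors used.

```python
import math


def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_entropy(memberships, labels):
    """Entropy of the class distribution weighted by membership degrees."""
    mass = {}
    for mu, label in zip(memberships, labels):
        mass[label] = mass.get(label, 0.0) + mu
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return -sum((m / total) * math.log2(m / total)
                for m in mass.values() if m > 0)


# Hypothetical fuzzy set "medium speed" peaking at 50 km/h.
speeds = [35.0, 50.0, 60.0, 80.0]
decisions = ["stop", "go", "go", "go"]
medium = [triangular(v, 30.0, 50.0, 70.0) for v in speeds]
print(medium, fuzzy_entropy(medium, decisions))
```

    In an ID3-style fuzzy tree, the attribute whose fuzzy partition yields the lowest weighted fuzzy entropy would be selected for the next split; the tree construction itself is not shown.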

  19. A Categorization Approach Based on Adapted Decision Tree Algorithm for Web Databases Query Results

    Institute of Scientific and Technical Information of China (English)

    孟祥福; 马宗民; 张霄雁; 王星

    2012-01-01

    To deal with the problem that too many results are returned from a Web database in response to a user query, this paper proposes a novel approach, based on an adapted decision tree algorithm, for automatically categorizing Web database query results. The query history of all users in the system is analyzed offline, and semantically similar queries are merged into the same cluster. Next, a set of tuple clusters over the original data is generated according to the query clusters, each tuple cluster corresponding to one type of user preference. When a query arrives, a labeled, leveled categorization tree is constructed over the query results with the adapted decision tree algorithm, based on the tuple clusters generated offline; the tree lets the user easily select and locate the information he or she needs. Experimental results demonstrate that the categorization approach has a lower navigational cost and better categorization effectiveness, and can effectively meet the personalized query needs of different types of users.

  20. Fuzzy Decision Tree Based Inference Techniques for Network Forensic Analysis

    Institute of Scientific and Technical Information of China (English)

    刘在强; 林东岱; 冯登国

    2007-01-01

    Network forensics is an important extension to the present security infrastructure and is becoming a research focus of forensic investigators and network security researchers. However, many challenges still exist in conducting network forensics: the sheer amount of data generated by the network; the comprehensibility of evidence extracted from collected data; the efficiency of evidence analysis methods; and so on. To address these challenges, the authors exploit the strong learning capability of decision tree technology and fuzzy logic, and the comprehensibility of their results, to develop a fuzzy decision tree based network forensics system that aids an investigator in analyzing computer crime in network environments and automatically extracts digital evidence. Experimental results and a comparison between the proposed method and other popular methods are presented. The results show that the system can classify most kinds of events (91.16% correct classification rate on average), provide analyzed and comprehensible information for a forensic expert, and automate or semi-automate the process of forensic analysis.

  1. Disturbance Risk Measure of Wind Farm Based on Decision Trees under Contingency

    Institute of Scientific and Technical Information of China (English)

    卓毅鑫; 徐铝洋; 张伟; 林湘宁; 李正天

    2015-01-01

    With the development of wind power and the growing scale of wind farms, the differences in spatial distribution between wind turbines also increase. In addition, wind turbine trip-off and damage accidents have occurred frequently because of severe wind conditions, with adverse impacts on the stable and safe operation of the power grid. It is therefore necessary to study online risk assessment methods for power systems with wind energy. Considering the differences in the spatial distribution of wind turbines, this paper proposes an online disturbance risk measure for wind farms based on decision trees, which can perform data mining on online information and make fast judgements on voltage violation and wind turbine trip-off under a contingency set. Furthermore, according to the judgement of the decision trees, disturbance risk measure indices are proposed, which are visualized and provide supportive information for wind farm and power system operators. A wind farm case study verifies the effectiveness of the proposed method.

  2. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.

  3. Tax Inspection Research Based on C5.0 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    陈仕鸿; 刘晓庆

    2011-01-01

    The principle of the C5.0 decision tree is briefly analyzed and applied to tax inspection. A C5.0 decision tree model is used to analyze the financial statements and tax declarations of 80 commercial enterprises, and the results are compared with binary logistic regression. The results show that the model can assist inspection case selection and improve its efficiency and effectiveness.

  4. On algorithm for building of optimal α-decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relatively to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in [4] to constructing approximate decision trees. Adjustable approximation rate allows controlling algorithm complexity. The algorithm is applied to build optimal α-decision trees for two data sets from UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.

  5. Automatic design of decision-tree induction algorithms

    CERN Document Server

    Barros, Rodrigo C; Freitas, Alex A

    2015-01-01

    Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue on the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o

  6. Using Decision Trees to Characterize Verbal Communication During Change and Stuck Episodes in the Therapeutic Process

    Directory of Open Access Journals (Sweden)

    Víctor Hugo Masías

    2015-04-01

    Full Text Available Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBtree, and REPtree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1,760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  7. The Decision Tree-based Path for Selecting Virtual Consulting Team Members

    Institute of Scientific and Technical Information of China (English)

    尚珊; 胡贵玲; 崔洁

    2012-01-01

    This paper expounds the importance of the virtual consulting team in the development of the virtual consulting enterprise. Based on an analysis of virtual consulting enterprises themselves and a comparison with physical consulting enterprises and virtual enterprises, it discusses the problems that virtual consulting enterprises face today and points out that virtual team cooperation is an important way to solve them. The paper gives the selection process for virtual consulting team cooperation and, for the first time, puts forward a specific practice for using a decision tree to select team members.

  8. A Semi-Random Multiple Decision-Tree Algorithm for Mining Data Streams

    Institute of Scientific and Technical Information of China (English)

    Xue-Gang Hu; Pei-Pei Li; Xin-Dong Wu; Gong-Qing Wu

    2007-01-01

    Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have relatively poor efficiency in both time and space due to the characteristics of streaming data. Random decision trees offer some advantages in time and space. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has improved performance in time, space, accuracy and anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
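
    The Hoeffding inequality mentioned above can be illustrated with a short sketch. It shows only the textbook bound for the mean of n independent observations with range R and the sample size it implies; how SRMTDS applies the bound internally is not described in the abstract, and the function and parameter names are illustrative.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability at least 1 - delta, the true mean
    lies within epsilon of the mean of n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which hoeffding_bound(value_range, delta, n) <= epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))
```

    With value_range=1.0 and delta=0.05, min_examples returns 600 for epsilon=0.05: a leaf would need to see about 600 examples before an observed difference of 0.05 between two candidate splits could be trusted at that confidence level.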

  9. A tool for study of optimal decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes a tool that, for relatively small decision tables, performs consecutive optimization of decision trees relative to various complexity measures, such as number of nodes, average depth, and depth, and finds the parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.

  10. Greedy algorithm with weights for decision tree construction

    KAUST Repository

    Moshkov, Mikhail

    2010-12-01

    An approximate algorithm for minimization of weighted depth of decision trees is considered. A bound on accuracy of this algorithm is obtained which is unimprovable in general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to best polynomial approximate algorithms for minimization of weighted depth of decision trees.

  11. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  12. Minimizing size of decision trees for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-09-29

    We used decision trees as a model to discover knowledge from multi-label decision tables, where each row has a set of decisions attached to it and the goal is to find one arbitrary decision from that set. The size of the decision tree can be small as well as very large. We study different greedy and dynamic programming algorithms for minimizing the size of the decision trees. Compared with the optimal result from the dynamic programming algorithm, some greedy algorithms produce results that are close to optimal for the minimization of the number of nodes (at most 18.92% difference), the number of nonterminal nodes (at most 20.76% difference), and the number of terminal nodes (at most 18.71% difference).

  13. Computational study of developing high-quality decision trees

    Science.gov (United States)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in data mining problems to find valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns when dealing with large and complex data sets in practice. In this paper, we propose an innovative machine learning approach (which we call GAIT), combining genetic algorithms, statistical sampling, and decision trees, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both neural network and discriminant classifiers.

  14. An Efficient Method of Vibration Diagnostics For Rotating Machinery Using a Decision Tree

    Directory of Open Access Journals (Sweden)

    Bo Suk Yang

    2000-01-01

    Full Text Available This paper describes an efficient method to automatize vibration diagnosis for rotating machinery using a decision tree, which is applicable to vibration diagnosis expert system. Decision tree is a widely known formalism for expressing classification knowledge and has been used successfully in many diverse areas such as character recognition, medical diagnosis, and expert systems, etc. In order to build a decision tree for vibration diagnosis, we have to define classes and attributes. A set of cases based on past experiences is also needed. This training set is inducted using a result-cause matrix newly developed in the present work instead of using a conventionally implemented cause-result matrix. This method was applied to diagnostics for various cases taken from published work. It is found that the present method predicts causes of the abnormal vibration for test cases with high reliability.

  15. Invasion Rule Generation Based on Fuzzy Decision Tree%基于模糊决策树的入侵规则生成技术

    Institute of Scientific and Technical Information of China (English)

    郭洪荣

    2013-01-01

    The Class MC Agent of the computer immune system model GECISM can make effective use of the fuzzy decision tree algorithm Fuzzy-Id3: treating the system calls of application programs as the data set, it constructs a decision tree and thereby generates intrusion detection rules for the computer immune system. Analysis and comparison of the experimental results show that the rules generated by the Fuzzy-Id3 algorithm classify previously unseen data with a low false-alarm rate and a low missed-detection rate.

  16. MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees

    Directory of Open Access Journals (Sweden)

    Vasile PURDILĂ

    2014-03-01

    Full Text Available Learning decision trees from very large amounts of data is not practical on single-node computers because of the huge amount of computation the process requires. Apache Hadoop is a large-scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks on very large datasets. This work presents a parallel decision tree learning algorithm, expressed in the MapReduce programming model, that runs on the Apache Hadoop platform and scales very well with dataset size.

  17. MALDI-TOF MS Combined With Magnetic Beads for Detecting Serum Protein Biomarkers and Establishment of Boosting Decision Tree Model for Diagnosis of Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Chibo Liu, Chunqin Pan, Jianmin Shen, Haibao Wang, Liang Yong

    2011-01-01

    Full Text Available The aim of the present study is to examine the serum protein fingerprint of patients with colorectal cancer (CRC) and to screen protein molecules that are closely related to colorectal cancer during the onset and progression of the disease with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Serum samples from 144 patients with CRC and 120 healthy volunteers were used in the present study. Weak cation exchange (WCX) magnetic beads and a PBSII-C protein chip reader (Ciphergen Biosystems Inc.) were used. The protein fingerprint expression of all the serum samples and the resulting profiles of the cancer and normal groups were analyzed with the Biomarker Wizard system. Several proteomic peaks were detected, and four potential biomarkers with different expression profiles were identified, with relative molecular weights of 2870.7 Da, 3084 Da, 9180.5 Da, and 13748.8 Da, respectively. Among the four proteins, the two with m/z 2870.7 and 3084 were down-regulated, and the other two, with m/z 9180.5 and 13748.8, were up-regulated in serum samples from CRC patients. The diagnostic model could distinguish CRC from healthy controls with a sensitivity of 92.85% and a specificity of 91.25%. Blind test data indicated a sensitivity of 86.95% and a specificity of 85%. The results suggest that MALDI technology could be used to screen critical proteins with differential expression in the serum of CRC patients. These differentially regulated serum proteins are considered potential biomarkers for patients with CRC and are of potential value for further investigation.
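
    Sensitivity and specificity, as reported above, follow directly from the counts of a diagnostic confusion matrix. The sketch below uses the standard definitions; the counts in the usage note are invented for illustration and are not the study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased cases that the model flags as diseased."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of healthy cases that the model flags as healthy."""
    return true_neg / (true_neg + false_pos)
```

    For instance, a hypothetical test set with 90 detected and 10 missed cancer samples has sensitivity(90, 10) = 0.9, and one with 80 correctly cleared and 20 falsely flagged controls has specificity(80, 20) = 0.8.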

  18. The Research of Reliability of Trash E-mail Identifier Based on Decision Tree of Continuous Attributes

    Institute of Scientific and Technical Information of China (English)

    王星; 谢邦昌

    2005-01-01

    Avoiding spam mail is one of the most critical problems in Internet technology. Finding the most important attribute, or combination of attributes, for telling normal e-mail from spam is the bottleneck in spam discrimination. In recent years, decision trees have become popular because they are expressive and able to output rules, and they have become a core technique for predicting spam. However, many well-known decision trees such as C4.5 and CART are not very robust, so their output is unstable, which disturbs the construction of the classifier. In this paper, we study the robustness of the CART algorithm, point out the robustness problem that arises when a decision tree classifier is used to separate spam from normal e-mail with interval attributes, and then use the BAGGING algorithm to obtain a more robust model while also improving the performance of the initial models.
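
    Bagging, as referred to above, trains many models on bootstrap resamples of the training set and combines them by majority vote. The following minimal sketch substitutes one-feature decision stumps for full CART trees to stay short; the function names, the number of models and the seed are assumptions made for illustration, not details from the paper.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-threshold classifier on a single numeric feature by
    choosing the threshold and leaf labels with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

    Because each stump sees a slightly different sample, an unstable split in one model is outvoted by the others, which is the stabilizing effect the abstract attributes to bagging.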

  19. Prediction Of Study Track Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Deepali Joshi

    2014-05-01

    Full Text Available One of the most important factors for success in academic life is assigning students to the right track when they reach the end of the basic education stage. The education system is graded from 1st to 10th standard; after finishing the 10th grade, students are distributed into different academic tracks or fields, such as Science, Commerce and Arts, depending on the marks they have scored. To succeed in academic life, a student should select the correct academic field, yet many students fail to select the appropriate one: at one moment they prefer a certain type of career and at the next they consider another option. To improve the quality of education, data mining techniques can be used instead of the traditional process. The proposed system predicts streams with the decision tree method and gives more accurate results than the traditional system. With each additional input, the accuracy of the proposed system improves.

  20. Automatic image annotation based on fuzzy association rules and decision trees

    Institute of Scientific and Technical Information of China (English)

    李志欣; 李灵芝; 张灿龙

    2015-01-01

    Traditional automatic image annotation based on association rules suffers from the sharp boundary problem, which makes classification fuzzy and inaccurate. Moreover, with the rapid development of multimedia technology, the volume of image data is growing quickly. Massive image data produce many redundant association rules, which greatly decreases the efficiency of image classification. To solve these two problems, this paper proposes an automatic image annotation approach based on fuzzy association rules and decision trees. The approach first obtains fuzzy association rules that represent the correlations between the low-level visual features and the high-level semantic concepts of training images. A decision tree is then used to prune the redundant fuzzy association rules, which greatly reduces the computational complexity of the algorithm. Experiments were conducted on the Corel5k and IAPR-TC12 benchmark datasets, and the methods were compared in terms of precision, recall, F-measure and the number of rules generated. Comparison with several other state-of-the-art automatic image annotation methods shows that the approach greatly improves annotation precision and efficiency.

  1. Fault Diagnosis Method for Nuclear Power Plant Based on Decision Tree and Neighborhood Rough Sets

    Institute of Scientific and Technical Information of China (English)

    慕昱; 夏虹; 刘永阔

    2011-01-01

    A nuclear power plant (NPP) is a very complex system in which a large number of parameters must be collected and monitored, which makes fault diagnosis difficult. To address this problem, a parameter reduction method based on neighborhood rough sets is proposed. The method realizes granular computing in real space, so numerical parameters can be processed directly without discretization. On this basis, a decision tree is trained on samples of four typical NPP faults, i.e., loss of coolant accident, feed water pipe rupture, steam generator U-tube rupture and main steam pipe rupture, and the acquired knowledge is used for diagnosis. The diagnostic results are then compared with those of a support vector machine. The simulation results show that the method can rapidly and accurately diagnose the above-mentioned faults of the NPP.

  2. Stock Data Mining Based on C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王领; 胡扬

    2015-01-01

    Using data mining algorithms to analyze and forecast stocks still has problems with data volume and technical indicators. Based on an analysis of stock market data, this paper selects certain indicators as decision attributes and uses the C4.5 decision tree classification algorithm to classify and forecast stocks. The paper mainly introduces and optimizes stock technical indicators and improves the efficiency of the C4.5 algorithm. The improved algorithm, combined with the optimized indicators, not only makes data mining more efficient but also obtains better returns in stock forecasting.
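
    C4.5 ranks candidate attributes by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below implements that standard criterion for categorical attributes only; it is not the improved algorithm of the paper, and the indicator values and labels in the usage note are invented for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

    A hypothetical indicator that separates the classes perfectly, such as gain_ratio(["up", "up", "down", "down"], ["buy", "buy", "sell", "sell"]), scores 1.0, while an indicator unrelated to the class scores 0.0.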

  3. Analysis of College Students Consumption Data Based on Decision Tree Data Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄剑

    2015-01-01

    This paper uses a decision tree data mining algorithm as its basic tool. Based on campus card consumption data of college students in recent years, it explores the use of data mining to analyze changes in students' on-campus consumption behavior, their consumption characteristics, and the relationship between these and prices. Mining the consumption data yields information on students' consumption behavior, habits and volume in recent years and reveals the underlying associations and trends. The results can better and more effectively guide the school's catering prices and the introduction of new dishes, so that catering services are provided within a price range that students can afford.

  4. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    Directory of Open Access Journals (Sweden)

    Tran Hoai Linh

    2014-09-01

    Full Text Available The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree, which provides one more processing stage to give the final recognition result. As the base classifiers, three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients of the ECG signal decomposition using Hermite basis functions and the peak-to-peak periods of the ECG signals will be used as features for the classifiers. Numerical experiments will be performed for the recognition of different types of arrhythmia in the ECG signals taken from the MIT-BIH (Massachusetts Institute of Technology and Boston's Beth Israel Hospital) Arrhythmia Database. The results will be compared with the individual base classifiers' performances and with other integration methods to show the high quality of the proposed solution.

  5. Relationships among various parameters for decision tree optimization

    KAUST Repository

    Hussain, Shahid

    2014-01-14

    In this chapter, we study, in detail, the relationships between various pairs of cost functions and between uncertainty measure and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as experimental results on decision tables acquired from the UCI ML Repository. The algorithms presented in this chapter have already been implemented and are now a part of Dagger, which is a software system for construction/optimization of decision trees and decision rules. The main results presented in this chapter deal with two types of algorithms for computing relationships. First, we discuss the case where we construct approximate decision trees and are interested in relationships between a certain cost function, such as the depth or number of nodes of a decision tree, and an uncertainty measure, such as the misclassification error (accuracy) of a decision tree. Second, relationships between two different cost functions are discussed, for example, the number of misclassifications of a decision tree versus the number of nodes in a decision tree. The results of experiments, presented in the chapter, provide further insight. © 2014 Springer International Publishing Switzerland.

  6. CLASSIFICATION OF LISS IV IMAGERY USING DECISION TREE METHODS

    Directory of Open Access Journals (Sweden)

    A. K. Verma

    2016-06-01

    Full Text Available Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; they are indicators that describe the greenness, density and health of vegetation. Texture is also an important characteristic, used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach based on texture properties combined with different vegetation indices for single-date LISS IV sensor data at 5.8 meter spatial resolution. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. The two methods were compared. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.

  7. Classification of Liss IV Imagery Using Decision Tree Methods

    Science.gov (United States)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; they are indicators that describe the greenness, density and health of vegetation. Texture is also an important characteristic, used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach based on texture properties combined with different vegetation indices for single-date LISS IV sensor data at 5.8 meter spatial resolution. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. The two methods were compared. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
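
    Three of the vegetation indices listed above have simple closed forms in the green, red and NIR reflectances of a pixel. The sketch below uses the standard definitions of NDVI, GNDVI and DVI; returning 0.0 when a denominator is zero is an assumption made here for convenience, not something the abstract specifies.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), with 0.0 returned when both bands are zero."""
    denominator = a + b
    return (a - b) / denominator if denominator != 0 else 0.0


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def gndvi(nir: float, green: float) -> float:
    """Green Normalized Difference Vegetation Index."""
    return normalized_difference(nir, green)


def dvi(nir: float, red: float) -> float:
    """Difference Vegetation Index."""
    return nir - red
```

    Dense, healthy vegetation reflects strongly in NIR and weakly in red, so a pixel with illustrative reflectances nir=0.5 and red=0.1 has an NDVI of about 0.67, whereas bare soil or water gives values near or below zero. Per-pixel values like these would serve as the input features of the decision tree.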

  8. Application of Data Mining in the Student Information System Based on Decision Tree Algorithm

    Institute of Scientific and Technical Information of China (English)

    侯海霞

    2012-01-01

    There are many data mining methods, one of which is the decision tree. The decision tree method requires no assumptions about the data; it intelligently classifies large amounts of data and finds hidden, valuable information according to certain rules. This paper selects C4.5, a representative decision tree algorithm, and uses the large volume of graduate employment records in a university student information management system as an example to generate a decision tree, uncovering potential rules and factors that favor graduate employment, so as to guide university education and management.

  9. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size.

  10. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067

  11. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067

  12. Automatic design of decision-tree algorithms with evolutionary algorithms.

    Science.gov (United States)

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  13. Decision-Tree Formulation With Order-1 Lateral Execution

    Science.gov (United States)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
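
    The general idea of trading a node-by-node walk for a table lookup can be sketched as follows. This is not the formulation described in the record: it is a minimal illustration that assumes a small Boolean tree encoded as nested tuples, and every name in it (`TREE`, `walk`, `compile_table`, `lookup`) is hypothetical. The table has one entry per assignment of the decision variables, so it grows as 2^n and only suits trees with few variables.

```python
from itertools import product

# Hypothetical Boolean tree: (variable_index, subtree_if_false, subtree_if_true).
# Leaves are plain strings naming the conclusion.
TREE = (0,
        (1, "reject", "review"),
        (2, "review", "accept"))


def walk(tree, bits):
    """Evaluate the tree node by node; cost grows with tree depth."""
    while not isinstance(tree, str):
        var, if_false, if_true = tree
        tree = if_true if bits[var] else if_false
    return tree


def compile_table(tree, n_vars):
    """Precompute the conclusion for every assignment of n_vars Booleans."""
    return [walk(tree, bits) for bits in product((0, 1), repeat=n_vars)]


def lookup(table, bits):
    """Pack the Boolean inputs into an index and read the answer directly."""
    index = 0
    for b in bits:
        index = (index << 1) | int(bool(b))
    return table[index]


table = compile_table(TREE, 3)
print(lookup(table, (1, 0, 1)))  # same answer as walk(TREE, (1, 0, 1))
```

    Once the table exists, answering a query no longer depends on the shape or depth of the tree, only on packing the inputs into an index.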

  14. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    OpenAIRE

    Bruno Carneiro da Rocha; Rafael Timóteo de Sousa Júnior

    2010-01-01

    This article aims to evaluate the use of techniques of decision trees, in conjunction with the management model CRISP-DM, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of t...

  15. Confidence sets for split points in decision trees

    OpenAIRE

    Banerjee, Moulinath; McKeague, Ian W.

    2007-01-01

    We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others cali...
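
    For orientation, the least squares split point that the abstract refers to can be computed as in the sketch below: a single split of a one-dimensional predictor chosen to minimize the residual sum of squares of a two-piece constant fit. The function names and the midpoint convention for reporting the split are assumptions; the confidence-set constructions studied in the paper are not reproduced here.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (split_point, rss) minimizing the two-piece constant fit error."""
    pairs = sorted(zip(xs, ys))
    best_rss, best_point = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        rss = sse(left) + sse(right)
        if rss < best_rss:
            # Report the midpoint between neighbouring x values (a convention).
            best_rss = rss
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point, best_rss


print(best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 10, 10, 10]))
```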

  16. Decision tree approach to power systems security assessment

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1993-01-01

    An overview of the general decision tree approach to power system security assessment is presented. The general decision tree methodology is outlined, modifications proposed in the context of transient stability assessment are embedded, and further refinements are considered. The approach is then suitably tailored to handle other specifics of power systems security, relating to both preventive and emergency voltage control, in addition to transient stability. Trees are accordingly built in th...

  17. Research and Application of Bank Customer Relationship Management Based on the Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    李明辉

    2012-01-01

    The decision tree algorithm in data mining has significant value in the banking industry. Applied to banking, decision tree technology can analyze a specific customer's background information to predict the category to which that customer belongs, so that an appropriate business strategy can be adopted. This can improve the level of banking services, develop customer resources and avoid customer loss, while conserving resources and obtaining larger returns from minimal investment. In bank lending, judging whether a borrower is risky, deciding whether a loan proposal is feasible, and classifying customers according to the bank's actual needs are all problems that the decision tree algorithm can solve.

  18. Bounds on Average Time Complexity of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory [1], coding theory [77], and the design and analysis of algorithms (e.g., sorting) [38]. For any diagnostic problem, the minimum average depth of a decision tree is bounded from below by the entropy of the probability distribution (with a multiplier 1/log_2 k for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building an optimal prefix code [1] and a blood test study under the assumption that exactly one patient is ill [23]). For such problems, the minimum average depth of a decision tree exceeds the lower bound by at most one. The minimum average depth reaches its maximum on the problems in which each attribute is "indispensable" [44] (e.g., a diagnostic problem with n attributes and k^n pairwise different rows in the decision table, and the problem of implementing the modulo 2 summation function). These problems have the minimum average depth of a decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
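
    The entropy bound can be checked numerically on the optimal prefix code example mentioned above. The sketch below assumes the binary case (k = 2, so the multiplier is 1) and uses Huffman's algorithm, whose average codeword length equals the sum of the weights of all merged nodes. The function names and the sample distributions are illustrative choices, not taken from the chapter.

```python
import heapq
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def huffman_average_depth(probs):
    """Average leaf depth of an optimal binary prefix code.

    Each merge of the two lightest nodes adds one level above every leaf
    beneath it, so the average depth is the sum of the merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


for probs in ([0.5, 0.25, 0.125, 0.125], [0.4, 0.3, 0.2, 0.1]):
    h = entropy(probs)
    depth = huffman_average_depth(probs)
    # The bound stated in the abstract: entropy <= depth < entropy + 1.
    print(round(h, 3), round(depth, 3), h <= depth + 1e-12 < h + 1)
```

    For the first distribution the probabilities are powers of 1/2, so the average depth meets the entropy exactly; for the second it sits strictly between the entropy and the entropy plus one.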

  19. Application of Sales Volume Forecast of Group Purchase Based on Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    费斐; 叶枫

    2013-01-01

    Online group purchase is a mode of online shopping in which consumers who do not know one another buy the same product on the same website within a specific period in order to obtain the best price. Today, group purchase websites, acting as platforms, face a large number of products applying for group purchase; the review process requires substantial manpower and depends too heavily on experience. This paper uses a decision tree algorithm to analyze the variables that affect the sales level of group purchase products and generates a readable decision tree to support decision making and select high-quality products.

  20. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    OpenAIRE

    Oral, L. O.; V. Tecim

    2013-01-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from h...

  1. Decision tree sensitivity analysis for cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma)

    International Nuclear Information System (INIS)

    Decision tree analysis was used to assess cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma, ≤Stage IIIB), based on the data of the current decision tree. Decision tree models were constructed with two competing strategies (CT alone and CT plus chest FDG-PET) in 1,000 patient population with 71.4% prevalence. Baselines of FDG-PET sensitivity and specificity on detection of lung cancer and lymph node metastasis, and mortality and life expectancy were available from references. Chest CT plus chest FDG-PET strategy increased a total cost by 10.5% when a chest FDG-PET study costs 0.1 million yen, since it increased the number of mediastinoscopy and curative thoracotomy despite reducing the number of bronchofiberscopy to half. However, the strategy resulted in a remarkable increase by 115 patients with curable thoracotomy and decrease by 51 patients with non-curable thoracotomy. In addition, an average life expectancy increased by 0.607 year/patient, which means increase in medical cost is approximately 218,080 yen/year/patient when a chest FDG-PET study costs 0.1 million yen. In conclusion, chest CT plus chest FDG-PET strategy might not be cost-effective in Japan, but we are convinced that the strategy is useful in cost-benefit analysis. (author)

  2. Application of alternating decision trees in selecting sparse linear solvers

    KAUST Repository

    Bhowmick, Sanjukta

    2010-01-01

    The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to the characteristics of the task can significantly enhance performance, for example by reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in the course of the same PDE-based simulation, which makes solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stages of simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can resolve the problem of over-fitting, which occurs when only a limited amount of data is available. © 2010 Springer Science+Business Media LLC.

  3. Research on Internet of Things Security Based on Support Vector Machines with Balanced Binary Decision Tree

    Institute of Scientific and Technical Information of China (English)

    张晓惠; 林柏钢

    2015-01-01

    The Internet of Things (IoT) is another information industry revolution after the computer, the Internet and mobile communications. At present, the IoT has been officially listed as one of the national strategic emerging industries, and its applications cover almost every sector. Security problems in the IoT, such as network intrusion, are becoming increasingly prominent. In the context of big data, this paper proposes an intrusion detection model suited to the IoT environment. The model divides intrusion detection into three parts: data preprocessing, feature extraction and data classification. Data preprocessing mainly handles data normalization and redundant data. The main goal of feature extraction is dimensionality reduction, which shortens classification time. For data classification, a multi-class support vector machine (SVM) algorithm with a balanced binary decision tree, named BDT-SVM, is used to train on and detect network intrusion data. Experiments show that the BDT-SVM multi-class algorithm can improve the accuracy of the intrusion detection system, and that feature extraction reduces detection time while maintaining accuracy.

  4. Data mining on commuting distance mode of urban residents based on the analysis of decision tree

    Institute of Scientific and Technical Information of China (English)

    王茂军; 宋国庆; 许洁

    2009-01-01

    With the development of suburbanization, urban residents now have more choices of jobs and housing locations, and scholars increasingly pay attention to commuting patterns. Based on questionnaire survey data, this paper first gives a descriptive analysis of commuting variables, distances and directions, and then introduces decision tree analysis to discuss the commuting distance patterns of urban residents in Beijing. The findings are as follows. First, under the specified pruning purity, commuting distance is closely related to travel mode, change of residence, occupation, the employment rate of the residential area, the schooling status of the youngest child, housing area, monthly household income and motor vehicle use. Factors such as gender, educational level, marital status and housing property are not involved in the model. Second, among the variables that affect commuting distance, travel mode is the most important, followed by housing area and the youngest child's schooling, then change of residence and occupation; monthly household income is at the fourth level, and motor vehicle use and local employment rate are at the fifth level. Third, owing to the complexity of housing property rights, the diversity of reasons for relocation, passive suburbanization, and factors such as childbirth and childcare benefits and the division of household duties, the relationships of housing area, relocation history, family life cycle and occupation with commuting distance contradict existing domestic conclusions; some variables have a decisive influence on short-distance commuting, and others on long-distance commuting.

  5. Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods

    OpenAIRE

    Khan, Saqib Hussain

    2010-01-01

    Data mining can be used in the healthcare industry to “mine” clinical data and discover hidden information for intelligent and effective decision making. Hidden patterns and relationships often go undiscovered, and advanced data mining techniques can help remedy this. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the databas...

  6. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    OpenAIRE

    Barbara Kraszewska-Głomba; Zofia Szymańska-Toczek; Leszek Szenborn

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with t...

  7. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

    Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category. Identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vital but usually neglected aspect of software quality assurance in any project. If applied at all stages of software development, it can reduce the time, overhead and resources needed to engineer a high-quality product. To reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when a test case shows that the software is not executing properly. The proposed system classifies defects using a decision-tree-based defect classification technique, which groups the defects after identification. The classification can be done with algorithms such as ID3 or C4.5. After classification, the defect patterns are measured using a pattern mining technique. Finally, quality is assured using quality metrics such as defect density. The proposed system will be implemented in Java.
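
    ID3 chooses the attribute to split on by information gain, which can be sketched as below. The defect records, attribute names and labels are invented for illustration and do not come from the paper; C4.5 refines the same idea with the gain ratio, which is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the rows on one attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical defect records: the phase where each defect was found and
# its severity, with the defect class as the label.
rows = [
    {"phase": "implementation", "severity": "high"},
    {"phase": "implementation", "severity": "low"},
    {"phase": "requirements", "severity": "high"},
    {"phase": "requirements", "severity": "low"},
]
labels = ["coding", "coding", "design", "design"]

for attr in ("phase", "severity"):
    print(attr, information_gain(rows, labels, attr))
```

    In this toy data the phase separates the two classes completely, so ID3 would split on it first, while severity carries no information about the class.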

  8. USING PRECEDENTS FOR REDUCTION OF DECISION TREE BY GRAPH SEARCH

    Directory of Open Access Journals (Sweden)

    I. A. Bessmertny

    2015-01-01

    The paper considers the problem of organizing mutual payments between business entities by means of clearing, which is solved by searching for paths in a graph. To reduce the complexity of the decision tree, a method of precedents is proposed that consists of saving intermediate solutions while moving along the decision tree. An algorithm and an example are presented, showing that the solution complexity approaches linear. Tests carried out in a civil aviation settlement system demonstrate an approximately 30 percent reduction in real money transfers. The proposed algorithm is also planned for implementation in other clearing organizations of the Russian Federation.

  9. A Decision Tree of Bigrams is an Accurate Predictor of Word Sense

    OpenAIRE

    Pedersen, Ted

    2001-01-01

    This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.

  10. Predicting metabolic syndrome using decision tree and support vector machine methods

    Science.gov (United States)

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-01-01

    BACKGROUND Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have been highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict the metabolic syndrome. METHODS This study aims to employ decision tree and support vector machine (SVM) methods to predict the 7-year incidence of metabolic syndrome. This applied study used data from 2107 participants of the Isfahan Cohort Study. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the decision tree and SVM methods were selected to predict the metabolic syndrome. The criteria of sensitivity, specificity and accuracy were used for validation. RESULTS SVM and decision tree methods were examined according to the criteria of sensitivity, specificity and accuracy. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. CONCLUSION The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that the TG is the most important feature in
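
    The three validation criteria named above follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the small label vectors are made up for illustration and are unrelated to the cohort data.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for 0/1 labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


# Made-up labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```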

  11. Decision tree approach for classification of remotely sensed satellite data using open source support

    Indian Academy of Sciences (India)

    Richa Sharma; Aniruddha Ghosh; P K Joshi

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, an open source data mining software package. The classified image is compared with images classified using the classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. The classification result based on the DTC method provided a better visual depiction than the results produced by ISODATA clustering or by the MLC algorithm. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood Classifier and 57.5% (kappa = 0.49) using the ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be the preferred classification approach.
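
    Overall accuracy and the kappa statistic reported above are both derived from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa, (observed agreement minus chance agreement) divided by (1 minus chance agreement); the function name and the 2x2 matrix are illustrative and do not reproduce the study's data.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i assigned to class j.
    """
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up two-class confusion matrix with 100 samples.
print(overall_accuracy_and_kappa([[45, 5], [10, 40]]))
```

    Kappa is lower than the raw accuracy because it discounts the agreement that the class proportions alone would produce.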

  12. Re-mining association mining results through visualization, data envelopment analysis, and decision trees

    OpenAIRE

    Ertek, Gürdal; Ertek, Gurdal; Tunç, Murat Mustafa; Tunc, Murat Mustafa

    2012-01-01

    Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding ...

  13. Independent Component Analysis and Decision Trees for ECG Holter Recording De-Noising

    OpenAIRE

    Jakub Kuzilek; Vaclav Kremen; Filip Soucek; Lenka Lhotska

    2014-01-01

    We have developed a method for ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation with a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method in comparison with standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between origin...

  14. Bringing Science and Pragmatism together - a Tiered Approach for Modelling Toxicological Impacts in LCA

    DEFF Research Database (Denmark)

    Guinée, J; De Koning, A; Pennington, David W.;

    2004-01-01

    for as broad a range of chemicals as possible: 1) A base model representing a state-of-the-art multimedia model and 2) a simple model derived from the base model using statistical tools. Discussion. A preliminary decision tree for using the OMNIITOX information system (IS) is presented. The decision tree aims...

  15. Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees.

    Science.gov (United States)

    Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica

    2012-05-30

    The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablet types (hydrophilic/lipid), using formulation composition, the compression force used for tableting, and tablet porosity and tensile strength as input data. The potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively, whereas the selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by the direct compression method and tested for in vitro dissolution profiles. Optimization of the static and dynamic neural networks used for modeling drug release was performed using Monte Carlo simulations or a genetic algorithm optimizer. Decision trees were constructed following discretization of the data. The calculated difference (f(1)) and similarity (f(2)) factors for predicted and experimentally obtained dissolution profiles of test matrix tablet formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablet dissolution profiles. Elman neural networks were compared to the most frequently used static network, the multi-layered perceptron, and the superiority of Elman networks was demonstrated. The developed methods allow a simple yet very precise way of predicting drug release for both hydrophilic and lipid matrix tablets with controlled drug release.
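
    The difference (f1) and similarity (f2) factors used to compare predicted and measured dissolution profiles can be computed as sketched below. This assumes the commonly used definitions of the two factors, which the abstract does not spell out. The profile values are hypothetical percentages dissolved at matching time points.

```python
import math


def difference_factor(reference, test):
    """f1: summed absolute difference as a percentage of the reference total."""
    return 100.0 * sum(abs(r - t) for r, t in zip(reference, test)) / sum(reference)


def similarity_factor(reference, test):
    """f2: log-scaled mean squared difference; 100 means identical profiles."""
    n = len(reference)
    mean_squared_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_squared_diff))


# Hypothetical profiles (% drug dissolved), for illustration only.
measured = [20.0, 40.0, 60.0, 80.0]
predicted = [22.0, 38.0, 63.0, 79.0]

f1 = difference_factor(measured, predicted)
f2 = similarity_factor(measured, predicted)
print(f"f1 = {f1:.2f}, f2 = {f2:.2f}")
```

    A small f1 and a large f2 both indicate that the predicted profile tracks the measured one closely.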

  16. Imitation learning of car driving skills with decision trees and random forests

    Directory of Open Access Journals (Sweden)

    Cichosz Paweł

    2014-09-01

    Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.

  17. Practical secure decision tree learning in a teletreatment application

    NARCIS (Netherlands)

    Hoogh, de Sebastiaan; Schoenmakers, Berry; Chen, Ping; Akker, op den Harm

    2014-01-01

    In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our approa

  18. Relationships between depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm the paper also presents the results of experiments with three datasets from UCI Machine Learning Repository [3]. © 2011 Springer-Verlag.

  19. Construction of a decision tree in linear programming problems

    International Nuclear Information System (INIS)

    The dependence of the solution of a linear programming problem on its parameter has been analyzed. An algorithm for the construction of a decision tree has been proposed with the use of the simplex method together with the validity support system

  20. An overview of decision tree applied to power systems

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, Decision Tree (DT), also known as Classification And Regression T...

  1. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

    The decision tree is a widely used technique for discovering patterns in consistent data sets. If the data set is inconsistent, however, meaning that it contains groups of examples (objects) with equal values of the conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge in the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we then compared them based on the optimization and classification results. It was found that some greedy algorithms, Mult_ws_entSort and Mult_ws_entML, are good for both optimization and classification.
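
    To make the notion of an inconsistent decision table concrete, the sketch below groups rows that share the same conditional attribute values and then resolves each group in two ways: by keeping the full set of decisions (read here as the many-valued decision) and by keeping the most frequent one (the most common decision). This reading of the two approaches is an assumption based on the abstract alone. The table and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_decisions(rows):
    """Map each tuple of conditional attribute values to its list of decisions."""
    groups = defaultdict(list)
    for *attributes, decision in rows:
        groups[tuple(attributes)].append(decision)
    return groups


def many_valued_table(rows):
    """Attach to each attribute tuple the set of all decisions seen for it."""
    return {attrs: set(ds) for attrs, ds in group_decisions(rows).items()}


def most_common_table(rows):
    """Attach to each attribute tuple its most frequent decision."""
    return {attrs: Counter(ds).most_common(1)[0][0]
            for attrs, ds in group_decisions(rows).items()}


# Hypothetical table: two conditional attributes followed by the decision.
table = [
    (0, 1, "a"),
    (0, 1, "b"),  # same attributes as the first row, different decision
    (0, 1, "a"),
    (1, 0, "c"),
]

inconsistent = {attrs for attrs, ds in many_valued_table(table).items() if len(ds) > 1}
print(inconsistent)
print(many_valued_table(table))
print(most_common_table(table))
```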

  2. Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees

    Directory of Open Access Journals (Sweden)

    Chuan Ding

    2016-10-01

    Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort has been made to investigate short-term subway ridership prediction considering bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as cases, the short-term subway ridership and the alighting passengers from adjacent bus stops are obtained from transit smart card data. To optimize the model performance with different combinations of regularization parameters, a series of GBDT models are built with various learning rates and tree complexities by fitting a maximum of trees. The optimal model performance confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle the multicollinearity effect with high accuracy. In contrast to other machine learning methods, or “black-box” procedures, the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.
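
    The following is a minimal sketch of gradient boosting for squared error, with depth-1 trees (stumps) standing in for the study's tree learners. It is not the authors' implementation. The learning rate shrinks each tree's contribution, and the relative influence of a feature is approximated here by the squared-error reduction accumulated over the splits that use it. The feature meanings and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (gain, feature, threshold, left_mean, right_mean) of the best split."""
    mean = sum(residuals) / len(residuals)
    sse_before = sum((r - mean) ** 2 for r in residuals)
    best = None
    for feature in range(len(xs[0])):
        # Dropping the largest value keeps the right-hand side non-empty.
        for threshold in sorted({x[feature] for x in xs})[:-1]:
            left = [r for x, r in zip(xs, residuals) if x[feature] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            gain = sse_before - sse
            if best is None or gain > best[0]:
                best = (gain, feature, threshold, left_mean, right_mean)
    return best


def fit_gbdt(xs, ys, n_trees=100, learning_rate=0.1):
    """Fit stumps to successive residuals; return (base, rate, stumps, influence)."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, influence = [], [0.0] * len(xs[0])
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        gain, feature, threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((feature, threshold, left_mean, right_mean))
        influence[feature] += gain
        predictions = [
            p + learning_rate * (left_mean if x[feature] <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps, influence


def predict(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, learning_rate, stumps, _ = model
    return base + sum(learning_rate * (lm if x[f] <= t else rm)
                      for f, t, lm, rm in stumps)


# Hypothetical rows: (hour of day, passengers alighting at nearby bus stops).
xs = [(7, 10), (8, 40), (9, 35), (12, 12), (17, 38), (18, 45), (22, 8), (23, 5)]
ys = [120, 410, 380, 150, 400, 470, 90, 70]  # hypothetical subway ridership

model = fit_gbdt(xs, ys)
print("influence per feature:", model[3])
print("prediction for (8, 40):", round(predict(model, (8, 40)), 1))
```

    Lower learning rates generally need more trees to reach the same fit, which is the trade-off explored when a series of models is built with different learning rates and tree complexities.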

  3. Extraction of information on construction land based on multi-feature decision tree classification

    Institute of Scientific and Technical Information of China (English)

    饶萍; 王建力; 王勇

    2014-01-01

    The spatial distribution of construction land is closely related to regional economic and social development. Therefore, timely monitoring and delivery of data on the dynamics of construction land are important for policy and decision making processes. Classifying land use/land cover and analyzing changes are among the most common applications of remote sensing. One of the most basic and difficult classification tasks is to distinguish construction land from other land surfaces. Landsat imagery is one of the most widely used sources of data in remote sensing of construction land. Several techniques for construction land extraction using Landsat data are described in the literature, but their application is constrained by low accuracy in various situations, and they usually rely on a single index or multiple indexes. The purpose of this study was to devise a method to improve the accuracy of construction land extraction in the presence of various kinds of environmental noise. We therefore introduce a multi-feature decision tree (DT) classification model for improving classification accuracy in areas that include bare land, shadow and some streams, where other classification methods often fail to classify correctly. The model integrates four spectral indexes, a pattern recognition technique and a spatial algorithm. The four spectral indexes are the normalized difference three bands index (NDTBI), the normalized difference building index (NDBI), the modified normalized difference water index (MNDWI) and the normalized difference vegetation index (NDVI). The pattern recognition technique is the support vector machine (SVM), and the spatial algorithm is buffer zone creation. The test site was deliberately selected so that it consists of complex surface features, such as bare land, hill shade, and some small streams that are liable to be confused with construction land on Landsat imagery. For that reason, Landsat-8
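
    Three of the four spectral indexes named above are normalized band differences with widely used definitions, and they can be chained into a simple rule-based tree as sketched below. The band-to-index mapping follows those common definitions. The thresholds, the rule order and the reflectance values are hypothetical and are not the study's rules. NDTBI is left out because the abstract does not define it.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    return (a - b) / (a + b) if (a + b) != 0 else 0.0


def spectral_indexes(green, red, nir, swir1):
    """Compute NDVI, NDBI and MNDWI from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDBI": normalized_difference(swir1, nir),
        "MNDWI": normalized_difference(green, swir1),
    }


def classify_pixel(green, red, nir, swir1):
    """Toy decision tree; every threshold here is a hypothetical placeholder."""
    idx = spectral_indexes(green, red, nir, swir1)
    if idx["MNDWI"] > 0.0:
        return "water"
    if idx["NDVI"] > 0.3:
        return "vegetation"
    if idx["NDBI"] > 0.0:
        return "construction land (candidate)"
    return "other"


# Hypothetical reflectance values (green, red, NIR, SWIR1), illustration only.
print(classify_pixel(0.10, 0.08, 0.40, 0.20))
print(classify_pixel(0.12, 0.14, 0.20, 0.28))
print(classify_pixel(0.08, 0.05, 0.03, 0.01))
```

    Rules based on indexes alone tend to confuse bare land and shadow with built-up surfaces, which is why the model described above adds SVM and buffer-zone steps.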

  4. Decision tree classification based on fitted phenology parameters from remotely sensed vegetation data

    Institute of Scientific and Technical Information of China (English)

    康峻; 侯学会; 牛铮; 高帅; 贾坤

    2014-01-01

    Phenology refers to periodic plant life cycle events influenced by climate and other environmental factors, such as sprouting, flowering, fruiting and leaf fall. Different vegetation types have distinct growth characteristics, and phenology can be a good representative parameter for classifying vegetation types. Phenological parametric analysis is mainly used to find significant changes at specific time points and to extract the corresponding characteristic vegetation index (VI) values by analyzing a time-series vegetation index, e.g., start of season (SOS), end of season (EOS), length of season (LOS), max of EVI (MOE) and amplitude of EVI (AOE). These key phenology parameters can be used to classify vegetation types. Eerguna and Genhe in Hulunbeier city, Inner Mongolia Autonomous Region, were selected as the study area. A double logistic function fitting method was used to smooth the time series MODIS-EVI data. The time range was from the summer of 2011 (DOY=209) to the summer of 2013 (DOY=193), and the total number of images was 46. Then, 100 points of each land cover type (grass, forest, crops, other non-vegetation) were chosen as classification samples. The five key phenological parameters mentioned above were extracted and used to build the decision tree classifier. The overall classification accuracy reached 73.67%. The results show that vegetation in the northern Hulunbeier region had distinctive features. The forest season started earliest (days 145-160, DOY, hereinafter) and ended quite early (days 250-275); the grass season started slightly later than that of forest (days 160-170), but the length of season was similar to forest, both being from 90 to 120 days. The crop season started late and ended early, so it was short and concentrated, with a length in the samples of 60 to 90 days. The classification achieved better results than MODIS land cover products (66.08%). Except for grass’ user accuracy being a little lower, producer
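
    As an illustration of how the five phenological parameters could be read from a smoothed EVI curve, the sketch below uses a simple amplitude-threshold rule: the season starts and ends where EVI crosses the minimum plus a fixed fraction of the amplitude. The abstract does not state how SOS and EOS were defined, so this rule, the 20% fraction and the sample series are all assumptions.

```python
def phenology_parameters(doys, evi, fraction=0.2):
    """Extract SOS, EOS, LOS, MOE and AOE from one season of smoothed EVI.

    SOS and EOS are the first and last day of year at which EVI is at or above
    min(EVI) + fraction * amplitude (an assumed rule, not the study's).
    """
    moe = max(evi)                      # max of EVI
    aoe = moe - min(evi)                # amplitude of EVI
    level = min(evi) + fraction * aoe   # threshold that delimits the season
    in_season = [d for d, v in zip(doys, evi) if v >= level]
    sos, eos = in_season[0], in_season[-1]
    return {"SOS": sos, "EOS": eos, "LOS": eos - sos, "MOE": moe, "AOE": aoe}


# Hypothetical smoothed EVI values at 16-day steps (illustration only).
doys = [97, 113, 129, 145, 161, 177, 193, 209, 225, 241, 257, 273, 289]
evi = [0.10, 0.11, 0.13, 0.20, 0.35, 0.50, 0.55, 0.52, 0.40, 0.25, 0.15, 0.11, 0.10]

params = phenology_parameters(doys, evi)
print(params)
```

    The resulting five values per sample point are the kind of feature vector a decision tree classifier could then split on.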

  5. Flood-type classification in mountainous catchments using crisp and fuzzy decision trees

    Science.gov (United States)

    Sikorska, Anna E.; Viviroli, Daniel; Seibert, Jan

    2015-10-01

    Floods are governed by largely varying processes and thus exhibit various behaviors. Classification of flood events into flood types and the determination of their respective frequency is therefore important for a better understanding and prediction of floods. This study presents a flood classification for identifying flood patterns at a catchment scale by means of a fuzzy decision tree. Hence, events are represented as a spectrum of six main possible flood types that are attributed with their degree of acceptance. The types considered are flash, short rainfall, long rainfall, snow-melt, rainfall on snow and, in high alpine catchments, glacier-melt floods. The fuzzy decision tree also makes it possible to acknowledge the uncertainty present in the identification of flood processes and thus allows for more reliable flood class estimates than a crisp decision tree, which identifies one flood type per event. Based on a data set from nine Swiss mountainous catchments, it was demonstrated that this approach is less sensitive to uncertainties in the classification attributes than the classical crisp approach. These results show that the fuzzy approach bears additional potential for analyses of flood patterns at a catchment scale and thereby provides a more realistic representation of flood processes.

  6. Supervised hashing using graph cuts and boosted decision trees.

    Science.gov (United States)

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data.

  7. Analisa Performansi menggunakan Algoritma Decision Tree (Performance Analysis Using the Decision Tree Algorithm)

    OpenAIRE

    Swendy, Maries

    2016-01-01

    Data mining has been implemented to obtain more useful information than can be obtained from a conventional database combined with human analysis by users of the organization/company systems. This thesis proposes a tool for monitoring and tracking performance based on the connectedness rule model of survey results, audit data and revenue data in organization/company systems. The more dominant factors that influence the growth revenue as the variable of organization/ compa...

  8. Research on the accuracy of TM images land-use classification based on QUEST decision tree: A case study of Lijiang in Yunnan

    Institute of Scientific and Technical Information of China (English)

    吴健生; 潘况; 彭建; 黄秀兰

    2012-01-01

    The accuracy of research on land use/cover change (LUCC) is determined directly by the accuracy of land use classification derived from aerial and satellite images. In analyzing the factors that affect the accuracy of current remote sensing image classification, some methods were introduced to study new trends in classification modes. Some previous studies showed that the speed and accuracy of QUEST (Quick, Unbiased, and Efficient Statistical Tree) decision tree classification were superior to those of other decision tree classifications. On the basis of this approach, the research classified the Landsat TM-5 images of Lijiang, Yunnan province, and compared the result with that of maximum likelihood image classification. The overall accuracy was 90.086%, which was higher than the overall accuracy (85.965%) of CART (Classification And Regression Tree). Meanwhile, the Kappa coefficient was 0.849, which was higher than the Kappa coefficient (0.760) of CART. Therefore, it is concluded that in complex terrain such as mountainous regions, the choice of QUEST decision tree classification on TM images would improve the accuracy of land use classification. This type of classification decision tree can precisely obtain new classification rules from integrated satellite images, land use thematic maps, DEM maps and other field investigation materials. The method can also help users to find new classification rules in multidimensional information and to build decision tree classifier models. Furthermore, methods that draw on large volumes of high-resolution and hyperspectral image data, integrated multi-sensor platforms, multi-temporal remote sensing images, pattern recognition and data mining of spectral and texture features, and auxiliary geographic data will become a trend.

  9. Distributed Decision-Tree Induction in Peer-to-Peer Systems

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a scalable and robust distributed algorithm for decision-tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in...

  10. Emergent Linguistic Rules from Inducing Decision Trees Disambiguating Discourse Clue Words

    CERN Document Server

    Siegel, Eric V.; McKeown, Kathleen R.

    1994-01-01

    We apply decision tree induction to the problem of discourse clue word sense disambiguation with a genetic algorithm. The automatic partitioning of the training set which is intrinsic to decision tree induction gives rise to linguistically viable rules.

  11. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  12. Decision Trees and Transient Stability of Electric Power Systems

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1991-01-01

    An inductive inference method for the automatic building of decision trees is investigated. Among its various tasks, the splitting and the stop splitting criteria successively applied to the nodes of a grown tree are found to play a crucial role in its overall shape and performance. The application of this general method to transient stability is systematically explored. Parameters related to the stop splitting criterion, to the learning set and to the tree classes are thus considered, a...

  13. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    Science.gov (United States)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting mode choice, and these techniques can be applied with a knowledge process approach. In this study, a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.

  14. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    Directory of Open Access Journals (Sweden)

    L. O. Oral

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting mode choice, and these techniques can be applied with a knowledge process approach. In this study, a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.

  15. Extensions of dynamic programming as a new tool for decision tree optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of tree) and decision trees (which allow arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.

  16. Land use classification in arid region based on multi-seasonal linear spectral mixture analysis and decision tree method

    Institute of Scientific and Technical Information of China (English)

    姜宛贝; 孙强强; 曲葳; 刘晓娜; 于文婧; 孙丹峰

    2016-01-01

    endmembers within each pixel. At last, these endmember abundance estimates were used for land cover/use classification in Minqin study area by using the decision tree method. According to the natural environment and land-use characters of study area and given the resolution of remote sensing data and applicability for ecosystem service assessment, in this research, we developed the two-level classification system. Exposed surface, crop land, forest/shrub land, grassland, impervious surface and water area were defined as first-level classes. The exposed surface was subdivided into moving sand, Gobi/hill/bare-land, salinized moving sand, and saline-alkaline land. Crop land was subdivided into spring crop, summer crop, perennial crop based on seasonal growth characteristics. Similarly, forest/shrub land was subdivided into spring forest/shrub, summer forest/shrub, and evergreen forest/shrub. Decision tree was designed based on the seasonality pattern of feature endmember abundance of each target class. The first step in the classification procedure was to overlay the training data on the three-seasonal abundance composite images for identifying the seasonality pattern of each class. The second step was to measure the feature endmember abundance distribution of each class within training samples. Aided by the histogram distribution, the segmenting boundary of each node was established by an interactive process. The results showed that sand, salt, green vegetation, dark materials, and water were five endmember classes used for multi-seasonal linear spectral mixture analysis. But their representative seasons were different. So, we extracted endmember reflectance for sand, salt, and green vegetation from early winter, spring, and summer, respectively. The spectral reflectance of dark material and water endmembers were derived from spring as well as salt. 
The mean RMSE (root-mean-square error) values were all lower than 0.01, which meant good fitness of linear spectral mixture model

  17. An analysis and study of decision tree induction operating under adaptive mode to enhance accuracy and uptime in a dataset introduced to spontaneous variation in data attributes

    Directory of Open Access Journals (Sweden)

    Uttam Chauhan

    2011-01-01

    Many methods exist for classifying an unknown dataset, and decision tree induction is one of the best known. The decision tree method operates in two different modes: non-adaptive and adaptive. The non-adaptive mode is applied when the dataset is complete and available, or when the dataset is static and there will be no changes in its attributes. However, when the values and attributes of the dataset are likely to change, leading to fluctuation (e.g., monthly, quarterly or annually), the adaptive mode needs to be applied, because the conventional non-adaptive method fails: it has to be applied again from scratch on the augmented dataset, which is expensive in terms of time and space. Sometimes attributes are added to the dataset while the number of records also increases. This paper mainly studies the behavioral aspects of the classification model, particularly when the number of attributes in the dataset increases due to spontaneous changes in the value(s)/attribute(s). Our investigative studies have shown that the accuracy of the decision tree model can be maintained when the number of attributes, including the class, increases in the dataset, which increases the number of records as well. In addition, accuracy can also be maintained when the number of values of the class attribute increases. The adaptive mode decision tree method reads the data instance by instance, absorbs each instance into the model, and updates the model according to the attribute values specific to that instance. Because the time required to update a decision tree can be less than that required to induce it from scratch, this eliminates the problem of repeatedly inducing the decision tree from scratch while saving memory and time.

  18. Totally Optimal Decision Trees for Monotone Boolean Functions with at Most Five Variables

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.

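  19. Analisis Dan Perancangan Sistem Pendukung Keputusan Untuk Menghindari Kredit Macet (Non Performing Loan) Perbankan Menggunakan Algoritma Decision Tree (Analysis and Design of a Decision Support System for Avoiding Non-Performing Loans in Banking Using the Decision Tree Algorithm)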

    OpenAIRE

    Sinuhaji, Andika Rafon

    2010-01-01

    A decision-making model is needed to help people make decisions accurately, efficiently, and effectively; such a model is called a decision support system. The aim of a decision support system is to utilize the advantages of humans and electronic instruments for solving various unstructured problems. The objective of this study is to avoid non-performing loans in the process of granting credit facilities. Decisions in the study are made using the decision tree method. The solution method consists of...

  20. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    As fraudulent financial statements of enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance (85.71%).

  1. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements of enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance (85.71%). PMID:25302338

  2. Nonparametric decision tree: The impact of ISO 9000 on certified and non certified companies

    Directory of Open Access Journals (Sweden)

    José António Figueiredo Almaça

    2013-09-01

    Full Text Available Purpose: This empirical study analyzes a questionnaire answered by a sample of ISO 9000 certified companies and a control sample of companies which have not been certified, using a multivariate predictive model. With this approach, we assess which quality practices are associated with the likelihood of the firm being certified. Design/methodology/approach: We implemented nonparametric decision trees in order to see which variables most influence whether a company is certified or not, i.e., the motivations that lead companies to seek certification. Findings: The results show that only four questionnaire items are sufficient to predict if a firm is certified or not. It is shown that companies in which the respondent manifests greater concern with respect to customer relations, employee motivation, and strategic planning have a higher likelihood of being certified. Research implications: The reader should note that this study is based on data from a single country and, of course, these results capture many idiosyncrasies of its economic and corporate environment. It would be of interest to understand if this type of analysis reveals some regularities across different countries. Practical implications: Companies should look for a set of practices congruent with total quality management and ISO 9000 certification. Originality/value: This study contributes to the literature on the internal motivation of companies to achieve certification under the ISO 9000 standard, by performing a comparative analysis of questionnaires answered by a sample of certified companies and a control sample of companies which have not been certified. In particular, we assess how the manager’s perception of the intensity with which quality practices are deployed in their firms is associated with the likelihood of the firm being certified.

  3. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraud or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraud or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraud or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  4. Diagnostic Features of Common Oral Ulcerative Lesions: An Updated Decision Tree

    Science.gov (United States)

    Safi, Yaser

    2016-01-01

    Diagnosis of oral ulcerative lesions might be quite challenging. This narrative review article aims to introduce an updated decision tree for diagnosing oral ulcerative lesions on the basis of their diagnostic features. Various general search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of MeSH keywords such as “oral ulcer,” “stomatitis,” and “mouth diseases.” Thereafter, English-language articles published from 1983 to 2015 in both medical and dental journals, including reviews, meta-analyses, original papers, and case reports, were appraised. Upon compilation of the relevant data, oral ulcerative lesions were categorized into three major groups (acute, chronic, and recurrent ulcers) and into five subgroups (solitary acute, multiple acute, solitary chronic, multiple chronic, and solitary/multiple recurrent), based on the number and duration of lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by stepwise progression. PMID:27781066

  5. Electronic Nose Odor Classification with Advanced Decision Tree Structures

    Directory of Open Access Journals (Sweden)

    S. Guney

    2013-09-01

    Full Text Available An electronic nose (e-nose) is an electronic device which can measure chemical compounds in air and consequently classify different odors. In this paper, an e-nose device consisting of 8 different gas sensors was designed and constructed. Using this device, 104 different experiments involving 11 different odor classes (moth, angelica root, rose, mint, polis, lemon, rotten egg, egg, garlic, grass, and acetone) were performed. The main contribution of this paper is the finding that, using chemical domain knowledge, it is possible to train an accurate odor classification system. The domain knowledge about chemical compounds is represented by a decision tree whose nodes are composed of classifiers such as Support Vector Machines and k-Nearest Neighbor. The overall accuracy achieved with the proposed algorithm and the constructed e-nose device was 97.18%. Training and testing data sets used in this paper are published online.

  6. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  7. A greedy algorithm for construction of decision trees for tables with many-valued decisions - A comparative study

    KAUST Repository

    Azad, Mohammad

    2013-11-25

    In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from the UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the decision trees constructed by the proposed approach and by an approach based on the generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.

  8. Decision tree analysis of factors influencing rainfall-related building damage

    Directory of Open Access Journals (Sweden)

    M. H. Spekkers

    2014-04-01

    Full Text Available Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998–2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22–26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11–18% of

  9. Decision tree analysis of factors influencing rainfall-related building damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a

  10. An Analysis on Performance of Decision Tree Algorithms using Student’s Qualitative Data

    Directory of Open Access Journals (Sweden)

    T.Miranda Lakshmi

    2013-06-01

    Full Text Available The decision tree is the most widely applied supervised classification technique. The learning and classification steps of decision tree induction are simple and fast, and it can be applied to any domain. In this research, student qualitative data were taken from educational data mining, and the performance of the decision tree algorithms ID3, C4.5, and CART is compared. The comparison shows that the Gini Index of CART influences the information Gain Ratio of ID3 and C4.5. The classification accuracy of CART is higher than that of ID3 and C4.5; however, the difference in classification accuracy between the decision tree algorithms is not considerable. The experimental results indicate that students’ performance is also influenced by qualitative factors.
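
    The split criteria named in this abstract can be made concrete with a small sketch. It is not taken from the paper: the toy `attendance` and `result` lists are hypothetical, and the Gini criterion is shown on a multiway split for simplicity, whereas CART conventionally uses binary splits. Conventionally, ID3 ranks attributes by information gain and C4.5 by gain ratio.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of one attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on the attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_gain(values, labels):
    """Gini impurity reduction obtained by splitting on the attribute."""
    n = len(labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in partition(values, labels))


# Hypothetical toy data: one qualitative attribute and a pass/fail outcome.
attendance = ["high", "high", "low", "low"]
result = ["pass", "pass", "fail", "fail"]
print(information_gain(attendance, result), gain_ratio(attendance, result))
print(gini_gain(attendance, result))
```

    On this toy data the attribute separates the classes perfectly, so every criterion reaches its maximum for a balanced two-class problem.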

  11. Expert System for Diagnosing Diseases of Pregnancy Using the Dempster-Shafer Method and Decision Tree

    Directory of Open Access Journals (Sweden)

    joko popo minardi

    2016-01-01

    Full Text Available Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probabilistic theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete. With the Dempster-Shafer method and expert system rules, an incomplete combination of symptoms can still lead to an appropriate diagnosis, while the decision tree is used as a decision support tool for tracing disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method and produce a trust value for each disease diagnosis. Based on diagnostic testing of the Dempster-Shafer method and the expert system, the resulting accuracy is 76%. Keywords: Expert system; Diseases of pregnancy; Dempster Shafer
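
    As an illustration of how separate pieces of evidence are combined, the following is a minimal sketch of Dempster's rule of combination. It is not the expert system described in the abstract: the disease labels `D1` and `D2` and the mass values are hypothetical placeholders.

```python
def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            common = b & c
            if common:
                combined[common] = combined.get(common, 0.0) + mass_b * mass_c
            else:
                # Mass assigned to contradictory evidence.
                conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalise so that the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical evidence from two symptoms over the frame {D1, D2}.
D1, D2 = frozenset({"D1"}), frozenset({"D2"})
EITHER = D1 | D2
symptom_a = {D1: 0.6, EITHER: 0.4}
symptom_b = {D2: 0.5, EITHER: 0.5}
belief = combine(symptom_a, symptom_b)
print(belief)
```

    Mass left on the whole frame (`EITHER`) represents ignorance, which is how incomplete symptom information can be carried through the combination.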

  12. Independent component analysis and decision trees for ECG holter recording de-noising.

    Directory of Open Access Journals (Sweden)

    Jakub Kuzilek

    Full Text Available We have developed a method for ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. To test the efficiency of this method in comparison to standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and the filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, baseline wander, EMG), but significantly better results were achieved when uncommon noise (electrode cable movement artefact) was compared.
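
    The evaluation criterion mentioned above is the standard root mean square error. A minimal sketch, assuming the two signals are equal-length sequences of samples (the function name and the sample values are illustrative only):

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equal-length signals."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared / len(reference))


# Illustrative samples: a clean reference and a filtered estimate.
clean = [0.0, 0.0, 0.0, 0.0]
filtered = [1.0, -1.0, 1.0, -1.0]
print(rmse(clean, filtered))
```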

  13. Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems

    CERN Document Server

    Gupta, Anupam; Nagarajan, Viswanath; Ravi, R

    2010-01-01

    We consider the problem of constructing optimal decision trees: given a collection of tests which can disambiguate between a set of $m$ possible diseases, each test having a cost, and the a-priori likelihood of the patient having any particular disease, what is a good adaptive strategy to perform these tests to minimize the expected cost to identify the disease? We settle the approximability of this problem by giving a tight $O(\log m)$-approximation algorithm. We also consider a more substantial generalization, the Adaptive TSP problem. Given an underlying metric space, a random subset $S$ of cities is drawn from a known distribution, but $S$ is initially unknown to us--we get information about whether any city is in $S$ only when we visit the city in question. What is a good adaptive way of visiting all the cities in the random subset $S$ while minimizing the expected distance traveled? For this problem, we give the first poly-logarithmic approximation, and show that this algorithm is best possible unless w...
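
    The objective in the first problem can be illustrated by evaluating the expected cost of a fixed adaptive strategy. This sketch only scores a given tree; it is not the approximation algorithm from the paper, and the diseases, tests, costs, and priors below are hypothetical.

```python
def path_cost(tree, disease, outcomes, cost):
    """Total cost of the tests performed until `disease` is identified.

    A tree is either a disease name (leaf) or a pair
    (test_name, {outcome: subtree}).
    """
    total = 0.0
    while not isinstance(tree, str):
        test, branches = tree
        total += cost[test]
        tree = branches[outcomes[test][disease]]
    if tree != disease:
        raise ValueError("the strategy misidentifies " + disease)
    return total


def expected_cost(tree, prior, outcomes, cost):
    """Prior-weighted average of the identification cost over all diseases."""
    return sum(p * path_cost(tree, d, outcomes, cost) for d, p in prior.items())


# Hypothetical instance: three diseases, two tests.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
cost = {"t1": 1.0, "t2": 2.0}
outcomes = {
    "t1": {"A": "pos", "B": "neg", "C": "neg"},
    "t2": {"A": "neg", "B": "pos", "C": "neg"},
}
cheap_first = ("t1", {"pos": "A", "neg": ("t2", {"pos": "B", "neg": "C"})})
costly_first = ("t2", {"pos": "B", "neg": ("t1", {"pos": "A", "neg": "C"})})
print(expected_cost(cheap_first, prior, outcomes, cost))
print(expected_cost(costly_first, prior, outcomes, cost))
```

    In this instance, running the cheap test that isolates the most likely disease first gives the lower expected cost; the optimization problem is to find such an ordering in general.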

  14. Discovering Patterns in Brain Signals Using Decision Trees

    Directory of Open Access Journals (Sweden)

    Narusci S. Bastos

    2016-01-01

    Full Text Available Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. So we propose to use a data mining technique to help us in this task. As a case study, we analyzed the brain’s behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If an object is given to sighted people and we ask them to identify this object, the sense of vision will probably be the most determinant one. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate the difference in how the brains of blind people and people with vision react to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain’s behaviour.

  15. Decision Tree Classifiers for Star/Galaxy Separation

    CERN Document Server

    Vasconcellos, E C; Gal, R R; LaBarbera, F L; Capelato, H V; Velho, H F Campos; Trevisan, M; Ruiz, R S R

    2010-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of $884,126$ SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: $14\le r\le 21$ ($85.2\%$) and $r\ge 19$ ($82.1\%$). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier is comparable or better in completeness over the full magnitude range $15\le r\le 21$, with m...

  16. Efficient OCR using simple features and decision trees with backtracking

    International Nuclear Information System (INIS)

    In this paper, it is shown that it is adequate to use simple and easy-to-compute features, such as those we call sliced horizontal and vertical projections, to solve the OCR problem for machine-printed documents. Recognition is achieved using a decision tree supported with backtracking, smoothing, row and column cropping, and other additions to increase the success rate. Symbols from the Times New Roman typeface are used to train our system. Activating backtracking, smoothing and cropping achieved a success rate of more than 98% for a recognition time below 30 ms per character. The recognition algorithm was exposed to a hard test by polluting the original dataset with additional artificial noise and could maintain a high success rate and low error rate for highly polluted images, which is a result of backtracking, smoothing, and row and column cropping. Results indicate that we can depend on simple features and hints to reliably recognize characters. The error rate can be decreased by increasing the size of the training dataset. The recognition time can be reduced by using some programming optimization techniques and more powerful computers. (author)

  17. Extensions of Dynamic Programming: Decision Trees, Combinatorial Optimization, and Data Mining

    KAUST Repository

    Hussain, Shahid

    2016-07-10

    This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.

  18. Construction of α-decision trees for tables with many-valued decisions

    KAUST Repository

    Moshkov, Mikhail

    2011-01-01

    The paper is devoted to the study of a greedy algorithm for construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider a bound on the number of algorithm steps and a bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.

  19. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    Science.gov (United States)

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. PMID:27026589

  20. Minimization of decision tree depth for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-10-01

    In this paper, we consider multi-label decision tables that have a set of decisions attached to each row. Our goal is to find one decision from the set of decisions for each row by using a decision tree as our tool. With the target of minimizing the depth of the decision tree, we devised various kinds of greedy algorithms as well as a dynamic programming algorithm. When we compared them with the optimal result obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of depth of decision trees.
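
    The dynamic programming idea can be sketched on a tiny multi-label table. This is a textbook-style recursion over subtables under stated assumptions, not the authors' implementation: a subtable needs no further tests when its rows share at least one decision, and the two example tables are hypothetical.

```python
from functools import lru_cache


def min_depth(rows):
    """Minimum depth of a decision tree for a multi-label decision table.

    `rows` is a list of (attribute_values, decisions) pairs, where
    `decisions` is the set of acceptable decisions for that row.
    """
    n_attrs = len(rows[0][0])

    @lru_cache(maxsize=None)
    def solve(subset):
        # `subset` is a frozenset of row indices describing a subtable.
        if set.intersection(*(set(rows[i][1]) for i in subset)):
            return 0  # one decision is valid for every row: a leaf suffices
        best = None
        for a in range(n_attrs):
            groups = {}
            for i in subset:
                groups.setdefault(rows[i][0][a], set()).add(i)
            if len(groups) < 2:
                continue  # attribute is constant here and cannot split
            depth = 1 + max(solve(frozenset(g)) for g in groups.values())
            best = depth if best is None else min(best, depth)
        if best is None:
            raise ValueError("rows with equal attributes share no decision")
        return best

    return solve(frozenset(range(len(rows))))


# Hypothetical tables with two binary attributes.
shared = [((0, 0), {1}), ((0, 1), {1, 2}), ((1, 0), {2}), ((1, 1), {2, 3})]
xor_like = [((0, 0), {1}), ((0, 1), {2}), ((1, 0), {2}), ((1, 1), {1})]
print(min_depth(shared), min_depth(xor_like))
```

    In the first table a single test on the first attribute suffices because each branch has a common decision, while the second table needs both attributes.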

  1. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  2. Decision Tree Method for Burned Area Identification Based on the Spectral Index of GF-1 WFV Image

    Institute of Scientific and Technical Information of China (English)

    祖笑锋; 覃先林; 尹凌宇; 陈小中; 钟祥清

    2015-01-01

    After a forest fire, forest damage needs to be assessed quickly and accurately. Using the band reflectance of 16 m wide-field-of-view imagery from the GF-1 satellite (GF-1 WFV), combined with five computed spectral indices, namely the normalized difference vegetation index (NDVI), burned area index (BAI), shaded vegetation index (SVI), normalized difference water index (NDWI), and global environment monitoring index (GEMI), a CART decision tree model for identifying forest burned areas was constructed, with the aim of reducing commission and omission of burned area caused by shadows and bare land. The model was validated in a selected study area, and the accuracy of its results was compared with that of maximum likelihood supervised classification and unsupervised classification (ISODATA). The results showed that, compared with the maximum likelihood method, the CART-based decision tree method improved the overall accuracy of burned area identification by 4.38%, the Kappa coefficient by 0.1024, the mapping accuracy by 14.96%, and the user accuracy by 8.50%. Unsupervised classification (ISODATA) of GF-1 imagery identified the burned area poorly: the overall accuracy and Kappa coefficient were low, and the mapping accuracy and user accuracy did not reach 1%.
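
    Some of the spectral indices used as decision tree inputs can be computed per pixel from band reflectances. The sketch below uses the commonly cited definitions of NDVI, NDWI (green/NIR form) and BAI; whether the paper uses exactly these variants is an assumption, and the sample reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both reflectances are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized difference water index (green/NIR form)."""
    return normalized_difference(green, nir)


def bai(red, nir):
    """Burned area index: inverse squared distance to a reference
    burned-surface point (red = 0.1, NIR = 0.06) in reflectance space.
    Undefined exactly at the reference point."""
    return 1.0 / ((0.1 - red) ** 2 + (0.06 - nir) ** 2)


# Hypothetical surface reflectances (0..1) for one pixel.
pixel = {"green": 0.10, "red": 0.10, "nir": 0.30}
features = {
    "ndvi": ndvi(pixel["nir"], pixel["red"]),
    "ndwi": ndwi(pixel["green"], pixel["nir"]),
    "bai": bai(pixel["red"], pixel["nir"]),
}
print(features)
```

    A feature vector like `features`, together with the raw band reflectances, is the kind of input a CART classifier would split on.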

  3. Internet Traffic Classification Using C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    徐鹏; 林森

    2009-01-01

    In recent years, Internet traffic classification using machine learning has become a new research direction in network measurement. Being simple to implement and efficient, Naive Bayes and its improved variants have been widely used in this area. However, these methods depend too heavily on the distribution of samples in the sample space, so they are potentially unstable. To handle this problem, a method based on the C4.5 decision tree is introduced in this paper. The method builds a classification model using the information entropy of the training data set and classifies unknown network flows by a simple lookup in the classification model. Both the theoretical analysis and the experimental results show that the C4.5 decision tree has obvious advantages in classification stability when used to classify Internet traffic.

  4. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...

  5. Greedy heuristics for minimization of number of terminal nodes in decision trees

    KAUST Repository

    Hussain, Shahid

    2014-10-01

    This paper describes, in detail, several greedy heuristics for construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related with the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from UCI ML Repository and randomly generated data) as well as compare with the optimal results obtained using dynamic programming method.

  6. One-year renal graft survival prediction using a weighted decision tree classifier

    OpenAIRE

    Dalia Atallah; Ali Eldesoky; Amira H.; Mohamed Ghoneim

    2014-01-01

    This study introduces a weighted decision tree algorithm for prediction of graft survival in renal transplantation using preoperative patient's data. The objective was to identify the preoperative attributes that affect the graft survival. Between the years 2000-2009, renal allotransplantation was carried out for 889 patients at Urology and Nephrology Center which is the subject matter of this study. The ID3 algorithm was chosen to build up the decision tree using the weka machine learning so...

  7. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions

    KAUST Repository

    Azad, Mohammad

    2014-09-13

    The paper is devoted to the analysis of greedy algorithms for the minimization of average depth of decision trees for decision tables such that each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. When we compared them with the optimal result obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of average depth of decision trees.

  8. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Full Text Available Entrepreneurial intentions of students are important to recognize during the study in order to provide those students with an educational background that will support such intentions and lead them to successful entrepreneurship after the study. The paper aims to develop a model that will classify students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university including a sample of students in the first year of study. Input variables described students’ demographics, importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Due to the large dimension of the input space, a feature selection method was used in the pre-processing stage. For comparison reasons, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure for testing the generalization ability of the models was conducted. The models were compared according to their classification accuracy, as well as according to input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which universities can take into consideration while designing their academic programs.

  9. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    Directory of Open Access Journals (Sweden)

    Trefz Florian M

    2012-12-01

    Full Text Available Abstract Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed

  10. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

    Directory of Open Access Journals (Sweden)

    Somaya Hashem

    2016-01-01

    Full Text Available Background/Aim. With the worldwide prevalence of chronic hepatitis C, the use of noninvasive methods as an alternative for staging chronic liver diseases, avoiding the drawbacks of biopsy, is increasing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0–F2) or advanced (F3-F4) fibrosis stages. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to FIB-4 features except alpha-fetoprotein instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C could be predicted with high accuracy using the decision tree learning algorithm, which could be used to reduce the need for liver biopsy.

  11. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers

    Directory of Open Access Journals (Sweden)

    Malueka Rusdy

    2012-03-01

    Full Text Available Abstract Background Duchenne muscular dystrophy, a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.

  12. K-D Decision Tree: An Accelerated and Memory Efficient Nearest Neighbor Classifier

    Science.gov (United States)

    Shibata, Tomoyuki; Wada, Toshikazu

    This paper presents a novel algorithm for the Nearest Neighbor (NN) classifier. NN classification is a well-known method of pattern classification having the following properties: (1) it performs maximum-margin classification and achieves less than twice the ideal Bayesian error; (2) it does not require knowledge of pattern distributions, kernel functions or base classifiers; and (3) it can naturally be applied to multiclass classification problems. Among the drawbacks are A) inefficient memory use and B) ineffective pattern classification speed. This paper deals with problems A and B. In most cases, NN search algorithms, such as the k-d tree, are employed as the pattern search engine of the NN classifier. However, NN classification does not always require the NN search. Based on this idea, we propose a novel algorithm named k-d decision tree (KDDT). Since KDDT uses Voronoi-condensed prototypes, it consumes less memory than naive NN classifiers. We have confirmed through a comparative experiment that KDDT is much faster than an NN search-based classifier (from 9 to 369 times faster). Furthermore, in order to extend the applicability of the KDDT algorithm to high-dimensional NN classification, we modified it by incorporating Gabriel editing or RNG editing instead of Voronoi condensing. Through experiments using simulated and real data, we have confirmed that the modified KDDT algorithms are superior to the original one.

  13. A DATA MINING APPROACH TO PREDICT PROSPECTIVE BUSINESS SECTORS FOR LENDING IN RETAIL BANKING USING DECISION TREE

    Directory of Open Access Journals (Sweden)

    Md. Rafiqul Islam

    2015-03-01

    Full Text Available A key objective of every financial organization is to retain existing customers and attract new prospective customers for the long term. The economic behaviour of the customer and the nature of the organization are controlled by a prescribed form called Know Your Customer (KYC) in manual banking. Depositor customers in some sectors (Jewellery/Gold, Arms, Money exchanger, etc.) are high risk; those in other sectors (Transport Operators, Auto-dealer, religious) are medium risk; and those in the remaining sectors (Retail, Corporate, Service, Farmer, etc.) are low risk. Presently, credit risk for a counterparty can be broadly categorized under quantitative and qualitative factors. Although many systems for customer retention and customer attrition exist in banking, these rigorous methods lack a clear and defined approach to disbursing loans in the business sector. In this paper, we use records of business customers of a retail commercial bank in Tangail city, Bangladesh, including rural and urban areas, to analyse the major transactional determinants of customers and to build a predictive model for prospective sectors in retail banking. To achieve this, a data mining approach is adopted for analysing the challenging issues: a pruned decision tree classification technique is used to develop the model, and its performance is finally tested with Weka. Moreover, this paper attempts to build up a model to predict prospective business sectors in retail banking. KEYWORDS Data Mining, Decision Tree, Tree Pruning, Prospective Business Sector, Customer,

  14. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection

    Science.gov (United States)

    Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    Background WHO’s new classification in 2009 (dengue with or without warning signs, and severe dengue) has necessitated large numbers of hospital admissions of dengue patients, which in turn has imposed a huge economic and physical burden on many hospitals around the globe, particularly in South East Asia and Malaysia, where the disease has seen a rapid surge in numbers in recent years. Lack of a simple tool to differentiate mild from life-threatening infection has led to unnecessary hospitalization of dengue patients. Methods We conducted a single-centre, retrospective study involving serologically confirmed dengue fever patients admitted to a single ward in Hospital Kuala Lumpur, Malaysia. Data were collected for 4 months from February to May 2014. Socio-demography, co-morbidity, days of illness before admission, symptoms, warning signs, vital signs and laboratory results were all recorded. Descriptive statistics were tabulated, and simple and multiple logistic regression analyses were done to determine significant risk factors associated with severe dengue. Results 657 patients with confirmed dengue were analysed, of whom 59 (9.0%) had severe dengue. Overall, the commonest warning signs were vomiting (36.1%) and abdominal pain (32.1%). Previous co-morbidity, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were found to be significant risk factors for severe dengue using simple logistic regression. However, the only significant risk factors for severe dengue with multiple logistic regression were vomiting, pleural effusion, and low systolic blood pressure. Using those 3 risk factors, we plotted an algorithm for predicting severe dengue. When compared to the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, specificity of 0.54, positive predictive value of 0.16 and negative predictive value of 0.96. Conclusion The decision tree algorithm proposed
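
    The predictive values quoted above depend on how common severe dengue is in the sample, not only on sensitivity and specificity. The following is a minimal sketch, assuming Bayes' rule is applied to the rounded figures from the abstract (sensitivity 0.81, specificity 0.54, 59 severe cases among 657 patients); the function name `predictive_values` is illustrative and not from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures quoted in the abstract: 59 severe cases among 657 patients.
prevalence = 59 / 657
ppv, npv = predictive_values(0.81, 0.54, prevalence)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

    The result lands within a couple of hundredths of the reported 0.16 and 0.96; small differences are expected because the inputs are already rounded. The low positive predictive value follows mainly from the low prevalence (about 9%).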

  15. Decision-tree analysis of clinical data to aid diagnostic reasoning for equine laminitis: a cross-sectional study.

    Science.gov (United States)

    Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R

    2016-04-23

    The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all P<[...] discriminator; 92 per cent of animals with this clinical sign had laminitis (OR 40.5, P<[...] discrimination (OR 15.5, P<0.001). This is the first epidemiological laminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements. PMID:26969668

  16. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  17. One-year renal graft survival prediction using a weighted decision tree classifier

    Directory of Open Access Journals (Sweden)

    Dalia Atallah

    2014-06-01

    Full Text Available This study introduces a weighted decision tree algorithm for prediction of graft survival in renal transplantation using preoperative patient data. The objective was to identify the preoperative attributes that affect graft survival. Between 2000 and 2009, renal allotransplantation was carried out for 889 patients at the Urology and Nephrology Center; these patients are the subject of this study. The ID3 algorithm was chosen to build the decision tree using the Weka machine learning software. ID3 was modified to refine the results by introducing a weight vector, each element of which is the weight of one attribute, obtained by trial and error. The results indicated that the weighted algorithm was successful in predicting graft survival after one year and in identifying the attributes affecting graft survival. Keywords: Decision Tree, Data Mining, ID3 Algorithm, Graft Survival, Kidney Transplantation.
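
    The abstract does not say how the attribute weights enter ID3. One plausible reading, sketched below purely as an assumption, is to scale each attribute's information gain by its weight before choosing the split. The attribute names, labels and toy records are hypothetical and are not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Standard ID3 gain: entropy before the split minus weighted entropy after."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, weights):
    """ASSUMPTION: each gain is multiplied by a per-attribute weight before comparison."""
    return max(weights, key=lambda a: weights[a] * information_gain(rows, labels, a))


# Hypothetical toy records; both attributes have the same unweighted gain.
rows = [
    {"attr_a": 0, "attr_b": 0},
    {"attr_a": 0, "attr_b": 1},
    {"attr_a": 1, "attr_b": 0},
    {"attr_a": 1, "attr_b": 1},
]
labels = ["yes", "yes", "yes", "no"]

print(best_attribute(rows, labels, {"attr_a": 1.0, "attr_b": 2.0}))
print(best_attribute(rows, labels, {"attr_a": 2.0, "attr_b": 1.0}))
```

    Because the two attributes tie on plain information gain here, the weights alone decide which one is chosen, which is the kind of effect a trial-and-error weight vector would be tuned for.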

  18. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees. In this particular case of total path length and number of terminal nodes, the relationships between these two cost functions are closely related to the space-time trade-off. In addition to an algorithm to compute the relationships, the paper also presents results of experiments with datasets from UCI ML Repository [1]. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier or Pareto set of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.
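
    As a small illustration of the two cost functions discussed above, the sketch below computes total path length, number of terminal nodes and average depth for a hand-built tree. The nested-dict representation and the convention that each leaf stores the number of decision-table rows reaching it are assumptions made for this example, not the paper's data structures.

```python
def tree_costs(node, depth=0):
    """Return (total_path_length, terminal_nodes, rows) for a nested-dict tree.

    Internal node: {"attr": name, "children": [subtree, ...]}
    Leaf:          {"rows": n}, where n is the number of table rows reaching the leaf.
    """
    if "children" not in node:
        return depth * node["rows"], 1, node["rows"]
    total = leaves = rows = 0
    for child in node["children"]:
        t, l, r = tree_costs(child, depth + 1)
        total, leaves, rows = total + t, leaves + l, rows + r
    return total, leaves, rows


# Hypothetical tree over a 4-row decision table.
tree = {"attr": "f1", "children": [
    {"rows": 2},
    {"attr": "f2", "children": [{"rows": 1}, {"rows": 1}]},
]}

total, leaves, rows = tree_costs(tree)
print(total, leaves, total / rows)  # total path length, terminal nodes, average depth
```

    For this tree the total path length is 6 over 4 rows (average depth 1.5) with 3 terminal nodes; a different tree for the same table could trade a lower average depth for more terminal nodes, which is the trade-off the Pareto frontier describes.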

  19. Establishing the diagnostic model of SCC in cervical cancer by using Logistic regression combined with CHAID analysis of decision tree

    Institute of Scientific and Technical Information of China (English)

    王静; 郑群; 余素飞; 冯贻君; 沈波

    2015-01-01

    Objective To screen serum tumor markers associated with cervical cancer using Logistic regression, and to further establish an auxiliary diagnostic model of squamous cell carcinoma antigen (SCC) in cervical cancer using the chi-squared automatic interaction detector (CHAID) classification tree method. Methods A total of 581 newly diagnosed cervical cancer patients, 342 patients with benign cervical disease and 341 healthy controls who had tumor markers tested at Taizhou Hospital of Zhejiang from 2010 to 2013 were retrospectively collected, and their levels of carbohydrate antigen 199 (CA199), carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA), SCC and alpha fetoprotein (AFP) were reviewed. Logistic regression was first used to select the statistically significant tumor markers, and the CHAID decision tree method was then used to determine the value of these markers in the auxiliary diagnosis of cervical cancer. Finally, 284 patients with uterus-related diseases whose SCC exceeded the diagnostic value derived in this study were collected from January to December 2014, and the proportion of cervical cancer patients among them was calculated to validate the CHAID result. Results Logistic regression showed that, of the five tumor markers potentially associated with cervical cancer, only SCC was statistically significant (Wald χ² = 22.120, P = 0.000), with an OR of 1.900 (95% CI 1.454-2.483). As the SCC value increased, the proportion of cervical cancer patients also gradually increased; when SCC > 2.20 μg/L, the positive predictive value reached 94.7%. Among the 284 people with suspected uterus-related diseases and SCC above 2.20 μg/L, 95.1% (270 cases) were finally confirmed to have cervical cancer. Conclusion SCC has good auxiliary diagnostic value for patients with cervical cancer.

  20. Decision tree for the binding of dipeptides to the thermally fluctuating surface of cathepsin K

    Science.gov (United States)

    Nishiyama, Katsuhiko

    2016-03-01

    The behavior of 15 dipeptides on thermally fluctuating cathepsin K was investigated by molecular dynamics and docking simulations. Four dipeptides were distributed on sites near the active center, and the variations were small. Eleven dipeptides were distributed on sites far from the active center, and the variations were large for nine dipeptides and very large for the other two. The decision tree was constructed using genetic programming, and it accurately classified the 15 dipeptides. The decision tree would accurately estimate the behavior of various peptides, and should significantly contribute to the design of useful peptides.

  1. P2P Domain Classification using Decision Tree

    CERN Document Server

    Ismail, Anis

    2011-01-01

    In the Peer-to-Peer context, a challenging problem is how to find the appropriate peer to handle a given query without consuming excessive bandwidth. Different methods have proposed query routing strategies that take the P2P network at hand into account. This paper considers an unstructured P2P system based on an organization of peers around Super-Peers that are connected to a Super-Super-Peer according to their semantic domains. By analyzing the query log file, a predictive model is constructed that avoids flooding queries in the P2P network by predicting the appropriate Super-Peer, and hence the peer, to answer the query. A challenging problem in a schema-based Peer-to-Peer (P2P) system is how to locate peers that are relevant to a given query. In this paper, an architecture based on (Super-)Peers is proposed, focusing on query routing. The approach to be implemented groups together (Super-)Peers that have similar interests for an efficient query routing method. In such groups, called Super-Super-Peers (SSP), Su...

  2. Relationships Between Average Depth and Number of Nodes for Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-07-24

    This paper presents a new tool for the study of relationships between total path length or average depth and number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from UCI ML Repository [1]. © Springer-Verlag Berlin Heidelberg 2014.

  3. Relationships between average depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2014-02-14

    This paper presents a new tool for the study of relationships between the total path length or the average depth and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from UCI ML Repository [9] and datasets representing Boolean functions with 10 variables.

  4. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    Science.gov (United States)

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  5. Decision trees are PAC-learnable from most product distributions: a smoothed analysis

    CERN Document Server

    Kalai, Adam Tauman

    2008-01-01

    We consider the problem of PAC-learning decision trees, i.e., learning a decision tree over the n-dimensional hypercube from independent random labeled examples. Despite significant effort, no polynomial-time algorithm is known for learning polynomial-sized decision trees (even trees of any super-constant size), even when examples are assumed to be drawn from the uniform distribution on {0,1}^n. We give an algorithm that learns arbitrary polynomial-sized decision trees for most product distributions. In particular, consider a random product distribution where the bias of each bit is chosen independently and uniformly from, say, [.49,.51]. Then with high probability over the parameters of the product distribution and the random examples drawn from it, the algorithm will learn any tree. More generally, in the spirit of smoothed analysis, we consider an arbitrary product distribution whose parameters are specified only up to a [-c,c] accuracy (perturbation), for an arbitrarily small positive constant c.

  6. Visualization of Decision Tree State for the Classification of Parkinson's Disease

    NARCIS (Netherlands)

    Valentijn, E

    2016-01-01

    Decision trees have been shown to be effective at classifying subjects with Parkinson’s disease when provided with features (subject scores) derived from FDG-PET data. Such subject scores have strong discriminative power but are not intuitive to understand. We therefore augment each decision node wi

  7. Binary Decision Tree Development for Probabilistic Safety Assessment Applications

    International Nuclear Information System (INIS)

    The aim of this article is to describe the state of development of a relatively new approach in probabilistic safety analysis (PSA). This approach is based on applying the binary decision diagram (BDD) representation of logical functions to the quantitative and qualitative analysis of complex systems represented by fault trees and event trees in the PSA used for nuclear power plant risk determination. Although the BDD approach offers a full solution, compared with the partial one given by the conventional quantification approach, there are still problems to be solved before the new approach can be fully implemented. The major problem with full application of BDD is the difficulty of obtaining any solution for PSA models of a certain complexity. This paper compares the two approaches to PSA quantification. Its major focus is the description of an in-house developed BDD application that implements original algorithms. The number of nodes required to represent the BDD is extremely sensitive to the chosen order of variables (i.e., basic events in PSA). The problem of finding an optimal order of variables for the BDD is NP-complete. This paper presents an original approach to finding the initial order of variables used for BDD construction by various dynamic reordering schemes. The main advantage of this approach over known methods of finding the initial order is better results with respect to the working memory and time needed to finish the BDD construction. The developed method is compared against results from well-known methods such as depth-first and breadth-first search procedures. The described method may be applied to find an initial order for fault trees/event trees created from basic events by means of logical operations (e.g. negation, and, or, exclusive or). With some test models a significant reduction in used memory has been achieved, sometimes
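
    The sensitivity of BDD size to variable order can be seen on a textbook function. The sketch below is a brute-force illustration under stated assumptions, not the in-house algorithm described above: it builds the reduced ordered BDD of (x1 and y1) or (x2 and y2) or (x3 and y3) by exhaustive Shannon expansion (exponential in the number of variables, so only suitable for tiny examples) and counts internal nodes under two orders. The function and variable names are illustrative.

```python
def bdd_size(func, order):
    """Count internal nodes of the reduced ordered BDD of func under a variable order."""
    unique = {}  # (var, low_id, high_id) -> node id; ids 0 and 1 are the terminals

    def build(i, assignment):
        if i == len(order):
            return int(bool(func(assignment)))
        var = order[i]
        low = build(i + 1, {**assignment, var: False})
        high = build(i + 1, {**assignment, var: True})
        if low == high:        # redundant test: no node is needed
            return low
        key = (var, low, high)
        if key not in unique:  # share identical subgraphs
            unique[key] = len(unique) + 2
        return unique[key]

    build(0, {})
    return len(unique)


def f(a):
    """Example function: three independent AND pairs joined by OR."""
    return (a["x1"] and a["y1"]) or (a["x2"] and a["y2"]) or (a["x3"] and a["y3"])


interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
grouped = ["x1", "x2", "x3", "y1", "y2", "y3"]
print(bdd_size(f, interleaved), bdd_size(f, grouped))  # prints "6 14"
```

    Keeping each pair adjacent needs one node per variable, while separating the pairs forces the diagram to remember which x variables were true, more than doubling the node count even for six variables. Practical BDD packages avoid the exhaustive expansion used here, but the effect of the order is the same.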

  8. Utilizing Home Healthcare Electronic Health Records for Telehomecare Patients With Heart Failure: A Decision Tree Approach to Detect Associations With Rehospitalizations.

    Science.gov (United States)

    Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H

    2016-04-01

    Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date, there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home healthcare electronic medical record called the Outcome and Assessment Information Set-C for 552 telemonitored heart failure patients. Bivariate analyses using SAS and a decision tree technique using Waikato Environment for Knowledge Analysis were used. From the decision tree technique, the presence of skin issues was identified as the top predictor of rehospitalization that could be identified during the start of care assessment, followed by patient's living situation, patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the Outcome and Assessment Information Set-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional supports. PMID:26848645

  9. Transient Stability Assessment using Decision Trees and Fuzzy Logic Techniques

    OpenAIRE

    A. Y. Abdelaziz; M. A. El-Dessouki

    2013-01-01

    Many techniques are used for transient stability assessment (TSA) of synchronous generators, including traditional time domain state numerical integration, Lyapunov based methods, probabilistic approaches and Artificial Intelligence (AI) techniques like pattern recognition and artificial neural networks. This paper examines two further proposed artificial intelligence techniques to tackle the transient stability problem. The first technique is based on the Inductive Inference Reasoning (IIR)...

  10. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    Science.gov (United States)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for a refined estimation of solar energy plant potential on roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information about buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid attention to the production of green electricity. Here the return on investment depends on the statutory price per Watt, the initial costs of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation received by the solar energy plant and on its size. In this context the exposition and slope of the roof area are as important as building parts such as chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support a beneficial deployment of a solar energy plant. Sufficient data must also be available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas. Because they carry no semantic information, even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities, e.g. Berlin, are on the way to providing CityGML datasets. Here we present a decision tree that incorporates geometric as well as semantic demands for a refined estimation of the solar energy potential on roof areas. Based on CityGML's attribute lists we consider geometries of roofs and roof installations as well as global radiation, which can be derived e.g. from the European Solar

  11. Soft context clustering for F0 modeling in HMM-based speech synthesis

    Science.gov (United States)

    Khorram, Soheil; Sameti, Hossein; King, Simon

    2015-12-01

    This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure
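
    The contrast between hard and soft decisions at internal nodes can be illustrated with a minimal sketch. It is not the authors' system: the sigmoid gate on a single numeric feature, the `sharpness` parameter, and the leaf values are all assumptions made for illustration, and the maximum entropy distributions of the abstract are not modelled.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # A leaf sets `value`; an internal node sets feature, threshold, left, right.
    value: Optional[float] = None
    feature: int = 0
    threshold: float = 0.0
    sharpness: float = 1.0  # hypothetical gate steepness, not from the abstract
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def hard_predict(node: Node, x: Sequence[float]) -> float:
    """Hard tree: each input reaches exactly one leaf."""
    if node.value is not None:
        return node.value
    child = node.right if x[node.feature] > node.threshold else node.left
    return hard_predict(child, x)


def soft_predict(node: Node, x: Sequence[float]) -> float:
    """Soft tree: both children contribute, weighted by membership degrees."""
    if node.value is not None:
        return node.value
    # Membership degree of the right child (assumed sigmoid gate).
    g = 1.0 / (1.0 + math.exp(-node.sharpness * (x[node.feature] - node.threshold)))
    return (1.0 - g) * soft_predict(node.left, x) + g * soft_predict(node.right, x)


# Toy tree with two leaves holding illustrative values.
tree = Node(feature=0, threshold=0.0, left=Node(value=100.0), right=Node(value=200.0))
```

    At the threshold itself the soft tree blends both leaves equally, whereas the hard tree commits to one leaf; far from the threshold the two predictions nearly coincide.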

  12. Nitrogen removal influence factors in A/O process and decision trees for nitrification/denitrification system

    Institute of Scientific and Technical Information of China (English)

    MA Yong; PENG Yong-zhen; WANG Shu-ying; WANG Xiao-lian

    2004-01-01

    To improve nitrogen removal in the anoxic/oxic (A/O) process for treating domestic wastewater, the following influence factors were studied: DO (dissolved oxygen), nitrate recirculation, sludge recycle, SRT (solids residence time), influent COD/TN, and HRT (hydraulic retention time). Results indicated that nitrogen removal could be increased by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervising nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.

  13. Condition monitoring on grinding wheel wear using wavelet analysis and decision tree C4.5 algorithm

    Directory of Open Access Journals (Sweden)

    S.Devendiran

    2013-10-01

    Full Text Available A new online grinding wheel wear monitoring approach for detecting a worn-out wheel is proposed. Acoustic emission (AE) signals are processed by discrete wavelet transform, statistical features such as root mean square and standard deviation are extracted for each wavelet decomposition level, and the features are classified using the C4.5 decision tree, a tree-based knowledge representation and data mining technique. The methodology was validated with AE signal data obtained under different grinding conditions from an Aluminium oxide 99 A(38A grinding wheel, which is used in three quarters of grinding operations. The results of this scheme with respect to classification accuracy were discussed.
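
    The feature extraction step described above can be sketched as follows. The abstract does not name the wavelet family, so the Haar wavelet is an assumption, as are the function names; the resulting per-level (RMS, standard deviation) pairs would then be passed to a decision tree classifier, which is not shown.

```python
import math
import statistics


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def rms(values):
    """Root mean square of a list of coefficients."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def wavelet_features(signal, levels):
    """(RMS, population standard deviation) of the detail coefficients per level."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append((rms(detail), statistics.pstdev(detail)))
    return features


# Toy alternating signal standing in for an AE recording (hypothetical data).
feats = wavelet_features([1.0, -1.0, 1.0, -1.0], levels=2)
```

    The signal length should be at least 2 to the power of `levels` so that every level has at least one detail coefficient.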

  14. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    Directory of Open Access Journals (Sweden)

    Hedayetul Islam Shovon

    2012-08-01

    Full Text Available Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) in a decisive manner. Student evaluation factors such as class quizzes, mid-term and final exams, assignments, and lab work are studied. It is recommended that all this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers reduce the dropout ratio to a significant level and improve the performance of students. In this paper, we present a hybrid procedure based on the decision tree data mining method and data clustering that enables academicians to predict students' GPA, so that instructors can take the necessary steps to improve student academic performance.

  15. Comparative Analysis of Serial Decision Tree Classification Algorithms

    OpenAIRE

    Matthew Nwokejizie Anyanwu; Sajjan Shiva

    2009-01-01

    Classification of data objects based on a predefined knowledge of the objects is a data mining and knowledge management technique used in grouping similar data objects together. It can be defined as supervised learning algorithms as it assigns class labels to data objects based on the relationship between the data items with a pre-defined class label. Classification algorithms have a wide range of applications like churn prediction, fraud detection, artificial intelligence, and credit card ra...

  16. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    Science.gov (United States)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although improving technology has made obtaining elemental data with EDS analyses fast and easy, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample, with dimensions ranging from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument, etc.) and the incident electron beam (accelerating voltage, beam current, spot size, etc.) control the produced characteristic X-rays, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions and thereby produce a classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated, and a preliminary set of 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, has been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various non-standard physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  17. EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE

    Directory of Open Access Journals (Sweden)

    S. Anupama Kumar

    2011-07-01

    Full Text Available Educational data mining is used to study the data available in the educational field and bring out the hidden knowledge from it. Classification methods like decision trees, rule mining, Bayesian networks, etc. can be applied to educational data for predicting students' behavior, performance in examinations, etc. This prediction will help tutors to identify weak students and help them to score better marks. The C4.5 decision tree algorithm is applied to students' internal assessment data to predict their performance in the final exam. The outcome of the decision tree predicted the number of students who are likely to fail or pass. The result is given to the tutor and steps were taken to improve the performance of the students who were predicted to fail. After the declaration of the results of the final examination, the marks obtained by the students are fed into the system and the results were analyzed. The comparative analysis of the results states that the prediction has helped the weaker students to improve and brought about better results. To analyse the accuracy of the algorithm, it is compared with the ID3 algorithm and found to be more efficient in terms of accurately predicting the outcome of the student and the time taken to derive the tree.

  18. Transient Stability Assessment using Decision Trees and Fuzzy Logic Techniques

    Directory of Open Access Journals (Sweden)

    A. Y. Abdelaziz

    2013-09-01

    Full Text Available Many techniques are used for Transient Stability Assessment (TSA) of synchronous generators, encompassing traditional time-domain numerical integration, Lyapunov-based methods, probabilistic approaches, and Artificial Intelligence (AI) techniques like pattern recognition and artificial neural networks. This paper examines two further artificial intelligence techniques proposed to tackle the transient stability problem. The first technique is based on the Inductive Inference Reasoning (IIR) approach, which belongs to a particular family of machine learning from examples. The second presents a simple fuzzy logic classifier system for TSA. Not only steady-state but also transient attributes are used for transient stability estimation so as to reflect machine dynamics and network changes due to faults. The two techniques are tested on a standard test power system. The performance evaluation demonstrated satisfactory results in early detection of machine instability. The advantage of the two techniques is that they are straightforward and simple for on-line implementation.

  19. Measurement of the t-channel single top-quark production using boosted decision trees in ATLAS experiment at √(s)=7 TeV

    International Nuclear Information System (INIS)

    This thesis presents a measurement of the cross section of t-channel single top-quark production using 1.04 fb$^{-1}$ of data collected by the ATLAS detector at the LHC in proton-proton collisions at a center-of-mass energy of $\sqrt{s} = 7$ TeV. Selected events contain one lepton, missing transverse energy, and two or three jets, one of them b-tagged. The background model consists of multi-jets, W+jets and top-quark pair events, with smaller contributions from Z+jets and di-boson events. By using a selection based on the distribution of a multivariate discriminant constructed with boosted decision trees, the cross section of t-channel single top-quark production is measured: $\sigma_t = 97.3^{+30.7}_{-30.2}$ pb, which is in good agreement with the prediction of the Standard Model. Assuming that the top-quark-related CKM matrix elements obey the relation $|V_{tb}| \gg |V_{ts}|, |V_{td}|$, the coupling strength at the Wtb vertex is extracted from the measured cross section, $|V_{tb}| = 1.23^{+0.20}_{-0.19}$. If it is assumed that $|V_{tb}| \le 1$, a lower limit of $|V_{tb}| > 0.61$ is obtained at the 95% confidence level. (author)

  20. A Noise Addition Scheme in Decision Tree for Privacy Preserving Data Mining

    CERN Document Server

    Kadampur, Mohammad Ali

    2010-01-01

    Data mining deals with automatic extraction of previously unknown patterns from large amounts of data. Organizations all over the world handle large amounts of data and are dependent on mining gigantic data sets for expansion of their enterprises. These data sets typically contain sensitive individual information, which consequently get exposed to the other parties. Though we cannot deny the benefits of knowledge discovery that comes through data mining, we should also ensure that data privacy is maintained in the event of data mining. Privacy preserving data mining is a specialized activity in which the data privacy is ensured during data mining. Data privacy is as important as the extracted knowledge and efforts that guarantee data privacy during data mining are encouraged. In this paper we propose a strategy that protects the data privacy during decision tree analysis of data mining process. We propose to add specific noise to the numeric attributes after exploring the decision tree of the original data. T...

  1. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  3. Intrusion Preventing System using Intrusion Detection System Decision Tree Data Mining

    Directory of Open Access Journals (Sweden)

    Syurahbil

    2009-01-01

    Full Text Available Problem statement: Distinguishing intrusive from normal network traffic is difficult and time-consuming. An analyst must review large volumes of data to find the sequence of an intrusion in the network connections. A method is therefore needed that detects network intrusion in a way that reflects current network traffic. Approach: In this study, a novel method for finding intrusion characteristics for an IDS using decision tree machine learning, a data mining technique, was proposed. Rules were generated by classification with the ID3 decision tree algorithm. Results: These rules determine intrusion characteristics, which are then implemented in the firewall policy rules as prevention. Conclusion: The combination of IDS and firewall is the so-called IPS, which besides detecting the existence of an intrusion can also deny the intrusion as prevention.
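
    ID3 chooses, at each node, the attribute with the highest information gain. A minimal sketch of that selection step follows; the connection records, attribute names, and labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 split rule: pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical connection records, for illustration only.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
]
labels = ["intrusion", "normal", "normal", "intrusion"]
chosen = best_attribute(rows, labels, ["protocol", "flag"])
```

    In this toy table the `flag` attribute separates the two classes perfectly, so it is selected for the root; each root-to-leaf path of the finished tree would correspond to one candidate firewall rule.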

  4. 'Misclassification error' greedy heuristic to construct decision trees for inconsistent decision tables

    KAUST Repository

    Azad, Mohammad

    2014-01-01

    This paper presents a greedy algorithm for constructing decision trees under three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. The algorithm uses the greedy heuristic ‘misclassification error’, which runs faster than the ‘number of boundary subtables’ heuristic from the literature and, for some cost functions, gives better results. It can therefore be used for larger data sets and does not require a huge amount of memory. Experimental results on the depth, average depth, and number of nodes of the decision trees constructed by this algorithm are compared within the framework of each of the three approaches.
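
    A minimal sketch of how such a heuristic could drive attribute selection is given below for the many-valued decision case. Two points are assumptions, since the abstract does not define them: the misclassification error of a subtable is taken as the number of rows minus the number of rows covered by its most frequent decision, and the attribute minimizing the summed error over the resulting subtables is chosen.

```python
from collections import Counter


def misclassification_error(decision_sets):
    """Rows not covered by the most frequent decision (assumed definition).

    Each row carries a non-empty set of admissible decisions.
    """
    if not decision_sets:
        return 0
    counts = Counter(d for decisions in decision_sets for d in set(decisions))
    return len(decision_sets) - max(counts.values())


def choose_attribute(rows, decision_sets, attributes):
    """Greedy step: minimize the summed error of the subtables (assumed rule)."""
    def total_error(attribute):
        groups = {}
        for row, decisions in zip(rows, decision_sets):
            groups.setdefault(row[attribute], []).append(decisions)
        return sum(misclassification_error(g) for g in groups.values())
    return min(attributes, key=total_error)


# Toy inconsistent decision table (hypothetical data).
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
decision_sets = [{1}, {1, 2}, {2}, {2, 3}]
chosen = choose_attribute(rows, decision_sets, ["a", "b"])
```

    Computing this error needs only one pass over the rows of a subtable, which is consistent with the abstract's remark that the heuristic is fast and light on memory.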

  5. A Decision Tree Approach to Classify Web Services using Quality Parameters

    OpenAIRE

    Sonawani, Shilpa; Mukhopadhyay, Debajyoti

    2013-01-01

    With the growing number of web services, many services available on the internet provide the same functionality, making it difficult to choose the best one that fulfills all of the user's requirements. This problem can be solved by considering the quality of web services to distinguish functionally similar web services. Nine different quality parameters are considered. Web services can be classified and ranked using decision tree approach since they do not require long training period and...

  6. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    OpenAIRE

    Yang Youjin; Gu Bokyung; Yoon Taeseon

    2016-01-01

    Zika virus is spread by mosquitoes. Infection carries a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. The Apriori algorithm and decision trees were therefore used to compare polyprotein sequences of Zika virus with those of other flaviviruses: yellow fever virus, West Nile virus, dengue virus, and tick-borne encephalitis virus. In this way, dissimilarities and similarities among them were found.

  7. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    Directory of Open Access Journals (Sweden)

    Yang Youjin

    2016-01-01

    Full Text Available Zika virus is spread by mosquitoes. Infection carries a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. The Apriori algorithm and decision trees were therefore used to compare polyprotein sequences of Zika virus with those of other flaviviruses: yellow fever virus, West Nile virus, dengue virus, and tick-borne encephalitis virus. In this way, dissimilarities and similarities among them were found.

  8. An Examination of Mathematically Gifted Students' Learning Styles by Decision Trees

    OpenAIRE

    Esra Aksoy; Serkan Narlı

    2015-01-01

    The aim of this study was to examine mathematically gifted students' learning styles through a data mining method. The ‘Learning Style Inventory’ and the ‘Multiple Intelligences Scale’ were used to collect data. The sample included 234 mathematically gifted middle school students. The constructed decision tree, which predicts mathematically gifted students’ learning styles according to their multiple intelligences, gender, and grade level, was examined. Results showed that all t...

  9. Establishment of a comprehensive evaluation system of nursing quality based on a three-stage decision tree analysis platform

    Institute of Scientific and Technical Information of China (English)

    吴疆; 肖红著; 夏丽娅; 伍艳玲; 甘露; 桂文芳; 李劼; 邓晖

    2016-01-01

    Objective: To use the decision tree method to build, objectively, accurately, and quickly, a grading platform for the comprehensive evaluation of nursing quality. Methods: The chi-squared automatic interaction detection decision tree method in SPSS 18.0 was used to carry out a comprehensive evaluation and classification analysis of the type and quantity of nursing care, the technical risk level, the nursing staff allocation, and the competency level of each ward unit. According to the combined distribution of these factors, ward units were assigned to nursing clusters of different grades, and a three-stage decision tree analysis platform was built. Results: A three-stage decision tree comprehensive evaluation system of nursing quality was established that takes into account nursing workload, degree of technical risk, nursing staff allocation, and competency level. Conclusion: The comprehensive evaluation system of nursing quality built on the three-stage decision tree analysis platform has a powerful classification function; the platform can be established accurately and conveniently, and its classification is flexible.

  10. Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.

    Science.gov (United States)

    Tsipouras, Markos G; Exarchos, Themis P; Fotiadis, Dimitrios I; Kotsia, Anna P; Vakalis, Konstantinos V; Naka, Katerina K; Michalis, Lampros K

    2008-07-01

    A fuzzy rule-based decision support system (DSS) is presented for the diagnosis of coronary artery disease (CAD). The system is automatically generated from an initial annotated dataset, using a four stage methodology: 1) induction of a decision tree from the data; 2) extraction of a set of rules from the decision tree, in disjunctive normal form and formulation of a crisp model; 3) transformation of the crisp set of rules into a fuzzy model; and 4) optimization of the parameters of the fuzzy model. The dataset used for the DSS generation and evaluation consists of 199 subjects, each one characterized by 19 features, including demographic and history data, as well as laboratory examinations. Tenfold cross validation is employed, and the average sensitivity and specificity obtained is 62% and 54%, respectively, using the set of rules extracted from the decision tree (first and second stages), while the average sensitivity and specificity increase to 80% and 65%, respectively, when the fuzzification and optimization stages are used. The system offers several advantages since it is automatically generated, it provides CAD diagnosis based on easily and noninvasively acquired features, and is able to provide interpretation for the decisions made. PMID:18632325
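
    The third stage, turning a crisp rule set into a fuzzy one, can be sketched as follows. The sigmoid membership function, the min/max operators for AND/OR, the feature names, and all thresholds are assumptions made for illustration; the abstract does not specify them.

```python
import math


def crisp_greater(x, threshold):
    """Crisp condition 'x > threshold': membership is exactly 0 or 1."""
    return 1.0 if x > threshold else 0.0


def fuzzy_greater(x, threshold, slope):
    """Fuzzy counterpart: a sigmoid centred on the threshold (assumed form)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - threshold)))


def dnf_strength(rules, sample, membership):
    """Firing strength of a rule set in disjunctive normal form.

    Each rule is a list of (feature, threshold) conditions combined with AND
    (taken as min); the rules themselves are combined with OR (taken as max).
    """
    return max(min(membership(sample[f], t) for f, t in rule) for rule in rules)


# Hypothetical rules and sample; names and numbers are illustrative only.
rules = [[("feature_a", 60.0), ("feature_b", 5.0)], [("feature_c", 1.0)]]
sample = {"feature_a": 60.0, "feature_b": 7.0, "feature_c": 0.5}

crisp = dnf_strength(rules, sample, crisp_greater)
fuzzy = dnf_strength(rules, sample, lambda x, t: fuzzy_greater(x, t, slope=2.0))
```

    A sample sitting exactly on a threshold is rejected outright by the crisp model but receives an intermediate degree from the fuzzy one. The fourth stage would tune parameters such as `slope` and the thresholds, which is not shown here.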

  11. Decision Trees in the Analysis of the Intensity of Damage to Portal Frame Buildings in Mining Areas

    Science.gov (United States)

    Firek, Karol; Rusek, Janusz; Wodyński, Aleksander

    2015-09-01

    The article presents a preliminary database analysis regarding the technical condition of 94 portal frame buildings located in the mining area of the Legnica-Głogów Copper District (LGOM), using the methodology of decision trees. The analysis was divided into two stages. The first included creating a decision tree by the standard CART method and determining the importance of individual damage indices for the values of the technical wear of buildings. The second verified the created decision tree, and the importance of these indices for the technical wear of buildings, by simulating individual tree models using the random forest method. The obtained results confirmed the usefulness of decision trees in the early stage of data analysis. This methodology makes it possible to build an initial model describing the interaction between variables and to draw inferences about the importance of individual input variables. The aim of the research presented in the article was to test the possibility of obtaining information on the contribution of damage to the technical wear of buildings in a mining area using the decision tree method. The research was based on a database, created by the authors, on the technical condition and damage of 94 portal frame buildings located in the mining area of the Legnica-Głogów Copper District (LGOM). The CART (Classification & Regression Tree) decision tree method was adopted for the analyses, and a model approximating the value of the technical wear of buildings was created on its basis. As a result, the influence of individual variables on the course of the modelled process was determined (Figs. 3 and 4). In the second stage, the results obtained for the CART model were verified using the random forest method (Table 2). The research made it possible to determine the shares of the specified categories of damage to elements of the studied buildings in their degree of technical wear...

  12. Performance Evaluation of Discriminant Analysis and Decision Tree, for Weed Classification of Potato Fields

    Directory of Open Access Journals (Sweden)

    Farshad Vesali

    2012-09-01

    Full Text Available The present study attempts to recognize weeds in potato fields so that herbicides can be used effectively. Potato is cultivated widely all over the world and is a major food crop consumed by over one billion people, but it is threatened by weed invasion because of the row cropping system applied in potato tillage. Machine vision is used in this research for effective application of herbicides in the field. About 300 color images were acquired from 3 potato farms in Qorveh city and 2 farms of Urmia University, Iran. Images were acquired under different illumination conditions, from morning to evening, on sunny and cloudy days. Because plants overlap and shade one another under farm conditions, it is hard to use morphological parameters. In the method used for classifying weeds and potato plants, the primary color components of each plant were extracted and the relation between them was estimated in order to determine a discriminant function and classify plants using discriminant analysis. In addition, the decision tree method was used so that its results could be compared with those of discriminant analysis. Three different classifications were applied. First, potato plants were discriminated from all weeds together (two groups); the rate of correct classification was 76.67% for discriminant analysis and 83.82% for the decision tree. Second, potato plants were discriminated from separate groups of each weed (6 groups); the rate of correct classification was 87%. Third, potato plants were classified against each weed species one by one; because the weeds differed, the classification results differed across these pairings. In all conditions the decision tree showed better results than discriminant analysis.

  13. Decision Tree Complexity of Graph Properties with Dimension at Most 5

    Institute of Scientific and Technical Information of China (English)

    高随祥; 林国辉

    2000-01-01

    A graph property is a set of graphs such that if the set contains some graph G then it also contains each isomorphic copy of G (with the same vertex set). A graph property P on n vertices is said to be elusive, if every decision tree algorithm recognizing P must examine all n(n - 1)/2 pairs of vertices in the worst case. Karp conjectured that every nontrivial monotone graph property is elusive. In this paper, this conjecture is proved for some cases. Especially, it is shown that if the abstract simplicial complex of a nontrivial monotone graph property P has dimension not exceeding 5, then P is elusive.

  14. Use of decision trees for evaluating severe accident management strategies in nuclear power plants

    Energy Technology Data Exchange (ETDEWEB)

    Jae, Moosung [Hanyang Univ., Seoul (Korea, Republic of). Dept. of Nuclear Engineering; Lee, Yongjin; Jerng, Dong Wook [Chung-Ang Univ., Seoul (Korea, Republic of). School of Energy Systems Engineering

    2016-07-15

    Accident management strategies are defined as innovative actions taken by plant operators to prevent core damage or to maintain sound containment integrity. Such actions minimize the chance of offsite radioactive substance leaks that lead to and intensify core damage under power plant accident conditions. Accident management extends the concept of Defense in Depth against core meltdown accidents. In pressurized water reactors, emergency operating procedures are performed to extend the core cooling time. The effectiveness of Severe Accident Management Guidance (SAMG) has become an important issue. Severe accident management strategies are evaluated with a methodology utilizing the decision tree technique.

  15. The application of decision trees in research on anemia among rural children under 3 years old

    Institute of Scientific and Technical Information of China (English)

    马玉刚; 毕育学; 颜虹; 邓立娜; 梁卫峰; 王蓓; 张雪丽

    2009-01-01

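    Objective: To explore the application of decision tree techniques in research on anemia among rural children. Methods: In the Enterprise Miner module of SAS 8.2, health care research data on 3000 weaned children under 3 years old in rural areas were split 75%/25% into a training set for initial model fitting and a validation set for model adjustment. A CART decision tree model was built using the Gini impurity function and evaluated by misclassification rate, ROC curve, root ASE, and diagnostic chart. The variables in the model and their hierarchical positions within it were used to analyze the factors influencing anemia in weaned rural children under 3 years old and the interactions among those factors. Results: The misclassification rates of the CART decision tree model were 21.2% on the training set and 21.9% on the validation set, and the root ASE values were 0.399 and 0.404, respectively. The model's ROC curve lay above the reference line, with a large area under the curve. In the diagnostic chart, the proportion of cases in which actual and predicted values agreed was the largest, and the agreement rate of correctly classified observations was clearly higher than that of misclassified observations. The decision tree model selected nine important factors influencing childhood anemia and ranked them by relative importance: whether the mother was anemic (1.00) was the most important, followed by the child's age in months (0.75), the child's weaning time (0.53), the mother's age (0.32), the time eggs were introduced (0.26), the project county category (0.26), the time fresh milk was introduced (0.16), family size (0.13), and the mother's years of education (0.12). Conclusion: Decision tree techniques offer a new approach for effectively analyzing data from child health care research.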
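
    The Gini impurity criterion used by CART can be illustrated with a short sketch. The class counts below are toy values chosen for easy arithmetic and are not taken from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_gain(left, right):
    """Impurity decrease achieved by splitting a parent node into two children."""
    parent = left + right
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Toy split (hypothetical counts): children grouped by a yes/no attribute.
left = ["anemic"] * 3 + ["not anemic"] * 1
right = ["anemic"] * 1 + ["not anemic"] * 3
gain = gini_gain(left, right)
```

    CART evaluates every candidate split in this way and keeps the one with the largest impurity decrease; here the parent impurity of 0.5 falls to a weighted 0.375.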

  16. Decision trees for evaluating skin and respiratory sensitizing potential of chemicals in accordance with European regulations.

    Science.gov (United States)

    Selgrade, Maryjane K; Sullivan, Katherine S; Boyles, Rebecca R; Dederick, Elizabeth; Serex, Tessa L; Loveless, Scott E

    2012-08-01

    Guidance for determining the sensitizing potential of chemicals is available in EC Regulation No. 1272/2008 Classification, Labeling, and Packaging of Substances; REACH guidance from the European Chemicals Agency; and the United Nations Globally Harmonized System (GHS). We created decision trees for evaluating potential skin and respiratory sensitizers. Our approach (1) brings all the regulatory information into one brief document, providing a step-by-step method to evaluate evidence that individual chemicals or mixtures have sensitizing potential; (2) provides an efficient, uniform approach that promotes consistency when evaluations are done by different reviewers; (3) provides a standard way to convey the rationale and information used to classify chemicals. We applied this approach to more than 50 chemicals distributed among 11 evaluators with varying expertise. Evaluators found the decision trees easy to use and recipients (product stewards) of the analyses found that the resulting documentation was consistent across users and met their regulatory needs. Our approach allows for transparency, process management (e.g., documentation, change management, version control), as well as consistency in chemical hazard assessment for REACH, EC Regulation No. 1272/2008 Classification, Labeling, and Packaging of Substances and the GHS. PMID:22584521

  17. Effective use of Fibro Test to generate decision trees in hepatitis C

    Institute of Scientific and Technical Information of China (English)

    Dana Lau-Corona; Luís Alberto Pineda; Héctor Hugo Aviés; Gabriela Gutiérrez-Reyes; Blanca Eugenia Farfan-Labonne; Rafael Núnez-Nateras; Alan Bonder; Rosalinda Martínez-García; Clara Corona-Lau; Marco Antonio Olivera-Martíanez; Maria Concepción Gutiérrez-Ruiz; Guillermo Robles-Díaz; David Kershenobich

    2009-01-01

    AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest's cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4) and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees and was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.
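The 10-fold cross validation mentioned in the methods can be sketched as follows. This is a minimal illustration in plain Python, not the authors' C4.5 pipeline: the function name `k_fold_indices` and the fixed seed are assumptions made for the example. It only shows how 261 records could be split so that every record is held out exactly once.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k disjoint test folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

# 261 is the number of patients reported in the abstract.
folds = k_fold_indices(261, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(261) if i not in held_out]
    # A classifier would be trained on `train` and scored on `test_fold` here
    # (hypothetical step, omitted in this sketch).
    assert len(train) + len(test_fold) == 261
```

With 261 records and ten folds, one fold holds 27 records and the other nine hold 26 each. The reported overall error would then be the misclassified fraction pooled over all held-out folds.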

  18. Snow event classification with a 2D video disdrometer - A decision tree approach

    Science.gov (United States)

    Bernauer, F.; Hürkamp, K.; Rühm, W.; Tschiersch, J.

    2016-05-01

    Snowfall classification according to crystal type or degree of riming of the snowflakes is important for many atmospheric processes, e.g. wet deposition of aerosol particles. 2D video disdrometers (2DVD) have recently proved their capability to measure microphysical parameters of snowfall. The present work aims to classify snowfall according to microphysical properties of single hydrometeors (e.g. shape and fall velocity) measured by means of a 2DVD. The constraints for the shape and velocity parameters, which are used in a decision tree for classification of the 2DVD measurements, are derived from detailed on-site observations, combining automatic 2DVD classification with visual inspection. The developed decision tree algorithm subdivides the detected events into three classes of dominating crystal type (single crystals, complex crystals and pellets) and three classes of dominating degree of riming (weak, moderate and strong). The classification results for the crystal type were validated with an independent data set, proving the unambiguousness of the classification. In addition, for three long-term events, good agreement of the classification results with independently measured maximum dimension of snowflakes, snowflake bulk density and surrounding temperature was found. The developed classification algorithm is applicable for wind speeds below 5.0 m s⁻¹ and has the advantage of being easily implemented by other users.

  19. Comparison of Attribute Reduction Methods for Coronary Heart Disease Data by Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    ZHENG Gang; HUANG Yalou; WANG Pengtao; SHU Guangfu

    2005-01-01

    Attribute reduction is necessary in decision-making systems, and selecting the right attribute reduction method is even more important. This paper studies the reduction effects of principal components analysis (PCA) and system reconstruction analysis (SRA) on coronary heart disease data. The data set contains 1723 records, with 71 attributes in each record. PCA and SRA are used to reduce the number of attributes (to fewer than 71) in the data set. Decision tree algorithms, namely C4.5, classification and regression tree (CART), and chi-square automatic interaction detector (CHAID), are then adopted to analyze the raw data and the attribute-reduced data. The parameters of the decision tree algorithms, including internal node number, maximum tree depth, leaves number, and correction rate, are analyzed. The results indicate that PCA and SRA can both complete the attribute reduction, and that decision making on the reduced data is quicker than on the raw data; the reduction effect of PCA is better than that of SRA, while the attribute assertion of SRA is better than that of PCA. PCA and SRA methods exhibit good performance in selecting and reducing attributes.

  20. A Modular Approach Utilizing Decision Tree in Teaching Integration Techniques in Calculus

    Directory of Open Access Journals (Sweden)

    Edrian E. Gonzales

    2015-08-01

    Full Text Available This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought to answer the question: Is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice-type quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that modular and traditional instruction were equally effective in facilitating the learning of integration by parts. The other result revealed that the modular approach utilizing a decision tree was more effective than the traditional method in teaching integration by trigonometric transformation.

  1. Integration of health services in the care of people living with aids: an approach using a decision tree.

    Science.gov (United States)

    de Medeiros, Leidyanny Barbosa; Trigueiro, Débora Raquel Soares Guedes; da Silva, Daiane Medeiros; do Nascimento, João Agnaldo; Monroe, Aline Aparecida; Nogueira, Jordana de Almeida; Leadebal, Oriana Deyze Correia Paiva

    2016-02-01

    The care offered to people living with HIV/AIDS must transcend specialized outpatient services and include the participation of the Family Health Strategy. Recognizing the importance of integration between these two points in the care network, the study aimed to build a decision support model to assist professionals of specialized health services in identifying behavior patterns in the use of Family Health Strategy services by people living with HIV/AIDS attended in the outpatient clinic. To this end, a decision tree model was proposed, created from a database of 141 people with AIDS who were users of a specialized outpatient clinic. The decision-making variable was the use of Family Health Strategy services, evaluated in terms of the integration of care. The model enabled the establishment of 23 rules with an 80.1% hit percentage, which may support the decision-making of professionals in identifying situations in which it is necessary to stimulate the use of the Family Health Strategy by users. PMID:26910161

  2. The effect of the fragmentation problem in decision tree learning applied to the search for single top quark production

    International Nuclear Information System (INIS)

    Decision tree learning constitutes a suitable approach to classification due to its ability to partition the variable space into regions of class-uniform events, while providing a structure amenable to interpretation, in contrast to other methods such as neural networks. But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system called DTFE, for Decision Tree Fragmentation Evaluator, that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as Spectral Clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies in the search for single top quark production, a challenging problem due to large and similar backgrounds, low energetic signals, and low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of data fragmentation.

  3. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining that determines the attitude of people towards a particular product, topic or politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions could be in different languages (English, Urdu, Arabic, etc.). Tackling each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions, as labeled examples. A testing data set is supplied to the three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.
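The four measures used to compare the classifiers (accuracy, precision, recall and F-measure) can be computed from a binary confusion matrix as in the sketch below. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the paper, and the paper itself used WEKA rather than this code.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F-measure) for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure

# Hypothetical counts for a test set of 100 opinions (NOT from the paper).
acc, prec, rec, f1 = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F={f1:.2f}")
```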

  4. Using decision trees to predict benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea.

    Science.gov (United States)

    Pesch, Roland; Pehlke, Hendrik; Jerosch, Kerstin; Schröder, Winfried; Schlüter, Michael

    2008-01-01

    This article describes a concept for predicting and mapping the occurrence of benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea. The approach consists of two work steps: (1) geostatistical analysis of abiotic measurement data and (2) calculation of benthic provinces by means of Classification and Regression Trees (CART) and GIS techniques. From bottom water measurements on salinity, temperature, silicate and nutrients, as well as from punctual data on grain size ranges (0-20, 20-63, 63-2,000 µm), raster maps were calculated by use of geostatistical methods. First, the autocorrelation structure was examined and modelled with the help of variogram analysis. The resulting variogram models were then used to calculate raster maps by applying ordinary kriging procedures. After intersecting these raster maps with punctual data on eight benthic communities, a decision tree was derived to predict the occurrence of these communities within the study area. Since such a CART tree corresponds to a hierarchically ordered set of decision rules, it was applied to the geostatistically estimated raster data to predict benthic habitats within and near the EEZ. PMID:17680336

  5. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    Science.gov (United States)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats that soils are experiencing, especially in semi-arid Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50.000 soil map covering the area under the direct control of the Republic of Cyprus (5.760 km²). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors the usual variables were used: temperature and aridity index for climate; total loss on ignition, vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound location related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique in which many trees, instead of a single one, are developed and compared to increase the stability and reliability of the prediction. The model is trained and verified on areas where published 1:25.000 soil maps obtained from field work are available, and it is then applied for predictive mapping to the other areas. Preliminary results obtained in a small area in the plain around the city of Lefkosia, where eight different soil classes are present, show very good capacities of the method. The Random Forest approach leads to reproduce soil

  6. Non-compliance with a postmastectomy radiotherapy guideline: Decision tree and cause analysis

    Directory of Open Access Journals (Sweden)

    Åhlfeldt Hans

    2008-09-01

    Full Text Available Abstract Background: The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance are crucial in maintaining the quality of care in Oncology. Methods: Analysis of 759 patients with malignant breast cancer using decision tree induction (DTI) found patterns of non-compliance with the guideline. The PMRT guideline was used to separate cases according to the recommendation to receive or not receive PMRT. The two groups of patients were analyzed separately. Resulting patterns were transformed into rules that were then compared with the reasons that were extracted by manual inspection of records for the non-compliant cases. Results: Analyzing patients in the group who should receive PMRT according to the guideline did not result in a robust decision tree. However, classification of the other group, patients who should not receive PMRT according to the guideline, resulted in a tree with nine leaves, three of which represented non-compliance with the guideline. A comparison between the rules resulting from these three non-compliant patterns and manual inspection of patient records found the following: in the decision tree, presence of perigland growth is the most important variable, followed by number of malignantly invaded lymph nodes and level of Progesterone receptor. DNA index, age, size of the tumor and level of Estrogen receptor are also involved but with less importance. From manual inspection of the cases, the most frequent pattern for non-compliance is age above the threshold, followed by near cut-off values for risk factors and unknown reasons. Conclusion: Comparison of patterns of non-compliance acquired from data mining and manual inspection of patient records demonstrates that not all of the non-compliances are repetitive or important. 
There

  7. New energy opinion leaders' lifestyles and media usage - applying data mining decision tree analysis for UNIDO - ICHET web site users

    International Nuclear Information System (INIS)

    According to the innovation diffusion research, the innovators, opinion leaders, and diffusion agents play vital roles in promoting the acceptance of innovation. The innovators and opinion leaders must be able to cope with the high degree of uncertainty about an innovation and usually they have higher innovation-related media usage than the majority. Based on consumer behavior studies, lifestyle analysis could help researchers divide consumers into different lifestyle groups to understand and predict consumer behaviors. Lifestyle allows researchers to investigate consumers via their activities, interests and opinions instead of using demographic variables. The purpose of this research is to investigate how new energy innovators and opinion leaders' different lifestyles affect their new energy product adoption, and their media usage regarding new energy reports or promotion. In order to achieve the purposes listed above, the researchers need to locate and contact the potential innovators and opinion leaders in this field. Thus the researchers cooperate with UNIDO-ICHET to launch this survey. This cross-discipline online survey was formally launched from Aug 2005 to Oct 2006. The result of this survey successfully collected 2040 new energy innovators and opinion leaders' information. The researchers analyzed the data using SPSS statistics software and Data Mining decision tree analysis. Then the researchers divided new energy innovators into four groups: social-oriented, young modern, conservative, and show-off-oriented. They also analyzed which lifestyle groups are better targets for innovation agencies to launch innovation-related promotions or campaigns

  8. Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?

    Directory of Open Access Journals (Sweden)

    Robert Haworth

    2010-01-01

    Full Text Available The popularity and usage of digital games has increased in recent years, bringing further attention to their design. Some digital games require a significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.

  9. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    CERN Document Server

    Rajendran, P

    2010-01-01

    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are: pre-processing, feature extraction, association rule mining and hybrid classification. Pre-processing is done using median filtering, and edge features are extracted using the Canny edge detection technique. This paper proposes combining two image mining approaches in a hybrid manner. The frequent patterns from the CT scan images are generated by the frequent pattern tree (FP-Tree) algorithm, which mines the association rules. The decision tree method is used to classify the medical images for diagnosis. This system makes the classification process more accurate, and the hybrid method is more efficient than traditional image mining methods. The experimental results on a prediagnosed database of brain images showed 97% sensitivity and 95% accuracy. The ph...

  10. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
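The three ways of collapsing a group of equal rows can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function name `resolve_inconsistent`, the tie-breaking rule for the most common decision (smallest decision wins) and the use of consecutive integers as "unique codes" are all choices made for the example.

```python
from collections import Counter, defaultdict

def resolve_inconsistent(rows):
    """rows: list of (conditional_attribute_tuple, decision) pairs.
    Returns three dicts keyed by attribute tuple, one per approach."""
    groups = defaultdict(list)
    for attrs, decision in rows:
        groups[attrs].append(decision)

    # (i) many-valued decision: attach the set of all decisions in the group.
    many_valued = {a: set(ds) for a, ds in groups.items()}

    # (ii) most common decision; ties go to the smallest decision (assumption).
    most_common = {}
    for a, ds in groups.items():
        counts = Counter(ds)
        best = max(counts.values())
        most_common[a] = min(d for d, c in counts.items() if c == best)

    # (iii) generalized decision: one integer code per distinct decision set.
    codes, generalized = {}, {}
    for a, ds in groups.items():
        key = frozenset(ds)
        codes.setdefault(key, len(codes))
        generalized[a] = codes[key]
    return many_valued, most_common, generalized

# Hypothetical inconsistent table: attribute tuple (0, 1) occurs with decisions 1 and 2.
table = [((0, 1), 1), ((0, 1), 2), ((0, 1), 2),
         ((1, 1), 3), ((1, 0), 1), ((1, 0), 2)]
mv, mc, gd = resolve_inconsistent(table)
```

In this toy table the groups `(0, 1)` and `(1, 0)` share the decision set {1, 2}, so they receive the same generalized code even though their most common decisions differ.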

  11. Simulation of human behavior elements in a virtual world using decision trees

    Directory of Open Access Journals (Sweden)

    Sandra Mercado Pérez

    2013-05-01

    Full Text Available Human behavior refers to the way an individual responds to certain events or occurrences. Since it is not possible to predict directly how an individual will act, computer simulation is used. This paper presents the development of the simulation of five possible human reactions within a virtual world, as well as the steps needed to create a decision tree that supports the selection of any of these reactions. For the creation of the tree, three types of attributes are proposed: the personality, the environment and the level of reaction. The virtual world Second Life was selected because of its internal programming language LSL (Linden Scripting Language), which allows the execution of predefined animation sequences or the creation of new ones.

  12. Decision tree method applied to computerized prediction of ternary intermetallic compounds

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Decision tree method and atomic parameters were used to find the regularities of the formation of ternary intermetallic compounds in alloy systems. The criteria of formation can be expressed by a group of inequalities with two kinds of atomic parameters Zl (number of valence electrons in the atom of constituent element) and Ri/Rj (ratio of the atomic radius of constituent element i and j) as independent variables. The data of 2238 known ternary alloy systems were used to extract the empirical rules governing the formation of ternary intermetallic compounds, and the facts of ternary compound formation of other 1334 alloy systems were used as samples to test the reliability of the empirical criteria found. The rate of correctness of prediction was found to be nearly 95%. An expert system for ternary intermetallic compound formation was built and some prediction results of the expert system were confirmed.

  13. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective: To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods: Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variables of emission and 15 variables of transmission information from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and radiological residents. Diagnostic rules were demonstrated in tree topology, and diagnostic performances were compared using the Area under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. Results: A classification decision tree with the lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the value of the Target/Normal region of 99mTc-MIBI uptake in the delayed stage and in the early stage, age, cough and specula sign were the five most important contributors. The sensitivity and specificity were 93.33% and 78.57%, respectively, a little higher than those of the expert. The sensitivity and specificity of Grade one residents were 76.67% and 28.57%, respectively. The AUCs of CART and the expert were 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204), but greater than that of residents (P<0.001). Conclusion: Our data mining technique using a classification decision tree has a much higher accuracy than residents. It suggests that the application of this algorithm will significantly improve the diagnostic performance of residents.
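The reported percentages can be related to case counts with a short calculation. In the sketch below the true/false positive and negative counts are inferred from the percentages and the 30 malignant / 14 benign split; they are an assumption, not figures stated in the abstract, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Inferred counts (assumption): 30 malignant and 14 benign nodules in total.
cart = sensitivity_specificity(tp=28, fn=2, tn=11, fp=3)
residents = sensitivity_specificity(tp=23, fn=7, tn=4, fp=10)

print(f"CART:      {cart[0]:.2%} sensitivity, {cart[1]:.2%} specificity")
print(f"Residents: {residents[0]:.2%} sensitivity, {residents[1]:.2%} specificity")
```

Under these inferred counts, 28/30 and 11/14 reproduce the 93.33% and 78.57% reported for the tree, and 23/30 and 4/14 reproduce the 76.67% and 28.57% reported for the residents.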

  14. KFDA and clustering based multiclass SVM for intrusion detection

    Institute of Scientific and Technical Information of China (English)

    WEI Yu-xin; WU Mu-qing

    2008-01-01

    To improve the classification accuracy and reduce the training time, an intrusion detection technology is proposed which combines feature extraction technology with a multiclass support vector machine (SVM) classification algorithm. The intrusion detection model setup has two phases. The first phase is to project the original training data into kernel Fisher discriminant analysis (KFDA) space. The second phase is to use fuzzy clustering technology to cluster the projected data and construct the decision tree based on the clustering results. The overall detection model is set up based on the decision tree. Results of the experiment using the knowledge discovery and data mining (KDD) 99 datasets demonstrate that the proposed technology can be an effective way for intrusion detection.

  15. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree

    DEFF Research Database (Denmark)

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva;

    2004-01-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study...

  16. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    Science.gov (United States)

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  17. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

    Full Text Available Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the whole of MG and another that included classification errors. The resulting map was 62.9% accurate.

  18. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Monica [Brown Univ., Providence, RI (United States)

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the $V_{tb}$ element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to different theories beyond the Standard Model, such as heavy charged gauge bosons termed $W'$. This thesis measures the cross section of the electroweak produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb$^{-1}$ of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism, $\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.30^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak produced top quarks. Using this measured cross section and constraining $|V_{tb}| < 1$, the 95% confidence level (C.L.) lower limit is $|V_{tb}| > 0.78$. Additionally, a search is made for the production of $W'$ using the same samples from the electroweak produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found and 95% C.L. upper limits on the production cross section are set for $W'$ with masses within 600-950 GeV. For four general models of $W'$ boson production using the decay channel $W' \to t\bar{b}$, the lower mass limits are the following: $M(W'_L$ with SM couplings$) > 840$ GeV; $M(W'_R) > 880$ GeV or 890 GeV if the

  19. Applying the wavelet analysis and decision tree to identify low-saturation natural gas

    Institute of Scientific and Technical Information of China (English)

    贺旭; 李雄炎; 周金煜; 于红岩

    2011-01-01

    The particular reservoir conditions and low-amplitude structural traps have generated abundant low-saturation natural gas in the Quaternary of the Sanhu area in the Qaidam Basin. It is difficult to delineate reservoirs accurately because of the poor reservoir properties, thin reservoir thickness and the limitations imposed by surrounding rocks and logging instrument resolution. High shale content, high irreducible water saturation, high formation water salinity and clay minerals make the log curves ambiguous in low-saturation gas layers, so the identification of low-saturation natural gas is particularly difficult. To solve this problem, this work uses wavelet analysis to reconstruct log curves in order to improve their vertical resolution, compares the result with imaging logging data, and uses the improved log curves to delineate reservoirs accurately. At the same time, we employ a decision tree to build a predictive model of low-saturation natural gas, relying on the transparency of the learning process and the intelligibility of the results of the decision tree. The predictive model is then amended based on the actual characteristics of the reservoirs to achieve an accurate identification of low-saturation natural gas. Practical application shows that wavelet analysis and the decision tree can effectively solve the problems of reservoir delineation and identification of low-saturation natural gas in the research area.

  20. DECISION TREE CONSTRUCTION AND COST-EFFECTIVENESS ANALYSIS OF TREATMENT OF ULCERATIVE COLITIS WITH PENTASA® MESALAZINE 2 G SACHET

    Directory of Open Access Journals (Sweden)

    Alvaro Mitsunori NISHIKAWA

    2013-12-01

    Full Text Available Context Unspecified Ulcerative Rectocolitis is a chronic disease that affects between 0.5 and 24.5 per $10^5$ inhabitants in the world. National and international clinical guidelines recommend the use of aminosalicylates (including mesalazine) as first-line therapy for induction of remission of unspecified ulcerative rectocolitis, and recommend the maintenance of these agents after remission is achieved. However, the multiple daily doses required for the maintenance of disease remission compromise compliance with treatment, which is very low (between 45% and 65%). Use of mesalazine in granules (2 g sachet once daily - Pentasa® sachets 2 g) can enhance treatment adherence, leading to an improvement in patients' outcomes. Objective To evaluate the evidence on the use of mesalazine for the maintenance of remission in patients with unspecified ulcerative rectocolitis and its effectiveness when taken once versus more than once a day. From an economic standpoint, to analyze the impact of the adoption of this dosage in Brazil's public health system, considering patients' adherence to treatment. Methods A decision tree was developed based on the Clinical Protocol and Therapeutic Guidelines for Ulcerative Colitis, published by the Ministry of Health in Ordinance (Portaria) SAS/MS n° 861 of November 4th, 2002, and on the algorithms published by the Associação Brasileira de Colite Ulcerativa e Doença de Crohn, aiming to obtain the cost-effectiveness of mesalazine once daily in granules compared with mesalazine twice daily in tablets. Results The use of mesalazine increases the chances of remission induction and maintenance when compared to placebo, and higher doses are associated with a greater chance of success without increasing the risk of adverse events. Conclusion The use of a single daily dose in the maintenance of remission is effective and related to higher patient compliance when compared to the multiple daily dose regimens, with lower costs.

  1. Models, methods and software for distributed knowledge acquisition for the automated construction of integrated expert systems knowledge bases

    International Nuclear Information System (INIS)

    Based on an analysis of existing models, methods and means of acquiring knowledge, a base method of automated knowledge acquisition has been chosen. On the basis of this method, a new approach to integrating information acquired from knowledge sources of different typologies has been proposed, and the concept of distributed knowledge acquisition, aimed at the computerized formation of the most complete and consistent models of problem areas, has been introduced. An original algorithm for distributed knowledge acquisition from databases, based on the construction of binary decision trees, has been developed.

  2. A Fuzzy Optimization Technique for the Prediction of Coronary Heart Disease Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Persi Pamela. I

    2013-06-01

    Full Text Available Data mining along with soft computing techniques helps to unravel hidden relationships and diagnose diseases efficiently even with uncertainties and inaccuracies. Coronary Heart Disease (CHD) is a killer disease leading to heart attack and sudden death. Since the diagnosis involves vague symptoms and tedious procedures, it is usually time-consuming and false diagnoses may occur. A fuzzy system, one of the soft computing methodologies, is proposed in this paper along with a data mining technique for efficient diagnosis of coronary heart disease. Though the database has 76 attributes, only 14 attributes are found to be efficient for CHD diagnosis as per all the published experiments and doctors' opinion. So only the essential attributes are taken from the heart disease database. From these attributes crisp rules are obtained by employing the CART decision tree algorithm, which are then applied to the fuzzy system. A Particle Swarm Optimization (PSO) technique is applied for the optimization of the fuzzy membership functions, where the parameters of the membership functions are altered to new positions. The result interpreted from the fuzzy system predicts the prevalence of coronary heart disease, and the system's accuracy was found to be good.

  3. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    Science.gov (United States)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.
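    The boosting idea named in this abstract can be sketched in a few lines. The block below is a minimal, illustrative AdaBoost over one-level trees (threshold "stumps") in plain Python; it is not the authors' pipeline. The function names, the single made-up feature and the labels (+1 for "galaxy", -1 for "star") are all assumptions chosen so that no single threshold cut separates the classes.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """One-level tree: return `polarity` if the feature is >= threshold, else -polarity."""
    return polarity if row[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, n_rounds=3):
    """Return a list of (alpha, feature, threshold, polarity) weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified samples, lower the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels that no single cut can separate.
X = [[float(v)] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(X, y, n_rounds=3)
predictions = [predict(model, row) for row in X]
```

    On this toy set the best single cut misclassifies three of the ten points, while the weighted vote of three stumps reproduces every training label, which is the sense in which boosting can improve on simple thresholding cuts.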

  4. Effect of training characteristics on object classification: an application using Boosted Decision Trees

    CERN Document Server

    Sevilla-Noarbe, Ignacio

    2015-01-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects tha...

  5. Decision tree for smart feature extraction from sleep HR in bipolar patients.

    Science.gov (United States)

    Migliorini, Matteo; Mariani, Sara; Bianchi, Anna M

    2013-01-01

    The aim of this work is the creation of a completely automatic method for the extraction of informative parameters from peripheral signals recorded through a sensorized T-shirt. The acquired data belong to patients affected by bipolar disorder, and consist of RR series, body movements and activity type. The extracted features, i.e. linear and non-linear HRV parameters in the time domain, HRV parameters in the frequency domain, and parameters indicative of the sleep quality, profile and fragmentation, are of interest for the automatic classification of the clinical mood state. The analysis of this dataset, which is to be performed online and automatically, must address the problems related to the clinical protocol, which also includes a segment of recording in which the patient is awake, and to the nature of the device, which can be sensitive to movements and misplacement. Thus, the decision tree implemented in this study performs the detection and isolation of the sleep period, the elimination of corrupted recording segments and the checking of the minimum requirements of the signals for every parameter to be calculated. PMID:24110866

  6. The Use of Decision Tree Flowchart in Stomatology Education

    Institute of Scientific and Technical Information of China (English)

    周敏; 刘宏伟; 何园

    2013-01-01

    Objective: To investigate the feasibility of applying the decision tree flowchart model to the clinical teaching of stomatology. Methods: From August 2010 to December 2012, the decision tree flowchart method was used in clinical theory teaching for 20 residents rotating through the periodontics department. First, a clinical problem of a patient was selected as the target. The students were then asked to list all the possible conditions or classifications of the clinical problem, and the indications and contraindications of each treatment method. Finally, a decision tree flowchart was built from these lists. Results: This teaching mode gave full play to the initiative and enthusiasm of the students, helped them to classify and summarize knowledge, and developed their logical thinking. Most students welcomed it and were very satisfied with it. Conclusion: Dental clinical teaching is more active and effective with the help of the decision tree flowchart model.

  7. Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    CERN Document Server

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-01-01

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness an...

  8. Validation of probability equation and decision tree in predicting subsequent dengue hemorrhagic fever in adult dengue inpatients in Singapore.

    Science.gov (United States)

    Thein, Tun L; Leo, Yee-Sin; Lee, Vernon J; Sun, Yan; Lye, David C

    2011-11-01

    We developed a probability equation and a decision tree from 1,973 predominantly dengue serotype 1 hospitalized adult dengue patients in 2004 to predict progression to dengue hemorrhagic fever (DHF), applied in our clinic since March 2007. The parameters predicting DHF were clinical bleeding, high serum urea, low serum protein, and low lymphocyte proportion. This study validated these in a predominantly dengue serotype 2 cohort in 2007. The 1,017 adult dengue patients admitted to Tan Tock Seng Hospital, Singapore had a median age of 35 years. Of 933 patients without DHF on admission, 131 progressed to DHF. The probability equation predicted DHF with a sensitivity (Sn) of 94%, specificity (Sp) 17%, positive predictive value (PPV) 16%, and negative predictive value (NPV) 94%. The decision tree predicted DHF with a Sn of 99%, Sp 12%, PPV 16%, and NPV 99%. Both tools performed well despite a switch in predominant dengue serotypes.

  9. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Zhiyi [China Inst. of Atomic Energy (CIAE), Beijing (China)

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element Vtb and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb$^{-1}$ of Tevatron Run II data in $p\bar{p}$ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed in discriminating the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as $3.4^{+2.0}_{-1.8}$ pb. The result of the single top quark production in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the electron, muon and tau combined analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations in electron and muon alone. The measured cross section in the three combined final states is $\sigma(p\bar{p} \to tb + X, tqb + X) = 3.84^{+0.89}_{-0.83}$ pb. A lower limit on |Vtb| is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  10. Agent Based Model of Livestock Movements

    Science.gov (United States)

    Miron, D. J.; Emelyanova, I. V.; Donald, G. E.; Garner, G. M.

    The modelling of livestock movements within Australia is of national importance for the purposes of the management and control of exotic disease spread, infrastructure development and the economic forecasting of livestock markets. In this paper an agent based model for the forecasting of livestock movements is presented. This models livestock movements from farm to farm through a saleyard. The decision of farmers to sell or buy cattle is often complex and involves many factors such as climate forecast, commodity prices, the type of farm enterprise, the number of animals available and associated off-shore effects. In this model the farm agent's intelligence is implemented using a fuzzy decision tree that utilises two of these factors. These two factors are the livestock price fetched at the last sale and the number of stock on the farm. On each iteration of the model farms choose either to buy, sell or abstain from the market thus creating an artificial supply and demand. The buyers and sellers then congregate at the saleyard where livestock are auctioned using a second price sealed bid. The price time series output by the model exhibits properties similar to those found in real livestock markets.
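    The auction step described in this abstract can be illustrated with a short sketch. The block below is a minimal second-price sealed-bid auction in plain Python; the function name, the farm identifiers and the bid values are hypothetical, and the single-bidder rule (the lone bidder pays its own bid) is an assumption, since the abstract does not say how that case is handled.

```python
def second_price_auction(bids):
    """Resolve a sealed-bid auction where the highest bidder wins
    but pays the second-highest bid.

    bids: dict mapping a buyer id to its sealed bid.
    Returns (winner, price), or None when there are no bids.
    """
    if not bids:
        return None
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    # Assumption: with a single bidder there is no second price, so the bid itself is paid.
    price = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price

# Hypothetical sealed bids (price per head) from three buying farms.
result = second_price_auction({"farm_a": 420.0, "farm_b": 510.0, "farm_c": 480.0})
```

    Here `farm_b` wins the lot and pays 480.0, the bid of the runner-up, which is the property that makes truthful bidding a reasonable strategy for each agent.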

  11. Application of decision tree and logistic regression on the health literacy prediction of hypertension patients

    Institute of Scientific and Technical Information of China (English)

    李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬

    2012-01-01

    Objective To study and evaluate the feasibility and accuracy of decision tree methods and logistic regression for predicting the health literacy of hypertension patients. Method Two health literacy prediction models were built, one with logistic regression analysis and one with a decision tree in the Answer Tree software. The receiver operating characteristic (ROC) curve was used to compare the two prediction models. Result The sensitivity (82.5%) and Youden index (50.9%) of the logistic regression model were higher than those of the decision tree model (77.9%, 48.0%); the specificity of the decision tree model (70.1%) was higher than that of the logistic regression model (68.4%), and its error rate (29.9%) was lower than that of the logistic regression model (31.6%). The areas under the ROC curve of the decision tree and logistic regression models were comparable (0.813 vs 0.847). Conclusion The decision tree predicted the health literacy of hypertension patients about as well as the logistic regression model. A health literacy screening strategy can be derived from the decision tree model, which suggests that data mining methods are feasible for predicting health literacy in patients with chronic disease.
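    The figures in this abstract are mutually consistent, which a short check makes visible. The sketch below recomputes the Youden index (sensitivity + specificity - 100, in percentage points) from the reported sensitivity and specificity; it also shows that the reported error rates coincide with 100 minus specificity, which is an observation about the numbers rather than a definition given in the abstract. Function and variable names are illustrative.

```python
def youden_index(sensitivity_pct, specificity_pct):
    """Youden's J in percentage points: sensitivity + specificity - 100."""
    return sensitivity_pct + specificity_pct - 100.0

def complement_of_specificity(specificity_pct):
    """100 - specificity, i.e. the share of true negatives that are misjudged."""
    return 100.0 - specificity_pct

# (sensitivity %, specificity %) as reported in the abstract.
reported = {
    "logistic_regression": (82.5, 68.4),
    "decision_tree": (77.9, 70.1),
}

summary = {
    name: (round(youden_index(sn, sp), 1), round(complement_of_specificity(sp), 1))
    for name, (sn, sp) in reported.items()
}
```

    `summary` gives (50.9, 31.6) for logistic regression and (48.0, 29.9) for the decision tree, matching the Youden indices and error rates quoted above.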

  12. Application of decision tree classification to rubber plantations extraction with remote sensing

    Institute of Scientific and Technical Information of China (English)

    刘晓娜; 封志明; 姜鲁光

    2013-01-01

    Based on Landsat remote sensing image data and MODIS-NDVI data, rubber plantations were extracted by the decision tree classification method in BRCLM using spectral features and texture characteristics. The results showed that: (1) On account of spectral differences between rubber forests at different growth stages, we were able to extract rubber plantations separately as young rubber forest (<10 a) and mature rubber forest (≥10 a). The optimum temporal window for discriminating rubber plantations was from early January to late March, which is especially appropriate for mature rubber forest. Mature rubber forest, dry land with high vegetation cover, and forest land were prone to misclassification. Meanwhile, young rubber forest, tea plantation, shrubland and grassland were confused with one another in their spectral characteristics according to the NDVI. (2) Based on the original spectral characteristics, normalized indices, K-T transform indices, and texture features, we established separate decision tree classification models for young and mature rubber forest. The overall accuracy exceeded 90% for mature rubber forest and 75% for young rubber forest, which meant that the decision tree method was better suited to mature rubber forest extraction. The rubber plantation distribution maps for 1980, 1990, and 2000 were obtained using the established decision tree models with high classification accuracy, which indicated that the models were simple and efficient for extracting rubber plantations in tropical areas. This is an effective method for perennial vegetation extraction and classification accuracy verification. (3) From 1980 to 2010, the area of rubber plantations in BRCLM increased nearly ninefold, from 705 km² to 6,014 km², and the young rubber forest expanded faster than the mature rubber forest. National differences of rubber plantations in BRCLM were significant, and the cross-border planting

  13. Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness.

    Directory of Open Access Journals (Sweden)

    Lukas Tanner

    Full Text Available BACKGROUND: Dengue is re-emerging throughout the tropical world, causing frequent recurrent epidemics. The initial clinical manifestation of dengue often is confused with other febrile states, confounding both clinical management and disease surveillance. Evidence-based triage strategies that identify individuals likely to be in the early stages of dengue illness can direct patient stratification for clinical investigations, management, and virological surveillance. Here we report the identification of algorithms that differentiate dengue from other febrile illnesses in the primary care setting and predict severe disease in adults. METHODS AND FINDINGS: A total of 1,200 patients presenting in the first 72 hours of acute febrile illness were recruited and followed up for up to a 4-week period prospectively; 1,012 of these were recruited from Singapore and 188 from Vietnam. Of these, 364 were dengue RT-PCR positive; 173 had dengue fever, 171 had dengue hemorrhagic fever, and 20 had dengue shock syndrome as final diagnosis. Using a C4.5 decision tree classifier for analysis of all clinical, haematological, and virological data, we obtained a diagnostic algorithm that differentiates dengue from non-dengue febrile illness with an accuracy of 84.7%. The algorithm can be used differently in different disease prevalence to yield clinically useful positive and negative predictive values. Furthermore, an algorithm using platelet count, crossover threshold value of a real-time RT-PCR for dengue viral RNA, and presence of pre-existing anti-dengue IgG antibodies in sequential order identified cases with sensitivity and specificity of 78.2% and 80.2%, respectively, that eventually developed thrombocytopenia of 50,000 platelets/mm³ or less, a level previously shown to be associated with haemorrhage and shock in adults with dengue fever. CONCLUSION: This study shows a proof-of-concept that decision algorithms using simple clinical and haematological parameters
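    The remark that predictive values depend on disease prevalence follows from Bayes' rule, and can be sketched as below. The function is a generic textbook calculation, not the authors' code. The sensitivity and specificity used in the example are the 78.2% and 80.2% quoted in the abstract for the thrombocytopenia algorithm and serve only as illustrative inputs; the two prevalence values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    All arguments are fractions in [0, 1].
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Illustrative inputs: reported sensitivity/specificity, hypothetical prevalences.
low_ppv, low_npv = predictive_values(0.782, 0.802, prevalence=0.10)
high_ppv, high_npv = predictive_values(0.782, 0.802, prevalence=0.50)
```

    With these inputs the positive predictive value rises from roughly 0.31 at 10% prevalence to roughly 0.80 at 50% prevalence, while the negative predictive value falls, so the same decision rule is better suited to ruling out disease in low-prevalence settings and to confirming it in high-prevalence ones.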

  14. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. Pellet quality was expressed by shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of the formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of the extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. PMID:25835791
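    A regression tree of the kind mentioned in this abstract is grown by repeatedly choosing the split that most reduces the squared error of the predicted output. The sketch below shows that single step for one input variable in plain Python. It is a generic illustration, not the algorithm or data used in the study: the speeds and aspect ratios are invented numbers that merely follow the reported trend (higher spheronizer speed, rounder pellets).

```python
def sum_squared_error(values):
    """Squared deviation of the values from their mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, error) for the two-leaf split of xs that minimises
    the total squared error of ys; candidates are midpoints between sorted xs."""
    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[1]:
            best = (threshold, error)
    return best

# Hypothetical data: spheronizer speed (rpm) versus pellet aspect ratio.
speed = [500, 700, 900, 1100, 1300, 1500]
aspect_ratio = [1.40, 1.38, 1.35, 1.15, 1.12, 1.10]
threshold, error = best_split(speed, aspect_ratio)
```

    The chosen threshold reads directly as a decision rule ("speed above 1000 rpm gives a lower aspect ratio" in this invented example), which is why tree models yield the kind of interpretable rule set the abstract describes.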

  15. A Study on Fraud Detection Based on Data Mining Using Decision Tree

    Directory of Open Access Journals (Sweden)

    A. N. Pathak

    2011-05-01

    Full Text Available Fraud is a million dollar business and it is increasing every year. The U.S. identity fraud incidence rate increased in 2008, returning to levels unseen since 2003. Almost 10 million Americans learned they were victims of identity (ID) fraud in 2008, up from 8.1 million victims in 2007. More consumers are becoming ID fraud victims, reversing the previous trend in which ID fraud had been gradually decreasing. This reversal makes sense, since overall criminal activity tends to increase when there is a recession. Fraud involves one or more persons who intentionally act secretly to deprive another of something of value, for their own benefit. Fraud is as old as humanity itself and can take an unlimited variety of different forms. However, in recent years, the development of new technologies has also provided further ways in which criminals may commit fraud (Bolton and Hand 2002). In addition, business reengineering, reorganization or downsizing may weaken or eliminate control, while new information systems may present additional opportunities to commit fraud.

  16. CLOUD DETECTION BASED ON DECISION TREE OVER TIBETAN PLATEAU WITH MODIS DATA

    OpenAIRE

    Xu, L.; Fang, S; Niu, R.; Li, J

    2012-01-01

    Snow cover area is a critical parameter for the hydrologic cycle of the Earth. Furthermore, it is a key factor in the effects of climate change. An unavoidable difficulty in mapping snow cover is the presence of clouds. Clouds can easily be found in any satellite image, because they are bright and white in the visible wavelengths. But this is not the case when there is snow or ice in the background, because snow and clouds have a similar spectral appearance. Many cloud decision meth...

  17. Irrelevant variability normalization in learning HMM state tying from data based on phonetic decision-tree

    OpenAIRE

    Huo, Q.; Ma, B.

    1999-01-01

    We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because of the problems of a diversified training data set and/or possible acoustic mismatches between training and testing conditions, the structure learned from the training data by using a maximum likelihood training method will not necessarily generalize well on mismatched tasks. We apply the above concept to the structural learning problem of phonetic decision-t...

  18. Real-time Container Transport Planning with Decision Trees based on Offline Obtained Optimal Solutions

    NARCIS (Netherlands)

    B. van Riessen (Bart); R.R. Negenborn (Rudy); R. Dekker (Rommert)

    2016-01-01

    Hinterland networks for container transportation require planning methods in order to increase efficiency and reliability of the inland road, rail and waterway connections. In this paper we aim to derive real-time decision rules for suitable allocations of containers to inland services b

  19. Forecasting Reading Anxiety for Promoting English-Language Reading Performance Based on Reading Annotation Behavior

    Science.gov (United States)

    Chen, Chih-Ming; Wang, Jung-Ying; Chen, Yong-Ting; Wu, Jhih-Hao

    2016-01-01

    To reduce effectively the reading anxiety of learners while reading English articles, a C4.5 decision tree, a widely used data mining technique, was used to develop a personalized reading anxiety prediction model (PRAPM) based on individual learners' reading annotation behavior in a collaborative digital reading annotation system (CDRAS). In…

  1. Predictors and patterns of problematic Internet game use using a decision tree model.

    Science.gov (United States)

    Rho, Mi Jung; Jeong, Jo-Eun; Chun, Ji-Won; Cho, Hyun; Jung, Dong Jin; Choi, In Young; Kim, Dai-Jin

    2016-09-01

    Background and aims Problematic Internet game use is an important social issue that increases social expenditures for both individuals and nations. This study identified predictors and patterns of problematic Internet game use. Methods Data were collected from online surveys between November 26 and December 26, 2014. We identified 3,881 Internet game users from a total of 5,003 respondents. A total of 511 participants were assigned to the problematic Internet game user group according to the Diagnostic and Statistical Manual of Mental Disorders Internet gaming disorder criteria. From the remaining 3,370 participants, we used propensity score matching to develop a normal comparison group of 511 participants. In all, 1,022 participants were analyzed using the chi-square automatic interaction detector (CHAID) algorithm. Results According to the CHAID algorithm, six important predictors were found: gaming costs (50%), average weekday gaming time (23%), offline Internet gaming community meeting attendance (13%), average weekend and holiday gaming time (7%), marital status (4%), and self-perceptions of addiction to Internet game use (3%). In addition, three patterns out of six classification rules were explored: cost-consuming, socializing, and solitary gamers. Conclusion This study provides direction for future work on the screening of problematic Internet game use in adults. PMID:27499227

  2. Determination of fetal state from cardiotocogram using LS-SVM with particle swarm optimization and binary decision tree.

    Science.gov (United States)

    Yılmaz, Ersen; Kılıkçıer, Cağlar

    2013-01-01

    We use a least squares support vector machine (LS-SVM) utilizing a binary decision tree for classification of cardiotocograms to determine the fetal state. The parameters of the LS-SVM are optimized by particle swarm optimization. The robustness of the method is examined by running 10-fold cross-validation. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, receiver operating characteristic analysis and cobweb representation are presented in order to analyze and visualize the performance of the method. Experimental results demonstrate that the proposed method achieves a remarkable classification accuracy rate of 91.62%.
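    The 10-fold cross-validation mentioned in this abstract partitions the samples into ten disjoint folds and uses each fold once for testing. The sketch below generates such index splits with the Python standard library only. It is a generic illustration: the function name, the seed and the sample count of 25 are assumptions, and the classifier itself (LS-SVM with PSO) is not reproduced.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds so that
    fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 25 recordings split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

    Each sample appears in exactly one test fold, so averaging the accuracy over the ten folds uses every recording for evaluation once while never testing on data that was used to fit that fold's model.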

  3. Analysis of Human Papillomavirus Using Datamining - Apriori, Decision Tree, and Support Vector Machine (SVM) and its Application Field

    Directory of Open Access Journals (Sweden)

    Cho Younghoon

    2016-01-01

    Full Text Available Human Papillomavirus (HPV) has many types compared to other viruses and plays a key role in causing diverse diseases, especially cervical cancer. In this study, we aim to distinguish the features of HPV types with different degrees of fatality by analyzing their DNA sequences. We used the Decision Tree, Apriori, and Support Vector Machine algorithms in our experiment. By analyzing the DNA sequences, we discovered some relationships between certain types of HPV, especially the most fatal types, 16 and 18. Moreover, we concluded that it would be possible for scientists to develop more potent HPV cures by applying these relationships and the features that HPV exhibits.

  4. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and to focus on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of small sample size. Methods All 274 patients (137 with type 2 diabetes mellitus and diabetic peripheral neuropathy and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables such as sex, age, glycosylated hemoglobin, etc. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 times 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied by the best subset method with the Akaike Information Criterion (AIC) to make the best use of information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results The AUC of DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. The larger the AUC of a prediction model, the higher its diagnostic ability; MLP performed optimally, and then

  5. Decision tree classification of orchard information extraction from TM imagery in Jiaodong Peninsula of China

    Institute of Scientific and Technical Information of China (English)

    于新洋; 张安定; 侯西勇

    2012-01-01

    Decision tree classification is a classification model that applies a sequence of classification rules to progressively refine the classes in an image. It has been widely used for information extraction from remote sensing images because it is intuitive and efficient. The Jiaodong Peninsula is one of the most famous fruit-producing areas in China, so monitoring the distribution of its orchards is of real significance. In this paper, decision tree classification was used to extract the orchard area in the Jiaodong Peninsula. Specifically, a Landsat 5 TM image (path 120, row 034, October 24, 2005) was used, and the five most representative fruit-producing counties and cities (Penglai, Longkou, Laizhou, Qixia, Zhaoyuan) were selected as the study area. The decision tree method, which can seamlessly incorporate several kinds of auxiliary information, combined NDVI, terrain and landform data, and the tasseled cap transformation, and exploited the spectral differences between orchards and background land cover at the time of year when phenological change is greatest. SPOT imagery and field survey data served as validation samples for accuracy assessment. The results show that extracting orchard information from TM imagery by decision tree classification with multiple kinds of auxiliary information is feasible and reasonably accurate, and that the classification results can be used as inputs for related research.

  6. Application of improved decision tree on the hot rolling process

    Institute of Scientific and Technical Information of China (English)

    钟蜜; 刘斌

    2011-01-01

    Decision tree classification is a very effective machine learning method, with the advantages of high classification accuracy, good robustness to noisy data, and a tree-shaped model. Decision tree algorithms are mainly optimized in several respects: the criterion for selecting branching attributes, decision tree pruning, and the introduction of fuzzy theory, rough set theory, genetic algorithms and neural network algorithms. This article uses the attribute importance principle of rough set theory to optimize the decision tree: it first calculates the importance of each condition attribute to the classification and then filters the sample set according to that importance, reducing the size of the tree without harming classification accuracy. The algorithm was implemented in the Visual C++ 6.0 programming environment and applied to a hot rolling process model; processing hot rolling data verified the effectiveness of the algorithm.

  7. Network Traffic Classification Using SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    邱婧; 夏靖波; 柏骏

    2012-01-01

    To address the unrecognizable regions and long training times that arise when the Support Vector Machine (SVM) method is used for network traffic classification, an SVM decision tree was applied to network traffic classification, exploiting its advantages in multi-class classification. Authoritative flow data sets were tested. The experimental results show that the SVM decision tree method has a shorter training time and better classification performance than the ordinary "one-versus-one" and "one-versus-rest" SVM methods in network traffic classification, and that its classification accuracy can reach 98.8%.

  8. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    Science.gov (United States)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  9. The management of an endodontically abscessed tooth: patient health state utility, decision-tree and economic analysis

    Directory of Open Access Journals (Sweden)

    Shepperd Sasha

    2007-12-01

    Full Text Available Abstract Background A frequent encounter in clinical practice is the middle-aged adult patient complaining of a toothache caused by the spread of a carious infection into the tooth's endodontic complex. Decisions about the range of treatment options (conventional crown with a post and core technique (CC), a single tooth implant (STI), a conventional dental bridge (CDB), and a partial removable denture (RPD)) have to balance the prognosis, utility and cost. Little is known about the utility patients attach to the different treatment options for an endodontically abscessed mandibular molar and maxillary incisor. We measured patients' dental-health-state utilities and ranking preferences of the treatment options for these dental problems. Methods Forty school teachers ranked their preferences for conventional crown with a post and core technique, a single tooth implant, a conventional dental bridge, and a partial removable denture using a standard gamble and willingness to pay. Data previously reported on treatment prognosis and direct "out-of-pocket" costs were used in a decision-tree and economic analysis. Results The standard gamble utilities for the restoration of a mandibular 1st molar with either the conventional crown (CC), single-tooth-implant (STI), conventional dental bridge (CDB) or removable-partial-denture (RPD) were 74.47 [± 6.91], 78.60 [± 5.19], 76.22 [± 5.78], 64.80 [± 8.1] respectively (p The standard gamble utilities for the restoration of a maxillary central incisor with a CC, STI, CDB and RPD were 88.50 [± 6.12], 90.68 [± 3.41], 89.78 [± 3.81] and 91.10 [± 3.57] respectively (p > 0.05). Their respective willingness-to-pay ($CDN) were: 1,782.05 [± 361.42], 1,871.79 [± 349.44], 1,605.13 [± 348.10] and 1,351.28 [± 368.62]. A statistical difference was found between the utility of treating a maxillary central incisor and mandibular 1st-molar (p The expected-utility-value for a 5-year prosthetic survival was highest for the CDB and the

  10. Comparison between SARS CoV and MERS CoV Using Apriori Algorithm, Decision Tree, SVM

    Directory of Open Access Journals (Sweden)

    Jang Seongpil

    2016-01-01

    Full Text Available MERS (Middle East Respiratory Syndrome) is currently a worldwide disease. The number of infected people is 1038 (08/03/2015) in Saudi Arabia and 186 (08/03/2015) in South Korea. MERS has spread across the world, including Europe, East Asia and the Middle East, and the fatality rate is 38.8%. MERS is also known as a cousin of SARS (Severe Acute Respiratory Syndrome) because both diseases show similar symptoms such as high fever and difficulty in breathing. This is why we compared MERS with SARS. We used spike glycoprotein data from NCBI. To analyze the protein, the Apriori algorithm, a decision tree, and SVM were used; in particular, SVM was run with normal, polynomial, and sigmoid kernels. The results showed that MERS and SARS are alike but also differ in some ways.

  11. Research on using the decision tree classification method to extract coal gangue information

    Institute of Scientific and Technical Information of China (English)

    冯稳; 张志; 乌云其其格; 孟丹

    2011-01-01

    Using remote sensing to survey the distribution of coal gangue piles quickly and accurately provides important guidance for preventing geological disasters and for protecting the ecological environment and residents' lives and property. Based on TM multi-spectral imagery, a knowledge-based decision tree classification method was used in an experiment to extract coal gangue information for the Pingxiang coal mining area in Jiangxi Province. First, drawing on background knowledge of the study area, the spectral characteristics of coal gangue and other typical surface objects in the imagery were analyzed statistically and a classification knowledge base for the study area was established. Second, supported by the decision tree classification model, the image was classified using the Normalized Difference Vegetation Index, the Modified Normalized Difference Water Index and a spectral threshold method. Finally, the classified image was post-processed using geoscience knowledge and geometric features; the overall classification accuracy reached 82.97%. The experiment demonstrates that the method is suitable for the automatic extraction of coal gangue information and that, combined with visual interpretation, it can improve the efficiency and accuracy of interpretation.
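
    The index-and-threshold rules described in this abstract lend themselves to a small rule-based sketch. The Python below is illustrative only: NDVI = (NIR - Red) / (NIR + Red) and MNDWI = (Green - SWIR) / (Green + SWIR) are the standard definitions, but the rule order, the threshold values (`ndvi_min`, `mndwi_min`, `swir_max`) and the sample reflectances are hypothetical and are not taken from the study.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def classify_pixel(green, red, nir, swir,
                   ndvi_min=0.3, mndwi_min=0.0, swir_max=0.15):
    """Walk a hand-written decision tree over band reflectances (0..1).

    All thresholds are placeholder assumptions; a real study would derive
    them from spectral statistics of training samples.
    """
    ndvi = normalized_difference(nir, red)      # vegetation index
    mndwi = normalized_difference(green, swir)  # water index
    if mndwi > mndwi_min:
        return "water"
    if ndvi > ndvi_min:
        return "vegetation"
    if swir < swir_max:  # dark, non-vegetated, non-water surface
        return "coal gangue candidate"
    return "other"


# Hypothetical reflectance samples: (green, red, nir, swir).
samples = {
    "pond": (0.10, 0.08, 0.05, 0.02),
    "forest": (0.08, 0.05, 0.45, 0.20),
    "gangue pile": (0.06, 0.07, 0.08, 0.10),
    "bare soil": (0.15, 0.20, 0.25, 0.30),
}
for name, bands in samples.items():
    print(name, "->", classify_pixel(*bands))
```

    Masking water and vegetation first leaves a much smaller set of pixels for the final spectral threshold, which is the usual reason for ordering the rules this way; the post-processing with geometric features mentioned in the abstract is not sketched here.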

  12. Effective Network Intrusion Detection using Classifiers Decision Trees and Decision rules

    Directory of Open Access Journals (Sweden)

    G.MeeraGandhi

    2010-11-01

    Full Text Available In the era of the information society, computer networks and their related applications are the emerging technologies. Network Intrusion Detection aims at distinguishing the behavior of the network. As network attacks have increased in huge numbers over the past few years, the Intrusion Detection System (IDS) is increasingly becoming a critical component for securing the network. Owing to large volumes of security audit data in a network, in addition to the intricate and dynamic properties of intrusion behaviors, optimizing the performance of IDS has become an important open problem that receives more and more attention from the research community. In this work, the field of machine learning attempts to characterize how such changes can occur by designing, implementing, running, and analyzing algorithms that can be run on computers. The discipline draws on ideas, with the goal of understanding the computational character of learning. Learning always occurs in the context of some performance task, and a learning method should always be coupled with a performance element that uses the knowledge acquired during learning. In this research, machine learning is being investigated as a technique for making the selection, using as training data and their outcome. In this paper, we evaluate the performance of a set of classifier algorithms of rules (JRIP, Decision Table, PART, and OneR) and trees (J48, RandomForest, REPTree, NBTree). Based on the evaluation results, the best algorithm for each attack category is chosen and two classifier algorithm selection models are proposed. The empirical simulation result shows the comparison between the noticeable performance improvements. The classification models were trained using the data collected from Knowledge Discovery Databases (KDD) for Intrusion Detection. The trained models were then used for predicting the risk of the attacks in a web server environment or by any network administrator or any Security Experts. The

  13. Fuzzy Decision Trees with Possibility Distributions as Output

    Institute of Scientific and Technical Information of China (English)

    袁修久; 张文修

    2003-01-01

    A given instance is assumed to admit more than one possible classification. A possibility distribution is assigned to each terminal node of a fuzzy decision tree. The possibility distribution of a given instance with known attribute values is determined by simple fuzzy reasoning. This reduces the inconsistency that arises when a single class must be determined for a given instance.

  14. Data Optimization with Multilayer Perceptron Neural Network and Using New Pattern in Decision Tree Comparatively

    Directory of Open Access Journals (Sweden)

    Murat Kayri

    2010-01-01

    Full Text Available Problem statement: The aim of the present study is to exemplify the use of Artificial Neural Networks (ANN) for parameter prediction. Missing values or unrealistic responses to some scale items are a problem for unbiased findings. Learning the real pattern with an ANN provides robust and unbiased parameter estimation. Approach: To this end, data were collected from 906 students using the "Scale of student views about the expected situations and the current expectations from their families during learning process" for the study entitled "Student views about the expected situations and the current expectations from their families during learning process". In the study, the initial data set gathered using the measurement tool and the new data set produced by the Multi-Layer Perceptron algorithm, which was considered to have the highest predictive level among ANNs for this research, were each analyzed by CHAID analysis, and the results of the two analyses were compared. Results: CHAID analysis of the initial data set showed that the variable "education level of mother" had a considerable effect on the dependent variable, total score, whereas in the data set predicted by the ANN "education level of father" was the influential variable on attitude level. Conclusion/Recommendations: The findings show that Artificial Neural Networks could be used for parameter estimation in cause-effect based studies. It is also thought that the research will contribute to more extensive use of advanced statistical methods.

  15. Malware propagation modeling by the means of genetic algorithms

    OpenAIRE

    Goranin, N.; Čenys, A.

    2008-01-01

    Existing malware propagation models mainly concentrate on forecasting the number of infected computers in the initial propagation phase. In this article we propose a genetic algorithm based model for estimating the propagation rates of known and prospective Internet worms after their propagation reaches the satiation phase. The estimation algorithm is based on analysis of known worms' propagation strategies and the correlated propagation rates, and is presented as a decision tree, generated by GAtre...

  16. An Improved ID3 Decision Tree Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    潘大胜; 屈迟文

    2016-01-01

    By analyzing the problems of the classic ID3 decision tree mining algorithm, the entropy calculation process is improved and an improved ID3 decision tree mining algorithm is constructed. The entropy calculation used in building the decision tree is redesigned in order to obtain globally optimal mining results. Mining experiments are carried out on six kinds of data sets from the UCI data collection. The experimental results show that the improved algorithm is clearly better than the ID3 decision tree mining algorithm in both the compactness of the constructed decision tree and mining accuracy.
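
    For orientation, the entropy calculation that classic ID3 uses to choose a splitting attribute, and that this paper sets out to redesign, can be sketched as follows. The Python shows only the textbook baseline; the improved entropy calculation is not described in the abstract and is not reproduced here. The attribute names and rows are hypothetical toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3_choose_attribute(rows, attributes, target):
    """Classic ID3 rule: split on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data set.
rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
]

chosen = id3_choose_attribute(rows, ["outlook", "windy"], "play")
print(chosen, information_gain(rows, "outlook", "play"))
```

    Because this rule is applied greedily at each node, the resulting tree is only locally optimal, which is presumably the limitation behind the abstract's stated aim of globally optimal results.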

  17. Study on Acoustic Modeling in a Mandarin Continuous Speech Recognition

    Institute of Scientific and Technical Information of China (English)

    PENG Di; LIU Gang; GUO Jun

    2007-01-01

    The design of acoustic models is of vital importance in building a reliable connection between the acoustic waveform and linguistic messages in terms of individual speech units. According to the characteristics of Chinese phonemes, the basic set of acoustic phoneme units is decided and refined, and a decision tree based state tying approach is explored. Since one advantage of the top-down tying method is its flexibility in balancing model accuracy against complexity, relevant adjustments are made, such as to the stopping criterion for decision tree node splitting, and optimal thresholds are identified in the process. Better results are achieved in improving acoustic modeling accuracy while keeping the scale of the model trainable.

  18. Application of decision trees to the analysis of soil radon data for earthquake prediction.

    Science.gov (United States)

    Zmazek, B; Todorovski, L; Dzeroski, S; Vaupotic, J; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.

  19. Application of decision trees to the analysis of soil radon data for earthquake prediction

    Energy Technology Data Exchange (ETDEWEB)

    Zmazek, B. (e-mail: boris.zmazek@ijs.si); Todorovski, L.; Dzeroski, S.; Vaupotic, J.; Kobal, I.

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.

  20. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    Science.gov (United States)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al., 2006). At this field scale, previous classifications of agricultural land in Tanzania using MODIS coarse resolution data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used in the study, with representative training areas collected for agriculture and no agriculture and appropriate indices used to separate these classes (Hansen et al., 2013). Validation was done using a random sample and high resolution satellite images to compare agriculture and no agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security and market prices and to inform agricultural policy.

  1. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction

    OpenAIRE

    Jörg Huwyler; Felix Hammann; Claudia Suenderhauf

    2012-01-01

    Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was co...

  2. Tumor Regression Grades: Can They Influence Rectal Cancer Therapy Decision Tree?

    OpenAIRE

    Marisa D. Santos; Cristina Silva; Anabela Rocha; Eduarda Matos; Carlos Nogueira; Carlos Lopes

    2013-01-01

    Background. Evaluating impact of tumor regression grade in prognosis of patients with locally advanced rectal cancer (LARC). Materials and Methods. We identified from our colorectal cancer database 168 patients with LARC who received neoadjuvant therapy followed by complete mesorectum excision surgery between 2003 and 2011: 157 received 5-FU-based chemoradiation (CRT) and 11 short course RT. We excluded 29 patients, the remaining 139 were reassessed for disease recurrence and survival; the sl...

  3. Credit Card Fraud Detection using Decision Tree for Tracing Email and IP

    Directory of Open Access Journals (Sweden)

    Gayathiri.P

    2012-09-01

    Full Text Available Credit card fraud is a wide-ranging term for theft and fraud committed using a credit card or any similar payment mechanism as a fraudulent source of funds in a transaction. The purpose may be to obtain goods without paying, or to obtain unauthorized funds from an account. Transactions completed with credit cards have become more and more popular with the introduction of online shopping and banking. Correspondingly, the number of credit card frauds has also increased. Currently, data mining is a popular way to combat fraud because of its effectiveness. Data mining is a well-defined procedure that takes data as input and produces output in the form of models or patterns. In other words, the task of data mining is to analyze a massive amount of data and to extract usable information that we can interpret for future use.

  4. Decision tree learning for detecting turning points in business process orientation: a case of Croatian companies

    Directory of Open Access Journals (Sweden)

    Ljubica Milanović Glavan

    2015-03-01

    Full Text Available Companies worldwide are embracing Business Process Orientation (BPO in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods in detecting key turning points in BPO. Based on survey results obtained in 2013, the selected data mining technique of classification and regression trees (C&RT was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.

  5. Assessment of Poor College Student in Guizhou Province via C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李明江; 卢玉; 刘彦

    2013-01-01

    This paper proposes a C4.5 decision tree based approach for assessing poor college students in Guizhou Province. First, an index system for assessing students' eligibility for poverty support was established from three aspects: the students' consumption behavior, the economic condition of their families, and their loan and work-study status. Second, the 15 resulting indexes were taken as the feature attributes for the C4.5 decision tree; continuous attributes were discretized according to the information gain ratio, the knowledge was represented as a tree, and the tree was pruned using the prediction error, yielding the four most important attributes for characterizing poor students. Finally, real data were used to validate the proposed method. The results show that the method is simple in principle, intuitive to interpret, and fast and accurate to compute. Compared with similar methods, it neither relies on the statistical distribution of the data nor requires the selection of model parameters, so it is an effective technique for assessing poor college students.
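
    The discretization step mentioned above, choosing a cut point for a continuous attribute by information gain ratio, can be illustrated with a short Python sketch. It is a simplification under stated assumptions: candidate cut points are taken as midpoints between adjacent distinct values and ranked directly by gain ratio, which is not exactly how the reference C4.5 implementation selects thresholds, and the attribute (monthly spending) and its values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` versus `value > threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    gain = entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Try midpoints between adjacent distinct values; return the best, or None."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical monthly spending and eligibility labels.
spending = [300, 350, 400, 800, 900, 1000]
status = ["poor", "poor", "poor", "not poor", "not poor", "not poor"]

cut = best_threshold(spending, status)
print(cut, gain_ratio(spending, status, cut))
```

    Dividing the gain by the split information penalizes lopsided splits, which is the motivation for using gain ratio rather than raw information gain.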

  6. Nosocomial infections in Brazilian pediatric patients: using a decision tree to identify high mortality groups.

    Science.gov (United States)

    Lopes, Julia M M; Goulart, Eugenio M A; Siqueira, Arminda L; Fonseca, Inara K; Brito, Marcus V S de; Starling, Carlos E F

    2009-04-01

    Nosocomial infections (NI) are frequent events with potentially lethal outcomes. We identified predictive factors for mortality related to NI and developed an algorithm for predicting that risk in order to improve hospital epidemiology and healthcare quality programs. We made a prospective cohort NI surveillance of all acute-care patients according to the National Nosocomial Infections Surveillance System guidelines since 1992, applying the Centers for Disease Control and Prevention 1988 definitions adapted to a Brazilian pediatric hospital. Thirty-eight deaths considered to be related to NI were analyzed as the outcome variable for 754 patients with NI, whose survival time was taken into consideration. The predictive factors for mortality related to NI (p < 0.05 in the Cox regression model) were: invasive procedures and use of two or more antibiotics. The mean survival time was significantly shorter (p < 0.05 with the Kaplan-Meier method) for patients who suffered invasive procedures and for those who received two or more antibiotics. Applying a tree-structured survival analysis (TSSA), two groups with high mortality rates were identified: one group with time from admission to the first NI less than 11 days, received two or more antibiotics and suffered invasive procedures; the other group had the first NI between 12 and 22 days after admission and was subjected to invasive procedures. The possible modifiable factors to prevent mortality involve invasive devices and antibiotics. The TSSA approach is helpful to identify combinations of predictors and to guide protective actions to be taken in continuous-quality-improvement programs. PMID:20140354

  7. Tumor Regression Grades: Can They Influence Rectal Cancer Therapy Decision Tree?

    Directory of Open Access Journals (Sweden)

    Marisa D. Santos

    2013-01-01

    Full Text Available Background. Evaluating impact of tumor regression grade in prognosis of patients with locally advanced rectal cancer (LARC). Materials and Methods. We identified from our colorectal cancer database 168 patients with LARC who received neoadjuvant therapy followed by complete mesorectum excision surgery between 2003 and 2011: 157 received 5-FU-based chemoradiation (CRT) and 11 short course RT. We excluded 29 patients, the remaining 139 were reassessed for disease recurrence and survival; the slides of surgical specimens were reviewed and classified according to Mandard tumor regression grades (TRG). We compared patients with good response (Mandard TRG1 or TRG2) versus patients with bad response (Mandard TRG3, TRG4, or TRG5). Outcomes evaluated were 5-year overall survival (OS), disease-free survival (DFS), local, distant and mixed recurrence. Results. Mean age was 64.2 years, and median followup was 56 months. No statistically significant survival difference was found when comparing patients with Mandard TRG1 versus Mandard TRG2 (. Mandard good responders (TRG1 + 2) have significantly better OS and DFS than Mandard bad responders (TRG3 + 4 + 5) (OS ; DFS . Conclusions. Mandard good responders had a favorable prognosis. Tumor response (TRG) to neoadjuvant chemoradiation should be taken into account when defining the optimal adjuvant chemotherapy regimen for patients with LARC.

  8. Nosocomial infections in brazilian pediatric patients: using a decision tree to identify high mortality groups

    Directory of Open Access Journals (Sweden)

    Julia M.M. Lopes

    2009-04-01

    Full Text Available Nosocomial infections (NI) are frequent events with potentially lethal outcomes. We identified predictive factors for mortality related to NI and developed an algorithm for predicting that risk in order to improve hospital epidemiology and healthcare quality programs. We made a prospective cohort NI surveillance of all acute-care patients according to the National Nosocomial Infections Surveillance System guidelines since 1992, applying the Centers for Disease Control and Prevention 1988 definitions adapted to a Brazilian pediatric hospital. Thirty-eight deaths considered to be related to NI were analyzed as the outcome variable for 754 patients with NI, whose survival time was taken into consideration. The predictive factors for mortality related to NI (p < 0.05 in the Cox regression model) were: invasive procedures and use of two or more antibiotics. The mean survival time was significantly shorter (p < 0.05 with the Kaplan-Meier method) for patients who suffered invasive procedures and for those who received two or more antibiotics. Applying a tree-structured survival analysis (TSSA), two groups with high mortality rates were identified: one group with time from admission to the first NI less than 11 days, received two or more antibiotics and suffered invasive procedures; the other group had the first NI between 12 and 22 days after admission and was subjected to invasive procedures. The possible modifiable factors to prevent mortality involve invasive devices and antibiotics. The TSSA approach is helpful to identify combinations of predictors and to guide protective actions to be taken in continuous-quality-improvement programs.

  9. The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

    Science.gov (United States)

    Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

    2014-09-01

    This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation.

  10. Decision tree analysis to assess the cost-effectiveness of yttrium microspheres for treatment of hepatic metastases from colorectal cancer

    International Nuclear Information System (INIS)

    Full text: The aim is to determine the cost-effectiveness of yttrium microsphere treatment of hepatic metastases from colorectal cancer, with and without FDG-PET for detection of extra-hepatic disease. A decision tree was created comparing two strategies for yttrium treatment with chemotherapy, one incorporating PET in addition to CT in the pre-treatment work-up, to a strategy of chemotherapy alone. The sensitivity and specificity of PET and CT were obtained from the Federal Government PET review. Imaging costs were obtained from the Medicare benefits schedule with an additional capital component added for PET (final cost $1200). The cost of yttrium treatment was determined by patient-tracking. Previously published reports indicated a mean gain in life-expectancy from treatment of 0.52 years. Patients with extra-hepatic metastases were assumed to receive no survival benefit. Cost effectiveness was expressed as incremental cost per life-year gained (ICER). Sensitivity analysis determined the effect of prior probability of extra-hepatic disease on cost-savings and cost-effectiveness. The cost of yttrium treatment including angiography, particle perfusion studies and bed-stays, was $10530. A baseline value for prior probability of extra-hepatic disease of 0.35 gave ICERs of $26,378 and $25,271 for the no-PET and PET strategies respectively. The PET strategy was less expensive if the prior probability of extra-hepatic metastases was greater than 0.16 and more cost-effective if above 0.28. Yttrium microsphere treatment is less cost-effective than other interventions for colon cancer but comparable to other accepted health interventions. Incorporating PET into the pre-treatment assessment is likely to save costs and improve cost-effectiveness. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
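
    As a rough illustration of the incremental cost-effectiveness ratio (ICER) used in this record, the sketch below rolls back a single chance node for a "treat every patient" strategy. It is a simplified sketch, not the authors' model: it takes the treatment cost ($10530), the mean life-year gain (0.52) and the baseline prior probability of extra-hepatic disease (0.35) from the abstract, ignores imaging and chemotherapy costs, and therefore does not reproduce the published ICERs. All function and variable names are illustrative.

```python
def expected_value(branches):
    """Roll back a chance node given a list of (probability, value) pairs."""
    total_p = sum(p for p, _ in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * v for p, v in branches)


def icer(delta_cost, delta_effect):
    """Incremental cost per unit of effect gained (here, per life-year)."""
    if delta_effect <= 0:
        raise ValueError("no incremental benefit; ICER is undefined")
    return delta_cost / delta_effect


# Values quoted in the abstract.
TREATMENT_COST = 10530.0     # yttrium treatment, dollars
LIFE_YEARS_GAINED = 0.52     # mean gain for patients who benefit
P_EXTRA_HEPATIC = 0.35       # baseline prior probability

# Simplifying assumption: every patient is treated, and patients with
# extra-hepatic disease gain nothing (as the abstract assumes).
extra_cost = TREATMENT_COST
extra_life_years = expected_value([
    (P_EXTRA_HEPATIC, 0.0),
    (1.0 - P_EXTRA_HEPATIC, LIFE_YEARS_GAINED),
])

print(f"expected life-years gained: {extra_life_years:.3f}")
print(f"simplified ICER: ${icer(extra_cost, extra_life_years):,.0f} per life-year")
```

    Lowering the prior probability of extra-hepatic disease raises the expected gain and lowers this simplified ICER, which is the kind of dependence the sensitivity analysis in the abstract explores.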

  11. Recent advances using rodent models for predicting human allergenicity

    International Nuclear Information System (INIS)

    The potential allergenicity of newly introduced proteins in genetically engineered foods has become an important safety evaluation issue. However, there are still no widely accepted and reliable test systems for evaluating the potential allergenicity and potency of new proteins in our food. The best-known allergy assessment proposal for foods derived from genetically engineered plants was the careful stepwise process presented in the so-called ILSI/IFBC decision tree. A revision of this decision tree strategy was proposed by a FAO/WHO expert consultation. As prediction of the sensitizing potential of the newly introduced protein based on animal testing was considered to be very important, animal models were introduced as one of the new test items, despite the fact that none of the currently studied models has been widely accepted and validated yet. In this paper, recent results from promising models developed in rat and mouse are summarized.

  12. Gene function classification using Bayesian models with hierarchy-based priors

    Directory of Open Access Journals (Sweden)

    Neal Radford M

    2006-10-01

    Full Text Available Background: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. Results: The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Conclusion: Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

  13. Comparison of three types of models for the prediction of final academic achievement

    Directory of Open Access Journals (Sweden)

    Silvana Gasar

    2002-12-01

    Full Text Available To prevent inappropriate secondary school choices, and thereby academic failure, school counselors need a tool for predicting an individual pupil's final academic achievement. Using data mining techniques on a pupil database and expert modeling, we developed several models for the prediction of final academic achievement in an individual high school educational program. For data mining, we used statistical analyses, clustering and two machine learning methods: developing classification decision trees and hierarchical decision models. Using the expert system shell DEX, an expert system based on a hierarchical multi-attribute decision model was developed manually. All the models were validated and evaluated from the viewpoint of their applicability. The predictive accuracy of DEX models and decision trees was equal and very satisfying, as it reached the predictive accuracy of an experienced counselor. With respect to efficiency and the difficulty of developing models, and given the relatively rapid changes in our education system, we propose that decision trees be used in the further development of predictive models.

  14. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-07-01

    Full Text Available Since the introduction of in vitro fertilization (IVF) in clinical practice of infertility treatment, the indicators for high quality embryos have been investigated. Cumulus cells (CC) have a specific gene expression profile according to the developmental potential of the oocyte they are surrounding, and therefore, specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in prediction value of embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR)] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC) for two prediction models was calculated. Among tested genes, AMHR2 and LIF showed significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. Two different prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.
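
    The area under the ROC curve reported for both models can be read as the probability that a randomly chosen high quality embryo receives a higher model score than a randomly chosen low quality one. A minimal sketch of that pairwise calculation follows; the scores and labels are hypothetical, not data from the study, and the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for the positive class, 0 for the negative class.
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for six embryos (1 = high quality).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(f"AUC = {auc(scores, labels):.3f}")
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values near 0.72 to 0.73 indicate moderate discrimination.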

  15. Application of decision tree in study of factors affecting residential medical treatment service

    Institute of Scientific and Technical Information of China (English)

    刘海霞; 钟晓妮; 周燕荣; 田考聪

    2011-01-01

    Objective: To identify the main factors affecting residents' use of medical consultation services in the Chongqing area, in order to meet more residents' health service needs and improve health service utilization. Methods: A decision tree model of the factors affecting the residents' consultation rate was constructed, with a view to adopting different health policies for the factors that affect different population groups in the Chongqing area. Results: Among the 11 570 residents surveyed, there were 2 447 consultations in total, 2.1 per person on average, and the two-week consultation rate was 21.15% (12.58% urban, 29.19% rural), higher than the national average. The consultation rate was lower in the middle age groups and higher at both ends of the age range, and the differences between age groups were statistically significant (P<0.05). The decision tree had 17 nodes, corresponding to 17 classification rules. Its root node was occupation type, the variable with the greatest influence on the consultation rate. Occupation type, age, resident type, insurance status and annual household income had a strong influence on consultation, and the selected factors affected different population groups differently. Conclusion: Utilization of consultation services among residents of the Chongqing area is relatively high, and the influencing factors differ between population groups, so health service planning should propose health policies tailored to each group.

  16. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    OpenAIRE

    Wided Khiari

    2013-01-01

    This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosed scores to examine corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 has been carried out and a disclosure index developed to determine the level of disclosure of the companies. The disclosure quality is appreciated through the quantity and also through th...

  17. Establishing diagnostic platform for environmental biosafety assessment of genetically modified plants based on the decision-tree method

    OpenAIRE

    Lei Wang; Chao Yang; Bao-Rong Lu

    2010-01-01

    Transgenic biotechnology and its products provide important solutions for the great challenge of global food security. Biosafety assessment of genetically modified organisms (GMOs) including their food and environmental safety is a prerequisite for the commercialization and safe application of transgenic biotechnology products. However, existing methodologies cannot meet the urgent requirements for rapid biosafety assessment of the increasing number of new and sophisticated GMOs. Therefore, a ...

  18. Millon's Personality Model and ischemic cardiovascular acute episodes: Profiles of risk in a decision tree

    Directory of Open Access Journals (Sweden)

    María M. Richard's

    2008-01-01

    Full Text Available Identifying risk subgroups allows clinical psychologists to develop specific interventions for those subgroups. The main purpose of this work was to find statistical associations between personality characteristics (traits and disorders) and the occurrence of acute ischemic cardiovascular episodes, according to Theodore Millon's personality model. The analyses were based on a sample of 313 women and men between 31 and 80 years of age, divided into two groups: a clinical group of 143 participants hospitalized because of acute ischemic cardiovascular episodes and a control group of 170 people with no history of cardiovascular disease. The results showed four personality risk profiles associated with the occurrence of acute ischemic episodes, which enables clinical psychologists to design specific interventions for those subgroups.

  19. Application of the Decision Tree in Tennis Training

    Institute of Scientific and Technical Information of China (English)

    冯能山; 龙超; 熊金志; 廖国君

    2014-01-01

    Applications of data mining in sports are still relatively rare. Making good use of sports training data and extracting useful information from it is an important task for data mining in the sports field. The decision tree method is a commonly used data mining technique. This paper applies the decision tree method to tennis training, mining the relevant data to build a decision tree for tennis training. The resulting tree helps sports staff design more rational tennis training programs and improve the efficiency of tennis training.

  20. Application of analyzing influencing factors of life pressure in college students by decision tree

    Institute of Scientific and Technical Information of China (English)

    陈新林; 包生耿; 颜伟红; 王小广; 万建成; 吴丹桂

    2013-01-01

    Objective: To understand the distribution and influencing factors of life stress among college students in Guangzhou, providing a scientific basis for mental health education. Methods: Students at five colleges in the Guangzhou area were surveyed with the "Youth Life Event Scale" and a basic demographic questionnaire. A logistic model (forward variable selection) was fitted in SPSS 13.0 to explore the factors influencing the total stress score, and decision trees for the total stress score were built with the C5.0 algorithm in Clementine and the CHAID algorithm in Answer Tree. Results: The factors influencing students' life stress were economic situation, interpersonal relationships, number of children in the family, and part-time work. The C5.0 decision tree branched on interpersonal relationships, economic situation and number of children in the family. The CHAID decision tree branched on economic situation, interpersonal relationships, number of children in the family and part-time work. The proportion under life stress was highest (68.84%) among students with both poor economic conditions and poor interpersonal relationships. Conclusions: Mental health education and guidance should be tailored to the characteristics of the different subgroups, with particular attention to students who have poor interpersonal relationships, have poor economic conditions, or are only children.

  1. Application of analyzing influencing factors of life pressure in college students by decision tree

    Institute of Scientific and Technical Information of China (English)

    陈新林; 包生耿; 颜伟红; 王小广; 万建成; 吴丹桂

    2013-01-01

    Objective: To understand the distribution and influencing factors of life stress among college students in Guangzhou, providing a scientific basis for mental health education. Methods: Students at five colleges in the Guangzhou area were surveyed with the "Youth Life Event Scale" and a basic demographic questionnaire. A logistic model (forward variable selection) was fitted in SPSS 13.0 to explore the factors influencing the total stress score, and decision trees for the total stress score were built with the C5.0 algorithm in Clementine and the CHAID algorithm in Answer Tree. Results: The factors influencing students' life stress were economic situation, interpersonal relationships, number of children in the family, and part-time work. The C5.0 decision tree branched on interpersonal relationships, economic situation and number of children in the family. The CHAID decision tree branched on economic situation, interpersonal relationships, number of children in the family and part-time work. The proportion under life stress was highest (68.84%) among students with both poor economic conditions and poor interpersonal relationships. Conclusions: Mental health education and guidance should be tailored to the characteristics of the different subgroups, with particular attention to students who have poor interpersonal relationships, have poor economic conditions, or are only children.

  2. Prediction of Frost Occurrences Using Statistical Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Hyojin Lee

    2016-01-01

    Full Text Available We developed frost prediction models for spring in Korea using logistic regression and decision tree techniques. Hit Rate (HR), Probability of Detection (POD), and False Alarm Rate (FAR) from both models were calculated and compared. Threshold values for the logistic regression models were selected to maximize HR and POD and minimize FAR for each station, and the split for the decision tree models was stopped when change in entropy was relatively small. Average HR values were 0.92 and 0.91 for logistic regression and decision tree techniques, respectively, average POD values were 0.78 and 0.80 for logistic regression and decision tree techniques, respectively, and average FAR values were 0.22 and 0.28 for logistic regression and decision tree techniques, respectively. The average numbers of selected explanatory variables were 5.7 and 2.3 for logistic regression and decision tree techniques, respectively. Fewer explanatory variables can be more appropriate for operational activities to provide a timely warning for the prevention of the frost damages to agricultural crops. We concluded that the decision tree model can be more useful for the timely warning system. It is recommended that the models should be improved to reflect local topological features.
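
    The abstract does not state how HR, POD and FAR were computed. The sketch below assumes the definitions commonly used in forecast verification for a 2x2 contingency table: HR as the fraction of all forecasts that were correct, POD as hits over observed events, and FAR as false alarms over forecast events. The counts are hypothetical and the names are illustrative.

```python
from collections import namedtuple

Scores = namedtuple("Scores", "hit_rate pod far")


def verification_scores(hits, misses, false_alarms, correct_negatives):
    """Scores from a 2x2 forecast/observation contingency table.

    Assumed definitions (not given in the abstract):
      hit_rate = (hits + correct_negatives) / total
      pod      = hits / (hits + misses)
      far      = false_alarms / (hits + false_alarms)
    """
    total = hits + misses + false_alarms + correct_negatives
    if total == 0 or hits + misses == 0 or hits + false_alarms == 0:
        raise ValueError("table has no events or no event forecasts")
    return Scores(
        hit_rate=(hits + correct_negatives) / total,
        pod=hits / (hits + misses),
        far=false_alarms / (hits + false_alarms),
    )


# Hypothetical counts for one station over 200 spring nights.
s = verification_scores(hits=40, misses=10, false_alarms=10, correct_negatives=140)
print(f"HR={s.hit_rate:.2f}  POD={s.pod:.2f}  FAR={s.far:.2f}")
```

    Under these definitions a model can keep a high HR simply because frost-free nights dominate, which is why POD and FAR are reported alongside it.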

  3. USING AN ACTIVE FUZZY ECA RULE-BASED NEGOTIATION AGENT IN E-COMMERCE

    Directory of Open Access Journals (Sweden)

    Farnaz Mahan

    2011-12-01

    Full Text Available E-commerce is considered a key service within modern information society, and the idea of automating e-commerce transactions has attracted much interest in recent years. A multi-agent model is a system that applies various autonomous agents to accomplish specified goals. Such a system addresses resource allocation issues. Because the nature of resource trading requires multiple agents to request geographically dispersed heterogeneous resources, we use a multi-agent architecture for e-commerce, in which each agent can represent a participant intelligently. In this paper, negotiation agents based on fuzzy ECA rules are proposed. We focus on agents in e-commerce that negotiate between sellers and buyers in order to get the best deal. The negotiation process between buyers and sellers begins through combined and fairness protocols. We add learning properties to agents based on a fuzzy decision tree to develop negotiation skills and present the results. Using a fuzzy decision tree helps the agent understand and adapt to other agents' behavior and real-world conditions in real time in order to produce the best contracts. Thus, the agent can improve its negotiation skills by updating its fuzzy decision tree.

  4. Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-10-01

    We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r [...] resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r~18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r~20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.

  5. Establishment of the threshold of toxicological concern with decision tree approach and its application in food contact materials

    Institute of Scientific and Technical Information of China (English)

    隋海霞; 张磊; 毛伟峰; 李建文; 刘爱东; 刘兆平

    2012-01-01

    Objective: To establish a threshold of toxicological concern (TTC) risk assessment approach applicable to food contact materials, using bis(2-ethylhexyl) phthalate (DEHP) as a model chemical. Methods: A TTC decision tree approach based on Cramer structural classification was established. DEHP was assigned a Cramer class using both the Cramer classification scheme and the Toxtree software. DEHP exposure through beverages, vegetable oil, fermented milk, instant noodles, jelly and fruit jam was estimated for different age groups in China, using data from the 2002 Chinese National Nutrition and Health Survey and DEHP surveillance data for selected foods. DEHP was then assessed with the TTC decision tree approach, and a conventional risk assessment was carried out for verification. Results: DEHP belongs to Cramer class I, for which the TTC is 30 μg/kg BW. The maximum DEHP exposure was 4.06 μg/kg BW in the general population and 11.10 μg/kg BW across the four age groups, or 13.5% and 37.0% of the TTC, respectively. Against the health-based guidance value for DEHP, the tolerable daily intake (TDI) of 50 μg/kg BW, these maximum exposures were 8.1% and 22.2% of the TDI. The two methods gave essentially consistent results. Conclusion: The TTC decision tree approach is an effective risk assessment tool that can be used for priority screening and preliminary assessment of food contact materials. The health risk from dietary DEHP in the Chinese population is low and does not warrant health concern.
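
    The percentages reported in this record follow from dividing each maximum exposure estimate by the reference value it is compared against. The short check below uses only figures quoted in the abstract (30 μg/kg BW for the Cramer class I TTC, 50 μg/kg BW for the TDI, and maximum exposures of 4.06 and 11.10 μg/kg BW); the function and variable names are illustrative.

```python
def percent_of_limit(exposure, limit):
    """Exposure expressed as a percentage of a reference value (same units)."""
    if limit <= 0:
        raise ValueError("reference value must be positive")
    return 100.0 * exposure / limit


TTC_CRAMER_I = 30.0   # ug/kg BW, TTC quoted in the abstract
TDI_DEHP = 50.0       # ug/kg BW, tolerable daily intake quoted in the abstract

max_exposure = {
    "general population": 4.06,   # ug/kg BW
    "four age groups": 11.10,     # ug/kg BW
}

for group, exposure in max_exposure.items():
    of_ttc = percent_of_limit(exposure, TTC_CRAMER_I)
    of_tdi = percent_of_limit(exposure, TDI_DEHP)
    print(f"{group}: {of_ttc:.1f}% of TTC, {of_tdi:.1f}% of TDI")
```

    Because the TTC for this class is lower than the TDI, the TTC comparison is the more conservative of the two, and both stay well below 100%.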

  6. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    Directory of Open Access Journals (Sweden)

    S. Galelli

    2013-02-01

    Full Text Available Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies, the Marina catchment (Singapore) and the Canning River (Western Australia), representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
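
    Much of the computational efficiency attributed to Extra-Trees comes from how a node is split: instead of searching every feature for its best cut-point, a few features are drawn at random, each gets a single cut-point drawn uniformly between its minimum and maximum in the node, and the best of those candidates is kept. The sketch below illustrates that rule for regression using variance reduction as the score. It follows the method as it is usually described, not the authors' code, and the data, names and parameter k are hypothetical.

```python
import random


def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def random_split(rows, targets, k, rng):
    """Pick the best of up to k random (feature, cut-point) candidates.

    rows: list of equal-length feature lists; targets: list of floats.
    Returns (feature_index, cut_point, variance_reduction), or None when
    no candidate yields two non-empty children.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), min(k, n_features))
    parent_var = variance(targets)
    best = None
    for j in candidates:
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature in this node, cannot split
        cut = rng.uniform(lo, hi)  # drawn at random, not optimized
        left = [t for x, t in zip(column, targets) if x < cut]
        right = [t for x, t in zip(column, targets) if x >= cut]
        if not left or not right:
            continue
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(targets)
        gain = parent_var - weighted
        if best is None or gain > best[2]:
            best = (j, cut, gain)
    return best


# Hypothetical node: feature 0 is informative, feature 1 is constant.
rows = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
targets = [1.0, 1.0, 3.0, 3.0]
rng = random.Random(0)
split = random_split(rows, targets, k=2, rng=rng)
print(split)
```

    A single tree built this way is noisy, which is why the method averages many such trees. The randomness lowers the cost per split and decorrelates the ensemble members.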

  7. A Dynamic Web Page Prediction Model Based on Access Patterns to Offer Better User Latency

    CERN Document Server

    Mukhopadhyay, Debajyoti; Saha, Dwaipayan; Kim, Young-Chon

    2011-01-01

    The growth of the World Wide Web has emphasized the need for improvement in user latency. One of the techniques that are used for improving user latency is Caching and another is Web Prefetching. Approaches that bank solely on caching offer limited performance improvement because it is difficult for caching to handle the large number of increasingly diverse files. Studies have been conducted on prefetching models based on decision trees, Markov chains, and path analysis. However, the increased uses of dynamic pages, frequent changes in site structure and user access patterns have limited the efficacy of these static techniques. In this paper, we have proposed a methodology to cluster related pages into different categories based on the access patterns. Additionally we use page ranking to build up our prediction model at the initial stages when users haven't already started sending requests. This way we have tried to overcome the problems of maintaining huge databases which is needed in case of log based techn...

  8. Mathematical models for economic evaluation: dynamic models based on differential equations

    Directory of Open Access Journals (Sweden)

    Roberto Pradas Velasco

    2009-10-01

    Full Text Available The joint use of decision trees and epidemiological models based on differential equations is an appropriate method for the economic evaluation of preventive interventions against infectious diseases. These models can combine the dynamic behavior of the disease with health resource consumption. To illustrate this type of model, we fitted a dynamic system of differential equations to the epidemic behavior of influenza in Spain, with a view to projecting the epidemiological impact of influenza vaccination. The results of the dynamic model are implemented in a diagram with the structure of a decision tree so that health resource consumption and its monetary implications can be calculated.
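
    The coupling described in this record can be sketched in two steps: a differential-equation model projects how many people are infected under a given vaccination coverage, and a tree-style roll-back converts that projection into an expected cost. The example below uses a basic SIR model integrated with forward Euler steps. The model form, the transmission and recovery rates, the coverage and the unit costs are all hypothetical assumptions chosen for illustration, not values from the paper.

```python
def sir_attack_rate(beta, gamma, vaccinated, i0=1e-4, dt=0.1, days=365):
    """Fraction of the population ever infected in a simple SIR epidemic.

    Forward-Euler integration of
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - gamma * I
    `vaccinated` is the fraction assumed fully protected from the start.
    """
    s = 1.0 - vaccinated - i0
    i = i0
    infected_total = i0
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        infected_total += new_infections
    return infected_total


def expected_cost_per_person(attack_rate, vaccinated, cost_vaccine, cost_case):
    """Roll back a two-branch tree: vaccination cost plus expected case cost."""
    return vaccinated * cost_vaccine + attack_rate * cost_case


# Hypothetical parameters: basic reproduction number beta / gamma = 2.
BETA, GAMMA = 0.5, 0.25
COST_VACCINE, COST_CASE = 10.0, 100.0

for coverage in (0.0, 0.3):
    attack = sir_attack_rate(BETA, GAMMA, vaccinated=coverage)
    cost = expected_cost_per_person(attack, coverage, COST_VACCINE, COST_CASE)
    print(f"coverage={coverage:.0%}  attack rate={attack:.3f}  cost/person={cost:.2f}")
```

    The dynamic model matters because vaccination also protects the unvaccinated by slowing transmission, an indirect effect that a static decision tree with a fixed infection probability would miss.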

  9. Realization and Application of Customer Attrition Early Warning Model in Security Company

    Directory of Open Access Journals (Sweden)

    Shen Yizhen

    2012-09-01

    Full Text Available In this paper, we propose a customer attrition early warning model based on data warehouse and data mining technologies, which has been implemented and applied in our security company. The modeling variables are selected by combining a decision tree with stepwise selection in logistic regression, and the early warning model is then built with logistic regression. The results show that the model substantially raises the capture rate of customers at risk of attrition, supports the development of the company's customer marketing and customer service management, and reduces marketing costs, which in turn helps improve the company's profits and competitiveness.

  10. An Assessment of the Effectiveness of Tree-Based Models for Multi-Variate Flood Damage Assessment in Australia

    Directory of Open Access Journals (Sweden)

    Roozbeh Hasanzadeh Nafari

    2016-07-01

    Full Text Available Flood is a frequent natural hazard that has significant financial consequences for Australia. In Australia, physical losses caused by floods are commonly estimated by stage-damage functions. These methods usually consider only the depth of the water and the type of buildings at risk. However, flood damage is a complicated process, and it is dependent on a variety of factors which are rarely taken into account. This study explores the interaction, importance, and influence of water depth, flow velocity, water contamination, precautionary measures, emergency measures, flood experience, floor area, building value, building quality, and socioeconomic status. The study uses tree-based models (regression trees and bagging decision trees) and a dataset collected from 2012 to 2013 flood events in Queensland, which includes information on structural damages, impact parameters, and resistance variables. The tree-based approaches show water depth, floor area, precautionary measures, building value, and building quality to be important damage-influencing parameters. Furthermore, the performance of the tree-based models is validated and contrasted with the outcomes of a multi-parameter loss function (FLFArs) from Australia. The tree-based models are shown to be more accurate than the stage-damage function. Consequently, considering more parameters and taking advantage of tree-based models is recommended. The outcome is important for improving established Australian flood loss models and assisting decision-makers and insurance companies dealing with flood risk assessment.

  11. Network Traffic Anomalies Identification Based on Classification Methods

    Directory of Open Access Journals (Sweden)

    Donatas Račys

    2015-07-01

    Full Text Available The problem of detecting network traffic anomalies in computer networks is analyzed. An overview of anomaly detection methods is given, and the advantages and disadvantages of the different methods are analyzed. A model for traffic anomaly detection was developed in IBM SPSS Modeler and used to analyze SNMP data from a router. Traffic anomalies were investigated using three classification methods and different sets of learning data. The investigation determined that the C5.1 decision tree method has the highest accuracy and performance and can be successfully used to identify network traffic anomalies.

  12. Decision Tree Technology Application in the Clients Division of Hospital

    Institute of Scientific and Technical Information of China (English)

    罗强

    2011-01-01

    Using the historical inpatient business data of a provincial maternal and child health (MCH) hospital as a sample, this paper builds a segmentation model of the hospital's inpatient customers with the decision tree method of data mining and derives classification rules, on the basis of which inpatient customers are divided into different groups. Through customer segmentation and analysis of group characteristics, the hospital can clearly identify its key customers and offer key customer groups personalized service customized to their needs. This will greatly improve the loyalty and satisfaction of these customers and thereby ensure the long-term stability of the hospital's main sources of profit and income.

  13. Research of H5N6 Treatment by Comparing with H6N1 and H10N8 by Using Decision Tree and Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Kim Sunghyun

    2016-01-01

    Full Text Available Since 2003, 608 people in 15 countries have been infected with human-infectious AI viruses, and 359 of them died. In China in particular, the H6N1 and H10N8 viruses were widespread, and many people were infected and died. Recently, the H5N6 virus emerged in China, and the number of patients has been increasing gradually. Therefore, this research compared the amino acid sequences of the Matrix Protein, Hemagglutinin, Neuraminidase and Nucleoprotein of H5N6, H6N1 and H10N8, using a decision tree and the Apriori algorithm, to determine their similarity and devise a treatment. As a result, the Matrix protein and Nucleoprotein sequences of H5N6 were found to be similar to those of H6N1 and H10N8. This research therefore concluded that treatments targeting those proteins of H6N1 and H10N8 will also be effective against H5N6.

  14. EVFDT: An Enhanced Very Fast Decision Tree Algorithm for Detecting Distributed Denial of Service Attack in Cloud-Assisted Wireless Body Area Network

    Directory of Open Access Journals (Sweden)

    Rabia Latif

    2015-01-01

    Full Text Available Due to the scattered nature of DDoS attacks and the advancement of new technologies such as cloud-assisted WBAN, it becomes challenging to detect malicious activities by relying on conventional security mechanisms. The detection of such attacks demands an adaptive and incremental learning classifier capable of accurate decision making with less computation. However, DDoS attack detection using existing machine learning techniques requires the full data set to be stored in memory and is not appropriate for real-time network traffic. To overcome these shortcomings, the Very Fast Decision Tree (VFDT) algorithm, which can handle high-speed streaming data efficiently, was proposed in the past. In the data generated by WBAN sensors, however, noise is an unavoidable aspect that severely affects accuracy and increases false alarms. In this paper, an enhanced VFDT (EVFDT) is proposed to efficiently detect the occurrence of DDoS attacks in cloud-assisted WBAN. EVFDT uses an adaptive tie-breaking threshold for node splitting. To resolve the tree size expansion under extreme noise, a lightweight iterative pruning technique is proposed. To analyze the performance of EVFDT, four metrics are evaluated: classification accuracy, tree size, time, and memory. Simulation results show that EVFDT attains significantly high detection accuracy with fewer false alarms.
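
    The node-splitting test that VFDT-style learners rely on can be sketched as follows. This is a minimal illustration of the standard Hoeffding-bound split rule with a fixed tie-breaking threshold, not the EVFDT implementation from the record above; the function names and the default values of delta and tie_threshold are assumptions chosen for the example, and EVFDT itself is described as adapting the tie threshold rather than fixing it.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound for the mean of n observations of a bounded variable."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf that has seen n examples should be split.

    best_gain and second_gain are the information gains of the two best
    candidate attributes (hypothetical inputs supplied by the caller).
    """
    # Information gain lies in [0, log2(num_classes)].
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is reliably better than the runner-up,
    # or when the bound is so small that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

    With 200 examples the bound is about 0.2, so a gain difference of 0.25 triggers a split while a difference of 0.05 does not; with 5000 examples the bound falls below the tie threshold and the leaf is split even when the two candidates are nearly equal.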

  15. Fault Detection and Diagnosis for Gas Turbines Based on a Kernelized Information Entropy Model

    Directory of Open Access Journals (Sweden)

    Weiying Wang

    2014-01-01

    Full Text Available Gas turbines are considered among the most important devices in power engineering and have been widely used in power generation, airplanes, naval ships, and oil drilling platforms. However, in most cases they are monitored without staff on duty. It is highly desirable to develop techniques and systems to remotely monitor their conditions and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall states of the gas paths of a gas turbine. In addition, we extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a certain recognition task. Finally, we introduce the information entropy based decision tree algorithm to extract rules from fault samples. The experiments on some real-world data show the effectiveness of the proposed algorithms.
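
    The idea of using entropy as a uniformity measure can be illustrated with plain Shannon entropy. The sketch below is not the generalized or kernelized entropy of the record above; it assumes positive temperature readings from at least two sensors, and the function name and sample values are hypothetical.

```python
import math


def normalized_entropy(values):
    """Shannon entropy of positive readings, scaled to the range [0, 1].

    The readings are normalized to sum to 1 and treated as a probability
    distribution; a result of 1.0 means all readings are identical.
    """
    total = sum(values)
    probs = [v / total for v in values]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(values))


# Hypothetical exhaust thermocouple readings in degrees Celsius.
uniform = [520.0] * 6
one_hot_spot = [520.0, 520.0, 520.0, 520.0, 520.0, 640.0]
```

    Here normalized_entropy(uniform) is 1.0 up to rounding, and the value for one_hot_spot is slightly lower, which is the kind of drop a monitoring rule could watch for.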

  16. Fault detection and diagnosis for gas turbines based on a kernelized information entropy model.

    Science.gov (United States)

    Wang, Weiying; Xu, Zhiqiang; Tang, Rui; Li, Shuying; Wu, Wei

    2014-01-01

    Gas turbines are considered among the most important devices in power engineering and have been widely used in power generation, airplanes, naval ships, and oil drilling platforms. However, in most cases they are monitored without staff on duty. It is highly desirable to develop techniques and systems to remotely monitor their conditions and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall states of the gas paths of a gas turbine. In addition, we extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a certain recognition task. Finally, we introduce the information entropy based decision tree algorithm to extract rules from fault samples. The experiments on some real-world data show the effectiveness of the proposed algorithms. PMID:25258726

  18. Model-based geostatistics

    CERN Document Server

    Diggle, Peter J

    2007-01-01

    Model-based geostatistics refers to the application of general statistical principles of modeling and inference to geostatistical problems. This volume provides a treatment of model-based geostatistics and emphasizes statistical methods and applications. It also features analyses of datasets from a range of scientific contexts.

  19. A Study on the Application of the Decision Tree Algorithm in Psychological Information of Vocational College Students

    Directory of Open Access Journals (Sweden)

    Cheng Dongmei

    2015-01-01

    Full Text Available This paper discusses the basic operating principle and the development status of data mining technology, analyzes the insufficiency of the existing psychological management system, and proposes the development trend of psychological health education in colleges. According to an analysis on factors affecting college students’ mental health and the deviation between the reality and the current number of students with psychological abnormality, this paper studies the application of data mining technology and puts forward a system based on data mining that combines the classified data mining technology with the existing psychological management system.

  20. A best-first soft/hard decision tree searching MIMO decoder for a 4 × 4 64-QAM system

    KAUST Repository

    Shen, Chungan

    2012-08-01

    This paper presents the algorithm and VLSI architecture of a configurable tree-searching approach that combines the features of classical depth-first and breadth-first methods. Based on this approach, techniques to reduce complexity while providing both hard and soft outputs decoding are presented. Furthermore, a single programmable parameter allows the user to tradeoff throughput versus BER performance. The proposed multiple-input-multiple-output decoder supports a 4 × 4 64-QAM system and was synthesized with 65-nm CMOS technology at 333 MHz clock frequency. For the hard output scheme the design can achieve an average throughput of 257.8 Mbps at 24 dB signal-to-noise ratio (SNR) with area equivalent to 54.2 Kgates and a power consumption of 7.26 mW. For the soft output scheme it achieves an average throughput of 83.3 Mbps across the SNR range of interest with an area equivalent to 64 Kgates and a power consumption of 11.5 mW. © 2011 IEEE.

  1. Analysis of soil attributes and sugarcane yield using geostatistics and decision trees

    Directory of Open Access Journals (Sweden)

    Zigomar Menezes de Souza

    2010-04-01

    , applying the cell criterion, by using a yield monitor that allowed the elaboration of a digital map representing the surface of production of the studied area. To determine the soil attributes, soil samples were collected at the beginning of the harvest in 2006/2007 using a regular grid of 50 x 50 m, at depths of 0.0-0.2 m and 0.2-0.4 m. Soil attributes and sugarcane yield data were analyzed using geostatistical techniques and were classified into three yield levels for the elaboration of the decision tree. The decision tree was induced in the software SAS Enterprise Miner, using an algorithm based on entropy reduction. Altitude and potassium presented the highest correlations with sugarcane yield. The induction of decision trees showed that altitude is the variable with the greatest potential to interpret the sugarcane yield maps, thus assisting precision agriculture and proving a suitable tool for studying the definition of management zones in areas cropped with sugarcane.
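
    Entropy reduction, the split criterion named in the record above, can be sketched for a single numeric attribute as follows. This is a generic illustration, not the SAS Enterprise Miner algorithm; the function names, the altitude values, and the yield labels are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_reduction(values, labels, threshold):
    """Entropy reduction obtained by splitting at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


# Hypothetical altitude (m) and yield class for six sampling points.
altitude = [510, 520, 530, 560, 570, 580]
yield_class = ["low", "low", "low", "high", "high", "high"]
```

    A tree inducer evaluates many candidate thresholds and keeps the one with the largest reduction; in this toy data a split at 545 m separates the classes perfectly, while a split at 515 m gains far less.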

  2. Predicting aquatic toxicities of chemical pesticides in multiple test species using nonlinear QSTR modeling approaches.

    Science.gov (United States)

    Basant, Nikita; Gupta, Shikha; Singh, Kunwar P

    2015-11-01

    In this study, we established nonlinear quantitative structure-toxicity relationship (QSTR) models for predicting the toxicities of chemical pesticides in multiple aquatic test species following the OECD (Organization for Economic Cooperation and Development) guidelines. The decision tree forest (DTF) and decision tree boost (DTB) based QSTR models were constructed using a pesticide toxicity dataset for Selenastrum capricornutum and a set of six descriptors. Six other toxicity data sets were used for external validation of the constructed QSTRs. Global QSTR models were also constructed using the combined dataset of all seven species. The diversity in chemical structures and nonlinearity in the data were evaluated. Model validation was performed by deriving several statistical coefficients for the test data, and the prediction and generalization abilities of the QSTRs were evaluated. Both QSTR models identified WPSA1 (weighted charged partial positive surface area) as the most influential descriptor. The DTF and DTB QSTRs performed better than the single decision tree (SDT) and support vector machine (SVM) models used as benchmarks here and yielded R² of 0.886 and 0.964, respectively, between the measured and predicted toxicity values in the complete dataset (S. capricornutum). The QSTR models applied to toxicity data for six other aquatic species yielded R² of >0.92 (DTF) and >0.97 (DTB). The prediction accuracies of the global models were comparable with those of the S. capricornutum models. The results suggest that the developed QSTR models can reliably predict the aquatic toxicity of chemicals and can be used for regulatory purposes.

  3. An improved classification tree analysis of high cost modules based upon an axiomatic definition of complexity

    Science.gov (United States)

    Tian, Jianhui; Porter, Adam; Zelkowitz, Marvin V.

    1992-01-01

    Identification of high cost modules has been viewed as one mechanism to improve overall system reliability, since such modules tend to produce more than their share of problems. A decision tree model was used to identify such modules. In this current paper, a previously developed axiomatic model of program complexity is merged with the previously developed decision tree process for an improvement in the ability to identify such modules. This improvement was tested using data from the NASA Software Engineering Laboratory.

  4. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Directory of Open Access Journals (Sweden)

    Tochukwu Chiagunye

    2015-08-01

    Full Text Available The design applied passive fault tolerance to a microcontroller-based data acquisition system to achieve the stated design considerations: redundant sensors and microcontrollers with associated circuitry were designed and implemented to measure pollutant concentrations from chimney vents in two industries. Microsoft Visual Basic was used to develop a data mining tool that implemented an underlying artificial neural network (ANN) model for forecasting pollutant concentrations for future time periods. The feed-forward backpropagation method was used to train the ANN model with a training data set, while a decision tree algorithm was used to select an optimal output for the model from its two output neurons.

  5. Diagnosis of constant faults in read-once contact networks over finite bases

    KAUST Repository

    Busbait, Monther I.

    2015-03-01

    We study the depth of decision trees for diagnosis of constant 0 and 1 faults in read-once contact networks over finite bases containing only indecomposable networks. For each basis, we obtain a linear upper bound on the minimum depth of decision trees depending on the number of edges in the networks. For bases containing networks with at most 10 edges we find coefficients for linear bounds which are close to sharp. © 2014 Elsevier B.V. All rights reserved.

  6. Diagnosis of three types of constant faults in read-once contact networks over finite bases

    KAUST Repository

    Busbait, Monther

    2016-03-24

    We study the depth of decision trees for diagnosis of three types of constant faults in read-once contact networks over finite bases containing only indecomposable networks. For each basis and each type of faults, we obtain a linear upper bound on the minimum depth of decision trees depending on the number of edges in networks. For bases containing networks with at most 10 edges, we find sharp coefficients for linear bounds.

  7. CLINICAL DATABASE ANALYSIS USING DMDT BASED PREDICTIVE MODELLING

    Directory of Open Access Journals (Sweden)

    Srilakshmi Indrasenan

    2013-04-01

    Full Text Available In recent years, predictive data mining techniques have played a vital role in the field of medical informatics. These techniques help medical practitioners predict various classes, which is useful for treatment prediction. One major difficulty is predicting the survival rate of breast cancer patients. Breast cancer is a common disease, and fighting it is a tough battle for both surgeons and patients. To predict the survivability rate of breast cancer patients, which helps the medical practitioner select the type of treatment, a predictive data mining technique called Diversified Multiple Decision Tree (DMDT) classification is used. Additionally, to avoid difficulties caused by outliers and skewed data, the training space is improved by outlier filtering and oversampling. As a result, this approach gives the survivability rate of cancer patients, on the basis of which medical practitioners can choose the type of treatment.

  8. Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling.

    Directory of Open Access Journals (Sweden)

    Christos T Nakas

    Full Text Available Electronic Health Record (EHR) data can be a key resource for decision-making support in clinical practice in the "big data" era. The complete database from early 2012 to late 2015 involving hospital admissions to Inselspital Bern, the largest Swiss University Hospital, was used in this study, involving over 100,000 admissions. Age, sex, and initial laboratory test results were the features/variables of interest for each admission, the outcome being inpatient mortality. Computational decision support systems were utilized for the calculation of the risk of inpatient mortality. We assessed the recently proposed Acute Laboratory Risk of Mortality Score (ALaRMS) model, and further built generalized linear models, generalized estimating equations, artificial neural networks, and decision tree systems for the predictive modeling of the risk of inpatient mortality. The Area Under the ROC Curve (AUC) for ALaRMS marginally corresponded to the anticipated accuracy (AUC = 0.858). Penalized logistic regression methodology provided a better result (AUC = 0.872). Decision tree and neural network-based methodology provided even higher predictive performance (up to AUC = 0.912 and 0.906, respectively). Additionally, decision tree-based methods can efficiently handle Electronic Health Record (EHR) data that have a significant amount of missing records (in up to >50% of the studied features) eliminating the need for imputation in order to have complete data. In conclusion, we show that statistical learning methodology can provide superior predictive performance in comparison to existing methods and can also be production ready. Statistical modeling procedures provided unbiased, well-calibrated models that can be efficient decision support tools for predicting inpatient mortality and assigning preventive measures.

  9. Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling.

    Science.gov (United States)

    Nakas, Christos T; Schütz, Narayan; Werners, Marcus; Leichtle, Alexander B

    2016-01-01

    Electronic Health Record (EHR) data can be a key resource for decision-making support in clinical practice in the "big data" era. The complete database from early 2012 to late 2015 involving hospital admissions to Inselspital Bern, the largest Swiss University Hospital, was used in this study, involving over 100,000 admissions. Age, sex, and initial laboratory test results were the features/variables of interest for each admission, the outcome being inpatient mortality. Computational decision support systems were utilized for the calculation of the risk of inpatient mortality. We assessed the recently proposed Acute Laboratory Risk of Mortality Score (ALaRMS) model, and further built generalized linear models, generalized estimating equations, artificial neural networks, and decision tree systems for the predictive modeling of the risk of inpatient mortality. The Area Under the ROC Curve (AUC) for ALaRMS marginally corresponded to the anticipated accuracy (AUC = 0.858). Penalized logistic regression methodology provided a better result (AUC = 0.872). Decision tree and neural network-based methodology provided even higher predictive performance (up to AUC = 0.912 and 0.906, respectively). Additionally, decision tree-based methods can efficiently handle Electronic Health Record (EHR) data that have a significant amount of missing records (in up to >50% of the studied features) eliminating the need for imputation in order to have complete data. In conclusion, we show that statistical learning methodology can provide superior predictive performance in comparison to existing methods and can also be production ready. Statistical modeling procedures provided unbiased, well-calibrated models that can be efficient decision support tools for predicting inpatient mortality and assigning preventive measures. PMID:27414408
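
    The AUC figures quoted in this record can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; it is a generic illustration rather than the authors' code, the function name and toy data are hypothetical, and the quadratic loop is only suitable for small samples.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 for the positive class (e.g. inpatient death) and 0 for
    the negative class; ties between scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy risk scores: one positive case is ranked below one negative case.
toy_scores = [0.9, 0.8, 0.3, 0.2]
toy_labels = [1, 0, 1, 0]
```

    For the toy data, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75; a model that ranks every positive above every negative scores 1.0.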

  10. Automatic generation of a metamodel from an existing knowledge base to assist the development of a new analogous knowledge base.

    Science.gov (United States)

    Bouaud, J; Séroussi, B

    2002-01-01

    Knowledge acquisition is a key step in the development of knowledge-based systems, and methods have been proposed to help elicit a domain-specific task model from a generic task model. We explored how an existing validated knowledge base (KB) represented by a decision tree could be automatically processed to infer a higher-level domain-specific task model. OncoDoc is a guideline-based decision support system applied to breast cancer therapy. Assuming task identity and ontological proximity between the breast and lung cancer domains, the generalization of the breast cancer KB should make it possible to build a metamodel to serve as a guide for the elaboration of a new specific KB on lung cancer. Two types of parametrized generalization methods based on tree structure simplification and ontological abstraction were used. We defined a similarity distance and a generalization coefficient to select the best metamodel, identified as the closest to the original decision tree among the most generalized metamodels. PMID:12463788

  11. Improvement of Tone Intelligibility for Average-Voice-Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2012-01-01

    Full Text Available Problem statement: Tone intelligibility is an important attribute of speech synthesis that should be taken into account. The tone correctness of the synthetic speech is degraded considerably in average-voice-based HMM-based Thai speech synthesis. The tying mechanism in decision-tree-based context clustering, without an appropriate criterion, causes unexpected tone neutralization. Incorporating phrase intonation into the context clustering process in the training stage was proposed earlier; however, the resulting tone correctness is not satisfactory. Approach: This study proposes a number of tonal features, including tone-geometrical features and phrase intonation features, to be exploited in the context clustering process of the HMM training stage. Results: In the experiments, subjective evaluations of both the average voice and the adapted voice in terms of tone intelligibility are conducted. The effects of the extracted features on the decision trees are also evaluated. By considering gender in the training speech, two core experiments were conducted. The first experiment shows that the proposed tonal features improve tone intelligibility for the female speech model more than for the male speech model, while the second shows that they improve tone intelligibility for the gender-dependent model more than for the gender-independent model. Conclusion: All of the experimental results confirm that the tone correctness of the synthesized speech from average-voice-based HMM-based Thai speech synthesis is significantly improved when using most of the extracted features.

  12. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS

    Science.gov (United States)

    Tien Bui, Dieu; Pradhan, Biswajeet; Nampak, Haleh; Bui, Quang-Thanh; Tran, Quynh-An; Nguyen, Quoc-Phi

    2016-09-01

    This paper proposes a new artificial intelligence approach based on neural fuzzy inference system and metaheuristic optimization for flood susceptibility modeling, namely MONF. In the new approach, the neural fuzzy inference system was used to create an initial flood susceptibility model and then the model was optimized using two metaheuristic algorithms, Evolutionary Genetic and Particle Swarm Optimization. A high-frequency tropical cyclone area of the Tuong Duong district in Central Vietnam was used as a case study. First, a GIS database for the study area was constructed. The database that includes 76 historical flood inundated areas and ten flood influencing factors was used to develop and validate the proposed model. Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Receiver Operating Characteristic (ROC) curve, and area under the ROC curve (AUC) were used to assess the model performance and its prediction capability. Experimental results showed that the proposed model has high performance on both the training (RMSE = 0.306, MAE = 0.094, AUC = 0.962) and validation dataset (RMSE = 0.362, MAE = 0.130, AUC = 0.911). The usability of the proposed model was evaluated by comparing with those obtained from state-of-the art benchmark soft computing techniques such as J48 Decision Tree, Random Forest, Multi-layer Perceptron Neural Network, Support Vector Machine, and Adaptive Neuro Fuzzy Inference System. The results show that the proposed MONF model outperforms the above benchmark models; we conclude that the MONF model is a new alternative tool that should be used in flood susceptibility mapping. The result in this study is useful for planners and decision makers for sustainable management of flood-prone areas.
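
    The two error measures reported in this record, RMSE and MAE, can be computed as in the short sketch below. It is a generic illustration with made-up values, not the authors' evaluation code; the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical flood labels (1 = flooded) and predicted susceptibility values.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.1]
```

    Because RMSE squares each error before averaging, a single large miss (0.4 in the toy data) raises it more than it raises MAE, which is why the two measures are usually reported together.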

  13. Application of decision tree combined with filtered biomarkers in the diagnosis of lung cancer

    Institute of Scientific and Technical Information of China (English)

    何其栋; 魏小玲; 张红巧; 王威; 吴拥军

    2014-01-01

    Aim: To establish a decision tree model based on filtered ("optimized") tumor markers, combining decision tree techniques with a tumor marker protein biochip, to achieve rapid diagnosis of lung cancer. Methods: The serum levels of 9 tumor markers (CEA, CA199, NSE, CA242, ferritin, CA125, AFP, HGH and CA153) in 199 patients with lung cancer and 201 patients with benign pulmonary lesions were measured with a multiple tumor marker protein biochip. Logistic regression was used to filter the markers, and C5.0 decision tree and Fisher discriminant analysis models were developed on the tumor markers before and after filtering. Results: The serum levels of all 9 tumor markers were higher in patients with lung cancer than in patients with benign pulmonary lesions (P<0.05). The prediction accuracies of the Fisher discriminant analysis and decision tree models built on the 9 markers before filtering were 86.0% and 92.5%, and those of the Fisher discriminant analysis and decision tree models built on the 6 markers after filtering were 84.5% and 91.5%. The AUCs of the decision tree models before and after filtering were 0.925 and 0.915, both higher than those of Fisher discriminant analysis, 0.860 and 0.845 (Z=4.462 and 4.575, both P<0.01); for each method, however, the difference between before and after filtering was not statistically significant (Z=1.914 and 1.074, both P>0.05). Conclusion: The decision tree model based on 6 tumor markers diagnoses lung cancer better than Fisher discriminant analysis.

  14. Structural Equation Model Trees

    Science.gov (United States)

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  15. Spectral classification of planted area with sugarcane through the decision tree

    Directory of Open Access Journals (Sweden)

    Rafael C. Delgado

    2012-04-01

    Full Text Available This study tested the "decision tree" classifier on data from orbital sensors to identify areas planted with sugarcane at different planting dates at the Boa Fé farm, located in the Triângulo Mineiro region, more specifically in the municipality of Conquista, Minas Gerais, Brazil. Remote sensing (RS) techniques were coupled to a Geographic Information System (GIS) module, allowing a temporal analysis of land use and occupation, especially in order to identify and monitor agricultural areas. Based on the calculation of the mean bias (VM), the study showed that in sugarcane areas where irrigation is frequent and significant rainfall precedes the passage of the Landsat-5 satellite, the values were slightly underestimated, with a mean bias of -0.13 ha. It was also verified that higher NDVI values led to a slight overestimation of the results, with mean bias values ranging from 0.04 to 0.23 ha. According to the results, the decision tree classifier showed great potential for mapping areas cultivated with sugarcane.

  16. Model Based Definition

    Science.gov (United States)

    Rowe, Sidney E.

    2010-01-01

    In September 2007, the Engineering Directorate at the Marshall Space Flight Center (MSFC) created the Design System Focus Team (DSFT). MSFC was responsible for the in-house design and development of the Ares 1 Upper Stage and the Engineering Directorate was preparing to deploy a new electronic Configuration Management and Data Management System with the Design Data Management System (DDMS) based upon a Commercial Off The Shelf (COTS) Product Data Management (PDM) System. The DSFT was to establish standardized CAD practices and a new data life cycle for design data. Of special interest here, the design teams were to implement Model Based Definition (MBD) in support of the Upper Stage manufacturing contract. It is noted that this MBD does use partially dimensioned drawings for auxiliary information to the model. The design data lifecycle implemented several new release states to be used prior to formal release that allowed the models to move through a flow of progressive maturity. The DSFT identified some 17 Lessons Learned as outcomes of the standards development, pathfinder deployments and initial application to the Upper Stage design completion. Some of the high value examples are reviewed.

  17. A Packet-classification Algorithm Based on Hash and AQT Decision Tree%基于Hash和AQT的类决策树包分类算法研究

    Institute of Scientific and Technical Information of China (English)

    赵国锋; 陈群丽

    2010-01-01

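    Multidimensional packet classification is an important component of network security, network measurement, quality of service, flow routing, and related technologies, yet designing a packet classification algorithm that performs well in both time and space is very difficult. Building on a study of existing classic IP packet classification algorithms, and exploiting the fact that the protocol-type field takes only a limited set of values, this paper proposes a new IP packet classification algorithm based on a hash function and an AQT decision tree. Simulation results show that, compared with traditional packet classification algorithms, the proposed algorithm has lower time and space complexity.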

  18. 一种改进的SVM决策树文本分类算法%Text Classifier Based on an Improved SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    赵天昀

    2010-01-01

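    Combining SVM with a binary decision tree to form an SVM decision tree is a good way to solve multi-class text classification problems. On this basis, a between-class separability measure based on support vector data description (SVDD) is introduced to improve the SVM decision tree classifier. Experiments show that the method effectively improves the classification accuracy and speed of the SVM decision tree multi-class classifier.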

  19. Decision Tree based Detection of Botnet Flow%基于决策树的僵尸流量检测方法研究

    Institute of Scientific and Technical Information of China (English)

    谢开斌; 蔡皖东; 蔡俊朝

    2008-01-01

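    Botnets are currently one of the security threats facing the Internet, and detecting potential botnet traffic in a network is of great significance for improving Internet security. This paper focuses on botnets based on the IRC protocol and, using the session characteristics between bot hosts and the chat server, proposes a decision-tree-based method for detecting botnet traffic. Experiments show that the method is feasible.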

  20. An Evaluation of Different Training Sample Allocation Schemes for Discrete and Continuous Land Cover Classification Using Decision Tree-Based Algorithms

    Directory of Open Access Journals (Sweden)

    René Roland Colditz

    2015-07-01

    Full Text Available Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates the mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for discrete and continuous land cover mapping using classification and regression tree algorithms. In addition to measures of discrete and continuous map accuracy, the correct estimation of the area is another important criterion. A subset of the 30 m national land cover dataset of 2006 (NLCD2006) of the United States was used as the reference set to classify NADIR BRDF-adjusted surface reflectance time series of MODIS at 900 m spatial resolution. Results show that sampling of heterogeneous pixels and sample allocation according to the expected area of each class is best for classification trees. Regression trees for continuous land cover mapping should be trained with random allocation, and predictions should be normalized with a linear scaling function to correctly estimate the total area. Among the tested algorithms, random forest classification yields lower errors than boosted trees of C5.0, and Cubist shows higher accuracies than random forest regression.
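
The normalization step mentioned in this abstract can be pictured with a small sketch. The abstract does not give the scaling function, so the version below is only an assumption: per-pixel class memberships predicted by independent regression trees are clipped at zero and rescaled linearly so that they sum to 1, and class areas are then summed over pixels. The function names and the sample values are hypothetical.

```python
def normalize_memberships(raw):
    """Clip negative predictions and rescale linearly so the fractions sum to 1.

    Assumed form of the normalization; the abstract does not specify it.
    """
    clipped = [max(0.0, value) for value in raw]
    total = sum(clipped)
    if total == 0:
        # No class received a positive prediction: fall back to equal shares.
        return [1.0 / len(raw)] * len(raw)
    return [value / total for value in clipped]


def class_areas(pixels, pixel_area_km2):
    """Sum normalized class fractions over all pixels to get area per class."""
    areas = [0.0] * len(pixels[0])
    for raw in pixels:
        for k, fraction in enumerate(normalize_memberships(raw)):
            areas[k] += fraction * pixel_area_km2
    return areas


# Two made-up pixels with three classes; a 900 m pixel covers 0.81 km^2.
raw_predictions = [[0.5, 0.75, -0.25], [0.2, 0.2, 0.1]]
areas = class_areas(raw_predictions, pixel_area_km2=0.81)
```

Because every pixel's fractions sum to 1 after scaling, the class areas add up to the mapped area, which is the property the abstract asks for.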

  1. Is a home based video teleconsultation setup cost effective for lowering HbA1c for patients with type-2 diabetes over a six-month period?

    DEFF Research Database (Denmark)

    Sall Jensen, Morten; Rasmussen, Ole Winther

    OBJECTIVES: An RCT assessed the effectiveness and costs of a home based video teleconsultation (HVT) setup to lower HbA1c in patients with type-2 diabetes against usual out-patient treatment at the hospital. The HVT equipment was delivered to the patients by the hospital. This analysis shows...... the potential incremental cost-effectiveness ratio (ICER) of using a HVT setup on six-months health care effects and costs. METHODS: The study effectiveness outcome was HbA1c level in mmol/l. The economic analysis was performed with a spreadsheet decision tree model with a Danish hospital payer’s direct cost...

  2. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Full Text Available Educational Data Mining (EDM) is gaining importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. Such data keep growing and contain hidden knowledge that could be very useful to the users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter is the tool for achieving such discovery. Data mining must cope with very complex and varied situations to reach quality solutions. Therefore, data mining is a research field where many advances are being made to accommodate and solve emerging problems, and many techniques are usually considered for this purpose. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Specifically, we have used top-down induction of decision trees algorithms to extract the patterns, because these models, decision trees, are easily understandable. In addition, the validation processes conducted have ensured high-quality models.

  3. 北京城市居民的养老模式选择及其合理性分析%Urban Elders' Desirable Caring Patterns and Its Rationality: A Decision Tree Analysis

    Institute of Scientific and Technical Information of China (English)

    高晓路; 颜秉秋; 季珏

    2012-01-01

    Based on a questionnaire survey of typical communities in Beijing, the desirable caring patterns of the urban elderly were investigated. Using a decision tree analysis, the respondents' choices among four caring patterns (living independently, family care, community care, and institutional care) were revealed in two scenarios: one in which the person is healthy and one in which the person needs long-term care. The rationality of the preferred caring patterns was then examined. First, the study showed that the traditional family-care model has changed: more than half of the elderly households in Beijing are empty-nest (no-child) families. There was a huge gap between the needs of people in different health stages. About 80% of urban residents chose to live independently while healthy, and only 5.7% intended to go to nursing homes at that stage, whereas about half of the respondents intended to do so if they needed care. However, the elderly had very limited knowledge and acceptance of community-based home care, and the severe shortage of caring facilities, especially those for disabled and semi-disabled people, was a critical issue; it would be unrealistic to provide enough nursing beds in the future. Considering the capacity of service supply, it was proposed that appropriate shares of the (semi-)disabled elderly choosing institutional care and community care in 2020 could be 35% and 30%, respectively. Furthermore, people aged under 70, most of whom had shown a strong preference for institutional care in the future, should be the main target of demand management.

  4. Assessment for the Model Predicting of the Cognitive and Language Ability in the Mild Dementia by the Method of Data-Mining Technique

    Directory of Open Access Journals (Sweden)

    Haewon Byeon

    2016-06-01

    Full Text Available Assessments of cognitive and verbal functions are widely used as screening tests to detect early dementia. This study developed an early dementia prediction model for the Korean elderly based on the random forest algorithm and compared its results and precision with those of a logistic regression model and a decision tree model. The subjects were 418 elderly people (135 males and 283 females) over the age of 60 in local communities. The outcome was defined as having dementia, and the explanatory variables included digit span forward, digit span backward, confrontational naming, Rey Complex Figure Test (RCFT) copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, RCFT recognition false positive, Seoul Verbal Learning Test (SVLT) immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, Korean Color Word Stroop Test (K-CWST) color reading correct, and K-CWST color reading error. The random forest algorithm was used to develop the prediction model, and the result was compared with a logistic regression model and a decision tree based on the chi-square automatic interaction detector (CHAID). The tests with high predictive power for detecting early dementia were those of verbal memory, visuospatial memory, naming, visuospatial functions, and executive functions. In addition, the random forest model was more accurate than logistic regression and CHAID. To detect early dementia effectively, screening test programs composed of tests with high predictive power need to be developed.

  5. Testing and Treating Women after Unsuccessful Conservative Treatments for Overactive Bladder or Mixed Urinary Incontinence: A Model-Based Economic Evaluation Based on the BUS Study

    Science.gov (United States)

    Barton, Pelham; Middleton, Lee J.; Deeks, Jonathan J.; Daniels, Jane P.; Latthe, Pallavi; Coomarasamy, Arri; Rachaneni, Suneetha; McCooty, Shanteela; Verghese, Tina S.; Roberts, Tracy E.

    2016-01-01

    Objective: To compare the cost-effectiveness of bladder ultrasonography, clinical history, and urodynamic testing in guiding treatment decisions in a secondary care setting for women failing first line conservative treatment for overactive bladder or urgency-predominant mixed urinary incontinence. Design: Model-based economic evaluation from a UK National Health Service (NHS) perspective using data from the Bladder Ultrasound Study (BUS) and secondary sources. Methods: Cost-effectiveness analysis using a decision tree and a 5-year time horizon based on the outcomes of cost per woman successfully treated and cost per Quality-Adjusted Life-Year (QALY). Deterministic and probabilistic sensitivity analyses, and a value of information analysis, are also undertaken. Results: Bladder ultrasonography is a more costly and less effective test-treat strategy than clinical history and urodynamics. Treatment on the basis of clinical history alone has an incremental cost-effectiveness ratio (ICER) of £491,100 per woman successfully treated and an ICER of £60,200 per QALY compared with the treatment of all women on the basis of urodynamics. Restricting the use of urodynamics to women with a clinical history of mixed urinary incontinence only is the optimal test-treat strategy on cost-effectiveness grounds, with ICERs of £19,500 per woman successfully treated and £12,700 per QALY compared with the treatment of all women based upon urodynamics. The conclusions remained robust to sensitivity analyses but were subject to large uncertainties. Conclusions: Treatment based upon urodynamics can be seen as a cost-effective strategy, particularly when targeted at women with a clinical history of mixed urinary incontinence only. Further research is needed to resolve current decision uncertainty. PMID:27513926
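
For readers unfamiliar with the ICER reported in this record, the following minimal sketch shows the standard calculation: the difference in cost between two strategies divided by the difference in effect. The function names and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost-effectiveness ratio of a strategy versus a comparator.

    Effects can be in any unit, e.g. QALYs or women successfully treated.
    """
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both strategies are equally effective")
    return (cost_new - cost_ref) / delta_effect


def is_dominated(cost_new, effect_new, cost_ref, effect_ref):
    """True when the new strategy costs more and is no more effective."""
    return cost_new > cost_ref and effect_new <= effect_ref


# Hypothetical strategies: cost in pounds, effect in QALYs.
example_ratio = icer(cost_new=1500.0, effect_new=2.5, cost_ref=1000.0, effect_ref=2.0)
example_dominated = is_dominated(cost_new=1200.0, effect_new=1.9, cost_ref=1000.0, effect_ref=2.0)
```

A strategy that is both more costly and less effective, as bladder ultrasonography is described here, is dominated, and no ICER needs to be reported for it.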

  6. Statistical Model for Prediction of Diabetic Foot Disease in Type 2 Diabetic Patients

    Directory of Open Access Journals (Sweden)

    Raúl López Fernández

    2016-02-01

    Full Text Available Background: the need to predict and study diabetic foot problems is a critical issue and represents a major medical challenge. Reducing its incidence can improve patients' quality of life and lessen the socio-economic impact, given the high prevalence of diabetes in the working population. Objective: to design a statistical model for prediction of diabetic foot disease in type 2 diabetic patients. Methods: a descriptive study was conducted in patients attending the Diabetes Clinic in Cienfuegos from 2010 to 2013. Significant risk factors for diabetic foot disease were analyzed as variables. To design the model, binary logistic regression analysis and a Chi-squared automatic interaction detection decision tree were used. Results: two models were developed that behaved similarly on the comparison criteria considered (percentage of correct classification, sensitivity and specificity). Validation was established through the receiver operating characteristic curve. The model using Chi-squared automatic interaction detection showed the best predictive results. Conclusions: Chi-squared automatic interaction detection decision trees have an adequate predictive capacity, which can be used in the Diabetes Clinic of Cienfuegos municipality.

  7. Analysis of fluidized bed granulation process using conventional and novel modeling techniques.

    Science.gov (United States)

    Petrović, Jelena; Chansanroj, Krisanin; Meier, Brigitte; Ibrić, Svetlana; Betz, Gabriele

    2011-10-01

    Various modeling techniques have been applied to analyze the fluidized-bed granulation process. The influence of various input parameters (product, inlet and outlet air temperature, consumption of liquid-binder, granulation liquid-binder spray rate, spray pressure, drying time) on granulation output properties (granule flow rate, granule size determined using the light scattering method and sieve analysis, granule Hausner ratio, porosity and residual moisture) has been assessed. Both conventional and novel modeling techniques were used, such as screening test, multiple regression analysis, self-organizing maps, artificial neural networks, decision trees and rule induction. Diverse testing of the developed models (internal and external validation) has been discussed. Good correlation has been obtained between the predicted and the experimental data. It has been shown that nonlinear methods based on artificial intelligence, such as neural networks, are far better in generalization and prediction than conventional methods. The possibility of using SOMs, decision trees and rule induction to monitor and optimize the fluidized-bed granulation process has also been demonstrated. The findings can serve as guidance for implementing modeling techniques in the understanding and control of the fluidized-bed granulation process. PMID:21839830

  8. Machine Learning Approaches for Modeling Spammer Behavior

    CERN Document Server

    Islam, Md Saiful; Islam, Md Rafiqul

    2010-01-01

    Spam is commonly known as unsolicited or unwanted email messages on the Internet that pose a potential threat to Internet security. Users spend a valuable amount of time deleting spam emails. More importantly, ever-increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful at modeling spammer behavior, as spammers constantly change their tricks to circumvent these filters. The evasive tactics that spammers use are patterns, and these patterns can be modeled to combat spam. This paper investigates the possibilities of modeling spammer behavioral patterns with well-known classification algorithms such as the Naïve Bayesian classifier (Naïve Bayes), Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, which is a considerable improvement over similar spammer behavior modeling research.

  9. Model Based Analysis of Face Images for Facial Feature Extraction

    Science.gov (United States)

    Riaz, Zahid; Mayer, Christoph; Beetz, Michael; Radig, Bernd

    This paper describes a comprehensive approach to extracting a common feature set from image sequences. We use simple features which are easily extracted from a 3D wireframe model and efficiently used for different applications on a benchmark database. The versatility of the features is tested on facial expression recognition, face recognition and gender classification. We experiment with different combinations of the features and find reasonable results with a combined-features approach that contains structural, textural and temporal variations. The idea is to fit a model to human face images and extract shape and texture information. We parametrize the information extracted from the image sequences using the active appearance model (AAM) approach. We further compute temporal parameters using optical flow to consider local feature variations. Finally, we combine these parameters to form a feature vector for all the images in our database. These features are then tested with a binary decision tree (BDT) and a Bayesian Network (BN) for classification. We evaluated our results on image sequences of the Cohn Kanade Facial Expression Database (CKFED). The proposed system produced very promising recognition rates for our applications with the same set of features and classifiers. The system is also real-time capable and automatic.

  10. Extraction of winter wheat planted area in Jiangsu province using decision tree and mixed-pixel methods%基于决策树和混合像元分解的江苏省冬小麦种植面积提取

    Institute of Scientific and Technical Information of China (English)

    王连喜; 徐胜男; 李琪; 薛红喜; 吴建生

    2016-01-01

    are suitable for remote sensing monitoring of major crops at a large scale. Crop type identification and acreage extraction can be achieved effectively by analyzing time-series spectral characteristic curves. The time-series curve of the normalized difference vegetation index (NDVI) provides information on dynamic changes in crop growth and is therefore suitable for remote sensing extraction of the planting area of major crops. We used Jiangsu province as the study area and employed NDVI time-series data from 46 scenes of MODIS images with a spatial resolution of 250 m collected from January 1st 2013 to December 31st 2014, MODIS reflectance image data collected on April 23rd, and Landsat image data to carry out a remote sensing study of the winter wheat planting area. First, an NDVI time-series curve was built from the MODIS data and smoothed with an improved Savitzky-Golay filter, which preserved the authenticity of the data at both ends of the NDVI time series while further improving the smoothness of the curve. Based on the reconstructed NDVI time series, phenology, plant structure and ground survey samples, we extracted the key values of typical objects during their phenological growth periods. At the same time, we analyzed the variation trends of winter wheat, woodland and rice (starting time, range, extent and maximum of NDVI) during the growth period. By comparing the smoothed NDVI time-series curves of the different objects, we distinguished the crops, determined the training rules and constructed a decision tree for a preliminary extraction of the winter wheat distribution. The decision tree classification method works quickly and efficiently using multiple thresholds; however, the thresholds are difficult to select accurately because of the mixed-pixel problem. The range of threshold would affect
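
The processing chain described in this record (compute NDVI, smooth the time series, apply threshold rules) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: the smoother is the standard 5-point quadratic Savitzky-Golay filter rather than their improved variant, and the two-threshold rule, its threshold values and the sample profile are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


# Standard 5-point quadratic Savitzky-Golay weights; the sum is divided by 35.
SG5_WEIGHTS = (-3, 12, 17, 12, -3)


def savgol5(series):
    """Smooth interior points; the two points at each end are left unchanged."""
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / 35.0
    return smoothed


def is_wheat_candidate(series, spring_idx, summer_idx, spring_min=0.6, summer_max=0.35):
    """Hypothetical two-threshold rule: high NDVI in spring, low after harvest."""
    return series[spring_idx] >= spring_min and series[summer_idx] <= summer_max


# Made-up NDVI profile for one pixel (not data from the study).
profile = [0.30, 0.35, 0.50, 0.72, 0.70, 0.40, 0.25, 0.28]
smooth = savgol5(profile)
candidate = is_wheat_candidate(smooth, spring_idx=3, summer_idx=6)
```

A real decision tree would chain several such threshold tests on different dates, which is where the threshold-selection difficulty noted in the abstract comes from.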

  11. A method of real-time fault diagnosis for power transformers based on vibration analysis

    International Nuclear Information System (INIS)

    In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection. (paper)

  12. Keyphrase extraction based on topic feature%基于主题特征的关键词抽取

    Institute of Scientific and Technical Information of China (English)

    刘俊; 邹东升; 邢欣来; 李英豪

    2012-01-01

    Keyphrase extraction is the process of extracting a set of terms from a document. This paper proposes a novel topic feature (TF) for keyphrase extraction so that the extracted keyphrases better reflect the topic of the document. The topic feature is computed from a topic model, which models the topic-word distributions and the topic distributions of documents. The paper also proposes a keyphrase extraction approach based on bagged decision trees, which combines commonly used features with the proposed topic feature. Experimental results demonstrate that the proposed topic feature improves keyphrase extraction and that the approach based on bagged decision trees performs effectively.

  13. Probabilistic, meso-scale flood loss modelling

    Science.gov (United States)

    Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno

    2016-04-01

    Flood risk analyses are an important basis for decisions on flood risk management and adaptation. However, such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention in recent years, they are still not standard practice for flood risk assessments, and even less so for flood loss modelling. The state of the art in flood loss modelling is still the use of simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood loss models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we demonstrate and evaluate the upscaling of the approach to the meso-scale, namely on the basis of land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany (Botto et al. submitted). The application of bagging decision tree based loss models provides a probability distribution of estimated loss per municipality. Validation is undertaken on the one hand via a comparison with eight deterministic loss models including stage-damage functions as well as multi-variate models. On the other hand the results are compared with official loss data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of loss estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation approach is that it inherently provides quantitative information about the uncertainty of the prediction. References: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64. Botto A, Kreibich H, Merz B, Schröter K (submitted) Probabilistic, multi-variable flood loss modelling on the meso-scale with BT-FLEMO. Risk Analysis.
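
How bagging yields a distribution of estimates rather than a single value can be illustrated with a toy sketch. It is not the BT-FLEMO model: it uses one-split regression trees (stumps) on a single predictor, and the water depths, loss ratios and function names are made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # both sides of every split are non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: always predict the mean
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def bagged_predictions(xs, ys, x_new, n_trees=200, seed=0):
    """Train each tree on a bootstrap resample; return one prediction per tree."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(predict_stump(stump, x_new))
    return predictions


# Hypothetical training data: water depth (m) and relative building loss.
depths = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0]
losses = [0.02, 0.05, 0.10, 0.12, 0.30, 0.35, 0.50, 0.55]
preds = sorted(bagged_predictions(depths, losses, x_new=1.8))
lower, upper = preds[int(0.05 * len(preds))], preds[int(0.95 * len(preds))]
```

The spread between `lower` and `upper` is the kind of quantitative uncertainty information that a single deterministic stage-damage function cannot provide.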

  14. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    International Nuclear Information System (INIS)

    Decision treeboost (DTB) and decision tree forest (DTF) models based on an ensemble learning approach are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models supplemented with stochastic gradient boosting and bagging algorithms were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was given to the prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059 and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the constructed

  15. Method for gesture based modeling

    DEFF Research Database (Denmark)

    2006-01-01

    A computer program based method is described for creating models using gestures. On an input device, such as an electronic whiteboard, a user draws a gesture which is recognized by a computer program and interpreted relative to a predetermined meta-model. Based on the interpretation, an algorithm...... is assigned to the gesture drawn by the user. The executed algorithm may, for example, consist in creating a new model element, modifying an existing model element, or deleting an existing model element....

  16. Reputation Detection of Credit Card Based on SVM%基于组合分类器的信用卡信誉检测

    Institute of Scientific and Technical Information of China (English)

    周宓

    2012-01-01

    This paper presents methods for building a credit card reputation detection model based on a support vector machine and one based on a decision tree. Building on these two single classifiers, it summarizes the preference characteristics that the support vector machine and decision tree methods show in credit card reputation detection, and proposes a method for building a combined classifier model that combines the two according to these preference characteristics.

  17. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (added parameter: Length), based on health care services for a public sector hospital in Iran, with the idea that there is a contrast between patient loyalty and customer loyalty, to estimate customer lifetime value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and identify target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the result of clustering is taken as the decision attribute in the classification process; second, the result of segmentation based on the patients' CLV (estimated by RFML) is taken as the decision attribute. Finally, the results of the CHAID algorithm reveal significant hidden rules and identify existing patterns of hospital consumers.

  18. Knowledge discovery from patients' behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services.

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (added parameter: Length), based on health care services for a public sector hospital in Iran, with the idea that there is a contrast between patient loyalty and customer loyalty, to estimate customer lifetime value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and identify target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the result of clustering is taken as the decision attribute in the classification process; second, the result of segmentation based on the patients' CLV (estimated by RFML) is taken as the decision attribute. Finally, the results of the CHAID algorithm reveal significant hidden rules and identify existing patterns of hospital consumers.

  19. Knowledge discovery from patients' behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services.

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (added parameter: Length), based on health care services for a public sector hospital in Iran, with the idea that there is a contrast between patient loyalty and customer loyalty, to estimate customer lifetime value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and identify target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the result of clustering is taken as the decision attribute in the classification process; second, the result of segmentation based on the patients' CLV (estimated by RFML) is taken as the decision attribute. Finally, the results of the CHAID algorithm reveal significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177
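
The RFML attributes named in this abstract can be derived from a visit log along the following lines. The abstract does not define the attributes, so the definitions here are assumptions based on common RFM practice: recency is days since the last visit, frequency is the number of visits, monetary is total spending, and length is the number of days between the first and last visit. The function name and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfml(transactions, today):
    """Compute assumed RFML attributes per patient.

    transactions: iterable of (patient_id, visit_date, amount) tuples.
    """
    visits_by_patient = defaultdict(list)
    for patient_id, visit_date, amount in transactions:
        visits_by_patient[patient_id].append((visit_date, amount))

    scores = {}
    for patient_id, visits in visits_by_patient.items():
        dates = [visit_date for visit_date, _ in visits]
        first, last = min(dates), max(dates)
        scores[patient_id] = {
            "recency": (today - last).days,    # days since the most recent visit
            "frequency": len(visits),          # number of visits
            "monetary": sum(amount for _, amount in visits),
            "length": (last - first).days,     # span of the relationship in days
        }
    return scores


# Made-up visit log (not data from the study).
log = [
    ("p1", date(2015, 1, 10), 100),
    ("p1", date(2015, 3, 10), 50),
    ("p2", date(2015, 6, 1), 30),
]
scores = rfml(log, today=date(2015, 6, 30))
```

The resulting per-patient vectors are what a clustering step such as K-means would take as input, with the cluster label then serving as the decision attribute for a tree classifier.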

  20. Model Construct Based Enterprise Model Architecture and Its Modeling Approach

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    To support enterprise integration, this paper studies a model construct based enterprise model architecture and its modeling approach. First, the structural makeup and internal relationships of the enterprise model architecture are discussed. Then, the concept of a reusable model construct (MC), which belongs to the control view and can help to derive other views, is proposed. The modeling approach based on model constructs consists of three steps: reference model architecture synthesis, enterprise model customization, and system design and implementation. A case study of one-kind-product machinery manufacturing enterprises illustrates the MC based modeling approach. It is shown that the proposed model construct based enterprise model architecture and modeling approach are practical and efficient.

  1. Model-based software design

    Science.gov (United States)

    Iscoe, Neil; Liu, Zheng-Yang; Feng, Guohui; Yenne, Britt; Vansickle, Larry; Ballantyne, Michael

    1992-01-01

    Domain-specific knowledge is required to create specifications, generate code, and understand existing systems. Our approach to automating software design is based on instantiating an application domain model with industry-specific knowledge and then using that model to achieve the operational goals of specification elicitation and verification, reverse engineering, and code generation. Although many different specification models can be created from any particular domain model, each specification model is consistent and correct with respect to the domain model.

  2. Model-Based Reasoning

    Science.gov (United States)

    Ifenthaler, Dirk; Seel, Norbert M.

    2013-01-01

    In this paper, there will be a particular focus on mental models and their application to inductive reasoning within the realm of instruction. A basic assumption of this study is the observation that the construction of mental models and related reasoning is a slowly developing capability of cognitive systems that emerges effectively with proper…

  3. Model-based Software Engineering

    DEFF Research Database (Denmark)

    Kindler, Ekkart

    2010-01-01

    The vision of model-based software engineering is to make models the main focus of software development and to automatically generate software from these models. Part of that idea works already today. But, there are still difficulties when it comes to behaviour. Actually, there is no lack in models...

  4. Principles of models based engineering

    Energy Technology Data Exchange (ETDEWEB)

    Dolin, R.M.; Hefele, J.

    1996-11-01

    This report describes a Models Based Engineering (MBE) philosophy and implementation strategy that has been developed at Los Alamos National Laboratory's Center for Advanced Engineering Technology. A major theme in this discussion is that models based engineering is an information management technology enabling the development of information driven engineering. Unlike other information management technologies, models based engineering encompasses the breadth of engineering information, from design intent through product definition to consumer application.

  5. Graph Model Based Indoor Tracking

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Lu, Hua; Yang, Bin

    2009-01-01

    infrastructure for different symbolic positioning technologies, e.g., Bluetooth and RFID. More specifically, the paper proposes a model of indoor space that comprises a base graph and mappings that represent the topology of indoor space at different levels. The resulting model can be used for one or several...... indoor positioning technologies. Focusing on RFID-based positioning, an RFID specific reader deployment graph model is built from the base graph model. This model is then used in several algorithms for constructing and refining trajectories from raw RFID readings. Empirical studies with implementations...

  6. Comparison between Decision Tree and Logistic Regression Applied in the Study of Health Status and Correlates in the Government Employee in a District of Tianjin

    Institute of Scientific and Technical Information of China (English)

    魏凤江; 崔壮; 李长平; 宋春华; 朱宝; 刘媛媛; 马骏

    2013-01-01

    Objective: To understand the factors influencing the health status of government employees in a district of Tianjin and to provide a basis for improving the health of this population. Methods: A questionnaire survey on health status and its influencing factors was conducted among government employees in a district of Tianjin from September to December 2008, using cluster sampling. Decision tree and logistic regression models were built with the SAS 8.2 Enterprise Miner module to analyze and predict the factors influencing health status. Results: The overall prevalence rate was 47.0%. The models identified the following factors as associated with health status: age, body mass index (BMI), smoking, passive smoking, drinking, sleep time, regular meals, time spent on physical exercise, education, marital status, sub-health score, and mental health score. A comparison of the areas under the ROC curves showed no statistically significant difference in predictive performance between the logistic regression model and the decision tree model (χ2 = 1.6073, P = 0.2049). Conclusion: The health status of government employees is far from ideal; the prevalence of various chronic diseases is high, making this a key group for future health management.
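    The model comparison in this record rests on the area under the ROC curve. A minimal sketch of that quantity, using the rank (Mann-Whitney) formulation, is given below. The labels and scores are invented for illustration and are not data from the study, and the sketch does not reproduce the significance test reported in the abstract; it only computes the two areas being compared.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = has a health condition) and predicted scores
# from two hypothetical models.
labels = [1, 0, 1, 0, 1, 0]
tree_scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
logit_scores = [0.8, 0.3, 0.7, 0.5, 0.9, 0.1]

auc_tree = roc_auc(labels, tree_scores)
auc_logit = roc_auc(labels, logit_scores)
```

    With samples this small the difference between the two areas carries no statistical weight; a formal comparison would need a test for correlated ROC curves.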

  7. A heuristic finite-state model of the human driver in a car-following situation

    Science.gov (United States)

    Burnham, G. O.; Bekey, G. A.

    1976-01-01

    An approach to modeling human driver behavior in single-lane car following which is based on a finite-state decision structure is considered. The specific strategy at each point in the decision tree was obtained from observations of typical driver behavior. The synthesis of the decision logic is based on position and velocity thresholds and four states defined by regions in the phase plane. The performance of the resulting assumed intuitively logical model was compared with actual freeway data. The match of the model to the data was optimized by adapting the model parameters using a modified PARTAN algorithm. The results indicate that the heuristic model behavior matches actual car-following performance better during deceleration and constant velocity phases than during acceleration periods.
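    A finite-state decision structure over the phase plane, as described in this abstract, can be sketched as a function that maps a (gap error, relative velocity) point to one of four states. Everything specific below is hypothetical: the state names, the threshold values `gap_tol` and `vel_tol`, and the shape of the regions are assumptions chosen for illustration, not the model of Burnham and Bekey.

```python
def driver_state(gap_error, rel_velocity, gap_tol=2.0, vel_tol=0.5):
    """Map a phase-plane point to one of four hypothetical driver states.

    gap_error: actual gap minus desired gap, in metres (negative = too close).
    rel_velocity: lead vehicle speed minus own speed, in m/s (negative = closing).
    gap_tol, vel_tol: assumed position and velocity thresholds.
    """
    if gap_error < -gap_tol or rel_velocity < -vel_tol:
        return "decelerate"    # too close, or closing on the lead vehicle
    if gap_error > gap_tol:
        return "accelerate"    # gap too large and not closing
    if rel_velocity > vel_tol:
        return "match_speed"   # gap acceptable, lead vehicle pulling away
    return "hold"              # both errors within tolerance

states = [
    driver_state(-5.0, 0.0),
    driver_state(5.0, 0.0),
    driver_state(0.0, 1.0),
    driver_state(0.0, 0.0),
]
```

    In a model of this kind the thresholds would be the parameters tuned against observed driving data.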

  8. Model-based consensus

    NARCIS (Netherlands)

    Boumans, Marcel

    2014-01-01

    The aim of the rational-consensus method is to produce “rational consensus”, that is, “mathematical aggregation”, by weighing the performance of each expert on the basis of his or her knowledge and ability to judge relevant uncertainties. The measurement of the performance of the experts is based on

  9. Model-based consensus

    NARCIS (Netherlands)

    M. Boumans

    2014-01-01

    The aim of the rational-consensus method is to produce "rational consensus", that is, "mathematical aggregation", by weighing the performance of each expert on the basis of his or her knowledge and ability to judge relevant uncertainties. The measurement of the performance of the experts is based on

  10. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases......, the classifier is trained on each cluster having reduced dimensionality and fewer examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  11. Method of modelization assistance with bond graphs and application to qualitative diagnosis of physical systems

    International Nuclear Information System (INIS)

    After reviewing the usual diagnosis techniques (failure index, decision tree) and those based on an artificial intelligence approach, the author reports research exploring knowledge and model generation techniques. He focuses on the design of a tool that assists model generation and a tool that assists diagnosis. The bond graph technique is shown to be suited to assisting model generation, and is then adapted to assisting diagnosis. The developed tool is applied to three projects: DIADEME (a diagnosis system based on a physical model), the improvement of the SEXTANT diagnosis system (an expert system for transient analysis), and the investigation of an Ariane 5 launcher component. Notably, the author uses the Reiter and Greiner algorithm.

  12. Event-Based Conceptual Modeling

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    The paper demonstrates that a wide variety of event-based modeling approaches are based on special cases of the same general event concept, and that the general event concept can be used to unify the otherwise unrelated fields of information modeling and process modeling. A set of event......-based modeling approaches are analyzed and the results are used to formulate a general event concept that can be used for unifying the seemingly unrelated event concepts. Events are characterized as short-duration processes that have participants, consequences, and properties, and that may be modeled in terms...... of information structures. The general event concept can be used to guide systems analysis and design and to improve modeling approaches....

  13. Constructing a Soil Class Map of Denmark based on the FAO Legend Using Digital Techniques

    DEFF Research Database (Denmark)

    Adhikari, Kabindra; Minasny, Budiman; Greve, Mette Balslev;

    2014-01-01

    Soil mapping in Denmark has a long history and a series of soil maps based on conventional mapping approaches have been produced. In this study, a national soil map of Denmark was constructed based on the FAO–Unesco Revised Legend 1990 using digital soil mapping techniques, existing soil profile...... observations and environmental data. This map was developed using soil-landscape models generated with a decision tree-based digital soil mapping technique. As input variables in the model, more than 1170 soil profile data and 17 environmental variables including geology, land use, landscape type, area of...... overall prediction accuracy based on a 20% hold-back validation data was 60%, but increased to 76% when prediction accuracy of similar soil groups was considered. Podzoluvisols and Alisols were among the weakly predicted groups (< 48% prediction confidence), whereas Podzols and Luvisols had the highest...

  14. Application of decision tree ID3 algorithm in classification of customer information

    Institute of Scientific and Technical Information of China (English)

    吴建源

    2014-01-01

    In modern enterprises, how to retain customers is an important research direction in enterprise customer management. This paper uses the decision tree ID3 algorithm to analyze the characteristics of customer attributes, classify customer information, identify the characteristics of each kind of customer, and improve customer relationships in a targeted way, so as to avoid customer loss and increase market share.
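    ID3 chooses the attribute to split on by information gain, the reduction in label entropy produced by the split. The sketch below shows that selection step only; it is not the paper's code, and the customer records and attribute names (`tenure`, `complaints`, `churned`) are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain, as ID3 does at each node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical customer records, invented for this example.
customers = [
    {"tenure": "long", "complaints": "no", "churned": "no"},
    {"tenure": "long", "complaints": "yes", "churned": "no"},
    {"tenure": "short", "complaints": "no", "churned": "yes"},
    {"tenure": "short", "complaints": "yes", "churned": "yes"},
]
root = best_attribute(customers, ["tenure", "complaints"], "churned")
```

    A full ID3 tree would apply `best_attribute` recursively to each subset until the labels in a subset are pure or no attributes remain.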

  15. Bank Customer Churn Decision Tree Prediction Algorithm under Data mining Technology

    Institute of Scientific and Technical Information of China (English)

    石杨; 岳嘉佳

    2014-01-01

    In bank customer churn prediction systems, customer data are often used to predict the service information of unknown customers, in order to provide a basis for the bank's future business strategy. In predicting customers, it is often necessary to mine classification rules for some of their categorical attributes. This paper discusses the use of decision trees, a common and effective method, for classification rule mining of customer data.

  16. Modeling Guru: Knowledge Base for NASA Modelers

    Science.gov (United States)

    Seablom, M. S.; Wojcik, G. S.; van Aartsen, B. H.

    2009-05-01

    Modeling Guru is an on-line knowledge-sharing resource for anyone involved with or interested in NASA's scientific models or High End Computing (HEC) systems. Developed and maintained by NASA's Software Integration and Visualization Office (SIVO) and the NASA Center for Computational Sciences (NCCS), Modeling Guru's combined forums and knowledge base for research and collaboration is becoming a repository for the accumulated expertise of NASA's scientific modeling and HEC communities. All NASA modelers and associates are encouraged to participate and provide knowledge about the models and systems so that other users may benefit from their experience. Modeling Guru is divided into a hierarchy of communities, each with its own set of forums and knowledge base documents. Current modeling communities include those for space science, land and atmospheric dynamics, atmospheric chemistry, and oceanography. In addition, there are communities focused on NCCS systems, HEC tools and libraries, and programming and scripting languages. Anyone may view most of the content on Modeling Guru (available at http://modelingguru.nasa.gov/), but you must log in to post messages and subscribe to community postings. The site offers a full range of "Web 2.0" features, including discussion forums, "wiki" document generation, document uploading, RSS feeds, search tools, blogs, email notification, and "breadcrumb" links. A discussion (a.k.a. forum "thread") is used to post comments, solicit feedback, or ask questions. If a discussion is marked as a question, SIVO will monitor the thread and normally respond within a day. Discussions can include embedded images, tables, and formatting through the use of the Rich Text Editor. Also, users can add "Tags" to their threads to facilitate later searches. The "knowledge base" is comprised of documents that are used to capture and share expertise with others. The default "wiki" document lets users edit within the browser so others can easily collaborate on the

  17. Base Flow Model Validation Project

    Data.gov (United States)

    National Aeronautics and Space Administration — The program focuses on turbulence modeling enhancements for predicting high-speed rocket base flows. A key component of the effort is the collection of...

  18. Constraint Based Modeling Going Multicellular.

    Science.gov (United States)

    Martins Conde, Patricia do Rosario; Sauter, Thomas; Pfau, Thomas

    2016-01-01

    Constraint based modeling has seen applications in many microorganisms. For example, there are now established methods to determine potential genetic modifications and external interventions to increase the efficiency of microbial strains in chemical production pipelines. In addition, multiple models of multicellular organisms have been created including plants and humans. While initially the focus here was on modeling individual cell types of the multicellular organism, this focus recently started to switch. Models of microbial communities, as well as multi-tissue models of higher organisms have been constructed. These models thereby can include different parts of a plant, like root, stem, or different tissue types in the same organ. Such models can elucidate details of the interplay between symbiotic organisms, as well as the concerted efforts of multiple tissues and can be applied to analyse the effects of drugs or mutations on a more systemic level. In this review we give an overview of the recent development of multi-tissue models using constraint based techniques and the methods employed when investigating these models. We further highlight advances in combining constraint based models with dynamic and regulatory information and give an overview of these types of hybrid or multi-level approaches.

  19. Event-Based Conceptual Modeling

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    2009-01-01

    The purpose of the paper is to obtain insight into and provide practical advice for event-based conceptual modeling. We analyze a set of event concepts and use the results to formulate a conceptual event model that is used to identify guidelines for creation of dynamic process models and static...... information models. We characterize events as short-duration processes that have participants, consequences, and properties, and that may be modeled in terms of information structures. The conceptual event model is used to characterize a variety of event concepts and it is used to illustrate how events can...... be used to integrate dynamic modeling of processes and static modeling of information structures. The results are unique in the sense that no other general event concept has been used to unify a similar broad variety of seemingly incompatible event concepts. The general event concept can be used...

  20. Event-Based Activity Modeling

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    2004-01-01

    We present and discuss a modeling approach that supports event-based modeling of information and activity in information systems. Interacting human actors and IT-actors may carry out such activity. We use events to create meaningful relations between information structures and the related activit...