WorldWideScience

Sample records for based decision-tree models

  1. Decision tree based knowledge acquisition and failure diagnosis using a PWR loop vibration model

    International Nuclear Information System (INIS)

    An analytical vibration model of the primary system of a 1300 MW PWR was used for simulating mechanical faults. Deviations in the calculated power density spectra and coherence functions are determined and classified. The decision tree technique is then used for personal-computer-supported knowledge presentation and for optimizing the logical relationships between the simulated faults and the observed symptoms. The optimized decision tree forms the knowledge base and can be used to diagnose known cases as well as to incorporate new data into the knowledge base when new faults occur. (author)

  2. Applying decision tree models to SMEs: A statistics-based model for customer relationship management

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-07-01

    Customer Relationship Management (CRM) has become an important part of enterprise decision-making and management. In this regard, Decision Tree (DT) models are among the most common tools for investigating CRM and supporting the implementation of CRM systems, yet they do not by themselves yield any estimate of the degree of separation of the different subgroups involved in the analysis. In this research, we build three decision-making models for SMEs using different decision tree methods (C&RT, C4.5 and ID3). Mean Error (ME) and Variance of Error (VoE) estimates are then calculated for each model to investigate the predictive power of these methods. The decision tree methods were applied to small- and medium-sized enterprise (SME) datasets. The paper proposes a practical technical support for better identifying market trends and mining in CRM. According to the findings, C&RT shows the best degree of separation. As a result, we recommend using decision tree methods together with ME and VoE to determine CRM factors.
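
    A minimal sketch of the ME/VoE evaluation described above, assuming scikit-learn (which does not ship C4.5 or ID3, so trees grown with the Gini and entropy split criteria stand in for the C&RT, C4.5 and ID3 variants); the customer data are synthetic stand-ins.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_predict
    from sklearn.tree import DecisionTreeClassifier

    def mean_and_variance_of_errors(model, X, y, cv=10):
        """Mean Error (ME) and Variance of Errors (VoE) of a classifier,
        estimated from cross-validated 0/1 prediction errors."""
        errors = (cross_val_predict(model, X, y, cv=cv) != y).astype(float)
        return errors.mean(), errors.var()

    # synthetic stand-in for an SME customer dataset
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Gini- and entropy-based trees stand in for C&RT / C4.5 / ID3
    for name, criterion in [("C&RT-like (gini)", "gini"),
                            ("C4.5/ID3-like (entropy)", "entropy")]:
        me, voe = mean_and_variance_of_errors(
            DecisionTreeClassifier(criterion=criterion, random_state=0), X, y)
        print(f"{name}: ME={me:.3f}, VoE={voe:.3f}")
    ```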

  3. Decision Tree Based Algorithm for Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Kajal Rai

    2016-01-01

    An Intrusion Detection System (IDS) is a defense measure that supervises the activities of a computer network and reports malicious activities to the network administrator. Intruders make many attempts to gain access to the network and try to harm the organization's data, so security is a critical concern for any type of organization. For these reasons, intrusion detection has been an important research issue. An IDS can be broadly classified as a signature-based IDS or an anomaly-based IDS. In our proposed work, the decision tree algorithm is developed based on the C4.5 decision tree approach. Feature selection and split value are important issues for constructing a decision tree, and the algorithm presented in this paper is designed to address these two issues. The most relevant features are selected using information gain, and the split value is chosen in such a way that the classifier is unbiased towards the most frequent values. Experimentation is performed on the NSL-KDD (Network Security Laboratory Knowledge Discovery and Data Mining) dataset for different numbers of features. The time taken by the classifier to construct the model and the accuracy achieved are analyzed. It is concluded that the proposed Decision Tree Split (DTS) algorithm can be used for signature-based intrusion detection.
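
    A minimal sketch of the information-gain feature-ranking step described above (the DTS split-value selection and the NSL-KDD preprocessing are not reproduced); entropy and gain are computed directly with NumPy on categorical columns, and all data here are illustrative.

    ```python
    import numpy as np

    def entropy(y):
        """Shannon entropy of a discrete label vector."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def information_gain(column, y):
        """Reduction in label entropy obtained by partitioning on one
        categorical attribute -- the criterion used to pick relevant features."""
        values, counts = np.unique(column, return_counts=True)
        conditional = sum(c / len(y) * entropy(y[column == v])
                          for v, c in zip(values, counts))
        return entropy(y) - conditional

    def rank_features(X, y):
        """Return attribute indices sorted by decreasing information gain."""
        gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
        order = np.argsort(gains)[::-1]
        return order, gains[order]

    # tiny illustrative example: 2 categorical attributes, binary label
    X = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 1]])
    y = np.array([0, 0, 1, 1, 0, 1])
    print(rank_features(X, y))   # attribute 0 carries all the information
    ```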

  4. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches.

    Science.gov (United States)

    Oksel, Ceyda; Winkler, David A; Ma, Cai Y; Wilkins, Terry; Wang, Xue Z

    2016-09-01

    The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, interpretation of the models they generate is often very difficult. New computational modelling tools or new ways of using existing tools are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree) to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent to or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generation of accurate nanoSAR models with important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models. PMID:26956430

  5. Statistical Decision-Tree Models for Parsing

    CERN Document Server

    Magerman, D M

    1995-01-01

    Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing {$n$}-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall ...

  6. Combined prediction model for supply risk in nuclear power equipment manufacturing industry based on support vector machine and decision tree

    International Nuclear Information System (INIS)

    A prediction index for supply risk is developed based on factor identification in the nuclear power equipment manufacturing industry. The supply risk prediction model is then established with support vector machine and decision tree methods, based on an investigation of 3 major nuclear power equipment manufacturing enterprises and 60 suppliers. A final case study demonstrates that the combined model outperforms either single prediction model and confirms the feasibility and reliability of this approach, which provides a method to evaluate suppliers and measure supply risk. (authors)

  7. A decision-tree-based model for evaluating the thermal comfort of horses

    Directory of Open Access Journals (Sweden)

    Ana Paula de Assis Maia

    2013-12-01

    Thermal comfort is of great importance in preserving body temperature homeostasis during thermal stress conditions. Although the thermal comfort of horses has been widely studied, there is no report of its relationship with surface temperature (Ts). This study aimed to assess the potential of data mining techniques as a tool to associate surface temperature with the thermal comfort of horses. Ts was obtained using infrared thermography image processing. Physiological and environmental variables were used to define the predicted class, which classified thermal comfort as "comfort" or "discomfort". The armpit, croup, breast and groin Ts of the horses and the predicted classes were then subjected to a machine learning process. All variables in the dataset were considered relevant for the classification problem, and the decision-tree model yielded an accuracy rate of 74%. The feature selection methods used to reduce computational cost and simplify predictive learning decreased model accuracy to 70%; however, the model became simpler, with easily interpretable rules. For both of these selection methods and for classification using all attributes, armpit and breast Ts had the highest predictive power for thermal comfort. Data mining techniques show promise for discovering new variables associated with the thermal comfort of horses.

  8. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    Science.gov (United States)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting; Random Forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper concerns attribute split measures and is a two-step process: first, a theoretical study of five selected split measures is carried out and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are then verified by empirical analysis, in which a random forest is generated using each of the five split measures in turn, i.e. a random forest using information gain, a random forest using gain ratio, and so on. Based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed, in which the individual decision trees in the Random Forest are generated using different split measures. The model is augmented by weighted voting based on the strength of the individual trees. The new approach shows a notable increase in the accuracy of the random forest.
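
    A minimal sketch of the hybrid idea under stated assumptions: scikit-learn trees only expose the Gini and entropy split measures, so these two alternate across the forest, and each tree's vote is weighted by its accuracy on a held-out slice as a stand-in for the paper's strength measure; the data are synthetic.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def hybrid_forest(X, y, n_trees=50, criteria=("gini", "entropy")):
        """Grow trees with alternating split measures; weight each tree by its
        accuracy on a held-out slice (a proxy for individual tree strength)."""
        trees, weights = [], []
        for i in range(n_trees):
            X_tr, X_val, y_tr, y_val = train_test_split(
                X, y, test_size=0.3, random_state=i)
            clf = DecisionTreeClassifier(criterion=criteria[i % len(criteria)],
                                         random_state=i).fit(X_tr, y_tr)
            trees.append(clf)
            weights.append(clf.score(X_val, y_val))
        return trees, np.asarray(weights)

    def weighted_vote(trees, weights, X):
        """Aggregate class probabilities weighted by tree strength."""
        probs = sum(w * t.predict_proba(X) for t, w in zip(trees, weights))
        return probs.argmax(axis=1)

    X, y = make_classification(n_samples=600, n_features=12, random_state=1)
    trees, w = hybrid_forest(X, y)
    print("training-set accuracy:", (weighted_vote(trees, w, X) == y).mean())
    ```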

  9. Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model

    Directory of Open Access Journals (Sweden)

    Takada Masahiro

    2012-06-01

    Background: The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method, the alternating decision tree (ADTree). Methods: Clinical datasets for primary breast cancer patients who underwent sentinel lymph node biopsy or AxLN dissection without prior treatment were collected from three institutes (institute A, n = 148; institute B, n = 143; institute C, n = 174) and were used for variable selection, model training and external validation, respectively. The models were evaluated using area under the receiver operating characteristic (ROC) curve analysis to discriminate node-positive patients from node-negative patients. Results: The ADTree model selected 15 of 24 clinicopathological variables in the variable selection dataset. The resulting area under the ROC curve values were 0.770 [95% confidence interval (CI), 0.689-0.850] for the model training dataset and 0.772 (95% CI: 0.689-0.856) for the validation dataset, demonstrating the high accuracy and generalization ability of the model. The bootstrap value of the validation dataset was 0.768 (95% CI: 0.763-0.774). Conclusions: Our prediction model showed high accuracy for predicting nodal metastasis in patients with breast cancer using commonly recorded clinical variables. Therefore, our model might help oncologists in the decision-making process for primary breast cancer patients before starting treatment.
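
    A minimal sketch of the evaluation described above (not of the ADTree learner itself, which is available in Weka rather than scikit-learn): the area under the ROC curve with a percentile-bootstrap confidence interval, computed from a model's predicted scores; labels and scores here are synthetic.

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def auc_with_bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
        """AUC plus a percentile bootstrap confidence interval (95% by default)."""
        rng = np.random.default_rng(seed)
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), len(y_true))
            if len(np.unique(y_true[idx])) < 2:   # resample must contain both classes
                continue
            aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return roc_auc_score(y_true, y_score), (lo, hi)

    # illustrative node-positive labels and model scores
    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, 150)
    scores = y * 0.4 + rng.normal(0.3, 0.25, 150)   # noisy but informative scores
    print(auc_with_bootstrap_ci(y, scores))
    ```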

  10. Diagnosis of pulmonary hypertension from magnetic resonance imaging–based computational models and decision tree analysis

    Science.gov (United States)

    Swift, Andrew J.; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M.

    2016-01-01

    Abstract Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH.
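
    A minimal sketch of a leave-one-out validated decision-support classifier, assuming scikit-learn; the feature matrix is synthetic and only stands in for the MRI- and model-derived markers (e.g. wave reflection, right-ventricular morphology) used in the study.

    ```python
    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # synthetic stand-in: 72 patients, 6 hypothetical image-based markers
    rng = np.random.default_rng(0)
    X = rng.normal(size=(72, 6))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=72) > 0).astype(int)

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    scores = cross_val_score(clf, X, y, cv=LeaveOneOut())   # one fold per patient
    print(f"leave-one-out accuracy: {scores.mean():.1%}")
    ```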

  11. Fuzzy Decision Tree based Effective IMine Indexing

    Directory of Open Access Journals (Sweden)

    Peer Fatima

    2012-02-01

    A database management system is a set of programs that allows information to be stored in, modified in and retrieved from a database. With the huge increase in the amount of information, it is very difficult to manage these databases, hence the need for an effective indexing technique. The advantage of using an index is that it makes search operations very fast. This paper proposes the IMine index, a common and compressed structure that presents a close integration of the itemset mining structure, by using a Fuzzy Decision Tree (FDT) and an I-Tree. Previous approaches used the Prefix Hash Tree (PHT) and the FP-Bonsai Tree, but these exhibit long delays and unnecessary use of the available memory. The FDT uses certain rules to generate the tree structure, so it is easy to read the index based on the rules, and it allows selective reading of the I-Tree. The experimental results prove that using the FDT in IMine provides low reading cost, very low utilization of available memory and hence very low computation time.

  12. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using 10-fold cross-validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
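
    The published decision tree is small enough to state directly in code; a minimal sketch follows, with thresholds taken from the abstract and function and variable names purely illustrative.

    ```python
    def classify_febrile_episode(crp_mg_per_l: float, pct_ng_per_ml: float) -> str:
        """C4.5-derived rule: CRP splits first, then PCT."""
        if crp_mg_per_l <= 19.1:
            return "viral infection"
        return "bacterial infection" if pct_ng_per_ml > 0.65 else "PFAPA"

    # illustrative values only
    print(classify_febrile_episode(10.0, 0.20))   # viral infection
    print(classify_febrile_episode(45.0, 1.20))   # bacterial infection
    print(classify_febrile_episode(45.0, 0.30))   # PFAPA
    ```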

  13. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    Science.gov (United States)

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using 10-fold cross-validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  14. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    Decision trees are among the best-known classification methods in data mining, and many approaches have been proposed to improve their performance. However, those algorithms were developed to run on traditional distributed systems, and the latency of processing the huge volumes of data generated by ubiquitous sensing nodes cannot be improved without the help of new technology. In order to improve data processing latency in huge-data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and 18 times faster than SPRINT for large data sets.
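
    A minimal sketch of the CPU/GPU division of labour described above, assuming CuPy as the GPU array library (the actual CUDT kernels are not reproduced): the Python loop is the CPU-side flow control, while the impurity arithmetic runs on the device; the code falls back to NumPy when no GPU is present.

    ```python
    try:
        import cupy as xp            # GPU arrays when CUDA is available
    except ImportError:
        import numpy as xp           # CPU fallback

    def gini(labels):
        """Gini impurity of an integer label array (computed on the device)."""
        _, counts = xp.unique(labels, return_counts=True)
        p = counts / labels.size
        return float(1.0 - xp.sum(p * p))

    def best_threshold(feature, labels):
        """CPU loop (flow control) over candidate thresholds; each weighted
        impurity evaluation is array arithmetic executed on the GPU."""
        best_t, best_score = None, float("inf")
        for t in xp.unique(feature)[:-1].tolist():    # candidate thresholds
            left, right = labels[feature <= t], labels[feature > t]
            score = (left.size * gini(left) + right.size * gini(right)) / labels.size
            if score < best_score:
                best_t, best_score = float(t), score
        return best_t, best_score

    feature = xp.asarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    labels = xp.asarray([0, 0, 0, 1, 1, 1])
    print(best_threshold(feature, labels))    # splits cleanly at 3.0
    ```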

  15. Generating Optimized Decision Tree Based on Discrete Wavelet Transform

    Directory of Open Access Journals (Sweden)

    Kiran Kumar Reddi

    2010-03-01

    The increasing functionality of current IT systems has made decision-making operations depend on mass data mining techniques, yet there is still a need for further efficiency and optimization. Constructing optimized decision trees is now an active research area, and generating an efficient, optimized decision tree from a multi-attribute data source remains a shortcoming. This paper proposes applying a multivariate statistical method, the Discrete Wavelet Transform, to multi-attribute data in order to reduce dimensionality, and transforms the traditional decision tree algorithm into a new algorithmic model. The experimental results show that this method not only optimizes the structure of the decision tree, but also alleviates the problems found in pruning and mines a better rule set without affecting prediction accuracy.
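
    A minimal sketch of the pipeline described above, assuming PyWavelets and scikit-learn: each record is replaced by its level-1 DWT approximation coefficients (halving the attribute count) before a conventional tree is grown; the data are synthetic.

    ```python
    import numpy as np
    import pywt                                    # PyWavelets
    from sklearn.tree import DecisionTreeClassifier

    def dwt_reduce(X, wavelet="haar"):
        """Keep only the level-1 approximation coefficients of each row,
        roughly halving the number of attributes."""
        return np.array([pywt.dwt(row, wavelet)[0] for row in X])

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 32))                          # multi-attribute source
    y = (X[:, :4].sum(axis=1) > 0).astype(int)

    X_reduced = dwt_reduce(X)                               # 32 -> 16 attributes
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_reduced, y)
    print(X_reduced.shape, tree.get_depth(), tree.score(X_reduced, y))
    ```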

  16. Development and Test of Fixed Average K-means Base Decision Trees Grouping Method by Improving Decision Tree Clustering Method

    OpenAIRE

    Jai-Houng Leu; Chih-Yao Lo; Chi-Hau Liu

    2009-01-01

    New analytical methods and tools called FAKDT (Fixed Average K-means base Decision Trees) for analysing human performance have been developed in this study, and they allow the enterprise to be examined from different aspects. The Decision Tree Clustering Method is one of the data mining methods that has been applied widely in different fields in recent years to analyze large amounts of data. Generally speaking, in the human resource incubation of an enterprise, if employees of high learning poten...

  17. Data acquisition in modeling using neural networks and decision trees

    Directory of Open Access Journals (Sweden)

    R. Sika

    2011-04-01

    The paper presents a comparison of selected models from the area of artificial neural networks and decision trees in relation to actual conditions of foundry processes. The work contains short descriptions of the algorithms used, their purpose and the method of data preparation, which is the domain of Data Mining systems. The first part concerns data acquisition carried out in a selected iron foundry, indicating the problems to be solved with respect to casting process modelling. The second part is a comparison of selected algorithms: a decision tree and an artificial neural network, namely the CART (Classification And Regression Trees) and BP (Backpropagation) algorithms in MLP (Multilayer Perceptron) networks. The aim of the paper is to show how data are selected for modelling, cleaned and reduced, for example because of too strong a correlation between some of the recorded process parameters. It also shows what results can be obtained using two different approaches: first, modelling with available commercial software, for example Statistica; second, modelling step by step in an Excel spreadsheet based on the same algorithm, such as BP-MLP. The discrepancy between the results obtained from these two approaches originates from the a priori assumptions made. The aforementioned universal software package Statistica, when used without awareness of the relations between technological parameters, i.e. without the user having experience in foundry practice and without ranking particular parameters on the basis of the acquisition, cannot give a credible basis for predicting the quality of the castings. A decisive influence of the data acquisition method has also been clearly indicated; acquisition should be conducted according to repeatable measurement and control procedures. This paper is based on about 250 records of actual data, for one assortment over a 6-month period, of which only 12 data sets were complete (including two that were used for validation of the neural network) and useful for creating a model. It is definitely too

  18. Soil Organic Matter Mapping by Decision Tree Modeling

    Institute of Scientific and Technical Information of China (English)

    ZHOU Bin; ZHANG Xing-Gang; WANG Fan; WANG Ren-Chao

    2005-01-01

    Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with some extensive, easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape-factor-associated rule system. This system could be used to predict the continuous spatial distribution of SOM. By analyzing factors such as elevation, geological unit, soil type, land use, remotely sensed data, upslope contributing area, slope, aspect, planform curvature, and profile curvature, the decision tree could predict the distribution of soil organic matter levels. Among these factors, elevation, land use, aspect, soil type, the first principal component of bitemporal Landsat TM, and upslope contributing area were considered the most important variables for predicting SOM. The prediction of SOM content from the landscape types sorted by the decision tree showed a close relationship, with an accuracy of 81.1%.

  19. Web People Search Using Ontology Based Decision Tree

    Directory of Open Access Journals (Sweden)

    Mrunal Patil

    2012-09-01

    Nowadays, searching for people on the web is one of the most common activities users perform. When a query for a person search is issued, it returns a set of web pages related to distinct persons with the given name, and for this type of search the job of finding the web page of interest is left to the user. In this paper, we develop a technique for web people search which clusters the web pages based on semantic information and maps them using an ontology-based decision tree, allowing the user to access the information more easily. The technique uses the concept of ontology, thus reducing the number of inconsistencies. The results prove that ontology-based decision trees and clustering help to increase the efficiency of the overall search.

  20. Web People Search Using Ontology Based Decision Tree

    Directory of Open Access Journals (Sweden)

    Mrunal Patil

    2012-03-01

    Nowadays, searching for people on the web is one of the most common activities users perform. When a query for a person search is issued, it returns a set of web pages related to distinct persons with the given name, and for this type of search the job of finding the web page of interest is left to the user. In this paper, we develop a technique for web people search which clusters the web pages based on semantic information and maps them using an ontology-based decision tree, allowing the user to access the information more easily. The technique uses the concept of ontology, thus reducing the number of inconsistencies. The results prove that ontology-based decision trees and clustering help to increase the efficiency of the overall search.

  1. Multitask Efficiencies in the Decision Tree Model

    CERN Document Server

    Drucker, Andrew

    2008-01-01

    In Direct Sum problems [KRW], one tries to show that for a given computational model, the complexity of computing a collection $F = \{f_i\}$ of functions on independent inputs is approximately the sum of their individual complexities. In this paper, by contrast, we study the diversity of ways in which the joint computational complexity can behave when all the $f_i$ are evaluated on a \textit{common} input. Fixing some model of computational cost, let $C_F(X): \{0, 1\}^l \to \mathbf{R}$ give the cost of computing the subcollection $\{f_i(x): X_i = 1\}$, on common input $x$. What constraints do the functions $C_F(X)$ obey, when $F$ is chosen freely? $C_F(X)$ will, for reasonable models, obey nonnegativity, monotonicity, and subadditivity. We show that, in the deterministic, adaptive query model, these are `essentially' the only constraints: for any function $C(X)$ obeying these properties and any $\epsilon > 0$, there exists a family $F$ of boolean functions and a $T > 0$ such that for all $X \in \{0, 1\}^l$, ...

  2. The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in Construction of Decision Tree Models for Credit Scoring

    Directory of Open Access Journals (Sweden)

    Mohammad Khanbabaei

    2013-11-01

    Decision tree modelling, as one of the data mining techniques, is used for the credit scoring of bank customers. The main problem is the construction of decision trees that can classify customers optimally. This study presents a new hybrid mining approach, based on a genetic algorithm, for the design of an effective and appropriate credit scoring model for bank customers, in order to offer credit facilities to each class of customers. Genetic algorithms can help banks in credit scoring by selecting appropriate features and building optimum decision trees. The proposed hybrid classification model is established based on a combination of clustering, feature selection, decision tree and genetic algorithm techniques. We used clustering and feature selection techniques to pre-process the input samples for constructing the decision trees in the credit scoring model. The proposed hybrid model chooses and combines the best decision trees based on optimality criteria and constructs the final decision tree for the credit scoring of customers. Using one credit dataset, the results confirm that the classification accuracy of the proposed hybrid classification model is higher than that of almost all of the classification models compared in this paper. Furthermore, the number of leaves and the size of the constructed decision tree (i.e. its complexity) are smaller than in other decision tree models. In this work, one financial dataset, the Bank Mellat credit dataset, was chosen for the experiments.

  3. A decision-tree model to detect post-calving diseases based on rumination, activity, milk yield, BW and voluntary visits to the milking robot.

    Science.gov (United States)

    Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I

    2016-09-01

    Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value. PMID:27221983

  4. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    OpenAIRE

    R. Bou Kheir; P. K. Bøcher; M. B. Greve; M. H. Greve

    2010-01-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow directio...

  5. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    Science.gov (United States)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    Decision tree (DT) machine learning algorithm was used to map the flood-susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome the weak points of LR. The combined FR and LR method was used to map the susceptible areas in Kelantan, Malaysia. Results of both methods were compared and their efficiency was assessed. The most influential flood conditioning factors were recognized.

  6. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

    This paper proposes a decision tree model for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance and are usually represented by an easy to interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using the Exhaustive CHAID (82.0%) model, compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.

  7. Development and Test of Fixed Average K-means Base Decision Trees Grouping Method by Improving Decision Tree Clustering Method

    Directory of Open Access Journals (Sweden)

    Jai-Houng Leu

    2009-01-01

    New analytical methods and tools called FAKDT (Fixed Average K-means base Decision Trees) for analysing human performance have been developed in this study, and they allow the enterprise to be examined from different aspects. The Decision Tree Clustering Method is one of the data mining methods that has been applied widely in different fields in recent years to analyze large amounts of data. Generally speaking, in the human resource incubation of an enterprise, if employees of high learning potential, high stability and high emotional quotient are selected, the return on investment in human resources will be more apparent. If employees with the above-mentioned traits can be well utilized and incubated, the industry competitiveness of the enterprise will be enhanced effectively. From the personality specialty point of view, the function is to predict the efficiency of personal achievement in correlation with certain implied personality specialties (blood group, constellation, etc.). The main purpose of this research is to extract useful information and important messages about human performance from historical records with this method. The Decision Tree Clustering Method data mining skills were improved and applied to obtain the critical factors that affect human traits and to test the method's feasibility in this study.

  8. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-06-01

    Accurate information about organic/mineral soil occurrence is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims at investigating the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes in unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in organic/mineral field measurements. The overall accuracy of the predictive organic/inorganic landscapes' map produced (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%. The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to facilitate the implementation of pedological/hydrological plans for conservation

  9. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    Directory of Open Access Journals (Sweden)

    R. Bou Kheir

    2010-01-01

    Accurate information about soil organic carbon (SOC), presented in a spatial form, is a prerequisite for many land resources management applications (including climate change mitigation). This paper aims to investigate the potential of using geomorphometrical analysis and decision tree modeling to predict the geographic distribution of hydromorphic organic landscapes at unsampled area in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to statistically explain SOC field measurements in hydromorphic landscapes of the chosen Danish area. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding each parameter one at a time from the potential pool of predictor parameters. The best classification tree model (with the lowest misclassification error and the smallest number of terminal nodes and predictor parameters) combined the steady-state topographic wetness index and soil type, and explained 68% of the variability in field SOC measurements. The overall accuracy of the produced predictive SOC map (at 1:50 000 cartographic scale) using the best tree was estimated to be ca. 75%. The proposed classification-tree model is relatively simple, quick, realistic and practical, and it can be applied to other areas, thereby providing a tool to help with the implementation of pedological/hydrological plans for conservation and sustainable

  10. Cost effectiveness of community-based therapeutic care for children with severe acute malnutrition in Zambia: decision tree model

    OpenAIRE

    Bachmann Max O

    2009-01-01

    Background: Children aged under five years with severe acute malnutrition (SAM) in Africa and Asia have high mortality rates without effective treatment. Primary care-based treatment of SAM can have good outcomes but its cost effectiveness is largely unknown. Method: This study estimated the cost effectiveness of community-based therapeutic care (CTC) for children with severe acute malnutrition in government primary health care centres in Lusaka, Zambia, compared to no care. A decision...

  11. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    Energy Technology Data Exchange (ETDEWEB)

    Bou Kheir, Rania, E-mail: rania.boukheir@agrsci.d [Lebanese University, Faculty of Letters and Human Sciences, Department of Geography, GIS Research Laboratory, P.O. Box 90-1065, Fanar (Lebanon); Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Greve, Mogens H. [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark); Abdallah, Chadi [National Council for Scientific Research, Remote Sensing Center, P.O. Box 11-8281, Beirut (Lebanon); Dalgaard, Tommy [Department of Agroecology and Environment, Faculty of Agricultural Sciences (DJF), Aarhus University, Blichers Alle 20, P.O. Box 50, DK-8830 Tjele (Denmark)

    2010-02-15

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.
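
    A minimal sketch, under stated assumptions, of the kind of regression-tree analysis described above: a scikit-learn regression tree fitted to synthetic stand-ins for a few of the listed predictors (pH, waste areas, roads, cities, slope, organic matter), reporting variance explained and relative variable importance; the Lebanese dataset itself is not reproduced.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    names = ["pH", "dist_waste", "dist_roads", "dist_cities", "slope", "organic_matter"]

    # synthetic stand-in for the field/laboratory zinc observations
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, len(names)))
    zinc = 40 + 25 * X[:, 0] + 10 * X[:, 1] + rng.normal(scale=5, size=300)

    reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, zinc)
    print("variance explained (R^2):", round(reg.score(X, zinc), 2))
    for name, imp in sorted(zip(names, reg.feature_importances_), key=lambda t: -t[1]):
        print(f"{name:>15s}: {imp:.2f}")   # relative importance per predictor
    ```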

  12. Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon

    International Nuclear Information System (INIS)

    Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree-model explained 88% of variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50.000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. - GIS regression-tree analysis explained 88% of the variability in field/laboratory Zinc concentrations.

  13. Fault diagnosis of induction motor based on decision trees and adaptive neuro-fuzzy inference

    OpenAIRE

    Tran, Tung; Yang, Bo-Suk; Oh, Myung-Suck; Tan, Andy Chit Chiow

    2009-01-01

    This paper presents a fault diagnosis method based on an adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART), which is one of the decision tree methods, is used as a feature selection procedure to select pertinent features from the data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of the ANFIS classifier. The hybrid of back-propagation and le...

  14. Cost Effectiveness of Imiquimod 5% Cream Compared with Methyl Aminolevulinate-Based Photodynamic Therapy in the Treatment of Non-Hyperkeratotic, Non-Hypertrophic Actinic (Solar) Keratoses: A Decision Tree Model

    OpenAIRE

    Wilson, Edward C F

    2010-01-01

    Background: Actinic keratosis (AK) is caused by chronic exposure to UV radiation (sunlight). First-line treatments are cryosurgery, topical 5-fluorouracil (5-FU) and topical diclofenac. Where these are contraindicated or less appropriate, alternatives are imiquimod and photodynamic therapy (PDT). Objective: To compare the cost effectiveness of imiquimod and methyl aminolevulinate-based PDT (MAL-PDT) from the perspective of the UK NHS. Methods: A decision tree model was populated with data fro...

  15. Case Study on High Dimensional Data Analysis Using Decision Tree Model

    OpenAIRE

    Smitha.T; Sundaram, V.

    2012-01-01

    The major aim of this paper is to build a model to predict the chances of occurrence of disease in an area. The paper mainly concentrates on the data mining technique of decision tree modelling to identify the significant parameters for the prediction process. The decision tree model was created with the help of the ID3 algorithm.

  16. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    Science.gov (United States)

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  17. A DECISION TREE-BASED CLASSIFICATION APPROACH TO RULE EXTRACTION FOR SECURITY ANALYSIS

    OpenAIRE

    Ren, N.; M. ZARGHAM; Rahimi, S.

    2006-01-01

    Stock selection rules are extensively utilized as the guideline to construct high performance stock portfolios. However, the predictive performance of the rules developed by some economic experts in the past has decreased dramatically for the current stock market. In this paper, C4.5 decision tree classification method was adopted to construct a model for stock prediction based on the fundamental stock data, from which a set of stock selection rules was derived. The experimental results showe...

  18. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    Science.gov (United States)

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached the terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time. PMID:24794073

  19. Computer Crime Forensics Based on Improved Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Ying Wang

    2014-04-01

    To find crime-related evidence and association rules in massive data, classic decision tree algorithms such as ID3 have appeared in related classification-analysis prototype systems, so making them more suitable for computer forensics in variable environments has become a hot issue. When selecting classification attributes, ID3 relies on the computation of information entropy, and attributes with more values are then favoured as classification nodes of the decision tree; such classification is unrealistic in many cases. The ID3 algorithm also involves many logarithms, so it is complicated to handle datasets with many classification attributes. Therefore, addressing the special demands of computer crime forensics, the ID3 algorithm is improved and a novel classification attribute selection method based on a Maclaurin-Priority Value First method is proposed. It adopts the change-of-base formula and infinitesimal substitution to simplify the logarithms in ID3, and for the errors generated in this process an apposite constant is introduced and multiplied by the simplified formulas as compensation. The idea of Priority Value First is introduced to solve the problem of value deviation. The performance of the improved method is rigorously proved in theory. Finally, the experiments verify that our scheme has an advantage in computation time and classification accuracy compared to ID3 and two existing algorithms.
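
    The paper's exact simplification is not reproduced here; as a minimal illustration of the general idea of replacing ID3's logarithms with a truncated polynomial (with a compensation term left out), the sketch below compares exact entropy with an entropy computed from a truncated Maclaurin series of ln(1+u), u = p - 1.

    ```python
    import numpy as np

    def entropy_exact(p):
        """Exact Shannon entropy of a probability vector."""
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def log2_maclaurin(p, terms=8):
        """log2(p) via a truncated Maclaurin series of ln(1+u), u = p - 1.
        Purely illustrative; the paper's own simplification and compensation
        constant are not reproduced."""
        u = p - 1.0
        s = sum((-1) ** (k + 1) * u ** k / k for k in range(1, terms + 1))
        return s / np.log(2.0)

    def entropy_approx(p, terms=8):
        p = p[p > 0]
        return float(-np.sum(p * log2_maclaurin(p, terms)))

    p = np.array([0.7, 0.2, 0.1])
    print(entropy_exact(p), entropy_approx(p))   # approximation error is visible
    ```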

  20. Decision-tree induction from self-mapping space based on web

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shu-yu; ZHU Zhong-ying

    2007-01-01

    An improved decision tree method for web information retrieval with self-mapping attributes is proposed. The self-mapping tree holds the value of a self-mapping attribute in its internal nodes, together with information based on the dissimilarity between a pair of mapping sequences. The method selects the self-mapping that exists between data items by exhaustive search, based on relation and attribute information. Experimental results confirm that the improved method constructs comprehensive and accurate decision trees. Moreover, an example shows that the self-mapping decision tree is promising for data mining and knowledge discovery.

  1. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.

  2. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models

    Directory of Open Access Journals (Sweden)

    Hu Yun-tao

    2009-09-01

    Background: In recent years, the artificial neural network has been advocated for modeling complex multivariable relationships owing to its fault tolerance, while the decision tree, a data mining technique, has been recommended because of its rich set of classification rules and its visual appeal. The aim of our research was to compare the performance of artificial neural network (ANN) and decision tree models in predicting hospital charges for gastric cancer patients. Methods: Data on hospital charges for 1008 gastric cancer patients and related demographic information were collected from the First Affiliated Hospital of Anhui Medical University from 2005 to 2007 and preprocessed to select pertinent input variables. ANN and decision tree models, using the same hospital charge output variable and the same input variables, were then applied to compare their predictive abilities in terms of mean absolute errors and linear correlation coefficients for the training and test datasets. The transfer function in the ANN model was the sigmoid, with 1 hidden layer and three hidden nodes. Results: After preprocessing of the data, 12 variables were selected and used as input variables in the two types of models. For both the training dataset and the test dataset, the mean absolute errors of the ANN model were lower than those of the decision tree model (1819.197 vs. 2782.423 and 1162.279 vs. 3424.608), and the linear correlation coefficients of the former model were higher than those of the latter (0.955 vs. 0.866 and 0.987 vs. 0.806). The predictive ability and adaptive capacity of the ANN model were better than those of the decision tree model. Conclusion: The ANN model performed better than the decision tree model in predicting the hospital charges of gastric cancer patients in China.
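
    A minimal sketch of the comparison described above, assuming scikit-learn: an MLP with one hidden layer of three logistic (sigmoid) units versus a regression tree, scored by mean absolute error and the linear correlation coefficient on held-out data; the charge data are synthetic stand-ins and feature scaling is omitted for brevity.

    ```python
    import numpy as np
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.tree import DecisionTreeRegressor

    # synthetic stand-in for 1008 patients x 12 preprocessed input variables
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1008, 12))
    charges = 8000 + 1500 * X[:, 0] + 900 * X[:, 1] + rng.normal(scale=800, size=1008)
    X_tr, X_te, y_tr, y_te = train_test_split(X, charges, random_state=0)

    models = {
        "ANN (1 hidden layer, 3 sigmoid nodes)": MLPRegressor(
            hidden_layer_sizes=(3,), activation="logistic",
            max_iter=5000, random_state=0),
        "decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    }
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mae = mean_absolute_error(y_te, pred)
        r = np.corrcoef(y_te, pred)[0, 1]          # linear correlation coefficient
        print(f"{name}: MAE = {mae:.0f}, r = {r:.3f}")
    ```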

  3. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    Science.gov (United States)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability at a TCE (trichloroethylene)-contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the data mining methods considered, Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Based Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of the Decision Tree were the best. According to the subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values for the hydrogeological parameters of the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  4. Diagnosis of Constant Faults in Read-Once Contact Networks over Finite Bases using Decision Trees

    KAUST Repository

    Busbait, Monther I.

    2014-05-01

    We study the depth of decision trees for diagnosis of constant faults in read-once contact networks over finite bases. This includes diagnosis of 0-1 faults, 0 faults and 1 faults. For any finite basis, we prove a linear upper bound on the minimum depth of decision tree for diagnosis of constant faults depending on the number of edges in a contact network over that basis. Also, we obtain asymptotic bounds on the depth of decision trees for diagnosis of each type of constant faults depending on the number of edges in contact networks in the worst case per basis. We study the set of indecomposable contact networks with up to 10 edges and obtain sharp coefficients for the linear upper bound for diagnosis of constant faults in contact networks over bases of these indecomposable contact networks. We use a set of algorithms, including one that we create, to obtain the sharp coefficients.

  5. Decision tree-based learning to predict patient controlled analgesia consumption and readjustment

    Directory of Open Access Journals (Sweden)

    Hu Yuh-Jyh

    2012-11-01

    Full Text Available Abstract Background Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. Undertreatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), a delivery system for pain medication, and proposes and demonstrates how to use machine learning and data mining techniques to predict analgesic requirements and PCA readjustment. Methods The sample in this study included 1099 patients. Every patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medications. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. Results The prediction accuracies for total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study for PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous works. Conclusion This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA …
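
To illustrate the combination of nearest-neighbour data cleaning with a tree ensemble, here is a hedged sketch: the off-the-shelf ENN rule from the third-party imbalanced-learn package stands in for the paper's custom cleaning method, and a bagged ensemble of decision trees stands in for its tree ensemble; the synthetic data are an assumption:

```python
# Sketch under assumptions: ENN cleaning + bagged decision trees on an
# imbalanced stand-in for the PCA readjustment data (readjusted vs. not).
from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1099, n_features=30, weights=[0.9, 0.1],
                           random_state=0)

# 1) Remove majority-class samples whose neighbourhood disagrees with them.
X_clean, y_clean = EditedNearestNeighbours().fit_resample(X, y)

# 2) Train an ensemble of decision trees on the cleaned data.
ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                             random_state=0)
print(cross_val_score(ensemble, X_clean, y_clean, cv=5).mean())
```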

  6. Preventing KPI Violations in Business Processes based on Decision Tree Learning and Proactive Runtime Adaptation

    Directory of Open Access Journals (Sweden)

    Dimka Karastoyanova

    2012-01-01

    Full Text Available The performance of business processes is measured and monitored in terms of Key Performance Indicators (KPIs. If the monitoring results show that the KPI targets are violated, the underlying reasons have to be identified and the process should be adapted accordingly to address the violations. In this paper we propose an integrated monitoring, prediction and adaptation approach for preventing KPI violations of business process instances. KPIs are monitored continuously while the process is executed. Additionally, based on KPI measurements of historical process instances we use decision tree learning to construct classification models which are then used to predict the KPI value of an instance while it is still running. If a KPI violation is predicted, we identify adaptation requirements and adaptation strategies in order to prevent the violation.
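
A conceptual sketch of the prediction step (not the paper's system) is shown below: a classifier is learned from historical process instances and then applied to a still-running instance to decide whether an adaptation should be triggered; the feature names and values are assumptions:

```python
# Illustrative sketch: predict a KPI violation for a running process instance
# from measurements available so far, using a tree learned on history.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

history = pd.DataFrame({
    "supplier_response_h": [4, 30, 6, 48, 8, 2, 36, 5],
    "queue_length":        [1, 7, 2, 9, 3, 0, 8, 2],
    "items_in_order":      [3, 12, 5, 20, 6, 2, 15, 4],
    "kpi_violated":        [0, 1, 0, 1, 0, 0, 1, 0],   # e.g. delivery-time KPI
})
clf = DecisionTreeClassifier(max_depth=3).fit(
    history.drop(columns="kpi_violated"), history["kpi_violated"])

# Prediction for a still-running instance, based on measurements so far.
running = pd.DataFrame([{"supplier_response_h": 40, "queue_length": 6,
                         "items_in_order": 18}])
if clf.predict(running)[0] == 1:
    print("KPI violation predicted -> identify and apply adaptation strategy")
```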

  7. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    Science.gov (United States)

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, a task that involves subjectivity, imprecision and fuzziness. First, an appropriate evaluation index is selected according to the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  8. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    Science.gov (United States)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak-to-background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated on 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contours to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications in radiation oncology.
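
The selection mechanism can be sketched conceptually as follows (this is not the ATLAAS code): one regression tree per segmentation method predicts its Dice score from tumour volume, peak-to-background SUV ratio and a texture metric, and the method with the highest predicted score is chosen; the data and method names are assumptions:

```python
# Conceptual sketch of the "pick the method with the best predicted DSC" step.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.random((100, 3))              # [volume, SUV ratio, texture metric]
methods = {
    "thresholding":   DecisionTreeRegressor(max_depth=4, random_state=0),
    "region_growing": DecisionTreeRegressor(max_depth=4, random_state=0),
    "clustering":     DecisionTreeRegressor(max_depth=4, random_state=0),
}
for name, tree in methods.items():
    dsc = rng.random(100)                   # stand-in for measured Dice scores
    tree.fit(X_train, dsc)

new_case = np.array([[0.4, 0.8, 0.3]])
best = max(methods, key=lambda m: methods[m].predict(new_case)[0])
print("Selected PET-AS method:", best)
```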

  9. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    OpenAIRE

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe; Bak, Claus Leth; Thøgersen, Paul

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, the phasing out of central power plants and cross-border power exchange on the dynamic security of the Danish Power System. A contingency-based decision tree (DT) approach is used to assess the dynamic security of the present and future Danish Power System. Results from offline time-domain simulation for a large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then ...

  10. The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Bøcher, Peder Klith; Greve, Mette Balslev;

    2010-01-01

    … the spatial distribution of hydromorphic organic landscapes in unsampled areas in Denmark. Nine primary (elevation, slope angle, slope aspect, plan curvature, profile curvature, tangent curvature, flow direction, flow accumulation, and specific catchment area) and one secondary (steady-state topographic wetness index) topographic parameters were generated from Digital Elevation Models (DEMs) acquired using airborne LIDAR (Light Detection and Ranging) systems. They were used along with existing digital data collected from other sources (soil type, geological substrate and landscape type) to explain organic/mineral field measurements in hydromorphic landscapes of the Danish area chosen. A large number of tree-based classification models (186) were developed using (1) all of the parameters, (2) the primary DEM-derived topographic (morphological/hydrological) parameters only, (3) selected pairs of parameters and (4) excluding …

  11. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients

    OpenAIRE

    Freitas, Alex. A.; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Background Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug’s distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been...

  12. Preprocessing of Tandem Mass Spectrometric Data Based on Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    Jing-Fen Zhang; Si-Min He; Jin-Jin Cai; Xing-Jun Cao; Rui-Xiang Sun; Yan Fu; Rong Zeng; Wen Gao

    2005-01-01

    In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify the noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the following identification process. The experimental results show that this preprocessing method increases the search speed and the reliability of peptide identification.

  13. Hyper-Graph Based Documents Categorization on Knowledge from Decision Trees

    Directory of Open Access Journals (Sweden)

    Merjulah Roby

    2012-03-01

    Full Text Available This paper devises a novel representation that compactly captures hyper-graph partitioning and clustering of documents based on their weights. The approach integrates data mining and decision making to improve effectiveness, and a NeC4.5 decision tree is also presented. The algorithm creates clusters and sub-clusters according to the user query, forming sub-clusters within the database; since some of the data in the database may be more useful than others, the data are clustered according to this suitability.

  14. Dynamic Security Assessment of Danish Power System Based on Decision Trees: Today and Tomorrow

    DEFF Research Database (Denmark)

    Rather, Zakir Hussain; Liu, Leo; Chen, Zhe;

    2013-01-01

    The research work presented in this paper analyzes the impact of wind energy, the phasing out of central power plants and cross-border power exchange on the dynamic security of the Danish Power System. A contingency-based decision tree (DT) approach is used to assess the dynamic security of the present and future Danish Power System. Results from offline time-domain simulation for a large number of possible operating conditions (OC) and critical contingencies are organized to build up the database, which is then used to predict the security of the present and future power system. The mentioned approach is implemented … have significant impact on the dynamic security of the Danish power system in the future, if alternative measures are not considered seriously …

  15. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    Directory of Open Access Journals (Sweden)

    Wan-Yu Chang

    2015-09-01

    Full Text Available In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.

  16. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    Directory of Open Access Journals (Sweden)

    Loukis Euripides N

    2004-06-01

    Full Text Available Abstract Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex, and are therefore not suitable for use in rural areas, in homecare and generally in primary healthcare settings. Furthermore, the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation, and junior clinicians are not adequately trained in this field. Efficient decision support systems would therefore be very useful for helping clinicians make better heart sound diagnoses. In this study a rule-based method, based on decision trees, has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds. Methods For the purposes of our experiment we used a collection of 84 heart sound signals, including 41 signals with a "clear" AS systolic murmur and 43 with a "clear" MR systolic murmur. Signals were initially preprocessed to detect the 1st and 2nd heart sounds. Next, a total of 100 features were determined for every heart sound signal and their relevance to the differentiation between AS and MR was estimated. The performance of fully expanded and pruned decision tree classifiers was studied on various training and test datasets, and their differentiation capabilities were examined. In order to build a generalized decision support system for heart sound diagnosis, we divided the problem into sub-problems, each dealing with either one morphological characteristic of the heart-sound waveform or with difficult-to-distinguish cases. Results Relevance analysis of the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that …

  17. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to the students and to improve the quality of managerial decisions. One of the ways to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative screening based on the records of prospective students at high school, without a written test. Currently, that kind of selection does not yet have a standard model or criteria. Selection is done only by comparing candidates' application files, so subjective assessment is very likely because of the lack of standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes criteria with certain standards, such as the region of origin, the status of the school, the average grade and so on. These criteria are determined using rules that emerge from classifying the academic achievement (GPA) of students in previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that priority for admission is given to students who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average grade above 75, and earned at least one achievement during their study in high school.
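
A rough sketch of this idea is shown below; scikit-learn's CART implementation is used as a stand-in for C4.5, and the applicant records are invented. Past students are labelled by their GPA outcome, and the admission rules are read directly from the fitted tree:

```python
# Sketch: derive admission rules from past students admitted the same way.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

past = pd.DataFrame({
    "from_java":     [1, 1, 0, 1, 0, 1],
    "public_school": [1, 0, 1, 1, 0, 1],
    "science_major": [1, 1, 0, 1, 1, 0],
    "avg_grade":     [82, 76, 70, 88, 74, 79],
    "good_gpa":      [1, 1, 0, 1, 0, 1],     # label from GPA in earlier cohorts
})
X, y = past.drop(columns="good_gpa"), past["good_gpa"]
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))   # admission rules
```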

  18. Reweighting with Boosted Decision Trees

    CERN Document Server

    Rogozhnikov, A

    2016-01-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers. In most cases, these are classification models used to select the "signal" events from data. Monte Carlo simulated events typically take part in training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting - assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of reweighting step in analyses is also discussed.
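
A commonly used simplified variant of classifier-based reweighting (not the specific BDT-reweighting algorithm of this paper) can be sketched as follows: a BDT is trained to separate simulation from data, and each simulated event is weighted by the estimated density ratio p/(1-p); the toy distributions are assumptions:

```python
# Simplified classifier-based reweighting sketch with a gradient-boosted tree.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
sim  = rng.normal(0.0, 1.0, size=(5000, 2))   # Monte Carlo simulated events
data = rng.normal(0.3, 1.1, size=(5000, 2))   # observed (real) events

X = np.vstack([sim, data])
y = np.concatenate([np.zeros(len(sim)), np.ones(len(data))])  # 0 = sim, 1 = data

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X, y)
p = bdt.predict_proba(sim)[:, 1]
weights = p / (1.0 - p)                 # reweight simulation toward data
weights *= len(sim) / weights.sum()     # keep the total simulated yield unchanged
```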

  19. A Decision Tree Based Pedometer and its Implementation on the Android Platform

    Directory of Open Access Journals (Sweden)

    Juanying Lin

    2015-02-01

    Full Text Available This paper describes a decision tree (DT based ped ometer algorithm and its implementation on Android. The DT- based pedometer can classify 3 gai t patterns, including walking on level ground (WLG, up stairs (WUS and down stairs (WDS . It can discard irrelevant motion and count user’s steps accurately. The overall classifi cation accuracy is 89.4%. Accelerometer, gyroscope and magnetic field sensors are used in th e device. When user puts his/her smart phone into the pocket, the pedometer can automatica lly count steps of different gait patterns. Two methods are tested to map the acceleration from mobile phone’s reference frame to the direction of gravity. Two significant features are employed to classify different gait patterns.

  20. A Genetic Algorithm Optimized Decision Tree-SVM based Stock Market Trend Prediction System

    Directory of Open Access Journals (Sweden)

    Binoy B. Nair

    2010-12-01

    Full Text Available Prediction of stock market trends has been an area of great interest both to researchers attempting to uncover the information hidden in the stock market data and for those who wish to profit by trading stocks. The extremely nonlinear nature of the stock market data makes it very difficult to design a system that can predict the future direction of the stock market with sufficient accuracy. This work presents a data mining based stock market trend prediction system, which produces highly accurate stock market forecasts. The proposed system is a genetic algorithm optimized decision tree-support vector machine (SVM) hybrid, which can predict one-day-ahead trends in stock markets. The uniqueness of the proposed system lies in the use of the hybrid system which can adapt itself to the changing market conditions and in the fact that while most of the attempts at stock market trend prediction have approached it as a regression problem, the present study converts the trend prediction task into a classification problem, thus improving the prediction accuracy significantly. Performance of the proposed hybrid system is validated on the historical time series data from the Bombay Stock Exchange sensitive index (BSE-Sensex). The system performance is then compared to that of an artificial neural network (ANN) based system and a naïve Bayes based system. It is found that the trend prediction accuracy is highest for the hybrid system and the genetic algorithm optimized decision tree-SVM hybrid system outperforms both the artificial neural network and the naïve Bayes based trend prediction systems.
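
A heavily simplified stand-in for the hybrid (with the genetic-algorithm optimisation step deliberately omitted) can be sketched as a decision tree ranking technical indicators and an SVM classifying next-day trend on the selected features; the indicators and data below are synthetic assumptions:

```python
# Simplified DT-SVM hybrid sketch on synthetic "technical indicator" features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((500, 10))                     # e.g. RSI, MACD, moving averages...
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.standard_normal(500) > 0.75).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
keep = np.argsort(tree.feature_importances_)[-4:]       # top-4 indicators

svm = SVC(kernel="rbf", C=1.0).fit(X_tr[:, keep], y_tr)
print("trend-prediction accuracy:", svm.score(X_te[:, keep], y_te))
```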

  1. Fault diagnosis method for nuclear power plant based on decision tree and neighborhood rough sets

    International Nuclear Information System (INIS)

    Nuclear power plants (NPPs) are very complex systems in which vast numbers of parameters need to be collected and monitored, which makes fault diagnosis difficult. To address this problem, a parameter reduction method based on neighborhood rough sets was proposed. Granular computing was realized in a real space, so numerical parameters could be processed directly. On this basis, a decision tree was applied to learn from training samples representing typical faults of a nuclear power plant, i.e., loss of coolant accident, feed water pipe rupture, steam generator tube rupture, and main steam pipe rupture, and to diagnose faults using the acquired knowledge. The diagnostic results were then compared with those of a support vector machine. The simulation results show that this method can rapidly and accurately diagnose the above-mentioned faults of the NPP. (authors)

  2. A New Architecture for Making Moral Agents Based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Meisam Azad-Manjiri

    2014-04-01

    Full Text Available Given the influence of robots in various fields of life, the issue of trusting them is important, especially when a robot deals with people directly. One possible way to gain this confidence is to add a moral dimension to robots. We therefore present a new architecture for building moral agents that learn from demonstrations. The agent is based on Beauchamp and Childress's principles of biomedical ethics (a type of deontological theory) and uses a decision tree algorithm to abstract the relationships between ethical principles and the morality of actions. We apply this architecture to build an agent that provides guidance to health care workers faced with ethical dilemmas. Our results show that the agent is able to learn ethics well.

  3. A hybrid model using decision tree and neural network for credit scoring problem

    Directory of Open Access Journals (Sweden)

    Amir Arzy Soltan

    2012-08-01

    Full Text Available Nowadays credit scoring is an important issue for financial and monetary organizations and has a substantial impact on reducing customer acquisition risk. Identifying high-risk customers can reduce final costs. Accurate customer classification with low type 1 and type 2 errors has been investigated in many studies. The primary objective of this paper is to develop a new method which chooses the best neural network architecture from among single-hidden-layer MLPs, multi-hidden-layer MLPs, RBFNs and decision trees, and ensembles them with voting methods. The proposed method is run on Australian credit data and data from a private bank in Iran, the Export Development Bank of Iran, and the results are used to support decisions that lower customer acquisition risk.
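
The general idea of combining candidate classifiers by voting can be sketched as follows (this is not the paper's exact set of architectures; the RBFN member is omitted and the synthetic credit data are an assumption):

```python
# Hedged sketch: majority voting over neural networks and a decision tree
# for a binary credit-scoring task on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=14, weights=[0.7, 0.3],
                           random_state=0)   # good vs. bad credit applicants

vote = VotingClassifier(estimators=[
    ("mlp_small", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                random_state=0)),
    ("mlp_deep",  MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000,
                                random_state=0)),
    ("tree",      DecisionTreeClassifier(max_depth=5, random_state=0)),
], voting="hard")

print(cross_val_score(vote, X, y, cv=5).mean())
```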

  4. Integrating individual trip planning in energy efficiency – Building decision tree models for Danish fisheries

    DEFF Research Database (Denmark)

    Bastardie, Francois; Nielsen, J. Rasmus; Andersen, Bo Sølgaard;

    2013-01-01

    … integrate detailed information on vessel distribution, catch and fuel consumption for different fisheries with a detailed resource distribution of targeted stocks from research surveys, to evaluate the optimum consumption and efficiency to reduce fuel costs and the costs of displacement of effort. The energy … hypothetical conditions influencing their trip decisions, covering the duration of fishing time, the choice of fishing ground(s), when to stop fishing and return to port, and the choice of the port for landing. Fleet-based energy and economic efficiency are linked to the decision (choice) dynamics. Larger fuel … efficiency for the value of catch per unit of fuel consumed is analysed by merging the questionnaire, logbook and VMS (vessel monitoring system) information. Logic decision trees and conditional behaviour probabilities are established from the responses of fishermen regarding a range of sequential …

  5. MODELLING AND IMPLEMENTATION OF DECISION TREE-BASED CONSUMPTION BEHAVIOUR FACTORS

    Institute of Scientific and Technical Information of China (English)

    黎旭; 李国和; 吴卫江; 洪云峰; 刘智渊; 程远

    2015-01-01

    The analysis of consumption behaviour factors plays an important guiding role in the production and sale of products. To model and analyse consumption behaviour from consumers' consumption data, the data are first formalised into consumer transaction data sets and transaction statistics. The information gain ratio is then defined on the consumer transaction data sets to reflect the classification ability of each consumption factor. Based on the C4.5 algorithm, bi-segmentation is extended to multi-segmentation, continuous attributes (factors) are discretised, and the decision tree is constructed. Each branch of the decision tree forms a decision rule reflecting the dependencies between a consumer's consumption factors, and the statistical information of each rule expresses its uncertainty. Using a Web architecture with Oracle as the database, a consumption behaviour modelling and analysis system was implemented; the system not only achieves high accuracy in consumption behaviour model analysis but is also efficient and user-friendly.
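
As a small worked illustration of the gain ratio used for attribute selection in C4.5, the following NumPy sketch computes it for a toy attribute/label pair (the values are invented, not from the paper):

```python
# Gain ratio = (information gain) / (split information), as used in C4.5.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(attribute, labels):
    total_h = entropy(labels)
    values, counts = np.unique(attribute, return_counts=True)
    weights = counts / counts.sum()
    cond_h = sum(w * entropy(labels[attribute == v])
                 for v, w in zip(values, weights))
    split_info = -np.sum(weights * np.log2(weights))   # penalises many-valued splits
    return (total_h - cond_h) / split_info if split_info > 0 else 0.0

purchase_channel = np.array(["web", "web", "store", "store", "web", "store"])
bought_again     = np.array([1, 1, 0, 1, 1, 0])
print(gain_ratio(purchase_channel, bought_again))
```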

  6. A decision tree-based on-line preventive control strategy for power system transient instability prevention

    Science.gov (United States)

    Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po

    2014-02-01

    Maintaining transient stability is a basic requirement for secure power system operations. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention of power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in control mechanism, on-line computation compatible and convenient to deal with multi-contingency. The effectiveness and efficiency of the method has been verified on New England 10-machine 39-bus test system.

  7. FPGA-Based Network Traffic Security: Design and Implementation Using C5.0 Decision Tree Classifier

    Institute of Scientific and Technical Information of China (English)

    Tarek Salah Sobh; Mohamed Ibrahiem Amer

    2013-01-01

    In this work, a hardware intrusion detection system (IDS) model and its implementation are introduced to perform online real-time traffic monitoring and analysis. The introduced system combines the advantages of several kinds of IDS: it is hardware based from the implementation point of view, network based from the system-type point of view, and anomaly based from the detection-approach point of view. In addition, it can detect most network attacks, such as denial of service (DoS) and leakage, from the detection-behavior point of view, and can detect both internal and external intruders from the intruder-type point of view. Gathering these features in one IDS gives the work many strengths and advantages. The system is implemented using a field programmable gate array (FPGA), giving further advantages to the system. A C5.0 decision tree classifier is used as the inference engine of the system and gives a high detection ratio of 99.93%.

  8. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Science.gov (United States)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach easily runs into a trade-off between the coverage of errors and the increase in perplexity. To solve this problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in a significant improvement in ASR accuracy.

  9. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    OpenAIRE

    Dong-sheng Liu; Shu-jiang Fan

    2014-01-01

    In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context cla...

  10. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should first classify the mobile users. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and classify the context into public context and private context classes. We then analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm; the mobile users can be classified into Basic service, E-service, Plus service, and Total service user classes, and some rules about the mobile users can also be derived. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and is simpler. PMID:24688389

  11. Objective consensus from decision trees

    International Nuclear Information System (INIS)

    Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information, objective consensus based on recommendations in decision tree format from multiple sources. Based on nine sample recommendations in decision tree format a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format and analyzed in order to determine the objective consensus. Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage) resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties
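
A toy sketch of the objective-consensus idea follows: for every combination of the three parameters, the most common (mode) recommendation across centres is taken. The centre rules below are invented placeholders, not real clinical guidance:

```python
# Mode recommendation over all 27 parameter combinations from several
# hypothetical decision trees (each represented here as a simple rule).
from collections import Counter
from itertools import product

gleason = ["<7", "7", ">7"]
psa     = ["<10", "10-20", ">20"]
t_stage = ["T1-T2", "T3", "T4"]

def centre_a(g, p, t): return "RT alone" if g == "<7" and p == "<10" else "RT+ADT"
def centre_b(g, p, t): return "RT+ADT" if t != "T1-T2" else "RT alone"
def centre_c(g, p, t): return "RT alone" if p == "<10" else "RT+ADT"
centres = [centre_a, centre_b, centre_c]

for combo in product(gleason, psa, t_stage):       # 27 combinations
    votes = Counter(rule(*combo) for rule in centres)
    consensus, _ = votes.most_common(1)[0]          # the mode recommendation
    print(combo, "->", consensus)
```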

  12. Identifying Model for Anticipated Communication Service-discontinuing Customers Based on Decision Tree Technology

    Institute of Scientific and Technical Information of China (English)

    李智勇; 冷夔

    2011-01-01

    The loss of customers directly affects the survival and development of telecom enterprises. It is therefore necessary to use data mining technology to identify customers with a tendency to discontinue service (anticipated churners) by building prediction models, and to carry out effective retention measures. Taking CRISP-DM (Cross-Industry Standard Process for Data Mining) as the framework, the method of building a model for identifying anticipated communication service-discontinuing customers is discussed in detail across six stages: business understanding, data understanding, data preparation, model building, model evaluation and deployment of results. A decision tree node model is used as the data mining tool and technique to build the identification model. The model has played a positive role in retention work for mobile customers and has achieved good practical results.

  13. Landslide Susceptibility Assessment in Vietnam Using Support Vector Machines, Decision Tree, and Naïve Bayes Models

    OpenAIRE

    Dieu Tien Bui; Biswajeet Pradhan; Owe Lofman; Inge Revhaug

    2012-01-01

    The objective of this study is to investigate and compare the results of three data mining approaches, the support vector machines (SVM), decision tree (DT), and Naïve Bayes (NB) models for spatial prediction of landslide hazards in the Hoa Binh province (Vietnam). First, a landslide inventory map showing the locations of 118 landslides was constructed from various sources. The landslide inventory was then randomly partitioned into 70% for training the models and 30% for the model validation....

  14. A Data Mining Algorithm Based on Distributed Decision-Tree in Grid Computing Environments

    Institute of Scientific and Technical Information of China (English)

    Zhongda Lin; Yanfeng Hong; Kun Deng

    2006-01-01

    Recently, research on distributed data mining using the grid has become a trend. This paper introduces a data mining algorithm based on distributed decision trees, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification on the grid.

  15. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    OpenAIRE

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe; Bak, Claus Leth; Thøgersen, Paul

    2013-01-01

    Decision Trees (DT) based security assessment helps Power System Operators (PSO) by providing them with the most significant system attributes and guiding them in implementing the corresponding emergency control actions to prevent system insecurity and blackouts. DT is obtained offline from time-domain simulation and the process of data mining, which is then implemented online as guidelines for preventive control schemes. An algorithm named Classification and Regression Trees (CART) is used t...

  16. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    Science.gov (United States)

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust as it is proven to be able to find results that are discriminatory from a statistical perspective with logical interactions, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have been already identified in large genome-wide association studies to be related to type II diabetes in the literature, lending additional confidence to the results. PMID:26577156

  17. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Rohulla Kosari Langari

    2014-02-01

    Full Text Available The transformation of the world through information technology and the development of the Internet have created knowledge-based competition in electronic commerce and increased the competitive potential of organizations. Under these conditions, the growing volume of commercial transactions, which must be guaranteed with speed and quality, depends on dynamic electronic banking systems that use modern technology to facilitate electronic business processes. Internet banking is a fundamental pillar of e-banking, yet in cyberspace it faces various obstacles and threats. One of these challenges is the lack of complete guarantees for the security of financial transactions, as well as the existence of suspicious and unusual behaviour and fraud aimed at financial abuse. Various systems based on intelligent methods and data mining techniques have been designed for detecting fraud in user behaviour and applied in industries such as insurance, medicine and banking. The main aim of this article is to recognise unusual user behaviour in e-banking systems by detecting user behaviour and categorising the emerging patterns, in order to predict unauthorised penetration and detect suspicious behaviour. Since user behaviour in Internet systems is uncertain and transaction records can be useful for understanding these movements, and since the decision tree is a common machine learning tool for classification and prediction, this research first determines the effective banking variables and the weight of each in Internet behaviour, and then combines various behaviour patterns to derive inductive rules capable of recognising different behaviours. Finally, four algorithms, CHAID, exhaustive CHAID, C4.5 and C5.0, are compared and evaluated for classification and detection of exist…

  18. Teratozoospermia Classification Based on the Shape of Sperm Head Using OTSU Threshold and Decision Tree

    Directory of Open Access Journals (Sweden)

    Masdiyasa I Gede Susrama

    2016-01-01

    Full Text Available Teratozoospermia is one of the findings of expert analysis of male infertility, obtained through microscopic laboratory tests that determine the morphology of spermatozoa, including whether the head of the spermatozoon is normal or abnormal. The laboratory test results take the form of a complete image of spermatozoa. In this study, the sperm-head shapes were taken from a WHO standards book. The images were fairly clear but still contained noise; therefore, differentiating normal from abnormal sperm heads required several processing steps: a pre-processing or image-adjustment step, threshold segmentation using the Otsu method, and classification using a decision tree. Training and test data were presented in stages, from 5 to 20 samples. The tests using Otsu segmentation and a decision tree produced different error rates at each level of training data, namely 70%, 75%, and 80% for training data of size 5×2, 10×2, and 20×2, respectively, with an average of 75%. This study thus shows that Otsu threshold segmentation combined with a decision tree can classify the shape of the sperm head as abnormal or normal.
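
The overall pipeline can be sketched as follows (this is not the study's code; the thresholding direction, shape features and labels are assumptions): Otsu thresholding segments the head, simple region properties describe its shape, and a decision tree does the classification.

```python
# Illustrative Otsu + shape-feature + decision-tree pipeline sketch.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from sklearn.tree import DecisionTreeClassifier

def head_features(gray_image):
    """Binarise with Otsu's threshold and describe the largest blob."""
    mask = gray_image > threshold_otsu(gray_image)   # assumes a bright head
    regions = regionprops(label(mask))
    head = max(regions, key=lambda r: r.area)
    return [head.area, head.eccentricity, head.solidity]

# Quick demo on a synthetic image containing a bright elliptical "head".
img = np.zeros((64, 64))
yy, xx = np.ogrid[:64, :64]
img[((yy - 32) / 10) ** 2 + ((xx - 32) / 16) ** 2 <= 1] = 1.0
print(head_features(img))

# Given arrays of training images and 0/1 labels (abnormal/normal), a tree
# would then be fitted on the extracted features, e.g.:
# X = np.array([head_features(im) for im in train_images])
# clf = DecisionTreeClassifier(max_depth=3).fit(X, train_labels)
```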

  19. VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction.

    Science.gov (United States)

    Golzari, Fahimeh; Jalili, Saeed

    2015-07-21

    In the protein function prediction (PFP) problem, the goal is to predict the functions of numerous well-sequenced known proteins whose function is still not known precisely. PFP is one of the special and complex problems in the machine learning domain, in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or directed acyclic graph. One of the common learning methods proposed for solving this problem is decision trees, in which data are partitioned into sets with sharp boundaries, so that small changes in the attribute values of a new instance may cause an incorrect change in its predicted label and, ultimately, misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict the functions of proteins. The algorithm fuzzifies only the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It can assign multiple functions to each protein simultaneously and preserves hierarchy consistency between functional classes. It uses label variance reduction as the splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising. PMID:25865524

  20. An Applied Research of Decision Tree Algorithm in Track and Field Equipment Training

    OpenAIRE

    Liu Shaoqing; Wang Kebin

    2015-01-01

    This paper studies the application of the ID3 decision tree algorithm to track and field equipment training. For the selection of the elements used by the decision tree, the equipment is divided into track training equipment, field-event training equipment and auxiliary training equipment according to its properties. The decision tree that takes track training equipment as the root node has been obtained under the conditions of lowering c...

  1. Skin autofluorescence based decision tree in detection of impaired glucose tolerance and diabetes.

    Directory of Open Access Journals (Sweden)

    Andries J Smit

    Full Text Available AIM: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE), which are considered to be a carrier of glycometabolic memory. We compared SAF and a SAF-based decision tree (SAF-DM) with fasting plasma glucose (FPG) and HbA1c, and additionally with the Finnish Diabetes Risk Score (FINDRISC) questionnaire ± FPG, for detection of oral glucose tolerance test (OGTT)- or HbA1c-defined IGT and diabetes in intermediate-risk persons. METHODS: Participants had ≥1 metabolic syndrome criteria. They underwent an OGTT, HbA1c, SAF and FINDRISC, in addition to SAF-DM, which includes SAF, age, BMI, and conditional questions on DM family history, antihypertensives, and renal or cardiovascular disease events (CVE). RESULTS: 218 persons, age 56 yr, 128M/90F, 97 with previous CVE, participated. With OGTT, 28 had DM, 46 IGT, 41 impaired fasting glucose, and 103 normal glucose tolerance. SAF alone revealed 23 false positives (FP) and 34 false negatives (FN) (sensitivity (S) 68%; specificity (SP) 86%). With SAF-DM, FP were reduced to 18 and FN to 16 (5 with DM) (S 82%; SP 89%). HbA1c scored 48 FP, 18 FN (S 80%; SP 75%). Using HbA1c-defined DM-IGT/suspicion ≥6%/42 mmol/mol, SAF-DM scored 33 FP, 24 FN (4 DM) (S 76%; SP 72%), and FPG scored 29 FP, 41 FN (S 71%; SP 80%). FINDRISC ≥10 points as detection of HbA1c-based diabetes/suspicion scored 79 FP, 23 FN (S 69%; SP 45%). CONCLUSION: SAF-DM is superior to FPG and non-inferior to HbA1c for detecting diabetes/IGT in intermediate-risk persons. SAF-DM's value for diabetes/IGT screening is further supported by its established performance in predicting diabetic complications.

  2. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    Science.gov (United States)

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), ratio of median particle diameter to hydraulic radius (d/R) and volumetric sediment concentration (C(V)) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R(2) = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers. PMID:27386995

  3. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions

    Science.gov (United States)

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops that protect against NO3- contamination in groundwater, the dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3- pollution activities via an unsupervised learning algorithm based on δ15N- and δ18O-NO3- and the physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3- contamination via a decision tree model. When a combination of δ15N-, δ18O-NO3- and physico-chemical properties of groundwater was used as input for the k-means clustering algorithm, it allowed a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well-irrigated agriculture (28 sampling locations), sewage-irrigated agriculture (16 sampling locations), a combination of sewage-irrigated agriculture, farm and industry (5 sampling locations) and a combination of well-irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on the SO42- and Cl- variables. The NO3- and the δ15N- and δ18O-NO3- variables showed limitations in developing a decision tree model, as multiple N sources and fractionation processes both made it difficult to discriminate NO3- concentrations and isotopic values. Although only SO42- and Cl- were selected as important discriminating variables, concentration data alone could not identify the specific NO3- sources responsible for groundwater contamination; this is a result of the comprehensive analysis. To further reduce NO3- contamination, an integrated approach should be set up by combining N and O isotopes of NO3- with land use and physico-chemical properties, especially in areas with complex agricultural activities.
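
The two-step workflow can be sketched conceptually as follows (synthetic numbers, not the study's data): k-means groups samples by isotopic and chemical signature, and a small decision tree is then fitted to see which variables separate the resulting pollution-activity clusters:

```python
# Conceptual k-means + decision-tree sketch on synthetic groundwater samples.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Columns: d15N, d18O, SO4, Cl, NO3 for 55 hypothetical sampling locations.
X = np.column_stack([
    rng.normal(8, 3, 55), rng.normal(2, 2, 55),
    rng.gamma(3, 20, 55), rng.gamma(2, 15, 55), rng.gamma(2, 10, 55),
])
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

# Which variables discriminate the clusters?
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, clusters)
print(export_text(tree, feature_names=["d15N", "d18O", "SO4", "Cl", "NO3"]))
```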

  4. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    International Nuclear Information System (INIS)

    This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies

  5. Predicting future trends in stock market by decision tree rough-set based hybrid system with HHMM

    Directory of Open Access Journals (Sweden)

    Shweta Tiwari

    2012-06-01

    Full Text Available Around the world, trading in the stock market has gained huge popularity as a means through which one can obtain vast profits. Attempting to profitably and precisely predict the financial market has long engrossed the interest and attention of bankers, economists and scientists alike. Stock market prediction is the act of trying to determine the future value of a company's stock or other financial instrument traded on a financial exchange. Accurate stock market predictions are important for many reasons, chief among them the need for investors to hedge against potential market risks and the opportunities for arbitrageurs and speculators to make profits by trading indexes. A stock market is a place where shares are issued and traded, either through stock exchanges or over-the-counter, in physical or electronic form. Data mining, as a process of discovering useful patterns and correlations, has its own role in financial modeling. Data mining is a discipline of computational intelligence that deals with knowledge discovery, data analysis, and fully and semi-autonomous decision making. Prediction of the stock market with data mining techniques has been receiving a lot of attention recently. This paper presents a hybrid system based on decision trees and rough sets for predicting trends in the Bombay Stock Exchange (BSE SENSEX), in combination with a Hierarchical Hidden Markov Model. We present future trends on the basis of price, earnings and dividends; the data on accounting earnings, when averaged over many years, help to predict the present value of future dividends.

  6. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    Science.gov (United States)

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays. PMID:26796566
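
A hypothetical sketch of this decision-tree style of integrated testing strategy is given below; the ordering and tie-breaking rules are invented for illustration and are not the published Derek Nexus workflow. An in silico call is confirmed or resolved with at most two in chemico / in vitro assay results:

```python
# Hypothetical ITS sketch: combine an in silico call with up to two assays.
def its_skin_sensitisation(in_silico, assay1=None, assay2=None):
    """Each argument is 'positive', 'negative' or None (assay not run)."""
    if assay1 is None:                       # in silico alone, inside its domain
        return in_silico
    if in_silico == assay1:                  # two concordant results: accept
        return in_silico
    if assay2 is None:                       # discordant: a second assay is needed
        return "run second assay"
    # Majority vote across the three sources resolves the discordance.
    votes = [in_silico, assay1, assay2]
    return max(set(votes), key=votes.count)

print(its_skin_sensitisation("positive", "negative", "positive"))  # -> positive
```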

  7. Childhood Cancer-a Hospital based study using Decision Tree Techniques

    Directory of Open Access Journals (Sweden)

    K. Kalaivani

    2011-01-01

    Full Text Available Problem statement: Cancer is generally regarded as a disease of adults, but there is a higher proportion of childhood cancer (ALL, Acute Lymphoblastic Leukemia) in India. The incidence of childhood cancer has increased over the last 25 years, and the increase is much larger in females. The aim was to increase our understanding of the determinants of South Indian parental reactions and needs. This facilitates the development of care and follow-up routines for families, paying attention both to individual risk and resilience factors and to ways in which limitations related to treatment centre and organizational characteristics could be compensated. Approach: Decision trees may be used for classification, clustering, affinity grouping, prediction or estimation, and description. One useful medical application in India is the management of leukemia, as it accounts for about 33% of childhood malignancies. Results: Female survivors showed greater functional disability than male survivors, demonstrated by poorer overall health status. Family stress results from a perceived imbalance between the demands on the family and the resources available to meet such demands. Conclusion: The pattern and severity of health and functional outcomes differed significantly between survivors in diagnostic subgroups. Family impact was aggravated by patients' lasting sequelae and by parent-perceived shortcomings of long-term follow-up. Female survivors were at greater risk for health-related late effects.

  8. Decision tree methods: applications for classification and prediction

    Institute of Scientific and Technical Information of China (English)

    Yan-yan SONG; Ying LU

    2015-01-01

    Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets; the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces algorithms frequently used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
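
The train/validation workflow described here can be sketched with scikit-learn's CART implementation, using cost-complexity pruning and a validation set to pick the tree size (the dataset below is a stand-in, and this is one possible realisation rather than the paper's SPSS/SAS procedure):

```python
# Sketch: choose the pruned tree size on a validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths from the cost-complexity path on the training set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

best_alpha, best_score = None, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    score = tree.score(X_val, y_val)        # validation data decides the tree size
    if score > best_score:
        best_alpha, best_score = alpha, score

final = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_tr, y_tr)
print(f"chosen ccp_alpha={best_alpha:.4f}, leaves={final.get_n_leaves()}")
```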

  9. Algorithms for Decision Tree Construction

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    The study of algorithms for decision tree construction was initiated in the 1960s. The first algorithms are based on the separation heuristic [13, 31], which at each step tries to divide the set of objects as evenly as possible. Later, Garey and Graham [28] showed that such algorithms may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest [35] proved the NP-hardness of the DT problem, that is, constructing a tree with the minimum average depth for a diagnostic problem over a 2-valued information system and a uniform probability distribution. Cox et al. [22] showed that for a two-class problem over an information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.
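
    The separation heuristic mentioned in this record can be shown with a toy sketch: at each step, choose the binary attribute whose split of the current objects is most even. The function and the tiny object set below are hypothetical illustrations, not the historical algorithms of [13, 31].

        # Toy illustration of the separation heuristic for binary attributes.
        def most_even_split(objects, attributes):
            """objects: list of dicts mapping attribute name -> 0/1 value."""
            def imbalance(attr):
                ones = sum(obj[attr] for obj in objects)
                return abs(len(objects) - 2 * ones)   # 0 means a perfectly even split
            return min(attributes, key=imbalance)

        objects = [{"a": 0, "b": 1}, {"a": 0, "b": 0},
                   {"a": 1, "b": 1}, {"a": 0, "b": 0}]
        print(most_even_split(objects, ["a", "b"]))   # 'b' splits the four objects 2/2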

  10. Design of a new hybrid artificial neural network method based on decision trees for calculating the Froude number in rigid rectangular channels

    Directory of Open Access Journals (Sweden)

    Ebtehaj Isa

    2016-09-01

    Full Text Available A vital topic regarding the optimum and economical design of rigid-boundary open channels, such as sewers and drainage systems, is determining the movement of sediment particles. In this study, the incipient motion of sediment is estimated using three datasets from the literature, covering a wide range of hydraulic parameters. Because existing equations do not consider the effect of sediment bed thickness on incipient motion estimation, this parameter is applied in this study along with the multilayer perceptron (MLP) and a hybrid method based on decision trees (MLP-DT) to estimate incipient motion. According to a comparison with the observed experimental outcomes, the proposed method performs well (MARE = 0.048, RMSE = 0.134, SI = 0.06, BIAS = -0.036). The performance of MLP and MLP-DT is compared with that of existing regression-based equations, and significantly higher performance over existing models is observed. Finally, an explicit expression for practical engineering is also provided.
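
    The error measures quoted above can be reproduced under their common definitions, which is an assumption on our part: MARE as the mean absolute relative error, RMSE as the root mean square error, SI as RMSE divided by the mean of the observations, and BIAS as the mean signed error. The values below are illustrative only, not the study's datasets.

        # Common definitions of the reported error measures (assumed, not quoted).
        import numpy as np

        def error_metrics(observed, predicted):
            observed = np.asarray(observed, dtype=float)
            predicted = np.asarray(predicted, dtype=float)
            mare = np.mean(np.abs((predicted - observed) / observed))
            rmse = np.sqrt(np.mean((predicted - observed) ** 2))
            si = rmse / np.mean(observed)
            bias = np.mean(predicted - observed)
            return mare, rmse, si, bias

        obs = [0.8, 1.1, 1.5, 2.0]        # hypothetical observations
        pred = [0.82, 1.05, 1.55, 1.9]    # hypothetical predictions
        print(error_metrics(obs, pred))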

  11. Application of portfolio theory in decision tree analysis.

    Science.gov (United States)

    Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J

    1991-07-01

    A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation. PMID:1894809
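
    A minimal sketch of the portfolio idea applied to decision alternatives follows: the expected return of a weighted mix of two decision options is maximised subject to a variance tolerance. The returns, covariances and tolerance are hypothetical numbers, not values from the cited herd model.

        # Mean-variance screening of mixes of two decision options (hypothetical data).
        import numpy as np

        returns = np.array([40.0, 25.0])            # expected return of each option
        cov = np.array([[400.0, 50.0],              # variance/covariance of returns
                        [50.0, 100.0]])

        best = None
        for w in np.linspace(0.0, 1.0, 101):        # weight given to option 1
            weights = np.array([w, 1.0 - w])
            exp_ret = weights @ returns
            variance = weights @ cov @ weights
            if variance <= 150.0 and (best is None or exp_ret > best[1]):
                best = (w, exp_ret, variance)       # highest return within tolerance

        print("weight on option 1: %.2f, expected return: %.1f, variance: %.1f" % best)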

  12. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    Science.gov (United States)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

    Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under the Scanning Electron Microscope (SEM), identifying zeolites from their mineral chemical data is a relatively more challenging process. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite, (Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O, and Erionite, (Na2,K2,Ca)2Al4Si14O36·15H2O), EDS data alone do not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM-based crystal morphology data, XRD spectra and re-calculated cationic distribution, obtained by EDS, have been used for

  13. Genetic program based data mining of fuzzy decision trees and methods of improving convergence and reducing bloat

    Science.gov (United States)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user. These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons between trees created using a genetic program and those constructed solely by interviewing experts are made. A new co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have been used to validate the data mining algorithm are referenced.
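
    The bloat-reduction step based on automated computer algebra can be pictured with a small sketch. SymPy is used here purely as a stand-in, an assumption on our part; the cited work describes its own algebra machinery, not this library.

        # Algebraic simplification as a bloat-reduction analogy (SymPy stand-in).
        import sympy as sp

        x, y = sp.symbols("x y")
        # A deliberately bloated expression of the kind GP crossover can produce.
        bloated = (x * y + x * y) / 2 + (x - x) * y + sp.sin(x) ** 2 + sp.cos(x) ** 2
        compact = sp.simplify(bloated)

        print("operations before:", sp.count_ops(bloated))
        print("simplified form:", compact)           # x*y + 1
        print("operations after:", sp.count_ops(compact))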

  14. Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree.

    Science.gov (United States)

    Chao, Cheng-Min; Yu, Ya-Wen; Cheng, Bor-Wen; Kuo, Yao-Lung

    2014-10-01

    The aim of this paper is to use data mining technology to establish a classification of breast cancer survival patterns, and to offer a treatment decision-making reference for the survival ability of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer in a specific hospital in Central Taiwan and obtained 1,340 data sets. We employed a support vector machine, logistic regression, and a C5.0 decision tree to construct a classification model of breast cancer patients' survival rates, and used a 10-fold cross-validation approach to validate the models. The results show that the classification models yielded average accuracy rates of more than 90%, with the SVM providing the best method for constructing the three-category classification system for survival mode. The experiment shows that the three methods used to create the classification system achieved high accuracy, predicted the survival ability of women diagnosed with breast cancer more accurately, and could be used as a reference when creating a medical decision-making framework. PMID:25119239
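
    A hedged sketch of the 10-fold cross-validation comparison follows, using scikit-learn stand-ins (SVC, LogisticRegression and a CART-style DecisionTreeClassifier rather than C5.0) on synthetic data; the hospital records used in the study are not reproduced here.

        # 10-fold cross-validation of three classifier families on synthetic data.
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=500, n_features=12, random_state=1)
        models = {
            "SVM": SVC(),
            "logistic regression": LogisticRegression(max_iter=1000),
            "decision tree": DecisionTreeClassifier(random_state=1),
        }
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=10)
            print(f"{name}: mean accuracy = {scores.mean():.3f}")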

  15. Method for Walking Gait Identification in a Lower Extremity Exoskeleton based on C4.5 Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Qing Guo

    2015-04-01

    Full Text Available A gait identification method for a lower extremity exoskeleton is presented in order to identify the gait sub-phases in human-machine coordinated motion. First, a sensor layout for the exoskeleton is introduced. Taking the difference between human lower limb motion and human-machine coordinated motion into account, the walking gait is divided into five sub-phases, which are ‘double standing’, ‘right leg swing and left leg stance’, ‘double stance with right leg front and left leg back’, ‘right leg stance and left leg swing’, and ‘double stance with left leg front and right leg back’. The sensors include shoe pressure sensors, knee encoders, and thigh and calf gyroscopes, and are used to measure the contact force of the foot, and the knee joint angle and its angular velocity. Then, five sub-phases of walking gait are identified by a C4.5 decision tree algorithm according to the data fusion of the sensors’ information. Based on the simulation results for the gait division, identification accuracy can be guaranteed by the proposed algorithm. Through the exoskeleton control experiment, a division of five sub-phases for the human-machine coordinated walk is proposed. The experimental results verify this gait division and identification method. They can make hydraulic cylinders retract ahead of time and improve the maximal walking velocity when the exoskeleton follows the person’s motion.

  16. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    OpenAIRE

    Wan-Yu Chang; Chung-Cheng Chiu; Jia-Horng Yang

    2015-01-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and in...

  17. Skin Autofluorescence Based Decision Tree in Detection of Impaired Glucose Tolerance and Diabetes

    NARCIS (Netherlands)

    Smit, Andries J.; Smit, Jitske M.; Botterblom, Gijs J.; Mulder, Douwe J.

    2013-01-01

    Aim: Diabetes (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation endproducts (AGE) which are considered to be a carrier of glycometabolic memory. We compare

  18. A Decision Tree Based Word Sense Disambiguation System in Manipuri Language

    OpenAIRE

    Richard Laishram Singh; Krishnendu Ghosh; Kishorjit Nongmeikapam; Sivaji Bandyopadhyay

    2014-01-01

    This paper manifests a primary attempt on building a word sense disambiguation system in Manipuri language. The paper discusses related attempts made in the Manipuri language followed by the proposed plan. A database, consisting of 650 sentences, is collected in Manipuri language in the course of the study. Conventional positional and context based features are suggested to capture the sense of the words, which have ambiguous and multiple senses. The proposed work is expected ...

  19. Dynamic Security Assessment of Western Danish Power System Based on Ensemble Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Bak, Claus Leth; Chen, Zhe; Lund, Per

    with online wide-area measurement data, it is capable of not only predicting the security states of current operating conditions (OC) with high accuracy, but also indicating the confidence of the security states 1 minute ahead of the real time by an outlier identification method. The results of EDT...... together with outlier identification show high accuracy in the presence of variance and uncertainties due to wind power generation and other dispersed generation units. The performance of this approach is demonstrated on the operational model of western Danish power system with the scale of around 200...

  20. A Decision Tree Based Word Sense Disambiguation System in Manipuri Language

    Directory of Open Access Journals (Sweden)

    Richard Laishram Singh

    2014-07-01

    Full Text Available This paper manifests a primary attempt on building a word sense disambiguation system in Manipuri language. The paper discusses related attempts made in the Manipuri language followed by the proposed plan. A database, consisting of 650 sentences, is collected in Manipuri language in the course of the study. Conventional positional and context based features are suggested to capture the sense of the words, which have ambiguous and multiple senses. The proposed work is expected to predict the senses of the polysemous words with high accuracy with the help of the suitable knowledge acquisition techniques. The system produces an accuracy of 71.75 %.

  1. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.

  2. Building Customers' Credit Scoring Models with Combination of Feature Selection and Decision Tree Algorithms

    Directory of Open Access Journals (Sweden)

    Zahra Davoodabadi

    Full Text Available Today's financial transactions have increased through banks and financial institutions. Therefore, credit scoring is a critical task to forecast customers' credit. We have created 9 different models for the credit scoring by combining three metho ...

  3. Tailored approach in inguinal hernia repair – Decision tree based on the guidelines

    Directory of Open Access Journals (Sweden)

    Ferdinand Köckerling

    2014-06-01

    Full Text Available The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch, and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society and the European Association of Endoscopic Surgery. 82% of experienced hernia surgeons use the "tailored approach", the differentiated use of the several inguinal hernia repair techniques depending on the findings of the patient, trying to minimize the risks. The following differential therapeutic situations must be distinguished in inguinal hernia repair: unilateral in men, unilateral in women, bilateral, scrotal, after previous pelvic and lower abdominal surgery, no general anaesthesia possible, recurrence, and emergency surgery. Evidence-based guidelines and consensus conferences of experts give recommendations for the best approach in the individual situation of a patient. This review tries to summarize the recommendations of the various guidelines and to transfer them into a practical decision tree for the daily work of surgeons performing inguinal hernia repair.

  4. A Multi-industry Default Prediction Model using Logistic Regression and Decision Tree

    Directory of Open Access Journals (Sweden)

    Suresh Ramakrishnan

    2015-04-01

    Full Text Available The accurate prediction of corporate bankruptcy for firms in different industries is of great concern to investors and creditors, as it can reduce creditors' risk and produce considerable savings for an industry's economy. Financial statements vary between industries; therefore, economic intuition suggests that industry effects should be an important component in bankruptcy prediction. This study attempts to detail the characteristics of each industry using sector indicators. The results show a significant relationship between probability of default and sector indicators. The results of this study may improve the performance of default prediction models and reduce the costs of risk management.

  5. Modelling alcohol consumption during adolescence using zero inflated negative binomial and decision trees

    Directory of Open Access Journals (Sweden)

    Alfonso Palmer

    2010-07-01

    Full Text Available Alcohol is currently the most consumed substance among the Spanish adolescent population. Some of the variables that bear an influence on this consumption include ease of access, use of alcohol by friends and some personality factors. The aim of this study was to analyze and quantify the predictive value of these variables specifically on alcohol consumption in the adolescent population. The useful sample was made up of 6,145 adolescents (49.8% boys and 50.2% girls) with a mean age of 15.4 years (SE = 1.2). The data were analyzed using the statistical model for a count variable and Data Mining techniques. The results show the influence of ease of access, alcohol consumption by the group of friends, and certain personality factors on alcohol intake, allowing us to quantify the intensity of this influence according to age and gender. Knowing these factors is the starting point in elaborating specific preventive actions against alcohol consumption.

  6. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    Directory of Open Access Journals (Sweden)

    Wided Khiari

    2013-09-01

    Full Text Available This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosure scores to examine the corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 was carried out and a disclosure index developed to determine the level of disclosure of the companies. Disclosure quality is assessed through the quantity and also through the nature (type) of information disclosed. Applying the decision tree method, the obtained tree diagrams provide ways to know the characteristics of a particular firm regardless of its level of disclosure. The results show that the corporate governance characteristics needed to achieve good disclosure quality are not unique for all firms. These structures do not necessarily follow all of the recommendations of best practice, but converge towards the best combination. Indeed, in practice, there are companies which have a good quality of disclosure but are not well governed; however, we expect that by improving their governance system their level of disclosure may improve. These findings show, in a general way, a convergence towards the standards of corporate governance, with a few exceptions related to the specificity of Tunisian listed firms, and show the need for the adoption of a code for each context. These findings shed light on corporate governance features that enhance incentives for good disclosure. They allow identifying, for each firm and at any date, the corporate governance determinants of disclosure quality. More specifically, and all else being equal, the obtained tree provides a decision rule for determining a company's level of disclosure based on certain characteristics of its governance strategy.

  7. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Directory of Open Access Journals (Sweden)

    A. Khader

    2012-12-01

    Full Text Available Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i ignore the health risk of nitrate contaminated water, (ii switch to alternative water sources such as bottled water, or (iii implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012. The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system. At current

  8. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    Science.gov (United States)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs
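
    A worked sketch of the expected-cost and VOI calculation described in this record is shown below; every probability and cost in it is hypothetical and serves only to illustrate the arithmetic.

        # Expected cost of each alternative as a probability-weighted sum of outcome
        # costs; VOI = cheapest uninformed alternative minus the monitoring alternative.
        def expected_cost(outcomes):
            """outcomes: list of (probability, cost) pairs."""
            return sum(p * c for p, c in outcomes)

        ignore_risk = expected_cost([(0.2, 5000.0),    # illness and healthcare cost
                                     (0.8, 0.0)])      # no illness
        bottled_water = expected_cost([(1.0, 1200.0)]) # buy bottled water regardless
        monitoring = expected_cost([(1.0, 300.0),      # install and maintain network
                                    (0.02, 5000.0),    # residual risk of illness
                                    (0.30, 1200.0)])   # bottled water when advised

        cheapest_uninformed = min(ignore_risk, bottled_water)
        voi = cheapest_uninformed - monitoring
        print("expected costs:", ignore_risk, bottled_water, monitoring)
        print("value of information:", voi)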

  9. Decision trees with minimum average depth for sorting eight elements

    KAUST Repository

    AbouEisha, Hassan

    2015-11-19

    We prove that the minimum average depth of a decision tree for sorting 8 pairwise different elements is equal to 620160/8!. We show also that each decision tree for sorting 8 elements, which has minimum average depth (the number of such trees is approximately equal to 8.548×10^326365), has also minimum depth. Both problems were considered by Knuth (1998). To obtain these results, we use tools based on extensions of dynamic programming which allow us to make sequential optimization of decision trees relative to depth and average depth, and to count the number of decision trees with minimum average depth.
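
    For reference, the quoted minimum average depth can be evaluated numerically; the arithmetic below is added for clarity and is not part of the original record.

        \[ \frac{620160}{8!} \;=\; \frac{620160}{40320} \;\approx\; 15.38 \]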

  10. Remote Sensing Image Classification Based on Decision Tree in the Karst Rocky Desertification Areas: A Case Study of Kaizuo Township

    Institute of Scientific and Technical Information of China (English)

    Shuyong; MA; Xinglei; ZHU; Yulun; AN

    2014-01-01

    Karst rocky desertification is a phenomenon of land degradation resulting from the interaction of natural and human factors. In the past, in rocky desertification areas, supervised classification and unsupervised classification were often used to classify remote sensing images, but they use only pixel brightness characteristics, so the classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a newer technology for remote sensing image classification. In this study, we select the rocky desertification area of Kaizuo Township as a case study and use ASTER image data, DEM and lithology data, extracting the normalized difference vegetation index, ratio vegetation index, terrain slope and other data to establish classification rules and build decision trees. With the support of the ENVI software, we obtain the classification images. By calculating the classification accuracy and kappa coefficient, we find that better classification results can be obtained and desertification information can be extracted automatically; if more remote sensing image bands are used, a higher-resolution DEM is employed and data errors are reduced during processing, classification accuracy can be improved further.
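
    Index-based decision-tree rules of the kind described here can be sketched as follows; the thresholds, reflectance values and class labels are illustrative assumptions, not the rules actually derived for Kaizuo Township.

        # Hypothetical per-pixel rules combining NDVI with terrain slope.
        def ndvi(nir, red):
            return (nir - red) / (nir + red + 1e-9)

        def classify_pixel(nir, red, slope_deg):
            v = ndvi(nir, red)
            if v < 0.1:                  # little or no vegetation cover
                return "rocky desertification" if slope_deg > 15 else "bare land"
            elif v < 0.4:
                return "sparse vegetation"
            return "dense vegetation"

        # One hypothetical pixel: ASTER-like NIR/red reflectances and slope in degrees.
        print(classify_pixel(nir=0.32, red=0.30, slope_deg=22.0))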

  11. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation- hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
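
    The "integral image" transform mentioned above can be sketched in a few lines: a double cumulative sum lets the sum over any rectangular region be read from four corner values. The array below is arbitrary test data.

        # Integral image via cumulative sums, plus constant-time region sums.
        import numpy as np

        img = np.arange(16, dtype=np.int64).reshape(4, 4)
        integral = img.cumsum(axis=0).cumsum(axis=1)

        def region_sum(ii, r0, c0, r1, c1):
            """Sum of img[r0:r1+1, c0:c1+1] using the integral image ii."""
            total = ii[r1, c1]
            if r0 > 0:
                total -= ii[r0 - 1, c1]
            if c0 > 0:
                total -= ii[r1, c0 - 1]
            if r0 > 0 and c0 > 0:
                total += ii[r0 - 1, c0 - 1]
            return total

        print(region_sum(integral, 1, 1, 2, 2), img[1:3, 1:3].sum())  # both print 30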

  12. Facial Expression Recognition Based on LBP and SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    李扬; 郭海礁

    2014-01-01

    In order to improve the recognition rate of facial expression recognition, a facial expression recognition algorithm combining LBP and an SVM decision tree is proposed. First, the facial expression image is converted into an LBP characteristic spectrum using the LBP algorithm; the LBP characteristic spectrum is then converted into an LBP histogram feature sequence; finally, classification and recognition of facial expressions are completed by the SVM decision tree algorithm. The effectiveness of the proposed method is demonstrated on the JAFFE facial expression database.

  13. Research on Decision Tree for Food Safety Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    鄂旭; 任骏原; 毕嘉娜; 沈德海

    2014-01-01

    Food safety decision making is an important part of food safety research. Based on the variable precision rough sets model, a method of building a decision tree whose rules carry a definite confidence is proposed for food safety analysis; it improves on the decision tree induction approach of traditional methods. A new algorithm for constructing decision trees is presented that uses variable precision weighted mean roughness as the criterion for attribute selection and replaces the approximate accuracy with the variable precision approximate accuracy. Noisy data in the training sets are sufficiently taken into account, and limited inconsistency is allowed among examples in the positive regions. As a result, the decision tree is simplified, its generalization ability is improved, and it becomes more comprehensible. Experiments show that the algorithm is feasible and effective.

  14. Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System

    Directory of Open Access Journals (Sweden)

    Jashan Koshal

    2012-08-01

    Full Text Available The main reason attacks are introduced to a system is the popularity of the internet. Information security has now become a vital subject; hence, there is an immediate need to recognize and detect attacks. Intrusion detection is defined as a method of diagnosing attacks and signs of malicious activity in a computer network by evaluating the system continuously. The software that performs such a task can be defined as an Intrusion Detection System (IDS). Systems developed with individual algorithms such as classification, neural networks or clustering give a good detection rate and a low false alarm rate, but recent studies show that cascading multiple algorithms yields much better performance than a system developed with a single algorithm. In intrusion detection systems that use a single algorithm, the accuracy and detection rate were not up to the mark, and a rise in the false alarm rate was also encountered; cascading of algorithms is performed to solve this problem. This paper presents two hybrid algorithms for developing the intrusion detection system. A C4.5 decision tree and a Support Vector Machine (SVM) are combined to maximize the accuracy, which is the advantage of C4.5, and to diminish the false alarm rate, which is the advantage of SVM. Results show an increase in accuracy and detection rate and a lower false alarm rate.

  15. Comparison of greedy algorithms for α-decision tree construction

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    A comparison among different heuristics used by greedy algorithms that construct approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from the UCI Machine Learning Repository [2]. The complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. The costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of the experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.

  16. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

    We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques using an Artificial Neural Network and a Decision Tree. Unlike existing works on fish classification, which propose descriptors without analyzing their individual impacts on the whole classification task and without combining feature selection, image segmentation and geometrical parameters, we propose a general set of features extracted using robust feature selection, image segmentation and geometrical parameter techniques, together with their corresponding weights, to be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the classifier structure itself, we consider it as a black box and focus our research on determining which input information must bring about a robust fish discrimination. The main contribution of this paper is the enhanced recognition and classification of fishes...

  17. A Cost-Sensitive Decision Tree Learning Model—An Application to Customer-Value Based Segmentation

    Institute of Scientific and Technical Information of China (English)

    邹鹏; 莫佳卉; 江亦华; 叶强

    2011-01-01

    The objective of this research is to extend the current decision tree learning model to handle data sets with unequal misclassification costs. The research explores the issue of asymmetric misclassification costs through an application to customer-value based segmentation, using empirical data collected from one of the largest credit card issuing banks in China. The data, which include attributes from a customer satisfaction survey and credit card transaction history, are used to validate the proposed model. The results show that the proposed cost-sensitive decision tree for customer-value based segmentation is an effective method compared with the original decision tree learning model.

  18. Ensemble of randomized soft decision trees for robust classification

    Indian Academy of Sciences (India)

    G KISHOR KUMAR; P VISWANATH; A ANANDA RAO

    2016-03-01

    For classification, decision trees have become very popular because of their simplicity, interpretability and good performance. To induce a decision tree classifier for data having continuous-valued attributes, the most common approach is to split the continuous attribute range into a hard (crisp) partition having two or more blocks, using one or several crisp (sharp) cut points. However, this can make the resulting decision tree very sensitive to noise. An existing solution to this problem is to split the continuous attribute into a fuzzy partition (soft partition) using soft or fuzzy cut points, based on fuzzy set theory, and to use fuzzy decisions at the nodes of the tree. These are called soft decision trees in the literature and are shown to perform better than conventional decision trees, especially in the presence of noise. The current paper first proposes to use an ensemble of soft decision trees for robust classification, where the attribute, fuzzy cut point and other parameters are chosen randomly from a probability distribution of fuzzy information gain over the various attributes and their various cut points. Further, the paper proposes to use probability-based information gain to achieve better results. The effectiveness of the proposed method is shown by experimental studies carried out using three standard data sets. It is found that an ensemble of randomized soft decision trees outperforms the related existing soft decision tree. Robustness against the presence of noise is shown by injecting various levels of noise into the training set, and a comparison is drawn with other related methods, which favors the proposed method.

  19. Detection and Extraction of Videos using Decision Trees

    Directory of Open Access Journals (Sweden)

    Sk.Abdul Nabi

    2011-12-01

    Full Text Available This paper addresses a new multimedia data mining framework for the extraction of events in videos by using decision tree logic. The aim of our DEVDT (Detection and Extraction of Videos using Decision Trees) system is to improve the indexing and retrieval of multimedia information. The extracted events can be used to index the videos. In this system we have considered the C4.5 decision tree algorithm [3], which can manage both continuous and discrete attributes. In this process, we first adopt an advanced video event detection method to produce event boundaries and some important visual features. This rich multi-modal feature set is filtered by a pre-processing step to clean the noise as well as to reduce the irrelevant data, which improves both precision and recall. After producing the cleaned data, it is mined and classified by using a decision tree model. The learning and classification steps of this decision tree are simple and fast, and the decision tree has good accuracy. Consequently, by using our system we reach high precision and recall, i.e. we extract pure video events effectively and proficiently.

  20. Relationships for Cost and Uncertainty of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    This chapter is devoted to the design of new tools for the study of decision trees. These tools are based on the dynamic programming approach and need the consideration of subtables of the initial decision table, so this approach is applicable only to relatively small decision tables. The considered tools allow us to compute: 1. The minimum cost of an approximate decision tree for a given uncertainty value and a cost function. 2. The minimum number of nodes in an exact decision tree whose depth is at most a given value. For the first tool we considered various cost functions such as: depth and average depth of a decision tree and number of nodes (and number of terminal and nonterminal nodes) of a decision tree. The uncertainty of a decision table is equal to the number of unordered pairs of rows with different decisions. The uncertainty of an approximate decision tree is equal to the maximum uncertainty of a subtable corresponding to a terminal node of the tree. In addition to the algorithms for such tools we also present experimental results applied to various datasets acquired from the UCI ML Repository [4]. © Springer-Verlag Berlin Heidelberg 2013.
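
    The uncertainty measure defined above, the number of unordered pairs of rows with different decisions, can be computed directly; the decision column below is a made-up example.

        # Uncertainty of a decision table from its column of decisions.
        from itertools import combinations

        def table_uncertainty(decisions):
            """decisions: the decision attached to each row of the table."""
            return sum(1 for d1, d2 in combinations(decisions, 2) if d1 != d2)

        print(table_uncertainty([0, 0, 1, 1, 2]))   # 8 pairs carry different decisions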

  1. Meta-learning in decision tree induction

    CERN Document Server

    Grąbczewski, Krzysztof

    2014-01-01

    The book focuses on different variants of decision tree induction but also describes the meta-learning approach in general, which is applicable to other types of machine learning algorithms. The book discusses different variants of decision tree induction and represents a useful source of information for readers wishing to review some of the techniques used in decision tree learning, as well as the different ensemble methods that involve decision trees. It is shown that the knowledge of the different components used within decision tree learning needs to be systematized to enable the system to generate and evaluate different variants of machine learning algorithms, with the aim of identifying the top-most performers or potentially the best one. A unified view of decision tree learning makes it possible to emulate different decision tree algorithms simply by setting certain parameters. As meta-learning requires running many different processes with the aim of obtaining performance results, a detailed description of the experimen...

  2. Short-Time Fourier Transform and Decision Tree-Based Pattern Recognition for Gas Identification Using Temperature Modulated Microhotplate Gas Sensors

    Directory of Open Access Journals (Sweden)

    Aixiang He

    2016-01-01

    Full Text Available Because the sensor response is dependent on its operating temperature, modulated temperature operation is usually applied in gas sensors for the identification of different gases. In this paper, modulated operating temperature of microhotplate gas sensors combined with a feature extraction method based on the Short-Time Fourier Transform (STFT) is introduced. Because the gas concentration in ambient air usually fluctuates strongly, STFT is applied to extract transient features in the time-frequency domain, and the relationship between the STFT spectrum and the sensor response is further explored. Because of the low thermal time constant, sufficient discriminatory information about different gases is preserved in the envelope of the response curve; feature information tends to be contained at lower frequencies rather than at higher frequencies. Therefore, features are extracted from the STFT amplitude values at frequencies ranging from 0 Hz to the fundamental frequency to accomplish the identification task. These lower-frequency features are extracted and further processed by decision tree-based pattern recognition. The proposed method shows high classification capability in the analysis of different concentrations of carbon monoxide, methane, and ethanol.
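
    A hedged sketch of STFT-based feature extraction followed by a decision tree is given below, using SciPy and scikit-learn on synthetic "sensor responses"; the sampling rate, signal shapes and gas classes are assumptions made only for illustration.

        # Low-frequency STFT amplitudes as features for a decision tree classifier.
        import numpy as np
        from scipy.signal import stft
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        fs = 10.0                          # assumed sampling frequency, Hz

        def low_frequency_features(signal, n_bins=5):
            f, t, Z = stft(signal, fs=fs, nperseg=64)
            # mean STFT amplitude in the lowest frequency bins (from 0 Hz upward)
            return np.abs(Z[:n_bins, :]).mean(axis=1)

        X, y = [], []
        for label, scale in [(0, 1.0), (1, 1.5)]:    # two fake gas classes
            for _ in range(20):
                tt = np.arange(0, 60, 1.0 / fs)
                signal = (scale * np.sin(2 * np.pi * 0.05 * tt)
                          + 0.1 * rng.standard_normal(tt.size))
                X.append(low_frequency_features(signal))
                y.append(label)

        clf = DecisionTreeClassifier(random_state=0).fit(X, y)
        print("training accuracy:", clf.score(X, y))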

  3. The use of a decision tree based on the rabies diagnosis scenario, to assist the implementation of alternatives to laboratory animals.

    Science.gov (United States)

    Bones, Vanessa C; Molento, Carla Forte Maiolino

    2016-05-01

    Brazilian federal legislation makes the use of alternatives mandatory, when there are validated methods to replace the use of laboratory animals. The objective of this paper is to introduce a novel decision tree (DT)-based approach, which can be used to assist the replacement of laboratory animal procedures in Brazil. This project is based on a previous analysis of the rabies diagnosis scenario, in which we identified certain barriers that hinder replacement, such as: a) the perceived higher costs of alternative methods; b) the availability of staff qualified in these methods; c) resistance to change by laboratory staff; d) regulatory obstacles, including incompatibilities between the Federal Environmental Crimes Act and specific norms and working practices relating to the use of laboratory animals; and e) the lack of government incentives. The DT represents a highly promising means to overcome these reported barriers to the replacement of laboratory animal use in Brazil. It provides guidance to address the main obstacles, and, followed step-by-step, would lead to the implementation of validated alternative methods (VAMs), or their development when such alternatives do not exist. The DT appears suitable for application to laboratory animal use scenarios where alternative methods already exist, such as in the case of rabies diagnosis, and could contribute to increase compliance with the Three Rs principles in science and with the current legal requirements in Brazil. PMID:27256454

  4. Importance Sampling Based Decision Trees for Security Assessment and the Corresponding Preventive Control Schemes: the Danish Case Study

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    and adopts a methodology of importance sampling to maximize the information contained in the database so as to increase the accuracy of DT. Further, this paper also studies the effectiveness of DT by implementing its corresponding preventive control schemes. These approaches are tested on the detailed model...

  5. Influence diagrams and decision trees for severe accident management

    International Nuclear Information System (INIS)

    A review of relevant methodologies based on Influence Diagrams (IDs), Decision Trees (DTs), and Containment Event Trees (CETs) was conducted to assess the practicality of these methods for the selection of effective strategies for Severe Accident Management (SAM). The review included an evaluation of some software packages for these methods. The emphasis was on possible pitfalls of using IDs and on practical aspects, the latter by performance of a case study that was based on an existing Level 2 Probabilistic Safety Assessment (PSA). The study showed that the use of a combined ID/DT model has advantages over CET models, in particular when conservatisms in the Level 2 PSA have been identified and replaced by fair assessments of the uncertainties involved. It is recommended to use ID/DT models complementary to CET models. (orig.)

  6. Decision Tree Classifiers to determine the patient’s Post-operative Recovery Decision

    OpenAIRE

    D.Shanth; Dr.G.Sahoo; Dr.N.Saravanan

    2011-01-01

    Machine learning aims to generate classifying expressions simple enough to be understood easily by humans. There are many machine learning approaches available for classification; among these, decision tree learning is one of the most popular classification algorithms. In this paper we propose a systematic approach based on decision trees which is used to automatically determine the patient's post-operative recovery status. Decision tree structures are constructed, using data mining methods ...

  7. Representing Boolean Functions by Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    A Boolean or discrete function can be represented by a decision tree. A compact form of decision tree named binary decision diagram or branching program is widely known in logic design [2, 40]. This representation is equivalent to other forms, and in some cases it is more compact than values table or even the formula [44]. Representing a function in the form of decision tree allows applying graph algorithms for various transformations [10]. Decision trees and branching programs are used for effective hardware [15] and software [5] implementation of functions. For the implementation to be effective, the function representation should have minimal time and space complexity. The average depth of decision tree characterizes the expected computing time, and the number of nodes in branching program characterizes the number of functional elements required for implementation. Often these two criteria are incompatible, i.e. there is no solution that is optimal on both time and space complexity. © Springer-Verlag Berlin Heidelberg 2011.
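
    A small sketch of representing a Boolean function as a decision tree follows; the nested-tuple encoding and the XOR example are our own illustrative choices, not the representation used in the chapter.

        # A node is either a 0/1 leaf or a tuple (variable_name, low_child, high_child).
        xor_tree = ("x1",
                    ("x2", 0, 1),    # x1 = 0: value follows x2
                    ("x2", 1, 0))    # x1 = 1: value is the negation of x2

        def evaluate(node, assignment):
            while isinstance(node, tuple):
                var, low, high = node
                node = high if assignment[var] else low
            return node

        for x1 in (0, 1):
            for x2 in (0, 1):
                print(x1, x2, "->", evaluate(xor_tree, {"x1": x1, "x2": x2}))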

  8. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper is devoted to the consideration of the software system Dagger created at KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between the number of misclassifications and the depth of decision trees), and derivation of relationships between the cost and uncertainty of decision trees. We describe the features of Dagger and consider examples of this system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  9. Comparison of Greedy Algorithms for Decision Tree Optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.

  10. Redundant Data Mining Based on Residual Data Merging in Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王倩

    2014-01-01

    An improved redundant data mining algorithm based on residual data merging technology is proposed. A training set is used to build the decision tree model, and the C4.5 decision tree model is used for modelling the main features of redundant data. Under the principal-component feature decision tree, residual data merging is introduced and an accompanying tracking mode for residual data features is set, so that information that traditional methods would filter out is spliced, tracked and positioned, realizing optimized mining of redundant data features. The method is applied to network traffic time series data processing for network anomaly detection. Simulation results show that the improved data mining algorithm can effectively extract redundant data features as useful detection features and greatly improves data mining efficiency, promoting the mining and application of hidden features in massive data. The designed network traffic monitoring software can improve the effectiveness of network management and monitoring.

  11. The Information Extraction of Freshwater Marsh Wetland Based on the Decision Tree Method: Taking Zhalong Wetland as An Example

    Institute of Scientific and Technical Information of China (English)

    乔艳雯; 臧淑英; 那晓东

    2013-01-01

    In order to obtain basic wetland information timely and accurately for dynamic monitoring and protection of the wetland, the Zhalong wetland was chosen as the research area. In the process of extracting regional remote sensing information, TM image data, DEM data, the normalized difference vegetation index and texture information were used as compound identification indices, and a decision tree model was constructed to classify the wetland types of the Zhalong area. To check the feasibility of the classification method based on the decision tree model, a comparison was made with the traditional maximum likelihood supervised classification. The results show that classification with the index-based decision tree method increased classification accuracy by 14.6% and the overall Kappa coefficient by 0.1751 compared with supervised classification, a clear improvement in accuracy. Building a decision tree classification that adopts multi-source data is thus an effective approach for extracting information on inland freshwater marsh wetlands.

  12. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    Directory of Open Access Journals (Sweden)

    Bruno Carneiro da Rocha

    2010-10-01

    Full Text Available This article aims to evaluate the use of decision tree techniques, in conjunction with the management model CRISP-DM, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of data mining techniques and the CRISP-DM management model in large operational databases logged from internet bank transactions.

  13. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Energy Technology Data Exchange (ETDEWEB)

    Kupriyanov, M. S., E-mail: mikhail.kupriyanov@gmail.com; Shukeilo, E. Y., E-mail: eyshukeylo@gmail.com; Shichkina, J. A., E-mail: strange.y@mail.ru [Saint Petersburg Electrotechnical University “LETI” (Russian Federation)

    2015-11-17

    Nowadays, technologies used in traumatology are a combination of mechanical, electronic, computing and programming tools. The relevance of developing mobile applications for expeditious processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is increasing. The use of a mathematical method of building decision trees for assessing a patient's health condition using data from a wearable device is considered in this article.

  14. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    International Nuclear Information System (INIS)

    Nowadays, technologies used in traumatology are a combination of mechanical, electronic, computing and programming tools. The relevance of developing mobile applications for expeditious processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is increasing. The use of a mathematical method of building decision trees for assessing a patient's health condition using data from a wearable device is considered in this article.

  15. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    Science.gov (United States)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    The technologies used in traumatology today combine mechanical, electronic, computational and software tools. The development of mobile applications for rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is becoming increasingly relevant. This article considers the use of a mathematical method for building decision trees to assess a patient's health condition from wearable-device data.

  16. 基于决策树分类的云南省迪庆地区景观类型研究%Exploring Landscapes Based on Decision Tree Classification in the Diqin Region, Yunnan Province

    Institute of Scientific and Technical Information of China (English)

    李亚飞; 刘高焕; 黄翀

    2011-01-01

    Decision tree classification is a type of supervised classification method based on spatial data mining and knowledge discovery. In this paper, the authors examined the landscape pattern of the Diqing region of Yunnan province by building a classification decision tree from Landsat TM imagery and digital elevation models (DEMs), and subsequently produced a landscape distribution map. To assess the reliability and robustness of the decision tree classification method, a traditional supervised classification was also used to derive a landscape distribution map over the region. A large number of field sampling points covering the whole Diqing region, containing geographic coordinates, elevations, and descriptions of the major landscape types, were used to evaluate the accuracy of the two classification methods. Results indicate that the overall classification accuracies of the decision tree classification and the traditional supervised classification were 85.5% and 67.4%, respectively; the landscape distribution map derived by the decision tree method therefore appears reliable in terms of the achievable accuracy. Several conclusions could be drawn from the derived landscape distribution map. Landscape types in the Diqing region primarily included valley shrub, coniferous forest, subalpine shrub meadow, alpine snow and ice, bare land, and water body, accounting for 5.5%, 36.16%, 3.4%, 3.7%, 25.4%, and 4.4% of the region's area, respectively. Except for bare land and water body, the landscape types varied essentially with elevation and mountain aspect. The landscape covering the largest area was coniferous forest, consistent with the alpine and canyon landform; it was the major landscape in the region, distributed above 3000 m above sea level. In terms of different elevations, the coniferous forest could be conceptually divided into three

  17. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    CERN Document Server

    Farid, Dewan Md; Rahman, Mohammad Zahidur; 10.5121/ijnsa.2010.2202

    2010-01-01

    In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection, keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from the training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data-mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, various issues remain to be examined in current intrusion detection systems (IDS). We tested the performance of our proposed algorithm with existing learn...

  18. The potential impact of improving appropriate treatment for fever on malaria and non-malarial febrile illness management in under-5s: a decision-tree modelling approach.

    Directory of Open Access Journals (Sweden)

    V Bhargavi Rao

    Full Text Available BACKGROUND: As international funding for malaria programmes plateaus, limited resources must be rationally managed for malaria and non-malarial febrile illnesses (NMFI). Given widespread unnecessary treatment of NMFI with first-line antimalarial Artemisinin Combination Therapies (ACTs), our aim was to estimate the effect of health-systems factors on rates of appropriate treatment for fever and on use of ACTs. METHODS: A decision-tree tool was developed to investigate the impact of improving aspects of the fever care-pathway and also to evaluate the impact in Tanzania of the revised WHO malaria guidelines advocating diagnostic-led management. RESULTS: Model outputs using baseline parameters suggest 49% of malaria cases attending a clinic would receive ACTs (95% Uncertainty Interval: 40.6-59.2%) but that 44% (95% UI: 35-54.8%) of NMFI cases would also receive ACTs. Provision of 100% ACT stock predicted a 28.9% increase in malaria cases treated with ACT, but also an increase in overtreatment of NMFI, with 70% of NMFI cases (95% UI: 56.4-79.2%) projected to receive ACTs, and thus an overall 13% reduction (95% UI: 5-21.6%) in correct management of febrile cases. Modelling increased availability or use of diagnostics had little effect on malaria management outputs, but may significantly reduce NMFI overtreatment. The model predicts the early rollout of revised WHO guidelines in Tanzania may have led to a 35% decrease (95% UI: 31.2-39.8%) in NMFI overtreatment, but also a 19.5% reduction (95% UI: 11-27.2%) in malaria cases receiving ACTs, due to a potential fourfold decrease in cases that were untested or tested false-negative (42.5% vs. 8.9%) and so went untreated. DISCUSSION: Modelling multi-pronged intervention strategies proved most effective to improve malaria treatment without increasing NMFI overtreatment. As malaria transmission declines, health system interventions must be guided by whether the management priority is an increase in malaria cases receiving ACTs (reducing the
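
    To make the decision-tree modelling idea concrete, the toy calculation below propagates hypothetical branch probabilities (test availability, test sensitivity, ACT stock) through a simple fever care-pathway to estimate the share of true malaria cases that end up receiving an ACT. The structure and all numbers are illustrative assumptions, not the parameters of the published model.

```python
# Toy fever care-pathway tree: what fraction of true malaria cases receive an ACT?
# All probabilities below are made-up illustrations, not the study's parameters.
p_tested = 0.6        # patient receives a diagnostic test
sensitivity = 0.95    # test detects malaria when present
p_presumptive = 0.8   # untested patients treated presumptively as malaria
p_act_in_stock = 0.7  # ACT available when prescribed

# Branch 1: tested and true-positive; Branch 2: untested but treated presumptively
p_prescribed = p_tested * sensitivity + (1 - p_tested) * p_presumptive
p_receives_act = p_prescribed * p_act_in_stock

print(f"Malaria cases prescribed an ACT: {p_prescribed:.1%}")
print(f"Malaria cases actually receiving an ACT: {p_receives_act:.1%}")
```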

  19. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    OpenAIRE

    Tran Hoai Linh; Pham Van Nam; Vuong Hoang Nam

    2014-01-01

    The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, the three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to...

  20. Comparative Analysis of Serial Decision Tree Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Matthew Nwokejizie Anyanwu

    2009-09-01

    Full Text Available Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used to group similar data objects together. It can be defined as a supervised learning approach, as it assigns class labels to data objects based on the relationship between the data items and a pre-defined class label. Classification algorithms have a wide range of applications such as churn prediction, fraud detection, artificial intelligence, and credit card rating. Although many classification algorithms are available in the literature, decision trees are the most commonly used because they are easier to implement and to understand than other classification algorithms. A decision tree classification algorithm can be implemented in a serial or parallel fashion depending on the volume of data, the memory space available on the computing resource, and the scalability of the algorithm. In this paper we review serial implementations of decision tree algorithms and identify those that are commonly used. We also use experimental analysis based on sample data records (Statlog data sets) to evaluate the performance of the commonly used serial decision tree algorithms.

  1. Application of vector projection method based on decision-tree-based support vector machines in fault diagnosis for transformer%DTBSVM的向量投影法在变压器故障诊断中的应用

    Institute of Scientific and Technical Information of China (English)

    张翠玲; 王大志; 江雪晨; 宁一

    2013-01-01

    By applying the vector projection method to transformer fault diagnosis, the problem of how to structure an effective SVM hierarchy for decision-tree-based support vector machines (DTBSVM) is solved. According to the overlap between the sample sets of different classes, the Euclidean distance and a radial basis function are used to calculate the spatial distance and separability measure between classes, and the classes are ordered by separability to design a more reasonable hierarchy for classification. The fault diagnosis model established in this way combines one-against-rest and rest-against-rest classification and handles the multi-class problem well. For an N-class problem the vector projection method constructs only (N-1) SVM classifiers and leaves no unrecognizable regions, so classification is faster and generalization ability is better. Test results show that the correct-diagnosis rate is higher than with the traditional three-ratio method and neural network methods, so the method has good practical value.

  2. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score

    Directory of Open Access Journals (Sweden)

    Andrej Kastrin

    2008-12-01

    Full Text Available Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from the domains of statistics, databases, machine learning and artificial intelligence. Data mining is the most important part of the knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of decision trees on a prediction model of sensation seeking. Prediction of Zuckerman's Sensation Seeking Scale (SSS-V) score was based on the bundle of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of the Eysenck Personality Questionnaire (EPQ) and the Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision tree methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization and knowledge discovery.
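
    A comparison of this kind can be sketched with a decision tree regressor against multiple linear regression on synthetic predictors; the data below are generated at random and stand in for the EPQ/SVTP scores only as an illustration, not as the study's dataset.

```python
# Sketch: decision tree regressor vs. multiple linear regression on synthetic
# personality-style predictors (not the EPQ/SVTP/SSS-V data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # five hypothetical trait scores
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=1.0, size=300)

for name, model in [("multiple regression", LinearRegression()),
                    ("decision tree", DecisionTreeRegressor(max_depth=3, random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```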

  3. Efficent-cutting packet classification algorithm based on the statistical decision tree%基于统计的高效决策树分组分类算法

    Institute of Scientific and Technical Information of China (English)

    陈立南; 刘阳; 马严; 黄小红; 赵庆聪; 魏伟

    2014-01-01

    Packet classification algorithms based on decision trees are easy to implement and are widely employed in high-speed packet classification. The primary objective in constructing a decision tree is to minimize storage and search time complexity. An improved decision-tree algorithm, HyperEC, is proposed based on statistics and evaluation of filter sets. HyperEC is a multi-dimensional packet classification algorithm that allows a tradeoff between storage and throughput while the decision tree is constructed. Because it is not sensitive to IP address length, it is suitable for IPv6 packet classification as well as IPv4. The algorithm applies a natural, performance-guided decision-making process: the storage budget is preset and the best achievable throughput is then sought. The results show that the HyperEC algorithm outperforms the HiCuts and HyperCuts algorithms, improving storage and throughput performance and scaling to large filter sets.

  4. Classification of Parkinsonian Syndromes from FDG-PET Brain Data Using Decision Trees with SSM/PCA Features

    Directory of Open Access Journals (Sweden)

    D. Mudali

    2015-01-01

    Full Text Available Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson’s disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.

  5. Using Decision Trees for Coreference Resolution

    CERN Document Server

    McCarthy, J F; Carthy, Joseph F. Mc; Lehnert, Wendy G.

    1995-01-01

    This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.

  6. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    OpenAIRE

    Ali Idri; Sanaa Elyassami

    2011-01-01

    Web effort estimation is the process of predicting the effort and cost, in terms of money, schedule and staff, for any software project. Many estimation models have been proposed over the last three decades, and estimation is considered essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the...

  7. DECISION TREE ANALYSIS OF THE PREDICTORS OF INTERNET AFFINITY

    OpenAIRE

    BUBAŠ, Goran; Kliček, Božidar; Hutinski, Željko

    2001-01-01

    A recently developed model of Internet affinity was used for survey design and data collection on variables that have potential influence on affinity for Internet use. A total of 600 Croatian students with access to the Internet at their college participated in this survey. The collected data were used for investigation of the relation between decision tree analysis and regression analysis of predictor variables of Internet affinity. Different predictors were found to influence two distinct c...

  8. Rule Extraction in Transient Stability Study Using Linear Decision Trees

    Institute of Scientific and Technical Information of China (English)

    SUN Hongbin; WANG Kang; ZHANG Boming; ZHAO Feng

    2011-01-01

    Traditional operation rules depend on human experience; they are relatively fixed and have difficulty meeting the new demands of the modern power grid. In order to formulate suitable and quickly refreshed operation rules, a linear decision tree method based on support samples is proposed in this paper for rule extraction. The operation rules extracted by this method are refined and intelligent, which helps the dispatching center meet the requirements of smart grid construction.

  9. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    Directory of Open Access Journals (Sweden)

    Kai-Wei Chiang

    2015-12-01

    Full Text Available Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has led to increased phone power and because most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, the imprecise predictability of pedestrian motion, and the inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable initialization of any position. Finally, the map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The PDR system used for comparison demonstrates low stability in each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  10. Decision Tree Classifiers to determine the patient’s Post-operative Recovery Decision

    Directory of Open Access Journals (Sweden)

    D.Shanth

    2010-12-01

    Full Text Available Machine Learning aims to generate classifying expressions simple enough to be easily understood by humans. Many machine learning approaches are available for classification, among which decision tree learning is one of the most popular. In this paper we propose a systematic decision-tree-based approach to automatically determine a patient's post-operative recovery status. Decision tree structures are constructed using data mining methods and are then used to classify discharge decisions.

  11. Decision Tree Classifiers to determine the patient’s Post-operative Recovery Decision

    Directory of Open Access Journals (Sweden)

    D.Shanthi

    2011-02-01

    Full Text Available Machine Learning aims to generate classifying expressions simple enough to be easily understood by humans. Many machine learning approaches are available for classification, among which decision tree learning is one of the most popular. In this paper we propose a systematic decision-tree-based approach to automatically determine a patient's post-operative recovery status. Decision tree structures are constructed using data mining methods and are then used to classify discharge decisions.

  12. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    Directory of Open Access Journals (Sweden)

    Ali Idri

    2011-07-01

    Full Text Available Web effort estimation is the process of predicting the effort and cost, in terms of money, schedule and staff, for any software project. Many estimation models have been proposed over the last three decades, and estimation is considered essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the principles of the ID3 decision tree with fuzzy set-theoretic concepts, enabling the model to handle the uncertain and imprecise data that describe software projects, which can greatly improve the accuracy of the obtained estimates. MMRE and Pred are used as measures of prediction accuracy in this study. A series of experiments is reported using two different software project datasets, namely the Tukutuku and COCOMO'81 datasets. The results are compared with those produced by the crisp version of the ID3 decision tree.
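
    Since MMRE and Pred are the accuracy measures used here, the snippet below shows one common way to compute them from actual and predicted effort values. The numbers are invented, and Pred is evaluated at the conventional 25% level, which the abstract itself does not specify.

```python
import numpy as np

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error."""
    mre = np.abs(actual - predicted) / actual
    return mre.mean()

def pred(actual, predicted, level=0.25):
    """Pred(l): fraction of estimates whose relative error is within `level`."""
    mre = np.abs(actual - predicted) / actual
    return (mre <= level).mean()

# Hypothetical effort values (e.g., person-hours); not from Tukutuku or COCOMO'81
actual = np.array([120.0, 340.0, 95.0, 410.0, 60.0])
predicted = np.array([100.0, 360.0, 120.0, 380.0, 75.0])

print(f"MMRE     = {mmre(actual, predicted):.3f}")
print(f"Pred(25) = {pred(actual, predicted):.2f}")
```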

  13. Applying Fuzzy ID3 Decision Tree for Software Effort Estimation

    CERN Document Server

    Elyassami, Sanaa

    2011-01-01

    Web effort estimation is the process of predicting the effort and cost, in terms of money, schedule and staff, for any software project. Many estimation models have been proposed over the last three decades, and estimation is considered essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the principles of the ID3 decision tree with fuzzy set-theoretic concepts, enabling the model to handle the uncertain and imprecise data that describe software projects, which can greatly improve the accuracy of the obtained estimates. MMRE and Pred are used as measures of prediction accuracy in this study. A series of experiments is reported using two different software project datasets, namely the Tukutuku and COCOMO'81 datasets. The results are compared with those produced by the crisp version of the ID3 decision tree.

  14. ASSESSING GAMEPLAY EMOTIONS FROM PHYSIOLOGICAL SIGNALS: A FUZZY DECISION TREES BASED MODEL

    OpenAIRE

    Orero, Joseph Onderi; Levillain, Florent; Damez-Fontaine, Marc; Rifqi, Maria; Bouchon-Meunier, Bernadette

    2010-01-01

    As video games become a widespread form of entertainment, there is a need to develop new evaluative methodologies for acknowledging the various aspects of the player's subjective experience, especially the emotional aspect. Video game developers could benefit from being aware of how the player reacts emotionally to specific game parameters. In this study, we addressed the possibility of recording physiological measures on players involved in an action game, with the m...

  15. An automated approach to the design of decision tree classifiers

    Science.gov (United States)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two-dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76, compared to the theoretically optimum .79 probability of correct classification associated with a full-dimensional Bayes classifier. Recommendations for future research are included.

  16. Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection

    Directory of Open Access Journals (Sweden)

    Dewan Md. Farid

    2010-04-01

    Full Text Available In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection, keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from the training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data-mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in the last decades. However, various issues remain to be examined in current intrusion detection systems (IDS). We tested the performance of our proposed algorithm against existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources.
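
    The abstract combines a naive Bayesian classifier with a decision tree. One simple way to realize such a hybrid, sketched below on synthetic data, is to use a decision tree's feature importances to discard redundant attributes before training the naive Bayes detector. This is only an illustration of the general idea, not the algorithm proposed in the paper, and the dataset and importance threshold are invented.

```python
# Illustrative hybrid: a decision tree prunes weak attributes, then naive Bayes
# classifies on the reduced feature set (synthetic data, not KDD99).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           n_redundant=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_tr, y_tr)
keep = tree.feature_importances_ > 0.01          # drop near-useless attributes

nb = GaussianNB().fit(X_tr[:, keep], y_tr)
print("features kept:", int(keep.sum()), "of", X.shape[1])
print("hybrid accuracy:", round(accuracy_score(y_te, nb.predict(X_te[:, keep])), 3))
```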

  17. 基于决策树和链接相似的DeepWeb查询接口判定%Deep Web query interface identification based on decision tree and link-similar

    Institute of Scientific and Technical Information of China (English)

    李雪玲; 施化吉; 兰均; 李星毅

    2011-01-01

    To address the problems of existing Deep Web query interface identification methods, which produce many false positives and cannot effectively distinguish search-engine-style interfaces, this paper proposes a Deep Web query interface identification method based on a decision tree and link similarity. The method uses the information gain ratio to select important attributes and builds a decision tree to pre-classify interface forms, identifying the interfaces with clearly distinguishable features; the interfaces left unidentified are then judged a second time with a link-similarity-based method, which accurately recognizes genuine query interfaces and excludes search engine interfaces. Experimental results show that the method can effectively distinguish search engine interfaces and improves classification precision and recall over traditional methods.

  18. STUDY ON DECISION TREE COMPETENT DATA CLASSIFICATION

    OpenAIRE

    Vanitha, A.; S.Niraimathi

    2013-01-01

    Data mining is a process in which intelligent methods are applied in order to extract data patterns, and it is used to discover patterns and trends in large datasets. Data classification involves the categorization of data into different categories according to protocols. Many classification algorithms exist, among which the decision tree is the most commonly used method. Classification of data objects based on predefined knowledge of the objects is a data mining technique. This paper discussed...

  19. COMPARING THE PERFORMANCE OF SEMANTIC IMAGE RETRIEVAL USING SPARQL QUERY, DECISION TREE ALGORITHM AND LIRE

    Directory of Open Access Journals (Sweden)

    Magesh

    2013-01-01

    Full Text Available An ontology-based framework is developed for representing the image domain. The textual features of images are extracted and annotated as part of the ontology. The ontology is represented in Web Ontology Language (OWL) format, which is based on the Resource Description Framework (RDF) and Resource Description Framework Schema (RDFS). Internally, the RDF statements form an RDF graph, which provides a way to represent the image data in a semantic manner. Various tools and languages are used to retrieve semantically relevant textual data from the ontology model; the SPARQL query language is one of the most popular methods for retrieving textual data stored in an ontology. Text- or keyword-based search alone is not adequate for retrieving images, because end users are not able to convey the visual features of an image in a SPARQL query, even though SPARQL queries provide accurate results by traversing the RDF graph. Relevant images cannot be retrieved by one-to-one mapping alone, so relevancy is provided by a form of ontology mapping achieved by applying a decision tree algorithm. This study proposes methods to retrieve images from the ontology and compares image retrieval performance using the SPARQL query language, a decision tree algorithm, and LIRE, an open source image search engine. The SPARQL query language is used to retrieve semantically relevant images using keyword-based annotation, and the decision tree algorithms are used to retrieve relevant images using the visual features of an image. Lastly, image retrieval efficiency is compared and a graph is plotted to indicate the efficiency of the system.
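
    The snippet below sketches how such keyword-based SPARQL retrieval over an image ontology might look using the rdflib library; the ontology file name, namespace, and property names are hypothetical placeholders, not the ones used in the study.

```python
# Sketch of keyword-based image retrieval from an RDF/OWL ontology via SPARQL.
# The file, namespace and property names below are illustrative assumptions.
from rdflib import Graph

g = Graph()
g.parse("image_ontology.owl", format="xml")   # hypothetical ontology file

query = """
PREFIX img: <http://example.org/image-ontology#>
SELECT ?image ?file
WHERE {
    ?image a img:Image ;
           img:hasKeyword ?kw ;
           img:fileName ?file .
    FILTER (LCASE(STR(?kw)) = "sunset")
}
"""

for row in g.query(query):
    print(row.image, row.file)
```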

  20. Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

    Directory of Open Access Journals (Sweden)

    O.BENCHAREF

    2011-09-01

    Full Text Available The recognition of Tifinagh characters cannot be carried out perfectly using conventional invariance-based methods, because of the similarity between some characters that differ from each other only in size or rotation; hence the need for new methods to remedy this shortcoming. In this paper we propose a direct method based on the calculation of so-called geodesic descriptors, which have shown significant reliability with respect to changes of scale, the presence of noise, and geometric distortions. For classification, we have opted for a method based on the hybridization of decision trees and neural networks.

  1. FINANCIAL PERFORMANCE INDICATORS OF TUNISIAN COMPANIES: DECISION TREE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ferdaws Ezzi

    2016-01-01

    Full Text Available This article attempts to identify the indicators that best explain the financial performance of Tunisian companies. In this respect, the emphasis is placed on diversification, innovation, and intrapersonal and interpersonal skills; these, together with emotional intelligence, the level of indebtedness, and firm age and size, are taken as the variables that support the target variable. The decision tree, as a relatively new data analysis method, is used for the analysis. The results involve the construction of a model that can be used to achieve sound financial performance.

  2. Algorithms for optimal dyadic decision trees

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  3. Constructing an optimal decision tree for FAST corner point detection

    KAUST Repository

    Alkhalid, Abdulaziz

    2011-01-01

    In this paper, we consider a problem that originates in computer vision: determining an optimal testing strategy for the corner point detection problem that is part of the FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare the performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.

  4. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction

    Directory of Open Access Journals (Sweden)

    Jörg Huwyler

    2012-08-01

    Full Text Available Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo surface permeability product (logPS) values in rats as a quantitative parameter for BBB permeability. The open source Chemistry Development Kit (CDK) was used to calculate physico-chemical properties and descriptors. Predictive computational models were implemented by machine learning paradigms (decision tree induction) on both descriptor sets. Models with a corrected classification rate (CCR) of 90% were established. Mechanistic insight into BBB transport was provided by an Ant Colony Optimization (ACO)-based binary classifier analysis to identify the most predictive chemical substructures. Decision trees revealed descriptors of lipophilicity (aLogP) and charge (polar surface area), which were also previously described in models of passive diffusion. However, measures of molecular geometry and connectivity were found to be related to an active drug transport component.
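
    The corrected classification rate is commonly computed as the mean of per-class recall (i.e., balanced accuracy); assuming that definition, which the abstract does not spell out, the small sketch below computes it for hypothetical permeable/non-permeable labels.

```python
import numpy as np

def corrected_classification_rate(y_true, y_pred):
    """Mean per-class recall (balanced accuracy), one common definition of CCR."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Hypothetical labels: 1 = BBB-permeable, 0 = non-permeable (not the study's data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(f"CCR = {corrected_classification_rate(y_true, y_pred):.2f}")
```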

  5. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing entities in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but they are particularly vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used to create two models for fraud detection in leasing. The decision tree method with the CHAID algorithm was deployed to create a classification model. Results: The decision tree model indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the applicability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules, and displaying the results in multiple categories.

  6. 基于改进决策树算法的Web数据库查询结果自动分类方法%A Categorization Approach Based on Adapted Decision Tree Algorithm for Web Databases Query Results

    Institute of Scientific and Technical Information of China (English)

    孟祥福; 马宗民; 张霄雁; 王星

    2012-01-01

    To deal with the problem that too many results are returned from a Web database in response to a user query, this paper proposes a novel approach based on an adapted decision tree algorithm for automatically categorizing Web database query results. The query history of all users in the system is analyzed offline, and semantically similar queries are merged into the same cluster. Next, a set of tuple clusters over the original data is generated according to the query clusters, each tuple cluster corresponding to one type of user preference. When a query arrives, a labeled, multi-level categorization tree is constructed with the adapted decision tree algorithm on the basis of the tuple clusters generated offline, enabling the user to easily select and locate the information he or she needs. Experimental results demonstrate that the categorization approach has lower navigational cost and better categorization effectiveness, and can effectively meet the personalized query needs of different types of users.

  7. Fuzzy Decision Tree Model for Driver Behavior Confronting Yellow Signal at Signalized Intersection%交叉口黄灯期间驾驶员行为的模糊决策树模型

    Institute of Scientific and Technical Information of China (English)

    龙科军; 赵文秀; 肖向良

    2011-01-01

    A driver's decision to go or stop during the yellow interval is a form of decision making under uncertainty. This paper collects driver behavior data at four similar intersections and applies a fuzzy decision tree (FDT) to model driver behavior at signalized intersections. Taking vehicle location, velocity and the countdown timer as influencing factors, the FDT model is constructed using the FID3 algorithm, with fuzzy information entropy as the heuristic, and decision rules are generated. A test sample is used to validate the FDT model, and the results indicate that the FDT model can predict drivers' decisions with an overall accuracy of 84.8%.

  8. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze the expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.
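
    The sequential forward selection idea described here can be sketched as a greedy loop that adds, one at a time, the feature giving the largest gain in cross-validated decision-tree accuracy. The synthetic data and stopping rule below are illustrative assumptions, not the study's gene-expression data or exact procedure.

```python
# Greedy sequential forward feature selection with a decision tree scorer
# (synthetic data; not the ESCC microarray datasets).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    scores = {f: cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:          # stop when no feature improves the score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected features:", selected, "cv accuracy:", round(best_score, 3))
```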

  9. Modeling of stage-discharge relationship for Gharraf River, southern Iraq using backpropagation artificial neural networks, M5 decision trees, and Takagi-Sugeno inference system technique: a comparative study

    Science.gov (United States)

    Al-Abadi, Alaa M.

    2014-12-01

    The potential of three different data-driven techniques, namely a multilayer perceptron backpropagation artificial neural network (MLP), the M5 decision tree model, and the Takagi-Sugeno (TS) inference system, for mimicking the stage-discharge relationship of the Gharraf River system, southern Iraq, is investigated and discussed in this study. The study used the available stage and discharge data to predict discharge from different combinations of stage, antecedent stages, and antecedent discharge values. The models' results were compared using root mean squared error (RMSE) and coefficient of determination (R2) error statistics. The results of the comparison in the testing stage reveal that the M5 and Takagi-Sugeno techniques have certain advantages over the multilayer perceptron artificial neural network for setting up the stage-discharge relationship. Although the performance of the TS inference system was very close to that of the M5 model in terms of R2, the M5 method has the lowest RMSE (8.10 m3/s). The study implies that both M5 and TS inference systems are promising tools for identifying the stage-discharge relationship in the study area.
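
    RMSE and R2 are the comparison statistics used here; the short sketch below computes both for a hypothetical set of observed and predicted discharges (the numbers are invented, not Gharraf River data).

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r2(obs, pred):
    """Coefficient of determination."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical discharges in m3/s (not the Gharraf River observations)
obs = np.array([55.0, 62.0, 70.0, 81.0, 90.0, 104.0])
pred = np.array([58.0, 60.0, 73.0, 78.0, 95.0, 100.0])

print(f"RMSE = {rmse(obs, pred):.2f} m3/s")
print(f"R2   = {r2(obs, pred):.3f}")
```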

  10. On algorithm for building of optimal α-decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relatively to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in [4] to constructing approximate decision trees. Adjustable approximation rate allows controlling algorithm complexity. The algorithm is applied to build optimal α-decision trees for two data sets from UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.

  11. Automatic design of decision-tree induction algorithms

    CERN Document Server

    Barros, Rodrigo C; Freitas, Alex A

    2015-01-01

    Presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning, and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue on the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain o

  12. Using Decision Trees to Characterize Verbal Communication During Change and Stuck Episodes in the Therapeutic Process

    Directory of Open Access Journals (Sweden)

    Víctor Hugo eMasías

    2015-04-01

    Full Text Available Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBtree, and REPtree are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1,760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  13. A Semi-Random Multiple Decision-Tree Algorithm for Mining Data Streams

    Institute of Scientific and Technical Information of China (English)

    Xue-Gang Hu; Pei-Pei Li; Xin-Dong Wu; Gong-Qing Wu

    2007-01-01

    Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have a relatively poor efficiency in both time and space due to the characteristics of streaming data. There are some advantages in time and space when using random decision trees. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has an improved performance in time, space, accuracy and the anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
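
    The Hoeffding bound mentioned here states that, after n independent observations of a statistic with range R, the true mean differs from the observed mean by more than eps = sqrt(R^2 ln(1/delta) / (2n)) with probability at most delta. A minimal sketch of using it to decide when enough split examples have been seen follows; the range, delta, and observed gains are illustrative, not values from the paper.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean is within epsilon of the observed mean
    with probability at least 1 - delta, after n observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# Illustrative numbers: information gain ranges over [0, 1] for binary class labels
best_gain, second_gain = 0.21, 0.15     # observed gains of the two best attributes
delta = 1e-6                            # allowed probability of a wrong split choice

n = 1
while best_gain - second_gain <= hoeffding_bound(1.0, delta, n):
    n += 1
print(f"Split can be made confidently after roughly {n} examples.")
```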

  14. Antibiogram-Derived Radial Decision Trees: An Innovative Approach to Susceptibility Data Display

    Directory of Open Access Journals (Sweden)

    Rocco J. Perla

    2005-01-01

    Full Text Available Hospital antibiograms (ABGMs) are often presented in the form of large 2-factor (single organism vs. single antimicrobial) tables. Presenting susceptibility data in this fashion, although of value, does have limitations relative to drug-resistant subpopulations. As the crisis of antimicrobial drug resistance continues to escalate globally, clinicians need (1) to have access to susceptibility data that, for isolates resistant to first-line drugs, indicate susceptibility to second-line drugs and (2) to understand the probabilities of encountering such organisms in a particular institution. This article describes a strategy used to transform data in a hospital ABGM into a probability-based radial decision tree (RDT) that can be used as a guide to empiric antimicrobial therapy. Presenting ABGM data in the form of a radial decision tree versus a table makes it easier to visually organize complex data and to demonstrate different levels of therapeutic decision-making. The RDT model discussed here may also serve as a more effective tool to understand the prevalence of different resistant subpopulations in a given institution compared to the traditional ABGM.

  15. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Science.gov (United States)

    2010-01-01

    15 Commerce and Foreign Trade 2 (2010-01-01): Supplement 1 to Part 732, Decision Tree. Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued), BUREAU... THE EAR, Pt. 732, Supp. 1. The decision tree itself is provided as a graphic (ER06FE04.000...).

  16. A tool for study of optimal decision trees

    KAUST Repository

    Alkhalid, Abdulaziz

    2010-01-01

    The paper describes a tool which allows us for relatively small decision tables to make consecutive optimization of decision trees relative to various complexity measures such as number of nodes, average depth, and depth, and to find parameters and the number of optimal decision trees. © 2010 Springer-Verlag Berlin Heidelberg.

  17. Greedy algorithm with weights for decision tree construction

    KAUST Repository

    Moshkov, Mikhail

    2010-12-01

    An approximate algorithm for minimization of the weighted depth of decision trees is considered. A bound on the accuracy of this algorithm is obtained which is unimprovable in the general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to the best polynomial approximate algorithms for minimization of the weighted depth of decision trees.

  18. Minimizing size of decision trees for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-09-29

    We used decision trees as a model to discover knowledge from multi-label decision tables, where each row has a set of decisions attached to it and the goal is to find one arbitrary decision from that set. The size of the decision tree can be small as well as very large. We study different greedy as well as dynamic programming algorithms to minimize the size of the decision trees. When comparing against the optimal result from the dynamic programming algorithm, we found that some greedy algorithms produce results close to optimal for the minimization of the number of nodes (at most 18.92% difference), the number of nonterminal nodes (at most 20.76% difference), and the number of terminal nodes (at most 18.71% difference).

  19. Computational study of developing high-quality decision trees

    Science.gov (United States)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in data mining problems to find valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in practice. In this paper, we propose an innovative machine learning approach (which we call GAIT), combining a genetic algorithm, statistical sampling, and decision trees, to develop intelligent decision trees that can alleviate some of these problems. We designed computational experiments and ran GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach profoundly outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.

  20. Empirically Derived Dehydration Scoring and Decision Tree Models for Children With Diarrhea: Assessment and Internal Validation in a Prospective Cohort Study in Dhaka, Bangladesh

    OpenAIRE

    Levine, Adam C.; Glavis-Bloom, Justin; Modi, Payal; Nasrin, Sabiha; Rege, Soham; Chu, Chieh; Schmid, Christopher H.; Alam, Nur H

    2015-01-01

    The DHAKA Dehydration Score and the DHAKA Dehydration Tree are the first empirically derived and internally validated diagnostic models for assessing dehydration in children with acute diarrhea for use by general practice nurses in a resource-limited setting. Frontline providers can use these new tools to better classify and manage dehydration in children.

  1. MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees

    Directory of Open Access Journals (Sweden)

    Vasile PURDILĂ

    2014-03-01

    Full Text Available Learning decision trees from very large amounts of data is not practical on single-node computers due to the huge amount of computation required. Apache Hadoop is a large-scale distributed computing platform that runs on commodity hardware clusters and can be used successfully for data mining tasks on very large datasets. This work presents a parallel decision tree learning algorithm expressed in the MapReduce programming model that runs on the Apache Hadoop platform and scales very well with dataset size.

  2. 基于邻域粗糙集和决策树算法的核电厂故障诊断方法%Fault Diagnosis Method for Nuclear Power Plant Based on Decision Tree and Neighborhood Rough Sets

    Institute of Scientific and Technical Information of China (English)

    慕昱; 夏虹; 刘永阔

    2011-01-01

    Nuclear power plants (NPPs) are very complex systems in which a large number of parameters must be collected and monitored, which makes fault diagnosis difficult. To address this problem, a parameter reduction method based on neighborhood rough sets is proposed; granular computing is realized in a real-valued space, so numerical parameters can be processed directly without discretization. On this basis, a decision tree is trained on samples of four typical nuclear power plant faults, i.e., loss of coolant accident, feed water pipe rupture, steam generator tube rupture, and main steam pipe rupture, and the acquired knowledge is used for diagnosis. The diagnostic results are then compared with those of a support vector machine. The simulation results show that this method can rapidly and accurately diagnose the above-mentioned faults of the NPP.

  3. 基于 C4.5决策树的股票数据挖掘%Stock Data Mining Based on C4.5 Decision Tree

    Institute of Scientific and Technical Information of China (English)

    王领; 胡扬

    2015-01-01

    Using data mining algorithms to analyze and forecast stocks still faces problems with technical indicators and the quantity of data. Based on an analysis of stock market data, this paper selects certain indicators as decision attributes and uses a C4.5 decision tree to classify and forecast stocks. The article mainly introduces and optimizes the technical indicators of stocks and improves the efficiency of the C4.5 algorithm. The optimized algorithm, combined with the improved indicators, not only enhances the efficiency of data mining but also yields better returns in stock forecasting.

  4. 基于决策树数据挖掘算法的大学生消费数据分析%Analysis of College Students Consumption Data Based on Decision Tree Data Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄剑

    2015-01-01

    This paper uses a decision tree data mining algorithm as its basic tool. Based on campus-card consumption data of college students in recent years, data mining is used to analyze and study the relationships among students' consumption behavior, consumption characteristics, and consumption prices. Through data mining of the consumption data, information on students' consumption behavior, habits, and consumption volume is analyzed, and the inherent relationships and changing trends are identified. The results can better and more effectively guide the adjustment of campus catering prices and the introduction of new dishes, so that catering services are provided within a price range students can afford.

  5. Multiple neural network integration using a binary decision tree to improve the ECG signal recognition accuracy

    Directory of Open Access Journals (Sweden)

    Tran Hoai Linh

    2014-09-01

    Full Text Available The paper presents a new system for ECG (ElectroCardioGraphy) signal recognition using different neural classifiers and a binary decision tree to provide one more processing stage to give the final recognition result. As the base classifiers, the three classical neural models, i.e., the MLP (Multi Layer Perceptron), modified TSK (Takagi-Sugeno-Kang) and the SVM (Support Vector Machine), will be applied. The coefficients in ECG signal decomposition using Hermite basis functions and the peak-to-peak periods of the ECG signals will be used as features for the classifiers. Numerical experiments will be performed for the recognition of different types of arrhythmia in the ECG signals taken from the MIT-BIH (Massachusetts Institute of Technology and Boston’s Beth Israel Hospital) Arrhythmia Database. The results will be compared with individual base classifiers’ performances and with other integration methods to show the high quality of the proposed solution.

  6. Relationships among various parameters for decision tree optimization

    KAUST Repository

    Hussain, Shahid

    2014-01-14

    In this chapter, we study in detail the relationships between various pairs of cost functions, and between uncertainty measures and cost functions, for decision tree optimization. We provide new tools (algorithms) to compute relationship functions, as well as experimental results on decision tables acquired from the UCI ML Repository. The algorithms presented in this chapter have already been implemented and are now a part of Dagger, a software system for the construction and optimization of decision trees and decision rules. The main results deal with two types of algorithms for computing relationships: first, we discuss the case where we construct approximate decision trees and are interested in the relationship between a certain cost function, such as the depth or number of nodes of a decision tree, and an uncertainty measure, such as the misclassification error (accuracy) of the decision tree; secondly, relationships between two different cost functions are discussed, for example the number of misclassifications of a decision tree versus the number of nodes in the tree. The results of experiments presented in the chapter provide further insight. © 2014 Springer International Publishing Switzerland.

  7. Classification of Liss IV Imagery Using Decision Tree Methods

    Science.gov (United States)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; they describe the greenness, density and health of vegetation. Texture is also an important characteristic, used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties combined with different vegetation indices for single-date LISS IV sensor data at 5.8 m high spatial resolution. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) have been generated using the green, red and NIR bands, and the image is then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison has been made between these two methods. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6 LISS IV sensor images.
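
    The index-plus-texture idea can be illustrated with a small sketch: compute NDVI and GNDVI from band arrays, derive simple local texture statistics, and feed everything to a decision tree. The bands, labels and window size below are synthetic assumptions, not the LISS IV workflow.

```python
# Minimal sketch of the idea: per-pixel vegetation indices plus local texture
# statistics feeding a decision tree (synthetic reflectance bands and labels).
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
green, red, nir = (rng.random((64, 64)) for _ in range(3))   # hypothetical reflectance bands
labels = (nir > red).astype(int)                              # hypothetical crop / non-crop mask

eps = 1e-6
ndvi = (nir - red) / (nir + red + eps)
gndvi = (nir - green) / (nir + green + eps)

# Local texture: mean and variance of NDVI in a 5x5 window.
local_mean = uniform_filter(ndvi, size=5)
local_var = uniform_filter(ndvi**2, size=5) - local_mean**2

X = np.column_stack([b.ravel() for b in (ndvi, gndvi, local_mean, local_var)])
y = labels.ravel()

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```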

  8. Data mining with decision trees theory and applications

    CERN Document Server

    Rokach, Lior

    2014-01-01

    Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining; it is the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time. Existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining; to cover all aspects of this important technique, as well as improved or new methods and techniques developed after the publication of our first edition. In this new

  9. Intrusion Detection System using Memtic Algorithm Supporting with Genetic and Decision Tree Algorithms

    OpenAIRE

    K. P. Kaliyamurthie; D. Parameswari; R.M.Suresh

    2012-01-01

    This paper proposes a technique combining Decision Tree, Genetic Algorithm, DT-GA and Memetic algorithms to find more accurate models for fitting the behaviour of a network intrusion detection system. We simulate this sort of integrated algorithm, and the results obtained are encouraging for further work.

  10. Decision-tree and rule-induction approach to integration of remotely sensed and GIS data in mapping vegetation in disturbed or hilly environments

    Science.gov (United States)

    Lees, Brian G.; Ritman, Kim

    1991-11-01

    The integration of Landsat TM and environmental GIS data sets using artificial intelligence rule-induction and decision-tree analysis is shown to facilitate the production of vegetation maps with both floristic and structural information. This technique is particularly suited to vegetation mapping in disturbed or hilly environments that are unsuited to either conventional remote sensing methods or GIS modeling using environmental data bases.

  11. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    Science.gov (United States)

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067
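
    A rough analogue of the workflow can be sketched as follows; J48 is Weka's C4.5 implementation, and scikit-learn's entropy-criterion tree is only a loose substitute for it. The band reflectances and the "water" rule below are hypothetical stand-ins for the training data, while the evaluation mirrors the paper's use of the kappa statistic and AUC.

```python
# Minimal sketch, not the study's J48/Weka setup: an entropy-based tree on
# hypothetical OLI band reflectances, evaluated with kappa and AUC.
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
# Hypothetical reflectances for four OLI bands (e.g. deep blue, green, NIR, SWIR1).
X = rng.random((n, 4))
y = (X[:, 2] < 0.3).astype(int)   # hypothetical "water" rule: low NIR reflectance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
jdt = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=1).fit(X_tr, y_tr)

pred = jdt.predict(X_te)
proba = jdt.predict_proba(X_te)[:, 1]
print("kappa:", cohen_kappa_score(y_te, pred))
print("AUC:  ", roc_auc_score(y_te, proba))
```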

  13. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    Directory of Open Access Journals (Sweden)

    Christensen Helen

    2009-11-01

    Full Text Available Abstract Background Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
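
    The methodological comparison, a decision tree versus logistic regression on identical predictors judged by ROC/AUC, can be sketched on synthetic data as follows; the class imbalance is an assumption meant to mimic a relatively rare outcome, and none of this reflects the PATH cohort itself.

```python
# Minimal sketch of the comparison (synthetic data, not the PATH through Life cohort).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)  # rare positive outcome
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=0).fit(X_tr, y_tr)
logit = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

for name, model in [("decision tree", tree), ("logistic regression", logit)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```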

  14. Decision-Tree Formulation With Order-1 Lateral Execution

    Science.gov (United States)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that executing them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into the tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
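
    The core idea, flattening a balanced Boolean decision tree into a table indexed by the packed outcomes of its tests so that execution becomes an order-one lookup, can be sketched as below. This is only an illustration of the lookup principle, not the cited symbolic formulation.

```python
# Illustration: a balanced Boolean decision tree over d test variables can be
# flattened into a 2**d-entry table, so evaluation becomes a constant-time lookup
# instead of a node-by-node walk.
def build_lookup(tree_fn, d):
    """Precompute the tree's conclusion for every combination of d Boolean tests."""
    table = []
    for code in range(2 ** d):
        bits = [(code >> i) & 1 for i in range(d)]
        table.append(tree_fn(bits))
    return table

def evaluate(table, bits):
    """Order-1 execution: pack the test outcomes into an index and look it up."""
    code = sum(b << i for i, b in enumerate(bits))
    return table[code]

# Hypothetical 3-test decision tree written as nested conditionals.
def example_tree(b):
    if b[0]:
        return "A" if b[1] else "B"
    return "C" if b[2] else "D"

table = build_lookup(example_tree, d=3)
print(evaluate(table, [1, 0, 1]))   # same answer as example_tree([1, 0, 1]), in O(1)
```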

  15. On the relationship between the prices of oil and the precious metals: Revisiting with a multivariate regime-switching decision tree

    International Nuclear Information System (INIS)

    This study examines volatility and correlation, and the relationships between them, among the euro/US dollar exchange rates, the S&P 500 equity indices, and the prices of WTI crude oil and the precious metals (gold, silver, and platinum) over the period 2005 to 2012. Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. The ensuing Hidden Markov Decision Tree (HMDT) model is in fact an extension of the Hidden Markov Model (HMM) introduced by Jordan et al. (1997). The architecture of this model is the opposite of that of the classical deterministic approach based on a binary decision tree, and it allows a probabilistic view of the relationship between univariate volatility and correlation. Our results are categorized into three groups, namely (1) exchange rates and oil, (2) S&P 500 indices, and (3) precious metals. A switching dynamics is seen to characterize the volatilities, while, in the case of the correlations, the series switch from one regime to another, this movement reaching a peak during the subprime crisis in the US and again during the days following the Tohoku earthquake in Japan. Our findings show that the relationships between volatility and correlation depend upon the nature of the series considered, sometimes corresponding to those found in econometric studies, according to which correlation increases in bear markets, and at other times differing from them. - Highlights: • This study examines volatility and correlation, and their relationships, for precious metals and crude oil. • Our model links the univariate volatilities with the correlations via a hidden stochastic decision tree. • This model allows a probabilistic point of view of the relationship between univariate volatility and correlation. • Results show the relationships between volatility and correlation depend upon the nature of the series considered

  16. Identifying Bank Frauds Using CRISP-DM and Decision Trees

    OpenAIRE

    Bruno Carneiro da Rocha; Rafael Timóteo de Sousa Júnior

    2010-01-01

    This article aims to evaluate the use of decision tree techniques, in conjunction with the management model CRISP-DM, to help in the prevention of bank fraud. This article offers a study on decision trees, an important concept in the field of artificial intelligence. The study is focused on discussing how these trees are able to assist in the decision making process of identifying frauds by the analysis of information regarding bank transactions. This information is captured with the use of t...

  17. Decision tree approach to power systems security assessment

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1993-01-01

    An overview of the general decision tree approach to power system security assessment is presented. The general decision tree methodology is outlined, modifications proposed in the context of transient stability assessment are embedded, and further refinements are considered. The approach is then suitably tailored to handle other specifics of power systems security, relating to both preventive and emergency voltage control, in addition to transient stability. Trees are accordingly built in th...

  18. Confidence sets for split points in decision trees

    OpenAIRE

    Banerjee, Moulinath; McKeague, Ian W.

    2007-01-01

    We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others cali...

  19. Research and Application of Bank Customer Relationship Management Based on the Decision Tree Method

    Institute of Scientific and Technical Information of China (English)

    李明辉

    2012-01-01

    The decision tree algorithm in data mining is of great value to the banking industry. Applying decision tree techniques in banking, a bank can analyse a specific customer's background information and predict the category that customer belongs to, and then adopt the corresponding business strategy. This both improves the service level of the bank, develops customer resources and avoids customer loss, and also conserves resources, obtaining a larger return from a minimal investment. In the bank lending business, judging whether a borrower poses a risk and whether a loan proposal is feasible, and classifying customers according to the bank's actual needs, are all problems that can be solved with the decision tree algorithm.

  20. Application of Decision Tree Methods to Sales Volume Forecasting for Group Purchases

    Institute of Scientific and Technical Information of China (English)

    费斐; 叶枫

    2013-01-01

    Online group purchase is a shopping mode in which consumers who do not know each other buy the same product on the same website within a specified time in order to obtain the best price. Nowadays, group-purchase websites acting as platforms face a large number of products submitted for group purchase, and the review process requires substantial manpower and relies heavily on experience. Using a decision tree algorithm, this paper analyses the variables that affect the sales volume of group-purchase products and generates a readable decision tree to support decision making and to select high-quality products.

  1. Bounds on Average Time Complexity of Decision Trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    In this chapter, bounds on the average depth and the average weighted depth of decision trees are considered. Similar problems are studied in search theory [1], coding theory [77], and the design and analysis of algorithms (e.g., sorting) [38]. For any diagnostic problem, the minimum average depth of a decision tree is bounded from below by the entropy of the probability distribution (with a multiplier 1/log2 k for a problem over a k-valued information system). Among diagnostic problems, the problems with a complete set of attributes have the lowest minimum average depth of decision trees (e.g., the problem of building an optimal prefix code [1] and a blood test study under the assumption that exactly one patient is ill [23]). For such problems, the minimum average depth of a decision tree exceeds the lower bound by at most one. The minimum average depth reaches its maximum on problems in which each attribute is "indispensable" [44] (e.g., a diagnostic problem with n attributes and k^n pairwise different rows in the decision table, and the problem of implementing the modulo 2 summation function). These problems have a minimum average depth of decision tree equal to the number of attributes in the problem description. © Springer-Verlag Berlin Heidelberg 2011.
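
    One way to state the entropy lower bound described above is given below; the symbols p_1, ..., p_N for the probabilities of the decisions (rows) are an assumption about notation rather than the chapter's exact symbols.

```latex
% Entropy lower bound on the minimum average depth h_avg of a decision tree
% for a diagnostic problem over a k-valued information system, with decision
% probabilities p_1, ..., p_N (notation assumed here):
\[
  h_{\mathrm{avg}} \;\ge\; \frac{H(P)}{\log_2 k},
  \qquad
  H(P) \;=\; -\sum_{i=1}^{N} p_i \log_2 p_i .
\]
% For problems with a complete set of attributes, the chapter states that the
% minimum average depth exceeds this bound by at most one.
```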

  2. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    OpenAIRE

    Oral, L. O.; V. Tecim

    2013-01-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from h...

  3. Decision tree sensitivity analysis for cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma)

    International Nuclear Information System (INIS)

    Decision tree analysis was used to assess the cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma, ≤Stage IIIB), based on the data of the current decision tree. Decision tree models were constructed with two competing strategies (CT alone and CT plus chest FDG-PET) in a 1,000-patient population with 71.4% prevalence. Baseline values of FDG-PET sensitivity and specificity for the detection of lung cancer and lymph node metastasis, and of mortality and life expectancy, were taken from the literature. The chest CT plus chest FDG-PET strategy increased the total cost by 10.5% when a chest FDG-PET study costs 0.1 million yen, since it increased the number of mediastinoscopies and curative thoracotomies despite halving the number of bronchofiberscopies. However, the strategy resulted in a remarkable increase of 115 patients with curative thoracotomy and a decrease of 51 patients with non-curative thoracotomy. In addition, average life expectancy increased by 0.607 years/patient, which corresponds to an increase in medical cost of approximately 218,080 yen per life-year per patient when a chest FDG-PET study costs 0.1 million yen. In conclusion, the chest CT plus chest FDG-PET strategy might not be cost-effective in Japan, but we are convinced that the strategy is useful in cost-benefit analysis. (author)

  4. Decision tree sensitivity analysis for cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma)

    Energy Technology Data Exchange (ETDEWEB)

    Kosuda, Shigeru; Watanabe, Masumi; Kobayashi, Hideo; Kusano, Shoichi [National Defence Medical College, Tokorozawa, Saitama (Japan); Ichihara, Kiyoshi

    1998-07-01

    Decision tree analysis was used to assess the cost-effectiveness of chest FDG-PET in patients with a pulmonary tumor (non-small cell carcinoma, ≤Stage IIIB), based on the data of the current decision tree. Decision tree models were constructed with two competing strategies (CT alone and CT plus chest FDG-PET) in a 1,000-patient population with 71.4% prevalence. Baseline values of FDG-PET sensitivity and specificity for the detection of lung cancer and lymph node metastasis, and of mortality and life expectancy, were taken from the literature. The chest CT plus chest FDG-PET strategy increased the total cost by 10.5% when a chest FDG-PET study costs 0.1 million yen, since it increased the number of mediastinoscopies and curative thoracotomies despite halving the number of bronchofiberscopies. However, the strategy resulted in a remarkable increase of 115 patients with curative thoracotomy and a decrease of 51 patients with non-curative thoracotomy. In addition, average life expectancy increased by 0.607 years/patient, which corresponds to an increase in medical cost of approximately 218,080 yen per life-year per patient when a chest FDG-PET study costs 0.1 million yen. In conclusion, the chest CT plus chest FDG-PET strategy might not be cost-effective in Japan, but we are convinced that the strategy is useful in cost-benefit analysis. (author)
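
    The way such a decision tree is evaluated can be sketched generically: each strategy is a set of branches with probabilities, costs and life expectancies, and the expected values are compared via an incremental cost-effectiveness ratio. All branch numbers below are hypothetical placeholders, not the study's inputs.

```python
# Generic sketch of cost-effectiveness decision-tree arithmetic (hypothetical numbers).
def expected(branches, key):
    return sum(b["p"] * b[key] for b in branches)

strategies = {
    "CT alone": [
        {"p": 0.60, "cost": 1.2e6, "life_years": 4.0},   # hypothetical branch values
        {"p": 0.40, "cost": 0.8e6, "life_years": 2.5},
    ],
    "CT + FDG-PET": [
        {"p": 0.70, "cost": 1.3e6, "life_years": 4.3},
        {"p": 0.30, "cost": 0.9e6, "life_years": 2.7},
    ],
}

for name, branches in strategies.items():
    c, ly = expected(branches, "cost"), expected(branches, "life_years")
    print(f"{name}: expected cost = {c:,.0f} yen, expected life expectancy = {ly:.2f} years")

# Incremental cost-effectiveness ratio of adding FDG-PET (hypothetical numbers).
dc = expected(strategies["CT + FDG-PET"], "cost") - expected(strategies["CT alone"], "cost")
dly = expected(strategies["CT + FDG-PET"], "life_years") - expected(strategies["CT alone"], "life_years")
print(f"ICER: {dc / dly:,.0f} yen per life-year gained")
```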

  5. Application of alternating decision trees in selecting sparse linear solvers

    KAUST Repository

    Bhowmick, Sanjukta

    2010-01-01

    The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of the result. However, the best solution method can vary even across linear systems generated in the course of the same PDE-based simulation, thereby making solver selection a very challenging problem. In this paper, we use a machine learning technique, Alternating Decision Trees (ADT), to select efficient solvers based on the properties of sparse linear systems and runtime-dependent features, such as the stages of simulation. We demonstrate the effectiveness of this method through empirical results over linear systems drawn from computational fluid dynamics and magnetohydrodynamics applications. The results also demonstrate that using ADT can resolve the problem of over-fitting, which occurs when a limited amount of data is available. © 2010 Springer Science+Business Media LLC.

  6. Data mining on the commuting distance modes of Beijing urban residents based on decision tree analysis

    Institute of Scientific and Technical Information of China (English)

    王茂军; 宋国庆; 许洁

    2009-01-01

    With the development of suburbanization, urban residents now have more choices of jobs and housing locations, and scholars increasingly pay attention to citizens' commuting modes. Based on questionnaire survey data, this paper applies decision tree analysis and data mining to examine the commuting distance patterns of urban residents in Beijing. The study finds the following. First, under the set pruning purity, residents' commuting distance is closely related to travel mode, change of residence, occupation, the employment rate at the place of residence, the schooling situation of the youngest child, housing area, monthly household income and motor vehicle use; factors such as gender, educational level, marital status and housing tenure do not enter the model. Second, among the variables influencing commuting distance, travel mode is the most important, followed by housing area and the youngest child's schooling, then change of residence and occupation; monthly household income ranks fourth, and motor vehicle use and the local employment rate rank fifth. Third, owing to the complexity of housing property rights, the variety of reasons for relocation, passive suburbanization, and the division of work, childcare and household duties within families, the relationships of housing area, relocation history, family life cycle and occupation with commuting distance contradict existing domestic conclusions; some variables are decisive for short-distance commuting, while others are decisive for long-distance commuting.

  7. Social Impact on Android Applications using Decision Tree

    Directory of Open Access Journals (Sweden)

    Waseem Iqbal

    2015-11-01

    Full Text Available Mobile phones have evolved very rapidly from black-and-white handsets to smart phones. Google has launched the Android operating system (OS), based on Linux and targeting smart phones. Since then, people have become addicted to these smart phones due to the facilities they provide. But the security leaks present in Android are a big hurdle to using it in a secure way. The Android operating system is widely used because it is open source/freeware and most of its applications are freely available in different online application stores. To install any application, we must accept the terms and conditions regarding access to multiple parts of the device and to personal information; otherwise we are unable to install these free or paid applications. The main problem is that when we allow access to multiple parts of our device and to our personal information, the inherent security leaks become more vulnerable to threats. A very simple and handy solution is to install only the applications that are positively reviewed by other users who have already installed and are still using them. We implement the decision tree, a machine learning technique, to analyze these positively reviewed applications and make a recommendation on whether to install them on the device or not.

  8. BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES

    Directory of Open Access Journals (Sweden)

    Amaranatha Reddy P

    2015-11-01

    Full Text Available This research paper proposes an algorithm to find association rules in incremental databases. Most transaction databases are dynamic; consider, for example, supermarket customers' daily purchase transactions. Customers' purchasing behaviour may change from day to day, and new products replace old ones. In this scenario, static data mining algorithms are of limited use, whereas an algorithm that learns continuously provides the most up-to-date knowledge, which is very helpful in today's fast-changing world. Well-known benchmark algorithms for association rule mining are Apriori and FP-Growth. However, their major drawback is that they must be rebuilt from scratch whenever the original database changes. Therefore, in this paper we introduce an efficient algorithm called the Binary Decision Tree (BDT) to process incremental data. Processing data continuously requires substantial processing and storage resources. In this algorithm we scan the database only once, constructing a dynamically growing binary tree to find association rules with better performance and optimal storage. The algorithm can also be applied to static data, but our main intention is to give an optimal solution for incremental data.

  9. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

    Full Text Available Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category. Identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vivid but usually neglected aspect of software quality assurance in any project. If applied at all stages of software development, it can reduce the time, overheads and resources needed to engineer a high quality product. In order to reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when test cases show that the software is not executing properly. The proposed system classifies the various defects using a decision tree based defect classification technique, which groups the defects after identification. The classification can be done by employing algorithms such as ID3 or C4.5. After the classification, the defect patterns are measured by employing a pattern mining technique. Finally, quality is assured using various quality metrics, such as defect density. The proposed system will be implemented in Java.

  10. Extracting decision rules from police accident reports through decision trees.

    Science.gov (United States)

    de Oña, Juan; López, Griselda; Abellán, Joaquín

    2013-01-01

    Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper shows an application that applies some of the methods most commonly used to build decision trees (DTs), which have not been applied to the road safety field before. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve our purpose and may even be complementary. Applying these methods has enabled potentially useful decision rules to be extracted that could be used by road safety analysts. For instance, some of the rules may indicate that women, contrary to men, increase their risk of severity under bad lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by types (depending on their severity). However, the primary importance of this proposal is that other databases not used here (i.e. other infrastructure, roads and countries) could be used to identify unconventional problems in a manner easy for road safety managers to understand, as decision rules. PMID:23021419
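
    Extracting readable IF-THEN rules from a fitted tree can be sketched with scikit-learn's export_text; the data are synthetic and the attribute names are hypothetical stand-ins for the crash variables, not the Granada records.

```python
# Minimal sketch of turning a fitted tree into readable decision rules.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, random_state=0)
features = ["lighting", "driver_age", "road_width", "time_of_day", "speed_limit"]  # hypothetical names

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0).fit(X, y)
# Each root-to-leaf path printed below reads directly as a decision rule,
# e.g. "IF lighting <= ... AND speed_limit > ... THEN class k".
print(export_text(tree, feature_names=features))
```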

  11. Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods

    OpenAIRE

    Khan, Saqib Hussain

    2010-01-01

    Data mining can be used in the healthcare industry to "mine" clinical data and discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes untapped, yet advanced data mining techniques can remedy this. This thesis mainly deals with the Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the databas...

  12. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    OpenAIRE

    Barbara Kraszewska-Głomba; Zofia Szymańska-Toczek; Leszek Szenborn

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with t...

  13. A Decision Tree of Bigrams is an Accurate Predictor of Word Sense

    OpenAIRE

    Pedersen, Ted

    2001-01-01

    This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.

  14. A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree Induction and BPN

    Directory of Open Access Journals (Sweden)

    S. Pitchumani Angayarkanni, V. Saravanan

    2011-02-01

    Full Text Available An intelligent computer-aided diagnosis system can be very helpful for radiologists in detecting and diagnosing microcalcification patterns earlier and faster than typical screening programs. In this paper, we present a system based on fuzzy C-means clustering and feature extraction techniques using texture-based segmentation and a genetic algorithm for detecting and diagnosing microcalcification patterns in digital mammograms. We have investigated and analyzed a number of feature extraction techniques and found that a combination of three features, such as entropy, standard deviation, and number of pixels, is the best combination to distinguish a benign microcalcification pattern from one that is malignant. A fuzzy C-means technique in conjunction with three features was used to detect a microcalcification pattern and a neural network to classify it into benign/malignant. The system was developed on a Windows platform. It is an easy to use intelligent system that gives the user options to diagnose, detect, enlarge, zoom, and measure distances of areas in digital mammograms. The present study focused on the investigation of the application of artificial intelligence and data mining techniques to prediction models of breast cancer. The artificial neural network, decision tree, fuzzy C-means, and genetic algorithm were used for the comparative studies, and the accuracy and positive predictive value of each algorithm were used as the evaluation indicators. 699 records acquired from the breast cancer patients in the MIAS database, 9 predictor variables, and 1 outcome variable were incorporated for the data analysis, followed by 10-fold cross-validation. The results revealed that the accuracies of fuzzy C-means were 0.9534 (sensitivity 0.98716 and specificity 0.9582), the decision tree model 0.9634 (sensitivity 0.98615, specificity 0.9305), the neural network model 0.96502 (sensitivity 0.98628, specificity 0.9473), the genetic algorithm model 0.9878 (sensitivity 1

  15. Decision tree approach for classification of remotely sensed satellite data using open source support

    Indian Academy of Sciences (India)

    Richa Sharma; Aniruddha Ghosh; P K Joshi

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with the image classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. Classification result based on DTC method provided better visual depiction than results produced by ISODATA clustering or by MLC algorithms. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood and 57.5% (kappa = 0.49) using ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be more preferred classification approach than others.

  16. A Fuzzy Decision Tree to Estimate Development Effort for Web Applications

    OpenAIRE

    Ali Idri; Sanaa Elyassami

    2011-01-01

    Web Effort Estimation is a process of predicting the efforts and cost in terms of money, schedule and staff for any software project system. Many estimation models have been proposed over the last three decades and it is believed that it is a must for the purpose of: Budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of Fuzzy ID3 decision tree for software cost estimation, it is designed by integrating the...

  17. Re-mining association mining results through visualization, data envelopment analysis, and decision trees

    OpenAIRE

    Ertek, Gürdal; Ertek, Gurdal; Tunç, Murat Mustafa; Tunc, Murat Mustafa

    2012-01-01

    Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding ...

  18. Independent Component Analysis and Decision Trees for ECG Holter Recording De-Noising

    OpenAIRE

    Jakub Kuzilek; Vaclav Kremen; Filip Soucek; Lenka Lhotska

    2014-01-01

    We have developed a method focusing on ECG signal de-noising using independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. In order to test the efficiency of this method, a wavelet-based de-noising method was used for comparison with standard filtering. Freely available data from the PhysioNet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between origin...

  19. Imitation learning of car driving skills with decision trees and random forests

    Directory of Open Access Journals (Sweden)

    Cichosz Paweł

    2014-09-01

    Full Text Available Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common learning scenario that is often possible to apply is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human-readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can be hopefully applied to solve real-world control tasks, as well as to develop video game bots

  20. Relationships between depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2011-01-01

    This paper describes a new tool for the study of relationships between depth and number of misclassifications for decision trees. In addition to the algorithm the paper also presents the results of experiments with three datasets from UCI Machine Learning Repository [3]. © 2011 Springer-Verlag.

  1. Construction of a decision tree in linear programming problems

    International Nuclear Information System (INIS)

    The dependence of the solution of a linear programming problem on its parameter has been analyzed. An algorithm for the construction of a decision tree has been proposed with the use of the simplex method together with the validity support system

  2. Practical secure decision tree learning in a teletreatment application

    NARCIS (Netherlands)

    Hoogh, de Sebastiaan; Schoenmakers, Berry; Chen, Ping; Akker, op den Harm

    2014-01-01

    In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy preserving data mining. We focus on particular variants of the well-known ID3 algorithm allowing a high level of security and performance at the same time. Our approa

  3. Decision tree ensembles for online operation of large smart grids

    International Nuclear Information System (INIS)

    Highlights: ► We present a new technique for the online control of large smart grids. ► We use a Decision Tree Ensemble in a Receding Horizon Controller. ► Decision Trees can approximate online optimisation approaches. ► Decision Trees can make adjustments to their output in real time. ► The new technique outperforms heuristic online optimisation approaches. - Abstract: Smart grids utilise omnidirectional data transfer to operate a network of energy resources. Associated technologies present operators with greater control over system elements and more detailed information on the system state. While these features may improve the theoretical optimal operating performance, determining the optimal operating strategy becomes more difficult. In this paper, we show how a decision tree ensemble or ‘forest’ can produce a near-optimal control strategy in real time. The approach substitutes the decision forest for the simulation–optimisation sub-routine commonly employed in receding horizon controllers. The method is demonstrated on a small and a large network, and compared to controllers employing particle swarm optimisation and evolutionary strategies. For the smaller network the proposed method performs comparably in terms of total energy usage, but delivers a greater demand deficit. On the larger network the proposed method is superior with respect to all measures. We conclude that the method is useful when the time required to evaluate possible strategies via simulation is high.
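
    The substitution idea, training a tree ensemble offline on (state, optimiser decision) pairs and using its fast predictions in place of the optimisation sub-routine online, can be sketched as follows; the "optimiser" here is a toy stand-in rule, not the paper's simulation-optimisation controller.

```python
# Minimal sketch of the substitution idea: a decision-tree ensemble learns to
# imitate an expensive optimiser offline, then replaces it online.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def slow_optimiser(state):
    """Stand-in for an expensive simulation-optimisation routine (hypothetical rule)."""
    demand, price, storage = state
    return np.clip(demand - storage * (1.0 - price), 0.0, None)   # dispatch setpoint

# Offline: collect training pairs from the slow optimiser.
states = rng.random((5000, 3))
actions = np.array([slow_optimiser(s) for s in states])
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(states, actions)

# Online: the forest approximates the optimiser at a fraction of the cost.
new_state = np.array([[0.8, 0.3, 0.5]])
print("forest setpoint:   ", forest.predict(new_state)[0])
print("optimiser setpoint:", slow_optimiser(new_state[0]))
```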

  4. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

    The decision tree is a widely used technique to discover patterns from consistent data sets. But if the data set is inconsistent, where there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge from the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of depth, average depth and number of nodes. Based on the results of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we then compared them based on the optimization and classification results. It was found that some greedy algorithms, Mult_ws_entSort and Mult_ws_entML, are good for both optimization and classification.

  5. Extraction of information on construction land based on multi-feature decision tree classification%基于多特征决策树的建设用地信息提取

    Institute of Scientific and Technical Information of China (English)

    饶萍; 王建力; 王勇

    2014-01-01

    The spatial distribution of construction land is closely related to regional economic and social development. Therefore, timely monitoring and delivery of data on the dynamics of construction land are of far-reaching importance for policy and decision making. Classifying land-use/land-cover and analyzing changes are among the most common applications of remote sensing. One of the most basic and difficult classification tasks is to distinguish construction land from other land surfaces. Landsat imagery is one of the most widely used data sources in remote sensing of construction land. Several techniques for construction land extraction using Landsat data are described in the literature, but their application is constrained by low accuracy in various situations, usually because a single index or a simple multi-index technique is used. The purpose of this study was to devise a method that improves the accuracy of construction land extraction in the presence of various kinds of environmental noise. We therefore introduce a multi-feature decision tree (DT) classification model for improving classification accuracy in areas including bare land, shadow and some streams, where other classification methods often fail to classify correctly. The model integrates four spectral indexes, a pattern recognition technique, and spatial algorithms. The four spectral indexes are the normalized difference three bands index (NDTBI), the normalized difference building index (NDBI), the modified normalized difference water index (MNDWI), and the normalized difference vegetation index (NDVI). The pattern recognition technique is the support vector machine (SVM), and the spatial algorithm is the creation of buffer zones. The test site was deliberately selected so that it consists of complex surface features, such as bare land, hill shade, and some small streams that are liable to be mixed up with construction land on the Landsat imagery. For that reason, Landsat-8

  6. A Fuzzy Decision Tree to Estimate Development Effort for Web Applications

    Directory of Open Access Journals (Sweden)

    Ali Idri

    2011-09-01

    Full Text Available Web Effort Estimation is a process of predicting the effort and cost, in terms of money, schedule and staff, for any software project. Many estimation models have been proposed over the last three decades, and it is believed that estimation is a must for the purposes of budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation; it is designed by integrating the principles of the ID3 decision tree and fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing software projects, which can greatly improve the accuracy of the obtained estimates. MMRE and Pred are used as measures of prediction accuracy for this study. A series of experiments is reported using the Tukutuku software projects dataset. The results are compared with those produced by three crisp versions of decision trees: ID3, C4.5 and CART.

  7. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    Science.gov (United States)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g., snow under clear sky, snow under cloud, to give users flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.

  8. Flood-type classification in mountainous catchments using crisp and fuzzy decision trees

    Science.gov (United States)

    Sikorska, Anna E.; Viviroli, Daniel; Seibert, Jan

    2015-10-01

    Floods are governed by largely varying processes and thus exhibit various behaviors. Classification of flood events into flood types and the determination of their respective frequencies are therefore important for a better understanding and prediction of floods. This study presents a flood classification for identifying flood patterns at the catchment scale by means of a fuzzy decision tree. Events are represented as a spectrum over six possible flood types, each attributed with its degree of acceptance. The types considered are flash floods, short-rainfall floods, long-rainfall floods, snowmelt floods, rain-on-snow floods and, in high alpine catchments, glacier-melt floods. The fuzzy decision tree also makes it possible to acknowledge the uncertainty present in the identification of flood processes and thus allows for more reliable flood class estimates than a crisp decision tree, which identifies one flood type per event. Based on a data set from nine Swiss mountainous catchments, it was demonstrated that this approach is less sensitive to uncertainties in the classification attributes than the classical crisp approach. These results show that the fuzzy approach bears additional potential for analyses of flood patterns at the catchment scale, thereby providing a more realistic representation of flood processes.
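
    The fuzzy idea can be sketched with simple triangular membership functions: each event receives a degree of acceptance for every flood type rather than a single crisp label. The attributes, thresholds and membership shapes below are hypothetical, not the study's classification attributes.

```python
# Minimal fuzzy-classification sketch: degrees of acceptance per flood type
# from triangular membership functions over hypothetical event attributes.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0, 1.0)

def classify(event):
    rain_duration_h, rain_intensity, snowmelt_frac = event
    return {
        "flash flood":      min(tri(rain_duration_h, 0, 2, 6),  tri(rain_intensity, 10, 30, 60)),
        "short-rain flood": min(tri(rain_duration_h, 2, 8, 24), tri(rain_intensity, 2, 10, 30)),
        "long-rain flood":  tri(rain_duration_h, 12, 48, 120),
        "snowmelt flood":   tri(snowmelt_frac, 0.3, 0.8, 1.0),
        "rain-on-snow":     min(tri(snowmelt_frac, 0.1, 0.4, 0.8), tri(rain_intensity, 2, 15, 40)),
    }

degrees = classify((4.0, 25.0, 0.1))   # hypothetical event: 4 h of intense rain, little snowmelt
for flood_type, degree in sorted(degrees.items(), key=lambda kv: -kv[1]):
    print(f"{flood_type:>16s}: {degree:.2f}")
```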

  9. Supervised hashing using graph cuts and boosted decision trees.

    Science.gov (United States)

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data. PMID:26440270
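
    A heavily simplified sketch of the two-step structure is shown below: step 1 (binary code inference, done with graph cuts in the paper) is replaced here by fixed class-wise target codes, and step 2 trains one boosted-tree classifier per bit; retrieval then compares Hamming distances. This is an illustration of the decomposition only, not the paper's method.

```python
# Simplified two-step hashing sketch: fixed class codes stand in for step 1,
# boosted trees learn one hash bit each in step 2.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

n_bits = 4
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

class_codes = np.array([[0, 0, 1, 1],
                        [0, 1, 0, 1],
                        [1, 0, 0, 1],
                        [1, 1, 1, 0]])   # step 1 stand-in: one target code per class
target_bits = class_codes[y]             # each sample inherits its class code

# Step 2: one nonlinear hash function (boosted trees) per bit.
hash_functions = [GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
                  .fit(X, target_bits[:, b]) for b in range(n_bits)]

def hash_codes(Xq):
    return np.column_stack([h.predict(Xq) for h in hash_functions]).astype(np.uint8)

codes = hash_codes(X)
query = hash_codes(X[:1])
hamming = (codes ^ query).sum(axis=1)    # retrieval by Hamming distance
print("nearest neighbours (indices):", np.argsort(hamming)[:5])
```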

  10. Research on the accuracy of TM image land-use classification based on a QUEST decision tree: A case study of Lijiang, Yunnan

    Institute of Scientific and Technical Information of China (English)

    吴健生; 潘况; 彭建; 黄秀兰

    2012-01-01

    The accuracy of research on land use/cover change (LUCC) is determined directly by the accuracy of the land use classification derived from aerial and satellite images. After analyzing the factors that affect the accuracy of current remote sensing image classification, several methods were examined to study new trends in classification. Previous studies have shown that the speed and accuracy of QUEST (Quick, Unbiased, and Efficient Statistical Tree) decision tree classification are superior to those of other decision tree classifications. On the basis of this approach, this research classified Landsat-5 TM images of Lijiang, Yunnan province, and compared the result with that of maximum likelihood classification. The overall accuracy was 90.086%, higher than the overall accuracy (85.965%) of CART (Classification And Regression Tree), and the Kappa coefficient was 0.849, higher than the Kappa coefficient (0.760) of CART. It is therefore concluded that in complex terrain such as mountainous regions, choosing QUEST decision tree classification for TM images improves the accuracy of land use classification. This type of decision tree classification can precisely derive new classification rules from integrated satellite images, land use thematic maps, DEM maps and other field investigation materials. The method can also help users find new classification rules in multidimensional information and build decision tree classifier models. Furthermore, methods that integrate large volumes of high-resolution and hyperspectral image data, multi-sensor platforms, multi-temporal remote sensing images, pattern recognition and data mining of spectral and texture features, and auxiliary geographic data will become a trend.

  11. Distributed Decision-Tree Induction in Peer-to-Peer Systems

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a scalable and robust distributed algorithm for decision-tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in...

  12. Performance Analysis Using the Decision Tree Algorithm (Analisa Performansi menggunakan Algoritma Decision Tree)

    OpenAIRE

    Swendy, Maries

    2016-01-01

    Data mining has been implemented to obtain more useful information than a conventional database combined with human analysis by users of an organization's or company's systems. This thesis proposes a tool for monitoring and tracking performance based on a connectedness rule model built from survey results, audit data and revenue data in organization/company systems. The more dominant factors that influence revenue growth as a variable of the organization/compa...

  13. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  14. Decision Trees and Transient Stability of Electric Power Systems

    OpenAIRE

    Wehenkel, Louis; Pavella, Mania

    1991-01-01

    An inductive inference method for the automatic building of decision trees is investigated. Among its various tasks, the splitting and stop-splitting criteria successively applied to the nodes of a grown tree are found to play a crucial role in its overall shape and performance. The application of this general method to transient stability is systematically explored. Parameters related to the stop-splitting criterion, to the learning set and to the tree classes are thus considered, a...

  15. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    Science.gov (United States)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models to provide sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting mode choice, and these techniques can be applied within a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective the transport mode choice problem is solved for a case in the district of Buca, Izmir, Turkey, with the CRISP-DM knowledge process model.

  16. USING DECISION TREES FOR ESTIMATING MODE CHOICE OF TRIPS IN BUCA-IZMIR

    Directory of Open Access Journals (Sweden)

    L. O. Oral

    2013-05-01

    Full Text Available Decision makers develop transportation plans and models to provide sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting mode choice, and these techniques can be applied within a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective the transport mode choice problem is solved for a case in the district of Buca, Izmir, Turkey, with the CRISP-DM knowledge process model.
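
    The modelling step described above can be sketched in a few lines: a decision tree is trained on trip records and its splitting rules are printed. The feature names (trip distance, cars in household, household size) and the tiny synthetic table below are invented for illustration and are not taken from the Izmir survey.

        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier, export_text

        # Hypothetical trip records; the columns do not reproduce the survey schema.
        trips = pd.DataFrame({
            "distance_km": [0.8, 1.2, 5.0, 7.5, 12.0, 0.5, 3.0, 9.0, 15.0, 2.0],
            "cars_in_hh":  [0, 0, 1, 1, 2, 0, 0, 1, 2, 1],
            "hh_size":     [3, 2, 4, 3, 4, 1, 2, 5, 3, 2],
            "mode":        ["walk", "walk", "bus", "car", "car",
                            "walk", "bus", "car", "car", "bus"],
        })

        X, y = trips[["distance_km", "cars_in_hh", "hh_size"]], trips["mode"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            random_state=0)

        # Fit a small tree and inspect the learned mode-choice rules.
        clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
        print(export_text(clf, feature_names=list(X.columns)))
        print("hold-out accuracy:", clf.score(X_test, y_test))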

  17. Extensions of dynamic programming as a new tool for decision tree optimization

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-01-01

    The chapter is devoted to the consideration of two types of decision trees for a given decision table: α-decision trees (the parameter α controls the accuracy of the tree) and decision trees (which allow an arbitrary level of accuracy). We study possibilities of sequential optimization of α-decision trees relative to different cost functions such as depth, average depth, and number of nodes. For decision trees, we analyze relationships between depth and number of misclassifications. We also discuss results of computer experiments with some datasets from the UCI ML Repository. ©Springer-Verlag Berlin Heidelberg 2013.

  18. Decision Tree and Texture Analysis for Mapping Debris-Covered Glaciers in the Kangchenjunga Area, Eastern Himalaya

    Directory of Open Access Journals (Sweden)

    Adina Racoviteanu

    2012-10-01

    Full Text Available In this study we use visible, short-wave infrared and thermal Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data validated with high-resolution Quickbird (QB) and Worldview2 (WV2) for mapping debris cover in the eastern Himalaya using two independent approaches: (a) a decision tree algorithm, and (b) texture analysis. The decision tree algorithm was based on multi-spectral and topographic variables, such as band ratios, surface reflectance, kinetic temperature from ASTER bands 10 and 12, slope angle, and elevation. The decision tree algorithm resulted in 64 km2 classified as debris-covered ice, which represents 11% of the glacierized area. Overall, for ten glacier tongues in the Kangchenjunga area, there was an area difference of 16.2 km2 (25%) between the ASTER and the QB areas, with mapping errors mainly due to clouds and shadows. Texture analysis techniques included co-occurrence measures, geostatistics and filtering in spatial/frequency domain. Debris cover had the highest variance of all terrain classes, highest entropy and lowest homogeneity compared to the other classes, for example a mean variance of 15.27 compared to 0 for clouds and 0.06 for clean ice. Results of the texture image for debris-covered areas were comparable with those from the decision tree algorithm, with 8% area difference between the two techniques.

  19. Porting Decision Tree Algorithms to Multicore using FastFlow

    CERN Document Server

    Aldinucci, Marco; Torquati, Massimo

    2010-01-01

    The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable for parallelisation. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it achieves up to a 7X speedup on an Intel dual-quad core machine.

  20. An analysis and study of decision tree induction operating under adaptive mode to enhance accuracy and uptime in a dataset introduced to spontaneous variation in data attributes

    Directory of Open Access Journals (Sweden)

    Uttam Chauhan

    2011-01-01

    Full Text Available Many methods exist for the classification of an unknown dataset. Decision tree induction is one of the well-known methods for classification. The decision tree method operates under two different modes: non-adaptive and adaptive. The non-adaptive mode of operation is applied when the dataset is completely mature and available, or the dataset is static and there will be no changes in its attributes. However, when the dataset is likely to have changes in values and attributes, fluctuating monthly, quarterly or annually, then the decision tree method operating under adaptive mode needs to be applied, as the conventional non-adaptive method fails: it must be applied once again from scratch on the augmented dataset. This makes things expensive in terms of time and space. Sometimes attributes are added to the dataset while the number of records also increases. This paper mainly studies the behavioural aspects of the classification model, particularly when the number of attributes in the dataset increases due to spontaneous changes in the value(s)/attribute(s). Our investigative studies have shown that the accuracy of the decision tree model can be maintained when the number of attributes, including the class, increases in the dataset, which increases the number of records as well. In addition, accuracy can also be maintained when the number of values in the class attribute of the dataset increases. The adaptive-mode decision tree method reads data instance by instance and incorporates each instance into the model through absorption, updating the model according to the attribute values specific to that instance. As the time required to update a decision tree can be less than building it from scratch, this eliminates the problem of rebuilding the decision tree repeatedly from scratch while saving both memory and time.
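
    The adaptive mode described above amounts to updating the statistics stored in the tree instance by instance rather than rebuilding it. The toy sketch below is not the authors' implementation; it assumes a fixed tree structure whose leaves keep class counts and absorb streaming records one at a time, which is the minimal form of such incremental updating.

        from collections import defaultdict

        class Leaf:
            """A leaf that absorbs labelled instances and predicts the majority class."""
            def __init__(self):
                self.class_counts = defaultdict(int)

            def absorb(self, label):
                self.class_counts[label] += 1

            def predict(self):
                return max(self.class_counts, key=self.class_counts.get)

        class Node:
            """An internal node splitting on one numeric attribute at a threshold."""
            def __init__(self, attribute, threshold, left, right):
                self.attribute, self.threshold = attribute, threshold
                self.left, self.right = left, right

            def route(self, instance):
                child = self.left if instance[self.attribute] <= self.threshold else self.right
                return child if isinstance(child, Leaf) else child.route(instance)

        # A small fixed tree splitting on a hypothetical "age" attribute;
        # streaming records are absorbed into the leaves without a rebuild.
        tree = Node("age", 30, Leaf(), Leaf())
        stream = [({"age": 25}, "yes"), ({"age": 40}, "no"), ({"age": 22}, "yes")]
        for instance, label in stream:
            tree.route(instance).absorb(label)      # incremental model update
        print(tree.route({"age": 28}).predict())    # -> 'yes'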

  1. Totally Optimal Decision Trees for Monotone Boolean Functions with at Most Five Variables

    KAUST Repository

    Chikalov, Igor

    2013-01-01

    In this paper, we present the empirical results for relationships between time (depth) and space (number of nodes) complexity of decision trees computing monotone Boolean functions, with at most five variables. We use Dagger (a tool for optimization of decision trees and decision rules) to conduct experiments. We show that, for each monotone Boolean function with at most five variables, there exists a totally optimal decision tree which is optimal with respect to both depth and number of nodes.

  2. Quantifying human and organizational factors in accident management using decision trees: the HORAAM method

    International Nuclear Information System (INIS)

    In the framework of the level 2 Probabilistic Safety Study (PSA 2) project, the Institute for Nuclear Safety and Protection (IPSN) has developed a method for taking into account Human and Organizational Reliability Aspects during accident management. Actions are taken during very degraded installation operations by teams of experts in the French framework of Crisis Organization (ONC). After describing the background of the framework of the Level 2 PSA, the French specific Crisis Organization and the characteristics of human actions in the Accident Progression Event Tree, this paper describes the method developed to introduce in PSA the Human and Organizational Reliability Analysis in Accident Management (HORAAM). This method is based on the Decision Tree method and has gone through a number of steps in its development. The first one was the observation of crisis center exercises, in order to identify the main influence factors (IFs) which affect human and organizational reliability. These IFs were used as headings in the Decision Tree method. Expert judgment was used in order to verify the IFs, to rank them, and to estimate the value of the aggregated factors to simplify the quantification of the tree. A tool based on Mathematica was developed to increase the flexibility and the efficiency of the study

  3. Analisis Dan Perancangan Sistem Pendukung Keputusan Untuk Menghindari Kredit Macet (Non Performing Loan) Perbankan Menggunakan Algoritma Decision Tree

    OpenAIRE

    Sinuhaji, Andika Rafon

    2010-01-01

    A decision-making model, called a decision support system, is needed to help people make decisions that are accurate, efficient, and effective. The aim of a decision support system is to combine the advantages of humans and electronic instruments for solving various unstructured problems. The objective of this study is to avoid non-performing loans in the process of granting credit facilities. The decisions in this study are made using the decision tree method. The solution method consists of...

  4. Visualization method and tool for interactive learning of large decision trees

    Science.gov (United States)

    Nguyen, Trong Dung; Ho, TuBao

    2002-03-01

    When learning from large datasets, decision tree induction programs often produce very large trees. How to visualize trees efficiently during the learning process, particularly large trees, remains an open question and requires efficient tools. This paper presents a visualization method and tool for interactive learning of large decision trees that includes a new visualization technique called T2.5D (which stands for Trees 2.5 Dimensions). After a brief discussion of requirements for tree visualizers and related work, the paper focuses on techniques developed for two issues: (1) how to visualize large decision trees efficiently; and (2) how to visualize decision trees during the learning process.

  5. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As fraudulent financial statements of enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study fits logistic regression, support vector machine, and decision tree classifiers to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and the C5.0 decision tree has the best classification accuracy at 85.71%.

  6. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements of enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study fits logistic regression, support vector machine, and decision tree classifiers to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and the C5.0 decision tree has the best classification accuracy at 85.71%. PMID:25302338
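
    A minimal version of the model comparison described in these two records can be sketched as follows. The synthetic data, the 80/20 split and the default hyperparameters are placeholders, and the paper's C5.0 tree is replaced here by scikit-learn's CART-style decision tree, so this is only a sketch of the workflow, not the authors' pipeline.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        # Synthetic stand-in for the financial / nonfinancial variables.
        X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                                   weights=[0.8, 0.2], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=0, stratify=y)

        models = {
            "logistic regression":    make_pipeline(StandardScaler(),
                                                    LogisticRegression(max_iter=1000)),
            "support vector machine": make_pipeline(StandardScaler(), SVC()),
            "decision tree":          DecisionTreeClassifier(max_depth=5, random_state=0),
        }
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            print(f"{name:24s} accuracy = {model.score(X_te, y_te):.3f}")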

  7. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  8. Fuzzy decision trees for planning and autonomous control of a coordinated team of UAVs

    Science.gov (United States)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A fuzzy logic resource manager that enables a collection of unmanned aerial vehicles (UAVs) to automatically cooperate to make meteorological measurements will be discussed. Once in flight no human intervention is required. Planning and real-time control algorithms determine the optimal trajectory and points each UAV will sample, while taking into account the UAVs' risk, risk tolerance, reliability, mission priority, fuel limitations, mission cost, and related uncertainties. The control algorithm permits newly obtained information about weather and other events to be introduced to allow the UAVs to be more effective. The approach is illustrated by a discussion of the fuzzy decision tree for UAV path assignment and related simulation. The different fuzzy membership functions on the tree are described in mathematical detail. The different methods by which this tree is obtained are summarized including a method based on using a genetic program as a data mining function. A second fuzzy decision tree that allows the UAVs to automatically collaborate without human intervention is discussed. This tree permits three different types of collaborative behavior between the UAVs. Simulations illustrating how the tree allows the different types of collaboration to be automated are provided. Simulations also show the ability of the control algorithm to allow UAVs to effectively cooperate to increase the UAV team's likelihood of success.

  9. Electronic Nose Odor Classification with Advanced Decision Tree Structures

    Directory of Open Access Journals (Sweden)

    S. Guney

    2013-09-01

    Full Text Available Electronic nose (e-nose) is an electronic device which can measure chemical compounds in air and consequently classify different odors. In this paper, an e-nose device consisting of 8 different gas sensors was designed and constructed. Using this device, 104 different experiments involving 11 different odor classes (moth, angelica root, rose, mint, polis, lemon, rotten egg, egg, garlic, grass, and acetone) were performed. The main contribution of this paper is the finding that using the chemical domain knowledge it is possible to train an accurate odor classification system. The domain knowledge about chemical compounds is represented by a decision tree whose nodes are composed of classifiers such as Support Vector Machines and k-Nearest Neighbor. The overall accuracy achieved with the proposed algorithm and the constructed e-nose device was 97.18%. Training and testing data sets used in this paper are published online.
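
    The idea of a decision tree whose internal nodes are themselves classifiers can be illustrated with a small two-stage sketch: a root classifier routes a sample to a coarse odor group, and a group-specific classifier makes the final call. Everything below (the synthetic sensor readings, the four odor classes and the two-group structure) is assumed for illustration and is not the authors' tree.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)

        # Synthetic 8-sensor readings for four odor classes, grouped two and two.
        def sample(center, label, n=40):
            return rng.normal(center, 0.3, size=(n, 8)), np.full(n, label)

        X_parts, y_parts = zip(sample(0.0, "rose"), sample(0.5, "mint"),
                               sample(3.0, "garlic"), sample(3.5, "lemon"))
        X, y = np.vstack(X_parts), np.concatenate(y_parts)
        group = np.where(np.isin(y, ["rose", "mint"]), "floral", "pungent")

        # Root node: an SVM separates the two coarse odor groups.
        root = SVC().fit(X, group)
        # Leaf nodes: a k-NN classifier per group resolves the exact odor class.
        leaves = {g: KNeighborsClassifier(3).fit(X[group == g], y[group == g])
                  for g in ("floral", "pungent")}

        def predict(x):
            g = root.predict(x.reshape(1, -1))[0]
            return leaves[g].predict(x.reshape(1, -1))[0]

        print(predict(rng.normal(0.5, 0.3, size=8)))   # most likely 'mint'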

  10. An overview of decision tree applied to power systems

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe;

    2013-01-01

    The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, the Decision Tree (DT), also named Classification And Regression Tree (CART), has gained increasing interest because of its high performance in terms of computational efficiency, uncertainty manageability, and interpretability. This paper presents an overview of a variety of DT applications to power systems for better interfacing of power systems with data analytics. The fundamental knowledge of the CART algorithm is also introduced, followed by examples of both classification trees and regression trees with the help of a case study on security assessment of the Danish power system.
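
    The two flavours of CART mentioned in the overview, classification trees and regression trees, map directly onto scikit-learn's two tree estimators. The toy power-system-style features below (a per-unit loading and a stability margin) are invented for illustration and have nothing to do with the Danish case study.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

        rng = np.random.default_rng(1)
        n = 300
        load = rng.uniform(0.3, 1.2, n)        # hypothetical per-unit system loading
        # Synthetic stability margin that shrinks as loading grows.
        margin = np.clip(0.25 - 0.2 * load + rng.normal(0, 0.02, n), 0.0, None)

        # Classification tree: label operating states as secure (1) or insecure (0).
        secure = ((load < 0.9) & (margin > 0.05)).astype(int)
        clf = DecisionTreeClassifier(max_depth=3).fit(np.c_[load, margin], secure)

        # Regression tree: predict the continuous margin from the loading alone.
        reg = DecisionTreeRegressor(max_depth=3).fit(load.reshape(-1, 1), margin)

        print("classified as secure:", bool(clf.predict([[0.7, 0.10]])[0]))
        print("predicted margin    :", round(float(reg.predict([[0.7]])[0]), 3))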

  11. On the Complexity of Decision Making in Possibilistic Decision Trees

    CERN Document Server

    Fargier, Helene; Guezguez, Wided

    2012-01-01

    When the information about uncertainty cannot be quantified in a simple, probabilistic way, the topic of possibilistic decision theory is often a natural one to consider. The development of possibilistic decision theory has led to a series of possibilistic criteria, e.g. pessimistic possibilistic qualitative utility, possibilistic likely dominance, binary possibilistic utility and possibilistic Choquet integrals. This paper focuses on sequential decision making in possibilistic decision trees. It proposes a complexity study of the problem of finding an optimal strategy depending on the monotonicity property of the optimization criteria, which allows the application of dynamic programming that offers a polytime reduction of the decision problem. It also shows that possibilistic Choquet integrals do not satisfy this property, and that in this case the optimization problem is NP-hard.

  12. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  13. A greedy algorithm for construction of decision trees for tables with many-valued decisions - A comparative study

    KAUST Repository

    Azad, Mohammad

    2013-11-25

    In the paper, we study a greedy algorithm for the construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from the UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for the proposed approach and the approach based on generalized decisions. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.

  14. Decision tree analysis of factors influencing rainfall-related building damage

    Directory of Open Access Journals (Sweden)

    M. H. Spekkers

    2014-04-01

    Full Text Available Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998–2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22–26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11–18% of

  15. Decision tree analysis of factors influencing rainfall-related building damage

    Science.gov (United States)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggest that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a

  16. Studies of Boosted Decision Trees for MiniBooNE Particle Identification

    OpenAIRE

    Yang, Hai-Jun; Roe, Byron P.; Zhu, Ji

    2005-01-01

    Boosted decision trees are applied to particle identification in the MiniBooNE experiment operated at Fermi National Accelerator Laboratory (Fermilab) for neutrino oscillations. Numerous attempts are made to tune the boosted decision trees, to compare performance of various boosting algorithms, and to select input variables for optimal performance.
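
    Boosted decision trees of the kind tuned in this study are available off the shelf; the sketch below uses scikit-learn's AdaBoost with shallow trees on synthetic two-class data as a stand-in for the signal/background separation task, with all hyperparameter values chosen arbitrarily rather than taken from the MiniBooNE tuning.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        # Synthetic stand-in for signal vs background events.
        X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                                   random_state=0)

        # Many shallow trees combined by boosting; on scikit-learn < 1.2 this
        # parameter is named base_estimator instead of estimator.
        bdt = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=2),
            n_estimators=200, learning_rate=0.5, random_state=0)

        scores = cross_val_score(bdt, X, y, cv=5)
        print("mean CV accuracy:", scores.mean().round(3))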

  17. Independent component analysis and decision trees for ECG holter recording de-noising.

    Directory of Open Access Journals (Sweden)

    Jakub Kuzilek

    Full Text Available We have developed a method for ECG signal de-noising using Independent Component Analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. In order to test the efficiency of this method against standard filtering, a wavelet-based de-noising method was used for comparison. Freely available data from the PhysioNet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and the filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, baseline wander, EMG), but significantly better results were achieved for uncommon noise (electrode cable movement artefact).

  18. Sistem Pakar Untuk Diagnosa Penyakit Kehamilan Menggunakan Metode Dempster-Shafer Dan Decision Tree

    Directory of Open Access Journals (Sweden)

    joko popo minardi

    2016-01-01

    Full Text Available Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probabilistic theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete; with the Dempster-Shafer method and expert system rules, an incomplete combination of symptoms can still yield an appropriate diagnosis, while the decision tree is used as a decision support tool for tracking disease symptoms. This research aims to develop an expert system that can diagnose diseases of pregnancy using the Dempster-Shafer method, which can produce a trust value for a disease diagnosis. Based on the results of diagnostic testing of the Dempster-Shafer method and expert system, the resulting accuracy is 76%. Keywords: Expert system; Diseases of pregnancy; Dempster Shafer
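
    Dempster's rule of combination, the core of the method above, can be written down in a few lines. The two example mass functions over hypothetical disease hypotheses below are made up for illustration and are not the system's rule base.

        from itertools import product

        def combine(m1, m2):
            """Dempster's rule: combine two mass functions given as dicts
            mapping frozenset-of-hypotheses -> mass."""
            combined, conflict = {}, 0.0
            for (a, ma), (b, mb) in product(m1.items(), m2.items()):
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + ma * mb
                else:
                    conflict += ma * mb          # mass falling on the empty set
            if conflict >= 1.0:
                raise ValueError("total conflict: sources cannot be combined")
            return {s: m / (1.0 - conflict) for s, m in combined.items()}

        # Hypothetical evidence from two symptoms over diseases {A, B, C}.
        frame = frozenset({"A", "B", "C"})
        m_symptom1 = {frozenset({"A"}): 0.6, frame: 0.4}
        m_symptom2 = {frozenset({"A", "B"}): 0.7, frame: 0.3}
        for s, m in sorted(combine(m_symptom1, m_symptom2).items(), key=lambda kv: -kv[1]):
            print(set(s), round(m, 3))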

  19. The use of decision trees in the classification of beach forms/patterns on IKONOS-2 data

    Science.gov (United States)

    Teodoro, A. C.; Ferreira, D.; Gonçalves, H.

    2013-10-01

    Evaluation of beach hydromorphological behaviour and its classification is highly complex. The available beach morphologic and classification models are mainly based on wave, tidal and sediment parameters. Since these parameters are usually unavailable for some regions - such as in the Portuguese coastal zone - a morphologic analysis using remotely sensed data seems to be a valid alternative. Data mining for spatial pattern recognition is the process of discovering useful information, such as patterns/forms, changes and significant structures from large amounts of data. This study focuses on the application of data mining techniques, particularly Decision Trees (DT), to an IKONOS-2 image in order to classify beach features/patterns, in a stretch of the northwest coast of Portugal. Based on the knowledge of the coastal features, five classes were defined: Sea, Suspended-Sediments, Breaking-Zone, Beachface and Beach. The dataset was randomly divided into training and validation subsets. Based on the analysis of several DT algorithms, the CART algorithm was found to be the most adequate and was thus applied. The performance of the DT algorithm was evaluated by the confusion matrix, overall accuracy, and Kappa coefficient. In the classification of beach features/patterns, the algorithm presented an overall accuracy of 98.2% and a kappa coefficient of 0.97. The DTs were compared with a neural network algorithm, and the results were in agreement. The methodology presented in this paper provides promising results and should be considered in further applications of beach forms/patterns classification.

  20. Decision Tree Classifiers for Star/Galaxy Separation

    CERN Document Server

    Vasconcellos, E C; Gal, R R; LaBarbera, F L; Capelato, H V; Velho, H F Campos; Trevisan, M; Ruiz, R S R

    2010-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier is comparable or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with m...

  1. Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems

    CERN Document Server

    Gupta, Anupam; Nagarajan, Viswanath; Ravi, R

    2010-01-01

    We consider the problem of constructing optimal decision trees: given a collection of tests which can disambiguate between a set of m possible diseases, each test having a cost, and the a-priori likelihood of the patient having any particular disease, what is a good adaptive strategy to perform these tests to minimize the expected cost to identify the disease? We settle the approximability of this problem by giving a tight O(log m)-approximation algorithm. We also consider a more substantial generalization, the Adaptive TSP problem. Given an underlying metric space, a random subset S of cities is drawn from a known distribution, but S is initially unknown to us--we get information about whether any city is in S only when we visit the city in question. What is a good adaptive way of visiting all the cities in the random subset S while minimizing the expected distance traveled? For this problem, we give the first poly-logarithmic approximation, and show that this algorithm is best possible unless w...

  2. Efficient OCR using simple features and decision trees with backtracking

    International Nuclear Information System (INIS)

    In this paper, it is shown that it is adequate to use simple and easy-to-compute features, such as what we call sliced horizontal and vertical projections, to solve the OCR problem for machine-printed documents. Recognition is achieved using a decision tree supported with backtracking, smoothing, row and column cropping, and other additions to increase the success rate. Symbols from the Times New Roman typeface are used to train the system. Activating backtracking, smoothing and cropping achieved more than a 98% success rate at a recognition time below 30 ms per character. The recognition algorithm was exposed to a hard test by polluting the original dataset with additional artificial noise, and it maintained a high success rate and low error rate for highly polluted images as a result of backtracking, smoothing, and row and column cropping. The results indicate that we can rely on simple features and hints to reliably recognize characters. The error rate can be decreased by increasing the size of the training dataset. The recognition time can be reduced by using programming optimization techniques and more powerful computers. (author)
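
    The sliced horizontal and vertical projections used as features can be illustrated in a few lines: the glyph image is cut into horizontal and vertical bands and the fraction of ink pixels in each band becomes one feature for the decision tree. The tiny 7x5 bitmap below is invented for the example and does not come from the paper's dataset.

        import numpy as np

        def sliced_projections(glyph, n_slices=3):
            """Per-slice ink fractions along rows and columns of a binary glyph."""
            glyph = np.asarray(glyph, dtype=float)
            rows = [s.mean() for s in np.array_split(glyph, n_slices, axis=0)]
            cols = [s.mean() for s in np.array_split(glyph, n_slices, axis=1)]
            return np.array(rows + cols)

        # A made-up 7x5 bitmap roughly resembling the letter 'T'.
        T = [[1, 1, 1, 1, 1],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0]]
        print(sliced_projections(T))   # 6 simple features ready for a decision tree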

  3. Extensions of Dynamic Programming: Decision Trees, Combinatorial Optimization, and Data Mining

    KAUST Repository

    Hussain, Shahid

    2016-07-10

    This thesis is devoted to the development of extensions of dynamic programming to the study of decision trees. The considered extensions allow us to make multi-stage optimization of decision trees relative to a sequence of cost functions, to count the number of optimal trees, and to study relationships: cost vs cost and cost vs uncertainty for decision trees by construction of the set of Pareto-optimal points for the corresponding bi-criteria optimization problem. The applications include study of totally optimal (simultaneously optimal relative to a number of cost functions) decision trees for Boolean functions, improvement of bounds on complexity of decision trees for diagnosis of circuits, study of time and memory trade-off for corner point detection, study of decision rules derived from decision trees, creation of new procedure (multi-pruning) for construction of classifiers, and comparison of heuristics for decision tree construction. Part of these extensions (multi-stage optimization) was generalized to well-known combinatorial optimization problems: matrix chain multiplication, binary search trees, global sequence alignment, and optimal paths in directed graphs.

  4. Construction of α-decision trees for tables with many-valued decisions

    KAUST Repository

    Moshkov, Mikhail

    2011-01-01

    The paper is devoted to the study of a greedy algorithm for the construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider a bound on the number of algorithm steps, and a bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.

  5. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    Science.gov (United States)

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. PMID:27026589

  6. Minimization of decision tree depth for multi-label decision tables

    KAUST Repository

    Azad, Mohammad

    2014-10-01

    In this paper, we consider multi-label decision tables that have a set of decisions attached to each row. Our goal is to find one decision from the set of decisions for each row by using a decision tree as our tool. With the target of minimizing the depth of the decision tree, we devised various kinds of greedy algorithms as well as a dynamic programming algorithm. When comparing with the optimal result obtained from the dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of the depth of decision trees.

  7. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  8. Decision Tree Method for Burned Area Identification Based on the Spectral Index of GF-1 WFV Image

    Institute of Scientific and Technical Information of China (English)

    祖笑锋; 覃先林; 尹凌宇; 陈小中; 钟祥清

    2015-01-01

    After a forest fire, in order to grasp the extent of forest damage rapidly and accurately, the per-band reflectance of the GF-1 16 m wide-field image (GF-1 WFV) was combined with five computed spectral indices, the normalized difference vegetation index (NDVI), the burned area index (BAI), the shaded vegetation index (SVI), the normalized difference water index (NDWI) and the global environment monitoring index (GEMI), to build a CART decision tree model for identifying burned forest areas. The model was validated in the selected study area and its accuracy was compared with the results obtained by maximum likelihood supervised classification and unsupervised classification (ISODATA). The results show that the CART-based decision tree method improved the overall accuracy of burned area identification by 4.38% over the maximum likelihood method, with the Kappa coefficient increased by 0.1024, the producer's accuracy by 14.96% and the user's accuracy by 8.50%. The burned area identified from GF-1 imagery by unsupervised classification (ISODATA) was poor: both the overall accuracy and the Kappa coefficient were low, and the producer's and user's accuracies did not reach 1%.
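
    The spectral indices listed in this record are simple band arithmetic. The sketch below computes NDVI, NDWI and BAI from reflectance arrays using their usual textbook definitions (red/NIR/green reflectance), not a GF-1-specific calibration; the sample pixel values are arbitrary.

        import numpy as np

        def ndvi(nir, red):
            """Normalized difference vegetation index."""
            return (nir - red) / (nir + red)

        def ndwi(green, nir):
            """Normalized difference water index (McFeeters formulation)."""
            return (green - nir) / (green + nir)

        def bai(red, nir):
            """Burned area index; larger values indicate likely burn scars."""
            return 1.0 / ((0.1 - red) ** 2 + (0.06 - nir) ** 2)

        # Arbitrary reflectance values for two pixels: healthy vegetation vs burn scar.
        green = np.array([0.08, 0.05])
        red   = np.array([0.05, 0.09])
        nir   = np.array([0.45, 0.12])
        for name, index in (("NDVI", ndvi(nir, red)),
                            ("NDWI", ndwi(green, nir)),
                            ("BAI",  bai(red, nir))):
            print(name, np.round(index, 2))

    Stacked per pixel, index values like these form the feature vector that a CART classifier would split on.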

  9. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    Data.gov (United States)

    National Aeronautics and Space Administration — Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine Mark Schwabacher, NASA Ames Research Center Robert Aguilar, Pratt...

  10. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions

    KAUST Repository

    Azad, Mohammad

    2014-09-13

    The paper is devoted to the analysis of greedy algorithms for the minimization of the average depth of decision trees for decision tables in which each row is labeled with a set of decisions. The goal is to find one decision from the set of decisions. When comparing with the optimal result obtained from a dynamic programming algorithm, we found that some greedy algorithms produce results which are close to the optimal result for the minimization of the average depth of decision trees.

  11. Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data

    OpenAIRE

    De Rosa, Rocco

    2016-01-01

    Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have b...
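
    The confidence intervals referred to above are typically instances of the Hoeffding bound: after n observations of a statistic with range R, its true mean lies within ε of the observed mean with probability 1 − δ, and a leaf is split once the observed gain advantage of the best attribute over the runner-up exceeds ε. The gain values and parameters below are made up to show how the decision flips as data accumulates.

        import math

        def hoeffding_bound(value_range, delta, n):
            """Radius of the (1 - delta) confidence interval after n observations."""
            return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

        # Hypothetical running estimates of information gain at one leaf.
        best_gain, second_gain = 0.32, 0.25
        value_range, delta = 1.0, 1e-6        # gain range and confidence parameter
        for n in (100, 500, 2000, 10000):
            eps = hoeffding_bound(value_range, delta, n)
            decision = "split" if best_gain - second_gain > eps else "wait for more data"
            print(f"n={n:6d}  eps={eps:.3f}  -> {decision}")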

  12. Constructing decision trees for user behavior prediction in the online consumer market

    OpenAIRE

    Fokin, Dennis; Hagrot, Joel

    2016-01-01

    This thesis intends to investigate the usefulness of various aspects of product data for user behavior prediction in the online shopping market. Specifically, a data set from BestBuy was used, containing information regarding what product a user clicked on given their search query. Decision trees are machine learning algorithms used for making predictions. The decision tree algorithm ID3 was used because of its simplicity and interpretability. It uses information gain to measure how different...
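
    The information gain criterion used by ID3 compares the entropy of the class labels before and after splitting on an attribute. The tiny click-log-style example below is fabricated for illustration and is not the BestBuy data.

        import math
        from collections import Counter

        def entropy(labels):
            """Shannon entropy of a list of class labels, in bits."""
            n = len(labels)
            return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

        def information_gain(rows, attribute, target):
            """Entropy reduction obtained by splitting `rows` on `attribute`."""
            base = entropy([r[target] for r in rows])
            remainder = 0.0
            for value in {r[attribute] for r in rows}:
                subset = [r[target] for r in rows if r[attribute] == value]
                remainder += len(subset) / len(rows) * entropy(subset)
            return base - remainder

        # Fabricated click records: did the query contain the product's brand name?
        rows = [
            {"brand_in_query": True,  "clicked": "yes"},
            {"brand_in_query": True,  "clicked": "yes"},
            {"brand_in_query": True,  "clicked": "no"},
            {"brand_in_query": False, "clicked": "no"},
            {"brand_in_query": False, "clicked": "no"},
            {"brand_in_query": False, "clicked": "yes"},
        ]
        print(round(information_gain(rows, "brand_in_query", "clicked"), 3))

    ID3 evaluates this gain for every candidate attribute at a node and splits on the one with the largest value.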

  13. Greedy heuristics for minimization of number of terminal nodes in decision trees

    KAUST Repository

    Hussain, Shahid

    2014-10-01

    This paper describes, in detail, several greedy heuristics for construction of decision trees. We study the number of terminal nodes of decision trees, which is closely related with the cardinality of the set of rules corresponding to the tree. We compare these heuristics empirically for two different types of datasets (datasets acquired from UCI ML Repository and randomly generated data) as well as compare with the optimal results obtained using dynamic programming method.

  14. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Full Text Available Entrepreneurial intentions of students are important to recognize during the study in order to provide those students with an educational background that will support such intentions and lead them to successful entrepreneurship after the study. The paper aims to develop a model that will classify students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university including a sample of students in the first year of study. Input variables described students' demographics, importance of business objectives, perception of an entrepreneurial career, and entrepreneurial predispositions. Due to the large dimension of the input space, a feature selection method was used in the pre-processing stage. For comparison reasons, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure for testing the generalization ability of the models was conducted. The models were compared according to their classification accuracy, as well as according to input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which can be suggested to be taken into consideration by universities while designing their academic programs.

  15. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    Directory of Open Access Journals (Sweden)

    Trefz Florian M

    2012-12-01

    Full Text Available Abstract Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for the use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed

  16. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

    Directory of Open Access Journals (Sweden)

    Somaya Hashem

    2016-01-01

    Full Text Available Background/Aim. Respectively with the prevalence of chronic hepatitis C in the world, using noninvasive methods as an alternative method in staging chronic liver diseases for avoiding the drawbacks of biopsy is significantly increasing. The aim of this study is to combine the serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0–F2) or advanced (F3-F4) fibrosis stages. Two models were developed using alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to FIB-4 features except alpha-fetoprotein instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve were performed to evaluate the performance of the proposed models. Results. The best model achieved 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis, due to chronic hepatitis C, could be predicted with high accuracy using decision tree learning algorithm that could be used to reduce the need to assess the liver biopsy.

  17. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients.

    Science.gov (United States)

    Hashem, Somaya; Esmat, Gamal; Elakel, Wafaa; Habashy, Shahira; Abdel Raouf, Safaa; Darweesh, Samar; Soliman, Mohamad; Elhefnawi, Mohamed; El-Adawy, Mohamed; ElHefnawi, Mahmoud

    2016-01-01

    Background/Aim. Respectively with the prevalence of chronic hepatitis C in the world, using noninvasive methods as an alternative method in staging chronic liver diseases for avoiding the drawbacks of biopsy is significantly increasing. The aim of this study is to combine the serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0-F2) or advanced (F3-F4) fibrosis stages. Two models were developed using alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to FIB-4 features except alpha-fetoprotein instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve were performed to evaluate the performance of the proposed models. Results. The best model achieved 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis, due to chronic hepatitis C, could be predicted with high accuracy using decision tree learning algorithm that could be used to reduce the need to assess the liver biopsy. PMID:26880886

  18. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers

    Directory of Open Access Journals (Sweden)

    Malueka Rusdy

    2012-03-01

    Full Text Available Abstract Background Duchenne muscular dystrophy, a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.

  19. Production of diagnostic rules from a neurotologic database with decision trees.

    Science.gov (United States)

    Kentala, E; Viikki, K; Pyykkö, I; Juhola, M

    2000-02-01

    A decision tree is an artificial intelligence program that is adaptive and is closely related to a neural network, but can handle missing or nondecisive data in decision-making. Data on patients with Meniere's disease, vestibular schwannoma, traumatic vertigo, sudden deafness, benign paroxysmal positional vertigo, and vestibular neuritis were retrieved from the database of the otoneurologic expert system ONE for the development and testing of the accuracy of decision trees in the diagnostic workup. Decision trees were constructed separately for each disease. The accuracies of the best decision trees were 94%, 95%, 99%, 99%, 100%, and 100% for the respective diseases. The most important questions concerned the presence of vertigo, hearing loss, and tinnitus; duration of vertigo; frequency of vertigo attacks; severity of rotational vertigo; onset and type of hearing loss; and occurrence of head injury in relation to the timing of onset of vertigo. Meniere's disease was the most difficult to classify correctly. The validity and structure of the decision trees are easily comprehended and can be used outside the expert system. PMID:10685569

  20. A DATA MINING APPROACH TO PREDICT PROSPECTIVE BUSINESS SECTORS FOR LENDING IN RETAIL BANKING USING DECISION TREE

    Directory of Open Access Journals (Sweden)

    Md. Rafiqul Islam

    2015-03-01

    Full Text Available A potential objective of every financial organization is to retain existing customers and attract new prospective customers for the long term. The economic behaviour of a customer and the nature of the organization are captured by a prescribed form called Know Your Customer (KYC) in manual banking. Depositor customers in some sectors (businesses of jewellery/gold, arms, money exchangers, etc.) carry high risk; customers in some sectors (transport operators, auto-dealers, religious organizations) carry medium risk; and those in the remaining sectors (retail, corporate, service, farmers, etc.) carry low risk. Presently, credit risk for a counterparty can be broadly categorized under quantitative and qualitative factors. Although there are many existing systems for customer retention as well as customer attrition in banks, these methods lack a clear and defined approach for disbursing loans to business sectors. In this paper, we have used records of business customers of a retail commercial bank in the city of Tangail, Bangladesh, including rural and urban areas, to analyse the major transactional determinants of customers and to build a predictive model for prospective sectors in retail banking. To achieve this, a data mining approach is adopted for analysing the challenging issues, where a pruned decision tree classification technique has been used to develop the model, whose performance was finally tested against Weka results. Moreover, this paper attempts to build a model to predict prospective business sectors in retail banking. KEYWORDS: Data Mining, Decision Tree, Tree Pruning, Prospective Business Sector, Customer

  1. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for studying the relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of decision tree optimization. In this particular case of total path length and number of terminal nodes, the relationship between the two cost functions is closely related to a space-time trade-off. In addition to an algorithm for computing the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier, or Pareto set, of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.

  2. Cost-effectiveness of exercise 201Tl myocardial SPECT in patients with chest pain assessed by decision-tree analysis

    International Nuclear Information System (INIS)

    To evaluate the potential cost-effectiveness of exercise 201Tl myocardial SPECT in outpatients with angina-like chest pain, we developed a decision-tree model comprising three 1000-patient groups, i.e., a coronary arteriography (CAG) group, a follow-up group, and a SPECT group, and total cost and cardiac events, including cardiac deaths, were calculated. Variables used for the decision-tree analysis were obtained from references and the data available at our hospital. The sensitivity and specificity of 201Tl SPECT for diagnosing angina pectoris, and its prevalence, were assumed to be 95%, 85%, and 33%, respectively. The mean costs were 84.9 x 10^4 yen/patient in the CAG group, 30.2 x 10^4 yen/patient in the follow-up group, and 71.0 x 10^4 yen/patient in the SPECT group. The numbers of cardiac events and cardiac deaths were 56 and 15, respectively, in the CAG group, 264 and 81 in the follow-up group, and 65 and 17 in the SPECT group. SPECT increases cardiac events and cardiac deaths by 0.9% and 0.2%, but it reduces the number of CAG studies by 50.3% and saves 13.8 x 10^4 yen/patient, as compared to the CAG group. In conclusion, the exercise 201Tl myocardial SPECT strategy for patients with chest pain has the potential to reduce health care costs in Japan. (author)
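    The arithmetic behind such a decision-tree cost model can be illustrated with a short sketch. The test characteristics below match the assumptions quoted above, but the per-branch cost figures and their names are invented for illustration and are not the paper's values; the point is only how sensitivity, specificity and prevalence combine into an expected cost per patient for a test-first strategy.

        # Minimal sketch: expected cost per patient of a "SPECT first" strategy,
        # using hypothetical downstream costs (units of 10^4 yen).
        sensitivity, specificity, prevalence = 0.95, 0.85, 0.33

        cost_spect_test = 10.0    # hypothetical cost of the SPECT study itself
        cost_if_positive = 85.0   # hypothetical cost of the CAG work-up branch
        cost_if_negative = 30.0   # hypothetical cost of conservative follow-up

        p_test_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
        expected_cost = (cost_spect_test
                         + p_test_positive * cost_if_positive
                         + (1 - p_test_positive) * cost_if_negative)
        print(f"P(test positive) = {p_test_positive:.3f}")
        print(f"expected cost per patient = {expected_cost:.1f} x 10^4 yen")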

  3. P2P Domain Classification using Decision Tree

    CERN Document Server

    Ismail, Anis

    2011-01-01

    In a Peer-to-Peer context, a challenging problem is how to find the appropriate peer to handle a given query without overly consuming bandwidth. Various methods have proposed query routing strategies that take the P2P network at hand into account. This paper considers an unstructured P2P system based on an organization of peers around Super-Peers that are connected to Super-Super-Peers according to their semantic domains. By analyzing the query log file, a predictive model is constructed that predicts the appropriate Super-Peer, and hence the peer to answer the query, thereby avoiding flooding queries across the P2P network. A challenging problem in a schema-based Peer-to-Peer (P2P) system is how to locate peers that are relevant to a given query. In this paper, an architecture based on (Super-)Peers is proposed, focusing on query routing. The approach to be implemented groups together (Super-)Peers that have similar interests for an efficient query routing method. In such groups, called Super-Super-Peers (SSP), Su...

  4. Nonparametric decision tree: The impact of ISO 9000 on certified and non certified companies

    Directory of Open Access Journals (Sweden)

    Joaquín Texeira Quirós

    2013-09-01

    Full Text Available Purpose: This empirical study analyzes a questionnaire answered by a sample of ISO 9000 certified companies and a control sample of companies which have not been certified, using a multivariate predictive model. With this approach, we assess which quality practices are associated with the likelihood of the firm being certified. Design/methodology/approach: We implemented nonparametric decision trees in order to see which variables most influence whether a company is certified or not, i.e., the motivations that lead companies to seek certification. Findings: The results show that only four questionnaire items are sufficient to predict whether a firm is certified or not. It is shown that companies in which the respondent manifests greater concern with respect to customer relations, motivation of the employees and strategic planning have a higher likelihood of being certified. Research implications: the reader should note that this study is based on data from a single country and, of course, these results capture many idiosyncrasies of its economic and corporate environment. It would be of interest to understand whether this type of analysis reveals some regularities across different countries. Practical implications: companies should look for a set of practices congruent with total quality management and ISO 9000 certification. Originality/value: This study contributes to the literature on the internal motivation of companies to achieve certification under the ISO 9000 standard, by performing a comparative analysis of questionnaires answered by a sample of certified companies and a control sample of companies which have not been certified. In particular, we assess how the manager's perception of the intensity with which quality practices are deployed in their firms is associated with the likelihood of the firm being certified.

  5. Relationships between average depth and number of misclassifications for decision trees

    KAUST Repository

    Chikalov, Igor

    2014-02-14

    This paper presents a new tool for the study of relationships between the total path length, or average depth, and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [9] and datasets representing Boolean functions with 10 variables.

  6. Relationships Between Average Depth and Number of Nodes for Decision Trees

    KAUST Repository

    Chikalov, Igor

    2013-07-24

    This paper presents a new tool for the study of relationships between the total path length, or average depth, and the number of nodes of decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [1]. © Springer-Verlag Berlin Heidelberg 2014.

  7. Binary Decision Tree Development for Probabilistic Safety Assessment Applications

    International Nuclear Information System (INIS)

    The aim of this article is to describe the state of development of a relatively new approach in probabilistic safety analysis (PSA). This approach is based on applying the binary decision diagram (BDD) representation of logical functions to the quantitative and qualitative analysis of complex systems that are represented by fault trees and event trees in the PSA applied to nuclear power plant risk determination. Even though the BDD approach offers a full solution, compared to the partial one from the conventional quantification approach, there are still problems to be solved before the new approach can be fully implemented. A major problem with full application of BDD is the difficulty of obtaining any solution for PSA models beyond a certain complexity. This paper compares the two approaches to PSA quantification. The major focus of the paper is a description of an in-house developed BDD application implementing original algorithms. The number of nodes required to represent the BDD is extremely sensitive to the chosen order of variables (i.e., basic events in PSA). The problem of finding an optimal order of variables that form the BDD falls under the class of NP-complete problems. This paper presents an original approach to the problem of finding the initial order of variables used for BDD construction with various dynamic reordering schemes. The main advantage of this approach compared to known methods of finding the initial order is better results with respect to the required working memory and the time needed to finish the BDD construction. The developed method is compared against results from well-known methods such as depth-first and breadth-first search procedures. The described method may be applied to finding an initial order for fault trees/event trees created from basic events by means of logical operations (e.g., negation, AND, OR, exclusive OR). With some testing models a significant reduction in used memory has been achieved, sometimes
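    The sensitivity of BDD size to variable ordering, which motivates the reordering schemes described above, can be demonstrated with a small sketch. This is not the in-house tool from the paper; it is a brute-force reduced-ordered-BDD node counter usable only for tiny Boolean functions, and the example function is the textbook order-sensitive case, not a PSA fault tree.

        def robdd_size(func, order):
            """Number of nodes (internal + 2 terminals) in the reduced ordered BDD
            of `func` under the given variable `order`. `func` takes a dict
            {variable_name: bool}; brute force, suitable for tiny examples only."""
            unique = {}  # (level, low_id, high_id) -> node id

            def build(level, assignment):
                if level == len(order):
                    return 1 if func(assignment) else 0        # terminal nodes 1 / 0
                var = order[level]
                low = build(level + 1, {**assignment, var: False})
                high = build(level + 1, {**assignment, var: True})
                if low == high:                                 # redundant-test reduction
                    return low
                key = (level, low, high)
                if key not in unique:                           # share isomorphic subgraphs
                    unique[key] = len(unique) + 2
                return unique[key]

            build(0, {})
            return len(unique) + 2

        # f = (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3): a classic order-sensitive function
        f = lambda a: (a['x1'] and a['y1']) or (a['x2'] and a['y2']) or (a['x3'] and a['y3'])
        print(robdd_size(f, ['x1', 'y1', 'x2', 'y2', 'x3', 'y3']))  # interleaved order: small BDD
        print(robdd_size(f, ['x1', 'x2', 'x3', 'y1', 'y2', 'y3']))  # grouped order: noticeably larger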

  8. Detecting subcanopy invasive plant species in tropical rainforest by integrating optical and microwave (InSAR/PolInSAR) remote sensing data, and a decision tree algorithm

    Science.gov (United States)

    Ghulam, Abduwasit; Porton, Ingrid; Freeman, Karen

    2014-02-01

    In this paper, we propose a decision tree algorithm to characterize the spatial extent and spectral features of invasive plant species (i.e., guava, Madagascar cardamom, and Molucca raspberry) in tropical rainforests by integrating datasets from passive and active remote sensing sensors. The decision tree algorithm is based on a number of input variables, including matching score and infeasibility images from Mixture Tuned Matched Filtering (MTMF), land-cover maps, tree height information derived from high resolution stereo imagery, polarimetric feature images, the Radar Forest Degradation Index (RFDI), and polarimetric and InSAR coherence and phase difference images. Spatial distributions of the study organisms are mapped using the pixel-based Winner-Takes-All (WTA) algorithm, object-oriented feature extraction, and spectral unmixing, and compared with the newly developed decision tree approach. Our results show that the InSAR phase difference and PolInSAR HH-VV coherence images of L-band PALSAR data are the most important variables after the MTMF outputs in mapping subcanopy invasive plant species in tropical rainforest. We also show that the three types of invasive plants alone occupy about 17.6% of the Betampona Nature Reserve (BNR), while mixed forest, shrubland and grassland areas together account for 11.9% of the reserve. This work presents the first systematic attempt to evaluate forest degradation, habitat quality and invasive plant statistics in the BNR, and provides significant insights as to management strategies for the control of invasive plants and conservation in the reserve.

  9. Utilizing Home Healthcare Electronic Health Records for Telehomecare Patients With Heart Failure: A Decision Tree Approach to Detect Associations With Rehospitalizations.

    Science.gov (United States)

    Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H

    2016-04-01

    Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date, there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home healthcare electronic medical record called the Outcome and Assessment Information Set-C for 552 telemonitored heart failure patients. Bivariate analyses using SAS and a decision tree technique using the Waikato Environment for Knowledge Analysis were used. From the decision tree technique, the presence of skin issues was identified as the top predictor of rehospitalization that could be identified during the start-of-care assessment, followed by the patient's living situation, the patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and the total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the Outcome and Assessment Information Set-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional support. PMID:26848645

  10. Soft context clustering for F0 modeling in HMM-based speech synthesis

    Science.gov (United States)

    Khorram, Soheil; Sameti, Hossein; King, Simon

    2015-12-01

    This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure
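    The core idea of the soft decision tree, routing each context to several leaves with membership degrees rather than to a single leaf, can be illustrated with a small sketch. This is not the authors' synthesis system; the sigmoid gating functions, the fixed depth of two, and all weights below are illustrative assumptions.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def soft_leaf_memberships(x, gates):
            """Depth-2 soft binary tree: `gates` holds (weights, bias) for the root and
            its two children; every internal node routes x to BOTH children with
            complementary degrees, so the four leaf memberships sum to 1."""
            g_root = sigmoid(np.dot(gates[0][0], x) + gates[0][1])   # degree of going right at the root
            g_left = sigmoid(np.dot(gates[1][0], x) + gates[1][1])   # ... at the left child
            g_right = sigmoid(np.dot(gates[2][0], x) + gates[2][1])  # ... at the right child
            return np.array([(1 - g_root) * (1 - g_left),
                             (1 - g_root) * g_left,
                             g_root * (1 - g_right),
                             g_root * g_right])

        def soft_tree_predict(x, gates, leaf_values):
            """Prediction is the membership-weighted mixture of leaf values, instead of
            the single-leaf lookup a hard decision tree would perform."""
            return soft_leaf_memberships(x, gates) @ np.asarray(leaf_values)

        # toy example with made-up gate weights and leaf values
        gates = [(np.array([1.0, -0.5]), 0.1),
                 (np.array([0.3, 0.8]), -0.2),
                 (np.array([-0.7, 0.2]), 0.0)]
        print(soft_tree_predict(np.array([0.4, 1.2]), gates, [0.0, 1.0, 2.0, 3.0]))

    A hard tree corresponds to the limit in which each gate output is pushed to 0 or 1, so exactly one leaf receives all the membership.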

  11. Transient Stability Assessment using Decision Trees and Fuzzy Logic Techniques

    OpenAIRE

    A. Y. Abdelaziz; M. A. El-Dessouki

    2013-01-01

    Many techniques are used for Transient Stability Assessment (TSA) of synchronous generators, encompassing traditional time-domain numerical integration of the system state, Lyapunov-based methods, probabilistic approaches and Artificial Intelligence (AI) techniques like pattern recognition and artificial neural networks. This paper examines another two proposed artificial intelligence techniques to tackle the transient stability problem. The first technique is based on the Inductive Inference Reasoning (IIR)...

  12. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    Science.gov (United States)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for refined estimation of the solar energy plant potential of roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information about buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid increasing attention to the production of green electricity. The return on investment depends on the statutory price per watt, the initial costs of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation received by, and the size of, the solar energy plant. In this context the orientation and slope of the roof area are as important as building parts like chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support a beneficial deployment of a solar energy plant, provided that sufficient data is available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas: they carry no semantic information, and even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities, e.g. Berlin, are on the way to providing CityGML datasets. Here we present a decision tree that incorporates geometric as well as semantic demands for a refined estimation of the solar energy potential of roof areas. Based on CityGML's attribute lists, we consider geometries of roofs and roof installations as well as global radiation, which can be derived e.g. from the European Solar

  13. Condition monitoring on grinding wheel wear using wavelet analysis and decision tree C4.5 algorithm

    Directory of Open Access Journals (Sweden)

    S.Devendiran

    2013-10-01

    Full Text Available A new online grinding wheel wear monitoring approach to detect a worn-out wheel is proposed, based on acoustic emission (AE) signals processed by the discrete wavelet transform, with statistical features such as the root mean square and standard deviation extracted for each wavelet decomposition level and classified using the C4.5 decision tree, a tree-based knowledge representation and data mining technique. The methodology was validated with AE signal data obtained from an aluminium oxide 99A (38A) grinding wheel, which is used in roughly three quarters of grinding operations, under different grinding conditions. The results of this scheme with respect to classification accuracy are discussed.
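    A minimal sketch of the feature pipeline described above is given below, assuming PyWavelets and scikit-learn are available; the synthetic AE bursts, the db4 wavelet choice and the labels are illustrative, and scikit-learn's CART-style tree stands in for the C4.5 classifier used in the paper.

        import numpy as np
        import pywt
        from sklearn.tree import DecisionTreeClassifier

        def ae_features(signal, wavelet="db4", level=4):
            """RMS and standard deviation of each wavelet decomposition level."""
            coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA_L, cD_L, ..., cD_1]
            feats = []
            for c in coeffs:
                feats.append(np.sqrt(np.mean(c ** 2)))            # root mean square
                feats.append(np.std(c))                           # standard deviation
            return feats

        # toy training data: synthetic AE bursts with different energy levels (labels illustrative)
        rng = np.random.default_rng(0)
        X = [ae_features(rng.normal(scale=s, size=1024)) for s in [0.5] * 20 + [1.5] * 20]
        y = [0] * 20 + [1] * 20                                   # 0 = sharp wheel, 1 = worn wheel
        clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
        print(clf.predict([ae_features(rng.normal(scale=1.4, size=1024))]))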

  14. Nitrogen removal influence factors in A/O process and decision trees for nitrification/denitrification system

    Institute of Scientific and Technical Information of China (English)

    MA Yong; PENG Yong-zhen; WANG Shu-ying; WANG Xiao-lian

    2004-01-01

    In order to improve nitrogen removal in the anoxic/oxic (A/O) process for treating domestic wastewater, the influence factors DO (dissolved oxygen), nitrate recirculation, sludge recycle, SRT (solids residence time), influent COD/TN and HRT (hydraulic retention time) were studied. Results indicated that it was possible to increase nitrogen removal by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervision of nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.

  15. An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree

    Directory of Open Access Journals (Sweden)

    Hedayetul Islam Shovon

    2012-08-01

    Full Text Available Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) in a decisive manner. Student evaluation factors such as class quizzes, mid-term and final exams, assignments and lab work are studied. It is recommended that all of this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers to reduce the drop-out ratio to a significant level and improve the performance of students. In this paper, we present a hybrid procedure based on the decision tree data mining method and data clustering that enables academicians to predict students' GPA, so that, based on the prediction, the instructor can take the necessary steps to improve student academic performance.
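    A minimal sketch of such a hybrid clustering-plus-tree procedure is shown below, assuming scikit-learn; the synthetic marks table, the choice of three clusters and the feature names are illustrative assumptions, not the paper's dataset or settings. The tree is trained to explain the k-means cluster labels, so its printed rules indicate which assessment components drive the predicted performance group.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(1)
        # columns: quiz average, midterm, assignment, lab work (all on a 0-100 scale)
        marks = rng.uniform(30, 100, size=(120, 4))

        # step 1: group students into performance clusters
        clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(marks)

        # step 2: learn a readable decision tree that explains cluster membership
        tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(marks, clusters)
        print(export_text(tree, feature_names=["quiz", "midterm", "assignment", "lab"]))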

  16. Comparative Analysis of Serial Decision Tree Classification Algorithms

    OpenAIRE

    Matthew Nwokejizie Anyanwu; Sajjan Shiva

    2009-01-01

    Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used to group similar data objects together. It can be defined as a supervised learning algorithm, as it assigns class labels to data objects based on the relationship between the data items and a pre-defined class label. Classification algorithms have a wide range of applications like churn prediction, fraud detection, artificial intelligence, and credit card ra...

  17. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    Science.gov (United States)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy with the help of improving technology, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample, with dimensions ranging from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument, etc.) and of the incident electron beam (accelerating voltage, beam current, spot size, etc.) control the produced characteristic X-rays, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions, thereby producing a classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated, and a preliminary set of 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, has been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  18. Enhancing And Deriving Actionable Knowledge From Decision Trees

    OpenAIRE

    P. Senthil Vadivu; Vasantha Kalyani David

    2010-01-01

    Data mining algorithms are used to discover customer models for distributing information. Using customer profiles in customer relationship management (CRM), they have been used to point out which customers are loyal and which are attritors, but they require human experts to discover knowledge manually. Many post-processing techniques have been introduced that do not suggest actions to increase an objective function such as profit. In this paper, a novel algorithm is proposed that suggests ac...

  19. Transient Stability Assessment using Decision Trees and Fuzzy Logic Techniques

    Directory of Open Access Journals (Sweden)

    A. Y. Abdelaziz

    2013-09-01

    Full Text Available Many techniques are used for Transient Stability Assessment (TSA) of synchronous generators, encompassing traditional time-domain numerical integration of the system state, Lyapunov-based methods, probabilistic approaches and Artificial Intelligence (AI) techniques like pattern recognition and artificial neural networks. This paper examines another two proposed artificial intelligence techniques to tackle the transient stability problem. The first technique is based on the Inductive Inference Reasoning (IIR) approach, which belongs to a particular family of machine learning from examples. The second presents a simple fuzzy logic classifier system for TSA. Not only steady-state but also transient attributes are used for transient stability estimation so as to reflect machine dynamics and network changes due to faults. The two techniques are tested on a standard test power system. The performance evaluation demonstrated satisfactory results in early detection of machine instability. The advantage of the two techniques is that they are straightforward and simple for on-line implementation.

  20. Social Impact on Android Applications using Decision Tree

    OpenAIRE

    Waseem Iqbal; Muhammad Arfan; Muhammad Asif

    2015-01-01

    Mobile phones have evolved very rapidly from black-and-white handsets to smart phones. Google launched the Android operating system (OS), based on Linux and targeting smart phones. Since then, people have become addicted to these smart phones due to the facilities they provide. But the security leaks present in Android are a big hurdle to using it in a secure way. The Android operating system is widely used because it is open source/freeware and most of its applications are also freely avai...

  1. Measurement of the t-channel single top-quark production using boosted decision trees in the ATLAS experiment at √s = 7 TeV

    International Nuclear Information System (INIS)

    This thesis presents a measurement of the cross section of t-channel single top-quark production using 1.04 fb^-1 of data collected by the ATLAS detector at the LHC with proton-proton collisions at a center-of-mass energy of √s = 7 TeV. Selected events contain one lepton, missing transverse energy, and two or three jets, one of them b-tagged. The background model consists of multi-jet, W+jets and top-quark pair events, with smaller contributions from Z+jets and di-boson events. By using a selection based on the distribution of a multivariate discriminant constructed with boosted decision trees, the cross section of t-channel single top-quark production is measured: σ_t = 97.3 +30.7/-30.2 pb, which is in good agreement with the prediction of the Standard Model. Assuming that the top-quark-related CKM matrix elements obey the relation |Vtb| >> |Vts|, |Vtd|, the coupling strength at the Wtb vertex is extracted from the measured cross section: |Vtb| = 1.23 +0.20/-0.19. If it is assumed that |Vtb| ≤ 1, a lower limit of |Vtb| > 0.61 is obtained at the 95% confidence level. (author)

  2. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  3. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    Science.gov (United States)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  4. 'Misclassification error' greedy heuristic to construct decision trees for inconsistent decision tables

    KAUST Repository

    Azad, Mohammad

    2014-01-01

    A greedy algorithm is presented in this paper to construct decision trees for three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. In this algorithm, a greedy heuristic, 'misclassification error', is used which runs faster and, for some cost functions, gives better results than the 'number of boundary subtables' heuristic in the literature. Therefore, it can be used for larger data sets and does not require a huge amount of memory. Experimental results on the depth, average depth and number of nodes of decision trees constructed by this algorithm are compared in the framework of each of the three approaches.
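    A minimal sketch of a greedy builder driven by such a misclassification-error heuristic is shown below. It is not the authors' implementation: the toy decision table, the stopping rules and the labeling of leaves by the most common decision are simplified assumptions, but the split chosen at each node is the attribute that minimizes the number of rows disagreeing with the majority decision of their subtable.

        from collections import Counter

        def misclassification_error(rows, attr):
            """Rows are (features_dict, decision) pairs; split on `attr` and count rows
            that disagree with the majority decision of their subtable."""
            groups = {}
            for feats, dec in rows:
                groups.setdefault(feats[attr], []).append(dec)
            return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())

        def build_tree(rows, attrs):
            decisions = [dec for _, dec in rows]
            majority = Counter(decisions).most_common(1)[0][0]
            if len(set(decisions)) == 1 or not attrs:
                return majority                                   # leaf: most common decision
            best = min(attrs, key=lambda a: misclassification_error(rows, a))
            children = {}
            for value in {feats[best] for feats, _ in rows}:
                subset = [r for r in rows if r[0][best] == value]
                children[value] = build_tree(subset, [a for a in attrs if a != best])
            return (best, children)

        rows = [({"outlook": "sunny", "windy": True}, "no"),
                ({"outlook": "sunny", "windy": False}, "yes"),
                ({"outlook": "rain", "windy": True}, "no"),
                ({"outlook": "rain", "windy": False}, "yes")]
        print(build_tree(rows, ["outlook", "windy"]))   # picks 'windy', which splits with zero error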

  5. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    International Nuclear Information System (INIS)

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar

  6. An Examination of Mathematically Gifted Students' Learning Styles by Decision Trees

    OpenAIRE

    Esra Aksoy; Serkan Narlı

    2015-01-01

    The aim of this study was to examine mathematically gifted students' learning styles through a data mining method. A ‘Learning Style Inventory’ and a ‘Multiple Intelligences Scale’ were used to collect data. The sample included 234 mathematically gifted middle school students. The constructed decision tree was examined to predict mathematically gifted students' learning styles according to their multiple intelligences, gender and grade level. Results showed that all t...

  7. Comparison of the Bayesian and Randomised Decision Tree Ensembles within an Uncertainty Envelope Technique

    OpenAIRE

    Schetinin, Vitaly; Fieldsend, Jonathan E.; Partridge, Derek; Krzanowski, Wojtek J.; Everson, Richard M.; Bailey, Trevor C; Hernandez, Adolfo

    2005-01-01

    Multiple Classifier Systems (MCSs) allow evaluation of the uncertainty of classification outcomes that is of crucial importance for safety critical applications. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the classifier diversity and the required performance. The interpretability of MCSs can also give useful information for experts responsible for making reliable classifications. For this reason Decision Trees (DTs) seem t...

  8. Normal form backward induction for decision trees with coherent lower previsions.

    OpenAIRE

    Huntley, Nathan; Troffaes, Matthias C. M.

    2012-01-01

    We examine normal form solutions of decision trees under typical choice functions induced by lower previsions. For large trees, finding such solutions is hard as very many strategies must be considered. In an earlier paper, we extended backward induction to arbitrary choice functions, yielding far more efficient solutions, and we identified simple necessary and sufficient conditions for this to work. In this paper, we show that backward induction works for maximality and E-admissibility, but ...

  9. Deeper understanding of Flaviviruses including Zika virus by using Apriori Algorithm and Decision Tree

    Directory of Open Access Journals (Sweden)

    Yang Youjin

    2016-01-01

    Full Text Available Zika virus is spread by mosquitoes, and infection carries a high probability of microcephaly. The virus was first found in Uganda in 1947, but it has since broken out all around the world, especially in North and South America. Therefore, the Apriori algorithm and a decision tree were used to compare the polyprotein sequence of Zika virus with those of other flaviviruses: yellow fever, West Nile virus, dengue virus, and tick-borne encephalitis. In this way, their similarities and dissimilarities were found.

  10. Scenario Analysis, Decision Trees and Simulation for Cost Benefit Analysis of the Cargo Screening Process

    OpenAIRE

    Sherman, Galina; Siebers, Peer-Olaf; Aickelin, Uwe; Menachof, David

    2013-01-01

    In this paper we present our ideas for conducting a cost benefit analysis by using three different methods: scenario analysis, decision trees and simulation. Then we introduce our case study and examine these methods in a real world situation. We show how these tools can be used and what the results are for each of them. Our aim is to conduct a comparison of these different probabilistic methods of estimating costs for port security risk assessment studies. Methodologically, we are trying ...

  11. A Decision Tree Approach to Classify Web Services using Quality Parameters

    OpenAIRE

    Sonawani, Shilpa; Mukhopadhyay, Debajyoti

    2013-01-01

    With the increase in the number of web services, many web services are available on the internet providing the same functionality, making it difficult to choose the best one that fulfils all of a user's requirements. This problem can be solved by considering the quality of web services to distinguish functionally similar web services. Nine different quality parameters are considered. Web services can be classified and ranked using a decision tree approach, since it does not require a long training period and...

  12. Establishment of a comprehensive evaluation system of nursing quality based on a three-stage decision tree analysis platform

    Institute of Scientific and Technical Information of China (English)

    吴疆; 肖红著; 夏丽娅; 伍艳玲; 甘露; 桂文芳; 李劼; 邓晖

    2016-01-01

    Objective: To use the decision tree method to objectively, accurately and quickly set up a grade evaluation platform for the comprehensive evaluation of nursing quality. Methods: The chi-squared automatic interaction detection (CHAID) decision tree method in SPSS 18.0 was used to comprehensively evaluate and classify the types and quantity of nursing care, the technical risk level, the nursing manpower allocation and the staff competency level of each ward unit. According to the combined distribution of these factors, the ward units were assigned to nursing clusters of different grades, and a three-stage decision tree analysis platform was built. Results: A three-stage decision tree comprehensive evaluation system of nursing quality was established that takes into account nursing workload, technical risk level, nursing staff allocation and competency-level classification. Conclusion: The comprehensive evaluation system of nursing quality built on the three-stage decision tree analysis platform has a powerful classification function; the platform is established accurately and conveniently, and the classification is flexible.

  13. Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.

    Science.gov (United States)

    Tsipouras, Markos G; Exarchos, Themis P; Fotiadis, Dimitrios I; Kotsia, Anna P; Vakalis, Konstantinos V; Naka, Katerina K; Michalis, Lampros K

    2008-07-01

    A fuzzy rule-based decision support system (DSS) is presented for the diagnosis of coronary artery disease (CAD). The system is automatically generated from an initial annotated dataset, using a four stage methodology: 1) induction of a decision tree from the data; 2) extraction of a set of rules from the decision tree, in disjunctive normal form and formulation of a crisp model; 3) transformation of the crisp set of rules into a fuzzy model; and 4) optimization of the parameters of the fuzzy model. The dataset used for the DSS generation and evaluation consists of 199 subjects, each one characterized by 19 features, including demographic and history data, as well as laboratory examinations. Tenfold cross validation is employed, and the average sensitivity and specificity obtained is 62% and 54%, respectively, using the set of rules extracted from the decision tree (first and second stages), while the average sensitivity and specificity increase to 80% and 65%, respectively, when the fuzzification and optimization stages are used. The system offers several advantages since it is automatically generated, it provides CAD diagnosis based on easily and noninvasively acquired features, and is able to provide interpretation for the decisions made. PMID:18632325
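    The first two stages of this methodology, inducing a tree and reading its root-to-leaf paths out as a crisp rule set in disjunctive normal form, can be sketched as follows. This is not the authors' system: scikit-learn's CART tree and a synthetic dataset stand in for the C4.5-style induction on the clinical data, and the fuzzification and optimization stages are not shown.

        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=200, n_features=5, random_state=0)
        clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
        t = clf.tree_

        def extract_rules(node=0, conditions=()):
            """Walk the fitted tree; each root-to-leaf path becomes one conjunctive rule."""
            if t.children_left[node] == -1:                       # leaf node
                label = int(t.value[node][0].argmax())
                return [(" AND ".join(conditions) or "TRUE", label)]
            f, thr = t.feature[node], t.threshold[node]
            left = extract_rules(t.children_left[node], conditions + (f"x{f} <= {thr:.3f}",))
            right = extract_rules(t.children_right[node], conditions + (f"x{f} > {thr:.3f}",))
            return left + right

        for antecedent, label in extract_rules():
            print(f"IF {antecedent} THEN class {label}")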

  14. Performance Evaluation of Discriminant Analysis and Decision Tree, for Weed Classification of Potato Fields

    Directory of Open Access Journals (Sweden)

    Farshad Vesali

    2012-09-01

    Full Text Available In the present study we tried to recognize weeds in potato fields in order to use herbicides effectively. Potato is a crop that is cultivated widely all over the world and is a major world food crop consumed by over one billion people, but it is threatened by weed invasion because of the row cropping system used in potato cultivation. Machine vision is used in this research for the effective application of herbicides in the field. About 300 color images were acquired from 3 potato farms in Qorveh city and 2 farms of Urmia University, Iran. Images were acquired under different illumination conditions, from morning to evening, on sunny and cloudy days. Because of the overlap and shading of plants under field conditions, it is hard to use morphological parameters. In the method used for classifying weeds and potato plants, the primary color components of each plant were extracted and the relation between them was estimated to determine a discriminant function and classify plants using discriminant analysis. In addition, the decision tree method was used to compare results with discriminant analysis. Three different classifications were applied. First, a classification to discriminate potato plants from all other weeds (two groups); the rate of correct classification was 76.67% for discriminant analysis and 83.82% for the decision tree. Second, a classification to discriminate potato plants from separate groups of each weed (6 groups); the rate of correct classification was 87%. Third, classification of potato plants versus weed species one by one; as the weeds were different, the classification results differed in this arrangement. Under all conditions, the decision tree showed better results than discriminant analysis.
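    A minimal sketch of the comparison described above is given below, assuming scikit-learn; the synthetic per-plant mean RGB values and the added green/red ratio feature are illustrative assumptions, not the measured colour components from the field images.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(2)
        potato = rng.normal([60, 110, 50], 12, size=(150, 3))     # assumed mean RGB of potato canopies
        weed = rng.normal([75, 95, 45], 12, size=(150, 3))        # assumed mean RGB of weeds
        X = np.vstack([potato, weed])
        X = np.hstack([X, X[:, 1:2] / X[:, 0:1]])                 # add a green/red ratio feature
        y = np.array([0] * 150 + [1] * 150)                       # 0 = potato, 1 = weed

        for name, model in [("Discriminant analysis", LinearDiscriminantAnalysis()),
                            ("Decision tree", DecisionTreeClassifier(max_depth=4))]:
            acc = cross_val_score(model, X, y, cv=5).mean()
            print(f"{name}: {acc:.2%} cross-validated accuracy")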

  15. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    Energy Technology Data Exchange (ETDEWEB)

    Elmer, John; Butherus, Michael [S.M. Stoller Corporation (United States); Barr, Deborah L. [U.S. Department of Energy Office of Legacy Management (United States)

    2013-07-01

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. As additional sites are transferred to LM in the future, the decision tree will assist in determining

  16. A comparison of student academic achievement using decision trees techniques: Reflection from University Malaysia Perlis

    Science.gov (United States)

    Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy

    2015-05-01

    A decision tree is one of the techniques in data mining used for prediction. Using this method, hidden information can be extracted from an abundance of data and interpreted as useful knowledge. In this paper the academic performance of students from 2002 to 2012 is examined for two faculties, the Faculty of Manufacturing Engineering and the Faculty of Microelectronic Engineering, at University Malaysia Perlis (UniMAP). The objectives of this study are to determine and compare the factors that affect the students' academic achievement in the two faculties. The prediction results show that five attributes can be considered as factors that influence the students' academic performance.

  17. Snow event classification with a 2D video disdrometer - A decision tree approach

    Science.gov (United States)

    Bernauer, F.; Hürkamp, K.; Rühm, W.; Tschiersch, J.

    2016-05-01

    Snowfall classification according to the crystal type or degree of riming of the snowflakes is important for many atmospheric processes, e.g. wet deposition of aerosol particles. 2D video disdrometers (2DVD) have recently proved their capability to measure microphysical parameters of snowfall. The present work aims to classify snowfall according to microphysical properties of single hydrometeors (e.g. shape and fall velocity) measured by means of a 2DVD. The constraints on the shape and velocity parameters, which are used in a decision tree for classification of the 2DVD measurements, are derived from detailed on-site observations combining automatic 2DVD classification with visual inspection. The developed decision tree algorithm subdivides the detected events into three classes of dominating crystal type (single crystals, complex crystals and pellets) and three classes of dominating degree of riming (weak, moderate and strong). The classification results for the crystal type were validated with an independent data set, proving the unambiguousness of the classification. In addition, for three long-term events, good agreement of the classification results with independently measured maximum dimension of snowflakes, snowflake bulk density and surrounding temperature was found. The developed classification algorithm is applicable for wind speeds below 5.0 m s^-1 and has the advantage of being easily implemented by other users.
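    A decision tree of this kind reduces to a short cascade of threshold tests on per-event shape and fall-velocity parameters. The sketch below illustrates the structure only; the threshold values and parameter names are placeholders, not the calibrated constraints derived from the on-site 2DVD observations.

        def classify_crystal_type(axis_ratio, fall_velocity):
            """Return a dominating crystal-type class from per-event median shape and
            fall-velocity parameters (hypothetical placeholder thresholds)."""
            if fall_velocity > 2.0:                 # dense, fast-falling particles
                return "pellets"
            if axis_ratio > 0.8:                    # compact, near-circular outlines
                return "single crystals"
            return "complex crystals (aggregates)"

        print(classify_crystal_type(axis_ratio=0.6, fall_velocity=1.1))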

  18. A Modular Approach Utilizing Decision Tree in Teaching Integration Techniques in Calculus

    Directory of Open Access Journals (Sweden)

    Edrian E. Gonzales

    2015-08-01

    Full Text Available This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought to answer the question: is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second-year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group, posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that the modular and traditional instructions were equally effective in facilitating the learning of integration by parts. The other result revealed that the use of the modular approach utilizing a decision tree in teaching integration by trigonometric transformation was more effective than the traditional method.

  19. Comparison of Attribute Reduction Methods for Coronary Heart Disease Data by Decision Tree Classification

    Institute of Scientific and Technical Information of China (English)

    ZHENG Gang; HUANG Yalou; WANG Pengtao; SHU Guangfu

    2005-01-01

    Attribute reduction is necessary in decision-making systems, and selecting the right attribute reduction method is even more important. This paper studies the reduction effects of principal component analysis (PCA) and system reconstruction analysis (SRA) on coronary heart disease data. The data set contains 1723 records, with 71 attributes in each record. PCA and SRA are used to reduce the number of attributes (to fewer than 71) in the data set. Decision tree algorithms, C4.5, classification and regression tree (CART), and chi-square automatic interaction detector (CHAID), are then adopted to analyze the raw data and the attribute-reduced data. The parameters of the decision tree algorithms, including the number of internal nodes, maximum tree depth, number of leaves, and correction rate, are analyzed. The results indicate that PCA and SRA can complete the attribute reduction work, and that decision-making on the reduced data is quicker than on the raw data; the reduction effect of PCA is better than that of SRA, while the attribute assertion of SRA is better than that of PCA. The PCA and SRA methods exhibit good performance in selecting and reducing attributes.
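    The PCA branch of such a comparison can be sketched with scikit-learn as shown below. The synthetic table merely mimics the shape of the heart-disease data (about 1700 records and 71 attributes), and a CART-style tree stands in for the three tree algorithms evaluated in the paper; the point is the before/after-reduction comparison, not the reported figures.

        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1700, n_features=71, n_informative=12, random_state=0)

        raw = DecisionTreeClassifier(max_depth=6, random_state=0)
        reduced = make_pipeline(PCA(n_components=15),
                                DecisionTreeClassifier(max_depth=6, random_state=0))

        print("raw attributes :", cross_val_score(raw, X, y, cv=5).mean())
        print("PCA-reduced    :", cross_val_score(reduced, X, y, cv=5).mean())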

  20. Effective use of Fibro Test to generate decision trees in hepatitis C

    Institute of Scientific and Technical Information of China (English)

    Dana Lau-Corona; Luís Alberto Pineda; Héctor Hugo Aviés; Gabriela Gutiérrez-Reyes; Blanca Eugenia Farfan-Labonne; Rafael Núnez-Nateras; Alan Bonder; Rosalinda Martínez-García; Clara Corona-Lau; Marco Antonio Olivera-Martíanez; Maria Concepción Gutiérrez-Ruiz; Guillermo Robles-Díaz; David Kershenobich

    2009-01-01

    AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2-macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest's cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4) and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees and was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.

  1. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    Science.gov (United States)

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of the Particle Swarm Optimization combined with C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a radial basis function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and weighted K-nearest neighbor). A repeated five-fold cross-validation method was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods. PMID:26737960
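    One way to read "PSO with a boosted tree as fitness function" is a swarm searching over feature subsets whose fitness is the cross-validated accuracy of a boosted tree ensemble. The sketch below is only that reading, under stated assumptions: scikit-learn's GradientBoostingClassifier stands in for Boosted C5.0, the breast-cancer dataset stands in for the medical data, and the swarm size, iteration count and inertia/acceleration constants are arbitrary.

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import cross_val_score

        X, y = load_breast_cancer(return_X_y=True)
        rng = np.random.default_rng(0)
        n_particles, n_iter, n_feat = 8, 5, X.shape[1]

        def fitness(mask):
            """Cross-validated accuracy of a boosted tree ensemble on the selected features."""
            if not mask.any():
                return 0.0
            clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
            return cross_val_score(clf, X[:, mask], y, cv=3).mean()

        pos = rng.random((n_particles, n_feat))        # continuous positions in [0, 1]
        vel = np.zeros_like(pos)
        pbest, pbest_fit = pos.copy(), np.array([fitness(p > 0.5) for p in pos])
        gbest = pbest[pbest_fit.argmax()].copy()

        for _ in range(n_iter):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)
            fits = np.array([fitness(p > 0.5) for p in pos])
            improved = fits > pbest_fit
            pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
            gbest = pbest[pbest_fit.argmax()].copy()

        print("selected features:", np.flatnonzero(gbest > 0.5), "fitness:", pbest_fit.max())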

  2. The effect of the fragmentation problem in decision tree learning applied to the search for single top quark production

    International Nuclear Information System (INIS)

    Decision tree learning constitutes a suitable approach to classification due to its ability to partition the variable space into regions of class-uniform events, while providing a structure amenable to interpretation, in contrast to other methods such as neural networks. But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system called DTFE, for Decision Tree Fragmentation Evaluator, that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as Spectral Clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies on the search for single top quark production, a challenging problem due to large and similar backgrounds, low energetic signals, and low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of data fragmentation.

  3. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining that determines the attitude of people about a particular product, topic or politician from newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions could be in different languages (English, Urdu, Arabic, etc.). Tackling each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using the Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions as labeled examples. The testing data set is supplied to the three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.
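    The same three-way comparison can be sketched outside WEKA with a bag-of-words pipeline, as below. The handful of toy Roman-Urdu/English opinions and their labels are invented for illustration, and scikit-learn's classifiers stand in for the WEKA implementations used in the paper.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.tree import DecisionTreeClassifier

        # illustrative Roman-Urdu/English mix; the real corpus has 150 positive and 150 negative opinions
        opinions = ["bohat acha product hai", "this phone is great", "bilkul bakwas service",
                    "very bad experience", "zabardast quality", "totally disappointed"]
        labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

        for name, clf in [("Naive Bayes", MultinomialNB()),
                          ("Decision tree", DecisionTreeClassifier()),
                          ("KNN", KNeighborsClassifier(n_neighbors=3))]:
            pipe = make_pipeline(CountVectorizer(), clf)      # bag-of-words features + classifier
            print(name, cross_val_score(pipe, opinions, labels, cv=3).mean())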

  4. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    Science.gov (United States)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats that soils are experiencing, especially in semi-arid Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50,000 soil map covering the area under the direct control of the Republic of Cyprus (5,760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data coming from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011); ten highly characterizing elements were selected and used as predictors in the present study. For the other factors the usual variables were used: temperature and an aridity index for climate; total loss on ignition and vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound locations related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique in which many trees, instead of a single one, are developed and compared to increase the stability and reliability of the prediction. The model is trained and verified on areas where published 1:25,000 soil maps obtained from field work are available, and it is then applied for predictive mapping to the other areas. Preliminary results obtained in a small area of the plain around the city of Lefkosia, where eight different soil classes are present, show very good capabilities of the method. The Random Forest approach leads to reproduce soil
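    A minimal sketch of this scorpan-style workflow is given below in Python (the study itself uses the Random Forests package in R): stack per-pixel covariate values, train on the pixels covered by existing field-surveyed maps, and predict the soil class of the remaining pixels. The synthetic covariates, the invented class rule and the 30% training coverage are illustrative assumptions only.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(3)
        n_pixels = 2000
        covariates = np.column_stack([
            rng.uniform(0, 800, n_pixels),       # elevation (m)
            rng.uniform(0, 35, n_pixels),        # slope (degrees)
            rng.uniform(14, 20, n_pixels),       # mean annual temperature (deg C)
            rng.uniform(0, 60, n_pixels),        # a geochemical element concentration (ppm)
        ])
        # invented rule standing in for the true soil-class pattern
        soil_class = (covariates[:, 0] > 400).astype(int) + (covariates[:, 3] > 30).astype(int)

        train = rng.random(n_pixels) < 0.3       # pixels covered by the 1:25,000 field-surveyed maps
        rf = RandomForestClassifier(n_estimators=200, random_state=3)
        rf.fit(covariates[train], soil_class[train])
        predicted_map = rf.predict(covariates[~train])
        print("predicted classes for unmapped pixels:", np.bincount(predicted_map))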

  5. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Baysian and SVM approaches

    Science.gov (United States)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in blood transfusion organizations could be useful for improving the performance of the blood donation service. The aim of this research is the prediction of the healthiness of blood donors in a Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, the Decision Tree C4.5, the Naïve Bayes classifier, and the Support Vector Machine, were chosen and applied to a real database of 11006 donors. Seven fields, namely sex, age, job, education, marital status, type of donor, and results of blood tests (doctors' comments and lab results about healthy or unhealthy blood donors), were selected as input to these algorithms. The results of the three algorithms were compared and an error cost analysis was performed. According to this research and the obtained results, the best algorithm with low error cost and high accuracy is the SVM. This research helps the BTO build a model of blood donors in each area in order to predict whether a donor's blood is healthy or unhealthy, and it could be useful if used in parallel with laboratory tests to better separate unhealthy blood.
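
    A hedged sketch of the three-classifier comparison with a simple error-cost analysis, using scikit-learn (a decision tree stands in for C4.5); the synthetic data and the assumed 10:1 cost of missing an unhealthy donor are illustrations, not values from the BTO database.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.metrics import confusion_matrix
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        # Synthetic stand-in for the donor records: 7 input fields, healthy/unhealthy label.
        X, y = make_classification(n_samples=2000, n_features=7, n_informative=5, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Assumed cost matrix: a missed unhealthy donation (false negative) costs 10x a false alarm.
        costs = np.array([[0, 1],
                          [10, 0]])

        for name, clf in [("C4.5-like tree", DecisionTreeClassifier(criterion="entropy", random_state=0)),
                          ("Naive Bayes", GaussianNB()),
                          ("SVM", SVC())]:
            clf.fit(X_tr, y_tr)
            cm = confusion_matrix(y_te, clf.predict(X_te))  # rows: true class, cols: predicted
            print(name, "accuracy:", round(clf.score(X_te, y_te), 3),
                  "error cost:", int((cm * costs).sum()))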

  6. New energy opinion leaders' lifestyles and media usage - applying data mining decision tree analysis for UNIDO - ICHET web site users

    International Nuclear Information System (INIS)

    According to innovation diffusion research, innovators, opinion leaders, and diffusion agents play vital roles in promoting the acceptance of an innovation. Innovators and opinion leaders must be able to cope with the high degree of uncertainty about an innovation and usually have higher innovation-related media usage than the majority. Based on consumer behavior studies, lifestyle analysis can help researchers divide consumers into different lifestyle groups to understand and predict consumer behaviors. Lifestyle allows researchers to investigate consumers via their activities, interests and opinions instead of demographic variables. The purpose of this research is to investigate how new energy innovators' and opinion leaders' different lifestyles affect their adoption of new energy products and their media usage regarding new energy reports or promotion. To achieve these purposes, the researchers needed to locate and contact the potential innovators and opinion leaders in this field, and therefore cooperated with UNIDO-ICHET to launch this survey. This cross-discipline online survey was formally run from August 2005 to October 2006 and successfully collected information from 2040 new energy innovators and opinion leaders. The researchers analyzed the data using SPSS statistics software and data mining decision tree analysis, divided the new energy innovators into four groups (social-oriented, young modern, conservative, and show-off-oriented), and analyzed which lifestyle groups are better targets for innovation agencies launching innovation-related promotions or campaigns.

  7. Decision tree method applied to computerized prediction of ternary intermetallic compounds

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    The decision tree method and atomic parameters were used to find the regularities of the formation of ternary intermetallic compounds in alloy systems. The criteria of formation can be expressed by a group of inequalities with two kinds of atomic parameters as independent variables: Z (the number of valence electrons in the atom of a constituent element) and Ri/Rj (the ratio of the atomic radii of constituent elements i and j). The data of 2238 known ternary alloy systems were used to extract the empirical rules governing the formation of ternary intermetallic compounds, and the known compound-formation behavior of another 1334 alloy systems was used as a test sample for the reliability of the empirical criteria found. The rate of correct prediction was found to be nearly 95%. An expert system for ternary intermetallic compound formation was built and some of its predictions were confirmed.

  8. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
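
    A small illustration (not taken from the paper) of the three ways to collapse a group of equal rows with conflicting decisions, using pandas; the toy table below is invented.

        import pandas as pd

        # Toy inconsistent decision table: the first three rows agree on the
        # conditional attributes a1, a2 but carry different decisions.
        table = pd.DataFrame({"a1": [0, 0, 0, 1],
                              "a2": [1, 1, 1, 0],
                              "decision": [2, 3, 3, 1]})
        grouped = table.groupby(["a1", "a2"])["decision"]

        many_valued = grouped.apply(lambda d: sorted(set(d)))         # (i) set of all decisions
        most_common = grouped.agg(lambda d: d.mode().iloc[0])         # (ii) most common decision
        generalized = grouped.apply(lambda d: tuple(sorted(set(d))))  # (iii) code of the decision set

        print(many_valued, most_common, generalized, sep="\n\n")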

  9. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    CERN Document Server

    Rajendran, P

    2010-01-01

    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are: pre-processing, feature extraction, association rule mining and a hybrid classifier. The pre-processing step is done using median filtering, and edge features are extracted using the Canny edge detection technique. Two image mining approaches are combined in a hybrid manner in this paper. The frequent patterns from the CT scan images are generated by the frequent pattern tree (FP-Tree) algorithm, which mines the association rules, and the decision tree method is then used to classify the medical images for diagnosis. This system makes the classification process more accurate, and the hybrid method improves efficiency compared with traditional image mining methods. The experimental results on a pre-diagnosed database of brain images showed 97% sensitivity and 95% accuracy. The ph...

  10. Multi-output decision trees for lesion segmentation in multiple sclerosis

    Science.gov (United States)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects, with a true positive rate of 0.41 and a positive predictive value of 0.36.

  11. Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?

    Directory of Open Access Journals (Sweden)

    Robert Haworth

    2010-01-01

    Full Text Available The popularity and usage of digital games has increased in recent years, bringing further attention to their design. Some digital games require a significant use of higher order thought processes, such as problem solving and reflective and analytical thinking. Through the use of appropriate and interactive representations, these thought processes could be supported. A visualization of the game's internal structure is an example of this. However, it is unknown whether including these extra representations will have a negative effect on gameplay. To investigate this issue, a digital maze-like game was designed with its underlying structure represented as a decision tree. A qualitative, exploratory study with children was performed to examine whether the tree supported their thought processes and what effects, if any, the tree had on gameplay. This paper reports the findings of this research and discusses the implications for the design of games in general.

  12. Simulation of human behavior elements in a virtual world using decision trees

    Directory of Open Access Journals (Sweden)

    Sandra Mercado Pérez

    2013-05-01

    Full Text Available Human behavior refers to the way an individual responds to certain events or occurrences. It is naturally impossible to predict exactly how an individual will act, so computer simulation is used. This paper presents the development of a simulation of five possible human reactions within a virtual world, as well as the steps needed to create a decision tree that supports the selection of any of these reactions. For that purpose, three types of attributes are proposed: personality, environment and level of reaction. The virtual world Second Life was selected because of its internal programming language LSL (Linden Scripting Language), which allows the execution of predefined animation sequences or the creation of custom ones.

  13. Normal form backward induction for decision trees with coherent lower previsions

    CERN Document Server

    Huntley, Nathan

    2011-01-01

    We examine normal form solutions of decision trees under typical choice functions induced by lower previsions. For large trees, finding such solutions is hard as very many strategies must be considered. In an earlier paper, we extended backward induction to arbitrary choice functions, yielding far more efficient solutions, and we identified simple necessary and sufficient conditions for this to work. In this paper, we show that backward induction works for maximality and E-admissibility, but not for interval dominance and Gamma-maximin. We also show that, in some situations, a computationally cheap approximation of a choice function can be used, even if the approximation violates the conditions for backward induction; for instance, interval dominance with backward induction will yield at least all maximal normal form solutions.

  14. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant and 14 benign cases eventually identified pathologically, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variables of emission and 15 variables of transmission information from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and by radiological residents. Diagnostic rules were presented as a tree topology, and diagnostic performances were compared using the area under the receiver operating characteristic (ROC) curve (AUC). Results A classification decision tree with the lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the target/normal-region ratio of 99mTc-MIBI uptake in the delayed stage and in the early stage, age, cough and the spiculation sign were the five most important contributors. Its sensitivity and specificity were 93.33% and 78.57%, respectively, a little higher than those of the expert. The sensitivity and specificity achieved by first-year residents were 76.67% and 28.57%, respectively. The AUC of CART and of the expert was 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of the residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204), but greater than that of the residents (P<0.001). Conclusion Our data mining technique using a classification decision tree has a much higher accuracy than residents. It suggests that the application of this algorithm will significantly improve the diagnostic performance of residents.
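
    Sketch only: fitting a CART-style tree and reporting the same performance measures used above (sensitivity, specificity and ROC AUC) with scikit-learn; the synthetic data stand in for the 44 SPNs and 30 predictor variables, and the cross-validation setup is an assumption rather than the paper's protocol.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.metrics import confusion_matrix, roc_auc_score
        from sklearn.model_selection import cross_val_predict
        from sklearn.tree import DecisionTreeClassifier

        # ~14 benign (class 0) and ~30 malignant (class 1) cases, 30 predictors.
        X, y = make_classification(n_samples=44, n_features=30, n_informative=5,
                                   weights=[0.32, 0.68], random_state=1)
        tree = DecisionTreeClassifier(max_depth=3, random_state=1)

        # Cross-validated probabilities, since the sample is small.
        prob = cross_val_predict(tree, X, y, cv=4, method="predict_proba")[:, 1]
        pred = (prob >= 0.5).astype(int)
        tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
        print("sensitivity:", tp / (tp + fn))
        print("specificity:", tn / (tn + fp))
        print("AUC:", roc_auc_score(y, prob))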

  15. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree

    DEFF Research Database (Denmark)

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva;

    2004-01-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study to...

  16. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

    Full Text Available Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the genus Biomphalaria are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The robustness of the model was properly verified. The main variables selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one with the risk classification for the entire state of MG and another with the classification errors. The resulting map was 62.9% accurate.

  17. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Monica [Brown Univ., Providence, RI (United States)

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as the transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak-produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb-1 of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of top quarks produced through the electroweak mechanism as σ(p$\bar{p}$ → tb + X, tqb + X) = 4.30$^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak-produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of a W' using the same samples as the electroweak top quark analysis. An analysis based on the BDT method is used to separate the signal from the expected backgrounds. No significant excess is found, and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using the decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the

  18. Applying wavelet analysis and decision trees to identify low-saturation natural gas

    Institute of Scientific and Technical Information of China (English)

    贺旭; 李雄炎; 周金煜; 于红岩

    2011-01-01

    The particular reservoir conditions and low-amplitude structural traps have generated abundant low-saturation natural gas in the Quaternary of the Sanhu area of the Qaidam basin. It is difficult to accurately delineate the reservoirs because of poor reservoir properties, thin reservoir thickness, and the limitations imposed by the surrounding rocks and the resolution of the logging instruments. The effects of high shale content, high irreducible water saturation, high formation water salinity and clay minerals make the log curves ambiguous in low-saturation gas zones, so that the identification of low-saturation natural gas is particularly difficult. To solve this problem, this work uses wavelet analysis to reconstruct the log curves in order to improve their vertical resolution, makes a comparative analysis with imaging logging data, and uses the improved log curves to accurately delineate the reservoirs. At the same time, we employ a decision tree to set up a predictive model of low-saturation natural gas, taking advantage of the transparency of the learning process and the intelligibility of the results of decision trees. The predictive model is then amended according to the actual characteristics of the reservoirs to achieve accurate identification of low-saturation natural gas. Practical application shows that wavelet analysis and decision trees can effectively solve the reservoir delineation and low-saturation gas identification problems in the research area.

  19. Comparisons between physics-based, engineering, and statistical learning models for outdoor sound propagation.

    Science.gov (United States)

    Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T

    2016-05-01

    Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively. PMID:27250158
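
    An illustrative sketch (synthetic data, not the CNPE benchmark set) of the three tree-based regressors named above, scored with a skill score relative to a trivial baseline; here the baseline is the mean of the training targets, which is an assumption rather than the paper's homogeneous-atmosphere reference.

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                                      RandomForestRegressor)
        from sklearn.metrics import mean_squared_error
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeRegressor

        X, y = make_regression(n_samples=2000, n_features=6, noise=5.0, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        baseline_mse = mean_squared_error(y_te, np.full_like(y_te, y_tr.mean()))
        models = {
            "bagged trees": BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0),
            "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
            "boosting": GradientBoostingRegressor(random_state=0),
        }
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            mse = mean_squared_error(y_te, model.predict(X_te))
            skill = 1.0 - mse / baseline_mse  # 1 = perfect, 0 = no better than the baseline
            print(f"{name}: skill score {skill:.3f}")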

  20. Models, methods and software for distributed knowledge acquisition for the automated construction of integrated expert systems knowledge bases

    International Nuclear Information System (INIS)

    Based on an analysis of existing models, methods and means of acquiring knowledge, a base method of automated knowledge acquisition has been chosen. On the basis of this method, a new approach to integrating information acquired from knowledge sources of different typologies has been proposed, and the concept of distributed knowledge acquisition, with the aim of the computerized formation of the most complete and consistent models of problem areas, has been introduced. An original algorithm for distributed knowledge acquisition from databases, based on the construction of binary decision trees, has been developed.

  1. A Fuzzy Optimization Technique for the Prediction of Coronary Heart Disease Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Persi Pamela. I

    2013-06-01

    Full Text Available Data mining along with soft computing techniques helps to unravel hidden relationships and diagnose diseases efficiently even with uncertainties and inaccuracies. Coronary Heart Disease (CHD) is a killer disease leading to heart attacks and sudden deaths. Since the diagnosis involves vague symptoms and tedious procedures, it is usually time-consuming and false diagnoses may occur. A fuzzy system, one of the soft computing methodologies, is proposed in this paper along with a data mining technique for efficient diagnosis of coronary heart disease. Though the database has 76 attributes, only 14 attributes are found to be relevant for CHD diagnosis according to the published experiments and doctors' opinion, so only these essential attributes are taken from the heart disease database. From these attributes, crisp rules are obtained by employing the CART decision tree algorithm, and the rules are then applied to the fuzzy system. A Particle Swarm Optimization (PSO) technique is applied for the optimization of the fuzzy membership functions, where the parameters of the membership functions are moved to new positions. The result interpreted from the fuzzy system predicts the prevalence of coronary heart disease, and the system's accuracy was found to be good.

  2. Learning Dispatching Rules for Scheduling: A Synergistic View Comprising Decision Trees, Tabu Search and Simulation

    Directory of Open Access Journals (Sweden)

    Atif Shahzad

    2016-02-01

    Full Text Available A promising approach for effective shop scheduling that synergizes the benefits of combinatorial optimization, supervised learning and discrete-event simulation is presented. Though dispatching rules are widely used by shop scheduling practitioners, only rules of ordinary performance are known; hence, dynamic generation of dispatching rules is desired to make them more effective under changing shop conditions. Meta-heuristics are able to perform quite well and carry more knowledge of the problem domain, however at the cost of prohibitive computational effort in real time. The primary purpose of this research lies in an offline extraction of this domain knowledge using decision trees to generate simple if-then rules that subsequently act as dispatching rules for scheduling in an online manner. We use a similarity index to identify parametric and structural similarity in problem instances in order to implicitly support the learning algorithm for effective rule generation, and a quality index for the relative ranking of dispatching decisions. Maximum lateness is used as the scheduling objective in a job shop scheduling environment.

  3. Effect of training characteristics on object classification: an application using Boosted Decision Trees

    CERN Document Server

    Sevilla-Noarbe, Ignacio

    2015-01-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs, using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor of 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects tha...

  4. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    Science.gov (United States)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs, using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor of 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.
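
    A rough sketch of an AdaBoost boosted-decision-tree classifier of the kind described above, on synthetic catalog-like features; the feature set, tree depth and thresholds are assumptions, not the SDSS DR9 configuration.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        # Synthetic catalog features; y: 0 = star, 1 = galaxy.
        X, y = make_classification(n_samples=5000, n_features=10, n_informative=6, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                 n_estimators=200, learning_rate=0.5, random_state=0)
        bdt.fit(X_tr, y_tr)

        # Moving the probability threshold trades purity of the galaxy sample
        # against the efficiency (completeness) of the selection.
        proba = bdt.predict_proba(X_te)[:, 1]
        for threshold in (0.4, 0.5, 0.6):
            selected = proba >= threshold
            purity = (y_te[selected] == 1).mean()
            efficiency = selected[y_te == 1].mean()
            print(threshold, round(purity, 3), round(efficiency, 3))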

  5. Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    CERN Document Server

    Ball, N M; Myers, A D; Tcheng, D; Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-01-01

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness an...

  6. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Zhiyi [China Inst. of Atomic Energy (CIAE), Beijing (China)

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element Vtb and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb-1 of Tevatron Run II data in p$\bar{p}$ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed to discriminate the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at the 95% confidence level, and the cross section is measured as 3.4$^{+2.0}_{-1.8}$ pb. The result in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the combined electron, muon and tau analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations for electron and muon alone. The measured cross section in the three combined final states is σ(p$\bar{p}$ → tb + X, tqb + X) = 3.84$^{+0.89}_{-0.83}$ pb. A lower limit on |Vtb| is also measured in the three combined final states to be larger than 0.85 at the 95% confidence level. These results are consistent with Standard Model expectations.

  7. Transportation Mode Choice Analysis Based on Classification Methods

    OpenAIRE

    Zeņina, N; Borisovs, A

    2011-01-01

    Mode choice analysis has received the most attention among discrete choice problems in travel behavior literature. Most traditional mode choice models are based on the principle of random utility maximization derived from econometric theory. This paper investigates performance of mode choice analysis with classification methods - decision trees, discriminant analysis and multinomial logit. Experimental results have demonstrated satisfactory quality of classification.

  8. Agent Based Model of Livestock Movements

    Science.gov (United States)

    Miron, D. J.; Emelyanova, I. V.; Donald, G. E.; Garner, G. M.

    The modelling of livestock movements within Australia is of national importance for the purposes of the management and control of exotic disease spread, infrastructure development and the economic forecasting of livestock markets. In this paper an agent based model for the forecasting of livestock movements is presented. This models livestock movements from farm to farm through a saleyard. The decision of farmers to sell or buy cattle is often complex and involves many factors such as climate forecast, commodity prices, the type of farm enterprise, the number of animals available and associated off-shore effects. In this model the farm agent's intelligence is implemented using a fuzzy decision tree that utilises two of these factors. These two factors are the livestock price fetched at the last sale and the number of stock on the farm. On each iteration of the model farms choose either to buy, sell or abstain from the market thus creating an artificial supply and demand. The buyers and sellers then congregate at the saleyard where livestock are auctioned using a second price sealed bid. The price time series output by the model exhibits properties similar to those found in real livestock markets.

  9. Application of decision tree and logistic regression to the health literacy prediction of hypertension patients

    Institute of Scientific and Technical Information of China (English)

    李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬

    2012-01-01

    Objective To study and evaluate the feasibility and accuracy of applying decision tree methods and logistic regression to the prediction of health literacy in hypertension patients. Method Two health literacy prediction models were generated with decision tree methods and logistic regression, respectively. The receiver operating characteristic (ROC) curve was used to evaluate the results of the two prediction models. Result The sensitivity (82.5%) and Youden index (50.9%) of the logistic regression model were higher than those of the decision tree model (77.9%, 48.0%); the specificity (70.1%) of the decision tree model was higher than that of the logistic regression model (68.4%), and its error rate (29.9%) was lower than that of the logistic regression model (31.6%). The area under the ROC curve was 0.813 for the decision tree model and 0.847 for the logistic regression model. Conclusion The performance of the decision tree prediction model was similar to that of the logistic regression prediction model. A health literacy screening strategy can be obtained from the decision tree model, implying that data mining methods are feasible in the chronic disease management of community health services.
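
    A brief sketch of the comparison reported above, decision tree versus logistic regression scored by sensitivity, specificity, Youden index and ROC AUC; synthetic data replace the hypertension-patient survey variables, and the tree depth is an arbitrary choice.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import confusion_matrix, roc_auc_score
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        # y: adequate vs inadequate health literacy (synthetic placeholder data).
        X, y = make_classification(n_samples=800, n_features=12, n_informative=6, random_state=3)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

        for name, clf in [("decision tree", DecisionTreeClassifier(max_depth=4, random_state=3)),
                          ("logistic regression", LogisticRegression(max_iter=1000))]:
            clf.fit(X_tr, y_tr)
            tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
            sensitivity = tp / (tp + fn)
            specificity = tn / (tn + fp)
            youden = sensitivity + specificity - 1
            auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
            print(name, round(sensitivity, 3), round(specificity, 3),
                  round(youden, 3), round(auc, 3))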

  10. A Study on Fraud Detection Based on Data Mining Using Decision Tree

    Directory of Open Access Journals (Sweden)

    A. N. Pathak

    2011-05-01

    Full Text Available Fraud is a million dollar business and it is increasing every year. The U.S. identity fraud incidence rate increased in 2008, returning to levels unseen since 2003. Almost 10 million Americans learned they were victims of identity (ID) fraud in 2008, up from 8.1 million victims in 2007. More consumers are becoming ID fraud victims, reversing the previous trend in which ID fraud had been gradually decreasing. This reversal makes sense, since overall criminal activity tends to increase during a recession. Fraud involves one or more persons who intentionally act secretly to deprive another of something of value, for their own benefit. Fraud is as old as humanity itself and can take an unlimited variety of different forms. However, in recent years, the development of new technologies has also provided further ways in which criminals may commit fraud (Bolton and Hand 2002). In addition, business reengineering, reorganization or downsizing may weaken or eliminate controls, while new information systems may present additional opportunities to commit fraud.

  11. Irrelevant variability normalization in learning HMM state tying from data based on phonetic decision-tree

    OpenAIRE

    Huo, Q.; Ma, B.

    1999-01-01

    We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because of the problems of a diversified training data set and/or possible acoustic mismatches between training and testing conditions, the structure learned from the training data by using a maximum likelihood training method will not necessarily generalize well on mismatched tasks. We apply the above concept to the structural learning problem of phonetic decision-t...

  12. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on a thorough analysis of the cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and a selected quality attribute of the final product. The quality of the pellets was expressed by their shape, using the aspect ratio value. The data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain a deeper understanding of the formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of the extrudate were recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates the search for the optimal process conditions necessary to achieve ideal spherical pellets with good flow characteristics. This data mining approach can be considered by industrial formulation scientists to support rational decision making in the field of pellet technology. PMID:25835791
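
    A minimal sketch of the regression-tree step: fit a tree to formulation/process variables and print the resulting decision rules. The variable names mirror the key factors named above, but the data and the assumed relationship to aspect ratio are synthetic placeholders.

        import numpy as np
        import pandas as pd
        from sklearn.tree import DecisionTreeRegressor, export_text

        rng = np.random.default_rng(42)
        n = 224
        data = pd.DataFrame({
            "spheronization_speed_rpm": rng.uniform(500, 1500, n),
            "spheronization_time_min": rng.uniform(2, 10, n),
            "number_of_holes": rng.integers(100, 1000, n),
            "water_content_pct": rng.uniform(20, 60, n),
        })
        # Hypothetical response: aspect ratio closer to 1 (more spherical) for higher
        # speed, longer spheronization and more holes.
        aspect_ratio = (1.6 - 0.0002 * data["spheronization_speed_rpm"]
                        - 0.02 * data["spheronization_time_min"]
                        - 0.0002 * data["number_of_holes"]
                        + rng.normal(0, 0.02, n))

        tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        tree.fit(data, aspect_ratio)
        print(export_text(tree, feature_names=list(data.columns)))  # readable decision rules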

  13. Analysis of Human Papillomavirus Using Datamining - Apriori, Decision Tree, and Support Vector Machine (SVM and its Application Field

    Directory of Open Access Journals (Sweden)

    Cho Younghoon

    2016-01-01

    Full Text Available Human Papillomavirus (HPV) has various types (compared to other viruses) and plays a key role in evoking diverse diseases, especially cervical cancer. In this study, we aim to distinguish the features of HPV types of different degrees of fatality by analyzing their DNA sequences. We used the Decision Tree algorithm, the Apriori algorithm, and the Support Vector Machine in our experiments. By analyzing the DNA sequences, we discovered some relationships between certain types of HPV, especially the most fatal types, 16 and 18. Moreover, we concluded that it would be possible for scientists to develop more potent HPV cures by applying these relationships and the features that the HPV types exhibit.

  14. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, namely logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), as well as to focus on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the models, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of a small sample size. Methods All 274 patients (including 137 type 2 diabetes mellitus patients with diabetic peripheral neuropathy and 137 without) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables, such as sex, age and glycosylated hemoglobin. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 repetitions of 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied with the best subset method and the Akaike Information Criterion (AIC) to make the best use of the information and avoid overfitting. Eventually, a 10 to 100 times repeated 3-10 fold stratified cross-validation method was used to compare the generalization ability of the DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results The AUC of the DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. As the larger the AUC of a specific prediction model is, the higher the diagnostic ability it presents, MLP performed optimally, and then
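
    A sketch of the model-comparison protocol described above: repeated stratified k-fold cross-validation with AUC as the criterion, on synthetic data standing in for the 274-patient, 30-variable sample; the fold counts and model settings below are assumptions.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=274, n_features=30, n_informative=8, random_state=0)
        cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

        models = {
            "LR": LogisticRegression(max_iter=2000),
            "DT": DecisionTreeClassifier(max_depth=4, random_state=0),
            "MLP": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
        }
        for name, model in models.items():
            aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
            print(f"{name}: mean AUC {aucs.mean():.3f} (std {aucs.std():.3f})")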

  15. Modelling alcohol consumption during adolescence using zero inflated negative binomial and decision trees

    OpenAIRE

    Alfonso Palmer; Jona Roca; Berta Cajal; Elena Gervilla

    2010-01-01

    Alcohol is currently the most consumed substance among the Spanish adolescent population. Some of the variables that bear an influence on this consumption include ease of access, use of alcohol by friends and some personality factors. The aim of this study was to analyze and quantify the predictive value of these variables specifically on alcohol consumption in the adolescent population. The useful sample was made up of 6,145 adolescents (49.8% boys and 50.2% girls) with a mean age of 15.4 ye...

  16. Introducing a Model for Suspicious Behaviors Detection in Electronic Banking by Using Decision Tree Algorithms

    OpenAIRE

    Rohulla Kosari Langari; Nasrolla Moghaddam; Davood Vahdat

    2014-01-01

    The transformation of the world through information technology and Internet development has created competitive knowledge in the field of electronic commerce and increased the competitive potential among organizations. In this environment, the growing rate of commercial deals, guaranteed with speed and quality, depends on providing dynamic electronic banking systems that use modern technology to facilitate electronic business processes. Internet banking is regarded as a potential op...

  17. Application of an improved decision tree algorithm to the hot rolling process

    Institute of Scientific and Technical Information of China (English)

    钟蜜; 刘斌

    2011-01-01

    Decision tree classification is a very effective machine learning method, with the advantages of high classification accuracy, good robustness to noisy data, and the formation of an interpretable tree-structured model. Optimization of decision tree algorithms mainly addresses the criteria for choosing the splitting attribute, the pruning of the decision tree, and the introduction of fuzzy theory, rough set theory, genetic algorithms and neural network algorithms. This article introduces the attribute-importance principle of rough set theory to optimize the decision tree: first the importance of each conditional attribute to the classification is computed, and then the sample set is filtered according to the importance values, which reduces the size of the tree without harming the classification accuracy. The algorithm was implemented in the Visual C++ 6.0 programming environment and applied to the hot rolling process model; processing of hot rolling data verified the effectiveness of the algorithm.

  18. Network Traffic Classification Using SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    邱婧; 夏靖波; 柏骏

    2012-01-01

    To solve the problems of unrecognized regions and long training times that arise when the Support Vector Machine (SVM) method is used for network traffic classification, an SVM decision tree was applied to network traffic classification, exploiting its advantages in multi-class classification. Authoritative flow data sets were tested. The experimental results show that the SVM decision tree method has a shorter training time and better classification performance than the ordinary one-versus-one and one-versus-rest SVM methods in network traffic classification, with a classification accuracy of up to 98.8%.

  19. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    Science.gov (United States)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  20. The management of an endodontically abscessed tooth: patient health state utility, decision-tree and economic analysis

    Directory of Open Access Journals (Sweden)

    Shepperd Sasha

    2007-12-01

    Full Text Available Abstract Background A frequent encounter in clinical practice is the middle-aged adult patient complaining of a toothache caused by the spread of a carious infection into the tooth's endodontic complex. Decisions about the range of treatment options (a conventional crown with a post and core technique (CC), a single tooth implant (STI), a conventional dental bridge (CDB), and a partial removable denture (RPD)) have to balance prognosis, utility and cost. Little is known about the utility patients attach to the different treatment options for an endodontically abscessed mandibular molar and maxillary incisor. We measured patients' dental-health-state utilities and their ranking preferences for the treatment options for these dental problems. Methods Forty school teachers ranked their preferences for a conventional crown with a post and core technique, a single tooth implant, a conventional dental bridge, and a partial removable denture using a standard gamble and willingness to pay. Previously reported data on treatment prognosis and direct "out-of-pocket" costs were used in a decision-tree and economic analysis. Results The standard gamble utilities for the restoration of a mandibular 1st molar with the CC, STI, CDB or RPD were 74.47 [± 6.91], 78.60 [± 5.19], 76.22 [± 5.78] and 64.80 [± 8.1], respectively (p The standard gamble utilities for the restoration of a maxillary central incisor with a CC, STI, CDB and RPD were 88.50 [± 6.12], 90.68 [± 3.41], 89.78 [± 3.81] and 91.10 [± 3.57], respectively (p > 0.05). Their respective willingness-to-pay values ($CDN) were 1,782.05 [± 361.42], 1,871.79 [± 349.44], 1,605.13 [± 348.10] and 1,351.28 [± 368.62]. A statistical difference was found between the utility of treating a maxillary central incisor and a mandibular 1st molar (p The expected utility value for a 5-year prosthetic survival was highest for the CDB and the

  1. Base Oils Biodegradability Prediction with Data Mining Techniques

    OpenAIRE

    Malika Trabelsi; Saloua Saidane; Sihem Ben Abdelmelek

    2010-01-01

    In this paper, we apply various data mining techniques including continuous numeric and discrete classification prediction models of base oils biodegradability, with emphasis on improving prediction accuracy. The results show that highly biodegradable oils can be better predicted through numeric models. In contrast, classification models did not uncover a similar dichotomy. With the exception of Memory Based Reasoning and Decision Trees, tested classification techniques achieved high classifi...

  2. Research on using the decision tree classification method to extract coal gangue information

    Institute of Scientific and Technical Information of China (English)

    冯稳; 张志; 乌云其其格; 孟丹

    2011-01-01

    利用遥感技术快速、准确地调查煤矸石堆分布情况,对预防地质灾害以及保护生态环境和居民生命财产安全有着重要的指导意义.基于TM多光谱影像,运用知识决策树分类方法对江西萍乡煤矿区进行煤矸石信息提取试验.首先,在研究区背景知识的基础下,统计分析矿区内煤矸石及其他典型地物在影像上的光谱特征,建立了研究区的分类知识库;其次,在决策树分类模型支撑下,分别运用归一化差异植被指数、改进型归一化差异水体指数以及光谱阈值法对图像进行分类;最后,利用地学知识和几何特征进行分类后处理,分类精度达到82.97%.试验表明,该方法适用于煤矸石信息的自动提取,结合目视解译方法,可以提高解译的效率及准确度.%Using remote sensing technique to survey coal gangue' s distribution quickly and accurately has important guiding significance for the prevention of geological disasters and the protection of the ecological environment and residents' life and property securities. Based on TM multi-spectral image, it is adopted the decision tree classification method to extract Pingxiang coal mining area' coal gangue information in Jiangxi Province. Firstly, under the foundation of study area' s background knowledge, counted and analyzed the area' s coal gangue' s and other typical surface objects' spectral characteristics in RS image, then established the study area' s classification databases.Secondly, on the support of the decision tree classification model, used Normalized Difference Vegetation Index,Modified Normalized Difference Water Index and Spectrum Threshold Method to classify the image respectively. Ultimately, post-process the classified image by using geological knowledge and geometric feature. The total classification accuracy was up to 82. 97%. The experiment demonstrates that this method is suitable for coal gangue information's automatic extraction

  3. An automated decision-tree approach to predicting protein interaction hot spots.

    Science.gov (United States)

    Darnell, Steven J; Page, David; Mitchell, Julie C

    2007-09-01

    Protein-protein interactions can be altered by mutating one or more "hot spots," the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting the practical value of hot spot predictions. We present two knowledge-based models that improve the ability to predict hot spots: K-FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K-CON uses biochemical contact features. The combined K-FADE/CON (KFC) model displays better overall predictive accuracy than computational alanine scanning (Robetta-Ala). In addition, because these methods predict different subsets of known hot spots, a large and significant increase in accuracy is achieved by combining KFC and Robetta-Ala. The KFC analysis is applied to the calmodulin (CaM)/smooth muscle myosin light chain kinase (smMLCK) interface, and to the bone morphogenetic protein-2 (BMP-2)/BMP receptor-type I (BMPR-IA) interface. The results indicate a strong correlation between KFC hot spot predictions and mutations that significantly reduce the binding affinity of the interface. PMID:17554779

  4. Effective Network Intrusion Detection using Classifiers Decision Trees and Decision rules

    Directory of Open Access Journals (Sweden)

    G.MeeraGandhi

    2010-11-01

    Full Text Available In the era of the information society, computer networks and their related applications are emerging technologies. Network intrusion detection aims at distinguishing the behavior of the network. As network attacks have increased in huge numbers over the past few years, the Intrusion Detection System (IDS) is increasingly becoming a critical component in securing the network. Owing to the large volumes of security audit data in a network, in addition to the intricate and dynamic properties of intrusion behaviors, optimizing the performance of an IDS is an important open problem that receives more and more attention from the research community. The field of machine learning attempts to characterize how such learning can occur by designing, implementing, running, and analyzing algorithms that can be run on computers, with the goal of understanding the computational character of learning. Learning always occurs in the context of some performance task, and a learning method should always be coupled with a performance element that uses the knowledge acquired during learning. In this research, machine learning is investigated as a technique for making the selection, using training data and their outcomes. In this paper, we evaluate the performance of a set of rule-based classifier algorithms (JRip, Decision Table, PART, and OneR) and tree-based classifiers (J48, RandomForest, REPTree, NBTree). Based on the evaluation results, the best algorithm for each attack category is chosen and two classifier algorithm selection models are proposed. The empirical simulation results show noticeable performance improvements. The classification models were trained using data collected from the Knowledge Discovery in Databases (KDD) datasets for intrusion detection. The trained models were then used for predicting the risk of attacks in a web server environment, or by any network administrator or security expert. The

  5. An Improved ID3 Decision Tree Mining Algorithm

    Institute of Scientific and Technical Information of China (English)

    潘大胜; 屈迟文

    2016-01-01

    By analyzing the problems of the ID3 decision tree mining algorithm, the entropy calculation process is improved and an improved ID3 decision tree mining algorithm is built. The entropy calculation in the decision tree construction is redesigned in order to obtain globally optimal mining results. Mining experiments are carried out on six data sets from the UCI repository. The experimental results show that the improved algorithm is clearly better than the original ID3 decision tree mining algorithm in both the compactness and the accuracy of the constructed decision tree.
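
    A short, generic sketch of the entropy and information-gain computation at the heart of ID3; the paper's modified entropy formula is not given in full above, so this shows only the standard calculation on a toy table.

        import math
        from collections import Counter

        def entropy(labels):
            """Shannon entropy of a list of class labels."""
            counts = Counter(labels)
            total = len(labels)
            return -sum((c / total) * math.log2(c / total) for c in counts.values())

        def information_gain(rows, labels, attribute_index):
            """Gain from splitting (rows, labels) on the attribute at attribute_index."""
            by_value = {}
            for row, label in zip(rows, labels):
                by_value.setdefault(row[attribute_index], []).append(label)
            remainder = sum(len(subset) / len(labels) * entropy(subset)
                            for subset in by_value.values())
            return entropy(labels) - remainder

        # Toy data: two categorical attributes and a binary decision.
        rows = [("sunny", "high"), ("sunny", "low"), ("rain", "high"), ("rain", "low")]
        labels = ["no", "yes", "no", "no"]
        print(information_gain(rows, labels, 0), information_gain(rows, labels, 1))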

  6. Malware propagation modeling by the means of genetic algorithms

    OpenAIRE

    Goranin, N.; Čenys, A.

    2008-01-01

    Existing malware propagation models mainly concentrate to forecasting the number of infected computers in the initial propagation phase. In this article we propose a genetic algorithm based model for estimating the propagation rates of known and perspective Internet worms after their propagation reaches the satiation phase. Estimation algorithm is based on the known worms’ propagation strategies with correlated propagation rates analysis and is presented as a decision tree, generated by GAtre...

  7. Estimating Classification Uncertainty of Bayesian Decision Tree Technique on Financial Data

    OpenAIRE

    Schetinin, Vitaly; Fieldsend, Jonathan E.; Partridge, Derek; Krzanowski, Wojtek J.; Everson, Richard M.; Bailey, Trevor C; Hernandez, Adolfo

    2005-01-01

    Bayesian averaging over classification models allows the uncertainty of classification outcomes to be evaluated, which is of crucial importance for making reliable decisions in applications such as finance, in which risks have to be estimated. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the diversity of the classifier ensemble and the required performance. The interpretability of classification models can also give useful in...

  8. COMPARING THE PERFORMANCE OF SEMANTIC IMAGE RETRIEVAL USING SPARQL QUERY, DECISION TREE ALGORITHM AND LIRE

    OpenAIRE

    Magesh; Thangaraj

    2013-01-01

    The ontology based framework is developed for representing image domain. The textual features of images are extracted and annotated as the part of the ontology. The ontology is represented in Web Ontology Language (OWL) format which is based on Resource Description Framework (RDF) and Resource Description Framework Schema (RDFS). Internally, the RDF statements represent an RDF graph which provides the way to represent the image data in a semantic manner. Various tools and languages are used t...

  9. A study of land use/land cover information extraction classification technology based on DTC

    Science.gov (United States)

    Wang, Ping; Zheng, Yong-guo; Yang, Feng-jie; Jia, Wei-jie; Xiong, Chang-zhen

    2008-10-01

    Decision Tree Classification (DTC) is one organizational form of a multi-level recognition system, which breaks a complicated classification into simple categories and then resolves them step by step. The paper carries out LULC decision tree classification research on areas of Gansu Province in western China. With mid-resolution remote sensing data as the main data source, the authors adopt the decision tree classification method, taking advantage of the way it imitates the pattern of human judgment and thinking and of its fault tolerance, and build a decision tree LULC classification scheme. The research shows that these methods and techniques can increase the level of automation and the accuracy of LULC information extraction in the study areas. The main aspects of the research are as follows: 1. Training samples were collected first and a comprehensive database supported by remote sensing and ground data was established. 2. Using the CART system, and based on multi-source, multi-temporal remote sensing data and other ancillary data, the DTC approach effectively combined unsupervised classification results with expert knowledge; the method and procedure for distilling the decision tree rules were specifically developed. 3. In designing the decision tree, classification rules for the various object types were established and the DTC model was pruned to handle subdivision classification effectively, completing the land use and land cover classification of the study areas. The accuracy evaluation showed that the classification accuracy reached upwards of 80%.
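
    A rough analog of steps 2 and 3 above, assuming the per-pixel spectral features and class labels are already in tabular form: scikit-learn's CART-style DecisionTreeClassifier with cost-complexity pruning stands in for the CART system the authors used, and the pruning strength is an arbitrary placeholder.

      # Not the authors' CART system: a generic pruned CART-style tree for LULC pixels.
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.model_selection import train_test_split

      def train_pruned_lulc_tree(X, y, ccp_alpha=1e-3):
          """X: (n_pixels, n_features) spectral/ancillary features, y: LULC class labels."""
          X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
          tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=ccp_alpha, random_state=0)
          tree.fit(X_tr, y_tr)
          # Held-out overall accuracy, comparable in spirit to the ~80% reported above.
          return tree, tree.score(X_te, y_te)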

  10. Bayesian Decision Tree for the Classification of the Mode of Motion in Single-Molecule Trajectories

    CERN Document Server

    Türkcan, Silvan

    2015-01-01

    Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and buil...
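
    For reference, the information criteria named above can be written directly from the maximized log-likelihood; the sketch below assumes each candidate motion model has already been fitted and only compares the scores.

      # AIC, AICc and BIC from a maximized log-likelihood (log_lik), k parameters, n points.
      import math

      def aic(log_lik, k):
          return 2 * k - 2 * log_lik

      def aicc(log_lik, k, n):
          return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

      def bic(log_lik, k, n):
          return k * math.log(n) - 2 * log_lik

      def preferred_model(fits, n, criterion=bic):
          """fits: dict name -> (log_lik, k); the lowest criterion value wins."""
          scores = {name: (criterion(ll, k) if criterion is aic else criterion(ll, k, n))
                    for name, (ll, k) in fits.items()}
          return min(scores, key=scores.get), scores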

  11. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    Science.gov (United States)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al, 2006). At this field scale, previous classifications of agricultural land in Tanzania using coarse-resolution MODIS data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used, with representative training areas collected for agriculture and no agriculture and appropriate indices used to separate these classes (Hansen et al, 2013). Validation was done using random samples and high-resolution satellite images to compare agriculture and no-agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security and market price information and to inform agricultural policy.
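
    A minimal sketch of the kind of workflow described above, assuming the Landsat composites have already been reduced to per-pixel features (for example NDVI statistics over the time series); the band choices, tree depth and array names are placeholders, not the study's actual inputs.

      # Illustrative only: an index feature plus a decision tree for a crop / no-crop mask.
      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      def ndvi(nir, red, eps=1e-6):
          # One candidate per-pixel feature derived from red and near-infrared bands.
          return (nir - red) / (nir + red + eps)

      def crop_mask(features_train, labels_train, features_scene):
          """features_*: (n, k) per-pixel metrics (e.g., time-series NDVI statistics);
          labels_train: 1 = agriculture, 0 = no agriculture."""
          clf = DecisionTreeClassifier(max_depth=8, random_state=0)
          clf.fit(features_train, labels_train)
          return clf.predict(features_scene).astype(np.uint8)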

  12. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement are carried out by the open source software “R”; the generation of the dense and accurate digital surface model by the “Match-T DSM” program of the Trimble Company. A practical...

  13. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction

    OpenAIRE

    Jörg Huwyler; Felix Hammann; Claudia Suenderhauf

    2012-01-01

    Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was co...

  14. Effects of Sampling Methods on Prediction Quality. The Case of Classifying Land Cover Using Decision Trees

    OpenAIRE

    Hochreiter, Ronald; Waldhauser, Christoph

    2014-01-01

    Clever sampling methods can be used to improve the handling of big data and increase its usefulness. The subject of this study is remote sensing, specifically airborne laser scanning point clouds representing different classes of ground cover. The aim is to derive a supervised learning model for the classification using CARTs. In order to measure the effect of different sampling methods on the classification accuracy, various experiments with varying types of sampling methods, sample sizes, a...

  15. Sistem Pakar Untuk Diagnosa Penyakit Kehamilan Menggunakan Metode Dempster-Shafer Dan Decision Tree

    OpenAIRE

    joko popo minardi

    2016-01-01

    Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information. Dempster-Shafer theory is an alternative to traditional probabilistic theory for the mathematical representation of uncertainty. In the diagnosis of diseases of pregnancy, the information obtained from the patient is sometimes incomplete; with the Dempster-Shafer method and expert system rules there can be a combination of symptoms that are not com...

  16. Tumor Regression Grades: Can They Influence Rectal Cancer Therapy Decision Tree?

    OpenAIRE

    Marisa D. Santos; Cristina Silva; Anabela Rocha; Eduarda Matos; Carlos Nogueira; Carlos Lopes

    2013-01-01

    Background. Evaluating impact of tumor regression grade in prognosis of patients with locally advanced rectal cancer (LARC). Materials and Methods. We identified from our colorectal cancer database 168 patients with LARC who received neoadjuvant therapy followed by complete mesorectum excision surgery between 2003 and 2011: 157 received 5-FU-based chemoradiation (CRT) and 11 short course RT. We excluded 29 patients, the remaining 139 were reassessed for disease recurrence and survival; the sl...

  17. Credit Card Fraud Detection using Decision Tree for Tracing Email and IP

    Directory of Open Access Journals (Sweden)

    Gayathiri.P

    2012-09-01

    Full Text Available Credit card fraud is a wide-ranging term for theft and fraud committed using a credit card or any similar payment mechanism as a fraudulent source of funds in a transaction. The purpose may be to obtain goods without paying, or to obtain unauthorized funds from an account. Transactions completed with credit cards have become more and more popular with the introduction of online shopping and banking. Correspondingly, the number of credit card frauds has also increased. Currently, data mining is a popular way to combat fraud because of its effectiveness. Data mining is a well-defined procedure that takes data as input and produces output in the form of models or patterns. In other words, the task of data mining is to analyze a massive amount of data and to extract some usable information that we can interpret for future uses.

  18. An Ensemble Learning Based Framework for Traditional Chinese Medicine Data Analysis with ICD-10 Labels

    OpenAIRE

    Gang Zhang; Yonghui Huang; Ling Zhong; Shanxing Ou; Yi Zhang; Ziping Li

    2015-01-01

    Objective. This study aims to establish a model to analyze clinical experience of TCM veteran doctors. We propose an ensemble learning based framework to analyze clinical records with ICD-10 labels information for effective diagnosis and acupoints recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision tree (DT) and support vector machine (SVM) are trained by bootstrapping the training dataset. The base learners are...

  19. Decision tree learning for detecting turning points in business process orientation: a case of Croatian companies

    Directory of Open Access Journals (Sweden)

    Ljubica Milanović Glavan

    2015-03-01

    Full Text Available Companies worldwide are embracing Business Process Orientation (BPO in order to improve their overall performance. This paper presents research results on key turning points in BPO maturity implementation efforts. A key turning point is defined as a component of business process maturity that leads to the establishment and expansion of other factors that move the organization to the next maturity level. Over the past few years, different methodologies for analyzing maturity state of BPO have been developed. The purpose of this paper is to investigate the possibility of using data mining methods in detecting key turning points in BPO. Based on survey results obtained in 2013, the selected data mining technique of classification and regression trees (C&RT was used to detect key turning points in Croatian companies. These findings present invaluable guidelines for any business that strives to achieve more efficient business processes.

  20. Scheduling Model for Symmetric Multiprocessing Architecture Based on Process Behavior

    Directory of Open Access Journals (Sweden)

    Ali Mousa Alrahahleh

    2012-07-01

    Full Text Available This paper presents a new method for scheduling on symmetric multiprocessing (SMP) architectures based on process behavior. The method takes advantage of process behavior, which includes system calls, to create groups of similar processes using machine-learning techniques like clustering or classification, and then makes process distribution decisions based on those groups. The new method is divided into three stages: the first phase is collecting data about processes and defining the subset of data to be used in further processing. The second phase is using the collected data to create classification or clustering models by applying common machine-learning techniques, such as a decision tree for classification or EM for clustering; training of the classifier is done in this phase, and after that the classification or clustering models are applied on a running system to find out to which group each process belongs. The third phase is using the process groups as a parameter of scheduling on SMP (Symmetric Multi-Processing) systems when distributing processes over multiple processor cores. A further advantage can be achieved by letting the end user train the system to classify a specific type of process and assign it to a specific processor core, targeting real-time response or performance gain. The new method increases process performance and decreases response time based on different kinds of distribution.

  1. Decision tree analysis to assess the cost-effectiveness of yttrium microspheres for treatment of hepatic metastases from colorectal cancer

    International Nuclear Information System (INIS)

    Full text: The aim is to determine the cost-effectiveness of yttrium microsphere treatment of hepatic metastases from colorectal cancer, with and without FDG-PET for detection of extra-hepatic disease. A decision tree was created comparing two strategies for yttrium treatment with chemotherapy, one incorporating PET in addition to CT in the pre-treatment work-up, to a strategy of chemotherapy alone. The sensitivity and specificity of PET and CT were obtained from the Federal Government PET review. Imaging costs were obtained from the Medicare benefits schedule with an additional capital component added for PET (final cost $1200). The cost of yttrium treatment was determined by patient-tracking. Previously published reports indicated a mean gain in life-expectancy from treatment of 0.52 years. Patients with extra-hepatic metastases were assumed to receive no survival benefit. Cost effectiveness was expressed as incremental cost per life-year gained (ICER). Sensitivity analysis determined the effect of prior probability of extra-hepatic disease on cost-savings and cost-effectiveness. The cost of yttrium treatment including angiography, particle perfusion studies and bed-stays, was $10530. A baseline value for prior probability of extra-hepatic disease of 0.35 gave ICERs of $26,378 and $25,271 for the no-PET and PET strategies respectively. The PET strategy was less expensive if the prior probability of extra-hepatic metastases was greater than 0.16 and more cost-effective if above 0.28. Yttrium microsphere treatment is less cost-effective than other interventions for colon cancer but comparable to other accepted health interventions. Incorporating PET into the pre-treatment assessment is likely to save costs and improve cost-effectiveness. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
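
    The incremental cost-effectiveness ratio (ICER) used above is simply the incremental cost divided by the incremental life-years gained; the numbers in the sketch below are placeholders, not the study's inputs.

      # ICER = (incremental cost) / (incremental life-years gained).
      def icer(cost_new, effect_new, cost_old, effect_old):
          """Incremental cost per life-year gained."""
          return (cost_new - cost_old) / (effect_new - effect_old)

      # e.g. a strategy costing $10,530 more per patient and adding 0.4 life-years
      # would have an ICER of 10_530 / 0.4 = $26,325 per life-year gained.
      print(icer(cost_new=10_530, effect_new=0.4, cost_old=0.0, effect_old=0.0))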

  2. Recent advances using rodent models for predicting human allergenicity

    International Nuclear Information System (INIS)

    The potential allergenicity of newly introduced proteins in genetically engineered foods has become an important safety evaluation issue. However, to evaluate the potential allergenicity and the potency of new proteins in our food, there are still no widely accepted and reliable test systems. The best-known allergy assessment proposal for foods derived from genetically engineered plants was the careful stepwise process presented in the so-called ILSI/IFBC decision tree. A revision of this decision tree strategy was proposed by a FAO/WHO expert consultation. As prediction of the sensitizing potential of the newly introduced protein based on animal testing was considered to be very important, animal models were introduced as one of the new test items, despite the fact that none of the currently studied models has been widely accepted and validated yet. In this paper, recent results are summarized of promising models developed in the rat and mouse

  3. Gene function classification using Bayesian models with hierarchy-based priors

    Directory of Open Access Journals (Sweden)

    Neal Radford M

    2006-10-01

    Full Text Available Abstract Background We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs from the E. coli genome. Results The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Conclusion Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

  4. Application of breast MRI for prediction of lymph node metastases - systematic approach using 17 individual descriptors and a dedicated decision tree

    International Nuclear Information System (INIS)

    Background: The presence of lymph node metastases (LNMs) is one of the most important prognostic factors in breast cancer. Purpose: To correlate a detailed catalog of 17 descriptors in breast MRI (bMRI) with the presence of LNMs and to identify useful combinations of such descriptors for the prediction of LNMs using a dedicated decision tree. Material and Methods: A standardized protocol and study design was applied in this IRB-approved study (T1-weighted FLASH; 0.1 mmol/kg body weight Gd-DTPA; T2-weighted TSE; histological verification after bMRI). Two experienced radiologists performed prospective evaluation of the previously acquired examination in consensus. In every lesion 17 previously published descriptors were assessed. Subgroups of primary breast cancers with (N+: 97) and without LNM were created (N-: 253). The prevalence and diagnostic accuracy of each descriptor were correlated with the presence of LNM (chi-square test; diagnostic odds ratio/DOR). To identify useful combinations of descriptors for the prediction of LNM a chi-squared automatic interaction detection (CHAID) decision tree was applied. Results: Seven of 17 descriptors were significantly associated with LNMs. The most accurate were 'Skin thickening' (P < 0.001; DOR 5.9) and 'Internal enhancement' (P < 0.001; DOR =13.7). The CHAID decision tree identified useful combinations of descriptors: 'Skin thickening' plus 'Destruction of nipple line' raised the probability of N+ by 40% (P< 0.05). In case of absence of 'Skin thickening', 'Edema', and 'Irregular margins', the likelihood of N+ was 0% (P<0.05). Conclusion: Our data demonstrate the close association of selected breast MRI descriptors with nodal status. If present, such descriptors can be used - as stand alone or in combination - to accurately predict LNM and to stratify the patient's prognosis
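
    The diagnostic odds ratio (DOR) quoted above for each descriptor can be computed from a 2x2 table of descriptor presence against nodal status; the counts in the sketch below are illustrative only, not figures from the study.

      # DOR from a 2x2 table of descriptor presence vs. lymph node status.
      def diagnostic_odds_ratio(tp, fp, fn, tn):
          """DOR = (TP/FN) / (FP/TN) = (TP * TN) / (FP * FN)."""
          return (tp * tn) / (fp * fn)

      # Illustrative counts only: ~5.2, i.e. the descriptor is strongly associated with N+.
      print(diagnostic_odds_ratio(tp=40, fp=30, fn=57, tn=223))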

  5. Decision Tree, Bagging and Random Forest methods detect TEC seismo-ionospheric anomalies around the time of the Chile, (Mw = 8.8) earthquake of 27 February 2010

    Science.gov (United States)

    Akhoondzadeh, Mehdi

    2016-06-01

    In this paper, for the first time, ensemble methods including Decision Tree, Bagging and Random Forest have been proposed in the field of earthquake precursors to detect GPS-TEC (Total Electron Content) seismo-ionospheric anomalies around the time and location of the Chile earthquake of 27 February 2010. All of the implemented ensemble methods detected a striking anomaly in the time series of TEC data, 1 day after the earthquake at 14:00 UTC. The results indicate that the proposed methods, owing to their performance, speed and simplicity, are quite promising and deserve serious attention as new predictive tools for the detection of seismo-ionospheric anomalies.
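
    The paper's exact anomaly definition is not reproduced here; as a hedged illustration, one common pattern is to model the TEC time series with the three tree-based learners named above and flag epochs whose residual exceeds a few standard deviations. All names and thresholds below are assumptions.

      # Illustrative residual-based anomaly flagging with the three tree ensembles.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor
      from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

      def tec_anomalies(X, y, threshold_sigma=2.0):
          """X: predictor features per epoch (e.g., lagged TEC, geomagnetic indices),
          y: observed TEC. Returns a boolean anomaly flag per epoch for each model."""
          models = {
              "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=0),
              "bagging": BaggingRegressor(random_state=0),
              "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
          }
          flags = {}
          for name, model in models.items():
              residual = y - model.fit(X, y).predict(X)
              flags[name] = np.abs(residual) > threshold_sigma * residual.std()
          return flags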

  6. Model Comprehensive Risk Assessment of the Insurance Company: Tradition and Innovation

    OpenAIRE

    Yulia Slepukhina

    2015-01-01

    The article analyzes the traditional methods of evaluating financial risk arising in the insurance business, such as adjustment of the discount rate, reliable (certainty) equivalents, sensitivity analysis of efficiency criteria, analysis of probability distributions, decision trees, methods based on fuzzy set theory, and others, and identifies their advantages and disadvantages. In the study the author proposes a model of complex (integrated) risk assessment arising in ins...

  7. Corporate Governance and Disclosure Quality: Taxonomy of Tunisian Listed Firms Using the Decision Tree Method based Approach

    OpenAIRE

    Wided Khiari

    2013-01-01

    This study aims to establish a typology of Tunisian listed firms according to their corporate governance characteristics and disclosure quality. The paper uses disclosed scores to examine corporate governance practices of Tunisian listed firms. A content analysis of 46 Tunisian listed firms from 2001 to 2010 has been carried out and a disclosure index developed to determine the level of disclosure of the companies. The disclosure quality is appreciated through the quantity and also through th...

  8. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain;

    2014-01-01

    system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost...

  9. Establishing diagnostic platform for environmental biosafety assessment of genetically modified plants based on the decision-tree method

    OpenAIRE

    Lei Wang; Chao Yang; Bao-Rong Lu

    2010-01-01

    Transgenic biotechnology and its products provide important solutions for the great challenge of global food security. Biosafety assessment of genetically modified organisms (GMOs) including their food and environmental safety is a prerequisite for the commercialization and safe application of transgenic biotechnology products. However, existing methodologies cannot meet the urgent requirements for rapid biosafety assessment of the increasing number of new and sophisticated GMOs. Therefore, a ...

  10. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds

    OpenAIRE

    Loukis Euripides N; Stasis Antonis CH; Pavlopoulos Sotiris A

    2004-01-01

    Abstract Background New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size and operationally complex and therefore are not suitable for use in rural areas, in homecare and generally in primary healthcare set-ups. Furthermore the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation and junio...

  11. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

    Science.gov (United States)

    Complex diseases are often difficult to diagnose, treat, and study due to the multi-factorial nature of the etiology. Significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease. Here, we examine a range of methods for evaluati...

  12. Application of the Decision Tree in Tennis Trainings

    Institute of Scientific and Technical Information of China (English)

    冯能山; 龙超; 熊金志; 廖国君

    2014-01-01

    Nowadays it is still relatively rare to see applications of data mining in the field of sports. However, applying data mining in sports can facilitate a more efficient use of sports training data by digging out the relevant information. In this paper, the decision tree approach is applied to tennis training to form a decision tree by mining the relevant data. As a result, the application helps sports staff to make a more rational tennis training program and improves the efficiency of tennis training.

  13. Application of analyzing influencing factors of life pressure in college students by decision tree

    Institute of Scientific and Technical Information of China (English)

    陈新林; 包生耿; 颜伟红; 王小广; 万建成; 吴丹桂

    2013-01-01

    Abstract: Objective To understand the distribution and influencing factors of life pressure among college students in Guangzhou, in order to provide a scientific basis for developing mental health education. Methods Students from five colleges were investigated with the "Youth Life Event Scale" and basic demographic data. Influencing factors were explored with SPSS 13.0 by building a logistic model, and decision trees of the total pressure score were built with the C5.0 algorithm of Clementine and the CHAID algorithm of Answer Tree. Results The influencing factors of life pressure in college students included economic conditions, interpersonal relationships, the number of children in the family, and part-time jobs. The C5.0 decision tree branches included interpersonal relationships, economic conditions and the number of children in the family; the CHAID decision tree branches included economic conditions, interpersonal relationships, the number of children in the family and part-time jobs. The proportion of students under life pressure was largest (68.84%) among those with both poor economic conditions and poor interpersonal relationships. Conclusions Mental health education and guidance should be tailored to the characteristics of the different subgroups, with particular attention paid to college students with poor interpersonal relationships, poor economic conditions, or who are only children.

  15. Millon´s Personality Model and ischemic cardiovascular acute episodes: Profiles of risk in a decision tree

    Directory of Open Access Journals (Sweden)

    María M. Richard's

    2008-01-01

    Full Text Available Identifying risk subgroups allows clinical psychologists to develop interventions specific to those subgroups. The main purpose of this work was to find statistical associations between personality characteristics (traits and disorders) and the occurrence of acute ischemic cardiovascular episodes according to Theodore Millon's personality model. The analyses of the present study were based on a sample of 313 women and men between 31 and 80 years of age, divided into two groups: a clinical group of 143 participants hospitalized because of acute ischemic cardiovascular episodes, and a control group of 170 people with no history of cardiovascular disease. The results showed four personality risk profiles associated with the occurrence of acute ischemic episodes, which therefore enables clinical psychologists to design interventions specific to those subgroups.

  16. Online Rule Generation Software Process Model

    Directory of Open Access Journals (Sweden)

    Sudeep Marwaha

    2013-07-01

    Full Text Available For production systems like expert systems, a rule generation software can facilitate the faster deployment. The software process model for rule generation using decision tree classifier refers to the various steps required to be executed for the development of a web based software model for decision rule generation. The Royce’s final waterfall model has been used in this paper to explain the software development process. The paper presents the specific output of various steps of modified waterfall model for decision rules generation.

  17. Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-10-01

    We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects to a limiting r-band magnitude. The machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r~18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r~20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.

  18. A feature-based approach to modeling protein-protein interaction hot spots.

    Science.gov (United States)

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-05-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π-related interactions, especially π–π interactions. PMID:19273533
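
    A small sketch of the two-stage scheme described above (decision-tree feature selection followed by an SVM), using generic scikit-learn components; the feature matrix, labels and tree depth are hypothetical choices, not the study's 54-feature setup.

      # Decision-tree feature selection feeding an SVM, as a generic two-stage pipeline.
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.feature_selection import SelectFromModel
      from sklearn.svm import SVC
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def hot_spot_model():
          # Features whose tree importance falls below the default threshold are dropped.
          selector = SelectFromModel(DecisionTreeClassifier(max_depth=5, random_state=0))
          return make_pipeline(StandardScaler(), selector, SVC(kernel="rbf", C=1.0))

      # usage (hypothetical arrays): hot_spot_model().fit(X_train, y_train).predict(X_test)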

  19. A Dynamic Web Page Prediction Model Based on Access Patterns to Offer Better User Latency

    CERN Document Server

    Mukhopadhyay, Debajyoti; Saha, Dwaipayan; Kim, Young-Chon

    2011-01-01

    The growth of the World Wide Web has emphasized the need for improvement in user latency. One of the techniques that are used for improving user latency is Caching and another is Web Prefetching. Approaches that bank solely on caching offer limited performance improvement because it is difficult for caching to handle the large number of increasingly diverse files. Studies have been conducted on prefetching models based on decision trees, Markov chains, and path analysis. However, the increased uses of dynamic pages, frequent changes in site structure and user access patterns have limited the efficacy of these static techniques. In this paper, we have proposed a methodology to cluster related pages into different categories based on the access patterns. Additionally we use page ranking to build up our prediction model at the initial stages when users haven't already started sending requests. This way we have tried to overcome the problems of maintaining huge databases which is needed in case of log based techn...

  20. Network Traffic Anomalies Identification Based on Classification Methods

    Directory of Open Access Journals (Sweden)

    Donatas Račys

    2015-07-01

    Full Text Available A problem of network traffic anomalies detection in the computer networks is analyzed. Overview of anomalies detection methods is given then advantages and disadvantages of the different methods are analyzed. Model for the traffic anomalies detection was developed based on IBM SPSS Modeler and is used to analyze SNMP data of the router. Investigation of the traffic anomalies was done using three classification methods and different sets of the learning data. Based on the results of investigation it was determined that C5.1 decision tree method has the largest accuracy and performance and can be successfully used for identification of the network traffic anomalies.

  1. Decision-tree sensitivity analysis for cost-effectiveness of whole-body FDG PET in the management of patients with non-small-cell lung carcinoma in Japan

    International Nuclear Information System (INIS)

    Whole-body 2-[18F]fluoro-2-deoxy-D-glucose (FDG) positron emission tomography (WB-PET) may be more cost-effective than chest PET because WB-PET does not require conventional imaging (CI) for extrathoracic staging. The cost-effectiveness of WB-PET for the management of Japanese patients with non-small-cell lung carcinoma (NSCLC) was assessed. A decision-tree sensitivity analysis was designed, based on the two competing strategies of WB-PET vs. CI. WB-PET was assumed to have a sensitivity and specificity for detecting metastases of 90% to 100%, and CI of 80% to 90%. The prevalences of M1 disease were 34% and 20%. One thousand patients suspected of having NSCLC were simulated in each strategy. We surveyed the relevant literature for the choice of variables. Expected cost saving (CS) and expected life expectancy (LE) for NSCLC patients were calculated. The WB-PET strategy yielded an expected CS of $951 US to $1,493 US per patient and an expected LE of minus 0.0246 years to minus 0.0136 years per patient for the 71.4% NSCLC and 34% M1 disease prevalence at our hospital. PET avoided unnecessary bronchoscopies and thoracotomies for incurable and benign disease. Overall, the CS for each patient was $833 US to $2,010 US at NSCLC prevalences ranging from 10% to 90%. The LE of the WB-PET strategy was similar to that of the CI strategy. The CS and LE varied minimally between the two situations of 34% and 20% M1 disease prevalence. The introduction of a WB-PET strategy in place of CI for managing NSCLC patients is potentially cost-effective in Japan. (author)
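
    In the same spirit as the one-way sensitivity analysis above, the sketch below sweeps the prevalence of M1 disease and compares the expected cost of two staging strategies; every cost, sensitivity and prevalence value is a placeholder, not a figure from the study.

      # Toy one-way sensitivity analysis: expected cost vs. prevalence for two strategies.
      def expected_cost(prevalence, sensitivity, cost_staging, cost_futile_treatment):
          """Metastases missed at staging are assumed to trigger a futile curative-intent
          treatment; all inputs are illustrative placeholders."""
          missed = prevalence * (1.0 - sensitivity)
          return cost_staging + missed * cost_futile_treatment

      # With these invented numbers the more sensitive (but costlier) strategy breaks
      # even around a prevalence of 0.5 and wins above it.
      for prev in (0.20, 0.50, 0.80):
          ci = expected_cost(prev, sensitivity=0.80, cost_staging=800, cost_futile_treatment=12_000)
          pet = expected_cost(prev, sensitivity=1.00, cost_staging=2_000, cost_futile_treatment=12_000)
          print(f"prevalence={prev:.2f}  CI=${ci:,.0f}  WB-PET=${pet:,.0f}")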

  2. Fault detection and diagnosis for gas turbines based on a kernelized information entropy model.

    Science.gov (United States)

    Wang, Weiying; Xu, Zhiqiang; Tang, Rui; Li, Shuying; Wu, Wei

    2014-01-01

    Gas turbines are considered one of the most important devices in power engineering and have been widely used in power generation, airplanes, and naval ships, and also on oil drilling platforms. However, in most cases they are monitored without an operator on duty. It is highly desirable to develop techniques and systems to remotely monitor their conditions and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall state of the gas paths of the gas turbine. In addition, we also extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a certain recognition task. Finally, we introduce the information entropy based decision tree algorithm to extract rules from fault samples. The experiments on some real-world data show the effectiveness of the proposed algorithms. PMID:25258726
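
    To illustrate the underlying idea (not the paper's kernelized variant), Shannon entropy can serve as a uniformity measure for a ring of exhaust-gas temperatures: the entropy is maximal when the temperatures are equal and drops when one gas path runs hot or cold. The temperature values below are invented.

      # Shannon entropy of normalised exhaust temperatures as a uniformity measure.
      import numpy as np

      def temperature_entropy(temps):
          p = np.asarray(temps, dtype=float)
          p = p / p.sum()                        # normalise to a probability vector
          return float(-(p * np.log(p)).sum())   # nats; maximum is log(len(temps)) when uniform

      healthy = [510, 512, 509, 511, 510, 508]
      faulty = [510, 545, 470, 511, 510, 508]    # one hot and one cold gas path
      print(temperature_entropy(healthy), temperature_entropy(faulty))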

  3. Fault Detection and Diagnosis for Gas Turbines Based on a Kernelized Information Entropy Model

    Directory of Open Access Journals (Sweden)

    Weiying Wang

    2014-01-01

    Full Text Available Gas turbines are considered one of the most important devices in power engineering and have been widely used in power generation, airplanes, and naval ships, and also on oil drilling platforms. However, in most cases they are monitored without an operator on duty. It is highly desirable to develop techniques and systems to remotely monitor their conditions and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflect the overall state of the gas paths of the gas turbine. In addition, we also extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a certain recognition task. Finally, we introduce the information entropy based decision tree algorithm to extract rules from fault samples. The experiments on some real-world data show the effectiveness of the proposed algorithms.

  4. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

    OpenAIRE

    Somaya Hashem; Gamal Esmat; Wafaa Elakel; Shahira Habashy; Safaa Abdel Raouf; Samar Darweesh; Mohamad Soliman; Mohamed Elhefnawi; Mohamed El-Adawy; Mahmoud ElHefnawi

    2016-01-01

    Background/Aim. Given the prevalence of chronic hepatitis C worldwide, the use of noninvasive methods as an alternative for staging chronic liver diseases, in order to avoid the drawbacks of biopsy, is increasing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate...

  5. A Study of Tropical Cyclone Combination Forecast Model Based on the Cost-sensitive Analysis

    Directory of Open Access Journals (Sweden)

    Zhenhua Zhang

    2013-01-01

    Full Text Available In research on tropical cyclone forecast models, most conventional forecast methods adopt a single model. Moreover, traditional classification methods are based on the assumption that misclassification costs are equal, which means the risk of a missed report is treated as equal to that of a false alarm. Taking this into account, and considering the lack of combination models with high prediction accuracy, we propose a novel combinational classification method based on cost-sensitive analysis. First, the concept of the cost coefficient is presented. We then apply SVM, GRNN, PNN and 3 decision tree algorithms to build classification models and compare the forecast accuracy as well as the cost coefficient of these models. Lastly, based on the cost-sensitive analysis, three models with higher forecast accuracy and lower cost coefficient, named GRNN, PNN and C5.0, are selected to build a new combination forecast model for a complex system. Meteorological indices of 2,117 tropical cyclones from 1949 to 2012 are used to create a complex forecasting system, and all the tropical cyclones are classified into two groups: whether they will land on China or not. The final result is satisfactory: the overall accuracy is 81.81%. More importantly, the accuracy in identifying the tropical cyclones which have landed on China is 94.76%. The combination model significantly reduces the possibility of omitting landed tropical cyclones and performs better than any single model. Therefore, the combination model is an important reference for emergency management of disasters.

  6. Model-based geostatistics

    CERN Document Server

    Diggle, Peter J

    2007-01-01

    Model-based geostatistics refers to the application of general statistical principles of modeling and inference to geostatistical problems. This volume provides a treatment of model-based geostatistics and emphasizes on statistical methods and applications. It also features analyses of datasets from a range of scientific contexts.

  7. A Study on the Application of the Decision Tree Algorithm in Psychological Information of Vocational College Students

    Directory of Open Access Journals (Sweden)

    Cheng Dongmei

    2015-01-01

    Full Text Available This paper discusses the basic operating principle and the development status of data mining technology, analyzes the insufficiency of the existing psychological management system, and proposes the development trend of psychological health education in colleges. According to an analysis on factors affecting college students’ mental health and the deviation between the reality and the current number of students with psychological abnormality, this paper studies the application of data mining technology and puts forward a system based on data mining that combines the classified data mining technology with the existing psychological management system.

  8. A best-first soft/hard decision tree searching MIMO decoder for a 4 × 4 64-QAM system

    KAUST Repository

    Shen, Chungan

    2012-08-01

    This paper presents the algorithm and VLSI architecture of a configurable tree-searching approach that combines the features of classical depth-first and breadth-first methods. Based on this approach, techniques to reduce complexity while providing both hard and soft outputs decoding are presented. Furthermore, a single programmable parameter allows the user to tradeoff throughput versus BER performance. The proposed multiple-input-multiple-output decoder supports a 4 × 4 64-QAM system and was synthesized with 65-nm CMOS technology at 333 MHz clock frequency. For the hard output scheme the design can achieve an average throughput of 257.8 Mbps at 24 dB signal-to-noise ratio (SNR) with area equivalent to 54.2 Kgates and a power consumption of 7.26 mW. For the soft output scheme it achieves an average throughput of 83.3 Mbps across the SNR range of interest with an area equivalent to 64 Kgates and a power consumption of 11.5 mW. © 2011 IEEE.

  9. Analyze the soil attributes and sugarcane yield culture with the use of geostatistics and decision trees

    Directory of Open Access Journals (Sweden)

    Zigomar Menezes de Souza

    2010-04-01

    , applying the cell criterion, by using a yield monitor that allowed the elaboration of a digital map representing the surface of production of the studied area. To determine the soil attributes, soil samples were collected at the beginning of the harvest in 2006/2007 using a regular grid of 50 x 50m, in the depths of 0.0-0.2m and 0.2-0.4m. Soil attributes and sugarcane yield data were analyzed by using geostatistics techniques and were classified into three yield levels for the elaboration of the decision tree. The decision tree was induced in the software SAS Enterprise Miner, using an algorithm based on entropy reduction. Altitude and potassium presented the highest values of correlation with sugarcane yield. The induction of decision trees showed that the altitude is the variable with the greatest potential to interpret the sugarcane yield maps, then assisting in precision agriculture and, revealing an adjusted tool for the study of management definition zones in area cropped with sugarcane.

  10. Diagnosis of three types of constant faults in read-once contact networks over finite bases

    KAUST Repository

    Busbait, Monther

    2016-03-24

    We study the depth of decision trees for diagnosis of three types of constant faults in read-once contact networks over finite bases containing only indecomposable networks. For each basis and each type of faults, we obtain a linear upper bound on the minimum depth of decision trees depending on the number of edges in networks. For bases containing networks with at most 10 edges, we find sharp coefficients for linear bounds.

  11. Diagnosis of constant faults in read-once contact networks over finite bases

    KAUST Repository

    Busbait, Monther I.

    2015-03-01

    We study the depth of decision trees for diagnosis of constant 0 and 1 faults in read-once contact networks over finite bases containing only indecomposable networks. For each basis, we obtain a linear upper bound on the minimum depth of decision trees depending on the number of edges in the networks. For bases containing networks with at most 10 edges we find coefficients for linear bounds which are close to sharp. © 2014 Elsevier B.V. All rights reserved.

  12. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Directory of Open Access Journals (Sweden)

    Tochukwu Chiagunye

    2015-08-01

    Full Text Available ABSTRACT The design applied passive fault tolerance to a microcontroller-based data acquisition system to achieve the stated design considerations, whereby redundant sensors and microcontrollers with associated circuitry were designed and implemented to enable measurement of pollutant concentration information from chimney vents in two industries. Microsoft Visual Basic was used to develop a data mining tool which implemented an underlying artificial neural network model for forecasting pollutant concentrations for future time periods. The feed-forward back-propagation method was used to train the ANN model with a training data set, while a decision tree algorithm was used to select an optimal output result for the model from its two output neurons.

  13. CLINICAL DATABASE ANALYSIS USING DMDT BASED PREDICTIVE MODELLING

    Directory of Open Access Journals (Sweden)

    Srilakshmi Indrasenan

    2013-04-01

    Full Text Available In recent years, predictive data mining techniques have played a vital role in the field of medical informatics. These techniques help medical practitioners predict various classes, which is useful in treatment prediction. One such major difficulty is the prediction of the survival rate in breast cancer patients. Breast cancer is a common disease these days, and fighting against it is a tough battle for both the surgeons and the patients. To predict the survivability rate in breast cancer patients, which helps the medical practitioner select the type of treatment, a predictive data mining technique called Diversified Multiple Decision Tree (DMDT) classification is used. Additionally, to avoid difficulties arising from outlier and skewed data, it is also proposed to improve the training space by outlier filtering and oversampling. As a result, this novel approach gives the survivability rate of the cancer patients, based on which the medical practitioners can choose the type of treatment.

  14. Improvement of Tone Intelligibility for Average-Voice-Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2012-01-01

    Full Text Available Problem statement: Tone intelligibility in speech synthesis is an important attribute that should be taken into account. The tone correctness of the synthetic speech is degraded considerably in average-voice-based HMM-based Thai speech synthesis. The tying mechanism in the decision tree based context clustering, without an appropriate criterion, causes unexpected tone neutralization. Incorporation of the phrase intonation into the context clustering process in the training stage was proposed earlier; however, the tone correctness is still not satisfactory. Approach: This study proposes a number of tonal features, including tone-geometrical features and phrase intonation features, to be exploited in the context clustering process of the HMM training stage. Results: In the experiments, subjective evaluations of both the average voice and the adapted voice in terms of tone intelligibility are conducted. The effects of the extracted features on the decision trees are also evaluated. By considering the gender of the training speech, two core experiments were conducted. The first experiment shows that the proposed tonal features improve the tone intelligibility of the female speech model more than that of the male speech model, while the second experiment shows that the proposed tonal features improve the tone intelligibility more for the gender-dependent model than for the gender-independent model. Conclusion: All of the experimental results confirm that the tone correctness of the speech synthesized by the average-voice-based HMM-based Thai speech synthesis is significantly improved when using most of the extracted features.

  15. Assessment of Shallow Landslide Initiation Areas Using stochastic Modelling: The Vernazza Torrent Case Study, Liguria, Italy. GI_Forum|GI_Forum 2015 – Geospatial Minds for Society|

    OpenAIRE

    Schmaltz, Elmar; Rosner, Hans-Joachim; Märker, Michael

    2015-01-01

    The objective of this study is the assessment of potential failure zones of landslides in unstable areas. For this purpose, two different stochastic classification models were used: A boosted decision tree approach with TreeNet (TN), and a bagging decision tree approach with Random Forests (RF). Both topographic and soil parameters were considered as predictor variables for training and testing the models. We assume that several predictor variables will lead to misclassification and incorrect...

  16. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS

    Science.gov (United States)

    Tien Bui, Dieu; Pradhan, Biswajeet; Nampak, Haleh; Bui, Quang-Thanh; Tran, Quynh-An; Nguyen, Quoc-Phi

    2016-09-01

    This paper proposes a new artificial intelligence approach based on neural fuzzy inference system and metaheuristic optimization for flood susceptibility modeling, namely MONF. In the new approach, the neural fuzzy inference system was used to create an initial flood susceptibility model and then the model was optimized using two metaheuristic algorithms, Evolutionary Genetic and Particle Swarm Optimization. A high-frequency tropical cyclone area of the Tuong Duong district in Central Vietnam was used as a case study. First, a GIS database for the study area was constructed. The database that includes 76 historical flood inundated areas and ten flood influencing factors was used to develop and validate the proposed model. Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Receiver Operating Characteristic (ROC) curve, and area under the ROC curve (AUC) were used to assess the model performance and its prediction capability. Experimental results showed that the proposed model has high performance on both the training (RMSE = 0.306, MAE = 0.094, AUC = 0.962) and validation dataset (RMSE = 0.362, MAE = 0.130, AUC = 0.911). The usability of the proposed model was evaluated by comparing with those obtained from state-of-the art benchmark soft computing techniques such as J48 Decision Tree, Random Forest, Multi-layer Perceptron Neural Network, Support Vector Machine, and Adaptive Neuro Fuzzy Inference System. The results show that the proposed MONF model outperforms the above benchmark models; we conclude that the MONF model is a new alternative tool that should be used in flood susceptibility mapping. The result in this study is useful for planners and decision makers for sustainable management of flood-prone areas.
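
    The three performance measures reported above (RMSE, MAE and AUC) can be computed directly from the observed flood labels and the modelled susceptibility scores; the arrays in the sketch below are placeholders, not the study's data.

      # RMSE, MAE and ROC AUC on a held-out validation set (placeholder arrays).
      import numpy as np
      from sklearn.metrics import mean_absolute_error, mean_squared_error, roc_auc_score

      y_true = np.array([0, 0, 1, 1, 1, 0])                # observed non-flood / flood
      y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3])   # modelled susceptibility

      rmse = np.sqrt(mean_squared_error(y_true, y_score))
      mae = mean_absolute_error(y_true, y_score)
      auc = roc_auc_score(y_true, y_score)
      print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  AUC={auc:.3f}")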

  17. Application of decision tree combined with filtered biomarkers in the diagnosis of lung cancer

    Institute of Scientific and Technical Information of China (English)

    何其栋; 魏小玲; 张红巧; 王威; 吴拥军

    2014-01-01

    Aim: To establish a decision tree model based on filtered tumor biomarkers, combining decision tree techniques with a multiple tumor marker protein biochip, to achieve rapid diagnosis of lung cancer. Methods: The serum levels of 9 tumor markers (CEA, CA199, NSE, CA242, Ferritin, CA125, AFP, HGH and CA153) in 199 patients with lung cancer and 201 patients with benign pulmonary lesions were measured with a quantitative tumor marker detection kit, and the tumor markers were filtered by logistic regression to obtain the preferred marker panel; decision tree (C5.0) and Fisher discriminant analysis models were built before and after filtering. Results: The serum levels of the 9 tumor markers in patients with lung cancer were significantly higher than those in patients with benign pulmonary lesions (P < 0.05). The prediction accuracies of the Fisher discriminant analysis model and the decision tree model built on all 9 markers before filtering, and of those built on the 6 filtered markers, were 86.0%, 92.5%, 84.5% and 91.5%, respectively. The AUCs of the ROC curves of the decision tree models before and after filtering were 0.925 and 0.915, both higher than those of Fisher discriminant analysis (0.860 and 0.845; Z = 4.462 and 4.575, both P < 0.01); however, neither the decision tree model nor the Fisher discriminant analysis differed significantly before versus after filtering (Z = 1.914 and 1.074, both P > 0.05). Conclusion: The decision tree model based on the 6 filtered tumor markers performs better than Fisher discriminant analysis for the diagnosis of lung cancer.

  18. Model-based segmentation

    OpenAIRE

    Heimann, Tobias; Delingette, Hervé

    2011-01-01

    This chapter starts with a brief introduction to model-based segmentation, explaining the basic concepts and different approaches. Subsequently, two segmentation approaches are presented in more detail: First, the method of deformable simplex meshes is described, explaining the special properties of the simplex mesh and the formulation of the internal forces. Common choices for image forces are presented, along with how to evolve the mesh to adapt to certain structures. Second, the method of point...

  19. Spectral classification of planted area with sugarcane through the decision tree

    Directory of Open Access Journals (Sweden)

    Rafael C. Delgado

    2012-04-01

    Full Text Available The objective of this work was to test the "decision tree" classifier on data from orbital sensors to identify areas planted with sugarcane at different planting dates on the Boa Fé farm, located in the Triângulo Mineiro region, in the municipality of Conquista, Minas Gerais, Brazil. Remote Sensing (RS) techniques were coupled to a Geographic Information System (GIS) module, allowing a temporal analysis of land use and occupation, especially in order to identify and monitor agricultural areas. Based on the calculation of the mean bias (VM), the study showed that in sugarcane areas where irrigation is frequent and significant rainfall occurs prior to the Landsat-5 overpass, the estimated values were slightly underestimated, with a mean bias of -0.13 ha. It was also verified that higher NDVI values led to a slight overestimation of the results, with mean bias values ranging from 0.04 to 0.23 ha. According to the results, the decision tree classifier showed great potential for mapping areas cultivated with sugarcane.

  20. Model Based Definition

    Science.gov (United States)

    Rowe, Sidney E.

    2010-01-01

    In September 2007, the Engineering Directorate at the Marshall Space Flight Center (MSFC) created the Design System Focus Team (DSFT). MSFC was responsible for the in-house design and development of the Ares 1 Upper Stage and the Engineering Directorate was preparing to deploy a new electronic Configuration Management and Data Management System with the Design Data Management System (DDMS) based upon a Commercial Off The Shelf (COTS) Product Data Management (PDM) System. The DSFT was to establish standardized CAD practices and a new data life cycle for design data. Of special interest here, the design teams were to implement Model Based Definition (MBD) in support of the Upper Stage manufacturing contract. It is noted that this MBD does use partially dimensioned drawings for auxiliary information to the model. The design data lifecycle implemented several new release states to be used prior to formal release that allowed the models to move through a flow of progressive maturity. The DSFT identified some 17 Lessons Learned as outcomes of the standards development, pathfinder deployments and initial application to the Upper Stage design completion. Some of the high value examples are reviewed.

  1. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Full Text Available Educational Data Mining (EDM) is gaining importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. Such data are growing continuously and contain hidden knowledge that could be very useful to users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter is the tool to achieve such discovery. Data mining must handle very complex and diverse situations to reach quality solutions. Therefore, data mining is a research field where many advances are being made to accommodate and solve emerging problems. For this purpose, many techniques are usually considered. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Concretely, we have used top-down induction of decision trees algorithms to extract the patterns, because these models, decision trees, are easily understandable. In addition, the validation processes conducted have assured high-quality models.

  2. Text Classifier Based on an Improved SVM Decision Tree

    Institute of Scientific and Technical Information of China (English)

    赵天昀

    2010-01-01

    Combining SVM with a binary decision tree to form an SVM decision tree can effectively solve the multi-class text classification problem. On this basis, a between-class separability measure based on Support Vector Data Description (SVDD) is introduced to improve the SVM decision tree classifier. Experiments show that the method effectively improves the classification accuracy and speed of the SVM decision tree multi-class classifier.

  3. Decision Tree Based Detection of Botnet Flow

    Institute of Scientific and Technical Information of China (English)

    谢开斌; 蔡皖东; 蔡俊朝

    2008-01-01

    Botnets are currently one of the major security threats facing the Internet, and detecting potential botnet traffic in a network is of great significance for improving Internet security. This paper focuses on botnets based on the IRC protocol and, using the session characteristics between bot hosts and the chat server, proposes a decision-tree-based method for detecting botnet traffic. Experiments show that the method is feasible.
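
    A minimal sketch of the idea, assuming synthetic IRC session features (the feature names are illustrative, not those defined in the paper): a shallow decision tree separates bot-like sessions, which tend to be short, frequent and highly regular, from normal chat traffic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 1000
# Bots tend to send short, frequent, highly periodic messages.
is_bot = rng.integers(0, 2, size=n)
msg_interval_std = np.where(is_bot, rng.normal(0.2, 0.05, n), rng.normal(2.0, 0.8, n))
mean_msg_len = np.where(is_bot, rng.normal(20, 5, n), rng.normal(60, 25, n))
msgs_per_minute = np.where(is_bot, rng.normal(30, 5, n), rng.normal(5, 3, n))
X = np.column_stack([msg_interval_std, mean_msg_len, msgs_per_minute])

X_tr, X_te, y_tr, y_te = train_test_split(X, is_bot, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), target_names=["normal", "bot"]))
```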

  4. Urban Elders' Desirable Caring Patterns and Its Rationality: A Decision Tree Analysis

    Institute of Scientific and Technical Information of China (English)

    高晓路; 颜秉秋; 季珏

    2012-01-01

    Based on a questionnaire survey in Beijing, the desirable caring patterns of the urban elderly were investigated. With a decision tree analysis approach, the respondents' choices among four different caring patterns (living independently, family care, community care, and institutional care) were revealed for two scenarios: one in the healthy stage and one in which a person is in need of long-term care. The rationality of the preferred caring patterns was then examined. First of all, the study documented the lifestyle change of the Chinese elderly, characterized by a tremendous number of no-child families. There was a huge gap between the needs of people in different health stages. In particular, about half of the respondents intended to go to nursing homes if they were in need of care, while only 5.7% intended to do so while healthy. However, the severe shortage of caring facilities was a critical issue, especially those for disabled and semi-disabled people, and it would be unrealistic to provide enough nursing beds in the future. Considering the capacity of service supply, it was proposed that appropriate ratios for the (semi-)disabled elderly choosing institutional care and community care in the year 2020 could be 35% and 30%, respectively. Furthermore, people aged under 70 should be the main target of demand management, most of whom had demonstrated a strong preference for institutional care in the future.

  5. Assessment of the regional landslide susceptibility based on GIS

    Science.gov (United States)

    Sun, Ze; Xie, Shijie; Zhang, Kexin; Zheng, Xinshen; Zhu, Yunhai

    2007-06-01

    Landslides are one of the major geological disasters in the Minhe area on the border of Gansu and Qinghai. Based on a detailed field investigation of landslide susceptibility in the Minhe area, this paper selected four principal controlling factors to establish a digital assessment standard of regional landslide susceptibility through the construction of a mathematical model and a scoring diagram of regional landslide susceptibility. Meanwhile, the method and workflow of multisource-data integration for geological mapping were initially set up. Two premises for conducting multisource-data integration during a digital regional geological survey were determined, namely a geological problem and a mathematical model applicable to various geoscience research data. The two mathematical methods used throughout the workflow were the Analytic Hierarchy Process (AHP) and the decision tree. Digital quantification of different data types, both qualitative and quantitative, was realized with AHP, so that those data could be imported into the mathematical formula and participate in the calculation as variables. The decision tree achieved artificially intelligent classification of spatial data such as remote sensing imagery. Finally, a landslide susceptibility assessment map of the Minhe area was obtained, which was broadly consistent with the actual landslide distribution in the region when compared with field conditions.

  6. Assessment for the Model Predicting of the Cognitive and Language Ability in the Mild Dementia by the Method of Data-Mining Technique

    Directory of Open Access Journals (Sweden)

    Haewon Byeon

    2016-06-01

    Full Text Available Assessments of cognitive and verbal functions are widely used as screening tests to detect early dementia. This study developed an early dementia prediction model for Korean elderly in local communities based on the random forest algorithm and compared its results and precision with those of a logistic regression model and a decision tree model. The subjects of the study were 418 elderly (135 males and 283 females) over the age of 60 in local communities. The outcome was defined as having dementia, and the explanatory variables included digit span forward, digit span backward, confrontational naming, Rey Complex Figure Test (RCFT) copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, RCFT recognition false positive, Seoul Verbal Learning Test (SVLT) immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, Korean Color Word Stroop Test (K-CWST) color reading correct, and K-CWST color reading error. The random forest algorithm was used to develop the prediction model, and the result was compared with a logistic regression model and a decision tree based on the chi-squared automatic interaction detector (CHAID). The tests with a high level of predictive power for the detection of early dementia were verbal memory, visuospatial memory, naming, visuospatial functions, and executive functions. In addition, the random forest model was more accurate than logistic regression and CHAID. In order to detect early dementia effectively, the development of screening test programs composed of tests with high predictive power is required.
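
    The model comparison described above can be sketched as follows, using scikit-learn's CART tree as a stand-in for CHAID (which scikit-learn does not provide) and synthetic neuropsychological scores in place of the study data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 418
X = rng.normal(size=(n, 14))                       # 14 cognitive/verbal test scores
logit = 1.2 * X[:, 0] + 0.8 * X[:, 5] - 0.6 * X[:, 9]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = early dementia

models = {
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree (CHAID stand-in)": DecisionTreeClassifier(max_depth=4, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUC = {auc:.3f}")
```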

  7. Statistical Model for Prediction of Diabetic Foot Disease in Type 2 Diabetic Patients

    Directory of Open Access Journals (Sweden)

    Raúl López Fernández

    2016-02-01

    Full Text Available Background: the need to predict and study diabetic foot problems is a critical issue and represents a major medical challenge. Reducing its incidence can improve patients' quality of life and lessen the socio-economic impact, given the high prevalence of diabetes in the working population. Objective: to design a statistical model for the prediction of diabetic foot disease in type 2 diabetic patients. Methods: a descriptive study was conducted in patients attending the Diabetes Clinic in Cienfuegos from 2010 to 2013. Significant risk factors for diabetic foot disease were analyzed as variables. To design the model, binary logistic regression analysis and a Chi-squared automatic interaction detection decision tree were used. Results: two models that behaved similarly according to the comparison criteria considered (percentage of correct classification, sensitivity and specificity) were developed. Validation was established through the receiver operating characteristic curve. The model using Chi-squared automatic interaction detection showed the best predictive results. Conclusions: Chi-squared automatic interaction detection decision trees have an adequate predictive capacity and can be used in the Diabetes Clinic of Cienfuegos municipality.
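
    The ROC-based validation step can be illustrated with a minimal sketch: a logistic regression model is fitted to assumed risk factors and its discrimination is summarised by the AUC and a Youden-optimal threshold. Variable names and data are placeholders, not those of the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 6))          # placeholder risk factors
y = (X[:, 0] + 0.7 * X[:, 1] + rng.normal(scale=1.0, size=n) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, probs)
print("AUC:", round(roc_auc_score(y_te, probs), 3))
# Youden's J picks the threshold that balances sensitivity and specificity.
best = np.argmax(tpr - fpr)
print("suggested threshold:", round(thresholds[best], 3),
      "sensitivity:", round(tpr[best], 3), "specificity:", round(1 - fpr[best], 3))
```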

  8. Analysis of fluidized bed granulation process using conventional and novel modeling techniques.

    Science.gov (United States)

    Petrović, Jelena; Chansanroj, Krisanin; Meier, Brigitte; Ibrić, Svetlana; Betz, Gabriele

    2011-10-01

    Various modeling techniques have been applied to analyze the fluidized-bed granulation process. The influence of various input parameters (product, inlet and outlet air temperature, consumption of liquid binder, granulation liquid-binder spray rate, spray pressure, drying time) on granulation output properties (granule flow rate, granule size determined using the light scattering method and sieve analysis, granule Hausner ratio, porosity and residual moisture) has been assessed. Both conventional and novel modeling techniques were used, such as screening tests, multiple regression analysis, self-organizing maps, artificial neural networks, decision trees and rule induction. Diverse testing of the developed models (internal and external validation) is discussed. Good correlation has been obtained between the predicted and the experimental data. It has been shown that nonlinear methods based on artificial intelligence, such as neural networks, are far better at generalization and prediction than conventional methods. The possibility of using self-organizing maps (SOMs), decision trees and rule induction to monitor and optimize the fluidized-bed granulation process has also been demonstrated. The findings can serve as guidance for the implementation of modeling techniques in fluidized-bed granulation process understanding and control. PMID:21839830

  9. LSTM based Conversation Models

    OpenAIRE

    Luan, Yi; Ji, Yangfeng; Ostendorf, Mari

    2016-01-01

    In this paper, we present a conversational model that incorporates both context and participant role for two-party conversations. Different architectures are explored for integrating participant role and context information into a Long Short-term Memory (LSTM) language model. The conversational model can function as a language model or a language generation model. Experiments on the Ubuntu Dialog Corpus show that our model can capture multiple turn interaction between participants. The propos...

  10. Fuzzy-logic-based resource allocation for isolated and multiple platforms

    Science.gov (United States)

    Smith, James F., III; Rhyne, Robert D., II

    2000-08-01

    Modern naval battle forces generally include many different platforms, each with its own sensors, radar, ESM, and communications. The sharing of information measured by local sensors via communication links across the battle group should allow for optimal or near-optimal decisions. The survival of the battle group, or of members of the group, depends on the automatic real-time allocation of various resources. A fuzzy logic algorithm has been developed that automatically allocates electronic attack resources in real time. The particular approach to fuzzy logic that is used is the fuzzy decision tree, a generalization of the standard artificial intelligence technique of decision trees. The controller must be able to make decisions based on rules provided by experts. The fuzzy logic approach allows the direct incorporation of expertise, forming a fuzzy linguistic description, i.e., a formal representation of the system in terms of fuzzy if-then rules. Genetic-algorithm-based optimization is conducted to determine the form of the membership functions for the fuzzy root concepts. The isolated-platform and multi-platform resource manager models are discussed, as well as the underlying multi-platform communication model. The resource manager is shown to exhibit excellent performance under many demanding scenarios.

  11. Machine Learning Approaches for Modeling Spammer Behavior

    CERN Document Server

    Islam, Md Saiful; Islam, Md Rafiqul

    2010-01-01

    Spam is commonly known as unsolicited or unwanted email messages on the Internet, posing a potential threat to Internet security. Users spend a valuable amount of time deleting spam emails. More importantly, ever-increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful at modeling spammer behavior, as spammers constantly change their tricks to circumvent these filters. The evasive tactics that spammers use are patterns, and these patterns can be modeled to combat spam. This paper investigates the possibilities of modeling spammer behavioral patterns with well-known classification algorithms such as the Naïve Bayesian classifier (Naïve Bayes), Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, which is a considerable enhancement of performance compared to similar spammer behavior modeling research.
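
    A rough sketch of the reported comparison, training Naïve Bayes, a decision tree and a linear SVM on bag-of-words features; the tiny inline corpus is a placeholder for a real spam/ham dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = [
    "win a free prize now click here", "limited offer cheap meds buy now",
    "meeting rescheduled to friday at noon", "please review the attached report",
    "urgent claim your lottery winnings", "lunch tomorrow with the project team",
    "exclusive deal act now free gift", "minutes from yesterday's design review",
] * 10                                  # repeat so cross-validation has enough samples
labels = [1, 1, 0, 0, 1, 0, 1, 0] * 10  # 1 = spam, 0 = ham

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("SVM", LinearSVC())]:
    pipe = make_pipeline(CountVectorizer(), clf)
    print(name, "accuracy:", cross_val_score(pipe, texts, labels, cv=5).mean())
```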

  12. A method of real-time fault diagnosis for power transformers based on vibration analysis

    International Nuclear Information System (INIS)

    In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection. (paper)

  13. A method of real-time fault diagnosis for power transformers based on vibration analysis

    Science.gov (United States)

    Hong, Kaixing; Huang, Hai; Zhou, Jianping; Shen, Yimin; Li, Yujie

    2015-11-01

    In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection.

  14. Keyphrase extraction based on topic feature

    Institute of Scientific and Technical Information of China (English)

    刘俊; 邹东升; 邢欣来; 李英豪

    2012-01-01

    Keyphrase extraction is the process of extracting a set of terms from a document. This paper proposes a novel topic feature for keyphrase extraction. The topic feature is computed from a topic model that models the topic-word distributions and the per-document topic distributions. Moreover, a keyphrase extraction approach based on bagged decision trees is proposed, which combines common features with the proposed topic feature. Experimental results demonstrate that the proposed topic feature improves keyphrase extraction, and that the approach based on bagged decision trees achieves effective performance.
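
    The following sketch shows one way the proposed topic feature could be combined with conventional features and fed to bagged decision trees. The corpus, the auxiliary features and the keyphrase labels are synthetic, and the exact feature definition is an assumption rather than the paper's formulation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import BaggingClassifier

docs = ["decision trees split data by information gain",
        "topic models describe documents as mixtures of topics",
        "bagging aggregates many decision trees into one classifier",
        "keyphrase extraction selects the most informative terms"] * 5

vec = CountVectorizer()
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

# Topic-term distributions: rows are topics, columns are vocabulary terms.
topic_term = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
topic_feature = topic_term.max(axis=0)          # one "topic feature" per term

rng = np.random.default_rng(0)
n_terms = len(vec.get_feature_names_out())
tf = np.asarray(counts.sum(axis=0)).ravel()     # corpus-level term frequency
first_pos = rng.random(n_terms)                 # placeholder positional feature
X = np.column_stack([tf, first_pos, topic_feature])
y = rng.integers(0, 2, size=n_terms)            # toy keyphrase/non-keyphrase labels

# BaggingClassifier's default base estimator is a decision tree.
clf = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```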

  15. Knowledge discovery from patients' behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services.

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (with an added parameter: Length), based on health care services for a public-sector hospital in Iran, taking into account the contrast between patient and customer loyalty, to estimate the customer lifetime value (CLV) of each patient. We used Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and identify target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the clustering result is used as the decision attribute in the classification process; second, the segmentation based on the CLV of patients (estimated by RFML) is used as the decision attribute. Finally, the results of the CHAID algorithm reveal significant hidden rules and identify existing patterns among hospital consumers. PMID:27610177
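
    The clustering-then-classification workflow can be sketched as below, with synthetic RFML-style attributes, K-means providing the segments, and scikit-learn's CART tree standing in for CHAID to learn readable segment rules.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(5)
n = 600
rfml = np.column_stack([
    rng.exponential(30, n),     # Recency: days since last visit
    rng.poisson(4, n),          # Frequency: visits per year
    rng.gamma(2.0, 150.0, n),   # Monetary: total billing
    rng.uniform(1, 10, n),      # Length: years of relationship
])

segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(rfml))

# The cluster label becomes the decision attribute for the classification step;
# the tree is fitted on the raw attributes so its rules read in original units.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(rfml, segments)
print(export_text(tree, feature_names=["Recency", "Frequency", "Monetary", "Length"]))
```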

  16. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (with an added parameter: Length), based on health care services for a public-sector hospital in Iran, taking into account the contrast between patient and customer loyalty, to estimate the customer lifetime value (CLV) of each patient. We used Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and identify target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the clustering result is used as the decision attribute in the classification process; second, the segmentation based on the CLV of patients (estimated by RFML) is used as the decision attribute. Finally, the results of the CHAID algorithm reveal significant hidden rules and identify existing patterns among hospital consumers.

  17. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    International Nuclear Information System (INIS)

    Ensemble-learning-based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of the toxicity of 1450 diverse chemicals. Eight non-quantum-mechanical molecular descriptors were derived. The structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models supplemented with stochastic gradient boosting and bagging algorithms were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to the prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. On the complete data, the optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external toxicity data for T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R2) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R2 and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the constructed
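
    As a hedged analogue of the DTB and DTF models, the sketch below compares scikit-learn's gradient boosted trees and a random forest of trees on a simulated descriptor/toxicity regression task; it is not the authors' implementation or data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(11)
n, n_desc = 1450, 8
X = rng.normal(size=(n, n_desc))                       # simulated molecular descriptors
y = 0.9 * X[:, 0] - 0.5 * X[:, 3] + 0.3 * X[:, 5] + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
models = {
    "boosted trees (DTB analogue)": GradientBoostingRegressor(random_state=0),
    "tree forest (DTF analogue)": RandomForestRegressor(n_estimators=300, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: R2 = {r2_score(y_te, pred):.3f}, "
          f"MSE = {mean_squared_error(y_te, pred):.3f}")
```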

  18. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    Energy Technology Data Exchange (ETDEWEB)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    2014-03-15

    Ensemble-learning-based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of the toxicity of 1450 diverse chemicals. Eight non-quantum-mechanical molecular descriptors were derived. The structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models supplemented with stochastic gradient boosting and bagging algorithms were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to the prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. On the complete data, the optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external toxicity data for T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R2) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R2 and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the

  19. Model-Based Reasoning

    Science.gov (United States)

    Ifenthaler, Dirk; Seel, Norbert M.

    2013-01-01

    In this paper, there will be a particular focus on mental models and their application to inductive reasoning within the realm of instruction. A basic assumption of this study is the observation that the construction of mental models and related reasoning is a slowly developing capability of cognitive systems that emerges effectively with proper…

  20. Model-based software design

    Science.gov (United States)

    Iscoe, Neil; Liu, Zheng-Yang; Feng, Guohui; Yenne, Britt; Vansickle, Larry; Ballantyne, Michael

    1992-01-01

    Domain-specific knowledge is required to create specifications, generate code, and understand existing systems. Our approach to automating software design is based on instantiating an application domain model with industry-specific knowledge and then using that model to achieve the operational goals of specification elicitation and verification, reverse engineering, and code generation. Although many different specification models can be created from any particular domain model, each specification model is consistent and correct with respect to the domain model.

  1. Model-based Software Engineering

    DEFF Research Database (Denmark)

    Kindler, Ekkart

    2010-01-01

    The vision of model-based software engineering is to make models the main focus of software development and to automatically generate software from these models. Part of that idea works already today. But, there are still difficulties when it comes to behaviour. Actually, there is no lack in models...

  2. Principles of models based engineering

    Energy Technology Data Exchange (ETDEWEB)

    Dolin, R.M.; Hefele, J.

    1996-11-01

    This report describes a Models Based Engineering (MBE) philosophy and implementation strategy that has been developed at Los Alamos National Laboratory's Center for Advanced Engineering Technology. A major theme in this discussion is that models based engineering is an information management technology enabling the development of information driven engineering. Unlike other information management technologies, models based engineering encompasses the breadth of engineering information, from design intent through product definition to consumer application.

  3. Model Construct Based Enterprise Model Architecture and Its Modeling Approach

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    In order to support enterprise integration, a model-construct-based enterprise model architecture and its modeling approach are studied in this paper. First, the structural makeup and internal relationships of the enterprise model architecture are discussed. Then, the concept of the reusable model construct (MC), which belongs to the control view and can help to derive other views, is proposed. The modeling approach based on model constructs consists of three steps: reference model architecture synthesis, enterprise model customization, and system design and implementation. Following the MC-based modeling approach, a case study set in one-kind-product machinery manufacturing enterprises is illustrated. It is shown that the proposed model-construct-based enterprise model architecture and modeling approach are practical and efficient.

  4. Comparison between Decision Tree and Logistic Regression Applied in the Study of Health Status and Correlates in the Government Employee in a District of Tianjin

    Institute of Scientific and Technical Information of China (English)

    魏凤江; 崔壮; 李长平; 宋春华; 朱宝; 刘媛媛; 马骏

    2013-01-01

    Objective: To understand the factors influencing the health status of government employees in a district of Tianjin and to provide a basis for improving the health of this population. Methods: A questionnaire survey on health status and its influencing factors was conducted among government employees in a district of Tianjin from September to December 2008, using cluster sampling. Decision tree and logistic regression models were built with the SAS 8.2 Enterprise Miner module to analyze and predict the factors influencing health status in this population. Results: The overall prevalence of disease was 47.0%. The models identified the following factors influencing health status: age, body mass index, smoking, passive smoking, drinking, sleep time, regular diet, time spent on physical exercise, education, marital status, sub-health score and mental health score. The predictive performance of the logistic regression model and the decision tree model was compared using the area under the ROC curve, and the difference was not statistically significant (χ2 = 1.6073, P = 0.2049). Conclusion: The health status of government employees is far from ideal, with a relatively high prevalence of various chronic diseases; this population should be a key target group for future health management.

  5. Probabilistic flood damage modelling at the meso-scale

    Science.gov (United States)

    Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno

    2014-05-01

    Decisions on flood risk management and adaptation are usually based on risk analyses. Such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention during the last years, they are still not standard practice for flood risk assessments. Most damage models have in common that complex damaging processes are described by simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood damage models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we show how the model BT-FLEMO (Bagging decision Tree based Flood Loss Estimation MOdel) can be applied on the meso-scale, namely on the basis of ATKIS land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany. The application of BT-FLEMO provides a probability distribution of estimated damage to residential buildings per municipality. Validation is undertaken on the one hand via a comparison with eight other damage models, including stage-damage functions as well as multi-variate models. On the other hand, the results are compared with official damage data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of damage estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation model BT-FLEMO is that it inherently provides quantitative information about the uncertainty of the prediction. Reference: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64.
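
    The sketch below illustrates the core idea behind a probabilistic, bagged-tree loss model: querying each tree of the ensemble separately yields a distribution of damage estimates rather than a single value. Inputs, units and the loss relationship are invented for illustration; this is not BT-FLEMO itself.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(2013)
n = 800
water_depth = rng.uniform(0, 3, n)             # inundation depth in metres
building_value = rng.uniform(5e4, 5e5, n)      # building value in EUR
precaution = rng.integers(0, 2, n)             # private precaution indicator
X = np.column_stack([water_depth, building_value, precaution])
loss = building_value * np.clip(0.1 * water_depth - 0.05 * precaution, 0, None) \
       + rng.normal(scale=2e3, size=n)

# BaggingRegressor's default base estimator is a regression tree.
model = BaggingRegressor(n_estimators=200, random_state=0).fit(X, loss)

# Per-tree predictions for one unit give an empirical damage distribution.
unit = np.array([[1.5, 2.0e5, 0]])
per_tree = np.array([tree.predict(unit)[0] for tree in model.estimators_])
print("mean loss:", round(per_tree.mean(), 1))
print("5th-95th percentile:", np.percentile(per_tree, [5, 95]))
```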

  6. A heuristic finite-state model of the human driver in a car-following situation

    Science.gov (United States)

    Burnham, G. O.; Bekey, G. A.

    1976-01-01

    An approach to modeling human driver behavior in single-lane car following, based on a finite-state decision structure, is considered. The specific strategy at each point in the decision tree was obtained from observations of typical driver behavior. The synthesis of the decision logic is based on position and velocity thresholds and four states defined by regions in the phase plane. The performance of the resulting, intuitively logical model was compared with actual freeway data. The match of the model to the data was optimized by adapting the model parameters using a modified PARTAN algorithm. The results indicate that the heuristic model matches actual car-following performance better during deceleration and constant-velocity phases than during acceleration periods.

  7. Classification-based Data Mining Approach for Quality Control in Wine Production

    Directory of Open Access Journals (Sweden)

    P. Appalasamy

    2012-01-01

    Full Text Available Modeling the complex human taste is an important focus in the wine industry. The main purpose of this study was to predict wine quality based on physicochemical data. This study was also conducted to identify outliers or anomalies in the sample wine set in order to detect adulteration of wine. In this project, two large separate datasets are used, containing 1,599 instances for red wine and 4,989 instances for white wine with 11 physicochemical attributes such as alcohol, pH and sulfates. Two classification algorithms, decision tree and Naïve Bayes, are applied to the datasets and the performance of the two algorithms is compared. Results showed that the decision tree (ID3) outperformed the Naïve Bayes technique, particularly on red wine, which is the most common type. The study also showed that two attributes, alcohol and volatile acidity, contribute highly to wine quality. White wine is also more sensitive to changes in physicochemistry than red wine, hence a higher level of handling care is necessary. This research concludes that the classification approach gives room for corrective measures to be taken in an effort to increase the quality of wine during production.
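
    A minimal sketch of the reported comparison, assuming the UCI red-wine quality file has been downloaded locally as winequality-red.csv (semicolon-separated, with a quality column); the quality score is binarised and a decision tree is compared with naive Bayes by cross-validation.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Placeholder path; the red-wine file uses ';' as a separator in the UCI distribution.
df = pd.read_csv("winequality-red.csv", sep=";")
X = df.drop(columns=["quality"])
y = (df["quality"] >= 6).astype(int)      # binarise quality into "good" vs "poor"

for name, clf in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("Naive Bayes", GaussianNB())]:
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: 10-fold accuracy = {acc:.3f}")
```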

  8. Graph Model Based Indoor Tracking

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Lu, Hua; Yang, Bin

    2009-01-01

    The tracking of the locations of moving objects in large indoor spaces is important, as it enables a range of applications related to, e.g., security and indoor navigation and guidance. This paper presents a graph model based approach to indoor tracking that offers a uniform data management...... infrastructure for different symbolic positioning technologies, e.g., Bluetooth and RFID. More specifically, the paper proposes a model of indoor space that comprises a base graph and mappings that represent the topology of indoor space at different levels. The resulting model can be used for one or several...... indoor positioning technologies. Focusing on RFID-based positioning, an RFID specific reader deployment graph model is built from the base graph model. This model is then used in several algorithms for constructing and refining trajectories from raw RFID readings. Empirical studies with implementations...

  9. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases the...... classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  10. Constructing a Soil Class Map of Denmark based on the FAO Legend Using Digital Techniques

    DEFF Research Database (Denmark)

    Adhikari, Kabindra; Minasny, Budiman; Greve, Mette Balslev;

    2014-01-01

    Soil mapping in Denmark has a long history and a series of soil maps based on conventional mapping approaches have been produced. In this study, a national soil map of Denmark was constructed based on the FAO–Unesco Revised Legend 1990 using digital soil mapping techniques, existing soil profile...... observations and environmental data. This map was developed using soil-landscape models generated with a decision tree-based digital soil mapping technique. As input variables in the model, more than 1170 soil profile data and 17 environmental variables including geology, land use, landscape type, area of...... overall prediction accuracy based on a 20% hold-back validation data was 60%, but increased to 76% when prediction accuracy of similar soil groups was considered. Podzoluvisols and Alisols were among the weakly predicted groups (< 48% prediction confidence), whereas Podzols and Luvisols had the highest...

  11. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Greve, Mogens Humlekrog; Bøcher, Peder Klith;

    2010-01-01

    Soil organic carbon (SOC) is one of the most important carbon stocks globally and has large potential to affect global climate. Distribution patterns of SOC in Denmark constitute a nation-wide baseline for studies on soil carbon changes (with respect to Kyoto protocol). This paper predicts and maps...... the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect......, mean curvature, plan curvature, profile curvature, flow accumulation, specific catchment area, tangent slope, tangent curvature, steady-state wetness index, Normalized Difference Vegetation Index (NDVI), Normalized Difference Wetness Index (NDWI) and Soil Color Index (SCI) were generated to...

  12. Method of modelization assistance with bond graphs and application to qualitative diagnosis of physical systems

    International Nuclear Information System (INIS)

    After recalling the usual diagnosis techniques (failure index, decision tree) and those based on an artificial intelligence approach, the author reports research aimed at exploring knowledge and model generation techniques. He focuses on the design of an aid-to-model-generation tool and an aid-to-diagnosis tool. The bond graph technique is shown to be well suited to aiding model generation, and is then adapted to aid diagnosis. The developed tool is applied to three projects: DIADEME (a diagnosis system based on a physical model), the improvement of the SEXTANT diagnosis system (an expert system for transient analysis), and the investigation of an Ariane 5 launcher component. Notably, the author uses the Reiter and Greiner algorithm

  13. Bank Customer Churn Decision Tree Prediction Algorithm under Data Mining Technology

    Institute of Scientific and Technical Information of China (English)

    石杨; 岳嘉佳

    2014-01-01

    In bank customer churn prediction systems, customer data are often used to predict service information for unknown customers, in order to provide a basis for the bank's future business strategy. When making predictions about customers, classification rules frequently need to be mined for certain categorical attributes. This paper discusses the use of the decision tree, a common and effective method, to mine classification rules from customer data.
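
    A small sketch (synthetic data, assumed attribute names) of mining readable churn classification rules with a decision tree, which is the kind of model the paper discusses.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(9)
n = 1000
balance = rng.gamma(2.0, 5000.0, n)
products = rng.integers(1, 5, n)
months_inactive = rng.integers(0, 13, n)
complaints = rng.poisson(0.3, n)
X = np.column_stack([balance, products, months_inactive, complaints])
# Customers with long inactivity, few products and more complaints churn more often.
churn_score = 0.25 * months_inactive - 0.8 * products + 1.5 * complaints
churn = (churn_score + rng.normal(scale=1.0, size=n) > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30, random_state=0)
tree.fit(X, churn)
print(export_text(tree, feature_names=["balance", "products",
                                       "months_inactive", "complaints"]))
```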

  14. Improving Text Categorization By Using A Topic Model

    Directory of Open Access Journals (Sweden)

    Wongkot Sriurai

    2011-12-01

    Full Text Available Most text categorization algorithms represent a document collection as a Bag of Words (BOW). The BOW representation is unable to recognize synonyms from a given term set and unable to recognize semantic relationships between terms. In this paper, we apply the topic-model approach to cluster the words into a set of topics. Words assigned to the same topic are semantically related. Our main goal is to compare the feature processing techniques of BOW and the topic model. We also apply and compare two feature selection techniques: Information Gain (IG) and Chi Squared (CHI). Three text categorization algorithms, Naïve Bayes (NB), Support Vector Machines (SVM) and Decision tree, are used for evaluation. The experimental results showed that the topic-model approach for representing the documents yielded the best performance, with an F1 measure of 79%, under the SVM algorithm with the IG feature selection technique.

  15. Base Flow Model Validation Project

    Data.gov (United States)

    National Aeronautics and Space Administration — The program focuses on turbulence modeling enhancements for predicting high-speed rocket base flows. A key component of the effort is the collection of...

  16. Quaternion-Based Signal Analysis for Motor Imagery Classification from Electroencephalographic Signals

    Directory of Open Access Journals (Sweden)

    Patricia Batres-Mendoza

    2016-03-01

    Full Text Available Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users were shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states.
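
    The quaternion rotation underlying QSA-style feature extraction can be illustrated in a few lines of NumPy: a unit quaternion rotates a three-channel sample in 3-D space via q v q*. The axis and angle are arbitrary examples, not parameters from the paper.

```python
import numpy as np

def quat_multiply(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def rotate(vector, axis, angle):
    """Rotate a 3-D vector by `angle` radians about `axis` using q * v * q_conjugate."""
    axis = axis / np.linalg.norm(axis)
    q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
    q_conj = q * np.array([1, -1, -1, -1])
    v = np.concatenate([[0.0], vector])
    return quat_multiply(quat_multiply(q, v), q_conj)[1:]

sample = np.array([0.3, -1.2, 0.7])                  # one 3-channel signal sample
rotated = rotate(sample, axis=np.array([0.0, 0.0, 1.0]), angle=np.pi / 2)
print(rotated)    # approximately [1.2, 0.3, 0.7] for a 90-degree rotation about z
```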

  17. Quaternion-Based Signal Analysis for Motor Imagery Classification from Electroencephalographic Signals.

    Science.gov (United States)

    Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R; Guerra-Hernandez, Erick I; Almanza-Ojeda, Dora L; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J; Ibarra-Manzano, Mario A

    2016-01-01

    Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users were shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states. PMID:26959029

  18. Quaternion-Based Signal Analysis for Motor Imagery Classification from Electroencephalographic Signals

    Science.gov (United States)

    Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R.; Guerra-Hernandez, Erick I.; Almanza-Ojeda, Dora L.; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J.; Ibarra-Manzano, Mario A.

    2016-01-01

    Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users were shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states. PMID:26959029

  19. Modeling Guru: Knowledge Base for NASA Modelers

    Science.gov (United States)

    Seablom, M. S.; Wojcik, G. S.; van Aartsen, B. H.

    2009-05-01

    Modeling Guru is an on-line knowledge-sharing resource for anyone involved with or interested in NASA's scientific models or High End Computing (HEC) systems. Developed and maintained by NASA's Software Integration and Visualization Office (SIVO) and the NASA Center for Computational Sciences (NCCS), Modeling Guru's combined forums and knowledge base for research and collaboration is becoming a repository for the accumulated expertise of NASA's scientific modeling and HEC communities. All NASA modelers and associates are encouraged to participate and provide knowledge about the models and systems so that other users may benefit from their experience. Modeling Guru is divided into a hierarchy of communities, each with its own set of forums and knowledge base documents. Current modeling communities include those for space science, land and atmospheric dynamics, atmospheric chemistry, and oceanography. In addition, there are communities focused on NCCS systems, HEC tools and libraries, and programming and scripting languages. Anyone may view most of the content on Modeling Guru (available at http://modelingguru.nasa.gov/), but you must log in to post messages and subscribe to community postings. The site offers a full range of "Web 2.0" features, including discussion forums, "wiki" document generation, document uploading, RSS feeds, search tools, blogs, email notification, and "breadcrumb" links. A discussion (a.k.a. forum "thread") is used to post comments, solicit feedback, or ask questions. If marked as a question, SIVO will monitor the thread, and normally respond within a day. Discussions can include embedded images, tables, and formatting through the use of the Rich Text Editor. Also, the user can add "Tags" to their thread to facilitate later searches. The "knowledge base" comprises documents that are used to capture and share expertise with others. The default "wiki" document lets users edit within the browser so others can easily collaborate on the

  20. Autoencoder-based identification of predictors of Indian monsoon

    Science.gov (United States)

    Saha, Moumita; Mitra, Pabitra; Nanjundiah, Ravi S.

    2016-02-01

    Prediction of the Indian summer monsoon uses a number of climatic variables that are historically known to provide high skill. However, relationships between predictors and predictand can be complex and also change with time. The present work attempts to use a machine learning technique to identify new predictors for forecasting the Indian monsoon. A neural-network-based non-linear dimensionality reduction technique, namely the sparse autoencoder, is used for this purpose. It extracts a number of new predictors that have prediction skills higher than the existing ones. Two non-linear ensemble prediction models, based on regression trees and bagged decision trees, are designed with the identified monsoon predictors and are shown to be superior in terms of prediction accuracy. The proposed model shows a mean absolute error of 4.5% in predicting the Indian summer monsoon rainfall. Lastly, the geographical distribution of the new monsoon predictors and their characteristics are discussed.

  1. Event-Based Activity Modeling

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    2004-01-01

    We present and discuss a modeling approach that supports event-based modeling of information and activity in information systems. Interacting human actors and IT-actors may carry out such activity. We use events to create meaningful relations between information structures and the related...

  2. Grid-based Support for Different Text Mining Tasks

    Directory of Open Access Journals (Sweden)

    Martin Sarnovský

    2009-12-01

    Full Text Available This paper provides an overview of our research activities aimed at the efficient use of Grid infrastructure to solve various text mining tasks. Grid-enabling of various text mining tasks was mainly driven by the increasing volume of processed data. Utilizing the Grid services approach therefore enables various text mining scenarios to be performed and also opens ways to design distributed modifications of existing methods. In particular, some parts of the mining process can significantly benefit from the decomposition paradigm; in this study we present our approach to data-driven decomposition of a decision tree building algorithm, a clustering algorithm based on self-organizing maps, and its application in a conceptual model building task using an FCA-based algorithm. The work presented in this paper is rather to be considered a 'proof of concept' for the design and implementation of decomposition methods, as we performed the experiments mostly on standard textual databases.

  3. Physical activity recognition based on rotated acceleration data using quaternion in sedentary behavior: a preliminary study.

    Science.gov (United States)

    Shin, Y E; Choi, W H; Shin, T M

    2014-01-01

    This paper suggests a quaternion-based physical activity assessment method. To reduce user inconvenience, we measured activity using a mobile device that is not placed in a fixed position. The recognition results were verified with various machine learning algorithms, such as a neural network (multilayer perceptron), a decision tree (J48), an SVM (support vector machine) and a naive Bayes classifier. All algorithms showed over 97% accuracy, including the decision tree (J48), which recognized the activity with 98.35% accuracy. As a result, the physical activity assessment method based on acceleration data rotated using quaternions can classify sedentary behavior more accurately, regardless of the device's position and orientation. PMID:25571109

  4. A Prediction Model for Mild Cognitive Impairment Using Random Forests

    Directory of Open Access Journals (Sweden)

    Haewon Byeon

    2015-12-01

    Full Text Available Dementia is a geriatric disease which has emerged as a serious social and economic problem in an aging society, and early diagnosis is very important. In particular, early diagnosis and early intervention of Mild Cognitive Impairment (MCI), the preliminary stage of dementia, can reduce the onset rate of dementia. This study developed an MCI prediction model for the Korean elderly in local communities and provides basic material for the prevention of cognitive impairment. The subjects of this study were 3,240 elderly (1,502 males, 1,738 females) in local communities over the age of 65 who participated in the Korean Longitudinal Survey of Aging conducted in 2012. The outcome was defined as having MCI, and the explanatory variables were gender, age, level of education, level of income, marital status, smoking, drinking habits, regular exercise more than once a week, monthly average hours of participation in social activities, subjective health, diabetes and high blood pressure. The Random Forests algorithm was used to develop the prediction model, and the result was compared with a logistic regression model and a decision tree model. Significant predictors of MCI were age, gender, level of education, level of income, subjective health, marital status, smoking, drinking, regular exercise and high blood pressure. In addition, the Random Forests model was more accurate than the logistic regression and decision tree models. Based on these results, it is necessary to build a monitoring system which can diagnose MCI at an early stage.

  5. Modelling Gesture Based Ubiquitous Applications

    CERN Document Server

    Zacharia, Kurien; Varghese, Surekha Mariam

    2011-01-01

    A cost-effective, gesture-based modelling technique called Virtual Interactive Prototyping (VIP) is described in this paper. Prototyping is implemented by projecting a virtual model of the equipment to be prototyped, and users can interact with the virtual model as if it were the original working equipment. Image and sound processing techniques are used for capturing and tracking the user's interactions with the model. VIP is a flexible and interactive prototyping method with many applications in ubiquitous computing environments. Various commercial and socio-economic applications of VIP, as well as its extension to interactive advertising, are also discussed.

  6. Sketch-based geologic modeling

    Science.gov (United States)

    Rood, M. P.; Jackson, M.; Hampson, G.; Brazil, E. V.; de Carvalho, F.; Coda, C.; Sousa, M. C.; Zhang, Z.; Geiger, S.

    2015-12-01

    Two-dimensional (2D) maps and cross-sections, and 3D conceptual models, are fundamental tools for understanding, communicating and modeling geology. Yet geologists lack dedicated and intuitive tools that allow rapid creation of such figures and models. Standard drawing packages produce only 2D figures that are not suitable for quantitative analysis. Geologic modeling packages can produce 3D models and are widely used in the groundwater and petroleum communities, but are often slow and non-intuitive to use, requiring the creation of a grid early in the modeling workflow and the use of geostatistical methods to populate the grid blocks with geologic information. We present an alternative approach to rapidly create figures and models using sketch-based interface and modelling (SBIM). We leverage methods widely adopted in other industries to prototype complex geometries and designs. The SBIM tool contains built-in geologic rules that constrain how sketched lines and surfaces interact. These rules are based on the logic of superposition and cross-cutting relationships that follow from rock-forming processes, including deposition, deformation, intrusion and modification by diagenesis or metamorphism. The approach allows rapid creation of multiple, geologically realistic, figures and models in 2D and 3D using a simple, intuitive interface. The user can sketch in plan- or cross-section view. Geologic rules are used to extrapolate sketched lines in real time to create 3D surfaces. Quantitative analysis can be carried out directly on the models. Alternatively, they can be output as simple figures or imported directly into other modeling tools. The software runs on a tablet PC and can be used in a variety of settings including the office, classroom and field. The speed and ease of use of SBIM enables multiple interpretations to be developed from limited data, uncertainty to be readily appraised, and figures and models to be rapidly updated to incorporate new data or concepts.

  7. Design and implementation of an intelligent learning guidance model based on attribute correlation

    Institute of Scientific and Technical Information of China (English)

    张春飞; 李万龙; 魏久鸿

    2012-01-01

    Decision tree classification algorithms are one of the effective tools for realizing the "intelligence" of an intelligent guidance system. Through analysis and mining of the data, they can achieve precise classification, and both the decision tree and the resulting set of production rules are simple and computationally efficient. This paper presents the intelligent guidance system and introduces its main modules. After comparing the ID3 and C4.5 algorithms and considering the requirements of individualized teaching, it proposes a new C4.5r algorithm based on rule attribute correlation, and the evaluation module of the system is also given. Experiments show that the C4.5r algorithm clearly outperforms the traditional C4.5 algorithm in terms of run time, the size of the production rule set, and the overhead of computing the production rules.
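    For context, the practical difference between ID3 and C4.5 hinges on the splitting criterion: ID3 ranks attributes by information gain, while C4.5 normalizes the gain by the attribute's split information to obtain the gain ratio. A small, self-contained sketch of that computation is shown below; it illustrates the standard C4.5 criterion only, not the authors' C4.5r attribute-correlation variant, whose weighting is not specified in this abstract.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr_index):
    """Information gain of splitting on one attribute, divided by its split info (C4.5)."""
    total_entropy = entropy(labels)
    n = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    gain = total_entropy - sum(len(part) / n * entropy(part)
                               for part in partitions.values())
    split_info = -sum(len(part) / n * math.log2(len(part) / n)
                      for part in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Toy example: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
print(gain_ratio(rows, labels, 0), gain_ratio(rows, labels, 1))  # 1.0, 0.0
```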

  8. HMM-based Trust Model

    DEFF Research Database (Denmark)

    ElSalamouny, Ehab; Nielsen, Mogens; Sassone, Vladimiro

    2010-01-01

    Probabilistic trust has been adopted as an approach to taking security sensitive decisions in modern global computing environments. Existing probabilistic trust frameworks either assume fixed behaviour for the principals or incorporate the notion of ‘decay' as an ad hoc approach to cope with thei...... the major limitation of existing Beta trust model. We show the consistency of the HMM-based trust model and contrast it against the well known Beta trust model with the decay principle in terms of the estimation precision....

  9. A decision support model for reducing electric energy consumption in elementary school facilities

    International Nuclear Information System (INIS)

    Highlights: ► A decision support model is developed to reduce CO2 emissions in elementary schools. ► The model can select the school expected to be most effective in generating energy savings. ► The decision tree improved prediction accuracy by 1.83–3.88%. ► Using the model, decision-makers can reduce electric-energy consumption by 16.58%. ► The model can make the educational-facility improvement program more effective. -- Abstract: The South Korean government has been actively promoting an educational-facility improvement program as part of its energy-saving efforts. This research seeks to develop a decision support model for selecting the facility expected to be effective in generating energy savings, thereby making the facility improvement program more effective. Project characteristics and electric-energy consumption data for the year 2009 were collected from 6282 elementary schools located in seven metropolitan cities in South Korea. The following were carried out: (i) educational facilities were grouped based on electric-energy consumption, using a decision tree; (ii) a number of similar projects were retrieved from the same group of facilities, using case-based reasoning; and (iii) the accuracy of prediction was improved, using a combination of genetic algorithms, an artificial neural network, and multiple regression analysis. The results of this research can be useful for the following purposes: (i) preliminary research on the systematic and continuous management of educational facilities' electric-energy consumption; (ii) basic research on electric-energy consumption prediction based on project characteristics; and (iii) practical research for selecting an optimum facility to which an educational-facility improvement program can be applied more effectively, with the model serving as decision support.
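    The two-stage retrieval described above, first assigning a facility to a consumption group with a decision tree and then retrieving similar past projects within that group by case-based reasoning, can be sketched roughly as follows. This is an illustrative outline only; the synthetic features, group boundaries and similarity measure are assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical project characteristics (floor area, number of students,
# building age, ...) and observed annual electricity use per school.
rng = np.random.default_rng(0)
X = rng.random((500, 4))
consumption = X @ np.array([3.0, 1.5, 0.5, 2.0]) + rng.normal(0, 0.1, 500)
groups = np.digitize(consumption, np.quantile(consumption, [0.33, 0.66]))

# Stage 1: a decision tree assigns each school to a consumption group.
tree = DecisionTreeClassifier(max_depth=4).fit(X, groups)

def retrieve_similar(query, k=5):
    """Stage 2: case-based reasoning -- nearest past cases within the predicted group."""
    g = tree.predict(query.reshape(1, -1))[0]
    idx = np.where(groups == g)[0]
    dists = np.linalg.norm(X[idx] - query, axis=1)
    return idx[np.argsort(dists)[:k]]

print(retrieve_similar(rng.random(4)))  # indices of the most similar past projects
```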

  10. Inter-basin water transfer-supply model and risk analysis with consideration of rainfall forecast information

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    This paper develops a new inter-basin water transfer-supply and risk assessment model that takes rainfall forecast information into consideration. Firstly, based on the current state of the reservoir and rainfall forecast information from the Global Forecast System (GFS), the actual diversion amount is determined according to the inter-basin water transfer rules with the decision tree method; secondly, the reservoir supply operation system is used to distribute the water resources of the inter-basin water transfer reservoir; finally, the integrated risk assessment model is built by selecting the reliability of water transfer and the reliability (water shortage risk), resiliency and vulnerability of water supply as risk analysis indexes. The case study shows that the inter-basin water transfer-supply model that considers rainfall forecast information can reduce the comprehensive risk and improve the utilization efficiency of water resources, compared with conventional and optimal water distribution models.
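    The risk indexes named here (reliability, resiliency and vulnerability of water supply) have standard time-series definitions in water-resources analysis; a minimal sketch of how they might be computed from a simulated supply/demand record is given below. The formulations follow common usage and are an assumption, since the paper's exact definitions are not reproduced in this abstract, and the numbers are purely illustrative.

```python
import numpy as np

def supply_risk_indexes(supplied, demand):
    """Reliability, resiliency and vulnerability of a water-supply time series.

    reliability  : fraction of periods with no shortage
    resiliency   : probability that a shortage period is followed by a normal one
    vulnerability: mean shortage volume over shortage periods
    """
    supplied, demand = np.asarray(supplied, float), np.asarray(demand, float)
    shortage = np.maximum(demand - supplied, 0.0)
    failing = shortage > 0

    reliability = 1.0 - failing.mean()
    recoveries = np.sum(failing[:-1] & ~failing[1:])
    resiliency = recoveries / failing[:-1].sum() if failing[:-1].any() else 1.0
    vulnerability = shortage[failing].mean() if failing.any() else 0.0
    return reliability, resiliency, vulnerability

# Example with a 12-period record (arbitrary illustrative numbers).
print(supply_risk_indexes(
    supplied=[9, 10, 8, 10, 10, 7, 10, 10, 10, 9, 10, 10],
    demand=[10] * 12))
```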

  11. A model-based display

    International Nuclear Information System (INIS)

    A model-based display is identified, discussed, and illustrated. The model used in the display is based upon the Rankine Cycle, a heat engine cycle. Plant process data from the loss of main and auxiliary feedwater event at the Davis-Besse Plant on June 9, 1985 is used to illustrate the display. The model used in the display fuses individual process variables into process functions. It also serves as a medium to communicate status of the process to human users. The human users may evaluate the goals of operation from the displayed process functions. Because of these display features, the user's cognitive workload is minimized. The opinions expressed herein are the author's personal ones and do not necessarily reflect criteria, requirements, and guidelines of the U.S. Nuclear Regulatory Commission

  12. Model-based sensor diagnosis

    International Nuclear Information System (INIS)

    Running a nuclear power plant involves monitoring data provided by the installation's sensors. Operators and computerized systems then use these data to establish a diagnostic of the plant. However, the instrumentation system is complex, and is not immune to faults and failures. This paper presents a system for detecting sensor failures using a topological description of the installation and a set of component models. This model of the plant implicitly contains relations between sensor data. These relations must always be checked if all the components are functioning correctly. The failure detection task thus consists of checking these constraints. The constraints are extracted in two stages. Firstly, a qualitative model of their existence is built using structural analysis. Secondly, the models are formally handled according to the results of the structural analysis, in order to establish the constraints on the sensor data. This work constitutes an initial step in extending model-based diagnosis, as the information on which it is based is suspect. This work will be followed by surveillance of the detection system. When the instrumentation is assumed to be sound, the unverified constraints indicate errors on the plant model. (authors). 8 refs., 4 figs

  13. Monitoring of Pinus massoniana spatial pattern changes based on RS and GIS techniques

    Institute of Scientific and Technical Information of China (English)

    WANG Lei; HUANG Hua-guo; ZHANG Xiao-li; LUO You-qing; SHI Juan

    2008-01-01

    Our research focused on extracting Pinus massoniana information from remote sensing images based on knowledge detection and a decision tree algorithm, and on establishing a spatial pattern model combining quantitative theoretical ecology with remote sensing (RS) and geographic information system (GIS) techniques. Applying the information extraction methods and the spatial pattern model, we studied changes in P. massoniana spatial patterns before and after the invasion by the pine wood nematode (Bursaphelenchus xylophilus) in Fuyang and Zhoushan counties, Zhejiang Province, east China. The P. massoniana spatial patterns are clustered whether or not the invasion occurred, but the degree of clustering differs, and our results show good agreement with field data. Applying these results, we analyzed the relationship between spatial patterns and invasion level and drew the preliminary conclusion that the pine wood nematode spreads in two kinds of patterns: continuous and discontinuous diffusion. This approach can help monitor and evaluate changes in ecological systems.

  14. Model-based requirements engineering

    CERN Document Server

    Holt, Jon

    2012-01-01

    This book provides a hands-on introduction to model-based requirements engineering and management by describing a set of views that form the basis for the approach. These views take into account each individual requirement in terms of its description, but then also provide each requirement with meaning by putting it into the correct 'context'. A requirement that has been put into a context is known as a 'use case' and may be based upon either stakeholders or levels of hierarchy in a system. Each use case must then be analysed and validated by defining a combination of scenarios and formal mathematica

  15. AN OPERATING MODEL FOR THE ENVIRONMENTAL RISK ASSESSMENT APPLIED TO ITALIAN SITES OF COMMUNITY IMPORTANCE: IDENTIFICATION OF POTENTIAL EFFECTS ON SOIL

    Directory of Open Access Journals (Sweden)

    Valentina Rastelli

    2014-01-01

    Full Text Available The fast development of agro-biotechnologies calls for a harmonized approach to the risk analysis of GMO releases. A group of Italian experts has elaborated an operating model for environmental risk assessment (OMERA) based on the assumption that the occurrence of a risk is related to the presence of four components: source, diffusion factors, dispersal routes and receptors. This model has been further developed into a Decision Support System based on fuzzy logic (FDSS) for assessors and notifiers. It is a web-based questionnaire that guides the user through a decision tree from the source to the receptors and leads to the identification and assessment of the risks. The FDSS has been tested on case studies, simulating herbicide-tolerant oilseed rape and insect-resistant maize as sources. The resulting identified potential effects on soil are changes to soil structure and microbial diversity.

  16. Model-based tomographic reconstruction

    Science.gov (United States)

    Chambers, David H.; Lehman, Sean K.; Goodman, Dennis M.

    2012-06-26

    A model-based approach to estimating wall positions for a building is developed and tested using simulated data. It borrows two techniques from geophysical inversion problems, layer stripping and stacking, and combines them with a model-based estimation algorithm that minimizes the mean-square error between the predicted signal and the data. The technique is designed to process multiple looks from an ultra wideband radar array. The processed signal is time-gated and each section processed to detect the presence of a wall and estimate its position, thickness, and material parameters. The floor plan of a building is determined by moving the array around the outside of the building. In this paper we describe how the stacking and layer stripping algorithms are combined and show the results from a simple numerical example of three parallel walls.
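    The core estimation step described here, adjusting wall parameters so that a predicted radar return best matches the measured one in the mean-square sense, can be sketched generically as below. The forward model is a deliberately simplified stand-in (two echoes from the front and back faces of a single wall) on synthetic data, not the authors' electromagnetic model or layer-stripping implementation.

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0, 20e-9, 400)                      # time axis (s)

def predicted_signal(params, t, pulse_width=0.5e-9):
    """Toy forward model: Gaussian echoes from the two faces of one wall."""
    position, thickness, reflectivity = params
    c = 3e8
    delays = [2 * position / c, 2 * (position + thickness) / c]
    amps = [reflectivity, -0.6 * reflectivity]
    return sum(a * np.exp(-((t - d) / pulse_width) ** 2)
               for a, d in zip(amps, delays))

# Synthetic 'measured' data for a wall at 3 m, 0.2 m thick.
true_params = (3.0, 0.2, 1.0)
data = predicted_signal(true_params, t) + 0.02 * np.random.randn(t.size)

mse = lambda p: np.mean((predicted_signal(p, t) - data) ** 2)
fit = minimize(mse, x0=(2.0, 0.1, 0.5), method="Nelder-Mead")
print(fit.x)   # estimated (position, thickness, reflectivity), approaching the true values
```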

  17. Differential Geometry Based Multiscale Models

    OpenAIRE

    Wei, Guo-Wei

    2010-01-01

    Large chemical and biological systems such as fuel cells, ion channels, molecular motors, and viruses are of great importance to the scientific community and public health. Typically, these complex systems in conjunction with their aquatic environment pose a fabulous challenge to theoretical description, simulation, and prediction. In this work, we propose a differential geometry based multiscale paradigm to model complex macromolecular systems, and to put macroscopic and microscopic descript...

  18. Analyzing Risk Factors for Several Chronic Diseases with Decision Tree, Logistic Regression and an Improved Neural Network

    Institute of Scientific and Technical Information of China (English)

    马莉雅

    2014-01-01

    This paper analyzes the risk factors of hypertension, diabetes, hyperlipidemia and coronary heart disease (CHD) with several data mining methods: a decision tree (C4.5), logistic regression (LR) and an improved neural network based on gradient descent (GDNN). It then investigates the correlations among these risk factors and finds that the common risk factors include age, sex, systolic blood pressure (SBP), triglycerides (TG), total cholesterol (TC), BMI, 2-hour postprandial blood glucose (2hPPG), LDL-C, smoking, diabetes, hyperlipidemia, CHD and hypertension. The results show that C4.5, LR, GDNN and BNN are all suitable for mining the risk factors of hypertension, while C4.5 and GDNN are better suited than LR and BNN for mining the risk factors of diabetes, hyperlipidemia and CHD. Finally, GDNN achieves higher accuracy than BNN when analyzing the risk factors of the four chronic diseases.

  19. Crowdsourcing Based 3d Modeling

    Science.gov (United States)

    Somogyi, A.; Barsi, A.; Molnar, B.; Lovas, T.

    2016-06-01

    Web-based photo albums that support organizing and viewing the users' images are widely used. These services provide a convenient solution for storing, editing and sharing images. In many cases, the users attach geotags to the images in order to enable using them e.g. in location based applications on social networks. Our paper discusses a procedure that collects open access images from a site frequently visited by tourists. Geotagged pictures showing the image of a sight or tourist attraction are selected and processed in photogrammetric processing software that produces the 3D model of the captured object. For the particular investigation we selected three attractions in Budapest. To assess the geometrical accuracy, we used laser scanner and DSLR as well as smart phone photography to derive reference values to enable verifying the spatial model obtained from the web-album images. The investigation shows how detailed and accurate models could be derived applying photogrammetric processing software, simply by using images of the community, without visiting the site.

  20. An Agent Based Classification Model

    CERN Document Server

    Gu, Feng; Greensmith, Julie

    2009-01-01

    The major function of this model is to access the UCI Wisconsin Breast Cancer data-set[1] and classify the data items into two categories, normal and anomalous. This kind of classification can be referred to as anomaly detection, which discriminates anomalous behaviour from normal behaviour in computer systems. One popular solution for anomaly detection is Artificial Immune Systems (AIS). AIS are adaptive systems inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving. The Dendritic Cell Algorithm (DCA)[2] is an AIS algorithm that was developed specifically for anomaly detection. It has been successfully applied to intrusion detection in computer security. It is believed that agent-based modelling is an ideal approach for implementing AIS, as intelligent agents could be the perfect representations of immune entities in AIS. This model evaluates the feasibility of re-implementing the DCA in an agent-based simulation environment ...

  1. Model-based Utility Functions

    Science.gov (United States)

    Hibbard, Bill

    2012-05-01

    Orseau and Ring, as well as Dewey, have recently described problems, including self-delusion, with the behavior of agents using various definitions of utility functions. An agent's utility function is defined in terms of the agent's history of interactions with its environment. This paper argues, via two examples, that the behavior problems can be avoided by formulating the utility function in two steps: 1) inferring a model of the environment from interactions, and 2) computing utility as a function of the environment model. Basing a utility function on a model that the agent must learn implies that the utility function must initially be expressed in terms of specifications to be matched to structures in the learned model. These specifications constitute prior assumptions about the environment so this approach will not work with arbitrary environments. But the approach should work for agents designed by humans to act in the physical world. The paper also addresses the issue of self-modifying agents and shows that if provided with the possibility to modify their utility functions agents will not choose to do so, under some usual assumptions.

  2. Vedalogic: a Method of Climatological Data Verification Based on Data Mining Models

    Directory of Open Access Journals (Sweden)

    Henrique Gonçalves Salvador

    2009-12-01

    Full Text Available This paper presents VEDALOGIC, a method for climatological data verification based on data mining models, developed for the Instituto de Controle do Espaço Aéreo Brasileiro (ICEA). VEDALOGIC verifies data using models created with data mining algorithms. The method uses clustering models generated from a historical series, which allow the identification of homogeneous groups in a Climatological Database (CDB). Based on these models, nonconformities in the data, called outliers, can be detected. Once detected, an outlier is classified/predicted according to a decision tree model, also built from the historical series, and the value obtained from the decision tree is adopted as a suggested correction for the outlier, improving the consistency of the data in the CDB. The following algorithms are used: Expectation-Maximization (EM) and K-means for clustering, and REPTree and M5P for classification/prediction. To verify the efficiency of VEDALOGIC, noisy records were artificially inserted into a data set; all of them were detected by VEDALOGIC, which suggested correction values with an average precision above 98%.
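    The pipeline described above, clustering the historical series, flagging records that fall far from any cluster, and letting a tree model suggest a replacement value, can be outlined roughly as follows. This is an illustrative scikit-learn sketch that uses a generic DecisionTreeRegressor in place of REPTree/M5P (which are Weka algorithms); the synthetic records, threshold and variable choice are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

# Hypothetical climatological records: [temperature, humidity, pressure]
rng = np.random.default_rng(0)
history = rng.normal([25.0, 70.0, 1013.0], [3.0, 8.0, 5.0], size=(1000, 3))

# 1) Cluster the historical series into homogeneous groups.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(history)
dists = np.min(km.transform(history), axis=1)
threshold = np.percentile(dists, 99)          # assumed outlier cut-off

def is_outlier(record):
    return np.min(km.transform(record.reshape(1, -1)), axis=1)[0] > threshold

# 2) A tree model predicts one variable (here temperature) from the others,
#    providing a suggested correction for flagged records.
tree = DecisionTreeRegressor(max_depth=6).fit(history[:, 1:], history[:, 0])

record = np.array([95.0, 71.0, 1012.0])       # implausible temperature reading
if is_outlier(record):
    suggestion = tree.predict(record[1:].reshape(1, -1))[0]
    print(f"outlier detected; suggested temperature: {suggestion:.1f}")
```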

  3. Opinion Mining Classification Using Key Word Summarization Based on Singular Value Decomposition

    Directory of Open Access Journals (Sweden)

    B Valarmathi

    2011-01-01

    Full Text Available With the popularity of online shopping, it is increasingly important for manufacturers and service providers to ask customers to review their products and associated services. Typically the number of customer reviews that a product receives grows rapidly and can reach hundreds or even thousands. This makes it difficult for a potential customer to decide whether to buy the product, and difficult for the manufacturer to keep track of and manage customer opinions. Opinion mining is an emerging field that classifies user opinions into positive and negative reviews. This paper proposes a methodology that uses word scores based on Singular Value Decomposition, built from a custom corpus modeled for the topic on which opinion mining is to be performed. Bayes Net and decision tree induction algorithms are used to classify the opinions.
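    A rough sketch of the SVD-based word scoring idea, building a term-document matrix from the review corpus, taking its truncated SVD, and scoring each word by its weight on the leading singular directions, is given below. This is a generic latent-semantic-analysis style illustration on a toy corpus, not the authors' exact scoring scheme.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

reviews = [
    "battery life is great and the screen is sharp",
    "terrible battery, the phone overheats",
    "great camera, great value",
    "screen cracked after a week, poor build quality",
]

# Term-document matrix and its truncated SVD.
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
svd = TruncatedSVD(n_components=2, random_state=0).fit(X)

# Score each word by the magnitude of its loading on the leading components.
scores = np.abs(svd.components_).sum(axis=0)
keywords = sorted(zip(vec.get_feature_names_out(), scores),
                  key=lambda t: -t[1])[:5]
print(keywords)   # top-scoring words, used as features for Bayes Net / decision tree
```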

  4. Trace-Based Code Generation for Model-Based Testing

    OpenAIRE

    Kanstrén, T.; Piel, E.; Gross, H.-G.

    2009-01-01

    Paper Submitted for review at the Eighth International Conference on Generative Programming and Component Engineering. Model-based testing can be a powerful means to generate test cases for the system under test. However, creating a useful model for model-based testing requires expertise in the (formal) modeling language of the used tool and the general concept of modeling the system under test for effective test generation. A commonly used modeling notation is to describe the model through a...

  5. Exploring Student Characteristics of Retention That Lead to Graduation in Higher Education Using Data Mining Models

    Science.gov (United States)

    Raju, Dheeraj; Schumacker, Randall

    2015-01-01

    The study used earliest available student data from a flagship university in the southeast United States to build data mining models like logistic regression with different variable selection methods, decision trees, and neural networks to explore important student characteristics associated with retention leading to graduation. The decision tree…

  6. Model Comprehensive Risk Assessment of the Insurance Company: Tradition and Innovation

    Directory of Open Access Journals (Sweden)

    Yulia Slepukhina

    2015-08-01

    Full Text Available The article analyzes traditional methods of evaluating the financial risks arising in the insurance business, such as the discount-rate adjustment method, the certainty (reliable) equivalents method, sensitivity analysis of efficiency criteria, analysis of probability distributions, decision trees and methods based on fuzzy set theory, and identifies their advantages and disadvantages. The author then proposes a model for the complex (integrated) assessment of the risks arising in insurance companies. It is shown that the greatest effect in risk management can be achieved by using an integrated approach to risk assessment and analysis, i.e., by considering the different groups of risks arising from the activities of an insurance company not in isolation from each other but together, taking into account their mutual influence and the dynamics of change.

  7. Return to Work After Lumbar Microdiscectomy - Personalizing Approach Through Predictive Modeling.

    Science.gov (United States)

    Papić, Monika; Brdar, Sanja; Papić, Vladimir; Lončar-Turukalo, Tatjana

    2016-01-01

    Lumbar disc herniation (LDH) is the most common disease among the working population requiring surgical intervention. This study aims to predict the return to work after operative treatment of LDH based on an observational study of 153 patients. The classification problem was approached using decision trees (DT), support vector machines (SVM) and a multilayer perceptron (MLP) combined with the RELIEF algorithm for feature selection. MLP provided the best recall of 0.86 for the class of patients not returning to work, which, combined with the selected features, enables early identification and personalized, targeted interventions for subjects at risk of prolonged disability. The predictive modeling indicated the most decisive risk factors for prolonged work absence: psychosocial factors, mobility of the spine and structural changes of the facet joints, and professional factors including standing, sitting and microclimate. PMID:27225576
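    The evaluation criterion emphasized here is recall for the class of patients who do not return to work. A minimal sketch of combining feature selection with an MLP and reporting that per-class recall is shown below; it uses a generic mutual-information selector as a stand-in where a RELIEF implementation is not available, and the synthetic data and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import recall_score

# Hypothetical feature matrix (psychosocial, clinical and occupational
# variables) and binary outcome: 1 = did not return to work.
rng = np.random.default_rng(0)
X = rng.normal(size=(153, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.8, size=153) > 1).astype(int)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=8),   # stand-in for RELIEF feature weighting
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
pred = cross_val_predict(clf, X, y, cv=5)
print("recall, 'not returning to work':", recall_score(y, pred, pos_label=1))
```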

  8. Business value modeling based on BPMN models

    OpenAIRE

    Masoumigoudarzi, Farahnaz

    2014-01-01

    In this study we try to clarify how 'business values', as defined in a business context, can be modeled and measured in the business processes of a company, introduce different methods, and select the one best suited to modeling the company's business values. These methods have been used by researchers in business analytics and by senior managers of many companies. The focus of this project is business value detection and modeling. The basis of this research is on BPM...

  9. Decision tree applied to hatchery databases of Hy-Line W-36

    Directory of Open Access Journals (Sweden)

    Marcelo Gomes Ferreira Lima

    2010-12-01

    Full Text Available The hatchery is a very important sector in egg production. As computer equipment becomes cheaper, more data are stored for managing the production process. Data mining has emerged as a technique for identifying new and useful knowledge in databases. The objective of this work was to explore the decision tree technique on hatchery databases of layer breeders in order to establish incubation standards. Incubation data from 2002 to 2006 for the Hy-Line W-36 strain were made available by Hy-Line do Brasil Ltda. Two experiments were carried out: in the first, values above the company's target for the "saleable hatched females" index were treated as relevant for rule generation; in the second, values below the target were treated as relevant. The C4.5 entropy algorithm was used, with SAS Enterprise Miner as the analysis tool. As a conclusion of this study, it was observed that, with the technique studied, the data used in production management are sufficient to identify new, useful and applicable knowledge for improving the productivity of hatchery companies, meeting demand while reducing waste.

  10. Optimal pricing decision model based on activity-based costing

    Institute of Scientific and Technical Information of China (English)

    王福胜; 常庆芳

    2003-01-01

    In order to find out the applicability of the optimal pricing decision model based on the conventional cost behavior model after activity-based costing has given a strong shock to the conventional cost behavior model and its assumptions, detailed analyses have been made using the activity-based cost behavior and cost-volume-profit analysis model. It is concluded from these analyses that the theory behind the construction of the optimal pricing decision model is still tenable under activity-based costing, but the conventional optimal pricing decision model must be modified as appropriate to the activity-based-costing cost behavior model and cost-volume-profit analysis model; an optimal pricing decision model is really a product pricing decision model constructed by following the economic principle of maximizing profit.
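    As a purely illustrative sketch of the profit-maximizing-price idea under activity-based costing, the snippet below searches for the price that maximizes profit when total cost is driven by several activity cost pools rather than a single volume-based rate. The demand curve, cost pools and driver figures are invented for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def demand(price):
    """Illustrative linear demand curve (units sold at a given price)."""
    return max(10_000 - 40 * price, 0)

def abc_total_cost(quantity):
    """Activity-based cost: several activity pools with their own drivers."""
    batches = np.ceil(quantity / 500)            # batch-level activity
    return (35.0 * quantity                      # unit-level (materials, labour)
            + 1_200.0 * batches                  # setups per batch
            + 50_000.0)                          # facility-level cost

profit = lambda price: price * demand(price) - abc_total_cost(demand(price))
res = minimize_scalar(lambda p: -profit(p), bounds=(35, 250), method="bounded")
print(f"profit-maximizing price ~ {res.x:.2f}, profit ~ {profit(res.x):,.0f}")
```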

  11. Sensor-based interior modeling

    International Nuclear Information System (INIS)

    Robots and remote systems will play crucial roles in future decontamination and decommissioning (D&D) of nuclear facilities. Many of these facilities, such as uranium enrichment plants, weapons assembly plants, research and production reactors, and fuel recycling facilities, are dormant; there is also an increasing number of commercial reactors whose useful lifetime is nearly over. To reduce worker exposure to radiation, occupational and other hazards associated with D&D tasks, robots will execute much of the work agenda. Traditional teleoperated systems rely on human understanding (based on information gathered by remote viewing cameras) of the work environment to safely control the remote equipment. However, removing the operator from the work site substantially reduces his efficiency and effectiveness. To approach the productivity of a human worker, tasks will be performed telerobotically, in which many aspects of task execution are delegated to robot controllers and other software. This paper describes a system that semi-automatically builds a virtual world for remote D&D operations by constructing 3-D models of a robot's work environment. Planar and quadric surface representations of objects typically found in nuclear facilities are generated from laser rangefinder data with a minimum of human interaction. The surface representations are then incorporated into a task space model that can be viewed and analyzed by the operator, accessed by motion planning and robot safeguarding algorithms, and ultimately used by the operator to instruct the robot at a level much higher than teleoperation

  12. Memristor model based on fuzzy window function

    OpenAIRE

    Abdel-Kader, Rabab Farouk; Abuelenin, Sherif M.

    2016-01-01

    Memristor (memory-resistor) is the fourth passive circuit element. We introduce a memristor model based on a fuzzy logic window function. Fuzzy models are flexible, which enables the capture of the pinched hysteresis behavior of the memristor. The introduced fuzzy model avoids common problems associated with window-function based memristor models, such as the terminal state problem, and the symmetry issues. The model captures the memristor behavior with a simple rule-base which gives an insig...

  13. Guide to APA-Based Models

    Science.gov (United States)

    Robins, Robert E.; Delisi, Donald P.

    2008-01-01

    In Robins and Delisi (2008), a linear decay model, a new IGE model by Sarpkaya (2006), and a series of APA-Based models were scored using data from three airports. This report is a guide to the APA-based models.

  14. CEAI: CCM based Email Authorship Identification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah

    2013-01-01

    reveal that the proposed CCM-based email authorship identification model, along with the proposed feature set, outperforms the state-of-the-art support vector machine (SVM)-based models, as well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains an accuracy rate of 94% for 10...

  15. Trace-Based Code Generation for Model-Based Testing

    NARCIS (Netherlands)

    Kanstrén, T.; Piel, E.; Gross, H.-G.

    2009-01-01

    Paper Submitted for review at the Eighth International Conference on Generative Programming and Component Engineering. Model-based testing can be a powerful means to generate test cases for the system under test. However, creating a useful model for model-based testing requires expertise in the (fo

  16. Rule-based decision making model

    International Nuclear Information System (INIS)

    A rule-based decision making model is designed in the G2 environment. A theoretical and methodological frame for the model is composed and motivated. The rule-based decision making model is based on object-oriented modelling, knowledge engineering and decision theory. The idea of a safety objective tree is utilized, and advanced rule-based methodologies are applied. A general decision making model, the 'decision element', is constructed. The strategy planning of the decision element is based on, e.g., value theory and utility theory. A hypothetical process model is built to provide input data for the decision element. The basic principle of the object model in decision making is division into tasks. Probability models are used to characterize component availabilities, and Bayes' theorem is used to recalculate the probability figures when new information is obtained. The model includes simple learning features to save the solution path. A decision analytic interpretation is given to the decision making process. (author)
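    The Bayesian updating step mentioned here, revising a component availability estimate as new evidence arrives, can be illustrated with a minimal example. The prior, likelihoods and evidence below are invented numbers for illustration, not taken from the report.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | E) from prior P(H) and the two likelihoods."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence

# Prior belief that a component is available, updated after a passed self-test
# that is 95% likely if the component is healthy and 20% likely if it is not.
p_available = 0.90
p_available = bayes_update(p_available, 0.95, 0.20)
print(f"updated availability estimate: {p_available:.3f}")   # ~0.977
```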

  17. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models: the case study of Denmark.

    Science.gov (United States)

    Bou Kheir, Rania; Greve, Mogens H; Bøcher, Peder K; Greve, Mette B; Larsen, René; McCloy, Keith

    2010-05-01

    Soil organic carbon (SOC) is one of the most important carbon stocks globally and has large potential to affect global climate. Distribution patterns of SOC in Denmark constitute a nation-wide baseline for studies on soil carbon changes (with respect to the Kyoto protocol). This paper predicts and maps the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature, plan curvature, profile curvature, flow accumulation, specific catchment area, tangent slope, tangent curvature, steady-state wetness index, Normalized Difference Vegetation Index (NDVI), Normalized Difference Wetness Index (NDWI) and Soil Color Index (SCI), were generated to statistically explain SOC field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv) the remote sensing (RS) indices only, (v) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in the built pruned trees. The three best classification tree models, having the lowest misclassification error (ME) and the lowest number of nodes (N), are: (i) the tree (T1) combining all of the parameters (ME=29.5%; N=54); (ii) the tree (T2) based on the parent material, soil type and landscape type (ME=31.5%; N=14); and (iii) the tree (T3) constructed using parent material, soil type, landscape type, elevation, tangent slope and SCI (ME=30%; N=39). The SOC maps produced at 1:50,000 cartographic scale using these trees are in close agreement, with coincidence values equal to 90.5% (Map T1

  18. Research on Behavioral Modeling of Air Traffic Controllers Based on BDI Agents

    Institute of Scientific and Technical Information of China (English)

    刘岳鹏; 隋东; 林颖达

    2016-01-01

    To address the problem of modeling air traffic controller agents in air traffic control simulation systems, this paper analyzes the behavioral characteristics of controller operations and adopts a BDI architecture to build a decision-tree-based knowledge base of control rules, designing a deliberative controller agent. The controller agent model is constructed on the Jadex platform, and it communicates and coordinates with two other kinds of agents, an aircraft agent and a simulated ATC automation system agent, built on the JADE platform. A simulation scenario is constructed with the simulation system to verify the BDI reasoning process of the controller agent and to reproduce the controller's routine command behavior. The experimental results show that the constructed controller agent model can carry out the reasoning process smoothly and can detect and resolve conflicts between aircraft agents.

  19. Electrical Compact Modeling of Graphene Base Transistors

    Directory of Open Access Journals (Sweden)

    Sébastien Frégonèse

    2015-11-01

    Full Text Available Following the recent development of the Graphene Base Transistor (GBT), a new electrical compact model for GBT devices is proposed. The transistor model includes the quantum capacitance model to obtain a self-consistent base potential. It also uses a versatile transfer current equation to be compatible with the different possible GBT configurations, and it accounts for high-injection conditions thanks to a transit-time-based charge model. Finally, the developed large-signal model has been implemented in Verilog-A code and can be used for simulation in a standard circuit design environment such as Cadence or ADS. The model has been verified using advanced numerical simulation.

  20. EPR-based material modelling of soils

    Science.gov (United States)

    Faramarzi, Asaad; Alani, Amir M.

    2013-04-01

    In the past few decades, as a result of the rapid developments in computational software and hardware, alternative computer-aided pattern recognition approaches have been introduced for modelling many engineering problems, including the constitutive modelling of materials. The main idea behind pattern recognition systems is that they learn adaptively from experience and extract various discriminants, each appropriate for its purpose. In this work an approach is presented for developing material models for soils based on evolutionary polynomial regression (EPR). EPR is a recently developed hybrid data mining technique that searches for structured mathematical equations (representing the behaviour of a system) using a genetic algorithm and the least squares method. Stress-strain data from triaxial tests are used to train and develop EPR-based material models for soil. The developed models are compared with some well-known conventional material models, and it is shown that EPR-based models can provide a better prediction of the behaviour of soils. The main benefits of EPR-based material models are that they provide a unified approach to the constitutive modelling of all materials (i.e., all aspects of material behaviour can be implemented within the unified environment of an EPR model) and that they do not require any arbitrary choice of constitutive (mathematical) model. In EPR-based material models there are no material parameters to be identified, and as the model is trained directly on experimental data, EPR-based material models are the shortest route from experimental research (data) to numerical modelling. Another advantage of an EPR-based constitutive model is that, as more experimental data become available, the quality of the EPR prediction can be improved by learning from the additional data, so the EPR model can become more effective and robust. The developed EPR-based material models can be incorporated in finite element (FE) analysis.
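    To make the EPR idea concrete: the method searches over candidate polynomial structures and fits their coefficients by least squares. The sketch below replaces the genetic-algorithm structure search with a brute-force enumeration of small exponent combinations (a deliberate simplification) and fits a one-dimensional stress-strain relation on synthetic data; everything here is illustrative rather than the authors' formulation.

```python
import itertools
import numpy as np

# Synthetic triaxial-style data: stress as a nonlinear function of strain.
strain = np.linspace(0.001, 0.1, 60)
stress = (120 * strain**0.5 - 300 * strain
          + np.random.default_rng(1).normal(0, 0.3, 60))

candidate_exponents = [0.5, 1, 1.5, 2]

def fit_structure(exponents):
    """Least-squares coefficients and error for one polynomial structure."""
    X = np.column_stack([strain**e for e in exponents] + [np.ones_like(strain)])
    coeffs, *_ = np.linalg.lstsq(X, stress, rcond=None)
    err = np.mean((X @ coeffs - stress) ** 2)
    return coeffs, err

# Brute-force structure search (EPR proper would use a genetic algorithm here).
best = min((fit_structure(c) + (c,) for r in (1, 2)
            for c in itertools.combinations(candidate_exponents, r)),
           key=lambda t: t[1])
coeffs, err, structure = best
print("selected exponents:", structure, "coefficients:", np.round(coeffs, 2))
```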